EGI-InSPIRE:SA1.7-QR4

1. Task Meetings

Date (dd-mm-yyyy) | Indico agenda URL | Title | Outcome (minutes)
26-01-2010 | https://www.egi.eu/indico/conferenceDisplay.py?confId=315 | CODOC meeting with COO | https://www.egi.eu/indico/getFile.py/access?resId=0&materialId=minutes&confId=315
26-01-2010 | https://www.egi.eu/indico/conferenceDisplay.py?confId=314 | CODOC | https://www.egi.eu/indico/getFile.py/access?resId=0&materialId=minutes&confId=314

2. Main Achievements

Grid Oversight

1. ROD teams newsletter

The transition from EGEE to EGI-InSPIRE has brought many changes. For Operations, the EGEE Regional Operations Centres (ROCs) are being dismantled and their responsibilities transferred to the NGIs; some have already completed this process. In the EGI era, ROD teams monitor the quality of sites in their country or region, whereas COD is responsible for global oversight of the whole EGI infrastructure. The aim is to provide a high-quality grid infrastructure to the user communities.

These changes have also led us to think about how COD and ROD will interact in this new setting. During the Grid Oversight session at the EGI Tech Forum it was made clear to us that people find it cumbersome to travel for regular face-to-face meetings. Nevertheless, we feel the need to create and maintain a coherent and lively Grid Oversight community, with interaction between ROD and COD that goes beyond the dashboards. In our view this is necessary to create a top-quality grid infrastructure for our users. For this reason we have created this newsletter. Its purpose is to inform you about recent and upcoming developments related to Grid Oversight and to show you the metrics indicating how well we did in the past month. We intend to publish a newsletter every month.

2. Input given on approved Procedures

New NGI creation process coordination: The purpose of this document is to clearly describe the actions and related steps to be undertaken to integrate an NGI (or a group of NGIs) into the EGI operational structure. The newest version became effective as of Dec 1st.

Operations Centre decommission: The purpose of this document is to clearly describe the actions and related steps to be undertaken to decommission an Operations Centre. This procedure became effective as of Dec 1st.

COD escalation procedure: The purpose of this document is to define an escalation procedure for operational problems. The newest version became effective as of Dec 1st. This procedure is essential for ROD work and we encourage you to read it.

Making a Nagios test an operations test: The purpose of this document is to clearly describe the actions and related steps to be undertaken to make a Nagios test an operations test. A Nagios test is set as an operations test so that the operations dashboard displays an alarm if the test fails. This procedure will become effective as of Jan 1st.

3. Renaming of "critical" tests

“Operations test” should be used for tests that raise alarms for ROD. It was recently decided that a new name was needed for a test that raises alarms in the operations dashboard. COD used to call it a “critical test”, but this caused confusion with the CRITICAL status of a Nagios test. In a poll, the name that gained the majority was “operations test”.
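The distinction behind the renaming can be illustrated with a minimal sketch. It is not the operations dashboard's actual logic: the names `ProbeResult`, `is_operations_test`, `raises_alarm` and the test names are hypothetical, and it assumes that a "failed" test means one reporting the Nagios CRITICAL status. The point is only that being an operations test is a designation of the test, while CRITICAL is a status of one result.

```python
from dataclasses import dataclass
from enum import Enum


class NagiosStatus(Enum):
    """Standard Nagios plugin return codes."""
    OK = 0
    WARNING = 1
    CRITICAL = 2
    UNKNOWN = 3


@dataclass(frozen=True)
class ProbeResult:
    """One test result (hypothetical structure, for illustration only)."""
    name: str
    status: NagiosStatus
    is_operations_test: bool  # designation of the test, not a status


def raises_alarm(result: ProbeResult) -> bool:
    """Assumption: an alarm is shown only when an operations test is CRITICAL."""
    return result.is_operations_test and result.status is NagiosStatus.CRITICAL


# Hypothetical test names and results.
results = [
    ProbeResult("example.test-a", NagiosStatus.CRITICAL, is_operations_test=True),
    ProbeResult("example.test-b", NagiosStatus.CRITICAL, is_operations_test=False),
    ProbeResult("example.test-c", NagiosStatus.OK, is_operations_test=True),
]

alarms = [r.name for r in results if raises_alarm(r)]
print(alarms)
```

In this sketch `example.test-b` is CRITICAL in Nagios but raises no alarm because it is not an operations test, which is the case the old name “critical test” made ambiguous.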

Network Support

3. Issues and Mitigation

Issue Description Mitigation Description
Grid Oversight: None

4. Plans for the next period

Grid Oversight

1. Continue ROC transition to NGIs.

2. Initiate an investigation into how to integrate non-production resources into the infrastructure consistently and coherently.

3. Initiate an investigation into the impact of new middleware in EGI on the operations support model.

4. Initiate an investigation into how to improve the availability and reliability metrics.

5. Evaluate upcoming releases of the operations dashboard.

Network Support