
Difference between revisions of "EGI-InSPIRE:SA1.7-QR4"



== Network Support ==
The main achievement is the outcome of the EGI Network Support proposal Task Force (TF): a structured proposal built around seven identified use cases, formalized and discussed at the face-to-face meeting in Amsterdam on January 24, 2011 (https://www.egi.eu/indico/conferenceTimeTable.py?confId=153#20110124).
The community was introduced to the seven use cases: the GGUS Support System, the PERT team, scheduled maintenance, network troubleshooting on demand, e2e scheduled monitoring, DownCollector, and Policy and Collaboration.
For each of them, the specific proposal from the task force was described and discussed within the EGI operations community.
The TF proposal was based on a questionnaire previously distributed to the NGIs.
Results are published on the EGI Operations Wiki at https://wiki.egi.eu/wiki/NST.
A roadmap ahead has been agreed upon for each use case.
In particular, the task committed to:
1) Set up a Network Support unit within GGUS for network-related issues. GARR has agreed to start providing the required effort on a voluntary basis, at least in a preliminary way, in order to assess its long-term sustainability and to reconfirm this commitment in the coming months. The GGUS workflow has been identified and agreed.
2) Provide, maintain and support an on-demand network troubleshooting tool called HINTS, contributed voluntarily (unfunded) by the French NGI. A central HINTS server instance will be made available at GARR, and the French NGI will start a pilot deployment of the tool once the central server is available.
3) Provide and maintain a live CD distribution based on perfSONAR-MDM for on-demand and scheduled e2e monitoring, contributed voluntarily by the Spanish NGI and the NREN RedIRIS. A dedicated GUI will be made available later. Historical data will be stored in a DB.
4) Keep a permanent liaison with the GN3 perfSONAR community, assess the tools provided by pS and give feedback to the GN3 community, reporting periodically on new features and progress of the pS-based tools.
5) Further refine the NetJobs tool with respect to the functionality provided and the usability of the web interface, providing a central server instance at GARR; promote the tool within the EGI Net Sup operations community, especially for the basic metrics (number of hops, RTT, available bandwidth).
6) Organize a general questionnaire for the NRENs, aimed at better understanding their interaction model with the NGIs, their best practices and the tools they are familiar with, and asking about their availability to provide a PERT contact point for the EGI project.


= 3. Issues and Mitigation =

Revision as of 16:41, 28 April 2011

1. Task Meetings

Date (dd-mm-yyyy) | Indico URL | Agenda Title | Outcome
26-01-2010 | https://www.egi.eu/indico/conferenceDisplay.py?confId=315 | CODOC meeting with COO | https://www.egi.eu/indico/getFile.py/access?resId=0&materialId=minutes&confId=315
26-01-2010 | https://www.egi.eu/indico/conferenceDisplay.py?confId=314 | CODOC | https://www.egi.eu/indico/getFile.py/access?resId=0&materialId=minutes&confId=314
01-12-2010 | https://www.egi.eu/indico/conferenceDisplay.py?confId=227 | CODOC | https://www.egi.eu/indico/getFile.py/access?resId=0&materialId=minutes&confId=227
15-12-2010 | https://www.egi.eu/indico/conferenceDisplay.py?confId=235 | CODOC | https://www.egi.eu/indico/getFile.py/access?resId=0&materialId=minutes&confId=235
13-01-2011 | https://www.egi.eu/indico/conferenceDisplay.py?confId=273 | CODOC | https://www.egi.eu/indico/getFile.py/access?resId=0&materialId=minutes&confId=273
19-01-2011 | https://www.egi.eu/indico/conferenceDisplay.py?confId=249 | CODOC | https://www.egi.eu/indico/getFile.py/access?resId=0&materialId=minutes&confId=249
14-01-2011 | https://www.egi.eu/indico/conferenceDisplay.py?confId=271 | EGI Net Sup proposal Task Force meeting nr. 5 | https://www.egi.eu/indico/getFile.py/access?resId=0&materialId=minutes&confId=271
10-12-2010 | https://www.egi.eu/indico/conferenceDisplay.py?confId=226 | EGI Net Sup proposal Task Force meeting nr. 4 |
22-11-2010 | https://www.egi.eu/indico/conferenceDisplay.py?confId=222 | EGI Net Sup proposal Task Force meeting nr. 3 | https://www.egi.eu/indico/getFile.py/access?resId=0&materialId=minutes&confId=222
10-11-2010 | | EGI Net Sup proposal Task Force meeting nr. 2 |

2. Main Achievements

Grid Oversight

1. ROD teams newsletter

The transition from EGEE to EGI-InSPIRE has brought many changes. For Operations, the EGEE Regional Operations Centres (ROCs) are being dismantled and their responsibilities transferred to the NGIs; some have already completed this process. In the EGI era, ROD teams will monitor the quality of sites in their country or region, whereas COD is responsible for global oversight of the whole EGI infrastructure. The aim is to provide a high-quality grid infrastructure to the user communities. These changes have also led us to think about how COD and ROD are going to interact with each other in this new setting. During the Grid Oversight session at the EGI Tech Forum it was made clear to us that people find it cumbersome to travel in order to have regular face-to-face meetings. Nevertheless, we do feel the need to create and maintain a coherent and lively Grid Oversight community and to have interaction between ROD and COD that goes beyond the dashboards. This is necessary, in our view, to create a top-quality grid infrastructure for our users. For this reason we have created this newsletter. Its purpose is to inform you about recent and upcoming developments related to Grid Oversight and to show you the metrics indicating how well we did in the past month. It is our intention to publish a newsletter every month.

2. Input given on approved Procedures

New NGI creation process coordination: The purpose of this document is to clearly describe the actions and the related steps to be undertaken to integrate an NGI (or a group of NGIs) into the EGI operational structure. The newest version became effective as of Dec 1st.

Operations Centre decommission: The purpose of this document is to clearly describe the actions and the related steps to be undertaken to decommission an Operations Centre. This procedure became effective as of Dec 1st.

COD escalation procedure: The purpose of this document is to define an escalation procedure for operational problems. The newest version became effective as of Dec 1st. This procedure is essential for ROD work and we encourage you to read it.

Making a Nagios test an operations test: The purpose of this document is to clearly describe the actions and the related steps to be undertaken to make a Nagios test an operations test. A Nagios test is set as an operations test so that the operations dashboard displays an alarm if the test fails. This procedure will become effective as of Jan 1st.

3. Renaming of "critical" tests

“Operations test” should be used for tests that raise alarms for ROD. It was recently decided that a new name should be given to tests that raise alarms in the operations dashboard. COD used to call such a test a “critical test”, but this caused confusion with the critical Nagios test status. In a poll, the name that gained the majority was “operations test”.
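
The difference between a test's Nagios status and its role as an operations test can be illustrated with a minimal Python sketch. This is an illustration only, not the operations dashboard's actual logic: the `TestResult` class, the `raises_alarm` function and the test names are hypothetical, and the sketch assumes that "failing" means a CRITICAL Nagios status.

```python
from dataclasses import dataclass


@dataclass
class TestResult:
    name: str              # hypothetical test name
    status: str            # Nagios status: "OK", "WARNING", "CRITICAL" or "UNKNOWN"
    operations_test: bool  # True if the test has been made an operations test


def raises_alarm(result: TestResult) -> bool:
    """Return True if the result would show up as an alarm for ROD.

    Assumption: a test "fails" when its Nagios status is CRITICAL.
    A CRITICAL status alone is not enough; the test must also be
    flagged as an operations test.
    """
    return result.operations_test and result.status == "CRITICAL"


results = [
    TestResult("example-test-a", "CRITICAL", operations_test=True),
    TestResult("example-test-b", "CRITICAL", operations_test=False),
    TestResult("example-test-c", "OK", operations_test=True),
]

# Only the failing operations test produces an alarm.
alarms = [r.name for r in results if raises_alarm(r)]
print(alarms)
```

In this sketch `example-test-b` has a critical Nagios status but raises no alarm, which is the case the old name “critical test” made confusing.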

Network Support

3. Issues and Mitigation

Issue Description Mitigation Description
Grid Oversight: None

4. Plans for the next period

Grid Oversight

1. Continue ROC transition to NGIs.

2. Initiate an investigation on how to integrate non-production resources into the infrastructure in a consistent and coherent way.

3. Initiate an investigation of the impact of new middleware in EGI on the operations support model.

4. Initiate the investigation on how to improve availability and reliability metrics.

5. Evaluate upcoming releases of the operations dashboard.

Network Support