EGI-InSPIRE:SA1.4-QR6
1. Task Meetings
There are no dedicated SA1.4 meetings. It was agreed to discuss all deployment issues with operational tool representatives at the JRA1 meetings. The JRA1 meetings at which subjects relevant to SA1.4 were discussed are listed below.
Date (dd/mm/yyyy) | Indico agenda URL | Title | Outcome |
---|---|---|---|
01/09/2011 | https://www.egi.eu/indico/conferenceDisplay.py?confId=577 | InSPIRE-JRA1 phone conf | Regionalization plans for all tools. |
15/09/2011 | https://www.egi.eu/indico/conferenceDisplay.py?confId=608 | InSPIRE-JRA1 phone conf | Metric portal status. Technical forum planning. |
20/09/2011 | https://www.egi.eu/indico/sessionDisplay.py?sessionId=78&confId=452#20110920 | Operations Tools and Availability Calculation (EGI TF) | Dedicated EGI TF session on operational tools monitoring and availability calculation. |
30/09/2011 | https://www.egi.eu/indico/conferenceDisplay.py?confId=648 | A/R calculation TF session follow up | Continued discussion on availability and reliability calculation. |
20/10/2011 | https://www.egi.eu/indico/conferenceDisplay.py?confId=608 | InSPIRE-JRA1 phone conf | Status of VO SAM instance support. |
2. Main Achievements
Two new versions of the Operations Portal were deployed this quarter: 2.6.3 on August 5th and 2.6.4 on September 29th. A detailed list of new features can be found in the JRA1 section. At the end of the quarter there were four NGI instances: NGI_BY, NGI_CZ, NGI_GRNET and NGI_IBERGRID. Decommissioning of the old CIC portal (cic.egi.eu) was postponed and is now planned for the next quarter.
Two new versions of SAM were deployed this quarter: SAM-Update13 on September 7th and SAM-Update14 on October 22nd. Deployment of NGI SAM/Nagios instances continued. As part of the creation of NGI_UK, the UKI ROC SAM instance was converted into an NGI instance covering two NGIs: NGI_IE (Ireland) and the new NGI_UK. At the end of the quarter, the following SAM/Nagios instances were in production:
- 26 NGI instances covering 37 EGI partners
- 2 ROC instances covering 2 EGI partners
- 1 project instance covering 1 EGI partner
- 3 external ROC instances covering the following regions: Canada, IGALC and LA.
A detailed list of SAM/Nagios instances can be found on the SAM Instances page.
Development of a new SAM instance for monitoring operational tools has started. The first step was the reorganization of operational tools in the GOCDB:
- all central operational tools are grouped in the EGI.eu group
- new service types were added for each operational tool (https://rt.egi.eu/rt/Ticket/Display.html?id=2587)
- all regional operational tool instances are associated with sites.
Additional details can be found in the following slides: https://www.egi.eu/indico/conferenceDisplay.py?confId=549. This reorganization will enable automatic bootstrapping of the SAM instance for operational tools, as well as its integration with the MyEGI web interface and with the ACE system for availability and reliability (A/R) calculation.
A reorganization of NGI core services in the GOCDB was proposed at the OMB (https://www.egi.eu/indico/conferenceDisplay.py?confId=615). It will enable A/R calculation at NGI level.
During the EGI Technical Forum in Lyon, several sessions related to operational tools were organized. The most important one was "Operations Tools and Availability Calculation" (https://www.egi.eu/indico/sessionDisplay.py?sessionId=78&confId=452#20110920). The main topics were:
- Presentation of the SAM and ACE architecture, with a discussion of the ACE profiles for NGI and EGI core services. This discussion was continued in a dedicated session on September 30th.
- Presentation of the new SAM instance for monitoring operational tools.
- Discussion of the UNKNOWN status issue (https://www.egi.eu/indico/contributionDisplay.py?contribId=395&confId=452).
Several side meetings were held at the EGI TF:
- Meeting with EMI and SAM representatives, at which the integration of EMI probes and the future SAM release process were discussed.
- Meeting with EDGI representatives, at which the integration of desktop grid resources into the EGI infrastructure was discussed.
3. Issues and Mitigation
Issue Description | Mitigation Description |
---|---|
High availability of central operational tools is needed. | GOCDB: a dynamic load-balancing DNS setup is provided for goc.egi.eu. A secondary instance at the Fraunhofer Institute is still being deployed; the delay is caused by the development and deployment of the new GOCDB version. |
Monitoring of underperforming sites. | The COD team has proposed monitoring the availability and reliability of sites. If a site's A/R decreases, an alarm would be raised against that site. This approach would enable sites to correct their A/R figures before the end of the month and stay within the OLA thresholds. |
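The COD proposal above (raise an alarm when a site's month-to-date A/R drops) can be illustrated with a minimal Python sketch. It is not the ACE or COD implementation. It assumes equally spaced hourly status samples per site, the hypothetical status names shown, and commonly used A/R definitions: UNKNOWN time is excluded from both denominators, and scheduled downtime counts against availability but not against reliability. The threshold values are placeholders to be replaced with the actual OLA figures.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical status labels for one hourly sample of a site.
UP_STATES = {"OK", "WARNING"}
UNKNOWN = "UNKNOWN"
SCHEDULED = "SCHEDULED_DOWNTIME"
# Any other status (e.g. "CRITICAL") is treated as down.

# Placeholder thresholds (assumption): substitute the real OLA values.
MIN_AVAILABILITY = 0.70
MIN_RELIABILITY = 0.75


@dataclass
class ARFigures:
    availability: float
    reliability: float


def month_to_date_ar(samples: list[str]) -> Optional[ARFigures]:
    """Compute A/R from equally spaced status samples.

    UNKNOWN samples are excluded from both denominators; scheduled
    downtime lowers availability but is excluded from reliability.
    Returns None when there is no known data yet.
    """
    up = sum(1 for s in samples if s in UP_STATES)
    known = sum(1 for s in samples if s != UNKNOWN)
    scheduled = sum(1 for s in samples if s == SCHEDULED)
    if known == 0:
        return None
    reliability_base = known - scheduled
    reliability = up / reliability_base if reliability_base else 1.0
    return ARFigures(up / known, reliability)


def check_site(site: str, samples: list[str]) -> list[str]:
    """Return alarm messages if month-to-date A/R is below the thresholds."""
    ar = month_to_date_ar(samples)
    if ar is None:
        return []  # nothing measurable yet, so no alarm
    alarms = []
    if ar.availability < MIN_AVAILABILITY:
        alarms.append(f"{site}: availability {ar.availability:.0%} < {MIN_AVAILABILITY:.0%}")
    if ar.reliability < MIN_RELIABILITY:
        alarms.append(f"{site}: reliability {ar.reliability:.0%} < {MIN_RELIABILITY:.0%}")
    return alarms


if __name__ == "__main__":
    # 60 h OK, 30 h CRITICAL, 10 h scheduled downtime, 20 h UNKNOWN.
    history = ["OK"] * 60 + ["CRITICAL"] * 30 + [SCHEDULED] * 10 + [UNKNOWN] * 20
    for message in check_site("SITE-A", history):
        print(message)
```

With this example history, availability is 60/100 and reliability is 60/90, so both alarms fire mid-month, leaving the site time to recover before the monthly figures are final.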
4. Plans for the next period
Decommission of the old CIC portal (cic.egi.eu) is planned for the next quarter.
Track and perform the planned failover tests of centralized tools.
Deployment of the new SAM instance dedicated to monitoring operational tools, using new probes provided by the operational tools developers.