EGI-InSPIRE:SA1.4-QR7
1. Task Meetings
Date (dd/mm/yyyy) | Url Indico Agenda | Title | Outcome |
---|---|---|---|
2. Main Achievements
Two new versions of ActiveMQ were deployed on the production broker network: 5.5 on November 29th and 5.5.1 on January 30th/31st. The following changes were implemented with the new versions:
- Camel routes were switched off; SAM is to use wildcard subscriptions instead.
- Automatic closing of inactive STOMP connections after one hour. Inactive connections were overloading brokers. Clients will reconnect automatically.
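As an illustration of what a wildcard subscription selects, the following is a minimal sketch of ActiveMQ-style destination matching, where names are separated by `.`, `*` stands for exactly one name and `>` stands for the rest of the destination. The function name `matches` and the topic names are hypothetical and are not taken from the SAM configuration. In this sketch `>` also matches an empty remainder, which is an assumption and has not been checked against broker behaviour.

```python
def matches(pattern: str, destination: str) -> bool:
    """Return True if an ActiveMQ-style wildcard pattern selects a destination.

    Hypothetical helper for illustration only; not part of SAM or ActiveMQ.
    """
    pattern_parts = pattern.split(".")
    dest_parts = destination.split(".")
    for i, part in enumerate(pattern_parts):
        if part == ">":
            # '>' accepts whatever remains (assumed to include nothing at all).
            return True
        if i >= len(dest_parts):
            # The pattern has more names than the destination.
            return False
        if part != "*" and part != dest_parts[i]:
            # A literal name must match exactly; '*' accepts any single name.
            return False
    # Without a trailing '>', both sides must have the same number of names.
    return len(pattern_parts) == len(dest_parts)


# Hypothetical topic names, for illustration only.
print(matches("grid.probe.*.NGI_CZ", "grid.probe.metricOutput.NGI_CZ"))
print(matches("grid.probe.>", "grid.probe.metricOutput.NGI_CZ"))
print(matches("grid.probe.*", "grid.probe.metricOutput.NGI_CZ"))
```

A single subscription with such a pattern can replace several broker-side routes, which is one plausible reason for preferring it over Camel routes.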
Two new versions of the Operations portal were deployed in this quarter: 2.7 on November 9th and 2.8 on December 21st. A detailed list of new features can be found in the JRA1 section. At the end of the quarter there were four NGI instances: NGI_BY, NGI_CZ, NGI_GRNET and NGI_IBERGRID. Decommissioning of the old CIC portal (cic.egi.eu) was postponed and is now planned for May 2012.
The list of operations tests was extended on January 2nd with the following tests:
- org.nagios.BDII-Check
- org.sam.CREAMCE-DirectJobSubmit
- hr.srce.LB-CertLifetime
- hr.srce.MyProxy-Store
- org.nagios.GridFTP-Check
- org.sam.WMS-JobSubmit
One new version of SAM was deployed in this quarter: SAM-Update15 on November 29th. At the end of the quarter the following SAM/Nagios instances were in production:
- 26 NGI instances covering 37 EGI partners
- 2 ROC instances covering 2 EGI partners
- 1 project instance covering 1 EGI partner
- 3 external ROC instances covering the following regions: Canada, IGALC and LA.
A detailed list of SAM/Nagios instances can be found on the following page: SAM Instances.
3. Issues and Mitigation
Issue Description | Mitigation Description |
---|---|
High availability of central operational tools is needed. | GOCDB: a dynamic load-balancing DNS setup is provided for the address goc.egi.eu. A secondary instance at the Fraunhofer institute is still being deployed. The delay is caused by the development and deployment of the new GOCDB version. |
Monitoring of underperforming sites. | The COD team has proposed monitoring the availability and reliability (A/R) of sites. If a site's A/R decreases, an alarm would be raised against the site. Such an approach would enable sites to correct their A/R figures before the end of the month and stay within the OLA thresholds. Discussions have started on defining the implementation details. |
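
The A/R alarm proposed in the table above could be sketched roughly as follows. This is only an illustration, since the implementation details were still under discussion. It assumes the commonly used definitions (availability = uptime / (total time - unknown time), reliability = uptime / (total time - scheduled downtime - unknown time)). The names `SiteHours` and `check_site`, the site name, and the threshold values are hypothetical placeholders, not agreed OLA figures.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SiteHours:
    """Month-to-date status hours for one site (hypothetical structure)."""
    up: float
    down: float            # unscheduled downtime
    scheduled_down: float  # announced downtime
    unknown: float         # no monitoring results available


def availability(h: SiteHours) -> Optional[float]:
    # Unknown time is excluded; scheduled downtime counts against the site.
    known = h.up + h.down + h.scheduled_down
    return h.up / known if known > 0 else None


def reliability(h: SiteHours) -> Optional[float]:
    # Both unknown time and scheduled downtime are excluded.
    counted = h.up + h.down
    return h.up / counted if counted > 0 else None


def check_site(name: str, h: SiteHours,
               min_availability: float = 0.70,   # assumed placeholder threshold
               min_reliability: float = 0.75) -> List[str]:  # assumed placeholder
    """Return alarm messages for figures currently below the thresholds."""
    alarms = []
    a, r = availability(h), reliability(h)
    if a is not None and a < min_availability:
        alarms.append(f"{name}: availability {a:.1%} below {min_availability:.0%}")
    if r is not None and r < min_reliability:
        alarms.append(f"{name}: reliability {r:.1%} below {min_reliability:.0%}")
    return alarms


# Hypothetical mid-month figures for an example site.
for alarm in check_site("EXAMPLE-SITE", SiteHours(up=300, down=100,
                                                  scheduled_down=50, unknown=30)):
    print(alarm)
```

Running such a check on month-to-date figures, rather than only at the end of the month, is what would give a site time to react before the monthly figures are final.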