EGI-InSPIRE:SA1.4-QR7

1. Task Meetings

There are no dedicated SA1.4 meetings. It was agreed that all deployment issues would be discussed with the operational tool representatives at the JRA1 meetings. The JRA1 meetings and the subjects relevant to SA1.4 that were discussed are listed below.

Date (dd/mm/yyyy) | Indico agenda URL | Title | Outcome
03/11/2011 | https://www.egi.eu/indico/conferenceDisplay.py?confId=670 | InSPIRE-JRA1 phone conf | Deployment of the next releases of the operational tools.
17/11/2011 | https://www.egi.eu/indico/conferenceDisplay.py?confId=690 | InSPIRE-JRA1 phone conf | Synchronization between SAM and OPS.
15/11/2011 | https://www.egi.eu/indico/conferenceDisplay.py?confId=707 | InSPIRE-JRA1 phone conf | Integration of GOCDB scoping in SAM. Report on deployment of a Nagios instance for the Fedcloud task force.
19/12/2011 | - | A/R probe meeting | Discussion about a probe for site A/R monitoring.
12/01/2013 | https://www.egi.eu/indico/conferenceDisplay.py?confId=724 | InSPIRE-JRA1 phone conf | Report on the underperforming sites probe.

2. Main Achievements

Two new versions of ActiveMQ were deployed on the production broker network: 5.5 on November 29th and 5.5.1 on January 30/31st. The following changes were implemented with the new versions:

  • Camel routes were switched off; SAM uses wildcard subscriptions instead.
  • Automatic closing of inactive STOMP connections after one hour. Inactive connections were overloading brokers. Clients will reconnect automatically.
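The idea behind wildcard subscriptions can be sketched as follows. ActiveMQ destination names are dot-separated, `*` matches a single name and `>` matches everything from that position onwards. The matcher below is a simplified illustration of those rules, not the broker's implementation, and the topic names are hypothetical rather than the ones used on the production broker network.

```python
def matches(pattern: str, destination: str) -> bool:
    """Return True if `destination` matches the wildcard `pattern`.

    Simplified rules: names are separated by ".", "*" matches exactly one
    name, ">" matches one or more remaining names.
    """
    pattern_parts = pattern.split(".")
    dest_parts = destination.split(".")
    for i, part in enumerate(pattern_parts):
        if part == ">":
            # ">" consumes the rest of the destination (at least one name).
            return len(dest_parts) > i
        if i >= len(dest_parts):
            # Pattern is longer than the destination.
            return False
        if part != "*" and part != dest_parts[i]:
            return False
    # Without ">", both must have the same number of names.
    return len(pattern_parts) == len(dest_parts)


# Hypothetical topic names, for illustration only.
topics = [
    "grid.probe.results.ngi_cz",
    "grid.probe.results.ngi_by",
    "grid.config.update",
]
subscribed = [t for t in topics if matches("grid.probe.results.*", t)]
```

A single wildcard subscription of this kind lets one consumer receive messages from many topics without a broker-side route per topic.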

The AUTH partner provided protected wiki pages documenting the broker network and describing its configuration (https://trac.hellasgrid.gr/trac/broker-network.egi.eu/wiki/broker-network%20configuration).

GOCDB version 4.2 was released on November 25th. The major change in this version was the scoping of sites and service endpoints into EGI and Local categories. Sites and service endpoints marked as being part of the EGI grid are exposed to the central operational tools, while Local entities are not considered part of the EGI infrastructure. A detailed list of new features can be found in the JRA1 section.
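The effect of scoping on the central tools can be illustrated with a minimal sketch. The `ServiceEndpoint` record, its fields and the host names are assumptions made for this example; they do not reflect the GOCDB data model or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceEndpoint:
    # Hypothetical record; field names are not taken from GOCDB.
    hostname: str
    service_type: str
    scope: str  # "EGI" or "Local"


def visible_to_central_tools(endpoints):
    """Keep only the endpoints scoped as part of the EGI infrastructure."""
    return [e for e in endpoints if e.scope == "EGI"]


endpoints = [
    ServiceEndpoint("ce01.example.org", "CREAM-CE", "EGI"),
    ServiceEndpoint("test-ce.example.org", "CREAM-CE", "Local"),
    ServiceEndpoint("bdii.example.org", "Site-BDII", "EGI"),
]
exposed = visible_to_central_tools(endpoints)
```

In this sketch, endpoints scoped as Local stay registered but are filtered out before anything is handed to the central operational tools.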

Two new versions of the Operations Portal were deployed in this quarter: 2.7 on November 9th and 2.8 on December 21st. A detailed list of new features can be found in the JRA1 section. At the end of the quarter there were four NGI instances: NGI_BY, NGI_CZ, NGI_GRNET and NGI_IBERGRID. The decommissioning of the old CIC portal (cic.egi.eu) was postponed and is now planned for May 2012. The list of operations tests was extended on January 2nd with the following tests:

  • org.nagios.BDII-Check
  • org.sam.CREAMCE-DirectJobSubmit
  • hr.srce.LB-CertLifetime
  • hr.srce.MyProxy-Store
  • org.nagios.GridFTP-Check
  • org.sam.WMS-JobSubmit

The Operations Portal team started testing the transition from topics to virtual destinations in order to improve synchronization between the SAM instances and the Operations Portal.

One new version of SAM was deployed in this quarter: SAM-Update15 on November 29th. At the end of the quarter the following SAM/Nagios instances were in production:

  • 26 NGI instances covering 37 EGI partners
  • 2 ROC instances covering 2 EGI partners
  • 1 project instance covering 1 EGI partner
  • 3 external ROC instances covering the following regions: Canada, IGALC and LA.

A detailed list of SAM/Nagios instances can be found on the following page: SAM Instances.

3. Issues and Mitigation

Issue Description | Mitigation Description
High availability of central operational tools is needed. | GOCDB: a dynamic load-balancing DNS setup is provided for the address goc.egi.eu. The secondary instance at the Fraunhofer institute is still being deployed; the delay is caused by the development and deployment of the new GOCDB version.
Monitoring of underperforming sites. | The COD team has proposed monitoring the availability and reliability (A/R) of sites. If a site's A/R decreases, an alarm would be raised against the site. This approach would enable sites to correct their A/R figures before the end of the month and stay within the OLA thresholds. Discussions have started on defining the implementation details.
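The proposed A/R alarm could work along the lines of the sketch below. The implementation details were still under discussion, so everything here is an assumption: the formulas (availability as uptime over known time, reliability additionally excluding scheduled downtime), the `SiteHours` record, the site name and the threshold values are illustrative and are not taken from the OLA or from any probe.

```python
from dataclasses import dataclass


@dataclass
class SiteHours:
    # Hours accumulated so far in the current month (hypothetical record).
    up: float
    scheduled_downtime: float
    unknown: float
    total: float


def availability(h: SiteHours) -> float:
    """Uptime as a fraction of the time with a known status."""
    known = h.total - h.unknown
    return h.up / known if known > 0 else 0.0


def reliability(h: SiteHours) -> float:
    """Like availability, but scheduled downtime is not counted against the site."""
    expected = h.total - h.unknown - h.scheduled_downtime
    return h.up / expected if expected > 0 else 0.0


def check_site(name: str, h: SiteHours,
               min_availability: float, min_reliability: float) -> list:
    """Return alarm messages for every threshold the site is currently below."""
    alarms = []
    if availability(h) < min_availability:
        alarms.append(f"{name}: availability {availability(h):.1%} below threshold")
    if reliability(h) < min_reliability:
        alarms.append(f"{name}: reliability {reliability(h):.1%} below threshold")
    return alarms


# Mid-month check with placeholder thresholds (not the OLA values).
mid_month = SiteHours(up=300, scheduled_downtime=40, unknown=20, total=360)
alarms = check_site("EXAMPLE-SITE", mid_month,
                    min_availability=0.90, min_reliability=0.90)
```

Running such a check during the month, rather than only at the end, is what would give a site time to react before the monthly figures are final.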

4. Plans for the next period

The decommissioning of the old CIC Portal is planned for May 2012.

The Security Dashboard will be released to production in the next quarter. The VO Dashboard will be released to production between March and April 2012.

Deployment of the refactored Operations Dashboard is planned for April / May 2012.

Planned tests of the failover configurations of the centralized tools will be tracked and performed.

A new SAM instance dedicated to monitoring the operational tools will be deployed, using the new probes provided by the operational tools developers.

DesktopGrids resources will be integrated into the EGI infrastructure.