1. Task Meetings

There are no specific SA1.4 meetings. It was agreed to discuss all deployment issues with operational tool representatives at the JRA1 meetings. The JRA1 meetings at which subjects relevant to SA1.4 were discussed are listed below.

Date (dd/mm/yyyy) | Title (Indico agenda) | Outcome
11/11/2010 | InSPIRE-JRA1 phone conf | SAM Update-06 release analysis. Operations portal release.
02/12/2010 | InSPIRE-JRA1 phone conf | Adopting GLUE 2.0 naming in GOCDB discussion.
22/12/2010 | InSPIRE-JRA1 phone conf | Operational tools deployment analysis (Operational tools deployment plans). Operations portal release.
20/01/2011 | InSPIRE-JRA1 phone conf | Finalizing migration from to
26/01/2011 | InSPIRE-JRA1 f2f in Amsterdam | Operational tools milestones and regionalization plans.

2. Main Achievements

The deployment plans for NGI instances of the individual operational tools were revised. Information was collected from the MS406, MS703 and EGEE-III DNA1.6.2 documents and from direct responses from NGIs. Responses were tracked through the following RT ticket: RT #831.

All operational tools were assigned addresses in domain. It was agreed that all tools will correct URLs in their code to point to the addresses. Decommission of domain was scheduled for March 14th 2011. Further details can be found in the RT ticket:

Two new versions of the Operations portal were released in this quarter: 2.4 on November 17th and 2.4.1 on December 16th. A detailed list of new features can be found in the JRA1 section. One new NGI instance of the Operations portal was deployed in the Belarus NGI (NGI_BY). At the end of the quarter there were four NGI instances: NGI_BY, NGI_CZ, NGI_GRNET and NGI_IBERGRID.

Deployment of SAM/Nagios NGI instances continued. Two large ROCs completed their migration to NGI instances:

  • Northern Europe: NGI NDGF finalized validation of NGI instance on January 25th 2011
  • Southeast Europe (9 NGIs):
    • Romania (NGI_RO): validated on November 19th 2010
    • Cyprus (NGI_CYGRID): validated on December 7th 2010
    • Georgia (NGI_GE): monitoring was taken over by Serbian NGI (NGI_AEGIS) on December 13th 2010
    • Macedonia (NGI_MARGI): validated on December 20th 2010
    • Bosnia and Herzegovina (NGI_BA): validated on January 10th 2011
    • Montenegro (NGI_ME): validated on January 12th 2011
    • Bulgaria (NGI_BG): validated on January 13th 2011
    • Armenia (NGI_ARMGRID): validated on January 18th 2011
    • Israel (NGI_IL): validated on January 19th 2011

At the end of the quarter the following SAM/Nagios instances were in production:

  • 23 NGI instances covering 34 EGI partners
  • 3 ROC instances covering 4 EGI partners
  • 2 project instances covering 2 EGI partners
  • 3 external ROC instances covering the following regions: Canada, IGALC and LA.

A detailed list of SAM/Nagios instances can be found on the following page: SAM Instances.

The accounting enforcement section of the accounting portal became obsolete when the new APEL tests were integrated into SAM/Nagios. The enforcement section was decommissioned on December 21st 2010.

Sites are monitored using the OPS virtual organization. At the face-to-face OMB in Amsterdam it was decided that all services will be monitored using the OPS VO (see details in the following talk). At the end of 2010 it was agreed that CERN will continue running the VOMRS service and that management of the VO will be transferred to EGI. At the OMB in Amsterdam it was agreed that the VO managers will be Emir Imamagic and Peter Solagna. The initial plan was to have one manager per NGI, as in the dteam VO. At the OMB it was concluded that this scheme is too heavyweight, as each NGI can have only 2 DNs registered in the VO. It was decided that all operations will be performed by the two VO managers.

Work on three procedures relevant for operational tools started:

  • Procedure for unscheduled downtimes of central operations tools - defines a uniform way of announcing outages of central operations tools. Details can be found in the RT ticket: RT #537
  • Procedure for adding new probes to SAM release - defines the steps needed to include new probes in SAM. Details can be found in the RT ticket: RT #1051
  • Procedure for modification of Availability tests - defines the steps needed to include new tests in the group of availability tests used for A/R calculations. Details can be found in the RT ticket: RT #1052

Drafts of all three procedures were presented at the face to face OMB in Amsterdam. Talks can be found on the following page.

The following wiki pages relevant for operational tools were created:

3. Issues and Mitigation

Issue: High availability of central operational tools is needed.
Mitigation:
  • GOCDB: dynamic load-balancing DNS setup is provided for the address, and a secondary instance will be set up at the Fraunhofer institute in the next quarter.
  • SAM: the April release of SAM will contain an option to install a secondary instance; this will be deployed depending on NGI size and resources.
  • Operations portal, accounting portal and metrics portal: services are deployed on virtualization platforms and backups are performed regularly, so recovery in case of failure can be performed quickly.

Issue: GOCDB database hardware issues.
Mitigation: GOCDB has recently experienced problems caused by bad database hardware. On January 27th a new instance was deployed and all tools were requested to validate the test instance. As the validation was successful, migration of GOCDB to the new hardware was scheduled for February 2nd.

4. Plans for the next period

A central MyEGI instance providing access to data from all NGIs will be deployed at CERN. In addition, the SAM team will provide a specific version of SAM that enables easy installation of such a central MyEGI instance. This activity will be finalized by the end of February 2011.

GOCDB will be migrated to new hardware on February 2nd (see Issues above).

Decommission of domain is scheduled for March 14th 2011. All addresses have already been migrated. If external tools report any issues, this date will be moved, but not later than the end of March.

Deploy correct web certificates on all central operational tools for the new addresses in order to avoid the web browser certificate pop-up problem. This activity will be finalized before the decommission of domain.

Decommission of the old CIC portal will be performed between April and June 2011, depending on development of the new Operations portal. The main remaining functionalities which need to be migrated to the Operations portal are broadcast and VO ID cards.

Procedures related to operational tools will be finalized and presented for approval at the OMB in the next quarter.

Contribute to and follow discussions of the new task force on regionalization. Update the deployment plans for individual NGI instances of tools which will provide regionalized versions in the following period.

Track deployment and validation of the remaining regional and NGI Nagios instances. The deployment plans of the remaining NGIs are the following:

  • UK and Ireland plan to perform NGI creation in the next quarter.
  • Asia Pacific ROC Nagios instance has been validated and finalization is planned for the next quarter. For details see GGUS #57154

Track deployment of other operational tools according to their roadmap.

Track development of probes for monitoring operational tools and integration into ops-monitor Nagios instance.

Track and perform planned tests of failover configurations of centralized tools. The ideal candidate is GOCDB, which will implement failover in the next quarter.