
Difference between revisions of "EGI-InSPIRE:SA1.4-QR4"

From EGIWiki
(Created page with '__NOTOC__ = 1. Task Meetings = There are no specific SA1.4 meetings. It was agreed to discuss all deployment issues with operational tool representatives at the JRA1 meetings. B…')
 
 
(19 intermediate revisions by 2 users not shown)
Line 1: Line 1:
__NOTOC__
{{Template:EGI-Inspire menubar}}
 
{{Template:Inspire_reports_menubar}}
{{TOC_right}}
= 1. Task Meetings =
= 1. Task Meetings =


Line 14: Line 17:
! style="width: 35%" | Outcome
! style="width: 35%" | Outcome
|-
|-
|11/11/2010
|17/02/2011
|https://www.egi.eu/indico/conferenceDisplay.py?confId=209
|https://www.egi.eu/indico/conferenceDisplay.py?confId=352
|InSPIRE-JRA1 phone conf
|InSPIRE-JRA1 phone conf
|SAM Update-06 release analysis. Operations portal release.
|SAM Update-09 release analysis and SAM DMSU activity setup.
|-
|-
|02/12/2010
|16/03/2011
|https://www.egi.eu/indico/conferenceDisplay.py?confId=210
|https://www.egi.eu/indico/conferenceDisplay.py?confId=427
|InSPIRE-JRA1 phone conf
|InSPIRE-JRA1 phone conf
|Adopting GLUE 2.0 naming in GOCDB discussion. 
|Operational tools updates deployment.
|-
|-
|22/12/2010
|28/04/2011
|https://www.egi.eu/indico/conferenceDisplay.py?confId=242
|https://www.egi.eu/indico/conferenceDisplay.py?confId=426
|InSPIRE-JRA1 phone conf
|InSPIRE-JRA1 phone conf
|Operational tools deployment analysis ([[Operational tools deployment plans]]). Operations portal release.
|Handling of nodes in non-production state.
|-
|20/01/2011
|https://www.egi.eu/indico/conferenceDisplay.py?confId=212
|InSPIRE-JRA1 phone conf
|Finalizing migration from gridops.org to egi.eu.
|-
|26/01/2011
|https://www.egi.eu/indico/conferenceDisplay.py?confId=244
|InSPIRE-JRA1 f2f in Amsterdam
|Operational tools milestones and regionalization plans.
|-
|-
|}
|}
Line 46: Line 39:
-->
-->


Deployment plans of NGI instances of individual operational tools were finalized. Relevant ticket ([https://rt.egi.eu/rt/Ticket/Display.html?id=831 RT #831])
GOCDB was migrated to the new platform on February 2nd. Since the migration the GOCDB service has been working without outages. Deployment of the GOCDB failover instance started; it was not finalized in this quarter and is planned to be finished at the beginning of the next quarter.
Revision of deployment plans of NGI instances of individual operational tools was performed. Information was collected from MS406, MS703, EGEE-III DNA1.6.2 documents and direct response from NGIs. Responses were tracked through the following RT ticket: [https://rt.egi.eu/rt/Ticket/Display.html?id=831 RT #831].


All operational tools were assigned addresses in egi.eu domain. It was agreed that all tools will correct URLs in their code to point to the egi.eu addresses. Decommission of gridops.org domain was scheduled for March 14th 2011. Further details can be found in the RT ticket: https://rt.egi.eu/rt/Ticket/Display.html?id=187.
Three new versions of Operations portal were deployed in this quarter: 2.5 on February 2nd, 2.5.1 on March 7th and 2.6 on April 7th. Detailed list of new features can be found in JRA1 section. At the end of the quarter there were four NGI instances: NGI_BY, NGI_CZ, NGI_GRNET and NGI_IBERGRID.


Two new versions of Operations portal were released in this quarter: 2.4 on November 17th and 2.4.1 on December 16th. A detailed list of new features can be found in the JRA1 section. One new NGI instance of Operations portal was deployed in Belarus NGI (NGI_BY). At the end of the quarter there were four NGI instances: NGI_BY, NGI_CZ, NGI_GRNET and NGI_IBERGRID.
Two new versions of SAM were deployed in this quarter: SAM-Update09 on March 8th and SAM-Update10 at the end of April. The central MyEGI instance was deployed in February after the SAM-Update09 release. Due to bugs spotted in the software release it was not broadcast to a wide audience.


SAM/Nagios deployment of NGI instances continued. Two big ROCs finalized migration to NGI instances:
SAM/Nagios deployment of NGI instances continued. Two big ROCs finalized migration to NGI instances:
* Northern Europe: NGI NDGF finalized validation of NGI instance on January 25th 2011
* Asia Pacific: validated on February 17th 2011
* Southeast Europe (9 NGIs):
* Italy: validated on April 4th 2011
** Romania (NGI_RO): validated on November 19th 2010
** Cyprus (NGI_CYGRID): validated on December 7th 2010
** Georgia (NGI_GE): monitoring was taken over by Serbian NGI (NGI_AEGIS) on December 13th 2010
** Macedonia (NGI_MARGI): validated on December 20th 2010
** Bosnia and Herzegovina (NGI_BA): validated on January 10th 2011
** Montenegro (NGI_ME): validated on January 12th 2011
** Bulgaria (NGI_BG): validated on January 13th 2011
** Armenia (NGI_ARMGRID): validated on January 18th 2011
** Israel (NGI_IL): validated on January 19th 2011
At the end of the quarter the following SAM/Nagios instances were in production:
At the end of the quarter the following SAM/Nagios instances were in production:
* 23 NGI instances covering 34 EGI partners
* 24 NGI instances covering 35 EGI partners
* 3 ROC instances covering 4 EGI partners
* 3 ROC instances covering 4 EGI partners
* 2 project instances covering 2 EGI partners
* 1 project instance covering 1 EGI partner
* 3 external ROC instances covering the following regions: Canada, IGALC and LA.
* 3 external ROC instances covering the following regions: Canada, IGALC and LA.
Detailed list of SAM/Nagios instances can be found on the following page: [[SAM Instances]].
Detailed list of SAM/Nagios instances can be found on the following page: [[SAM Instances]].


Accounting enforcement section of accounting portal was obsoleted when new APEL tests were integrated into SAM/Nagios. The enforcement section was decommissioned on December 21st 2010.
Deployment plans of NGI instances of individual operational tools were finalized. Relevant ticket ([https://rt.egi.eu/rt/Ticket/Display.html?id=831 RT #831]) is closed.


Monitoring of sites is performed by using OPS virtual organization. At the face to face OMB in Amsterdam it was decided that all services will be monitored by OPS VO (see details in the following [https://www.egi.eu/indico/contributionDisplay.py?contribId=27&confId=153 talk]).
Decommissioning of the gridops.org domain was postponed due to external dependencies (i.e. Top BDII) and rescheduled for June 30th 2011.
At the end of 2010 it was agreed that CERN will continue running the VOMRS service and that the management of VO will be transferred to EGI. At the OMB in Amsterdam it was agreed that VO managers will be Emir Imamagic and Peter Solagna. Initial plan was that there will be a manager per NGI, equivalent to dteam VO. At the OMB it was concluded that this schema is too heavyweight as each NGI can have only 2 DNs registered in VO. Decision was made that all operations will be performed by the two VO managers.


Work on three procedures relevant for operational tools started:
Two procedures relevant for operational tools were approved at the Operations Management Board on March 15th 2011:
* Procedure for unscheduled downtimes of central operations tools - defines uniform way of announcing of outages of central operations tools. Details can be found in the RT ticket: [https://rt.egi.eu/rt/Ticket/Display.html?id=537 RT #537]
* Adding new probes to SAM ([[PROC07]])
* Procedure for adding new probes to SAM release - defines steps needed for inclusion of new probes into SAM. Details can be found in the RT ticket: [https://rt.egi.eu/rt/Ticket/Display.html?id=1051 RT #1051]
* Management of the EGI OPS Availability and Reliability Profile ([[PROC08]])
* Procedure for modification of Availability tests - defines steps needed for inclusion of new tests to group of availability tests used for A/R calculations. Details can be found in the RT ticket: [https://rt.egi.eu/rt/Ticket/Display.html?id=1052 RT #1052]
Drafts of all three procedures were presented at the face to face OMB in Amsterdam. Talks can be found on the [https://www.egi.eu/indico/sessionDisplay.py?sessionId=7&confId=153#20110125 following page].
 
The following wiki pages relevant for operational tools were created:
* [[Operational tools information]] - page contains brief description about each tool, main links to the tools interfaces and documentation links.
* [[Operational tools deployment plans]] - page contains NGI plans regarding deployment of regionalised versions of operations tools.


= 3. Issues and Mitigation =
= 3. Issues and Mitigation =
Line 95: Line 71:
!scope="col"| Mitigation Description
!scope="col"| Mitigation Description
|-
|-
|High availability of central operational tools is needed. || '''GOCDB''': a dynamic load-balancing DNS setup is provided for the address goc.egi.eu, and a secondary instance will be set up at the Fraunhofer institute in the next quarter.<br> '''SAM''': the April release of SAM will contain an option to install a secondary instance, which will be deployed depending on NGI size and resources.<br> '''Operations, accounting portal and metrics portal''': services are deployed on virtualization platforms, backups are performed regularly, and recovery in case of failure can be performed quickly.
|High availability of central operational tools is needed. || '''GOCDB''': a dynamic load-balancing DNS setup is provided for the address goc.egi.eu, and a secondary instance will be set up at the Fraunhofer institute in the next quarter.<br> '''SAM''': the SAM-Update11 release will contain an option to install a secondary instance, which will be deployed depending on NGI size and resources.<br> '''Operations, accounting portal and metrics portal''': services are deployed on virtualization platforms, backups are performed regularly, and recovery in case of failure can be performed quickly.
|-
|GOCDB database hardware issues. || GOCDB has recently experienced problems caused by bad database hardware. On January 27th a new instance was deployed and all tools were requested to validate the test instance. As the validation was successful, migration of GOCDB to the new hardware was scheduled for February 2nd.
|-
|-
|}
|}
Line 104: Line 78:
<!-- provide your text below -->
<!-- provide your text below -->


Central MyEGI instance which provides access to data from all NGIs will be deployed at CERN. In addition SAM team will provide specific version of SAM which will enable easy installation of such central MyEGI instance. This activity will be finalized by the end of February 2011.
Central MyEGI instance will reach production quality.  
 
GOCDB will be migrated to new hardware on February 2nd (see Issues above).
 
Decommission of gridops.org domain is scheduled for March 14th 2011. All addresses have already been migrated. In case of any issues reported by external tools this date will be moved, but not later than end of March.
 
Deploy correct web certificates on all central operational tools for the new egi.eu addresses in order to avoid web browser certificate pop-up problem. This activity will be finalized before the decommission of gridops.org domain.


Decommission of the old CIC portal (cic.egi.eu) will be performed between April and June 2011 depending on development of the new Operations portal. The main remaining functionalities which need to be migrated to Operations Portal are broadcast and VO ID cards.
Decommission of the old CIC portal (cic.egi.eu) will be performed between April and June 2011 depending on development of the new Operations portal. The main remaining functionalities which need to be migrated to Operations Portal are broadcast and VO ID cards.


Procedures related to operational tools will be finalized and presented for approval at the OMB in the next quarter.
Remaining procedures and manuals related to operational tools will be finalized and presented for approval at the OMB in the next quarter.


Contribute and follow discussions of the new task force on regionalization. Update deployment plans of individual NGI instances of tools which will provide regionalized versions in the following period.
Contribute and follow discussions of the new task force on regionalization. Update deployment plans of individual NGI instances of tools which will provide regionalized versions in the following period.
Line 120: Line 88:
Track deployment and validation of remaining regional and NGI Nagioses. Deployment plans of the remaining NGIs are the following:
Track deployment and validation of remaining regional and NGI Nagioses. Deployment plans of the remaining NGIs are the following:
* UK and Ireland plan to perform NGI creation in the next quarter.
* UK and Ireland plan to perform NGI creation in the next quarter.
* Asia Pacific ROC Nagios instance has been validated and finalization is planned for the next quarter. For details see [https://gus.fzk.de/ws/ticket_info.php?ticket=57154 GGUS #57154]
Track deployment of other operational tools according to their roadmap.


Track development of probes for monitoring operational tools and integration into ops-monitor Nagios instance.
Track development of probes for monitoring operational tools and integration into ops-monitor Nagios instance.


Track and perform planned tests of failover configurations of centralized tools. The ideal candidate is GOCDB, which will implement failover in the next quarter.
Track and perform planned tests of failover configurations of centralized tools.

Latest revision as of 18:59, 6 January 2015


1. Task Meetings

There are no specific SA1.4 meetings. It was agreed to discuss all deployment issues with operational tool representatives at the JRA1 meetings. Below is the list of JRA1 meetings and subjects relevant for SA1.4 which were discussed.

Date (dd/mm/yyyy) | Url Indico Agenda | Title | Outcome
17/02/2011 | https://www.egi.eu/indico/conferenceDisplay.py?confId=352 | InSPIRE-JRA1 phone conf | SAM Update-09 release analysis and SAM DMSU activity setup.
16/03/2011 | https://www.egi.eu/indico/conferenceDisplay.py?confId=427 | InSPIRE-JRA1 phone conf | Operational tools updates deployment.
28/04/2011 | https://www.egi.eu/indico/conferenceDisplay.py?confId=426 | InSPIRE-JRA1 phone conf | Handling of nodes in non-production state.

2. Main Achievements

GOCDB was migrated to the new platform on February 2nd. Since the migration the GOCDB service has been working without outages. Deployment of the GOCDB failover instance started; it was not finalized in this quarter and is planned to be finished at the beginning of the next quarter.

Three new versions of Operations portal were deployed in this quarter: 2.5 on February 2nd, 2.5.1 on March 7th and 2.6 on April 7th. Detailed list of new features can be found in JRA1 section. At the end of the quarter there were four NGI instances: NGI_BY, NGI_CZ, NGI_GRNET and NGI_IBERGRID.

Two new versions of SAM were deployed in this quarter: SAM-Update09 on March 8th and SAM-Update10 at the end of April. The central MyEGI instance was deployed in February after the SAM-Update09 release. Due to bugs spotted in the software release it was not broadcast to a wide audience.

SAM/Nagios deployment of NGI instances continued. Two big ROCs finalized migration to NGI instances:

  • Asia Pacific: validated on February 17th 2011
  • Italy: validated on April 4th 2011

At the end of the quarter the following SAM/Nagios instances were in production:

  • 24 NGI instances covering 35 EGI partners
  • 3 ROC instances covering 4 EGI partners
  • 1 project instance covering 1 EGI partner
  • 3 external ROC instances covering the following regions: Canada, IGALC and LA.

Detailed list of SAM/Nagios instances can be found on the following page: SAM Instances.

Deployment plans of NGI instances of individual operational tools were finalized. Relevant ticket (RT #831) is closed.

Decommissioning of the gridops.org domain was postponed due to external dependencies (i.e. Top BDII) and rescheduled for June 30th 2011.

Two procedures relevant for operational tools were approved at the Operations Management Board on March 15th 2011:

  • Adding new probes to SAM (PROC07)
  • Management of the EGI OPS Availability and Reliability Profile (PROC08)

3. Issues and Mitigation

Issue Description | Mitigation Description
High availability of central operational tools is needed. | GOCDB: a dynamic load-balancing DNS setup is provided for the address goc.egi.eu, and a secondary instance will be set up at the Fraunhofer institute in the next quarter. SAM: the SAM-Update11 release will contain an option to install a secondary instance, which will be deployed depending on NGI size and resources. Operations, accounting portal and metrics portal: services are deployed on virtualization platforms, backups are performed regularly, and recovery in case of failure can be performed quickly.

4. Plans for the next period

Central MyEGI instance will reach production quality.

Decommission of the old CIC portal (cic.egi.eu) will be performed between April and June 2011 depending on development of the new Operations portal. The main remaining functionalities which need to be migrated to Operations Portal are broadcast and VO ID cards.

Remaining procedures and manuals related to operational tools will be finalized and presented for approval at the OMB in the next quarter.

Contribute and follow discussions of the new task force on regionalization. Update deployment plans of individual NGI instances of tools which will provide regionalized versions in the following period.

Track deployment and validation of remaining regional and NGI Nagioses. Deployment plans of the remaining NGIs are the following:

  • UK and Ireland plan to perform NGI creation in the next quarter.

Track development of probes for monitoring operational tools and integration into ops-monitor Nagios instance.

Track and perform planned tests of failover configurations of centralized tools.