EGI-InSPIRE:SA1.4-QR6

From EGIWiki
Revision as of 10:50, 19 October 2012


1. Task Meetings

There are no dedicated SA1.4 meetings. It was agreed to discuss all deployment issues with operational tool representatives at the JRA1 meetings. The meetings held in this period and the subjects relevant to SA1.4 are listed below.

Date (dd/mm/yyyy) | Indico agenda URL | Title | Outcome
01/09/2011 | https://www.egi.eu/indico/conferenceDisplay.py?confId=577 | InSPIRE-JRA1 phone conf | Regionalization plans for all tools.
15/09/2011 | https://www.egi.eu/indico/conferenceDisplay.py?confId=608 | InSPIRE-JRA1 phone conf | Metrics portal status. Technical forum planning.
20/09/2011 | https://www.egi.eu/indico/sessionDisplay.py?sessionId=78&confId=452#20110920 | Operations Tools and Availability Calculation (EGI TF) | Dedicated EGI TF session on operational tools monitoring and availability calculation.
30/09/2011 | https://www.egi.eu/indico/conferenceDisplay.py?confId=648 | A/R calculation TF session follow-up | Continued discussion on availability and reliability calculation.
19/10/2011 |  | A/R probe meeting | Discussion about a probe for site A/R monitoring.
20/10/2011 | https://www.egi.eu/indico/conferenceDisplay.py?confId=608 | InSPIRE-JRA1 phone conf | Status of VO SAM instance support.

2. Main Achievements

Operational tools progress

The new version of the ActiveMQ messaging broker (5.5) was tested in October. For this purpose an additional broker network was set up, consisting of four brokers (two at AUTH and one each at CERN and SRCE); it passed all the tests. The main issue with the new broker is the lack of proper packaging and of a Yaim module, which must be resolved before the production instances are upgraded.

The Metrics portal reached a stable version and was used to generate QR6.

Two new versions of the Operations portal were deployed in this quarter: 2.6.3 on August 5th and 2.6.4 on September 29th. A detailed list of new features can be found in the JRA1 section. At the end of the quarter there were four NGI instances of the Operations portal: NGI_BY, NGI_CZ, NGI_GRNET and NGI_IBERGRID. The decommissioning of the old CIC portal (cic.egi.eu) was postponed and is planned for the next quarter.

Two new versions of SAM were deployed in this quarter: SAM-Update13 on September 7th and SAM-Update14 on October 22nd. The deployment of NGI SAM/Nagios instances continued. As part of the creation of NGI_UK, the UKI ROC SAM instance was converted into an NGI instance covering two NGIs: NGI_IE (Ireland) and the new NGI_UK. At the end of the quarter the following SAM/Nagios instances were in production:

  • 26 NGI instances covering 37 EGI partners
  • 2 ROC instances covering 2 EGI partners
  • 1 project instance covering 1 EGI partner
  • 3 external ROC instances covering the following regions: Canada, IGALC and LA.

A detailed list of SAM/Nagios instances can be found on the SAM Instances wiki page.

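Since September 12th SAM has used the new test hr.srce.CADist-Check (http://wiki.cro-ngi.hr/en/index.php/Hr.srce.CADist-Check) to monitor the version of the EGI Trust Anchor distribution installed on worker nodes (WNs). The new test is included in the operations tests and in the availability and reliability tests (http://grid-monitoring.cern.ch/myegi/sam-pi/metrics_in_profiles?vo_name=ops&profile_name=ROC_CRITICAL). The main new feature of this CA test is that it uses the metadata provided with each CA release, so the CA probe package no longer needs to be updated manually after every CA release.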
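
To illustrate why release metadata removes the need for per-release probe updates, the sketch below compares the CA version installed on a WN with the version named in the metadata instead of a hard-coded value. It is not the logic of hr.srce.CADist-Check: the function names, the dict standing in for the release metadata, the "1.42-1" version format and the OK/CRITICAL/UNKNOWN policy are all assumptions. Only the return codes follow the standard Nagios plugin convention (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN).

```python
import re

# Standard Nagios plugin return codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def parse_version(text):
    """Turn a version string such as '1.42-1' into a comparable tuple (1, 42, 1).

    Assumption: versions consist only of integers separated by '.' or '-'.
    """
    if not re.fullmatch(r"\d+([.-]\d+)*", text):
        raise ValueError(f"unparsable version: {text!r}")
    return tuple(int(part) for part in re.split(r"[.-]", text))


def check_ca_dist(installed, release_metadata):
    """Compare the installed CA version with the latest release (hypothetical helper).

    `release_metadata` is a hypothetical dict such as {"version": "1.42-1"}
    standing in for the metadata published with each CA release. Because the
    expected version is read from it, the check itself never changes.
    """
    expected = release_metadata.get("version")
    if expected is None or installed is None:
        return UNKNOWN, "CA version could not be determined"
    try:
        have, want = parse_version(installed), parse_version(expected)
    except ValueError as err:
        return UNKNOWN, str(err)
    if have >= want:
        return OK, f"CA distribution {installed} is up to date"
    return CRITICAL, f"CA distribution {installed} is older than {expected}"


if __name__ == "__main__":
    status, message = check_ca_dist("1.41-1", {"version": "1.42-1"})
    print(status, message)  # prints: 2 CA distribution 1.41-1 is older than 1.42-1
```

A real probe would also need to read the installed version from the WN and fetch the metadata; both steps are deliberately left out here.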

Monitoring of core services and operational tools

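Development of the new SAM instance for monitoring operational tools has started. The first step was a reorganization of the operational tools in the GOCDB:

  • all central operational tools are grouped in the EGI.eu group
  • a new service type was added for each operational tool (https://rt.egi.eu/rt/Ticket/Display.html?id=2587)
  • all regional operational tool instances are associated with sites.
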
Further details are available in the following slides: https://www.egi.eu/indico/conferenceDisplay.py?confId=549. This reorganization will enable automatic bootstrapping of the SAM instance for operational tools and its integration with the MyEGI web interface and the ACE system for A/R calculation.
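
The reorganization is what makes automatic bootstrapping possible: once every operational tool has its own service type, the list of hosts to monitor can be derived from GOCDB data rather than maintained by hand. The sketch below parses a small XML fragment shaped loosely like a GOCDB programmatic-interface service-endpoint listing and groups operational-tool hosts by service type. The element names, the "egi." service-type prefix and the sample hosts are assumptions made for illustration, not the actual GOCDB schema or the SAM bootstrap code.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical sample data: element names, service types and hosts are illustrative only.
SAMPLE = """
<results>
  <SERVICE_ENDPOINT>
    <HOSTNAME>goc.example.org</HOSTNAME>
    <SERVICE_TYPE>egi.GOCDB</SERVICE_TYPE>
    <SITENAME>EGI.eu</SITENAME>
  </SERVICE_ENDPOINT>
  <SERVICE_ENDPOINT>
    <HOSTNAME>broker1.example.org</HOSTNAME>
    <SERVICE_TYPE>egi.MSGBroker</SERVICE_TYPE>
    <SITENAME>EGI.eu</SITENAME>
  </SERVICE_ENDPOINT>
  <SERVICE_ENDPOINT>
    <HOSTNAME>ce.example.org</HOSTNAME>
    <SERVICE_TYPE>CREAM-CE</SERVICE_TYPE>
    <SITENAME>SOME-SITE</SITENAME>
  </SERVICE_ENDPOINT>
</results>
"""


def operational_tool_targets(xml_text, prefix="egi."):
    """Map each operational-tool service type to the hosts that should be monitored.

    Assumption: operational tools are recognised by a dedicated service-type
    prefix, mirroring the per-tool service types registered in the GOCDB.
    """
    targets = defaultdict(list)
    root = ET.fromstring(xml_text.strip())
    for endpoint in root.iter("SERVICE_ENDPOINT"):
        service_type = endpoint.findtext("SERVICE_TYPE", default="")
        if service_type.startswith(prefix):
            targets[service_type].append(endpoint.findtext("HOSTNAME"))
    return dict(targets)


if __name__ == "__main__":
    for service_type, hosts in sorted(operational_tool_targets(SAMPLE).items()):
        print(service_type, hosts)
```

Regular grid services such as the CE in the sample are skipped, so the resulting configuration covers only the operational tools.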

A reorganization of NGI core services in the GOCDB was proposed at the OMB (https://www.egi.eu/indico/conferenceDisplay.py?confId=615). This reorganization will enable A/R calculation at the NGI level.

EGI Technical Forum

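During the EGI Technical Forum in Lyon several sessions related to operational tools were organized. The most important was "Operations Tools and Availability Calculation" (https://www.egi.eu/indico/sessionDisplay.py?sessionId=78&confId=452#20110920). The main topics were:

  • Presentation of the SAM and ACE architecture, with a discussion of the ACE profiles for NGI and EGI core services. This discussion continued in a dedicated session on September 30th.
  • Presentation of the new SAM instance for monitoring operational tools.
  • Discussion of the issue of the UNKNOWN status (https://www.egi.eu/indico/contributionDisplay.py?contribId=395&confId=452).
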
Several side meetings were held at the EGI TF:

  • Meeting with EMI and SAM representatives, where the integration of EMI probes and the future SAM release process were discussed.
  • Meeting with EDGI representatives, where the integration of DesktopGrids resources into the EGI infrastructure was discussed.

3. Issues and Mitigation

Issue description | Mitigation description
High availability of central operational tools is needed. | GOCDB: a dynamic load-balancing DNS setup is provided for the address goc.egi.eu. A secondary instance at the Fraunhofer institute is still being deployed; the delay is caused by the development and deployment of the new GOCDB version.
Monitoring of underperforming sites. | The COD team has proposed monitoring the availability and reliability of sites: if a site's A/R decreases, an alarm would be raised against the site. This would enable sites to correct their A/R figures before the end of the month and stay within the OLA thresholds. Discussions on the implementation details have started.
The ActiveMQ broker is not fully packaged and its Yaim module is missing. There is no support unit for the ActiveMQ broker. | Discussions with the EMI messaging product team have started in order to agree on a package format. Once the format is agreed, the AUTH partner will provide additional documentation and a secure SVN repository for storing configuration files. This approach will be used only for the broker network used by the operational tools. If any other EMI service requires a messaging infrastructure, a proper support unit and Yaim modules will need to be provided by EMI.
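
The alarm proposed by the COD team needs a running A/R figure for each site. The sketch below uses the definitions commonly applied in EGI monitoring (availability is up time over known time, i.e. excluding UNKNOWN periods; reliability additionally excludes scheduled downtime) and flags a site when either figure falls below a threshold. The input format (hours per status for the month so far), the function names and the 70%/75% default thresholds are assumptions for illustration; the actual OLA thresholds and the way ACE combines individual service results into a site status may differ.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SiteHours:
    """Hours a site spent in each status during the month so far (hypothetical input)."""
    up: float
    down: float            # unscheduled downtime or failing tests
    scheduled_down: float
    unknown: float


def availability(h: SiteHours) -> Optional[float]:
    # Known time excludes periods in which the status was UNKNOWN.
    known = h.up + h.down + h.scheduled_down
    return h.up / known if known else None


def reliability(h: SiteHours) -> Optional[float]:
    # Reliability additionally removes scheduled downtime from the denominator.
    known_unscheduled = h.up + h.down
    return h.up / known_unscheduled if known_unscheduled else None


def needs_alarm(h: SiteHours, min_availability: float = 0.70,
                min_reliability: float = 0.75) -> bool:
    """Return True if either figure is below its (assumed) threshold."""
    a, r = availability(h), reliability(h)
    if a is None or r is None:
        return False  # not enough known data to judge the site
    return a < min_availability or r < min_reliability


if __name__ == "__main__":
    site = SiteHours(up=400, down=150, scheduled_down=100, unknown=20)
    print(f"A={availability(site):.2f} R={reliability(site):.2f} alarm={needs_alarm(site)}")
    # prints: A=0.62 R=0.73 alarm=True
```

Running the check daily on month-to-date figures is what would give a site time to react before the monthly report is produced.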

4. Plans for the next period

Decommission of the old CIC portal (cic.egi.eu) is planned for the next quarter.

Track and perform planned tests of failover configurations of centralized tools.
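
A failover test of a centralized tool essentially checks that clients still get an answer when the primary instance is unavailable. The sketch below shows the client-side logic such a test exercises: endpoints are tried in order and the first healthy one is used. The endpoint names and the injected `probe` function are hypothetical stand-ins (a real test would issue a request to each instance), and the DNS-based load balancing used for goc.egi.eu operates at a different layer than this client loop.

```python
from typing import Callable, Sequence


def first_available(endpoints: Sequence[str], probe: Callable[[str], bool]) -> str:
    """Return the first endpoint for which `probe` reports success.

    Raises RuntimeError if every endpoint fails, which a failover test
    would report as a failure of the whole configuration.
    """
    failures = []
    for endpoint in endpoints:
        try:
            if probe(endpoint):
                return endpoint
            failures.append(f"{endpoint}: unhealthy")
        except OSError as err:  # e.g. connection refused or timeout
            failures.append(f"{endpoint}: {err}")
    raise RuntimeError("no endpoint available: " + "; ".join(failures))


if __name__ == "__main__":
    # Simulated failover test: the primary is "down", the secondary answers.
    down = {"primary.example.org"}

    def fake_probe(endpoint: str) -> bool:
        if endpoint in down:
            raise OSError("connection refused")
        return True

    chosen = first_available(["primary.example.org", "secondary.example.org"], fake_probe)
    print("served by", chosen)  # prints: served by secondary.example.org
```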

Deployment of the new SAM instance dedicated to monitoring operational tools, using the new probes provided by the operational tools developers.

Integration of DesktopGrids resources into the EGI infrastructure.