1. Task Meetings
There are no specific SA1.4 meetings. It was agreed to discuss all deployment issues with operational tool representatives at the JRA1 meetings. Below is the list of JRA1 meetings and subjects relevant for SA1.4 which were discussed.
|Date (dd/mm/yyyy)||Url Indico Agenda||Title||Outcome|
|01/09/2011||https://www.egi.eu/indico/conferenceDisplay.py?confId=577||InSPIRE-JRA1 phone conf||Regionalization plans for all tools.|
|15/09/2011||https://www.egi.eu/indico/conferenceDisplay.py?confId=608||InSPIRE-JRA1 phone conf||Metric portal status. Technical forum planning.|
|20/09/2011||https://www.egi.eu/indico/sessionDisplay.py?sessionId=78&confId=452#20110920||Operations Tools and Availability Calculation (EGI TF)||Dedicated EGI TF session on operational tools monitoring and availability calculation.|
|30/09/2011||https://www.egi.eu/indico/conferenceDisplay.py?confId=648||A/R calculation TF session follow up||Continued discussion on availability and reliability calculation.|
|19/10/2011|| ||A/R probe meeting||Discussion about probe for site A/R monitoring.|
|20/10/2011||https://www.egi.eu/indico/conferenceDisplay.py?confId=608||InSPIRE-JRA1 phone conf||Status of VO SAM instance support.|
2. Main Achievements
Operational tools progress
ActiveMQ 5.5, the new version of the messaging broker, was tested in October. For testing purposes, an additional broker network was set up. The testing network consisted of 4 brokers (2 at AUTH and one each at CERN and SRCE) and passed all the tests. The main issue with the new broker is the lack of proper packaging and of a Yaim module, which needs to be resolved before the production instances are upgraded.
The Metrics portal reached a stable version and was used to generate QR6.
Two new versions of the Operations portal were deployed in this quarter: 2.6.3 on August 5th and 2.6.4 on September 29th. A detailed list of new features can be found in the JRA1 section. At the end of the quarter there were four NGI instances: NGI_BY, NGI_CZ, NGI_GRNET and NGI_IBERGRID. Decommissioning of the old CIC portal (cic.egi.eu) was postponed and is planned for the next quarter.
Two new versions of SAM were deployed in this quarter: SAM-Update13 on September 7th and SAM-Update14 on October 22nd. SAM/Nagios deployment of NGI instances continued. As part of the creation of NGI_UK, the UKI ROC SAM instance was switched to an NGI instance covering two NGIs: NGI_IE (Ireland) and the new NGI_UK. At the end of the quarter, the following SAM/Nagios instances were in production:
- 26 NGI instances covering 37 EGI partners
- 2 ROC instances covering 2 EGI partners
- 1 project instance covering 1 EGI partner
- 3 external ROC instances covering the following regions: Canada, IGALC and LA.
A detailed list of SAM/Nagios instances can be found on the following page: SAM Instances.
Starting from September 12th, SAM uses the new test hr.srce.CADist-Check to monitor the EGI Trust Anchor version on WNs. The new test is included in the operations tests and in the availability and reliability tests. Its main new feature is that it uses the metadata provided in the CA release, so the CA probe package no longer needs to be updated manually after each CA release.
Monitoring of core services and operational tools
Development of the new SAM instance for monitoring operational tools has started. The first step was the reorganization of operational tools in the GOCDB:
- all central operational tools are grouped in the EGI.eu group
- new service types were added for each operational tool (https://rt.egi.eu/rt/Ticket/Display.html?id=2587)
- all regional operational tools instances are associated with sites.
Additional details can be found in the following slides: https://www.egi.eu/indico/conferenceDisplay.py?confId=549. This reorganization will enable automatic bootstrapping of the SAM instance for operational tools and its integration with the MyEGI web interface and the ACE system for A/R calculation.
A reorganization of NGI core services in the GOCDB was proposed at the OMB (https://www.egi.eu/indico/conferenceDisplay.py?confId=615). This reorganization will enable NGI-level A/R calculation.
EGI Technical Forum
During the EGI Technical Forum in Lyon several sessions related to operational tools were organized. The most important one was "Operations Tools and Availability Calculation" (https://www.egi.eu/indico/sessionDisplay.py?sessionId=78&confId=452#20110920). The main topics were:
- Presentation of the SAM and ACE architecture, with a discussion about ACE profiles for NGI and EGI core services. This discussion was continued in a dedicated session on September 30th.
- Presentation of the new SAM instance for operational tools monitoring.
- Discussion of the UNKNOWN status issue (https://www.egi.eu/indico/contributionDisplay.py?contribId=395&confId=452).
Several side meetings were held at the EGI TF:
- Meeting with EMI and SAM representatives, where the integration of EMI probes and the future SAM release process were discussed.
- Meeting with EDGI representatives, where the integration of DesktopGrids resources into the EGI infrastructure was discussed.
3. Issues and Mitigation
|Issue Description||Mitigation Description|
|High availability of central operational tools is needed.||GOCDB: a dynamic load-balancing DNS setup is provided for the address goc.egi.eu. A secondary instance at the Fraunhofer institute is still being deployed; the delay is caused by the development and deployment of the new GOCDB version.|
|Monitoring of underperforming sites.||The COD team has proposed monitoring the availability and reliability of sites. If A/R decreases, an alarm would be raised against the site. This approach would enable sites to correct their A/R figures before the end of the month and stay within OLA thresholds. Discussions on the implementation details have started.|
|The ActiveMQ broker is not fully packaged and the Yaim module is missing. There is no support unit for the ActiveMQ broker.||Discussions with the EMI messaging product team have started in order to agree on a package format. Once the package format is agreed, the AUTH partner will provide additional documentation and a secure SVN repository for storing configuration files. This approach will be used only for the broker network used by operational tools. If any other EMI service requires messaging infrastructure, a proper support unit and Yaim modules will need to be provided by EMI.|
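The mid-month A/R check proposed by the COD team in the table above could be sketched roughly as follows. This is only an illustration, not the agreed implementation, whose details were still under discussion. The `SiteHours` record, the function names and the default thresholds are hypothetical. The formulas assume a common formulation in which UNKNOWN time is excluded from both figures and scheduled downtime is additionally excluded from reliability.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SiteHours:
    """Hours accumulated so far this month for one site (hypothetical record)."""
    up: float              # service was available
    down: float            # unscheduled downtime
    scheduled_down: float  # downtime declared in advance
    unknown: float         # monitoring result was UNKNOWN


def availability(h: SiteHours) -> Optional[float]:
    """Uptime divided by all time with a known status (assumed formula)."""
    known = h.up + h.down + h.scheduled_down
    return h.up / known if known > 0 else None


def reliability(h: SiteHours) -> Optional[float]:
    """Like availability, but scheduled downtime is also excluded (assumed formula)."""
    expected = h.up + h.down
    return h.up / expected if expected > 0 else None


def needs_alarm(h: SiteHours,
                min_availability: float = 0.70,  # placeholder, not the real OLA value
                min_reliability: float = 0.75    # placeholder, not the real OLA value
                ) -> bool:
    """Return True if either figure is currently below its threshold."""
    a, r = availability(h), reliability(h)
    if a is None or r is None:
        # No known status yet: nothing to compare against the thresholds.
        return False
    return a < min_availability or r < min_reliability
```

Running such a check daily on month-to-date figures would give a site time to react before the monthly figures are final, which is the intent described in the mitigation.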
4. Plans for the next period
Decommissioning of the old CIC portal (cic.egi.eu) is planned for the next quarter.
Track and perform planned tests of failover configurations of centralized tools.
Deployment of the new SAM instance dedicated to monitoring operational tools, with the new probes provided by operational tools developers.
Integration of DesktopGrids resources into the EGI infrastructure.