EGI-InSPIRE:WP7 Operational Tools DoW summary
- 1 WP7-JRA1
- 1.1 Involved partners
- 1.2 TJRA1.1 Activity Management (4yr) (Daniele Cesini, INFN)
- 1.3 TJRA1.2 Maintenance and development of the deployed operational tools (4yr) (Torsten Antoni, KIT)
- 1.4 TJRA1.3 National deployment models (first year only) (COO, EGI.eu)
- 1.5 TJRA1.4 Accounting for different resource types (from the 2nd yr) (John Gordon, STFC)
- 1.6 TJRA1.5: Integrated Operations Portal (first 3 yr) (Cyril L'Orphelin, CNRS)
- 1.7 First year Milestones and Deliverables
WP7-JRA1
This activity provides for the continual evolution of the operational tools used by the production infrastructure, including:
- The ongoing maintenance and further development of the deployed operational tools
- The development of the operational tools to support a national deployment model (tool regionalisation)
- Accounting for the use of different resources within the production infrastructure
- Providing an integrated operations portal for the staff running the production infrastructure
Involved partners
Germany - KIT-G, LUH
Spain - CSIC, FCTSG
France - CNRS,
Greece - GRNET
Croatia - SRCE
Italy - INFN
UK - STFC
TJRA1.1 Activity Management (4yr) (Daniele Cesini, INFN)
IGI/INFN 24PM (Daniele Cesini)
1. coordination of the tool development work;
2. definition and follow-up of the software development roadmaps, in collaboration with the Operational Tools Advisory Group;
3. representation of the activity within EGI.eu's management boards;
4. overseeing the testing and release preparation of software before deployment;
5. reporting on status and open issues related to the activity;
6. OTAG and USAG participation.
TJRA1.2 Maintenance and development of the deployed operational tools (4yr) (Torsten Antoni, KIT)
The reference tools are: the operations portal, the EGI Helpdesk, the Grid configuration repository (GOCDB), the accounting repository and portal, and the Service Availability Monitoring framework.
TJRA1.2.1 Operations portal
Operations Portal (Cyril L'Orphelin, ??)
The portal comprises the VO registration tool, the broadcast and downtime system, the periodic operations report submission system, the regional dashboard, etc. The programme of work includes tool maintenance (bug fixing and enhancement of the failover configuration).
TJRA1.2.2 EGI Helpdesk
GGUS (Torsten Antoni, Guenter Grein, Helmut Dres)
- A standard interface to the EGI Helpdesk instance using web services or messaging that allows its integration with independent national helpdesks.
- New ticket workflows that support the integration of different support units (e.g. middleware-related product teams, e-Infrastructure providers, etc.)
- A customisable view of the EGI Helpdesk system will be provided so that support units (located within an NGI, a VRC or a collaborating project) that do not wish to set up their own national helpdesk system can effectively use the central EGI Helpdesk.
TJRA1.2.3 Grid configuration repository: GOCDB
GOCDB (John Gordon, Gilles Mathieu, Cristina del Cano Novales)
- Change the internal data structure to support new resource types
- Port the GOCDB data storage layer from Oracle to MySQL to support non-Oracle deployments within the NGIs
- Enhance the presentation layer to enable integration with the monitoring and operations portals, using a common interoperable toolkit for grid operations
- Schemas for storing information (e.g. about grid sites) and the definition and implementation of interfaces between static/dynamic models will be contributed to OGF for standardisation
TJRA1.2.4 Accounting repository
STFC 24PM - (John Gordon, Gilles Mathieu)
The repository will be adapted to implement the Resource Usage Service (RUS) interface from the OGF as it defines the communication channels for the exchange of usage records, and the structure of the usage records to be exchanged.
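As a rough illustration of what exchanging usage records involves, the sketch below models a job usage record and round-trips it through a serialised form. The field names, the `UsageRecord` class and the use of JSON are all assumptions made to keep the example short; they are not taken from the OGF specification or from the accounting repository's schema.

```python
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class UsageRecord:
    """Hypothetical job usage record; field names are illustrative only."""
    record_id: str
    site: str
    vo: str
    wall_seconds: int
    cpu_seconds: int


def serialise(record: UsageRecord) -> str:
    """Encode a record as JSON with sorted keys so the output is stable."""
    return json.dumps(asdict(record), sort_keys=True)


def deserialise(payload: str) -> UsageRecord:
    """Rebuild a record from the JSON produced by serialise()."""
    return UsageRecord(**json.loads(payload))


# Placeholder values, not real accounting data.
record = UsageRecord("job-0001", "EXAMPLE-SITE", "example.vo", 3600, 3400)
payload = serialise(record)
restored = deserialise(payload)
```

Agreeing on both the record structure and the channel, as the paragraph above describes, is what lets a sender and a receiver built by different teams interoperate.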
TJRA1.2.5 Accounting portal
CSIC 12PM (Carlos Fernandez, Javier Lopez)
Maintenance work includes technical activities such as local job accounting, and the migration to a new compute power benchmark, to a new transport mechanism for the exchange of information (ActiveMQ) and to a standard interface for importing usage records from the accounting repository. The portal will be extended to include new views for displaying the usage of different resources and services (e.g. storage accounting, parallel job accounting, etc.).
TJRA1.2.6 Service Availability Monitoring
CERN 12PM (David Horat, David Collados, Wojciech Lapka)
GRNET 12PM (Kostas Koumantaros)
SRCE 12PM (Emir Imamagic)
Includes the following components:
1. probes: a test execution framework (based on the open source monitoring framework Nagios) and the Nagios Configuration Generator (NCG)
2. the Aggregated Topology Provider (ATP), the Metrics Description Database (MDDB), and the Metrics Results Database (MRDB)
3. the message bus to publish results and a programmatic interface
4. the visualization portal (MyEGI).
The reporting capabilities of MyEGI (e.g. GridView) will be adapted from ROCs to NGIs. Work will be needed on the Nagios probes to:
- adapt to the capabilities of the evolving middleware
- integrate new middleware distributions and components into the monitoring system
- integrate new resources.
The NCG component and the site/regional Nagios will be extended to support these new local probes.
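For readers unfamiliar with Nagios probes, the following is a minimal sketch of the shape such a check usually takes: it maps a measurement to one of the standard Nagios plugin states (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) and prints a one-line status message. The measurement (a response time), the thresholds and the function names are hypothetical and are not taken from the SAM probes.

```python
# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def evaluate(response_seconds, warn=5.0, crit=15.0):
    """Map a measured response time to a (status, message) pair."""
    if response_seconds is None:
        return UNKNOWN, "no measurement available"
    message = f"response took {response_seconds:.1f}s"
    if response_seconds >= crit:
        return CRITICAL, message
    if response_seconds >= warn:
        return WARNING, message
    return OK, message


def run_probe(response_seconds):
    """Print the Nagios status line and return the matching exit code."""
    status, message = evaluate(response_seconds)
    print(f"{LABELS[status]} - {message}")
    return status


# A real probe would measure the service and pass the result to sys.exit();
# 1.2 is a placeholder measurement.
exit_code = run_probe(1.2)
```

Keeping the threshold logic in `evaluate` separate from the printing makes it straightforward to unit-test a probe without contacting a service.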
TJRA1.2.7 Metrics Portal
CERN 12PM (James Casey, David Horat)
The portal will be maintained to include new metrics (e.g. storage, job success rate, EGI Helpdesk ticket response, etc.) as they become available from the infrastructure. Additionally, new views will be added to the Metrics Portal to satisfy the requirements of VRCs, allowing them to keep track of the production infrastructure from their own perspective.
The incorporation of new metrics will be determined by the OTAG following input from the user community.
TJRA1.3 National deployment models (first year only) (COO, EGI.eu)
This task includes effort for tool "regionalisation", i.e. the software programming activities needed to adapt operational tools (or tool components) that currently can only run centrally to a fully distributed model for deployment at the NGI level.
TJRA1.3.1 Operations Portal
- from a federation-based to an NGI-based structure of operations
- the central operation portal will be a catch-all instance providing NGI-customised views at a central level
TJRA1.3.2 GOCDB
STFC 3PM
This task will provide an instance of the GOCDB service that can be deployed nationally (going beyond the regional deployment model covered in TJRA1.2.3), with its information federated into a central instance.
TJRA1.3.3 Accounting portal
CSIC 3PM
The central accounting portal will be extended to support regional and national deployments that allow NGIs to operate a full, standalone national accounting infrastructure using a national deployment of the accounting repository.
TJRA1.3.4 Service Availability Monitoring
The visualization portal MyEGI will be adapted to a new pluggable framework and will provide EGI-specific NGI views. MyEGI will be the place to see the availability, reliability and service status of NGI resources. 'GridMap style' TreeMap views will be added, showing both regional and global views of the stored data.
Details for the Nagios Regional Transition
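To make the two headline figures concrete, here is a small sketch of how availability and reliability are commonly computed from monitoring data. The formulas are an assumption based on widely used grid-monitoring definitions (availability excludes periods of unknown status, reliability additionally excludes scheduled downtime); this page does not define them, and the function names are hypothetical.

```python
def availability(up_hours, total_hours, unknown_hours=0.0):
    """Fraction of the known period during which the service was up."""
    known = total_hours - unknown_hours
    return up_hours / known if known > 0 else None


def reliability(up_hours, total_hours, scheduled_down_hours=0.0, unknown_hours=0.0):
    """Like availability, but scheduled downtime is not counted against the service."""
    expected = total_hours - scheduled_down_hours - unknown_hours
    return up_hours / expected if expected > 0 else None


# Placeholder month: 720 hours, 648 up, 72 in scheduled downtime.
month_availability = availability(648, 720)
month_reliability = reliability(648, 720, scheduled_down_hours=72)
```

Under these assumed definitions, a site that is down only during announced maintenance keeps full reliability while its availability drops.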
TJRA1.4 Accounting for different resource types (from the 2nd yr) (John Gordon, STFC)
LUH (18PM) (Jan Wiebelitz, Michael Brenner)
TJRA1.4.2 Accounting of application usage
TJRA1.4.3 Accounting of data usage
TJRA1.4.4 Accounting of capacity and cloud computing usage
TJRA1.5: Integrated Operations Portal (first 3 yr) (Cyril L'Orphelin, CNRS)
- will be ported to the Symfony open-source web-development framework
- extended to support messaging (downtime and broadcast tools)
- harmonized with other portal frameworks
- pluggable through portlet/widget technologies into other portals such as scientific gateways, MyEGI, iGoogle, etc.
- incorporates technologies from other DCIs through the development of new plug-ins and procedures
- GOCDB and CIC portal will also be harmonized at the front-end and back-end level
See DoW, page 91.
First year Milestones and Deliverables
Due Month 1:
Define the roadmap for the CIC Operations Portal (the CIC Operations Portal work plan), taking into account the regionalisation of the operational tools and new resource types being used on the infrastructure.
A report describing the development infrastructure and procedures of the different operational tool product teams.
Due Month 2:
Specify an Operational Tool work plan identifying the upcoming releases and associated functionality.
Due Month 3:
A public report describing the roadmap for all the deployed operational tools over the next 18 months, defining release and deployment dates.
Due Month 11:
Annual Report on Operational Tool maintenance and development activity