EGI-InSPIRE:WP7 Operational Tools DoW summary

From EGIWiki
Revision as of 12:56, 13 May 2010 by Cesini (talk | contribs)

This activity provides for the continual evolution of the operational tools used by the production infrastructure, including:

  • The ongoing maintenance and further development of the deployed operational tools
  • The development of the operational tools to support a national deployment model (tool regionalisation)
  • Accounting for the use of different resources within the production infrastructure
  • Providing an integrated operations portal for the staff running the production infrastructure

Involved partners

Germany - KIT-G
Spain - CSIC
France - CNRS
Greece - GRNET
Croatia - SRCE
Italy - INFN

TJRA1.1 Activity Management (4yr)

IGI/INFN 24PM (Daniele Cesini)

TJRA1.2 Maintenance and development of the deployed operational tools (4yr)

The reference tools are: the operations portal, the EGI Helpdesk, the Grid configuration repository (GOCDB), the accounting repository and portal, and the Service Availability Monitoring framework.

TJRA1.2.1 Operations portal

CNRS 12PM - Operations Portal (Cyril Lorphelin, Gilles Mathieu ??)
Covers the VO registration tool, the broadcast and downtime system, the periodic operations report submission system, the regional dashboard, etc. The programme of work includes tool maintenance (bug fixing and enhancements for the failover configuration).

TJRA1.2.2 EGI Helpdesk

KIT 47PM - GGUS (Torsten, Gunter, Helmut)
- A standard interface to the EGI Helpdesk instance, using web services or messaging, that allows its integration with independent national helpdesks.
- New ticket workflows that support the integration of different support units (e.g. middleware-related product teams, e-Infrastructure providers, etc.).
- A customisable view of the EGI Helpdesk system, so that support units (located within an NGI, a VRC or a collaborating project) that do not wish to set up their own national helpdesk system can effectively use the central EGI Helpdesk.

TJRA1.2.3 Grid configuration repository: GOCDB

STFC 24PM - GOCDB (John Gordon ??)
- Change the internal data structure to support new resource types.
- Port the GOCDB data storage layer from Oracle to MySQL to support non-Oracle deployments within the NGIs.
- Enhance the presentation layer to enable integration with the monitoring and operations portals, using a common interoperable toolkit for grid operations.
- Contribute to OGF, for standardisation, the schemas for storing information (e.g. about grid sites) and the definition and implementation of interfaces between static/dynamic models.

TJRA1.2.4 Accounting repository

STFC 24PM - GOCDB (John Gordon ??)
The repository will be adapted to implement the Resource Usage Service (RUS) interface from the OGF, which defines both the communication channels for the exchange of usage records and the structure of the usage records to be exchanged.
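As an illustration of what "the structure of the usage records to be exchanged" could mean in practice, here is a minimal Python sketch of a usage record that is serialised to XML and parsed back. The field and element names (`RecordId`, `Site`, `VO`, and so on) are assumptions chosen for readability; they are not the OGF schema, and a real RUS implementation would follow the OGF specifications instead.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
import xml.etree.ElementTree as ET


@dataclass
class UsageRecord:
    """Simplified, hypothetical usage record (not the OGF schema)."""
    record_id: str
    site: str
    vo: str
    start: datetime
    end: datetime
    cpu_seconds: int

    @property
    def wall_seconds(self) -> int:
        # Wall-clock duration derived from the start and end timestamps.
        return int((self.end - self.start).total_seconds())


def to_xml(record: UsageRecord) -> str:
    """Serialise a record to a small XML document (illustrative element names)."""
    root = ET.Element("UsageRecord")
    ET.SubElement(root, "RecordId").text = record.record_id
    ET.SubElement(root, "Site").text = record.site
    ET.SubElement(root, "VO").text = record.vo
    ET.SubElement(root, "StartTime").text = record.start.isoformat()
    ET.SubElement(root, "EndTime").text = record.end.isoformat()
    ET.SubElement(root, "CpuSeconds").text = str(record.cpu_seconds)
    return ET.tostring(root, encoding="unicode")


def from_xml(text: str) -> UsageRecord:
    """Parse the XML produced by to_xml back into a UsageRecord."""
    root = ET.fromstring(text)
    return UsageRecord(
        record_id=root.findtext("RecordId"),
        site=root.findtext("Site"),
        vo=root.findtext("VO"),
        start=datetime.fromisoformat(root.findtext("StartTime")),
        end=datetime.fromisoformat(root.findtext("EndTime")),
        cpu_seconds=int(root.findtext("CpuSeconds")),
    )
```

Agreeing on one record structure lets a site publisher and the central repository evolve independently, as long as both sides can produce and parse the same document.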

TJRA1.2.5 Accounting portal

CSIC 12PM - (???)
Maintenance work includes technical activities such as local job accounting and the migration to a new compute power benchmark, to a new transport mechanism for the exchange of information (ActiveMQ), and to a standard interface for importing usage records from the accounting repository. The portal will be extended with new views for displaying the usage of different resources and services (e.g. storage accounting, parallel job accounting, etc.).
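To make the benchmark migration and the per-resource views more concrete, the following Python sketch shows one way raw CPU hours could be normalised against a benchmark score, and how usage could be totalled per VO and resource type. The function names, the record layout and the numbers are all hypothetical; the portal's actual data model and normalisation rules are not described here.

```python
from collections import defaultdict


def normalise_cpu_hours(cpu_hours, benchmark_per_core, reference=1.0):
    """Scale raw CPU hours by a per-core benchmark score.

    `reference` is the score of the (hypothetical) reference machine;
    changing benchmark means changing both scores consistently.
    """
    if reference <= 0:
        raise ValueError("reference score must be positive")
    return cpu_hours * benchmark_per_core / reference


def usage_by_resource_type(records):
    """Total usage per (VO, resource type) pair.

    Each record is a dict with the assumed keys 'vo', 'resource_type'
    and 'amount'. Amounts are only summed within one resource type,
    so CPU hours and stored terabytes are never mixed.
    """
    totals = defaultdict(float)
    for record in records:
        totals[(record["vo"], record["resource_type"])] += record["amount"]
    return dict(totals)
```

For example, `usage_by_resource_type` applied to a mix of compute and storage records yields one total per VO for each resource type, which is the kind of breakdown a storage or parallel job accounting view would display.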

TJRA1.2.6 Service Availability Monitoring

The framework includes the following components:
1. probes: a test execution framework (based on the open source monitoring framework Nagios) and the Nagios Configuration Generator (NCG)
2. the Aggregated Topology Provider (ATP), the Metrics Description Database (MDDB), and the Metrics Results Database (MRDB)
3. the message bus to publish results and a programmatic interface
4. the visualisation portal (MyEGI)

The reporting capabilities of MyEGI (e.g. GridView) will be adapted from ROCs to NGIs. Work will be needed on the Nagios probes to:

  • adapt to the capabilities of the evolving middleware
  • integrate new middleware distributions and components into the monitoring system
  • integrate new resources.

The NCG component and the site/regional Nagios will be extended to support these new local probes.
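For readers unfamiliar with Nagios probes, the sketch below shows the general shape of one in Python. It relies on the standard Nagios plugin convention (exit codes 0, 1, 2 and 3 for OK, WARNING, CRITICAL and UNKNOWN, with a single status line on standard output). The TCP connection check, the thresholds and the function names are illustrative assumptions and are not taken from the SAM probe set.

```python
import socket
import time

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def evaluate(elapsed, warn, crit):
    """Map a response time in seconds to a Nagios status code."""
    if elapsed >= crit:
        return CRITICAL
    if elapsed >= warn:
        return WARNING
    return OK


def format_line(code, message, elapsed=None):
    """Build the status line, with optional performance data after '|'."""
    line = f"{LABELS[code]} - {message}"
    if elapsed is not None:
        line += f" | time={elapsed:.3f}s"
    return line


def probe_tcp(host, port, warn=1.0, crit=5.0, timeout=10.0):
    """Hypothetical probe: time a TCP connection to a service endpoint."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            elapsed = time.monotonic() - start
    except OSError as exc:
        # Refused connections, DNS failures and timeouts all end up here.
        message = f"cannot connect to {host}:{port}: {exc}"
        return CRITICAL, format_line(CRITICAL, message)
    code = evaluate(elapsed, warn, crit)
    return code, format_line(code, f"connected to {host}:{port}", elapsed)
```

A wrapper script would call `probe_tcp`, print the returned line and pass the returned code to `sys.exit`, which is all Nagios needs to record the result. Supporting a new middleware component or resource type then amounts to writing a new check function of this kind and having the configuration generator schedule it.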