Difference between revisions of "EGI-InSPIRE:WP7 Operational Tools DoW summary"

From EGIWiki

Revision as of 13:56, 13 May 2010


This activity provides for the continual evolution of the operational tools used by the production infrastructure, including:

  • The ongoing maintenance and further development of the deployed operational tools
  • The development of the operational tools to support a national deployment model (tool regionalisation)
  • Accounting for the use of different resources within the production infrastructure
  • Providing an integrated operations portal for the staff running the production infrastructure

Involved partners

Germany - KIT-G, Fraunhofer, LUH
France - CNRS
Greece - GRNET
Croatia - SRCE
Italy - INFN

TJRA1.1 Activity Management (4yr) (Daniele Cesini, INFN)

IGI/INFN 24PM (Daniele Cesini)

1. coordination of the tool development work;
2. definition and follow-up of the software development roadmaps, in collaboration with the Operational Tools Advisory Group;
3. representation of the activity within EGI.eu's management boards;
4. overseeing the testing and release preparation of software before deployment;
5. reporting on status and open issues related to the activity;
6. OTAG and USAG participation.

TJRA1.2 Maintenance and development of the deployed operational tools (4yr) (Torsten Antoni, KIT)

The reference tools are: the operations portal, the EGI Helpdesk, the Grid configuration repository (GOCDB), the accounting repository and portal, and the Service Availability Monitoring framework.

TJRA1.2.1 Operations portal

CNRS 12PM - Operations Portal (Cyril Lorphelin, ??)
The portal comprises the VO registration tool, the broadcast and downtime system, the periodic operations report submission system, the regional dashboard, etc. The programme of work includes tool maintenance (bug fixing and enhancement of the failover configuration).

TJRA1.2.2 EGI Helpdesk

KIT 47PM - GGUS (Torsten, Gunter, Helmut)
- A standard interface to the EGI Helpdesk instance using web services or messaging that allows its integration with independent national helpdesks.
- New ticket workflows that support the integration of different support units (e.g. middleware-related product teams, e-Infrastructure providers, etc.)
- A customisable view of the EGI Helpdesk system will be provided so that support units (located within an NGI, a VRC or a collaborating project) that do not wish to set up their own national helpdesk system can effectively use the central EGI Helpdesk.

TJRA1.2.3 Grid configuration repository: GOCDB

STFC 24PM - GOCDB (John Gordon, Gilles Mathieu??)
- Change the internal data structure to support new resource types
- Port the GOCDB data storage layer from Oracle to MySQL to support non-Oracle deployments within the NGIs
- Enhance the presentation layer to enable integration with the monitoring and operations portals, using a common interoperable toolkit for grid operations
- Schemas for storing information (e.g. about grid sites) and the definition and implementation of interfaces between static/dynamic models will be contributed to OGF for standardisation

TJRA1.2.4 Accounting repository

STFC 24PM - GOCDB (John Gordon, Gilles Mathieu)
The repository will be adapted to implement the Resource Usage Service (RUS) interface from the OGF, which defines both the communication channels for the exchange of usage records and the structure of those records.

TJRA1.2.5 Accounting portal

CSIC 12PM (Alvaro Simon Garcia)
Maintenance work includes technical activities such as local job accounting and migration to a new compute power benchmark, to a new transport mechanism for the exchange of information (ActiveMQ), and to a standard interface for importing usage records from the accounting repository. The portal will be extended to include new views for displaying the usage of different resources and services (e.g. storage accounting, parallel job accounting, etc.).

TJRA1.2.6 Service Availability Monitoring

CERN 12PM (James Casey)
GRNET 12PM (??)
SRCE 12PM (Emir)

Includes the following components:
1. probes: a test execution framework (based on the open source monitoring framework Nagios) and the Nagios Configuration Generator (NCG)
2. the Aggregated Topology Provider (ATP), the Metrics Description Database (MDDB), and the Metrics Results Database (MRDB)
3. the message bus to publish results and a programmatic interface
4. the visualization portal (MyEGI).

The reporting capabilities of MyEGI (e.g. GridView) will be adapted from ROCs to NGIs. Work will be needed on the Nagios probes to:

  • adapt to the capabilities of the evolving middleware
  • integrate new middleware distributions and components into the monitoring system
  • integrate new resources.

The NCG component and the site/regional Nagios will be extended to support these new local probes.
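
As an illustration of what a local probe involves, the following is a minimal sketch that follows the standard Nagios plugin convention (exit codes 0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN, with a one-line status message). It is not taken from the SAM probe code: the function names, the measured quantity (a service response time) and the thresholds are all assumptions chosen for the example.

```python
# Hypothetical probe sketch; names and thresholds are illustrative only.

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
STATUS_NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def evaluate(response_seconds, warn=5.0, crit=15.0):
    """Map a measured response time to (exit_code, status_line).

    A value of None means the measurement itself failed, which is
    reported as UNKNOWN rather than as a service failure.
    """
    if response_seconds is None:
        return UNKNOWN, "UNKNOWN - no measurement available"
    if response_seconds >= crit:
        code = CRITICAL
    elif response_seconds >= warn:
        code = WARNING
    else:
        code = OK
    return code, "%s - response time %.1fs" % (STATUS_NAMES[code], response_seconds)


def run_probe(measure):
    """Call the measurement function, print the status line, return the exit code."""
    try:
        value = measure()
    except Exception:
        value = None  # treat any measurement error as UNKNOWN
    code, line = evaluate(value)
    print(line)
    return code
```

A probe script built this way would end with sys.exit(run_probe(measure)), where measure is whatever function contacts the service under test, so that Nagios reads the status from the process exit code.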

TJRA1.2.7 Metrics Portal

The portal will be maintained to include new metrics (e.g. storage, job success rate, EGI Helpdesk ticket response, etc.) as they become available from the infrastructure. Additionally, new views will be added to the Metrics Portal to satisfy the requirements of VRCs, allowing them to keep track of the production infrastructure from their own perspective.
The incorporation of new metrics will be determined by the OTAG following input from the user community.

TJRA1.3 National deployment models (first year only) (COO, EGI.eu)

TJRA1.3.1 Operations Portal