EGI-InSPIRE:WP7 Operational Tools DoW summary

wp7-jra1

This activity provides for the continual evolution of the operational tools used by the production infrastructure, including:

  • The ongoing maintenance and further development of the deployed operational tools
  • The development of the operational tools to support a national deployment model (tool regionalisation)
  • Accounting for the use of different resources within the production infrastructure
  • Providing an integrated operations portal for the staff running the production infrastructure

Involved partners

Germany - KIT-G, Fraunhofer, LUH
Spain - CSIC, FCTSG
France - CNRS
Greece - GRNET
Croatia - SRCE
Italy - INFN
UK - STFC
CERN

TJRA1.1 Activity Management (4yr) (Daniele Cesini, INFN)

IGI/INFN 24PM (Daniele Cesini)

1. coordination of the tool development work;
2. definition and follow-up of the software development roadmaps, in collaboration with the Operational Tools Advisory Group;
3. representation of the activity within EGI.eu's management boards;
4. overseeing the testing and release preparation of software before deployment;
5. reporting on status and open issues related to the activity;
6. OTAG and USAG participation.

TJRA1.2 Maintenance and development of the deployed operational tools (4yr) (Torsten Antoni, KIT)

The reference tools are: the operations portal, the EGI Helpdesk, the Grid configuration repository (GOCDB), the accounting repository and portal, and the Service Availability Monitoring framework.

TJRA1.2.1 Operations portal

CNRS 12PM - Operations Portal (Cyril L'Orphelin, ??)
The portal comprises the VO registration tool, the broadcast and downtime system, the periodic operations report submission system, the regional dashboard, etc. The programme of work includes tool maintenance (bug fixing and enhancement for the failover configuration).

TJRA1.2.2 EGI Helpdesk

KIT 47PM - GGUS (Torsten Antoni, Gunter Grein, Helmut Dres)
- A standard interface to the EGI Helpdesk instance using web services or messaging that allows its integration with independent national helpdesks.
- New ticket workflows that support the integration of different support units (e.g. middleware-related product teams, e-Infrastructure providers, etc.).
- A customisable view of the EGI Helpdesk system will be provided so that support units (located within an NGI, a VRC or a collaborating project) that do not wish to set up their own national helpdesk system can effectively use the central EGI Helpdesk.

TJRA1.2.3 Grid configuration repository: GOCDB

STFC 24PM - GOCDB (John Gordon, Gilles Mathieu??)
- Change the internal data structure to support new resource types
- Port the GOCDB data storage layer from Oracle to MySQL to support non-Oracle deployments within the NGIs
- Enhance the presentation layer to enable integration with the monitoring and operations portals, using a common interoperable toolkit for grid operations
- Contribute to OGF, for standardisation, the schemas for storing information (e.g. about grid sites) and the definition and implementation of interfaces between static/dynamic models

TJRA1.2.4 Accounting repository

STFC 24PM - (John Gordon, Gilles Mathieu)
The repository will be adapted to implement the Resource Usage Service (RUS) interface from the OGF as it defines the communication channels for the exchange of usage records, and the structure of the usage records to be exchanged.

TJRA1.2.5 Accounting portal

CSIC 12PM - (Alvaro Simon Garcia)
Maintenance work includes technical activities such as local job accounting, and the migration to a new compute power benchmark, to a new transport mechanism for the exchange of information (ActiveMQ) and to a standard interface for importing usage records from the accounting repository. The portal will be extended to include new views for displaying the usage of different resources and services (e.g. storage accounting, parallel job accounting, etc.).

TJRA1.2.6 Service Availability Monitoring

CERN 12PM (James Casey, David Horat)
GRNET 12PM (Kostas Koumantaros)
SRCE 12PM (Emir Imamagic)

Includes the following components:
1. probes: a test execution framework (based on the open source monitoring framework Nagios) and the Nagios Configuration Generator (NCG)
2. the Aggregated Topology Provider (ATP), the Metrics Description Database (MDDB), and the Metrics Results Database (MRDB)
3. the message bus to publish results and a programmatic interface
4. the visualization portal (MyEGI).

The reporting capabilities of MyEGI (e.g. GridView) will be adapted from ROCs to NGIs. Work will be needed on the Nagios probes to:

  • adapt to the capabilities of the evolving middleware
  • integrate new middleware distributions and components into the monitoring system
  • integrate new resources.

The NCG component and the site/regional Nagios will be extended to support these new local probes.

TJRA1.2.7 Metrics Portal

CERN 12PM (James Casey, David Horat)
The portal will be maintained to include new metrics (e.g. storage, job success rate, EGI Helpdesk ticket response, etc.) as they become available from the infrastructure. Additionally, new views will be added to the Metrics Portal to satisfy the requirements of VRCs, allowing them to keep track of the production infrastructure from their own perspective.
The incorporation of new metrics will be determined by the OTAG following input from the user community.

TJRA1.3 National deployment models (first year only) (COO, EGI.eu)

This task includes effort for tool "regionalisation", i.e. the software programming activities needed to adapt operational tools (or tool components) that currently can only run centrally to a fully distributed model for deployment at the NGI level.

TJRA1.3.1 Operations Portal

CNRS 3PM

  • move from a federation-based to an NGI-based structure of operations
  • the central operations portal will be a catch-all instance providing NGI-customised views at a central level

TJRA1.3.2 GOCDB

STFC 3PM
This task will provide an instance of the GOCDB service that could be deployed nationally (going beyond the regional deployment model covered in TJRA1.2.3), with information federated into a central instance.

TJRA1.3.3 Accounting portal

CSIC 3PM
The central accounting portal will be extended to support regional and national deployments that allow NGIs to operate a full and standalone national accounting infrastructure using a national deployment of the accounting repository.

TJRA1.3.4 Service Availability Monitoring

CERN 6PM
SRCE 3PM
The visualization portal MyEGI will be adapted to a new pluggable framework and will provide EGI-specific NGI views. MyEGI will be the place to see the availability, reliability and service status of NGI resources. 'GridMap style' TreeMap views will be added, showing both regional and global views of the stored data.

TJRA1.4 Accounting for different resource types (from the 2nd yr) (John Gordon, STFC)

TJRA1.4.1 Billing

TJRA1.4.2 Accounting of application usage

TJRA1.4.3 Accounting of data usage

TJRA1.4.4 Accounting of capacity and cloud computing usage



TJRA1.5 Integrated Operations Portal (first 3 yr) (Cyril L'Orphelin, CNRS)

CNRS 53PM

  • the portal will be ported to the Symfony open-source web-development framework
  • it will be extended to support messaging (downtime and broadcast tools)
  • it will be harmonized with other portal frameworks
  • it will be pluggable through portlet/widget technologies into other portals such as scientific gateways, MyEGI, iGoogle, etc.
  • it will incorporate other DCI technologies through the development of new plug-ins and procedures
  • GOCDB and the CIC portal will also be harmonized at the front-end and back-end level

See DoW, page 91.

First year Milestones and Deliverables

Due Month 1:
MS701
CIC Operations Portal work plan: define the roadmap for the CIC Operations Portal, taking into account the regionalisation of the operational tools and new resource types being used on the infrastructure.

MS702
A report describing the different operational tool product teams, including details of their development infrastructure and procedures.

Due Month 2:
MS703
Specify an operational tool work plan identifying the upcoming releases and associated functionality.

Due Month 3:
MS704
A public report describing the roadmap for all the deployed operational tools over the next 18 months, defining release and deployment dates.

Due Month 11:
D7.1
Annual Report on Operational Tool maintenance and development activity