
EGI-InSPIRE:WP7 Operational Tools DoW summary

wp7-jra1

This activity provides for the continual evolution of the operational tools used by the production infrastructure, including:

  • The ongoing maintenance and further development of the deployed operational tools
  • The development of the operational tools to support a national deployment model (tool regionalisation)
  • Accounting for the use of different resources within the production infrastructure
  • Providing an integrated operations portal for the staff running the production infrastructure

Involved partners

Germany - KIT-G
Spain - CSIC
France - CNRS
Greece - GRNET
Croatia - SRCE
Italy - INFN
UK - STFC
CERN

TJRA1.1 Activity Management (4yr)

IGI/INFN 24PM (Daniele Cesini)

1. coordination of the tool development work;
2. definition and follow-up of the software development roadmaps, in collaboration with the Operational Tools Advisory Group;
3. representation of the activity within EGI.eu's management boards;
4. overseeing the testing and release preparation of software before deployment;
5. reporting on status and open issues related to the activity.

TJRA1.2 Maintenance and development of the deployed operational tools (4yr)

The reference tools are: the operations portal, the EGI Helpdesk, the Grid configuration repository (GOCDB), the accounting repository and portal, and the Service Availability Monitoring framework.

TJRA1.2.1 Operations portal

CNRS 12PM - Operations Portal (Cyril Lorphelin, Gilles Mathieu ??)
This covers the VO registration tool, the broadcast and downtime system, the periodic operations report submission system, the regional dashboard, etc. The programme of work includes tool maintenance (bug fixing and enhancements for the failover configuration).

TJRA1.2.2 EGI Helpdesk

KIT 47PM - GGUS (Torsten, Gunter, Helmut)
- A standard interface to the EGI Helpdesk instance using web services or messaging that allows its integration with independent national helpdesks.
- New ticket workflows that support the integration of different support units (e.g. middleware-related product teams, e-Infrastructure providers, etc.)
- A customisable view of the EGI Helpdesk system will be provided so that support units (located within an NGI, a VRC or a collaborating project) that do not wish to set up their own national helpdesk system can effectively use the central EGI Helpdesk.

TJRA1.2.3 Grid configuration repository: GOCDB

STFC 24PM - GOCDB (John Gordon ??)
- Change the internal data structure to support new resource types
- Port the GOCDB data storage layer from Oracle to MySQL to support non-Oracle deployments within the NGIs
- Enhance the presentation layer to enable integration with the monitoring and operations portals, using a common interoperable toolkit for grid operations
- Schemas for storing information (e.g. about grid sites) and the definition and implementation of interfaces between static/dynamic models will be contributed to OGF for standardisation

TJRA1.2.4 Accounting repository

STFC 24PM - GOCDB (John Gordon ??)
The repository will be adapted to implement the Resource Usage Service (RUS) interface from the OGF, which defines both the communication channels for the exchange of usage records and the structure of the usage records to be exchanged.

TJRA1.2.5 Accounting portal

CSIC 12PM - (???)
Maintenance work includes technical activities such as local job accounting and the migration to a new compute power benchmark, a new transport mechanism for the exchange of information (ActiveMQ), and a standard interface for importing usage records from the accounting repository. The portal will be extended to include new views for displaying the usage of different resources and services (e.g. storage accounting, parallel job accounting, etc.).

TJRA1.2.6 Service Availability Monitoring

CERN 12PM
GRNET 12PM
SRCE 12PM

Includes the following components:
1. probes: a test execution framework (based on the open source monitoring framework Nagios) and the Nagios Configuration Generator (NCG)
2. the Aggregated Topology Provider (ATP), the Metrics Description Database (MDDB), and the Metrics Results Database (MRDB)
3. the message bus to publish results and a programmatic interface
4. the visualization portal (MyEGI).

The reporting capabilities of MyEGI (e.g. GridView) will be adapted from ROCs to NGIs. Work will be needed on the Nagios probes to:

  • adapt to the capabilities of the evolving middleware
  • integrate new middleware distributions and components into the monitoring system
  • integrate new resources.

The NCG component and the site/regional Nagios will be extended to support these new local probes.
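
For illustration only, a new local probe could follow the standard Nagios plugin convention: one line of status output plus an exit code of 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN). The sketch below is not taken from the SAM code base; the function names, the TCP connection check and the thresholds are all assumptions chosen to keep the example short.

```python
import socket
import time

# Exit codes defined by the Nagios plugin convention.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
STATUS_NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def classify(elapsed, warn_after=2.0, crit_after=5.0):
    """Map a response time in seconds to a status code (thresholds are hypothetical)."""
    if elapsed >= crit_after:
        return CRITICAL
    if elapsed >= warn_after:
        return WARNING
    return OK


def format_result(status, message, elapsed=None):
    """Build the one-line plugin output; performance data follows the '|' separator."""
    line = f"{STATUS_NAMES[status]} - {message}"
    if elapsed is not None:
        line += f" | time={elapsed:.3f}s"
    return line


def run_probe(host, port, timeout=10.0):
    """Hypothetical check: time a TCP connection to a service endpoint."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            elapsed = time.monotonic() - start
    except OSError as exc:
        # Connection refused, timeout, DNS failure, etc. are reported as CRITICAL.
        return CRITICAL, format_result(CRITICAL, f"cannot connect to {host}:{port}: {exc}")
    status = classify(elapsed)
    return status, format_result(status, f"{host}:{port} answered", elapsed)
```

A command-line wrapper would print the returned line and pass the status to `sys.exit()`, so that Nagios can read the result from the process exit code.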

TJRA1.2.7 Metrics Portal

???
The portal will be maintained to include new metrics (e.g. storage, job success rate, EGI Helpdesk ticket response, etc.) as they become available from the infrastructure. Additionally, new views will be added to the Metrics Portal to satisfy the requirements of VRCs, allowing them to keep track of the production infrastructure from their own perspective.
The incorporation of new metrics will be determined by the OTAG following input from the user community.