EGI Core activities:2015-bidding Monitoring

Revision as of 16:34, 1 July 2015

  • Service name: Monitoring services


Introduction

Central systems are needed to access and archive infrastructure monitoring results for the services provided at many levels (Resource Centres, Operations Centres and EGI.eu), to generate service level reports, and to centrally monitor EGI.eu operational tools and meet other central monitoring needs.

In some cases infrastructure operations require monitoring activities to be conducted centrally to support specific service and capability monitoring, such as User DN publishing in accounting records, GLUE information validation, and checking the software versions of deployed middleware.

Technical description

Monitoring (SAM) is a distributed system supporting EGI/NGI operations. It provides remote monitoring of services, visualization of service status, dashboard interfacing, a notification system and generation of availability and reliability reports. The central monitoring services are needed to aggregate all EGI metric results and to give access to the data at an EGI-wide scope through the central ARGO user interface. These results are exposed through the central ARGO web service and its programmatic interface (XML and JSON supported). On top of that, the ARGO Reporting System generates monthly availability reports about sites and operational tools for use by the service owners. In addition to the central services described above, the activity also provides:

  • Monitoring of EGI.eu technical services: a centralised SAM installation is currently running in production to monitor the performance of EGI.eu operations tools and user community support tools.
  • A central Nagios service is provided to support specific operations activities such as User DN publishing in accounting records, GLUE information validation and monitoring of deployed software versions. New specific monitoring needs will emerge depending on the operations technical activities, and the central Nagios service will be configured to address them. The Nagios infrastructure needs to be scaled accordingly.
  • When the monitoring infrastructure of EGI moves to a full central deployment, the Monitoring service will include a high availability deployment of Nagios services to monitor the entire EGI Federation (more than 5000 services). The deployment must support the size of the infrastructure.
  • Development of Nagios probes:
    • Maintenance of existing operations probes
    • Development of new probes as required to support operations activities
    • Requirements gathering
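
As an illustration of what a probe for the deployed software version use case might look like, the following minimal sketch follows the standard Nagios plugin convention: one status line on standard output and an exit code of 0, 1, 2 or 3 for OK, WARNING, CRITICAL or UNKNOWN. The function names, the dotted version format and the policy of reporting an outdated version as CRITICAL are assumptions made for this example; they are not taken from the existing SAM probes.

```python
# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
STATUS_NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def parse_version(text):
    """Turn a dotted version such as '3.2.1' into (3, 2, 1).

    Raises ValueError if any component is not an integer.
    """
    return tuple(int(part) for part in text.strip().split("."))


def pad(version, width):
    """Right-pad a version tuple with zeros so '3.2' compares equal to '3.2.0'."""
    return version + (0,) * (width - len(version))


def check_version(deployed, minimum):
    """Compare a deployed version with a required minimum (hypothetical policy).

    Returns a (exit_code, message) pair.
    """
    try:
        deployed_v = parse_version(deployed)
        minimum_v = parse_version(minimum)
    except ValueError:
        return UNKNOWN, f"cannot parse versions '{deployed}' / '{minimum}'"
    width = max(len(deployed_v), len(minimum_v))
    if pad(deployed_v, width) >= pad(minimum_v, width):
        return OK, f"version {deployed} satisfies minimum {minimum}"
    return CRITICAL, f"version {deployed} is older than required {minimum}"


def run_probe(deployed, minimum):
    """Print the single Nagios status line and return the exit code."""
    code, message = check_version(deployed, minimum)
    print(f"{STATUS_NAMES[code]} - {message}")
    return code
```

A wrapper script would typically pass the value returned by `run_probe` to `sys.exit()`, so that Nagios picks up the status from the process exit code.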

Coordination

This activity is responsible for the coordination of the system operations and upgrade activities with those partners that are in charge of operating other systems that depend on it.

Operations

  • Daily running of the system
  • Provisioning of a high availability configuration
    • A minimum of three Nagios instances for monitoring the services. The Nagios instances must not all be deployed at the same site.
    • Multiple consumers of monitoring data
  • A test infrastructure to verify interoperability and the impact of software upgrades on dependent systems
  • Deployment in production of the releases of the monitoring system (ARGO) produced in EGI-Engage
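
The high availability constraints listed above (at least three Nagios instances, not all at the same site) can be expressed as a simple check on a deployment plan. The sketch below is only an illustration: the function name, the data layout (a mapping from hostname to site) and the example hostnames and site names are hypothetical.

```python
MIN_INSTANCES = 3  # minimum number of Nagios instances stated in the requirement


def check_ha_deployment(instances):
    """Validate a deployment plan against the high availability constraints.

    `instances` maps a Nagios hostname to the name of the site hosting it.
    Returns a list of human-readable problems; an empty list means the plan
    satisfies both constraints.
    """
    problems = []
    if len(instances) < MIN_INSTANCES:
        problems.append(
            f"only {len(instances)} instance(s) planned, at least {MIN_INSTANCES} required"
        )
    sites = set(instances.values())
    if instances and len(sites) < 2:
        problems.append("all instances are deployed at the same site")
    return problems


# Hypothetical plan: three instances spread over two sites.
plan = {
    "nagios1.example.org": "SITE-A",
    "nagios2.example.org": "SITE-A",
    "nagios3.example.org": "SITE-B",
}
print(check_ha_deployment(plan))  # prints []
```

The check only requires that the instances span at least two sites, which is the literal reading of the requirement; a bidder may well choose a stricter placement policy, such as one instance per site.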

Maintenance

This activity includes:

  • bug fixing, proactive maintenance, improvement of the system
  • maintenance of probes to test the functionality of the service
  • integration (configuration and packaging) of new probes into SAM
  • coordination of software maintenance activities with other technology providers that provide software for the EGI Core Infrastructure or remote systems deployed by integrated and peer infrastructures that interoperate with the central EGI components of the system
  • production of the monthly reports on the performance of the resource centres, NGI central services and EGI central tools
  • requirements gathering
  • documentation
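
To make the monthly performance figures more concrete, the sketch below computes availability and reliability from hours spent in each status. It assumes the commonly used definitions in which availability is uptime divided by the time in a known status, and reliability additionally excludes scheduled downtime from the denominator. The exact algorithm used by the ARGO Reporting System is not described on this page, so the formulas, function name and sample numbers are assumptions for illustration only.

```python
def availability_reliability(up, down, scheduled, unknown):
    """Compute (availability, reliability) from hours spent in each status.

    Assumed definitions (not taken from this page):
      availability = up / (total - unknown)
      reliability  = up / (total - unknown - scheduled)
    A value of None is returned where the denominator would be zero.
    """
    total = up + down + scheduled + unknown
    known = total - unknown
    if known <= 0:
        return None, None
    availability = up / known
    reliability_base = known - scheduled
    reliability = up / reliability_base if reliability_base > 0 else None
    return availability, reliability


# Hypothetical 31-day month (744 hours) for one resource centre.
a, r = availability_reliability(up=600, down=60, scheduled=60, unknown=24)
print(f"availability={a:.1%} reliability={r:.1%}")
# prints: availability=83.3% reliability=90.9%
```

Under these definitions reliability is never lower than availability, because scheduled downtime counts against availability but not against reliability.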

Support

Support through the EGI helpdesk about the functionality of the service and the monitoring data gathered.

Support hours: eight hours a day, Monday to Friday, excluding public holidays of the hosting organization.

Service level targets

  • Response to incident records in GGUS within support hours: Medium (see https://wiki.egi.eu/wiki/FAQ_GGUS-PT-QoS-Levels#Medium_service)

Effort

Bids requesting a contribution between 20 and 24 PM/year would allow these services and activities to be addressed appropriately.