
EGI Core activities:2015-bidding Monitoring

From EGIWiki
Revision as of 15:00, 1 July 2015 by Psolagna (talk | contribs)

  • Service name:


Introduction

Central systems are needed to access and archive the infrastructure monitoring results of services provided at many levels (Resource Centres, NGIs and EGI.eu), to generate service level reports, and to centrally monitor the EGI.eu operational tools and cover other central monitoring needs.

In some cases, infrastructure operations require monitoring activities to be conducted centrally to support specific service and capability monitoring, such as UserDN publishing in accounting records, GLUE information validation, and checks on the software versions of deployed middleware.

Technical description

Service Availability Monitoring (SAM) is a distributed monitoring system supporting EGI/NGI operations. It provides remote monitoring of services, visualization of service status, dashboard interfacing, a notification system, and generation of availability and reliability reports. The central monitoring services are needed to aggregate all EGI metric results and to give access to the data at an EGI-wide scope through the central ARGO user interface. These results are exposed through the SAM central ARGO web service and its programmatic interface (XML and JSON are supported). On top of that, the ARGO Reporting System generates monthly availability reports on sites and operational tools for use by the service owners. In addition to the central services described above, the activity also provides:

  • Monitoring of EGI.eu technical services: a centralised SAM installation is currently running in production to monitor the performance of EGI.eu operations tools and user community support tools.
  • A central Nagios service to support specific operations activities such as UserDN publishing in accounting records, GLUE information validation, and monitoring of deployed software versions. New monitoring needs emerge as the technical operations activities evolve, and the Nagios infrastructure must be scaled accordingly.
  • When the EGI monitoring infrastructure moves to a fully central deployment, the Monitoring service will include a high-availability deployment of Nagios services to monitor all EGI services (more than 5000). The deployment must be sized to support the whole infrastructure.
  • Development of Nagios probes
    • Maintenance of existing operations probes
    • Development of new probes as required to support operations activities
    • Requirements gathering
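
To illustrate what a probe of the kind listed above might look like, here is a minimal sketch in Python of a Nagios-style check on a deployed software version. It is not one of the EGI operations probes: the function names (`parse_version`, `check_version`, `main`) and the command-line arguments are hypothetical, chosen for illustration only. The only convention it relies on is the standard Nagios plugin contract, where a probe prints one status line and returns exit code 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN).

```python
"""Sketch of a Nagios-style probe that checks a deployed software version.

All names below are hypothetical; only the exit-code convention is standard.
"""

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
STATUS_NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def parse_version(text):
    """Turn '3.2.10' into (3, 2, 10); raises ValueError on malformed input."""
    return tuple(int(part) for part in text.strip().split("."))


def check_version(deployed, minimum):
    """Return (exit_code, message) comparing a deployed version to a minimum."""
    try:
        deployed_v = parse_version(deployed)
        minimum_v = parse_version(minimum)
    except ValueError:
        return UNKNOWN, f"cannot parse version (deployed={deployed!r}, minimum={minimum!r})"

    # Pad with zeros so that '3.2' and '3.2.0' compare as equal.
    width = max(len(deployed_v), len(minimum_v))
    deployed_v += (0,) * (width - len(deployed_v))
    minimum_v += (0,) * (width - len(minimum_v))

    if deployed_v < minimum_v:
        return CRITICAL, f"deployed version {deployed} is older than required {minimum}"
    return OK, f"deployed version {deployed} satisfies minimum {minimum}"


def main(argv):
    """Print the Nagios status line and return the exit code for the caller."""
    if len(argv) != 3:
        print("UNKNOWN - usage: check_version.py <deployed> <minimum>")
        return UNKNOWN
    code, message = check_version(argv[1], argv[2])
    print(f"{STATUS_NAMES[code]} - {message}")
    return code
```

When run as a script, the probe would hand the result back to Nagios with `sys.exit(main(sys.argv))`. A real probe would obtain the deployed version from the monitored service itself (for example from its published information) rather than from the command line; that step is left out here because it depends on the service being checked.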

Coordination

Operations

Maintenance

Support

Service level targets

Effort