
2016-bidding/monitoring

From EGIWiki
Revision as of 17:10, 19 October 2016 by Psolagna




  • Service name: Monitoring (ARGO)

Introduction

Monitoring services archive the results of infrastructure monitoring and provide access to them. These data are accessible at several levels (Resource Centres, Operations Centres and EGI.eu). They are used for generating service level reports, for the central monitoring of EGI.eu operational tools, and for other central monitoring needs. In some cases, infrastructure operations require ad-hoc monitoring activities to support specific operational tasks. Examples include checking that UserDNs are published in accounting records and checking the software versions of deployed middleware.
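
Service level reports of this kind are usually built from per-service availability and reliability figures. The following is a simplified sketch of the definitions commonly used in EGI A/R reports. Availability is up time divided by the period whose status is known, with scheduled downtime counted against the service. Reliability also excludes scheduled downtime from the denominator. The `PeriodStatus` record and its field names are hypothetical. The real ARGO engine works on status timelines aggregated across services and sites, and that aggregation is omitted here.

```python
from dataclasses import dataclass


@dataclass
class PeriodStatus:
    """Hours a service spent in each state during one reporting period (hypothetical record)."""
    up: float                  # service OK
    down: float                # service failing outside scheduled downtime
    scheduled_downtime: float  # declared (scheduled) downtime
    unknown: float             # no usable monitoring result

    @property
    def total(self) -> float:
        return self.up + self.down + self.scheduled_downtime + self.unknown


def availability(s: PeriodStatus) -> float:
    """Up time over the period with known status; scheduled downtime counts as down."""
    known = s.total - s.unknown
    return s.up / known if known > 0 else float("nan")


def reliability(s: PeriodStatus) -> float:
    """Up time over the known period, excluding scheduled downtime from the denominator."""
    known = s.total - s.unknown - s.scheduled_downtime
    return s.up / known if known > 0 else float("nan")
```

Consider a 720-hour month with `PeriodStatus(up=700, down=5, scheduled_downtime=10, unknown=5)`. It gives an availability of about 0.979 and a reliability of about 0.993. Reliability is never lower than availability, because declared downtime is forgiven.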

Technical description

Monitoring (SAM) is a centralized system supporting EGI/NGI operations. It provides the following functions:

  • remote monitoring of services
  • processing of the monitoring data
  • visualization of service status
  • dashboard interfacing
  • a notification system
  • generation of availability and reliability reports

The monitoring services aggregate all EGI metric results and provide access to the data at EGI-wide scope through the central ARGO user interface. These results are exposed through the central ARGO web service and its programmatic interface, which supports XML and JSON. In addition, the ARGO Reporting System generates monthly availability reports about sites and operational tools for use by the service owners. Beyond the central services described above, the activity also provides:

  • Monitoring probe submission engines: a distributed, highly available central installation is required to submit and run the monitoring probes. These include the probes for the availability computation profiles and for the other profiles required by EGI operations. The deployment must scale to the size of the infrastructure.
  • Development of Nagios probes:
    • Maintenance of existing operations probes
    • Development of new probes as required to support operations activities
    • Requirements gathering
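
Nagios probes follow the Nagios plugin conventions. A probe prints a one-line status message, optionally followed by performance data after a `|`. It reports its result through the exit code: 0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN. The following is a sketch of a hypothetical HTTP response-time probe, not an existing EGI or ARGO probe. The URL argument, the 5 s and 10 s thresholds, and the function names `evaluate`, `probe` and `main` are illustrative assumptions.

```python
import time
import urllib.request

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def evaluate(elapsed: float, warn: float, crit: float) -> tuple[int, str]:
    """Map a measured response time (seconds) to a Nagios status and output line."""
    if elapsed >= crit:
        code = CRITICAL
    elif elapsed >= warn:
        code = WARNING
    else:
        code = OK
    # Text before '|' is the human-readable summary; after it comes performance data.
    return code, f"{LABELS[code]} - response time {elapsed:.3f}s|time={elapsed:.3f}s;{warn};{crit}"


def probe(url: str, warn: float = 5.0, crit: float = 10.0, timeout: float = 30.0) -> tuple[int, str]:
    """Fetch `url` once and classify the result; any failure to reach it is CRITICAL."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            resp.read(1)  # wait until the server actually starts sending a body
    except Exception as exc:  # connection errors, timeouts and HTTP errors alike
        return CRITICAL, f"CRITICAL - cannot reach {url}: {exc}"
    return evaluate(time.monotonic() - start, warn, crit)


def main(argv: list[str]) -> int:
    """Print the one-line result and return the exit code for the monitoring engine."""
    if len(argv) != 1:
        print("UNKNOWN - usage: probe.py URL")
        return UNKNOWN
    code, message = probe(argv[0])
    print(message)
    return code
```

A script entry point would call `sys.exit(main(sys.argv[1:]))` so that the monitoring engine receives the exit code. Keeping the threshold logic in `evaluate` separate from the network call lets it be tested without contacting a real service.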


Coordination

The activity will have to coordinate with:

  • EGI Operations, to support operational activities with monitoring data and to plan new releases and updates of the monitoring system
  • Service developers, to support them in developing probes for their services
  • Other operational tools where interaction is necessary (for example, the messaging network and GOCDB)


Operations

  • Daily running of the system
    • Monitoring probe submission engines
    • Availability/Reliability computation engine
    • User interface to browse the data
  • Provisioning of a high availability configuration
    • At least two distributed, redundant instances of the monitoring engines (Nagios boxes) for monitoring the services
    • Multiple consumers of monitoring data
  • The monitoring infrastructure must allow new probes to be tested without affecting production monitoring
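
When at least two redundant monitoring engines run, consumers receive overlapping results for the same checks and must decide which one to keep. New probes under test should also stay out of the production figures. The following is a minimal sketch of one possible policy. It assumes a hypothetical result record (`Result`) and a "newest result wins" rule per host, service and metric. Neither the schema nor the policy is taken from ARGO. Results of metrics outside a production profile are then filtered out, so test probes can run alongside production ones.

```python
from dataclasses import dataclass

Key = tuple[str, str, str]  # (host, service, metric)


@dataclass(frozen=True)
class Result:
    """One probe result as published by a monitoring engine (hypothetical schema)."""
    host: str
    service: str
    metric: str
    status: str     # e.g. "OK", "WARNING", "CRITICAL", "UNKNOWN"
    timestamp: int  # seconds since the epoch
    engine: str     # which redundant engine produced the result


def merge(*streams: list[Result]) -> dict[Key, Result]:
    """Keep the newest result per (host, service, metric) across all engines."""
    latest: dict[Key, Result] = {}
    for stream in streams:
        for r in stream:
            key = (r.host, r.service, r.metric)
            current = latest.get(key)
            if current is None or r.timestamp > current.timestamp:
                latest[key] = r
    return latest


def production_view(merged: dict[Key, Result], production_metrics: set[str]) -> dict[Key, Result]:
    """Drop results of metrics that are not in the production profile (e.g. probes under test)."""
    return {k: r for k, r in merged.items() if r.metric in production_metrics}
```

If one engine stops publishing, `merge` still returns the other engine's results, which is the purpose of running redundant instances.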

Maintenance

Support

Support is provided through the EGI helpdesk, covering both the functionality of the service and the monitoring data it gathers.

Support hours: eight hours a day, Monday to Friday, excluding public holidays of the hosting organization.

Service level targets

  • Monitoring probes submission engines must be available at least 99% on a monthly basis
  • User interfaces to browse monitoring results must be available at least 95% on a monthly basis
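
These targets translate into a monthly downtime budget. This assumes each target is measured as the fraction of wall-clock hours in a calendar month during which the service is available. It also ignores how unknown or scheduled-downtime periods are treated. Under those assumptions, the maximum tolerated downtime for a 30-day month is:

```latex
\begin{aligned}
D_{\max} &= (1 - A_{\mathrm{target}})\, T_{\mathrm{month}} \\
T_{\mathrm{month}} &= 30 \times 24\ \mathrm{h} = 720\ \mathrm{h} \\
D_{\max}^{\mathrm{engines}} &= (1 - 0.99) \times 720\ \mathrm{h} = 7.2\ \mathrm{h} \\
D_{\max}^{\mathrm{UI}} &= (1 - 0.95) \times 720\ \mathrm{h} = 36\ \mathrm{h}
\end{aligned}
```

For a 31-day month (744 h), the budgets are 7.44 h for the submission engines and 37.2 h for the user interfaces.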

Effort