
2019-bidding/monitoring


Service name: Monitoring

Introduction

Monitoring services archive and provide access to the monitoring data of the services of the whole infrastructure. These data are accessible at many levels (Resource Centres, Operations Centres and EGI Services), and they are used for the generation of service level reports, for the monitoring of the EGI central services and for other central monitoring needs. In some cases infrastructure operations require monitoring activities created ad hoc to support specific operational activities, for example checking UserDN publishing in accounting records and the software versions of deployed middleware.

Technical description

The Monitoring service consists of a distributed system supporting EGI/NGI operations. It provides remote monitoring of services, visualisation of the service status, interfacing with the Operations Portal and generation of availability and reliability reports. The central monitoring services are needed to ensure the aggregation of all EGI metric results and access to the data at an EGI-wide scope through the central monitoring user interface. These results should be exposed through a central web service and its programmatic interface. A reporting system should generate monthly availability reports about sites and operational tools for use by the service owners. In addition to the central services described above, the activity also provides:

  • Maintenance of existing operations probes and deployment of new ones as required to support operations activities as requested by EGI Operations coordination
  • A notification service to inform Service Providers of possible errors or problems
  • Requirements gathering

Coordination

The activity will have to coordinate with:

  • EGI Operations, for the support of the operational activities with monitoring data, and for the planning of new releases and updates of the monitoring system
  • The service developers, to support them in the development of probes for their services
  • The other operational tools where interaction is necessary (for example messaging service, service registry, Operations Portal)

Operations

  • Daily running of the system
    • Monitor Services (Sites, NGIs, Service_Groups)
    • Availability/Reliability computation engine
    • User interface to browse the data
  • Provisioning of a high availability configuration
    • Min. two ARGO Monitoring boxes for the monitoring of the services, deployed in different locations
  • The monitoring infrastructure must allow new probes to be tested without affecting the production monitoring
  • Creating an Availability and Continuity Plan and implementing countermeasures to mitigate the risks defined in the related risk assessment
  • Documentation
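
The availability/reliability computation listed above can be illustrated with a minimal sketch. It assumes the commonly used definitions in which availability is uptime divided by the known (non-unknown) time, and reliability additionally excludes scheduled downtime. These definitions, the function name, the argument names and the sample figures are assumptions made for illustration and are not taken from this page or from the ARGO code base.

```python
def availability_reliability(total_h, up_h, scheduled_down_h=0.0, unknown_h=0.0):
    """Return (availability, reliability) as fractions between 0 and 1.

    Assumed definitions (hypothetical, for illustration only):
      availability = up / (total - unknown)
      reliability  = up / (total - scheduled_down - unknown)
    """
    known_h = total_h - unknown_h
    if known_h <= 0:
        raise ValueError("no known monitoring time in the period")
    availability = up_h / known_h
    # Scheduled downtime is not counted against reliability.
    reliability_base_h = known_h - scheduled_down_h
    reliability = up_h / reliability_base_h if reliability_base_h > 0 else 0.0
    return availability, reliability


# Hypothetical 30-day month: 720 h in total, 700 h up,
# 12 h of scheduled downtime, no unknown time.
a, r = availability_reliability(720, 700, scheduled_down_h=12)
print(f"availability={a:.4f} reliability={r:.4f}")
```

With these sample figures reliability is higher than availability, because the 12 scheduled hours are removed from the denominator.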

Software as a service

In the bid, please also provide information about the possibility of providing the service to external consumers as Software as a Service (SaaS). If the provisioning of the activity as SaaS implies additional effort or other costs, please report these costs separately, not as part of the overall budget of the bid.

Maintenance

This activity includes:

  • bug fixing
  • maintenance of probes to test the functionality of the service itself
  • integration (configuration and packaging) of new probes into ARGO
  • coordination of software maintenance activities with other technology providers of the operational tools that are part of the EGI Core Infrastructure, or of remote systems deployed by integrated and peer infrastructures that interoperate with the central EGI components of the system (on a best-effort basis for interoperability with peer infrastructure providers)
  • producing the monthly reports on the performance of the resource centres, NGI central services and EGI central tools
  • requirements gathering
  • documentation

Software Compliance

  • Unless explicitly agreed, software being used and developed to provide the service should:
    • Be licensed under an open source and permissive license (like MIT, BSD, Apache 2.0,...).
      • The license should provide unlimited access rights to the EGI Foundation and EGI federation member organisations.
    • Have source code publicly available via a public source code repository (if needed a mirror can be put in place under the EGI organisation in GitHub.) All releases should be appropriately tagged.
    • Adopt best practices:
      • Defining and enforcing code style guidelines.
      • Using Semantic Versioning.
      • Using a configuration management framework such as Ansible.
      • Taking security aspects into consideration at every point in time.
      • Having automated testing in place.
      • Using code reviewing.
      • Treating documentation as code.
        • Documentation should be available for developers, administrators and end users.

IT Service Management compliance

  • Key staff who deliver services should have foundation or basic level ITSM training and certification.
    • ITSM training and certification could include FitSM, ITIL, ISO 20000 etc.
  • Key staff and service owners should have advanced/professional training and certification covering the key processes for their services.
  • Providers should have clear interfaces with the EGI SMS processes and provide the required information.
  • Providers should commit to improving their management system used to support the services they provide.

Support

Support is provided through the EGI helpdesk and covers the functionality of the service and the monitoring data gathered.

Support hours: eight hours a day, Monday to Friday, excluding public holidays of the hosting organisation.

Service level targets

  • Monitoring probe submission engines must have an availability of at least 99% on a monthly basis
  • User interfaces to browse monitoring results must have an availability of at least 95% on a monthly basis
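
To make these targets concrete, the sketch below converts a monthly availability target into a downtime budget and checks a measured month against it. Only the 99% and 95% targets come from the list above; the function names, the 30-day month and the sample downtime are illustrative assumptions.

```python
def downtime_budget_hours(target_pct, days_in_month=30):
    """Maximum downtime in hours that still meets the monthly target."""
    total_h = days_in_month * 24
    return total_h * (100 - target_pct) / 100


def meets_target(down_h, target_pct, days_in_month=30):
    """True if the measured monthly availability reaches the target."""
    total_h = days_in_month * 24
    availability_pct = 100 * (total_h - down_h) / total_h
    return availability_pct >= target_pct


# Downtime budgets for a 30-day month (720 h).
print(downtime_budget_hours(99))  # probe submission engines
print(downtime_budget_hours(95))  # user interfaces

# Hypothetical month with 8 h of downtime.
print(meets_target(8, 99), meets_target(8, 95))
```

In a 30-day month the 99% target leaves 7.2 hours of downtime and the 95% target leaves 36 hours, so a month with 8 hours of downtime would meet the user interface target but miss the target for the submission engines.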

Effort (EGI-related activities)

Bids planning an effort of about 18 Person Months/year (STC) would allow these services and activities to be addressed appropriately. Effort may be provided as part of either the INFRAEOSC-07 or the INFRAEOSC-03 project.

Effort (EOSC-related activities)

Partners are encouraged to submit details of activities and proposed costing of effort for EOSC Hub related activities. This may include activities related to development of new functionality required by EOSC communities in addition to activities delivering services to these communities.