The wiki is deprecated and due to be decommissioned by the end of September 2022.
The content is being migrated to other supports; new updates will be ignored and lost.
If needed, you can get in touch with the EGI SDIS team at operations @ egi.eu.

2019-bidding/monitoring

From EGIWiki
Revision as of 10:37, 6 November 2019 by Mviljoen (talk | contribs)

to clarify the effort

Service name: Monitoring (ARGO)

Introduction

Monitoring services archive the infrastructure monitoring results of the services and provide access to them. These data are accessible at many levels (Resource Centres, Operations Centres and EGI.eu), and are used for the generation of service level reports, for the central monitoring of EGI.eu operational tools, and for other central monitoring needs. In some cases, infrastructure operations require monitoring activities created ad hoc to support specific operational activities, for example the monitoring of UserDN publishing in accounting records and of the software versions of deployed middleware.

Technical description

Monitoring (ARGO) is a distributed system supporting EGI/NGI operations. It provides remote monitoring of services, visualization of the service status, interfacing with the Operations portal, and generation of availability and reliability reports. The central monitoring services are needed to ensure the aggregation of all EGI metric results and access to the data at an EGI-wide scope through the central ARGO user interface. These results are exposed through the central ARGO web service and its programmatic interface (JSON supported). On top of that, the ARGO Reporting System generates monthly availability reports about sites and operational tools for use by the service owners. In addition to the central services described above, the activity also provides:

  • Monitoring of EGI.eu technical services: a centralised installation in high availability is currently running in production to monitor the performance of EGI.eu operations tools and user community support tools.
  • Maintenance of existing operations probes and deployment of new ones as required to support operations activities, as requested by EGI Operations coordination.
  • A notification service to inform Service Providers of possible errors or problems.
  • Requirements gathering

Coordination

The activity will have to coordinate with:

  • EGI Operations, for the support of operational activities with monitoring data and for the planning of new releases and updates of the monitoring system
  • The service developers, to support them in the development of probes for their services
  • The other operational tools where interaction is necessary (for example, the messaging network and GOCDB)

Operations

  • Daily running of the system
    • Monitor Services (Sites, NGIs, Service_Groups)
    • Availability/Reliability computation engine
    • User interface to browse the data
  • Provisioning of a high availability configuration
    • Min. two ARGO Monitoring boxes for the monitoring of the services, deployed in different locations
  • The monitoring infrastructure must allow new probes to be tested without affecting production monitoring
  • Creating an Availability and Continuity Plan and implementing countermeasures to mitigate the risks defined in the related risk assessment
  • Documentation

Software as a service

In the bid, please also provide information about the possibility of offering the service to external consumers as Software as a Service (SaaS). If provisioning the activity as SaaS implies additional effort or other costs, please report these costs separately, not as part of the overall budget of the bid.

Maintenance

This activity includes:

  • bug fixing
  • maintenance of probes to test the functionality of the service
  • integration (configuration and packaging) of new probes into ARGO
  • coordination of software maintenance activities with other technology providers of the operational tools that are part of the EGI Core Infrastructure, and with providers of remote systems deployed by integrated and peer infrastructures that interoperate with the central EGI components of the system (on a best-effort basis for interoperability with peer infrastructure providers)
  • producing the monthly reports on the performance of the resource centres, NGI central services and EGI central tools
  • requirements gathering
  • documentation

Software Compliance

  • Unless explicitly agreed, software being used and developed to provide the service should:
    • Be licensed under a permissive open source license (such as MIT, BSD or Apache 2.0).
      • The license should provide unlimited access rights to the EGI community.
    • Have source code publicly available via a public source code repository (if needed, a mirror can be put in place under the EGI organisation on GitHub). All releases should be appropriately tagged.
    • Adopt best practices:
      • Defining and enforcing code style guidelines.
      • Using Semantic Versioning.
      • Using a configuration management framework such as Ansible.
      • Taking security aspects into consideration at every point in time.
      • Having automated testing in place.
      • Using code reviewing.
      • Treating documentation as code.
        • Documentation should be available for developers, administrators and end users.
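
As an illustration of the Semantic Versioning and release tagging points above, the following is a minimal sketch of how a release tag could be checked and classified. The function names `parse_version` and `bump_kind` are hypothetical, and the sketch assumes plain MAJOR.MINOR.PATCH tags with an optional leading "v"; pre-release and build metadata suffixes are not handled.

```python
from typing import Tuple


def parse_version(tag: str) -> Tuple[int, int, int]:
    """Parse a tag such as 'v1.4.2' or '1.4.2' into (major, minor, patch).

    Assumption: tags carry only the three numeric components.
    """
    core = tag[1:] if tag.startswith("v") else tag
    major, minor, patch = (int(part) for part in core.split("."))
    return major, minor, patch


def bump_kind(old_tag: str, new_tag: str) -> str:
    """Classify the change between two releases as major, minor or patch."""
    old, new = parse_version(old_tag), parse_version(new_tag)
    if new <= old:
        raise ValueError("new release must have a higher version than the old one")
    if new[0] > old[0]:
        return "major"  # incompatible changes
    if new[1] > old[1]:
        return "minor"  # backwards-compatible functionality
    return "patch"      # backwards-compatible bug fixes


if __name__ == "__main__":
    print(bump_kind("v1.4.2", "v2.0.0"))
    print(bump_kind("v1.4.2", "v1.5.0"))
```

A check of this kind could run as part of automated testing before a release is tagged, so that the version number reflects the type of change being shipped.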

IT Service Management compliance

  • Key staff who deliver services should have foundation or basic level ITSM training and certification.
    • ITSM training and certification could include FitSM, ITIL, ISO 20000 etc.
  • Key staff and service owners should have advanced/professional training and certification covering the key processes for their services.
  • Providers should have mature and well-maintained ITSM processes that are key to supporting the services they provide.

Support

Support through the EGI helpdesk about the functionality of the service and the monitoring data gathered.

Support hours: eight hours a day, Monday to Friday, excluding public holidays of the hosting organization.

Service level targets

  • Monitoring probe submission engines must be available at least 99% of the time, measured on a monthly basis
  • User interfaces to browse monitoring results must be available at least 95% of the time, measured on a monthly basis
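
To make the targets above concrete, the following sketch converts a monthly availability target into the downtime it permits and checks a measured downtime against it. It assumes the simple definition availability = (total hours - downtime hours) / total hours over a calendar month; the actual computation used for ARGO reports may differ (for example, in how unknown or scheduled-downtime periods are treated). All function names are hypothetical.

```python
def allowed_downtime_hours(target_percent: float, days_in_month: int) -> float:
    """Hours of downtime permitted in a month by an availability target."""
    total_hours = days_in_month * 24
    return total_hours * (100.0 - target_percent) / 100.0


def availability_percent(downtime_hours: float, days_in_month: int) -> float:
    """Availability over the month, assuming uptime / total time."""
    total_hours = days_in_month * 24
    return 100.0 * (total_hours - downtime_hours) / total_hours


def meets_target(downtime_hours: float, days_in_month: int, target_percent: float) -> bool:
    """True if the measured availability reaches the target."""
    return availability_percent(downtime_hours, days_in_month) >= target_percent


if __name__ == "__main__":
    # Targets taken from the list above; a 30-day month is an example value.
    for name, target in [("probe submission engines", 99.0), ("user interfaces", 95.0)]:
        hours = allowed_downtime_hours(target, 30)
        print(f"{name}: {target}% allows {hours:.1f} h of downtime in a 30-day month")
```

Under these assumptions, a 30-day month allows 7.2 hours of downtime at the 99% target and 36 hours at the 95% target.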

Effort

Bids planning an effort between 24.5 and 28 Person Months/year (STC) would allow these services and activities to be addressed appropriately.