New Availability Reporting
Use case 1: NGI availability reports
We would like to have NGI availability reports. These reports should cover the central services operated by the NGI, including the regional tools and other core middleware services, for example:
- the VOMS service
- the top-BDII service
- the WMS service
and the operational services, including:
- the NGI SAM service
- the accounting portal and repositories (where available)
- the NGI operations dashboard (where available)
- the NGI helpdesk (where available)
Only services for which the NGI has direct administration responsibility are concerned. For example, the NGI availability reports shouldn't include WMS, VOMS, etc. instances that are independently deployed by the sites to support local user communities and local projects.
It is important to consider that an NGI core service is often physically distributed across different sites, which only host the hardware and have no administration responsibility. This has several implications.
- If one instance is down but the rest of the cluster is up, then the "logical" service is still available. For the purpose of availability computation, the alias should therefore be monitored, not the individual physical instances.
- The site availability should not be impacted by the unavailability of physical instances of a service operated by the NGI.
This use case could be satisfied by:
- grouping the NGI services into a dedicated NGI site (for a distributed service, only the alias is registered)
- creating an NGI availability profile applicable only to the "NGI" site, where the availability of the site is computed as the AND composition of the availability of all registered services.
Note that if some (optional) services are not available, UP should still be returned. The profile should therefore include a mandatory set of services (e.g. the regional SAM) and a complementary set of optional services (e.g. the local helpdesk, VOMS, etc.).
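The profile described above could be sketched as follows. This is only an illustration under one possible reading of the text: mandatory services must be registered and UP, while optional services enter the AND composition only when they are actually registered in the NGI site. The function name, the service labels and the dictionary-based input are hypothetical and not part of any existing SAM or GOCDB interface.

```python
from typing import Dict, Iterable


def ngi_site_is_up(
    statuses: Dict[str, bool],
    mandatory: Iterable[str],
    optional: Iterable[str],
) -> bool:
    """Return True if the logical NGI site counts as UP.

    statuses maps a registered service (or its alias) to True (UP) or
    False (DOWN). Services that are not registered are absent from the map.
    This input shape is an assumption made for the example.
    """
    # Every mandatory service must be registered and UP.
    for name in mandatory:
        if not statuses.get(name, False):
            return False
    # Assumption: an optional service counts only if the NGI registered it.
    for name in optional:
        if name in statuses and not statuses[name]:
            return False
    return True


# Hypothetical profile: regional SAM is mandatory, helpdesk and VOMS optional.
MANDATORY = ["regional SAM"]
OPTIONAL = ["helpdesk", "VOMS"]

# The NGI runs no helpdesk of its own, so it is simply not registered.
site_up = ngi_site_is_up({"regional SAM": True, "VOMS": True}, MANDATORY, OPTIONAL)
```

With this reading, an NGI that does not operate an optional service is not penalised for its absence, while an optional service that is registered and down still lowers the site availability. If the intent is that optional services never affect the result, the second loop would simply be dropped.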
Use case 2: EGI.eu availability
We would like to measure the overall availability of EGI.eu services. Examples of such services are:
Operational
- accounting portal and accounting repository
- GOCDB
- operations portal
- central MyEGI
- message bus
- GGUS
- security Nagios and Pakiti
- security dashboard
- DTEAM VOMS
- OPS VOMS
Technical
- EGI repository
- RT
User
- application database
- training database
Availability is computed for each category above. For example, the EGI.eu operations service is UP if (GGUS is UP) AND (Operations Portal is UP) AND ... AND (GOCDB is UP).
In GOCDB the VIRTUALOPS ROC could be evolved into the EGI.eu ROC, which would include all EGI.eu services. A new availability profile for EGI.eu is needed.
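As a minimal sketch of the per-category AND composition, assuming service states are available as a simple name-to-boolean map (the map, the category table and the function name are hypothetical, and only the services named in the example above are listed):

```python
from typing import Dict, List

# Illustrative subset only; the full lists are those given per category above.
CATEGORY_SERVICES: Dict[str, List[str]] = {
    "operational": ["GGUS", "operations portal", "GOCDB"],
    "technical": ["EGI repository", "RT"],
    "user": ["application database", "training database"],
}


def category_is_up(category: str, statuses: Dict[str, bool]) -> bool:
    """A category is UP only if every service in it is UP.

    A service missing from statuses is treated as DOWN (an assumption).
    """
    return all(statuses.get(name, False) for name in CATEGORY_SERVICES[category])


example_statuses = {
    "GGUS": True,
    "operations portal": True,
    "GOCDB": False,
    "EGI repository": True,
    "RT": True,
}

operational_up = category_is_up("operational", example_statuses)
technical_up = category_is_up("technical", example_statuses)
```

In this example the operational category is reported as DOWN because GOCDB is down, while the technical category is UP. How the categories would be combined into a single EGI.eu figure is not specified here and is left out of the sketch.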
Use case 3
The "logical" site could also be used to represent a distributed Resource Centre (like the NDGF T1). At the moment it is a single site in GOCDB associated to country X. This use case is mentioned here for the record of the discussion. It is not a critical use case for the moment.