The wiki is deprecated and due to be decommissioned by the end of September 2022.
The content is being migrated to other supports, new updates will be ignored and lost.
If needed you can get in touch with EGI SDIS team using operations @ egi.eu.

SAM Tests

Revision as of 07:14, 2 April 2011

Terminology

SAM tests are executed on NGI/ROC SAM instances.

A SAM test is called

  • an OPERATIONS test if it raises alarms in the Operations Portal
  • an AVAILABILITY test if its result is used for the availability calculation by GridView

NOTE: CRITICAL is used ONLY to refer to one of the possible results returned by a Nagios probe.
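
For reference, a Nagios plugin reports its result through one of four standard return codes. The following is a minimal Python sketch of that convention; the names `NagiosStatus` and `is_critical` are illustrative and are not part of SAM.

```python
from enum import IntEnum


class NagiosStatus(IntEnum):
    """Standard Nagios plugin return codes."""
    OK = 0
    WARNING = 1
    CRITICAL = 2
    UNKNOWN = 3


def is_critical(exit_code: int) -> bool:
    # CRITICAL describes a probe result only, never the importance of a test.
    return NagiosStatus(exit_code) is NagiosStatus.CRITICAL
```

Calling `NagiosStatus(2).name` gives the string `CRITICAL`, which is the sense in which the word is used on this page.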

Within the SAM/Nagios world we use the following naming:

  1. Test: a procedure which checks a specific functionality of a given service, i.e. a single measurement (e.g. org.bdii.Freshness, hr.srce.RGMA-CertLifetime).
  2. Probe: code which implements a single test or multiple tests.
  3. Metric: a synonym for test used in the development documentation. In operations documents, "test" is the reference term to be used.
  4. Profile: an availability and reliability profile defines groups of service instances (for example, defined via VO feeds) and a set of metrics (at different levels: critical, non-critical, etc.).
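
The relationship between these terms can be sketched as a small data model. This is only an illustration of the definitions above, not the SAM schema: the class names, field names, the probe name `example-bdii-probe` and the group name `example-vo-feed-group` are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SamTest:
    """A single measurement on a service ("metric" in the development docs)."""
    name: str  # e.g. "org.bdii.Freshness"


@dataclass
class Probe:
    """Code which implements one or more tests."""
    name: str
    tests: list = field(default_factory=list)  # list of SamTest


@dataclass
class Profile:
    """Groups of service instances plus a set of metrics with their levels."""
    name: str
    service_groups: list  # e.g. groups defined via VO feeds
    metrics: dict         # test name -> level ("critical", "non-critical", ...)

    def tests_at_level(self, level: str) -> list:
        # Return the names of all tests registered at the given level.
        return sorted(t for t, lvl in self.metrics.items() if lvl == level)


# Hypothetical example objects, for illustration only.
bdii_probe = Probe("example-bdii-probe", [SamTest("org.bdii.Freshness")])
profile = Profile(
    name="example-profile",
    service_groups=["example-vo-feed-group"],
    metrics={
        "org.bdii.Freshness": "critical",
        "hr.srce.RGMA-CertLifetime": "non-critical",
    },
)
```

One probe may carry several `SamTest` entries, while a profile refers to tests by name only, which mirrors the distinction between the code (probe) and the measurement (test).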

SAM tests

Tests on NGI/ROC SAM instances are the ones which the framework includes in the SAM configuration. In addition, SAM admins can add their own probes to these instances.

The SAM team proposes the addition of new probes. The addition of probes is part of a SAM release and thus part of the staged rollout. It was agreed that, prior to a release, the new list of probes will be briefly presented at the OMB meeting. Probes which check internal components of SAM are not presented at the OMB.

Operations tests

Tests on the Operations Portal are the ones used for raising alarms for ROD and COD teams. The Operations Portal does not execute these tests itself, but receives alarms from NGI/ROC SAM instances. The Operations Portal contains a list of the probes used for alarms; results from all other probes are filtered out.
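
The filtering described above can be sketched as follows. This is a minimal illustration, not the Operations Portal implementation: the `Alarm` type, the `filter_alarms` function, the site name and the contents of `OPERATIONS_TESTS` are assumptions made for the example.

```python
from typing import NamedTuple


class Alarm(NamedTuple):
    site: str
    test: str
    status: str


# Hypothetical excerpt of the portal's list of alarm-raising tests.
OPERATIONS_TESTS = {"org.bdii.Freshness"}


def filter_alarms(incoming, operations_tests=OPERATIONS_TESTS):
    """Keep only alarms whose test is on the operations list."""
    return [a for a in incoming if a.test in operations_tests]


# Alarms as they might arrive from an NGI/ROC SAM instance (made-up data).
incoming = [
    Alarm("EXAMPLE-SITE", "org.bdii.Freshness", "CRITICAL"),
    Alarm("EXAMPLE-SITE", "hr.srce.RGMA-CertLifetime", "CRITICAL"),
]
shown = filter_alarms(incoming)
```

Here only the first alarm ends up in `shown`, because the second test is not on the (hypothetical) operations list.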

The procedure for adding a new probe can be found on the wiki page Operations:Procedure_for_setting_Nagios_test_an_operations_test.

The list of tests can be found on the wiki page PROC06.

Availability tests

The set of tests used for calculating the availability and reliability (A/R) of sites and services. The A/R calculation is related to the OLA. As in the case of the Operations Portal, the availability calculation component receives results from NGI/ROC SAM instances.

TSA1.8 proposes changes to the availability calculation (which probe results count in it) and the OMB approves them.

The list of tests can be found here.
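
To illustrate what "which probe results count" means in practice, the sketch below computes a deliberately simplified availability figure from a set of availability tests. It is not the GridView algorithm: the function name, the time-slot model, and the rules that only OK counts as available and that slots without any relevant result are skipped are all assumptions made for the example.

```python
def availability(samples, availability_tests):
    """Fraction of time slots in which every counted test result is OK.

    samples: list of dicts mapping test name -> Nagios status string,
             one dict per time slot.
    availability_tests: names of the tests whose results count.
    Returns None when no slot contains a counted result.
    """
    up = counted_slots = 0
    for slot in samples:
        # Results of tests outside the availability set are ignored.
        counted = [slot[t] for t in availability_tests if t in slot]
        if not counted:
            continue  # assumption: slots without relevant results are skipped
        counted_slots += 1
        if all(status == "OK" for status in counted):
            up += 1
    return up / counted_slots if counted_slots else None


# Made-up results for four time slots.
samples = [
    {"org.bdii.Freshness": "OK", "hr.srce.RGMA-CertLifetime": "CRITICAL"},
    {"org.bdii.Freshness": "OK"},
    {"org.bdii.Freshness": "CRITICAL"},
    {"hr.srce.RGMA-CertLifetime": "OK"},
]
only_bdii = availability(samples, {"org.bdii.Freshness"})
both = availability(samples, {"org.bdii.Freshness", "hr.srce.RGMA-CertLifetime"})
```

With these samples, `only_bdii` is 2/3 and `both` is 0.5, which shows how adding or removing a test from the availability set changes the figure even though the underlying probe results are the same.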