SAM Tests
Revision as of 14:10, 29 September 2011
Terminology
SAM Test
A test is a procedure that checks a specific functionality of a given service, i.e. a single measurement (e.g. org.bdii.Freshness, hr.srce.RGMA-CertLifetime). Tests are executed on NGI/ROC SAM instances.
A SAM test can be:
- an OPERATIONS test, which raises alarms in the Operations Portal (see list of OPERATIONS tests)
- an AVAILABILITY test, whose result is used by GridView for the availability calculation (see list of AVAILABILITY tests)
NOTE: CRITICAL is used ONLY to refer to one of the possible results returned by a Nagios probe.
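A Nagios probe reports its outcome through the standard Nagios plugin exit codes: 0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN. The minimal Python sketch below models that mapping together with the two test roles listed above. The `SamTest` class, its fields and `status_from_exit_code` are hypothetical illustrations, not part of the SAM code base.

```python
from dataclasses import dataclass
from enum import IntEnum


class NagiosStatus(IntEnum):
    """Standard Nagios plugin exit codes."""
    OK = 0
    WARNING = 1
    CRITICAL = 2
    UNKNOWN = 3


@dataclass(frozen=True)
class SamTest:
    """Hypothetical model of a SAM test and the roles it can play."""
    name: str
    operations: bool = False    # raises alarms in the Operations Portal
    availability: bool = False  # result used in the GridView availability calculation


def status_from_exit_code(code: int) -> NagiosStatus:
    """Map a probe's exit status to its result; raises ValueError outside 0-3."""
    return NagiosStatus(code)


result = status_from_exit_code(2)
print(result.name)  # CRITICAL
```

Modelling the two roles as independent flags is a choice made for this sketch only. The page itself just names the two categories.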
Probe
A probe is code that implements one or more tests.
Metric
Metric is a synonym for test used in the development documentation. In operations documents, "test" is the term to use.
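A probe can implement several tests (metrics), so tooling may need to look up which probe produces a given test result. The sketch below inverts a probe-to-tests mapping to do that. Three things in it are hypothetical:

- the probe and test names;
- the `probe_index` helper;
- the assumption that each test is implemented by exactly one probe, which is made only for this illustration.

```python
def probe_index(probes: dict[str, list[str]]) -> dict[str, str]:
    """Map each test (metric) name to the probe that implements it.

    Assumes, for this sketch only, that every test comes from exactly one probe.
    """
    index: dict[str, str] = {}
    for probe, tests in probes.items():
        for test in tests:
            if test in index:
                raise ValueError(f"{test} is implemented by both {index[test]} and {probe}")
            index[test] = probe
    return index


# Hypothetical probes: one implements two tests, the other a single test.
probes = {
    "example-multi-probe": ["org.example.Submit", "org.example.Result"],
    "example-single-probe": ["org.example.Ping"],
}
index = probe_index(probes)
print(index["org.example.Result"])  # example-multi-probe
```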
POEM Profile
Availability and reliability profile
- An Availability and Reliability Profile defines groups of service instances (for example, defined via VO feeds) and a set of metrics at different levels (critical, non-critical, etc.).
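A profile pairs groups of service instances with metrics classified by level. The Python sketch below shows one way to hold that data and query it. The `ARProfile` class, its methods, the level strings and the host names are hypothetical and do not reflect the actual POEM data model.

```python
from dataclasses import dataclass, field


@dataclass
class ARProfile:
    """Hypothetical model of an availability and reliability profile."""
    name: str
    # group name -> service instances (hostnames), e.g. taken from a VO feed
    groups: dict[str, set[str]] = field(default_factory=dict)
    # metric name -> level, e.g. "critical" or "non-critical"
    metrics: dict[str, str] = field(default_factory=dict)

    def metrics_at(self, level: str) -> set[str]:
        """Return the metrics defined at the given level."""
        return {metric for metric, lvl in self.metrics.items() if lvl == level}

    def covers(self, host: str) -> bool:
        """True if the service instance belongs to any group of the profile."""
        return any(host in hosts for hosts in self.groups.values())


profile = ARProfile(
    "example-profile",
    groups={"example-group": {"ce01.example.org"}},
    metrics={"org.example.A": "critical", "org.example.B": "non-critical"},
)
print(profile.metrics_at("critical"))  # {'org.example.A'}
```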
SAM tests
The tests run on NGI/ROC SAM instances are those that the SAM framework includes in its configuration. In addition, SAM administrators can add their own probes to these instances.
The SAM team proposes the addition of new probes. Adding probes is part of a SAM release and is therefore subject to staged rollout. It was agreed that, prior to a release, the new list of probes will be briefly presented at the OMB meeting. Probes that check internal components of SAM are not presented at the OMB.
The list of tests included in the SAM release can be found here. Lists of tests to be included can be found here.
Operations tests
The tests on the Operations Portal are the ones used to raise alarms for the ROD and COD teams. The Operations Portal does not execute these tests; it receives the alarms from the NGI/ROC SAM instances. The Operations Portal keeps a list of the probes used for alarms, and alarms from all other probes are filtered out.
The procedure for adding a new probe (PROC06) can be found here.
The list of tests can be found here.
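The Operations Portal filtering described above can be pictured as a membership check against the portal's list of alarm tests. In this sketch the `Alarm` record, its fields, `filter_alarms` and the test names are hypothetical. The real Operations Portal interface is not described on this page.

```python
from typing import NamedTuple


class Alarm(NamedTuple):
    """An alarm as received from an NGI/ROC SAM instance (fields are illustrative)."""
    host: str
    test: str
    status: str


def filter_alarms(alarms: list[Alarm], operations_tests: set[str]) -> list[Alarm]:
    """Keep only alarms raised by tests on the Operations Portal's list."""
    return [alarm for alarm in alarms if alarm.test in operations_tests]


incoming = [
    Alarm("ce01.example.org", "org.example.OpsTest", "CRITICAL"),
    Alarm("se01.example.org", "org.example.OtherTest", "CRITICAL"),
]
kept = filter_alarms(incoming, {"org.example.OpsTest"})
print([alarm.host for alarm in kept])  # ['ce01.example.org']
```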
Availability tests
The set of tests used for calculating the availability and reliability (A/R) of sites and services. The A/R calculation is related to the OLA. As in the case of the Operations Portal, the availability calculation component receives results from the NGI/ROC SAM instances.
TSA1.8 proposes changes to the availability calculation (i.e. which probe results count towards it), and the OMB approves them.
The list of tests can be found here.
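The sketch below works through an A/R calculation. It assumes the commonly used definitions:

- availability = up / (total - unknown)
- reliability = up / (total - unknown - scheduled downtime)

The authoritative formulas are the ones referenced by the OLA. The function name and its parameters are hypothetical.

```python
def availability_reliability(up: float, unknown: float,
                             scheduled_downtime: float,
                             total: float) -> tuple[float, float]:
    """Return (availability, reliability) as fractions.

    Assumed definitions (check against the OLA):
      availability = up / (total - unknown)
      reliability  = up / (total - unknown - scheduled_downtime)
    All arguments are durations in the same unit, e.g. hours.
    """
    known = total - unknown
    if known <= 0:
        raise ValueError("no period with a known status to evaluate")
    availability = up / known
    reliable_period = known - scheduled_downtime
    # Undefined when the whole known period was scheduled downtime.
    reliability = up / reliable_period if reliable_period > 0 else float("nan")
    return availability, reliability


a, r = availability_reliability(up=600, unknown=20, scheduled_downtime=50, total=720)
print(round(a, 3), round(r, 3))  # 0.857 0.923
```

Take a 720-hour month with 20 hours UNKNOWN, 50 hours of scheduled downtime and 600 hours up. Under these assumed definitions, availability is 600/700 and reliability is 600/650.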