The wiki is deprecated and due to be decommissioned by the end of September 2022.
The content is being migrated to other platforms; new updates will be ignored and lost.
If needed, you can get in touch with the EGI SDIS team using operations @ egi.eu.

SAM Tests

Latest revision as of 10:48, 19 July 2016


This article is deprecated and should no longer be used, but is still available for reference.

Terminology

SAM Test

A test is a procedure that checks a specific functionality of a given service, i.e. a single measurement (e.g. org.bdii.Freshness, hr.srce.RGMA-CertLifetime). Tests are executed on NGI/ROC SAM instances.

A SAM test is:

  • an OPERATIONS test which raises alarms in the Operations Portal (see list of OPERATIONS tests)
  • an AVAILABILITY test whose result is used for Resource Centre availability calculation by ACE (see the ROC_CRITICAL profile)

NOTE: CRITICAL is used ONLY to refer to one of the possible results returned by a Nagios probe.

Probe

A probe is code that implements one or more tests.

Metric

Metric instances are tuples of flavour, metric name and, optionally, FQAN. "Metric" is a synonym for "test" used in the development documentation. In operations documents, "test" is the reference term to be used.

POEM Profile

A POEM profile is a triple of (VO, atp_groups, metric instances) where:

  1. set of atp_groups is a set of service instances defined in the Aggregated Topology Provider via a VO feed, e.g. LHCb_Site LCG.CNAF-T2.it= (service_instance1, service_instance2, etc.)
  2. set of (service_flavor, metric, fqan) tuples
    1. metric is fully qualified name of the metric (e.g. hr.srce.SRM2-CertLifetime)
    2. service_flavor is taken from ATP (e.g. CE, SRM, etc.)
    3. FQAN is the VOMS role to use for the tests (e.g. /Role=lcgadm); the same metric can be run with several FQANs (metric1 with fqan1, metric1 with fqan2, etc.)
  3. VO: is the name of the VO

In other words, the profile is a Cartesian product of service groups and metrics, plus the VO.
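The Cartesian-product view can be made concrete with a minimal sketch. It is an illustration only, not POEM's actual data model: the names `MetricInstance` and `expand_profile` are hypothetical, and the service instance names and the flavour paired with org.bdii.Freshness are assumed values chosen for the example.

```python
from itertools import product
from typing import NamedTuple, Optional


class MetricInstance(NamedTuple):
    """Hypothetical container for a (service_flavor, metric, fqan) tuple."""
    service_flavor: str
    metric: str
    fqan: Optional[str]  # FQAN is optional


def expand_profile(vo, atp_groups, metric_instances):
    """Return every (vo, group, metric_instance) combination in the profile.

    atp_groups maps a group name to its list of service instances;
    only the group names take part in the product.
    """
    return [
        (vo, group, mi)
        for group, mi in product(sorted(atp_groups), metric_instances)
    ]


# Illustrative data only; instance names and flavour pairings are assumptions.
groups = {
    "LCG.CNAF-T2.it": ["service_instance1", "service_instance2"],
}
metrics = [
    MetricInstance("SRM", "hr.srce.SRM2-CertLifetime", "/Role=lcgadm"),
    MetricInstance("Site-BDII", "org.bdii.Freshness", None),
]

for entry in expand_profile("lhcb", groups, metrics):
    print(entry)
```

With one group and two metric instances the profile expands to two entries; adding a second group would double that, which is what "Cartesian product" means here.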

Profiles

  • Main OPS profiles
  • See all POEM profiles

Availability and reliability profile

[Figure: Ace-profile.jpg. Courtesy of P. Andrade, CERN]

Availability and Reliability Profiles are collections of metrics/services defined for VOs (multiple profiles per VO). Each profile defines its own computation algorithm. Metrics can be at different levels, such as critical, non-critical, etc.

SAM tests

Tests on NGI/ROC SAM instances are the ones that the framework includes in the SAM configuration. In addition, SAM admins can add their own probes to these instances.

The SAM team proposes the addition of new probes. The addition of probes is part of a SAM release and thus part of the staged rollout. It was agreed that, prior to release, the new list of probes will be briefly presented at the OMB meeting. Probes which check internal components of SAM are not presented at the OMB.

The list of tests included in the SAM release can be found here - NGI profile SAM tests.

Lists of tests to be included are here - Inactive SAM tests.

List of MW related tests: MW SAM tests.

List of operational tools tests: OPS-MONITOR profile SAM tests.

List of cloud tests: Cloud SAM tests.

Operations tests

Tests on the Operations Portal are the ones used for raising alarms for ROD and Operations teams. The Operations Portal does not execute these tests, but receives alarms from NGI/ROC SAM instances. The Operations Portal contains a list of the probes used for alarms; results from all other probes are filtered out.
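The filtering step described above can be sketched as follows. This is a minimal illustration, not the Operations Portal's implementation: the function name `filter_alarms`, the dictionary layout of a result, and the choice to raise alarms only on CRITICAL are all assumptions. The four status names are the standard Nagios plugin results.

```python
# Standard Nagios plugin result states.
NAGIOS_STATUSES = ("OK", "WARNING", "CRITICAL", "UNKNOWN")


def filter_alarms(results, operations_tests, alarm_statuses=("CRITICAL",)):
    """Keep only results that should raise an alarm.

    results          -- iterable of dicts with "metric", "host" and "status"
                        keys (hypothetical layout)
    operations_tests -- set of metric names on the OPERATIONS list
    alarm_statuses   -- statuses that raise an alarm (assumed: CRITICAL only)
    """
    return [
        r for r in results
        if r["metric"] in operations_tests and r["status"] in alarm_statuses
    ]


# Illustrative input; host names and the second metric name are made up.
incoming = [
    {"metric": "org.bdii.Freshness", "host": "bdii.example.org", "status": "CRITICAL"},
    {"metric": "org.bdii.Freshness", "host": "bdii2.example.org", "status": "OK"},
    {"metric": "org.example.NotAnOperationsTest", "host": "ce.example.org", "status": "CRITICAL"},
]

alarms = filter_alarms(incoming, {"org.bdii.Freshness"})
print(alarms)
```

Only the first result survives: the second is not in an alarm state, and the third comes from a test that is not on the OPERATIONS list.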

The procedure for adding a new probe (PROC06) can be found here.

The list of tests can be found here - Operations SAM tests.

Availability tests

This is the set of tests used for calculating the availability and reliability (A/R) of sites and services. The A/R calculation is related to the OLA. As in the case of the Operations Portal, the availability calculation component receives results from NGI/ROC SAM instances.
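As a rough illustration of what an A/R calculation can look like, the sketch below uses one common convention, which is an assumption here and not taken from this page: availability is uptime divided by the time in which the status is known, and reliability additionally excludes scheduled downtime from the denominator. The actual algorithm is defined by each profile; the function name and its parameters are hypothetical.

```python
def availability_reliability(up, down, scheduled_down, unknown):
    """Return (availability, reliability) as fractions, or None if undefined.

    All arguments are durations in the same unit (e.g. hours).
    Assumed convention:
      availability = up / (total - unknown)
      reliability  = up / (total - unknown - scheduled_down)
    """
    total = up + down + scheduled_down + unknown
    known = total - unknown
    availability = up / known if known > 0 else None
    reliability_base = known - scheduled_down
    reliability = up / reliability_base if reliability_base > 0 else None
    return availability, reliability


# Example: a 100-hour period with 5 hours of scheduled downtime.
availability, reliability = availability_reliability(
    up=90, down=5, scheduled_down=5, unknown=0
)
print(f"availability={availability:.3f} reliability={reliability:.3f}")
```

Under this convention reliability is never lower than availability, because scheduled downtime is not counted against the site.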

TSA1.8 proposes changes to the availability calculation (i.e. which probe results count towards it), and the OMB approves them.

The list of tests can be found here - Availability SAM tests.