This article is deprecated and should no longer be used; it remains available for reference only.


The Service Availability Monitoring (SAM) system monitors the resources within the production infrastructure. SAM monitoring data is used to calculate the availability and reliability of grid sites.

SAM Nagios probes re-factoring TF

SAM tool instances



SAM Tests terminology and types

SAM Test

A test is a procedure that checks a specific functionality of a given service, i.e. a single measurement (e.g. org.bdii.Freshness, hr.srce.RGMA-CertLifetime). Tests are executed on NGI/ROC SAM instances.

A SAM test is:

NOTE: CRITICAL is used ONLY to refer to one of the possible results returned by a Nagios probe.


A probe is code that implements one or more tests.


Metric instances are tuples of flavour, metric name and, optionally, FQAN (see the POEM documentation). "Metric" is a synonym for "test" used in the development documentation. In operations documents, "test" is the reference term to be used.
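
As a rough illustration of the tuple described above, a metric instance could be modelled as follows. This is only a sketch: the class name MetricInstance and its field names are assumptions made for the example, not the actual POEM schema. The flavour, metric name and FQAN values are the examples used elsewhere on this page.

```python
from typing import NamedTuple, Optional


class MetricInstance(NamedTuple):
    """Hypothetical model of a metric instance (not the actual POEM schema)."""
    flavour: str                 # service flavour, e.g. "SRM"
    metric: str                  # fully qualified metric (test) name
    fqan: Optional[str] = None   # optional VOMS FQAN


without_fqan = MetricInstance("SRM", "hr.srce.SRM2-CertLifetime")
with_fqan = MetricInstance("SRM", "hr.srce.SRM2-CertLifetime", "/Role=lcgadm")

# The same metric with a different FQAN is a distinct metric instance.
distinct = {without_fqan, with_fqan}
```

Because the FQAN is part of the tuple, the same metric name can appear several times for one flavour, once per FQAN.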

POEM Profile

POEM (Profile Management Database, formerly the Metric Description Database) aims to describe existing metrics and group them into profiles in order to run tests. In addition, it should define actions that can either configure how availability and reliability are computed or allow notifications to be sent to the messaging system.

A POEM profile is a triple of (VO, atp_groups, metric instances) where:

  1. atp_groups is a set of groups of service instances defined in the Aggregated Topology Provider (ATP) via a VO feed, e.g. LHCb_Site (service_instance1, service_instance2, etc.)
  2. metric instances is a set of (service_flavor, metric, fqan) tuples
    1. metric is the fully qualified name of the metric (e.g. hr.srce.SRM2-CertLifetime)
    2. service_flavor is taken from ATP (e.g. CE, SRM, etc.)
    3. fqan is the VOMS role to use for the tests (e.g. /Role=lcgadm); the same metric can be run with several FQANs, e.g. metric1 with fqan1, metric1 with fqan2, etc.
  3. VO is the name of the VO

In other words, the profile is the Cartesian product of service groups and metrics, plus the VO.
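
The Cartesian-product view of a profile can be sketched in a few lines. This is an illustration under stated assumptions, not POEM code: the function expand_profile, the VO name "lhcb" and the group "LHCb_Example_Group" are hypothetical, while LHCb_Site, hr.srce.SRM2-CertLifetime and /Role=lcgadm are the examples used on this page.

```python
from itertools import product


def expand_profile(vo, atp_groups, metric_instances):
    """Pair every ATP group with every metric instance, tagged with the VO.

    Hypothetical helper for illustration; not part of POEM.
    """
    return [(vo, group, mi) for group, mi in product(atp_groups, metric_instances)]


vo = "lhcb"  # hypothetical VO name
atp_groups = ["LHCb_Site", "LHCb_Example_Group"]  # second group is hypothetical
metric_instances = [
    # (service_flavor, metric, fqan)
    ("SRM", "hr.srce.SRM2-CertLifetime", "/Role=lcgadm"),
    ("SRM", "hr.srce.SRM2-CertLifetime", None),
]

# Two groups times two metric instances gives four profile entries.
entries = expand_profile(vo, atp_groups, metric_instances)
```

Each entry says which metric instance applies to which group of service instances for the given VO.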

New Administration Guide

Please use this guide to install a SAM Nagios "Update 23" instance: SAMUpdate23

Administrator guides


Developers guides


This section collects all information related to the SAM team support activities.

SAM-related Procedures

