The wiki is deprecated and due to be decommissioned by the end of September 2022.
The content is being migrated to other platforms; new updates will be ignored and lost.
If needed, you can get in touch with the EGI SDIS team using operations @ egi.eu.

Difference between revisions of "SAM"

Line 15: Line 15:
= Documentation =
== Introduction ==
===SAM tests===
* [[SAM Tests|SAM Tests terminology and types]]
===SAM profiles ===
'''Resource Centre AVAILABILITY/RELIABILITY COMPUTATION'''
* [http://grid-monitoring.cern.ch/myegi/sam-pi/metrics_in_profiles?vo_name=ops&profile_name=ROC_CRITICAL ROC_CRITICAL] - the profile for Availability/Reliability computation of EGI Resource Centres (OPS VO). It replaces WLCG_CREAM_LCGCE_CRITICAL as of 01 Jan 2012.
'''FOR GENERATION OF ALARMS IN THE OPERATIONS DASHBOARD IN CASE OF FAILURE'''
*[http://grid-monitoring.cern.ch/myegi/sam-pi/metrics_in_profiles?vo_name=ops&profile_name=ROC_OPERATORS ROC_OPERATORS]
'''ALL METRICS THAT NCG CAN USE TO CONFIGURE A SAM NGI'''
* [http://grid-monitoring.cern.ch/myegi/sam-pi/metrics_in_profiles?vo_name=ops&profile_name=ROC ROC] - all the possible metrics that NCG can use to configure NGI Nagios.
NOTE WELL: starting from SAMUpdate-17 the removal of a metric from ROC profile will immediately cause the removal of the metric from all NGI Nagios instances, i.e. tests will no longer be executed.
=== MyEGI ===
* [https://tomtools.cern.ch/confluence/display/SAM/MyEGI/ MyEGI documentation]
Line 39: Line 52:
== Probes ==
* [https://tomtools.cern.ch/confluence/display/SAMDOC/Probes SAM Probes]
* [[MW_Nagios_tests| Probes for the detection of unsupported gLite 3.1/3.2 products or end-points in GOCDB associated to retired service types]]
* [[MW_SAM_tests| EGI probes for software version monitoring]]
* Probe [https://tomtools.cern.ch/confluence/display/SAMDOC/Probes+Development development policy]


== Developers guides ==
Line 48: Line 62:
[https://tomtools.cern.ch/confluence/display/SAMDOC/Support FAQs and Troubleshooting guides]


=Tests and probes=
= Check this =
* [[SAM Tests|SAM Tests terminology and types]]
 
 
* [[MW_SAM_tests| EGI probes for software version monitoring]]
* [https://twiki.cern.ch/twiki/bin/view/EMI/NagiosProbes EMI Nagios] and [https://savannah.cern.ch/task/?21823 status] (ARC, dCache, gLite, UNICORE)
<!--
** [https://twiki.cern.ch/twiki/bin/view/EMI/NagiosProbes EMI Nagios probes] ([[EMI Nagios probes|old page instance]])
** [https://tomtools.cern.ch/confluence/display/SAM/Probes+org.sam Probes] from org.SAM package
-->
 
=Profiles=
 
 
'''Resource Centre AVAILABILITY/RELIABILITY COMPUTATION'''
* [http://grid-monitoring.cern.ch/myegi/sam-pi/metrics_in_profiles?vo_name=ops&profile_name=ROC_CRITICAL ROC_CRITICAL] - the profile for Availability/Reliability computation of EGI Resource Centres (OPS VO). It replaces WLCG_CREAM_LCGCE_CRITICAL as of 01 Jan 2012.
 
'''FOR GENERATION OF ALARMS IN THE OPERATIONS DASHBOARD IN CASE OF FAILURE'''
*[http://grid-monitoring.cern.ch/myegi/sam-pi/metrics_in_profiles?vo_name=ops&profile_name=ROC_OPERATORS ROC_OPERATORS]
 
'''ALL METRICS THAT NCG CAN USE TO CONFIGURE A SAM NGI'''
* [http://grid-monitoring.cern.ch/myegi/sam-pi/metrics_in_profiles?vo_name=ops&profile_name=ROC ROC] - all the possible metrics that NCG can use to configure NGI Nagios.
NOTE WELL: starting from SAMUpdate-17 the removal of a metric from ROC profile will immediately cause the removal of the metric from all NGI Nagios instances, i.e. tests will no longer be executed.
 
== EGI.eu central tools and NGI SAM ==
* [http://grid-monitoring.cern.ch/myegi/sam-pi/metrics_in_profiles/?vo_name=ops&profile_name=OPS_MONITOR OPS_MONITOR]
Line 103: Line 94:
* SAM Project [https://tomtools.cern.ch/jira/browse/SAM home page]
* [https://tomtools.cern.ch/confluence/display/SAMDOC/Milestones SAM milestones]
* [https://twiki.cern.ch/twiki/bin/view/EMI/NagiosProbes EMI Nagios] and [https://savannah.cern.ch/task/?21823 status] (ARC, dCache, gLite, UNICORE)
<!-- obsolete *[https://twiki.cern.ch/twiki/bin/view/EGEE/MultiLevelMonitoringOverview Multi Level Monitoring Overview] -->
*[https://tomtools.cern.ch/confluence/download/attachments/2261694/Ace_Service_Availability_Computation.pdf?version=1&modificationDate=1314361543000 Computation of Service Availability Metrics in ACE]

Revision as of 11:48, 24 January 2013



The Service Availability Monitoring (SAM) system monitors the resources within the production infrastructure. SAM monitoring data is used to calculate the availability and reliability of grid sites. SAM includes the following components:

  • probes: a test execution framework (based on the open source monitoring framework Nagios) and the Nagios Configuration Generator (NCG)
  • the Aggregated Topology Provider (ATP), the Metrics Description Database (MDDB), and the Metrics Results Database (MRDB)
  • the message bus to publish results and a programmatic interface
  • the visualization portal (MyEGI).

SAM tool instances

Documentation

Introduction

SAM tests

SAM profiles

Resource Centre AVAILABILITY/RELIABILITY COMPUTATION

  • ROC_CRITICAL - the profile for Availability/Reliability computation of EGI Resource Centres (OPS VO). It replaces WLCG_CREAM_LCGCE_CRITICAL as of 01 Jan 2012.

FOR GENERATION OF ALARMS IN THE OPERATIONS DASHBOARD IN CASE OF FAILURE

  • ROC_OPERATORS

ALL METRICS THAT NCG CAN USE TO CONFIGURE A SAM NGI

  • ROC - all the possible metrics that NCG can use to configure NGI Nagios.

NOTE WELL: starting from SAMUpdate-17, removing a metric from the ROC profile immediately removes that metric from all NGI Nagios instances, i.e. its tests will no longer be executed.
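
The profile links in the wiki source of this page (ROC_CRITICAL, ROC_OPERATORS, ROC) all point at the same `metrics_in_profiles` endpoint and differ only in the `profile_name` query parameter, with `vo_name=ops`. A minimal sketch of how such a URL could be assembled is shown here. The helper name `profile_url` is hypothetical and not part of SAM, and the code only builds the string without contacting the server, since the endpoint may no longer be reachable.

```python
from urllib.parse import urlencode

# Endpoint copied from the profile links in the wiki source of this page.
BASE_URL = "http://grid-monitoring.cern.ch/myegi/sam-pi/metrics_in_profiles"


def profile_url(profile_name: str, vo_name: str = "ops") -> str:
    """Build the metrics_in_profiles URL for one profile.

    Hypothetical helper for illustration only; it performs no HTTP request.
    """
    # Keep the parameter order used by the links on this page.
    query = urlencode({"vo_name": vo_name, "profile_name": profile_name})
    return f"{BASE_URL}?{query}"


if __name__ == "__main__":
    # Profile names taken from the "SAM profiles" section.
    for name in ("ROC_CRITICAL", "ROC_OPERATORS", "ROC"):
        print(profile_url(name))
```

Keeping the endpoint in one constant means that a relocated service would require a change in a single place.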

MyEGI

NCG

ATP

User guides

MyEGI, Nagios, POEM

Administrator guides

Probes


Developers guides

Probes development, SAM PI

Support

FAQs and Troubleshooting guides

Check this

EGI.eu central tools and NGI SAM

OTHERS

WLCG

OSG

Related Procedures

  • Validate ROC or NGI Nagios Procedures: PROC05
  • Setting a Nagios test status to OPERATIONS: PROC06
  • Adding new probes to SAM: PROC07
  • Management of the EGI OPS Availability and Reliability Profile: PROC08

SAM/Nagios EGI Support Procedures

Resources