The wiki is deprecated and due to be decommissioned by the end of September 2022.
The content is being migrated to other platforms; new updates will be ignored and lost.
If needed, you can contact the EGI SDIS team at operations @ egi.eu.

Difference between revisions of "SAM"

= SAM tool instances =
* [[SAM Instances]]
= Documentation =
* [https://tomtools.cern.ch/confluence/display/SAMDOC/Release+Notes SAM Release Notes]
* [https://tomtools.cern.ch/confluence/display/SAMDOC/Installing+SAM-Nagios SAM Administrator's Guide]
* [https://tomtools.cern.ch/confluence/display/SAMDOC/SAM-Nagios+Card SAM/Nagios Reference Card for site managers]
* [https://tomtools.cern.ch/confluence/display/SAMDOC/User%27s+Guide User Guides] (Nagios, MyEGI, POEM)
* [https://twiki.cern.ch/twiki/bin/view/EGEE/GridMonitoringNcgYaim YAIM-based installation of Nagios & NCG]
* [https://tomtools.cern.ch/confluence/display/SAMDOC/FAQs FAQs]
* [https://tomtools.cern.ch/confluence/display/SAMDOC/Troubleshooting Troubleshooting]
* [https://wiki.egi.eu/wiki/VO_Services/VO_Service_Availability_Monitoring Setting up a VO SAM instance]
== Monitoring uncertified sites ==
* [https://tomtools.cern.ch/confluence/display/SAM/Monitor+Uncertified+Sites Setting NAGIOS to Monitor Uncertified Sites]
** IMPORTANT. EGI.eu provides '''catch-all WMS and BDII''' services for the monitoring of uncertified sites. The service is open for use, and your NGI can easily apply [http://site-certification.egi.eu/ here].


=Tests and probes=
* [http://grid-monitoring.cern.ch/myegi/sam-pi/metrics_in_profiles?vo_name=ops&profile_name=OSG OSG]
* [http://grid-monitoring.cern.ch/myegi/sam-pi/metrics_in_profiles?vo_name=ops&profile_name=OSG_CRITICAL OSG_CRITICAL]
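
The profile links above all share one URL pattern: a `metrics_in_profiles` endpoint with `vo_name` and `profile_name` query parameters. As a minimal sketch, the same URLs can be built programmatically. The helper name `profile_metrics_url` is hypothetical, and nothing on this page confirms that the endpoint still responds, so the example only constructs the URL and does not fetch it.

```python
from urllib.parse import urlencode

# Base endpoint copied from the profile links on this page.
SAM_PI_METRICS_IN_PROFILES = (
    "http://grid-monitoring.cern.ch/myegi/sam-pi/metrics_in_profiles"
)


def profile_metrics_url(profile_name: str, vo_name: str = "ops") -> str:
    """Build the metrics_in_profiles URL for one profile of one VO.

    Hypothetical helper: it only mirrors the query string used by the
    links above (vo_name first, then profile_name).
    """
    query = urlencode({"vo_name": vo_name, "profile_name": profile_name})
    return f"{SAM_PI_METRICS_IN_PROFILES}?{query}"


if __name__ == "__main__":
    # Print the URLs for the two profiles linked above.
    for name in ("OSG", "OSG_CRITICAL"):
        print(profile_metrics_url(name))
```

Using `urlencode` rather than string concatenation keeps the query string valid if a profile or VO name ever contains characters that need escaping.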

==Tools information pages==

Revision as of 11:02, 9 July 2012



The Service Availability Monitoring (SAM) system monitors the resources within the production infrastructure. SAM monitoring data is used to calculate the availability and reliability of grid sites. SAM includes the following components:

  • probes: a test execution framework (based on the open source monitoring framework Nagios) and the Nagios Configuration Generator (NCG)
  • the Aggregated Topology Provider (ATP), the Metrics Description Database (MDDB), and the Metrics Results Database (MRDB)
  • the message bus to publish results and a programmatic interface
  • the visualization portal (MyEGI).

SAM tool instances

Documentation

Monitoring uncertified sites

Tests and probes

Profiles

Main profiles

FOR EGI AVAILABILITY/RELIABILITY COMPUTATION

  • Resource Centres: ROC_CRITICAL - the profile for Availability/Reliability computation of EGI Resource Centres (OPS VO). It replaces WLCG_CREAM_LCGCE_CRITICAL as of 01 Jan 2012.
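
This page does not state how availability and reliability are derived from the monitoring results. As a rough sketch only, the example below assumes the commonly cited definitions: availability is uptime divided by the time with a known status, and reliability additionally excludes scheduled downtime from the denominator. The `StatusHours` class and both function names are hypothetical and are not part of SAM.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StatusHours:
    """Hours a site spent in each status over a reporting period (illustrative)."""
    up: float
    down: float
    scheduled_downtime: float
    unknown: float

    @property
    def total(self) -> float:
        return self.up + self.down + self.scheduled_downtime + self.unknown


def availability(h: StatusHours) -> Optional[float]:
    """Assumed definition: up / (total - unknown)."""
    known = h.total - h.unknown
    return h.up / known if known > 0 else None


def reliability(h: StatusHours) -> Optional[float]:
    """Assumed definition: up / (total - scheduled downtime - unknown)."""
    expected = h.total - h.scheduled_downtime - h.unknown
    return h.up / expected if expected > 0 else None


if __name__ == "__main__":
    # Made-up figures for a 720-hour month, purely for illustration.
    month = StatusHours(up=600, down=60, scheduled_downtime=60, unknown=0)
    print(f"availability = {availability(month):.3f}")
    print(f"reliability  = {reliability(month):.3f}")
```

Under these assumed definitions, scheduled downtime lowers availability but not reliability, which is why the two figures are reported separately.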

FOR GENERATION OF ALARMS IN THE OPERATIONS DASHBOARD IN CASE OF FAILURE

OTHERS

  • ROC - all the possible metrics that NCG can use to configure NGI Nagios
  • ARC
  • GLEXEC
  • NGI

WLCG

OSG

Tools information pages

MyEGI

NCG

Databases

Related Procedures

  • Validate ROC or NGI Nagios Procedures: PROC05
  • Setting a Nagios test status to OPERATIONS: PROC06
  • Adding new probes to SAM: PROC07
  • Management of the EGI OPS Availability and Reliability Profile: PROC08

SAM/Nagios EGI Support Procedures

Resources