The wiki is deprecated and due to be decommissioned by the end of September 2022.
The content is being migrated to other platforms; new updates will be ignored and lost.
If needed, you can contact the EGI SDIS team at operations @ egi.eu.

Difference between revisions of "SAM"

Line 34:
  <!-- [http://grid-monitoring.cern.ch/myegi/sam-pi/metrics_in_profiles/?vo_name=ops&profile_name=OPS_MONITOR -->

- * [https://grid-monitoring.egi.eu/poem/admin/poem/profile/22/ OPS_MONITOR] (monitoring of EGI.eu central tools including NGI SAM)
+ * [https://grid-monitoring.egi.eu/poem/admin/poem/profile/22/ OPS_MONITOR] - monitoring of EGI.eu central tools including NGI SAM)
+ * [https://grid-monitoring.egi.eu/poem/admin/poem/profile/23/ OPS_MONITOR_CRITICAL] - Subset of OPS_MONITOR tests used for A/R calculation

  ====Others====

Revision as of 09:21, 25 July 2014



The Service Availability Monitoring (SAM) system is used to monitor the resources within the production infrastructure. SAM monitoring data is used for calculation of availability and reliability of grid sites.
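As a rough illustration of how monitoring results can be turned into availability and reliability figures, the sketch below assumes the definitions commonly used in EGI reporting: availability is uptime divided by the time with known status, and reliability additionally excludes scheduled downtime. The function name and the hour-based inputs are hypothetical and are not part of SAM itself.

```python
def availability_reliability(up, scheduled_down, unknown, total):
    """Return (availability, reliability) as fractions, or None where undefined.

    Assumed definitions (not taken from this page):
      availability = up / (total - unknown)
      reliability  = up / (total - unknown - scheduled_down)
    All arguments are durations in the same unit, e.g. hours.
    """
    known = total - unknown
    if known <= 0:
        # No period with a known status: neither figure can be computed.
        return None, None
    availability = up / known
    reliability_base = known - scheduled_down
    reliability = up / reliability_base if reliability_base > 0 else None
    return availability, reliability


# Example: a 720-hour month with 600 h up, 48 h scheduled downtime, 24 h unknown.
a, r = availability_reliability(up=600, scheduled_down=48, unknown=24, total=720)
print(f"availability={a:.3f} reliability={r:.3f}")
```

Under these assumptions, scheduled downtime lowers availability but not reliability, which is why the two figures are reported separately.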

SAM Nagios probes re-factoring TF

SAM tool instances

Documentation

Introduction

SAM

SAM profiles

For RC monitoring

  • ROC_CRITICAL - the profile for Availability/Reliability computation of EGI Resource Centres (OPS VO). It replaced WLCG_CREAM_LCGCE_CRITICAL as of 01 Jan 2012.
  • ROC_OPERATORS - defines the metrics that can generate an operations dashboard alarm when failing.
  • ROC - all the possible metrics that NCG can use to configure NGI Nagios. NOTE WELL: starting from SAMUpdate-17, removing a metric from the ROC profile immediately removes it from all NGI Nagios instances, i.e. its tests will no longer be executed.
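
The effect of removing a metric from the ROC profile can be pictured with a small sketch. It is only a model of the behaviour described above, not NCG code: the function name and the metric names (`metric.a`, `metric.b`, `metric.c`) are hypothetical placeholders.

```python
def metrics_to_execute(roc_profile, configured):
    """Return the metrics an NGI Nagios instance keeps running.

    Models the rule that a metric is only executed while it is still
    listed in the ROC profile (assumed simplification of NCG behaviour).
    """
    return {metric for metric in configured if metric in roc_profile}


# Hypothetical profile contents and per-instance configuration.
roc_profile = {"metric.a", "metric.b", "metric.c"}
ngi_configured = {"metric.a", "metric.c"}

before = metrics_to_execute(roc_profile, ngi_configured)

# Removing a metric from the ROC profile drops it from the instance as well.
roc_profile.discard("metric.c")
after = metrics_to_execute(roc_profile, ngi_configured)

print(sorted(before))
print(sorted(after))
```

In this model the NGI instance never has to be edited: the change to the central profile alone is enough to stop the test.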

For Cloud RC monitoring

  • CLOUD-MON - tests for monitoring EGI FedCloud resources from cloudmon.egi.eu

For Operations Tools monitoring

  • OPS_MONITOR - monitoring of EGI.eu central tools, including NGI SAM
  • OPS_MONITOR_CRITICAL - subset of OPS_MONITOR tests used for A/R calculation

Others

SAM components

User guides

Administrator guides

Probes

Developers guides

Probes development, SAM PI

Support

FAQs and Troubleshooting guides

SAM-related Procedures

  • Validate ROC or NGI Nagios Procedures: PROC05
  • Setting a Nagios test status to OPERATIONS: PROC06
  • Adding new probes to SAM: PROC07
  • Management of the EGI OPS Availability and Reliability Profile: PROC08

SAM/Nagios Support in GGUS

Resources

  • Andrade, P.; Babik, M.; Bhatt, K.; Service Availability Monitoring Framework Based On Commodity Software; CHEP12, March 2012 (poster)