
Difference between revisions of "SAM"

Line 23: Line 23:
====Profiles for RC monitoring====
====Profiles for RC monitoring====
<!--* [http://grid-monitoring.cern.ch/myegi/sam-pi/metrics_in_profiles?vo_name=ops&profile_name=ROC ROC]-->
<!--* [http://grid-monitoring.cern.ch/myegi/sam-pi/metrics_in_profiles?vo_name=ops&profile_name=ROC ROC]-->
*[https://grid-monitoring.egi.eu/poem/admin/poem/profile/25/ ROC] - Tests for monitoring of all EGI services; applied on all NGI SAM Nagioses. NOTE WELL: starting from SAMUpdate-17 the removal of a metric from ROC profile will immediately cause the removal of the metric from all NGI Nagios instances, i.e. tests will no longer be executed.
*[https://mon.egi.eu/poem/admin/poem/profile/25/ ROC] - Tests for monitoring of all EGI services; applied on all NGI SAM Nagioses. NOTE WELL: starting from SAMUpdate-17 the removal of a metric from ROC profile will immediately cause the removal of the metric from all NGI Nagios instances, i.e. tests will no longer be executed.
** Deployed: on all NGI SAM Nagios
** Deployed: on all NGI SAM Nagios
** Tests: 90
** Tests: 90
** [https://TOADD ROC Tests description]
** [https://TOADD ROC Tests description]
<!--* [http://grid-monitoring.cern.ch/myegi/sam-pi/metrics_in_profiles?vo_name=ops&profile_name=ROC_CRITICAL ROC_CRITICAL] -->
<!--* [http://grid-monitoring.cern.ch/myegi/sam-pi/metrics_in_profiles?vo_name=ops&profile_name=ROC_CRITICAL ROC_CRITICAL] -->
* [https://grid-monitoring.egi.eu/poem/admin/poem/profile/26/ ROC_CRITICAL] - The profile for Availability/Reliability computation of EGI Resource Centres (OPS VO), subset of ROC tests. NOTE: It replaces WLCG_CREAM_LCGCE_CRITICAL as of 01 Jan 2012.
* [https://mon.egi.eu/poem/admin/poem/profile/26/ ROC_CRITICAL] - The profile for Availability/Reliability computation of EGI Resource Centres (OPS VO), subset of ROC tests. NOTE: It replaces WLCG_CREAM_LCGCE_CRITICAL as of 01 Jan 2012.
** Deployed: on all NGI SAM Nagios
** Deployed: on all NGI SAM Nagios
** Tests: 31
** Tests: 31
** [https://TOADD ROC_CRITICAL Tests description]
** [https://TOADD ROC_CRITICAL Tests description]
<!--*[http://grid-monitoring.cern.ch/myegi/sam-pi/metrics_in_profiles?vo_name=ops&profile_name=ROC_OPERATORS ROC_OPERATORS] -->
<!--*[http://grid-monitoring.cern.ch/myegi/sam-pi/metrics_in_profiles?vo_name=ops&profile_name=ROC_OPERATORS ROC_OPERATORS] -->
*[https://grid-monitoring.egi.eu/poem/admin/poem/profile/27/ ROC_OPERATORS] - Subset of ROC tests that are Operations tests, metrics that can generate an alarm on the operations dashboard when failing.
*[https://mon.egi.eu/poem/admin/poem/profile/27/ ROC_OPERATORS] - Subset of ROC tests that are Operations tests, metrics that can generate an alarm on the operations dashboard when failing.
** Deployed: on all NGI SAM Nagios
** Deployed: on all NGI SAM Nagios
** Tests: 67
** Tests: 67
Line 39: Line 39:


====Profile for Cloud RC monitoring ====
====Profile for Cloud RC monitoring ====
* [https://grid-monitoring.egi.eu/poem/admin/poem/profile/29/ CLOUD-MON] - Tests for monitoring EGI FedCloud resources from cloudmon.egi.eu
* [https://mon.egi.eu/poem/admin/poem/profile/29/ CLOUD-MON] - Tests for monitoring EGI FedCloud resources from cloudmon.egi.eu
** Deployed: on Central instance (cloudmon.egi.eu)
** Deployed: on Central instance (cloudmon.egi.eu)
** Tests: 6
** Tests: 6
Line 46: Line 46:
====Profiles for Operations Tools monitoring ====
====Profiles for Operations Tools monitoring ====
<!-- [http://grid-monitoring.cern.ch/myegi/sam-pi/metrics_in_profiles/?vo_name=ops&profile_name=OPS_MONITOR -->
<!-- [http://grid-monitoring.cern.ch/myegi/sam-pi/metrics_in_profiles/?vo_name=ops&profile_name=OPS_MONITOR -->
* [https://grid-monitoring.egi.eu/poem/admin/poem/profile/22/ OPS_MONITOR] - Tests for monitoring of all EGI.eu Central Operational Tools from opsmon.egi.eu, including NGI SAM
* [https://mon.egi.eu/poem/admin/poem/profile/22/ OPS_MONITOR] - Tests for monitoring of all EGI.eu Central Operational Tools from opsmon.egi.eu, including NGI SAM
** Deployed: on Central instance (opsmon.egi.eu)
** Deployed: on Central instance (opsmon.egi.eu)
** Tests: 28
** Tests: 28
** [https://wiki.egi.eu/wiki/OPS-MONITOR_profile_SAM_tests OPS_MONITOR Tests description]
** [https://wiki.egi.eu/wiki/OPS-MONITOR_profile_SAM_tests OPS_MONITOR Tests description]
* [https://grid-monitoring.egi.eu/poem/admin/poem/profile/23/ OPS_MONITOR_CRITICAL] - Subset of OPS_MONITOR tests used for A/R calculation
* [https://mon.egi.eu/poem/admin/poem/profile/23/ OPS_MONITOR_CRITICAL] - Subset of OPS_MONITOR tests used for A/R calculation
** Deployed: on Central instance (opsmon.egi.eu)
** Deployed: on Central instance (opsmon.egi.eu)
** Tests: 23
** Tests: 23
Line 57: Line 57:
====Others====
====Others====
<!-- http://grid-monitoring.cern.ch/myegi/sam-pi/metrics_in_profiles?vo_name=ops&profile_name=GLEXEC -->
<!-- http://grid-monitoring.cern.ch/myegi/sam-pi/metrics_in_profiles?vo_name=ops&profile_name=GLEXEC -->
* [https://grid-monitoring.egi.eu/poem/admin/poem/profile/17/ GLEXEC] - gLExec tests configured on NGI SAM Nagioses
* [https://mon.egi.eu/poem/admin/poem/profile/17/ GLEXEC] - gLExec tests configured on NGI SAM Nagioses
** Deployed: on all NGI SAM Nagios
** Deployed: on all NGI SAM Nagios
** Tests: 2
** Tests: 2

Revision as of 15:05, 29 July 2014



The Service Availability Monitoring (SAM) system is used to monitor the resources within the production infrastructure. SAM monitoring data is used for calculation of availability and reliability of grid sites.
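As an illustration of how monitoring data can feed such a calculation, here is a minimal sketch. It assumes one commonly used pair of definitions (availability = time up / time with known status; reliability = time up / time with known status excluding scheduled downtime). These formulas, the function names and the hour-based inputs are assumptions made for the example and are not taken from this page.

```python
def availability(up_hours: float, total_hours: float, unknown_hours: float = 0.0) -> float:
    """Fraction of the known period during which the site was up (assumed definition)."""
    known_hours = total_hours - unknown_hours
    if known_hours <= 0:
        raise ValueError("no period with known status")
    return up_hours / known_hours


def reliability(up_hours: float, total_hours: float,
                scheduled_downtime_hours: float, unknown_hours: float = 0.0) -> float:
    """Like availability, but scheduled downtime is not counted against the site (assumed definition)."""
    accountable_hours = total_hours - scheduled_downtime_hours - unknown_hours
    if accountable_hours <= 0:
        raise ValueError("no accountable period")
    return up_hours / accountable_hours


if __name__ == "__main__":
    # Hypothetical month: 720 h in total, 648 h up, 36 h scheduled downtime, 0 h unknown.
    print(f"availability = {availability(648, 720):.3f}")
    print(f"reliability  = {reliability(648, 720, 36):.3f}")
```

Under these assumed definitions, reliability is never lower than availability, because scheduled downtime is removed from the denominator only.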

SAM Nagios probes re-factoring TF

SAM tool instances

Documentation

Introduction

SAM

SAM profiles

POEM (Profile Management Database, formerly the Metric Description Database) describes the existing metrics and groups them into profiles in order to run tests. In addition, it should define actions that either configure how availability and reliability are computed or allow notifications to be sent to the messaging system.

Profiles for RC monitoring

  • ROC - Tests for monitoring of all EGI services; applied on all NGI SAM Nagios instances. NOTE WELL: starting from SAMUpdate-17, removing a metric from the ROC profile immediately removes that metric from all NGI Nagios instances, i.e. its tests will no longer be executed.
  • ROC_CRITICAL - The profile used for Availability/Reliability computation of EGI Resource Centres (OPS VO); a subset of the ROC tests. NOTE: It replaces WLCG_CREAM_LCGCE_CRITICAL as of 01 Jan 2012.
  • ROC_OPERATORS - Subset of the ROC tests that are Operations tests, i.e. metrics that can generate an alarm on the operations dashboard when they fail.
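The relationship between these three profiles can be sketched as sets of metrics. This is only an illustrative model: the metric names are hypothetical placeholders, the helper functions are invented for the example, and none of this is the POEM API. It shows the subset relation and why removing a metric from ROC stops the test on the NGI Nagios instances.

```python
from typing import Dict, List, Set

# Hypothetical metric names; the real profile contents are defined in POEM.
profiles: Dict[str, Set[str]] = {
    "ROC": {"example.metric-A", "example.metric-B", "example.metric-C", "example.metric-D"},
    "ROC_CRITICAL": {"example.metric-A", "example.metric-B"},
    "ROC_OPERATORS": {"example.metric-A", "example.metric-B", "example.metric-C"},
}


def executed_on_ngi_nagios(table: Dict[str, Set[str]]) -> Set[str]:
    """Metrics run on the NGI Nagios instances: modeled here as exactly the ROC profile."""
    return set(table["ROC"])


def broken_subsets(table: Dict[str, Set[str]], parent: str = "ROC") -> List[str]:
    """Names of profiles that contain metrics missing from the parent profile."""
    return sorted(name for name, metrics in table.items()
                  if name != parent and not metrics <= table[parent])


if __name__ == "__main__":
    print(broken_subsets(profiles))            # every profile is a subset of ROC
    profiles["ROC"].discard("example.metric-B")
    # The metric is no longer executed anywhere ...
    print("example.metric-B" in executed_on_ngi_nagios(profiles))
    # ... and profiles that still list it are now inconsistent with ROC.
    print(broken_subsets(profiles))
```

A check like `broken_subsets` is one way to notice, before editing ROC, that a metric is still needed by ROC_CRITICAL or ROC_OPERATORS.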

Profile for Cloud RC monitoring

Profiles for Operations Tools monitoring

Others

SAM components

User guides

Administrator guides

Probes

Developers guides

Support

SAM-related Procedures

  • Validate ROC or NGI Nagios Procedures: PROC05
  • Setting a Nagios test status to OPERATIONS: PROC06
  • Adding new probes to SAM: PROC07
  • Management of the EGI OPS Availability and Reliability Profile: PROC08

ARGO/SAM EGI Support in GGUS

Resources

  • Andrade, P.; Babik, M.; Bhatt, K.; Service Availability Monitoring Framework Based On Commodity Software; CHEP12, March 2012 (poster)