Difference between revisions of "SAM"
Revision as of 09:48, 25 July 2014
The Service Availability Monitoring (SAM) system monitors the resources within the production infrastructure. SAM monitoring data is used to calculate the availability and reliability of grid sites.
SAM Nagios probes re-factoring TF
SAM tool instances
Documentation
Introduction
SAM
- SAM Tests terminology and types
- SAM Project home page and SAM milestones
SAM profiles
POEM (Profile Management Database, formerly the Metric Description Database) aims to describe existing metrics and group them into profiles in order to run tests. In addition, it should define actions that either configure how availability and reliability are computed or allow notifications to be sent to the messaging system.
Profiles for RC monitoring
- ROC - Tests for monitoring of all EGI services; applied on all NGI SAM Nagioses
NOTE WELL: starting from SAMUpdate-17, removing a metric from the ROC profile immediately removes that metric from all NGI Nagios instances, i.e. the tests will no longer be executed.
- ROC_CRITICAL - The profile for Availability/Reliability computation of EGI Resource Centres (OPS VO), subset of ROC tests. Note: It replaces WLCG_CREAM_LCGCE_CRITICAL as of 01 Jan 2012.
- ROC_OPERATORS - Subset of ROC tests that are Operations tests, i.e. metrics that can generate an alarm on the operations dashboard when they fail.
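The relationship between these three profiles can be pictured as sets of metrics. The following is only an illustrative sketch, not POEM code: the metric names and all three helper functions are hypothetical. It assumes only what is stated above, namely that ROC_CRITICAL and ROC_OPERATORS are subsets of ROC and that a metric removed from ROC is no longer executed on the NGI Nagios instances.

```python
# Illustrative model only: the metric names are hypothetical placeholders,
# not the real contents of the POEM profiles.
profiles = {
    "ROC": {"metric.a", "metric.b", "metric.c", "metric.d"},
    "ROC_CRITICAL": {"metric.a", "metric.b"},
    "ROC_OPERATORS": {"metric.a", "metric.c"},
}


def metrics_run_on_ngi_nagios(profiles):
    """Metrics executed on NGI Nagios instances: exactly those in ROC."""
    return set(profiles["ROC"])


def profiles_not_subset_of_roc(profiles):
    """Return the names of profiles that list a metric missing from ROC."""
    roc = profiles["ROC"]
    return sorted(
        name
        for name, metrics in profiles.items()
        if name != "ROC" and not metrics <= roc
    )


def remove_from_roc(profiles, metric):
    """Return a copy of the profiles with the metric removed from ROC only."""
    updated = {name: set(metrics) for name, metrics in profiles.items()}
    updated["ROC"].discard(metric)
    return updated


after = remove_from_roc(profiles, "metric.b")
print(sorted(metrics_run_on_ngi_nagios(after)))
print(profiles_not_subset_of_roc(after))
```

In this toy model, removing a metric from ROC alone leaves ROC_CRITICAL referring to a test that is no longer executed, which is one way to see why changes to ROC need to be coordinated with the profiles derived from it.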
Profile for Cloud RC monitoring
- CLOUD-MON - Tests for monitoring EGI FedCloud resources from cloudmon.egi.eu
Profiles for Operations Tools monitoring
- OPS_MONITOR - Tests for monitoring of all EGI.eu Central Operational Tools from opsmon.egi.eu, including NGI SAM
- OPS_MONITOR_CRITICAL - Subset of OPS_MONITOR tests used for A/R calculation
Others
- GLEXEC - gLExec tests configured on NGI SAM Nagioses
- MW_MONITOR - Tests for monitoring all EGI services for special purposes (MW upgrades) from midmon.egi.eu
- SEC_MONITOR - Security tests for monitoring all EGI services from secmon.egi.eu
SAM components
User guides
- MyEGI, Nagios, POEM
- SAM-PI documentation (unofficial wiki page containing SAM-PI examples)
Administrator guides
- SAM Release Notes
- SAM (including configuration via YAIM)
- SAM/NAGIOS Reference Card for site managers
- VO SAM
- Monitoring uncertified sites:
  - Setting NAGIOS to Monitor Uncertified Sites
  - IMPORTANT. EGI.eu provides catch-all WMS and BDII services for the monitoring of uncertified sites. The service is open for use, and your NGI can easily apply here.
Probes
- SAM Probes
- EGI probes running on midmon Nagios
- Probe development policy
- EMI Nagios and status (ARC, dCache, gLite, UNICORE)
Developers guides
Support
FAQs and Troubleshooting guides
- Validate ROC or NGI Nagios Procedures: PROC05
- Setting a Nagios test status to OPERATIONS: PROC06
- Adding new probes to SAM: PROC07
- Management of the EGI OPS Availability and Reliability Profile: PROC08
SAM/Nagios Support in GGUS
Resources
- Andrade, P.; Babik, M.; Bhatt, K.; Service Availability Monitoring Framework Based On Commodity Software; CHEP12, March 2012 (poster)