Difference between revisions of "SAM"
Revision as of 16:02, 22 October 2012
The Service Availability Monitoring (SAM) system monitors the resources within the production infrastructure. Its monitoring data is used to calculate the availability and reliability of grid sites. SAM includes the following components:
- probes: a test execution framework (based on the open source monitoring framework Nagios) and the Nagios Configuration Generator (NCG)
- the Aggregated Topology Provider (ATP), the Metrics Description Database (MDDB), and the Metrics Results Database (MRDB)
- the message bus to publish results and a programmatic interface
- the visualization portal (MyEGI).
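As a rough illustration of how monitoring results can feed an availability and reliability calculation, the sketch below works on a list of equally spaced status samples for one site. The status labels, the function name, and the two formulas are assumptions made for this example (availability as the fraction of known time a site was up, reliability as the same fraction with scheduled downtime also excluded). They are not taken from this page and may differ from the computation SAM actually performs.

```python
from collections import Counter

# Hypothetical status labels; the real SAM status names may differ.
OK, CRITICAL, UNKNOWN, DOWNTIME = "OK", "CRITICAL", "UNKNOWN", "DOWNTIME"


def availability_reliability(samples):
    """Return (availability, reliability) as fractions for equally spaced samples.

    Assumed definitions (not taken from the SAM documentation):
      availability = up / (total - unknown)
      reliability  = up / (total - unknown - scheduled downtime)
    A value is None when its denominator would be zero.
    """
    counts = Counter(samples)
    up = counts[OK]
    known = len(samples) - counts[UNKNOWN]
    known_outside_downtime = known - counts[DOWNTIME]
    availability = up / known if known > 0 else None
    reliability = up / known_outside_downtime if known_outside_downtime > 0 else None
    return availability, reliability


if __name__ == "__main__":
    # One day of hourly samples for an illustrative site.
    hourly = [OK] * 18 + [CRITICAL] * 2 + [DOWNTIME] * 3 + [UNKNOWN] * 1
    a, r = availability_reliability(hourly)
    print(f"availability={a:.3f} reliability={r:.3f}")
```

Under these assumptions, scheduled downtime lowers availability but not reliability, while samples with unknown status are excluded from both.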
SAM tool instances
Documentation
- SAM Release Notes
- SAM Administrator's Guide
- SAM Probes
- Probes for the detection of unsupported gLite 3.1/3.2 products or end-points in GOCDB associated with retired service types
- SAM/NAGIOS Reference Card for site managers
- User Guides (Nagios, MyEGI, POEM)
- FAQs
- Troubleshooting
Monitoring uncertified sites
- Setting NAGIOS to Monitor Uncertified Sites
- IMPORTANT. EGI.eu provides catch-all WMS and BDII services for the monitoring of uncertified sites. The service is open for use, and your NGI can easily apply here.
Tests and probes
- Terminology
- Probe development policy
- SAM released probes
- EMI Nagios and status (ARC, dCache, gLite, UNICORE)
Profiles
Main profiles
FOR EGI AVAILABILITY/RELIABILITY COMPUTATION
- Resource Centres: ROC_CRITICAL - the profile for Availability/Reliability computation of EGI Resource Centres (OPS VO). It replaces WLCG_CREAM_LCGCE_CRITICAL as of 01 Jan 2012.
FOR GENERATION OF ALARMS IN THE OPERATIONS DASHBOARD IN CASE OF FAILURE
OTHERS
- ROC - all the possible metrics that NCG can use to configure NGI Nagios
- NGI - equivalent to the ROC profile
- GLEXEC - gLExec tests
WLCG
- WLCG_CREAM_CRITICAL
- WLCG_CREAM_LCGCE_CRITICAL - the profile used for WLCG Availability/Reliability computation
- WLCG_CRITICAL
- WLCG_CRITICAL_TEST
OSG
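The OSG_CRITICAL profile is published through the SAM programmatic interface (SAM-PI) at a URL of the form `http://grid-monitoring.cern.ch/myegi/sam-pi/metrics_in_profiles?vo_name=ops&profile_name=OSG_CRITICAL`. The sketch below only builds such a URL for a given profile name. The helper name is hypothetical, it is an assumption that other profiles can be queried the same way, and the service may no longer be reachable at this address.

```python
from urllib.parse import urlencode

# Base URL copied from the OSG_CRITICAL profile link; the host may no longer respond.
SAM_PI_BASE = "http://grid-monitoring.cern.ch/myegi/sam-pi"


def metrics_in_profile_url(profile_name, vo_name="ops"):
    """Build a metrics_in_profiles query URL (hypothetical helper, not part of SAM)."""
    query = urlencode({"vo_name": vo_name, "profile_name": profile_name})
    return f"{SAM_PI_BASE}/metrics_in_profiles?{query}"


if __name__ == "__main__":
    print(metrics_in_profile_url("OSG_CRITICAL"))
```

Using `urlencode` rather than string concatenation keeps the query valid if a profile or VO name ever contains characters that need escaping.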
Tools information pages
MyEGI
NCG
Databases
- Aggregated Topology Provider (ATP)
- POEM User Guide (Profile Management Database)
- Metric Result Store (MRS)
Related Procedures
- Validate ROC or NGI Nagios Procedures: PROC05
- Setting a Nagios test status to OPERATIONS: PROC06
- Adding new probes to SAM: PROC07
- Management of the EGI OPS Availability and Reliability Profile: PROC08
SAM/Nagios EGI Support Procedures
Resources
- SAM Project home page
- SAM milestones
- Computation of Service Availability Metrics in ACE
- SAM-PI documentation (unofficial wiki page containing SAM-PI examples)
- Andrade, P.; Babik, M.; Bhatt, K.; Service Availability Monitoring Framework Based On Commodity Software; CHEP12, March 2012 (poster)