Difference between revisions of "ARGO"

From EGIWiki
(Profiles for RC monitoring)
 
(16 intermediate revisions by 3 users not shown)
Line 69: Line 69:
 
ARGO monitoring engine consists of the following central instances:
 
ARGO monitoring engine consists of the following central instances:
 
* [https://argo-mon.egi.eu/nagios argo-mon.egi.eu] & [https://argo-mon2.egi.eu/nagios argo-mon2.egi.eu] - redundant instances monitoring all services in EGI infrastructure
 
* [https://argo-mon.egi.eu/nagios argo-mon.egi.eu] & [https://argo-mon2.egi.eu/nagios argo-mon2.egi.eu] - redundant instances monitoring all services in EGI infrastructure
* [https://opsmon.egi.eu/nagios opsmon.egi.eu] - monitoring EGI operational tools
 
 
* [https://secmon.egi.eu/nagios secmon.egi.eu] - security monitoring.
 
* [https://secmon.egi.eu/nagios secmon.egi.eu] - security monitoring.
  
Line 75: Line 74:
  
 
=== Profiles for RC monitoring ===
 
=== Profiles for RC monitoring ===
*[https://poem.egi.eu/poem/admin/poem/profile/2/ ARGO_MON]  
+
*[https://poem.egi.eu/ui/public_metricprofiles/ARGO_MON ARGO_MON]  
 
** Tests for monitoring of all EGI services.
 
** Tests for monitoring of all EGI services.
** [[ROC_SAM_Tests |ROC Tests description]]
+
* [https://poem.egi.eu/ui/public_metricprofiles/ARGO_MON_CRITICAL ARGO_MON_CRITICAL]
* [https://poem.egi.eu/poem/admin/poem/profile/3/ ARGO_MON_CRITICAL]
 
 
** The profile for Availability/Reliability computation of EGI Resource Centres (OPS VO), subset of ARGO_MON tests.  
 
** The profile for Availability/Reliability computation of EGI Resource Centres (OPS VO), subset of ARGO_MON tests.  
** These profile contains a subset of [[ROC_SAM_Tests |ARGO_MON Tests]].
+
*[https://poem.egi.eu/ui/public_metricprofiles/ARGO_MON_OPERATORS ARGO_MON_OPERATORS]  
*[https://poem.egi.eu/poem/admin/poem/profile/4/ ARGO_MON_OPERATORS]  
 
 
** Subset of ROC tests that are Operations tests, metrics that can generate an alarm on the operations dashboard when failing.
 
** Subset of ROC tests that are Operations tests, metrics that can generate an alarm on the operations dashboard when failing.
** These profile contains a subset of [[ROC_SAM_Tests |ARGO_MON Tests]].
 
  
 
=== Profile for Cloud RC monitoring ===
 
=== Profile for Cloud RC monitoring ===
* [https://poem.egi.eu/poem/admin/poem/profile/1/ CLOUD-MON]
+
* The cloud probes are now included in the ARGO_MON* profiles
** Tests for monitoring EGI FedCloud resources from cloudmon.egi.eu
 
** [https://wiki.egi.eu/wiki/Cloud_SAM_tests CLOUD_MONITOR Tests description]
 
 
 
* [https://poem.egi.eu/poem/admin/poem/profile/2/ CLOUD-MON_CRITICAL]
 
** Tests for calculating A/R of EGI FedCloud resources from cloudmon.egi.eu
 
** [https://wiki.egi.eu/wiki/Cloud_SAM_tests CLOUD_MONITOR Tests description]
 
  
 
=== Profiles for Operations Tools monitoring ===
 
=== Profiles for Operations Tools monitoring ===
* [https://poem.egi.eu/poem/admin/poem/profile/4/ OPS_MONITOR]
+
* [https://poem.egi.eu/ui/public_metricprofiles/OPS_MONITOR OPS_MONITOR]
 
** Tests for monitoring of all EGI.eu Central Operational Tools from opsmon.egi.eu, including NGI SAM
 
** Tests for monitoring of all EGI.eu Central Operational Tools from opsmon.egi.eu, including NGI SAM
** [https://wiki.egi.eu/wiki/OPS-MONITOR_profile_SAM_tests OPS_MONITOR Tests description]
+
* [https://poem.egi.eu/ui/public_metricprofiles/OPS_MONITOR_CRITICAL OPS_MONITOR_CRITICAL]
* [https://poem.egi.eu/poem/admin/poem/profile/5/ OPS_MONITOR_CRITICAL]
 
 
** Subset of OPS_MONITOR tests used for A/R calculation
 
** Subset of OPS_MONITOR tests used for A/R calculation
  
 
=== Others ===
 
=== Others ===
* [https://midmon.egi.eu/poem/admin/poem/profile/1/ MW_MONITOR] - Tests for monitoring all EGI services for special purposes (MW upgrades) from midmon.egi.eu
+
* [https://poem.egi.eu/ui/public_metricprofiles/SEC_MONITOR SEC_MONITOR] - Security tests for monitoring all EGI services from secmon.egi.eu
** Deployed: on Central instance (midmon.egi.eu)
 
** Tests: 15
 
** [https://wiki.egi.eu/wiki/MW_Nagios_tests MW_MONITOR Tests description]
 
* [https://secmon.egi.eu/poem/admin/poem/profile/1/ SEC_MONITOR] - Security tests for monitoring all EGI services from secmon.egi.eu
 
 
** Deployed: on Central instance (secmon.egi.eu)
 
** Deployed: on Central instance (secmon.egi.eu)
** Tests: 14
 
 
** [https://wiki.egi.eu/wiki/EGI_CSIRT:SMG SEC_MONITOR Tests description]
 
** [https://wiki.egi.eu/wiki/EGI_CSIRT:SMG SEC_MONITOR Tests description]
  
== SAM tests ==
+
== ARGO tests ==
 
+
The list of metrics with detailed information on the probes is available on Poem:
Tests on NGI/ROC SAM instances are the one which frameworks includes in the SAM configuration. In addition SAM admins can add their own probes to these instances.
+
* [https://poem.egi.eu/ui/public_services Services]
 
+
* [https://poem.egi.eu/ui/public_metricprofiles Profiles]
SAM teams proposes addition of new probes.
+
* [https://poem.egi.eu/ui/public_metrics Metrics]
* '''The addition of probes is part of SAM release and thus part of the staged rollout'''.
+
* [https://poem.egi.eu/ui/public_probes Probes]
* It was agreed that prior to release new list of probes will be briefly presented at the OMB meeting.  
 
* Probes which perform internal components of SAM are not presented at OMB.  
 
 
 
List of tests:
 
* The list of '''tests included in the SAM release''' can be found [[ROC_SAM_Tests|here - NGI profile SAM tests]].
 
 
 
* List of '''MW related tests''': [[MW SAM tests]].
 
 
 
* List of '''operational tools tests''': [[OPS-MONITOR profile SAM tests]].
 
 
 
* List of '''cloud tests''': [[Cloud SAM tests]].
 
  
 
=== Operations tests  ===
 
=== Operations tests  ===

Latest revision as of 10:11, 30 April 2020





Tool name: ARGO
Tool category and description: Service Monitoring for Availability and Reliability
Tool URL: https://argoeu.github.io
Email: argo-ggus-support@grnet.gr
GGUS Support unit: ARGO/SAM EGI Support
GOC DB entry: https://goc.egi.eu/portal/index.php?Page_Type=Site&id=641
Requirements tracking - EGI tracker: https://rt.egi.eu/rt/Dashboards/5544/SAM-Requirements
Issue tracking - Developers tracker: https://github.com/ARGOeu/ARGO/issues
Release schedule: https://github.com/ARGOeu/ARGO/milestones
Release notes: TBD
Roadmap: TBD
Related OLA: https://documents.egi.eu/public/ShowDocument?docid=2170
Test instance URL: http://cclavoisier04.in2p3.fr:8080/lavoisier
Documentation: https://argoeu.github.io/overview/
License: Apache 2
Provider: GRNET, SRCE, CNRS
Source code: https://github.com/ARGOeu/


Change, Release and Deployment

These sections provide the detailed agreement on requirements gathering, release and deployment of the tool, extending the Instructions for Operations Tools teams.

Documentation

ARGO monitoring engine

ARGO monitoring engine consists of the following central instances:

  • argo-mon.egi.eu & argo-mon2.egi.eu - redundant instances monitoring all services in EGI infrastructure
  • secmon.egi.eu - security monitoring.

POEM Profiles

Profiles for RC monitoring

  • ARGO_MON
    • Tests for monitoring of all EGI services.
  • ARGO_MON_CRITICAL
    • The profile for Availability/Reliability computation of EGI Resource Centres (OPS VO), subset of ARGO_MON tests.
  • ARGO_MON_OPERATORS
    • Subset of ROC tests that are Operations tests, i.e. metrics that can generate an alarm on the operations dashboard when they fail.
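
The subset relationship between these profiles can be checked mechanically once the metric lists have been exported from POEM. The following is a minimal sketch only: the metric names are hypothetical placeholders, the profiles are assumed to be available as plain sets of metric names, and `check_subsets` is an illustrative helper, not part of ARGO or POEM.

```python
# Hypothetical placeholder metric names; the real profile contents are
# published on POEM and are NOT reproduced here.
ARGO_MON = {
    "example.metric.A",
    "example.metric.B",
    "example.metric.C",
    "example.metric.D",
}
ARGO_MON_CRITICAL = {"example.metric.A", "example.metric.B"}
ARGO_MON_OPERATORS = {"example.metric.A", "example.metric.C"}


def check_subsets(full, derived):
    """Return, per derived profile, the metrics missing from the full profile.

    An empty list means the derived profile is a proper or equal subset.
    """
    return {name: sorted(metrics - full) for name, metrics in derived.items()}


violations = check_subsets(
    ARGO_MON,
    {
        "ARGO_MON_CRITICAL": ARGO_MON_CRITICAL,
        "ARGO_MON_OPERATORS": ARGO_MON_OPERATORS,
    },
)

for profile, missing in violations.items():
    print(profile, "OK" if not missing else f"not in ARGO_MON: {missing}")
```

A non-empty list for a profile would indicate a metric that is used for A/R or alarms but is not part of the general monitoring profile.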

Profile for Cloud RC monitoring

  • The cloud probes are now included in the ARGO_MON* profiles

Profiles for Operations Tools monitoring

  • OPS_MONITOR
    • Tests for monitoring of all EGI.eu Central Operational Tools from opsmon.egi.eu, including NGI SAM
  • OPS_MONITOR_CRITICAL
    • Subset of OPS_MONITOR tests used for A/R calculation

Others

  • SEC_MONITOR - Security tests for monitoring all EGI services from secmon.egi.eu
    • Deployed: on Central instance (secmon.egi.eu)
    • SEC_MONITOR Tests description: https://wiki.egi.eu/wiki/EGI_CSIRT:SMG

ARGO tests

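The list of metrics with detailed information on the probes is available on POEM:

  • Services: https://poem.egi.eu/ui/public_services
  • Profiles: https://poem.egi.eu/ui/public_metricprofiles
  • Metrics: https://poem.egi.eu/ui/public_metrics
  • Probes: https://poem.egi.eu/ui/public_probes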

Operations tests

Tests on the Operations Portal are the ones used for raising alarms for ROD and Operations teams. The Operations Portal does not execute these tests itself; it receives alarms from the NGI/ROC SAM instances. It keeps a list of the probes used for alarms, and results from all other probes are filtered out.

The procedure for adding a new probe is described in PROC06.

The list of tests is available on the Operations SAM tests page.
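
The filtering described in this section can be illustrated with a short sketch. It assumes that each incoming result carries a site, a metric name and a Nagios-style status, and that only a CRITICAL status counts as failing; the `MetricResult` type, the metric names and the site names are hypothetical and are not taken from the Operations Portal.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricResult:
    site: str
    metric: str
    status: str  # assumed Nagios-style: "OK", "WARNING", "CRITICAL", "UNKNOWN"


def alarms_for_dashboard(results, operations_metrics):
    """Keep only failing results whose metric is on the operations list.

    Results for metrics outside `operations_metrics` are dropped, whatever
    their status (assumption: only CRITICAL is treated as failing).
    """
    return [
        r
        for r in results
        if r.metric in operations_metrics and r.status == "CRITICAL"
    ]


# Hypothetical data for illustration only.
operations_metrics = {"example.metric.A", "example.metric.C"}
incoming = [
    MetricResult("SITE-1", "example.metric.A", "CRITICAL"),  # kept
    MetricResult("SITE-1", "example.metric.B", "CRITICAL"),  # filtered: not an operations metric
    MetricResult("SITE-2", "example.metric.C", "OK"),        # filtered: not failing
]

alarms = alarms_for_dashboard(incoming, operations_metrics)
```

Keeping the list of alarm-raising metrics as data, separate from the filter logic, mirrors the idea that the set of operations tests changes through a procedure rather than through code changes.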

Availability tests

The set of tests used for calculating the availability and reliability (A/R) of sites and services. The A/R calculation is related to the OLA. As with the Operations Portal, the availability calculation component receives results from the NGI/ROC SAM instances.

TSA1.8 proposes changes to the availability calculation (that is, which probe results count towards it), and the OMB approves them.

The list of tests is available on the Availability SAM tests page.
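
As an illustration of how availability and reliability figures can be derived from aggregated test results, the sketch below uses a common convention that is an assumption here and is not stated on this page: availability is the time in an up state divided by the time with a known state, and reliability additionally excludes scheduled downtime from the denominator. The function name and its parameters are hypothetical; the authoritative definitions are those in the OLA.

```python
def availability_reliability(up, down, scheduled_downtime, unknown):
    """Compute (availability, reliability) from durations in one consistent unit.

    Assumed convention (not taken from this page):
      availability = up / (total - unknown)
      reliability  = up / (total - unknown - scheduled_downtime)
    Returns None for a value whose denominator is zero.
    """
    total = up + down + scheduled_downtime + unknown
    known = total - unknown
    availability = up / known if known > 0 else None
    reliable_period = known - scheduled_downtime
    reliability = up / reliable_period if reliable_period > 0 else None
    return availability, reliability


# Example: 90 h up, 5 h unscheduled down, 5 h scheduled downtime, 0 h unknown.
avail, rel = availability_reliability(up=90, down=5, scheduled_downtime=5, unknown=0)
print(f"availability={avail:.3f} reliability={rel:.3f}")
```

Under this convention, scheduled downtime lowers availability but not reliability, which is why the two figures differ in the example.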