Latest revision as of 10:11, 30 April 2020
Tool name | ARGO |
Tool Category and description | Service Monitoring for Availability and Reliability |
Tool url | https://argoeu.github.io |
Email | argo-ggus-support@grnet.gr |
GGUS Support unit | ARGO/SAM EGI Support |
GOC DB entry | https://goc.egi.eu/portal/index.php?Page_Type=Site&id=641 |
Requirements tracking - EGI tracker | https://rt.egi.eu/rt/Dashboards/5544/SAM-Requirements |
Issue tracking - Developers tracker | https://github.com/ARGOeu/ARGO/issues |
Release schedule | https://github.com/ARGOeu/ARGO/milestones |
Release notes | TBD |
Roadmap | TBD |
Related OLA | https://documents.egi.eu/public/ShowDocument?docid=2170 |
Test instance url | http://cclavoisier04.in2p3.fr:8080/lavoisier |
Documentation | https://argoeu.github.io/overview/ |
License | Apache 2 |
Provider | GRNET, SRCE, CNRS |
Source code | https://github.com/ARGOeu/ |
Change, Release and Deployment
This section provides the detailed agreements on requirements gathering, release and deployment of the tool, which extend the Instructions for Operations Tools teams.
Documentation
ARGO monitoring engine
The ARGO monitoring engine consists of the following central instances:
- argo-mon.egi.eu & argo-mon2.egi.eu - redundant instances monitoring all services in the EGI infrastructure.
- secmon.egi.eu - security monitoring.
POEM Profiles
Profiles for RC monitoring
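- ARGO_MON (https://poem.egi.eu/ui/public_metricprofiles/ARGO_MON)
  - Tests for monitoring of all EGI services.
- ARGO_MON_CRITICAL (https://poem.egi.eu/ui/public_metricprofiles/ARGO_MON_CRITICAL)
  - The profile for Availability/Reliability computation of EGI Resource Centres (OPS VO); a subset of the ARGO_MON tests.
- ARGO_MON_OPERATORS (https://poem.egi.eu/ui/public_metricprofiles/ARGO_MON_OPERATORS)
  - Subset of ROC tests that are Operations tests, i.e. metrics that can raise an alarm on the operations dashboard when they fail.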
Profile for Cloud RC monitoring
- The cloud probes are now included in the ARGO_MON* profiles
Profiles for Operations Tools monitoring
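- OPS_MONITOR (https://poem.egi.eu/ui/public_metricprofiles/OPS_MONITOR)
  - Tests for monitoring of all EGI.eu Central Operational Tools from opsmon.egi.eu, including NGI SAM.
- OPS_MONITOR_CRITICAL (https://poem.egi.eu/ui/public_metricprofiles/OPS_MONITOR_CRITICAL)
  - Subset of OPS_MONITOR tests used for A/R calculation.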
Others
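- SEC_MONITOR (https://poem.egi.eu/ui/public_metricprofiles/SEC_MONITOR) - Security tests for monitoring all EGI services from secmon.egi.eu
  - Deployed: on the central instance (secmon.egi.eu)
  - SEC_MONITOR tests description: https://wiki.egi.eu/wiki/EGI_CSIRT:SMG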
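Each profile named in this section has a public POEM page, and those pages appear to follow a single URL pattern, https://poem.egi.eu/ui/public_metricprofiles/<PROFILE_NAME>. The following is a minimal Python sketch that builds those URLs. It assumes the pattern holds for every profile listed here; the helper name `profile_url` and the constant names are hypothetical and not part of POEM.

```python
from urllib.parse import quote

# Base of the public metric-profile pages (pattern assumed from the profile links).
POEM_BASE = "https://poem.egi.eu/ui/public_metricprofiles"

# Profile names as listed in this section.
PROFILES = [
    "ARGO_MON",
    "ARGO_MON_CRITICAL",
    "ARGO_MON_OPERATORS",
    "OPS_MONITOR",
    "OPS_MONITOR_CRITICAL",
    "SEC_MONITOR",
]


def profile_url(name: str) -> str:
    """Return the assumed public POEM page URL for a profile name."""
    # quote() guards against characters that are not URL-safe.
    return f"{POEM_BASE}/{quote(name, safe='')}"


if __name__ == "__main__":
    for name in PROFILES:
        print(profile_url(name))
```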
ARGO tests
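The list of metrics, with detailed information on the probes, is available on POEM:
- Services (https://poem.egi.eu/ui/public_services)
- Profiles (https://poem.egi.eu/ui/public_metricprofiles)
- Metrics (https://poem.egi.eu/ui/public_metrics)
- Probes (https://poem.egi.eu/ui/public_probes)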
Operations tests
The tests on the Operations Portal are the ones used to raise alarms for the ROD and Operations teams. The Operations Portal does not execute these tests itself; it receives alarms from the NGI/ROC SAM instances. It holds the list of probes used for alarms, and results from all other probes are filtered out.
The procedure for adding a new probe is described in PROC06.
The list of tests is available at Operations SAM tests.
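The filtering described in this section can be illustrated with a short Python sketch: incoming alarms are kept only if their metric is on the list of probes used for alarms. This is an illustration under stated assumptions, not the Operations Portal's implementation. The `Alarm` fields, the function name `filter_alarms`, and the metric and site names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Alarm:
    """One alarm as received from an NGI/ROC SAM instance (fields are assumed)."""
    site: str
    metric: str
    status: str


def filter_alarms(alarms: Iterable[Alarm], alarm_metrics: Iterable[str]) -> List[Alarm]:
    """Keep only alarms whose metric is on the list of probes used for alarms."""
    allowed = set(alarm_metrics)
    return [alarm for alarm in alarms if alarm.metric in allowed]


if __name__ == "__main__":
    # Hypothetical metric names; the real list is maintained on the Operations Portal.
    operations_metrics = {"example.metric-A", "example.metric-B"}
    incoming = [
        Alarm("SITE-1", "example.metric-A", "CRITICAL"),
        Alarm("SITE-1", "example.metric-C", "CRITICAL"),  # not on the list, dropped
        Alarm("SITE-2", "example.metric-B", "WARNING"),
    ]
    for alarm in filter_alarms(incoming, operations_metrics):
        print(alarm.site, alarm.metric, alarm.status)
```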
Availability tests
This is the set of tests used for calculating the availability and reliability of sites and services. The A/R calculation is tied to the OLA. As with the Operations Portal, the availability calculation component receives results from the NGI/ROC SAM instances.
Changes to the availability calculation (that is, which probe results count towards it) are proposed by TSA1.8 and approved by the OMB.
The list of tests is available at Availability SAM tests.
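This page does not state the A/R formula itself. As a rough illustration, the Python sketch below assumes the commonly used definitions: availability is uptime divided by the time with a known status, and reliability additionally excludes scheduled downtime from the denominator. The status labels, the function name `availability_reliability`, and the one-status-per-interval input format are assumptions for the example and do not describe the ARGO compute engine.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple


def availability_reliability(
    statuses: Iterable[str],
) -> Tuple[Optional[float], Optional[float]]:
    """Compute (availability, reliability) from equal-length time intervals.

    Each item is the site status for one interval and is expected to be one of
    'UP', 'DOWN', 'SCHEDULED_DOWNTIME' or 'UNKNOWN' (labels are assumed).
    Returns None for a value whose denominator would be zero.
    """
    counts = Counter(statuses)
    total = sum(counts.values())
    up = counts["UP"]
    known = total - counts["UNKNOWN"]                   # intervals with a known status
    accountable = known - counts["SCHEDULED_DOWNTIME"]  # also excludes planned outages

    availability = up / known if known else None
    reliability = up / accountable if accountable else None
    return availability, reliability


if __name__ == "__main__":
    # One hypothetical day of hourly statuses.
    day = ["UP"] * 20 + ["DOWN"] * 2 + ["SCHEDULED_DOWNTIME"] + ["UNKNOWN"]
    availability, reliability = availability_reliability(day)
    print(f"availability={availability:.3f} reliability={reliability:.3f}")
```

Under these assumptions, scheduled downtime lowers availability but not reliability, which is why the two figures are reported separately.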