Difference between revisions of "Fedcloud-tf:WorkGroups:Scenario5"

Monitoring in this context means monitoring the availability (and reliability) of the cloud resources offered by the resource providers. The test checks whether a hypothetical user can instantiate at least one pre-defined virtual machine within a given period of time. The monitoring is "external": no data will be collected from inside the VMs. Monitoring the capacity of the cloud resource providers, in terms of how many resources are available, is beyond the scope of this Scenario, at least in its initial phase. Possible evolutions of the FedCloud monitoring will be evaluated once the basic monitoring is in place.

The outcome of Scenario5 will be a system that is able to run at least one probe on each Resource Provider participating in the FedCloud.

The probe will have an OK status if the VM can be instantiated.
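As a rough illustration of this rule, the sketch below maps one instantiation attempt to a Nagios plugin exit code (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The `start_vm` callable, the timeout and the polling interval are assumptions made for the example; the actual call depends on each provider's interface, which this page does not define.

```python
import time

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_vm_instantiation(start_vm, timeout_s, poll_s=1.0,
                           clock=time.monotonic, sleep=time.sleep):
    """Return (exit_code, message) for one instantiation attempt.

    start_vm is a hypothetical callable: it requests the pre-defined VM and
    returns a zero-argument callable that reports whether the VM is running.
    """
    started = clock()
    try:
        is_running = start_vm()
    except Exception as exc:
        return CRITICAL, f"CRITICAL - instantiation request failed: {exc}"
    # Poll until the VM is reported as running or the time window closes.
    while clock() - started < timeout_s:
        if is_running():
            elapsed = clock() - started
            return OK, f"OK - VM running after {elapsed:.0f}s"
        sleep(poll_s)
    return CRITICAL, f"CRITICAL - VM not running within {timeout_s}s"
```

A command-line wrapper would print the message and call `sys.exit` with the returned code, which is how Nagios reads a plugin result.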

Given the experience accumulated with the NAGIOS system within the EMI and EGI projects, the monitoring framework will be based on NAGIOS. This also eases the integration of the FedCloud monitoring framework into the SAM monitoring system used by the EGI project to monitor the production infrastructure.

The status history of all the probes will be available through web interfaces.
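A status history of this kind is what availability and reliability figures are computed from. The sketch below assumes one common convention: availability is the OK time divided by the time with a known status, and reliability additionally excludes scheduled downtime. The status labels and the input format are assumptions made for the example, not something defined on this page.

```python
from collections import Counter


def summarize(history):
    """Compute (availability, reliability) from (status, seconds) pairs.

    Assumed status labels: "OK", "CRITICAL", "UNKNOWN", "DOWNTIME".
    Returns None for a figure whose denominator is zero.
    """
    totals = Counter()
    for status, seconds in history:
        totals[status] += seconds
    total = sum(totals.values())
    known = total - totals["UNKNOWN"]          # time with a known status
    unscheduled = known - totals["DOWNTIME"]   # also drop scheduled downtime
    availability = totals["OK"] / known if known else None
    reliability = totals["OK"] / unscheduled if unscheduled else None
    return availability, reliability
```

For instance, 80 s OK, 10 s CRITICAL, 5 s DOWNTIME and 5 s UNKNOWN give an availability of 80/95 and a reliability of 80/90.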

The proposed approach

The proposed approach is to have a central monitoring instance that runs probes on all the FedCloud resources and collects their output.

The central instance could be a full-blown SAM instance, such as those available through EGI (https://wiki.egi.eu/wiki/SAM_Instances), or a simple NAGIOS box. This has to be decided.

The following steps need to be completed in order to have the approach established.

1. Identify the central instance: One resource or technology provider needs to provide a machine (virtual or real) where the testing instance will be deployed. Based on our experience, 1 GB of RAM, 1 CPU core and at least 10 GB of disk are sufficient. This instance will be used to monitor all the clouds provided by resource providers. EGI-JRA1 will help with the installation of this SAM instance.

2. Creation of basic probes: for each technology/Resource provider a basic probe should be created to test:

- login functionality to the remote VM web interface (assuming that such a web interface is available)
- pre-defined VM instantiation

The idea is to start by defining a skeleton probe with one of the technology providers; it will then be used as the basis for all other probes.

The characteristics of the pre-defined VM are to be discussed with the TPs.

This step can be split into sub-steps:

2.a Identify a technology provider that volunteers to create the skeleton probe with the help of EGI-JRA1

2.b Use the skeleton probe to create probes for all the other TPs
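One way to picture the skeleton probe is as a template that fixes the check sequence and the status mapping, leaving only the provider-specific calls to each TP. The sketch below is a hypothetical layout, not the agreed design: the class and method names are invented for illustration, and `ExampleProviderProbe` stands in for a real provider.

```python
from abc import ABC, abstractmethod

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


class SkeletonProbe(ABC):
    """Template shared by all providers; subclasses supply the two checks."""

    @abstractmethod
    def check_login(self) -> bool:
        """Return True if login to the provider's web interface works."""

    @abstractmethod
    def check_instantiation(self) -> bool:
        """Return True if the pre-defined VM can be instantiated."""

    def run(self):
        checks = (("login", self.check_login),
                  ("instantiation", self.check_instantiation))
        for name, check in checks:
            try:
                passed = check()
            except Exception as exc:
                # The probe itself failed, so the service state is unknown.
                return UNKNOWN, f"UNKNOWN - {name} check raised {exc!r}"
            if not passed:
                return CRITICAL, f"CRITICAL - {name} check failed"
        return OK, "OK - login and instantiation succeeded"


class ExampleProviderProbe(SkeletonProbe):
    """Hypothetical provider with canned results, used to exercise the template."""

    def __init__(self, login_ok, vm_ok):
        self.login_ok = login_ok
        self.vm_ok = vm_ok

    def check_login(self):
        return self.login_ok

    def check_instantiation(self):
        return self.vm_ok
```

With this layout, a new provider only has to implement the two check methods; `run()` stays identical across all probes.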

3. Integration of probes: Once all the probes are created, they can be integrated into the NAGIOS or SAM instance, which can then start collecting status data about the FedCloud.
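For the plain NAGIOS option, integration largely amounts to one service definition per resource provider. The sketch below renders such definitions in Nagios object-configuration syntax. The host names, the `check_fedcloud_vm` command name and the use of the `generic-service` template from the Nagios sample configuration are assumptions for illustration; a SAM instance would manage its configuration through its own tooling.

```python
# Nagios object definition for one service; doubled braces are literal braces.
SERVICE_TEMPLATE = """define service {{
    use                 generic-service
    host_name           {host}
    service_description {description}
    check_command       {command}
}}
"""


def render_services(hosts, command="check_fedcloud_vm",
                    description="FedCloud VM instantiation"):
    """Return one service definition per host, separated by blank lines."""
    return "\n".join(
        SERVICE_TEMPLATE.format(host=host, description=description, command=command)
        for host in hosts
    )


if __name__ == "__main__":
    # Hypothetical resource provider endpoints.
    print(render_services(["rp-a.example.org", "rp-b.example.org"]))
```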

We estimate that these 3 initial steps can be accomplished before the end of February 2012.

Possible evolution

1. Integration in the EGI production infrastructure: Once all the probes are provided and tested on the dedicated central FedCloud instance, we can start the discussion on integration with EGI operations. Two options are foreseeable:

- keep a single instance for all resources
- distribute various monitoring instances over the current NGI model

The next steps are integration with other operational tools used in EGI (operations portal, GOCDB, etc.)

2. Improvement of probes: probes can be developed on top of APIs (OCCI or other) to perform actual capability tests (storage, computing) and provide more accurate information on the status of individual resource providers. This step depends on other activities (e.g. definition of a common interface) and on the previous steps of Scenario5.

NAGIOS Probes

Who has the responsibility to develop probes? Following the EGI model, probes are developed by the Technology Providers and integrated into the monitoring framework by the EGI-JRA1 staff, who can also provide guidelines and templates during the initial phase of probe development.

Information on how to develop NAGIOS probes can be found in the SAM Development Guide

The list of probes available within EGI is reported in the SAM Administrator Guide

The EGI SAM System

The SAM system is basically a framework consisting of:
- Nagios monitoring system (https://www.nagios.org),
- custom databases for topology, probes description and storing results of tests
- web interface MyWLCG/MyEGI (https://tomtools.cern.ch/confluence/display/SAM/MyWLCG)
Probes that check services are provided by the service developers. For EMI services, probes are provided by the EMI product teams; for the Globus Toolkit, probes are provided by the IGE project; and so on. The SAM team only maintains probes that test internal SAM functions (e.g. communication with the messaging system, database synchronization, etc.).

More information here

Operative Steps

Step 1: Setup the Scenario5 group

Identify the group leader and collaborators: Done

Step 2: Agree on the proposed approach

To be done in the coming meetings

Step 3: Identify a Resource Provider that will host the central nagios instance

Currently we have two volunteers: CESGA (Ivan and Alvaro) and GWDG (Kasprzak, Piotr). Both are to be contacted for confirmation.

Step 4: Create a skeleton probe with a volunteer TP

GWDG (Kasprzak, Piotr) showed interest in developing the probe for their system

Step 5: Advertise the skeleton probe and use it as template

Step 6: Integrate the probes into the central NAGIOS system

Scenario5 meetings

A first dedicated meeting is being organised through this doodle:

http://doodle.com/dysckvz4yvmasw4k

Scenario5 Actions

Scenario5 actions are available on RAL Basecamp

Further Resources

The SAM system EGI wiki pages

File:Flessr nagios probes.pdf (Thanks to David Wallom)

@@ Line 18: / Line 18: @@
 |-
 |
 Collaborator
 | SRCE<br>
 | &nbsp; Emir Imamagic
 |-
 |
 Collaborator
 | CESGA
 | Ivan Diaz
 |-
 | Collaborator
 | CESGA
 | Alvaro Simon
 |}
+<br>
 == What Monitoring means in this context  ==
@@ Line 126: / Line 126: @@
 The SAM system is basically a framework consisting of: <br>- Nagios monitoring system (https://[http://www.nagios.org/ www.nagios.org]), <br>- custom databases for topology, probes description and storing results of tests <br>- web interface MyWLCG/MyEGI ([https://tomtools.cern.ch/confluence/display/SAM/MyWLCG https://tomtools.cern.ch/confluence/display/SAM/MyWLCG]) <br>Probes used to perform check of services are provided by service developers. In case of EMI services probes are provided by EMI product teams. In case of Globus Toolkit, probes are provided by IGE project, etc. SAM team only maintains probes which test internal SAM functions (e.g. communication with messaging system, database synchronization, etc).
+<br>
+More information [[SAM|here]]
-More information [[SAM|here]]
+<br>
+== Operative Steps<br> ==
+<br>
+=== Step 1: Setup the Scenario5 group<br> ===
+Identify the group leader and collaborators: Done<br>
+=== Step 2: Agree on the proposed approach<br> ===
+To be done in the coming meetings<br>
+=== Step 3: Identify a Resource Provider that will host the central nagios instance<br> ===
+Currently we have two volunteers: CESGA (Ivan and Alvaro) and GWDG (Kasprzak, Piotr). to be contacted for confirmation.<br>
+=== Step 4: Create a skeleton probe with a volunteer TP<br> ===
+GWDG (Kasprzak, Piotr) showed interest in developing the probe for their system <br>
+=== Step 5: Advertise the skeleton probe and use it as template ===
+=== Step 6: Integrate the probes into the central NAGIOS system ===
-== Scenario5 meetings  ==
+== Scenario5 meetings  ==
 A first dedicated meeting is being organised through this doodle:
 http://doodle.com/dysckvz4yvmasw4k
 == Scenario5 Actions  ==

Revision as of 19:24, 22 November 2011

Contents

Scenario 5: Reliability/Availability of Resource Providers

Scenario collaborators

What Monitoring means in this context

The proposed approach

Possible evolution

NAGIOS Probes

The EGI SAM System

Operative Steps

Step 1: Setup the Scenario5 group

Step 2: Agree on the proposed approach

Step 3: Identify a Resource Provider that will host the central nagios instance

Step 4: Create a skeleton probe with a volunteer TP

Step 5: Advertise the skeleton probe and use it as template

Step 6: Integrate the probes into the central NAGIOS system

Scenario5 meetings

Scenario5 Actions

Further Resources
