The wiki is deprecated and due to be decommissioned by the end of September 2022.
The content is being migrated to other platforms; new updates will be ignored and lost.
If needed, you can contact the EGI SDIS team at operations @ egi.eu.

Difference between revisions of "Operational tools information"

From EGIWiki
* [https://forge.in2p3.fr/projects/show/opsportaluser Bug/task tracking system]
* [https://cvs.in2p3.fr/operations-portal/package/installation-guide.pdf?revision=HEAD Installation of a Dashboard Regional Instance]
=== Service Availability Monitoring ===
The Service Availability Monitoring (SAM) system is used to monitor the resources within the production infrastructure. SAM monitoring data is used to calculate the availability and reliability of grid sites.
It includes the following components:
* probes: a test execution framework (based on the open source monitoring framework Nagios) and the Nagios Configuration Generator (NCG)
* the Aggregated Topology Provider (ATP), the Metrics Description Database (MDDB), and the Metrics Results Database (MRDB)
* the message bus to publish results and a programmatic interface
* the visualization portal (MyEGI).
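SAM probes run inside a framework based on Nagios, so a probe reports its result the way any Nagios plugin does: one status line on stdout plus an exit code (0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN). The following is a minimal sketch of that convention only, not an actual SAM probe; the response-time thresholds and the `check` callable are hypothetical placeholders.

```python
import time

# Standard Nagios plugin exit codes (the convention SAM probes build on).
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
STATUS_NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def evaluate(response_seconds, warn=2.0, crit=5.0):
    """Map a measured response time to a Nagios status code and message.

    The warn/crit thresholds are hypothetical values chosen for illustration.
    """
    if response_seconds is None:
        return UNKNOWN, "no measurement available"
    if response_seconds >= crit:
        return CRITICAL, f"response time {response_seconds:.2f}s >= {crit}s"
    if response_seconds >= warn:
        return WARNING, f"response time {response_seconds:.2f}s >= {warn}s"
    return OK, f"response time {response_seconds:.2f}s"


def run_probe(check):
    """Time a (hypothetical) check callable; any exception counts as CRITICAL."""
    start = time.monotonic()
    try:
        check()
    except Exception as exc:
        return CRITICAL, f"check failed: {exc}"
    return evaluate(time.monotonic() - start)


def main():
    """Print the Nagios status line and return the exit code to report."""
    code, message = run_probe(lambda: None)
    print(f"{STATUS_NAMES[code]} - {message}")
    return code
```

Run as a real plugin, the script would end with `sys.exit(main())` so that Nagios sees the exit code; results gathered this way are what SAM then publishes on the message bus.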
'''Main links:'''
* [[SAM Instances]]
* NEW! [https://tomtools.cern.ch/confluence/display/SAMDOC/Grid+probes Grid probes] from org.SAM package
* [[EMI Nagios probes]]
==== Documentation ====
'''Installation instructions'''
* [https://tomtools.cern.ch/confluence/display/SAMDOC/SAM+Installation+Guide Installation instructions (new Confluence page)]
* [https://twiki.cern.ch/twiki/bin/view/EGEE/GridMonitoringNcgYaim Nagios and NCG YAIM-based installation instructions (old page with YAIM variable definitions)]
* [https://tomtools.cern.ch/confluence/display/SAMDOC/Service+reference+card+-+egee-NAGIOS SAM/Nagios reference card for site managers]
* [https://tomtools.cern.ch/confluence/display/SAMDOC/SAM+Administrators+FAQ SAM Administrators FAQ]
* [https://tomtools.cern.ch/confluence/display/SAM/Setting+Nagios+to+monitor+uncertified+sites Setting Nagios to monitor uncertified sites]
'''Tests list'''
* [[SAM Tests]]
'''Tools information pages:'''
* '''MyEGI'''
** [https://tomtools.cern.ch/confluence/display/SAM/MyEGI/ MyEGI documentation]
** [https://tomtools.cern.ch/confluence/display/SAM/MyEGI+Web+Services+Specification MyEGI Web Services Specification]
<!-- * [https://twiki.cern.ch/twiki/bin/view/LCG/SAMProbesMetrics SAM Probes and Metrics] -->
* '''NCG''':
** [https://twiki.cern.ch/twiki/bin/view/EGEE/GridMonitoringNcgOverview NCG Component Overview]
** [https://twiki.cern.ch/twiki/bin/view/EGEE/GridMonitoringNcgRecipes Grid monitoring-specific NCG recipes]
<!-- obsoleted * [https://twiki.cern.ch/twiki/bin/view/EGEE/MyEGEE MyEGEE Documentation]-->
* [https://twiki.cern.ch/twiki/bin/view/LCG/ATP Aggregated Topology Provider] (ATP)
* [https://tomtools.cern.ch/jira JIRA SAM project tracking system]
'''Procedures'''
* [https://twiki.cern.ch/twiki/bin/view/EGEE/ValidateROCNagios Validate ROC or NGI Nagios Procedures]
* [[Procedure for adding new probes to SAM release]]
* [[Procedure for setting Nagios test an Availability test|Procedure for setting a Nagios test as an Availability test]]
'''Resources'''
* [https://twiki.cern.ch/twiki/bin/view/EGEE/MultiLevelMonitoringOverview Multi Level Monitoring Overview]
* [https://twiki.cern.ch/twiki/pub/LCG/GridView/Gridview_Service_Availability_Computation.pdf A/R algorithms]
* [https://twiki.cern.ch/twiki/bin/view/EGEE/ExternalROCNagios Deployed ROC and NGI Nagios]
* [https://twiki.cern.ch/twiki/bin/view/EGEE/OAT_EGEE_III Main EGEE OAT wiki]
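SAM results feed the availability and reliability (A/R) figures described in the A/R algorithms document linked above. A commonly used form of the two definitions is availability = up / (total − unknown) and reliability = up / (total − scheduled downtime − unknown), so planned maintenance lowers availability but not reliability. The sketch below assumes that form and assumes the input is already reduced to hours per state; consult the linked document for the exact status aggregation rules.

```python
from dataclasses import dataclass


@dataclass
class StatusDurations:
    """Hours a site spent in each state over a reporting period (hypothetical input)."""
    up: float
    down: float            # unscheduled downtime or failing critical tests
    scheduled_down: float  # declared scheduled downtime
    unknown: float         # no test results available

    @property
    def total(self):
        return self.up + self.down + self.scheduled_down + self.unknown


def availability(d):
    """Fraction of time up, ignoring periods with unknown status."""
    known = d.total - d.unknown
    return d.up / known if known > 0 else None


def reliability(d):
    """Like availability, but scheduled downtime is also excluded from the
    denominator, so planned maintenance does not count against the site."""
    known = d.total - d.scheduled_down - d.unknown
    return d.up / known if known > 0 else None
```

For a 720-hour month with 600 h up, 40 h unscheduled down, 60 h scheduled down and 20 h unknown, availability is 600/700 ≈ 0.857 and reliability is 600/640 = 0.9375.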


=== GOCDB ===

Revision as of 15:20, 11 February 2011

The Operation tools page provides information about operation tools available in EGI.

Quick links

Operations portal: https://operations-portal.egi.eu/ (and the old CIC portal)
Service Availability Monitoring: SAM Instances
GOCDB: https://goc.egi.eu/
GGUS: https://gus.fzk.de/pages/home.php
Accounting portal: http://accounting.egi.eu/
Metrics portal: http://metrics.egi.eu/
Gstat: http://gstat.egi.eu/ and WLCG Gstat
GridView: http://gridview.cern.ch/GRIDVIEW/same_index.php
Network monitoring: Network

Deployment plans

Tools

Individual operation tools are described in the sections below. Each tool is currently hosted at a different address; in the future, all tools will be integrated into a single Operations portal.

Operations portal

The operations portal consists of web pages providing information to various actors (NGI Operations Centres, VO managers, etc.) along with related facilities, such as the VO registration tool, the broadcast and downtime system, the periodic operations report submission system and the regional dashboard. The programme of work includes tool maintenance (bug fixing and enhancements to the failover configuration).

Main links:

Documentation

GOCDB

The Grid Configuration Database (GOCDB) contains general information about the sites participating in the production Grid. It is accessed by all project actors (end users, site managers, NGI managers, support teams, VO managers), by other tools, and by third-party middleware to obtain the Grid topology. The portal has a single central installation, but a regional package will be developed and deployed by interested NGIs.
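Tools that read the Grid topology from GOCDB typically query its programmatic interface and parse an XML response. The sketch below groups sites by NGI from such a response; the `<SITE>` element and its `NAME` and `ROC` attributes, as well as the site names, are assumptions made for illustration, so check the GOCDB interface documentation for the real schema before relying on it.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical, trimmed response; element names, attributes and values are made up.
SAMPLE = """<results>
  <SITE NAME="SITE-A" ROC="NGI_XX" COUNTRY="Italy"/>
  <SITE NAME="SITE-B" ROC="NGI_XX" COUNTRY="Italy"/>
  <SITE NAME="SITE-C" ROC="NGI_YY" COUNTRY="France"/>
</results>"""


def sites_by_ngi(xml_text):
    """Return {ngi_name: [site_name, ...]} in document order."""
    root = ET.fromstring(xml_text)
    topology = defaultdict(list)
    for site in root.iter("SITE"):
        topology[site.get("ROC")].append(site.get("NAME"))
    return dict(topology)
```

`sites_by_ngi(SAMPLE)` returns `{"NGI_XX": ["SITE-A", "SITE-B"], "NGI_YY": ["SITE-C"]}`.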

Main links:

Documentation

GGUS

The Global Grid User Support (GGUS) system is the primary means by which users request support when using the grid, and it is the main support access point for the EGI project. GGUS creates a trouble ticket to record each request and tracks the ticket from creation through to resolution. Users can submit a request in two ways: via email or via the web interface.
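A ticket's path from creation to solution can be modelled as a small state machine. The sketch below is a simplified, hypothetical lifecycle: the status names and allowed transitions are illustrative assumptions, not the actual GGUS workflow, which has more states. It only shows the idea that each ticket records where it came from (email or web) and may only move along permitted transitions.

```python
from enum import Enum


class Status(Enum):
    NEW = "new"
    ASSIGNED = "assigned"
    IN_PROGRESS = "in progress"
    WAITING_FOR_REPLY = "waiting for reply"
    SOLVED = "solved"
    VERIFIED = "verified"


# Hypothetical transition table for illustration only.
TRANSITIONS = {
    Status.NEW: {Status.ASSIGNED},
    Status.ASSIGNED: {Status.IN_PROGRESS},
    Status.IN_PROGRESS: {Status.WAITING_FOR_REPLY, Status.SOLVED},
    Status.WAITING_FOR_REPLY: {Status.IN_PROGRESS},
    Status.SOLVED: {Status.VERIFIED, Status.IN_PROGRESS},  # reopened if the user disagrees
    Status.VERIFIED: set(),
}

CHANNELS = {"email", "web"}  # the two submission routes named above


class Ticket:
    def __init__(self, ticket_id, channel, summary):
        if channel not in CHANNELS:
            raise ValueError(f"unsupported channel: {channel}")
        self.ticket_id = ticket_id
        self.channel = channel
        self.summary = summary
        self.status = Status.NEW
        self.history = [Status.NEW]  # full trail from creation onwards

    def move_to(self, new_status):
        """Advance the ticket, rejecting transitions the table does not allow."""
        if new_status not in TRANSITIONS[self.status]:
            raise ValueError(f"{self.status.value} -> {new_status.value} not allowed")
        self.status = new_status
        self.history.append(new_status)
```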

Main links:

Documentation

Accounting portal

The accounting infrastructure is a complex system that involves various sensors in different regions, all publishing data to a central repository. The data is processed, summarized and displayed in the accounting portal, which acts as a common interface to the different accounting record providers, presents a homogeneous view of the gathered data, and offers user-friendly access for understanding resource utilization.
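The summarizing step can be pictured as folding usage records from many regional sensors into totals per site and VO. This is a minimal sketch under assumed inputs: the `UsageRecord` fields and the choice of CPU hours as the measure are hypothetical and do not reflect the real accounting record format.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class UsageRecord:
    """One job's usage as published by a regional sensor (fields are hypothetical)."""
    region: str
    site: str
    vo: str
    cpu_hours: float


def summarize(records):
    """Sum CPU hours per (site, VO), whichever regional sensor sent the record."""
    totals = defaultdict(float)
    for r in records:
        totals[(r.site, r.vo)] += r.cpu_hours
    return dict(totals)
```

Keying on (site, VO) rather than on region is what gives the homogeneous view: records from different providers end up in the same bucket.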

Main links:

Documentation

Metrics portal

The Metrics Portal displays a set of metrics that will be used to monitor the performance of the infrastructure and the project, and to track their changes over time. The portal automatically collects the required data and calculates these metrics before displaying them. It aggregates information from different sources, such as GOCDB, GGUS and GridView, using connectors supplied by the data providers. These connectors translate the information gathered from diverse producers and store it in a local database.
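The connector pattern described above can be sketched as one small adapter per producer, each emitting rows in a shared format that are written to a local database. Everything here is hypothetical: the connector classes, the `(source, name, value)` row shape and the SQLite table stand in for whatever the Metrics Portal actually uses.

```python
import sqlite3
from typing import Iterable, Iterator, Protocol, Tuple

Metric = Tuple[str, str, float]  # (source, metric name, value)


class Connector(Protocol):
    """Hypothetical interface: each connector understands one producer's format."""
    def fetch(self) -> Iterable[Metric]: ...


class TicketCountConnector:
    """Hypothetical connector turning raw ticket data into an 'open tickets' metric."""
    def __init__(self, tickets):
        self.tickets = tickets

    def fetch(self) -> Iterator[Metric]:
        open_count = sum(1 for t in self.tickets if t["status"] != "solved")
        yield ("GGUS", "open_tickets", float(open_count))


class SiteCountConnector:
    """Hypothetical connector counting registered sites."""
    def __init__(self, sites):
        self.sites = sites

    def fetch(self) -> Iterator[Metric]:
        yield ("GOCDB", "registered_sites", float(len(self.sites)))


def collect(connectors, conn):
    """Store every connector's translated rows in one local table."""
    conn.execute("CREATE TABLE IF NOT EXISTS metrics (source TEXT, name TEXT, value REAL)")
    for c in connectors:
        conn.executemany("INSERT INTO metrics VALUES (?, ?, ?)", c.fetch())
    conn.commit()
```

Adding a new data source then means writing one more connector; the storage and display side stays unchanged.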

Main links:

Documentation

Network monitoring

EGI.eu coordinates a lightweight end-to-end network performance monitoring infrastructure and provides configuration support for it. These tools are used to troubleshoot network connectivity issues, such as poor end-to-end network performance affecting Grid data transfers.
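As a rough illustration of the kind of end-to-end measurement involved, the sketch below times TCP connection setup to a remote service, a quick first check when transfers to a site are slow. It is a generic example built on the Python standard library, not the EGI-coordinated monitoring tooling, and the host and port are whatever service you want to test.

```python
import socket
import time


def tcp_connect_time(host, port, timeout=5.0):
    """Return the seconds taken to open a TCP connection, or None if it fails.

    Connection setup time approximates one network round trip plus server
    accept latency; it does not measure throughput.
    """
    start = time.perf_counter()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            pass  # connection established; close it immediately
    except OSError:  # refused, unreachable, or timed out
        return None
    return time.perf_counter() - start
```

A `None` result or a value far above the usual baseline for a given endpoint is a hint to look further with dedicated network monitoring tools.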

Main links:

Documentation

External tools

Gstat

The main aim of GStat is to display information about grid services, the grid information system itself, and related metrics. GStat provides a way to visualize a grid infrastructure from an operational perspective, based on information found in the grid information system (BDII).
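The BDII is an LDAP server (a top-level BDII is conventionally queried on port 2170 under the base `o=grid`), and its entries use GLUE schema attributes such as `GlueCEStateRunningJobs`. The sketch below shows how a GStat-like view could derive job totals from an LDIF dump of such entries. The LDIF excerpt is made up, and the parser is deliberately simplified: it ignores LDIF line folding and base64-encoded values, which a real client would need to handle.

```python
# Made-up LDIF excerpt in the style of GLUE 1.3 computing element entries.
SAMPLE_LDIF = """dn: GlueCEUniqueID=ce01.example.org:8443/cream-pbs-short,mds-vo-name=SITE-A,mds-vo-name=local,o=grid
GlueCEUniqueID: ce01.example.org:8443/cream-pbs-short
GlueCEStateRunningJobs: 12
GlueCEStateWaitingJobs: 3

dn: GlueCEUniqueID=ce02.example.org:8443/cream-pbs-long,mds-vo-name=SITE-B,mds-vo-name=local,o=grid
GlueCEUniqueID: ce02.example.org:8443/cream-pbs-long
GlueCEStateRunningJobs: 30
GlueCEStateWaitingJobs: 0
"""


def parse_ldif(text):
    """Split simple LDIF into entries of {attribute: [values]} (no folding/base64)."""
    entries = []
    for block in text.strip().split("\n\n"):
        entry = {}
        for line in block.splitlines():
            key, _, value = line.partition(": ")
            entry.setdefault(key, []).append(value)
        entries.append(entry)
    return entries


def job_totals(entries):
    """Sum running and waiting jobs over all entries that publish them."""
    running = sum(int(e["GlueCEStateRunningJobs"][0])
                  for e in entries if "GlueCEStateRunningJobs" in e)
    waiting = sum(int(e["GlueCEStateWaitingJobs"][0])
                  for e in entries if "GlueCEStateWaitingJobs" in e)
    return running, waiting
```

`job_totals(parse_ldif(SAMPLE_LDIF))` returns `(42, 3)`.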

Main links:

Documentation

GridView

GridView is a monitoring and visualization tool being developed to provide a high-level view of various functional aspects of the Worldwide LHC Computing Grid (WLCG). It currently shows statistics on data transfers, FTS file transfers, running jobs, and service availability for the WLCG.

Main links:

Documentation


Tferrari 12:15, 4 February 2011 (UTC)