The wiki is deprecated and due to be decommissioned by the end of September 2022.
The content is being migrated to other platforms; new updates will be ignored and lost.
If needed, you can get in touch with the EGI SDIS team using operations @ egi.eu.

Difference between revisions of "EGI-InSPIRE:SA1 EGI Global tasks assessments MS108"

 
Latest revision as of 16:17, 6 January 2015




EGI Global Tasks

Human Services

Operations Management Board Coordination

Partner: EGI.eu

The Operations Management Board (OMB) drives future developments in the operations area by making sure that the infrastructure operations evolve to support the integration of new resources such as desktop grids, cloud computing and virtualisation, and high performance computing resources. It does this by providing management and developing policies and procedures for the operational services that are integrated into the production infrastructure through a set of distributed management and product teams.

Operations Support

EGI.eu coordinates and supervises the operations and network support activities provided by the individual NGIs to ensure that operational issues are properly handled at both Resource Centre and NGI level. It is also responsible for handling Resource Centre suspensions in case of operational issues.

Coordination of Grid Oversight

EGI operations oversight (COD)

Partner: NCF - coordinator (Fill assessment)

Coordination of network support

Partner: GARR (Fill assessment)

Coordination of Operational interoperation between NGIs and DCIs

EGI coordinates the integration of heterogeneous middleware stacks and Distributed Computing Infrastructures with the EGI operational infrastructures, such as accounting, monitoring, management and support.

Partner: KTH (Fill assessment)

Coordination of documentation

Coordination of the maintenance and development of operational documentation, procedures and best practices.

Partner: CSC (Fill assessment)

Ticket Processing Management

First-line support for the user community and the operations community.

Partner: KIT - coordinator (Fill assessment)

Requirements Gathering

This activity is part of the OMB duties.

Partner: EGI.eu (Fill assessment)

Security

Security vulnerabilities and risks presented by e-Infrastructures provide a rationale for coordination amongst the EGI participants at various levels. Central coordination groups ensure policies, operational security, and maintenance to guarantee secure access to users. In addition, security and incident response is provided through the EGI Computer Security and Incident Response Team by coordinating activity at the sites across the infrastructure. This coordination ensures that common policies are followed by providing services such as security monitoring, training and dissemination with the goal of improving the response to incidents (e.g. security drills).

Partner: STFC (Fill assessment)

Availability/reliability statistics

This task includes the validation and distribution of monthly availability and reliability statistics, and the coordination of the evolution of the EGI OLA framework.

Partner: AUTH (Fill assessment)

Infrastructure Services

Software Rollout

Updates of deployed software need to be gradually adopted in production after internal verification. This process is implemented in EGI through staged rollout, i.e. through the early deployment of a new component by a selected list of candidate Resource Centres. The successful verification of a new component is a precondition for declaring the software ready for deployment. Given the scale of the EGI infrastructure, this process requires careful coordination to ensure that every new capability is verified by a representative pool of candidate sites, to supervise the responsiveness of the candidate sites and ensure that the staged rollout progresses well without introducing unnecessary delays, and to review the reports produced. It also ensures the planning of resources according to the foreseen release schedules from the Technology Providers. EGI.eu coordination is necessary to ensure a successful interoperation of the various stakeholders: Resource Centres, Technology Providers, the EGI.eu Technical Manager and the EGI repository managers.

This activity includes:

  • Definition and adoption of a workflow to automate software deployment
  • Coordination of the staged rollout activities carried out by the NGIs
  • Liaison with the UMD team (EGI-InSPIRE SA2)

Partner: LIP (Fill assessment)

Monitoring

Description: A distributed monitoring framework is necessary to continuously test the level of functionality delivered by each service node instance in the production Resource Centres, to generate alarms and tickets in case of critical failures, to compute monthly availability and reliability statistics, and to monitor and troubleshoot network problems. The Monitoring Infrastructure is a distributed service based on Nagios and messaging. The central services, operated by EGI.eu, include systems such as the MyEGI portal for the visualisation of information, and a set of databases for the persistent storage of information about test results, availability statistics, monitoring profiles and aggregated topology information. The central services need to interact with the local monitoring infrastructures operated by the NGIs. The central monitoring services are critical and need to deliver high availability.

  • Central SAM monitoring services. Partner: CERN (Fill assessment)
  • Brokers network. Partner: GRNET and SRCE (Fill assessment)
  • Central Network monitoring tools. Partner: GARR (Fill assessment)
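As a rough illustration of the monthly statistics mentioned above, the following sketch derives availability and reliability figures from a list of hourly status samples for one service instance. The status labels, the hourly sampling and the two formulas (availability as uptime over known time, reliability additionally excluding scheduled downtime) are assumptions made for this example; they are not taken from this page or from the actual SAM implementation.

```python
from collections import Counter

# Hypothetical status labels for one service instance, one sample per hour.
UP, DOWN, SCHEDULED, UNKNOWN = "up", "down", "scheduled", "unknown"


def availability_reliability(samples):
    """Return (availability, reliability) as fractions in [0, 1].

    Assumed definitions (illustrative only):
      availability = up / (total - unknown)
      reliability  = up / (total - unknown - scheduled)
    """
    counts = Counter(samples)
    known = len(samples) - counts[UNKNOWN]
    unscheduled = known - counts[SCHEDULED]
    # Guard against empty or fully unknown periods.
    availability = counts[UP] / known if known else 0.0
    reliability = counts[UP] / unscheduled if unscheduled else 0.0
    return availability, reliability


if __name__ == "__main__":
    # 720 hourly samples (a 30-day month): 660 h up, 30 h unscheduled
    # failures, 10 h scheduled downtime, 20 h with unknown status.
    month = [UP] * 660 + [DOWN] * 30 + [SCHEDULED] * 10 + [UNKNOWN] * 20
    a, r = availability_reliability(month)
    print(f"availability={a:.3f} reliability={r:.3f}")
```

Under these assumed definitions, scheduled downtime lowers availability but not reliability, which is why the two figures are reported separately.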

Accounting

The EGI Accounting Infrastructure is distributed. At a central level it includes the repositories for the persistent storage of usage records, and a portal for the visualisation of accounting information. The central databases are populated through individual usage records published by the Resource Centres, or through the publication of summarised usage records. The Accounting Infrastructure is essential in a service-oriented business model to record usage information. Accounting data needs to be validated and regularly published centrally.

  • Central accounting repositories. Partner: STFC (Fill assessment)
  • Central accounting portal. Partner: CESGA (Fill assessment)


Security Monitoring

The objective of a Security Infrastructure is to protect the infrastructure from threats such as exploitable software vulnerabilities, misuse by authorised users and resource "theft", while allowing the information, resources and services to remain accessible and productive for their intended users. A dedicated set of tools and services helps reduce these risks: monitoring of individual Resource Centres (based on Nagios and Pakiti), a central security dashboard that allows sites, NGIs and EGI Computer Security Incident Response Teams to access security alerts in a controlled manner, and a ticketing system to support coordination efforts.

  • CSIRT Nagios server. Partner: GRNET (Fill assessment)
  • CSIRT Pakiti. Partner: CESNET (Fill assessment)


Configuration Repository (GOCDB)

EGI relies on a central database (GOCDB) to record static information about different entities such as the Operations Centres, the Resource Centres, and the service instances. It also provides contact, role and status information. GOCDB is a source of information for many other operational tools, such as the broadcast tool, the Aggregated Topology Provider, etc.

Partner: STFC (Fill assessment)

Operations Portal

EGI.eu provides a central portal for the operations community that offers a bundle of different capabilities, such as the broadcast tool, VO management facilities, and a dashboard for grid operators that is used to display information about failing monitoring probes and to open tickets to the Resource Centres affected. The dashboard also supports the central grid oversight activities. It is fully interfaced with the EGI Helpdesk and the monitoring system through message passing. It is a critical component as it is used by all EGI Operations Centres to provide support to their respective Resource Centres.

Partner: IN2P3 (Fill assessment)

Helpdesk

EGI provides support to users and operators through a distributed helpdesk with central coordination (GGUS). The central helpdesk provides a single interface for support. The central system is interfaced to a variety of other ticketing systems at the NGI level in order to allow a bi-directional exchange of tickets (for example, those opened locally can be passed to the central instance or other areas, while user and operational problem tickets can be opened centrally and subsequently routed to the NGI local support infrastructures).

Partner: KIT (Fill assessment)

Core Services

Auxiliary core services are needed for the smooth running of the Infrastructure Services. Examples of such services are the VOMS service and VO membership management for infrastructural VOs (DTEAM, OPS), the provisioning of middleware services needed by the monitoring infrastructure (e.g. top-BDII and WMS), the catch-all CA and other catch-all core services to support small user communities (central catalogues, workflow schedulers, authentication services).

Partner: GRNET (Fill assessment)