
EGI-InSPIRE:Tasks-Y3

From EGIWiki
Revision as of 22:06, 24 December 2014 by Krakow



Introduction

This page describes the EGI global and international tasks, updated according to EGI-InSPIRE milestones MS115 and MS116. The services are grouped into the following categories: ‘Governance and Management’ for coordination; ‘Community Engagement’ for outreach and support of user community interaction; ‘Platforms’ for technical activities; and ‘Operations and Tools’ for infrastructure services and tools.

Global Tasks

The EGI Global Tasks are the responsibility of the EGI.eu organisation. They are undertaken by EGI.eu staff in Amsterdam, or by staff based at participant and associated participant institutions who are funded through EGI.eu, the NGIs and the EC (through the EGI-InSPIRE project), for the benefit and use of all.

Governance

The strategic direction of the EGI ecosystem and the collaboration between the individual activities are the responsibility of the EGI Council. The Council also acts as the senior decision-making and supervisory authority of EGI.eu and as the organisation representing the EGI collaboration.

Council

The EGI Council is the senior governance body of the EGI collaboration, which is established through the EGI.eu foundation based in Amsterdam. The EGI Council meets regularly through the year, and as required by its statutes [R37] and Terms of Reference [R38] (standing orders), to govern the strategic development and activity of EGI. It is required to approve the annual accounts and budget for EGI.eu, which includes the fees paid by the participants to cover the running costs of EGI.eu and the EGI Global Tasks delivered within the community.

Executive Board

The EGI.eu Executive Board comprises six elected representatives from the EGI Council and the EGI Council Chair, who also chairs the EGI.eu Executive Board meetings. The EGI.eu Director and Deputy Director are permanent observers. The Board meets every two weeks by phone, with quarterly face-to-face meetings to prepare the annual budget, review the annual accounts, and supervise and advise the Director on the day-to-day running of the organisation. Additional face-to-face meetings are held during the year to discuss the agenda for EGI Council meetings and any other urgent issues.

Strategic Planning and Policy

Following the first project review, the name of this task was expanded from ‘Policy Development’ to ‘Strategic Planning and Policy Support’. The new task description includes activities at the strategic level that were started during the first year and are now structured to better reflect team activities and support the EGI strategy development. This activity is led by the EGI.eu Strategy and Policy Team (SPT) and encompasses the development of strategies and policies, both within and external to EGI.eu, relating to governance, standardisation and integration with other infrastructures. The team also develops EGI’s strategic response and alignment to EU policy and EC initiatives, such as EU2020, the Digital Agenda and the online ERA, and supports the boards and committees within EGI that draft policies and procedures for evolving the technical infrastructure. The main objectives are to analyse strategic themes and trends globally and in Europe, and to produce documents and reports that inform the EGI management bodies and wider community in support of the decision-making process; to liaise with other projects and organisations, including industry and international policy bodies, to establish collaboration agreements and monitor progress; to organise meetings and workshops on key themes that are strategic to EGI and attend relevant events and conferences; and to support the formulation and development of policies and procedures by the EGI policy groups (e.g. security, technology coordination, operations management).

Finance and Secretariat

An organisation needs a secretariat to support its governance functions, but also to support the community and the staff it employs. Within EGI.eu, support is provided during Council and Executive Board meetings, and community support is provided through a range of IT services to local staff and to the collaboration (e.g. website, wiki, meeting planner, mailing lists, document server, timesheet tool). In addition, the community organises two large meetings a year (the User/Community and Technical Forums) to continue building collaborations within EGI, and a number of additional workshops as required to support the community’s activities.

Technical Management

User Community Board (UCB)

The UCB is a forum in which representatives from self-organised virtual research communities (VRCs) meet on a regular basis to review and agree on the prioritisation of the emerging requirements for their use of EGI resources. The VRC model encourages researchers to identify and communicate with others in their field in order to capture the needs particular to their field of expertise and articulate them to EGI.

Operations Management Board (OMB)

The OMB drives future developments in the operations area by making sure that operations evolve with the needs of the community and by supporting the integration of new resources and middleware platforms (e.g. desktop grids, virtual machines, high performance computing). It does this by providing coordination and management, and by developing policies and procedures for the operational services that are integrated into the production infrastructure through the operational support of distributed operations teams. Coordination of software deployment and feedback gathering is delivered through fortnightly operations meetings.

Technical Coordination Board (TCB)

The TCB coordinates the interactions that EGI has with its technology providers. This involves combining the prioritised requirements from the operations and end-user communities into a technology roadmap. Elements from this roadmap are sourced from technology providers within the EGI community into the Unified Middleware Distribution (UMD). Before their inclusion into UMD, these components are verified against the original requirements to ensure that these have been met.

Technology Roadmapping

Maintaining the technology roadmap for EGI requires the collection, prioritisation and analysis of requirements from the user and operations communities. From these requirements, new features are sourced from technology providers currently known to EGI, or from open-source or commercial technology providers. Components that come from within the EGI community, to provide bespoke functionality needed within the production infrastructure that cannot be sourced elsewhere, are captured within the UMD Roadmap. This continuously evolving document translates user requirements and technology evolution into a roadmap describing the functional aspects, release dates, maintenance support, acceptance criteria and dependencies for software components that are offered to the Resource Infrastructure Providers for installation.

Community Engagement

Marketing

This activity is coordinated by EGI.eu on behalf of the European NGIs and projects, and other international partners. The aim is to communicate the work of EGI and its user communities. Target audiences for the dissemination outputs include new and existing user communities, journalists, the general public, grid research and standards communities, resource providers, collaborating projects, decision makers and governmental representatives. Means of dissemination include the project website, wiki site, materials and publications, media and public relations, social media channels, and attendance at events in order to market EGI to new users.

Community Outreach

Regularly bringing EGI stakeholders together is vital in enabling collaborations within the community and provides an opportunity to showcase EGI's achievements internally and to new user communities. The Community Outreach team of EGI.eu organises two community-wide meetings a year and, in collaboration with NGIs and user communities, several small, targeted events and workshops.

Technical Outreach to New Communities

Converting a potential new user community to being an actual user community requires substantial effort and planning at the European and national level. This may include identifying which resources will be used within the production infrastructure, ensuring the integration of new resources into EGI, porting applications to an EGI platform, deploying new services to meet the needs of new communities, training new communities, etc. A team of three at EGI.eu provides coordination for this activity and works with the NGI International Liaisons and their national partners in VT projects to ensure that a coordinated, systematic and strategic approach is taken.

Community Technical Services

NGI Coordination

A registry system is provided in which leads identified within potential new communities are registered by the NGIs and by EGI.eu staff, and in which the main discussions the EGI community has with these leads can be recorded. These records enable the community to identify topics for technical engagement with new communities.

Software Acceptance Criteria

Based on the prioritised requirements obtained from the operations and end-user communities, software acceptance criteria are defined to capture the key functional and non-functional features expected from the delivered technologies. The Quality Criteria are reviewed regularly on the basis of collected feedback from sources such as peer reviews, Software Verification, Staged Rollout, and infrastructure incidents collected by the DMSU.

Software Verification

Before software is published for production use in the UMD section of the EGI Software Repository, it is verified against the published Quality Criteria, where applicable. Software Verification entails deploying the software in a controlled testbed and checking it against the functional requirements encoded in the Quality Criteria. Verification reports are written and published for any interested party to use as required.

Software Repository

The software repository provides the coordination needed by EGI for the release of software, e.g. the UMD, into production. Technology providers can contribute their software components to the repository, which manages the workflow as the software components are validated to ensure they meet the defined quality criteria and are then placed into staged rollout.
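The workflow described above can be pictured as a small state machine. The following is a minimal sketch only: the state names and the `advance` helper are hypothetical and chosen for illustration; they are not taken from the EGI Software Repository implementation.

```python
from enum import Enum


class State(Enum):
    # Hypothetical state names chosen for illustration.
    SUBMITTED = "submitted"
    VERIFICATION = "verification"
    STAGED_ROLLOUT = "staged_rollout"
    REJECTED = "rejected"


def advance(state: State, passed: bool = True) -> State:
    """Move a software component one step along the assumed workflow."""
    if state is State.SUBMITTED:
        # A contributed component is first queued for validation.
        return State.VERIFICATION
    if state is State.VERIFICATION:
        # Only components meeting the quality criteria reach staged rollout.
        return State.STAGED_ROLLOUT if passed else State.REJECTED
    # STAGED_ROLLOUT and REJECTED are terminal in this sketch.
    return state
```

Modelling the stages explicitly makes it clear that, in this sketch, a component cannot reach staged rollout without first passing validation.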

Application Database

The EGI Applications Database (AppDB) stores tailor-made computing applications for scientists, and grid application development tools for software developers. It embraces all scientific fields, from resources that simulate exotic excitation modes in physics to applications for complex protein sequence analysis. Storing pre-made applications and reusable tools means that scientists and grid application developers can achieve their goals with EGI in a shorter time. The aim of AppDB is twofold: 1) to inspire scientists and developers of DCI applications to use EGI and its resources, thanks to the immediate availability of the software that they need; and 2) to avoid duplication of effort across the user and user support communities.

Training Marketplace

The training services are aimed at supporting cooperation between trainers and users in different localities and projects by connecting the groups through the activities that are established within the NGIs and scientific clusters. The goal is to enable users to achieve better scientific performance when using EGI and to guide the establishment of self-sustainable user communities. The services provided include a list of training events, which allows trainers to advertise their training events and to be made aware of other training events being run within the community. The marketplace also includes a map of these training events, a repository of training materials and other resources, and a web gadget that can be used to embed customised instances of these services into different websites.

Core Services

Auxiliary core services are needed for the good running of Infrastructure Services. Examples of such services are the VOMS service and VO membership management for infrastructural VOs (DTEAM, OPS), the provisioning of middleware services needed by the monitoring infrastructure (e.g. top-BDII and WMS), and the catch-all CA.

Operations and Tools

Infrastructure Services and Tools

Message Broker Network

EGI provides a network of brokers as a common messaging infrastructure for the exchange of information between operational tools and other systems.

Monitoring

The Monitoring Infrastructure is a distributed service based on Nagios and messaging. The central services include systems such as the MyEGI portal for the visualisation of information, and a set of databases for the persistent storage of information about test results, availability statistics, monitoring profiles and aggregated topology information. The central services need to interact with the local monitoring infrastructures operated by the Resource Infrastructure Providers.

Service Availability Monitoring

The Service Availability Monitoring Infrastructure is a distributed service based on Nagios and messaging. The central services include systems such as the MyEGI portal for the visualisation of information, and a set of databases for the persistent storage of information about test results, availability statistics, monitoring profiles and aggregated topology information. The central services need to interact with the local monitoring infrastructures operated by the Resource Infrastructure Providers.

Security Monitoring

Security monitoring is an important part of security in a distributed infrastructure. One of the EGI CSIRT activities is to provide EGI, NGI and site security staff with tools and procedures to contain security incidents and to monitor sites for weaknesses that could lead to an incident. Tools have been, and continue to be, developed to allow monitoring at both Site and NGI level, as well as at EGI level by CSIRT members themselves. EGI CSIRT collects various pieces of information on the infrastructure, using security probes and sensors developed by EGI CSIRT members. Data collected by these probes (e.g. whether a site is running a vulnerable version of some software) is displayed on a visualisation tool, known as the Security Dashboard, to provide high-level overviews to staff at various levels according to their authorisation. This includes sufficient detail to allow staff to resolve any issues detected. Members of the EGI CSIRT can view all details and, if necessary, follow up with sites to help them address any security issue. The system also archives information to allow the evaluation of security trends. Further functions, such as security metrics and monthly or quarterly security reports, are being developed.

Network Monitoring

EGI is a highly distributed networked infrastructure of grid services that uses network connectivity for remote job submission, data transfer and data access. Tools are therefore needed for network troubleshooting and performance monitoring.

Operations Portal

EGI.eu provides a central portal for the operations community that offers a bundle of different capabilities, such as the broadcast tool, VO management facilities, a security dashboard and an operations dashboard that is used to display information about failing monitoring probes and to open tickets to the Resource Centres affected. The dashboard also supports the central grid oversight activities. It is fully interfaced with the EGI Helpdesk and the monitoring system through messaging. It is a critical component as it is used by all EGI Operations Centres to provide support to the respective Resource Centres.

Accounting

The EGI Accounting Infrastructure is distributed. At a central level it includes the repositories for the persistent storage of usage records, and a portal for the visualisation of accounting information. The central databases are populated through individual usage records published by the Resource Centres, or through the publication of summarised usage records. The Accounting Infrastructure is essential in a service-oriented business model to record usage information. Accounting data needs to be validated and regularly published centrally.
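The relationship between individual and summarised usage records can be illustrated with a short sketch. The record fields, the `UsageRecord` class and the `summarise` function below are hypothetical and greatly simplified; they do not reproduce the actual EGI accounting record format.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class UsageRecord:
    # Hypothetical fields; real usage records carry many more attributes.
    site: str
    vo: str
    month: str        # e.g. "2014-12"
    cpu_hours: float


def summarise(records):
    """Aggregate individual records into per-(site, VO, month) summaries."""
    totals = defaultdict(lambda: {"jobs": 0, "cpu_hours": 0.0})
    for record in records:
        key = (record.site, record.vo, record.month)
        totals[key]["jobs"] += 1
        totals[key]["cpu_hours"] += record.cpu_hours
    return dict(totals)
```

Publishing a summary of this kind instead of every individual record reduces the volume of data sent to the central repository, at the cost of per-job detail.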

Helpdesk

EGI provides support to users and operators through a distributed helpdesk with central coordination (GGUS). The central helpdesk provides a single interface for support. The central system is interfaced to a variety of other ticketing systems at the NGI level in order to allow a bi-directional exchange of tickets (for example, those opened locally can be passed to the central instance or other areas, while user and operational problem tickets can be opened centrally and subsequently routed to the NGI local support infrastructures).

GOCDB

EGI relies on a central registry (GOCDB) to record information about different entities such as the Operations Centres, the Resource Centres, service endpoints, and the contact information and roles of the people responsible for operations at different levels. GOCDB is a source of information for many other operational tools, such as the broadcast tool, the Aggregated Topology Provider, the Accounting Portal, etc.

Metrics Portal

The Metrics Portal is the tool for the registration of EGI-InSPIRE metrics.

Support

EGI.eu coordinates and supervises operations and network support activities provided by the individual NGIs to ensure that operational issues are properly handled at both Resource Centre and NGI level. It is also responsible for handling Resource Centre suspension in case of operational issues.

1st Level: Ticket Process Management

Through the EGI helpdesk, support issues are routed to NGI support teams. Some of these requests may relate to specific support units, but other issues relating to users’ use of the e-infrastructure will require human intervention from either an operational or a user support perspective.

2nd Level: Deployed Middleware Support Unit

The Deployed Middleware Support Unit (DMSU) provides technical support for incidents involving operational Grid middleware. When processing support tickets assigned by Ticket Process Management (TPM), the DMSU assesses whether changing the middleware configuration or deployment can mitigate the described incident. In conjunction with the 3rd level expert support provided by Technology Providers, the DMSU assesses whether the reported incident constitutes a persistent software problem that requires fixing through software update cycles. Occupying this pivotal position within the support infrastructure for Grid middleware, the DMSU is empowered to actively assign and maintain the prioritisation of patch development and publication in software updates.

Network Support

EGI provides network support for the resolution of end-to-end network performance issues.

Operations Management and Coordination

Operations Coordination

See OMB.

Grid Oversight (COD)

EGI.eu central Grid oversight activities are intended to supervise the activity performed locally by the Regional Operator on Duty (ROD) teams of the EGI Operations Centres. Central Grid Oversight assists existing ROD teams in user and operations support, checks the monthly performance delivered by Resource Centres and NGIs/EIROs, holds the responsibility for certifying new Operations Centres, and provides training to new ROD teams. The quality of the support work delivered by the ROD teams is measured through a ROD performance index that is computed on a monthly basis. Central Grid Oversight is responsible for taking appropriate action if metrics indicate that a ROD team is not functioning properly.

Availability/Reliability Management

Availability/Reliability Management is responsible for overseeing the monthly service levels delivered at different levels by Resource Centres, by Resource Infrastructure Providers and centrally by EGI.eu. In case of low performance, the service providers are generally contacted and asked to provide plans for improving their services. In case of extended underperformance, Resource Centres are suspended. This service is also responsible for producing updated performance reports when problems with the computations are reported.
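This page does not define how the monthly service levels are computed. As a hedged illustration, the sketch below assumes a common convention in which availability is uptime divided by the time with known status, and reliability additionally excludes scheduled downtime. The function names, parameters and the threshold passed to `underperforming` are all hypothetical.

```python
def availability(up: float, total: float, unknown: float = 0.0) -> float:
    """Fraction of the known-status time during which the service was up."""
    known = total - unknown
    return up / known if known > 0 else 0.0


def reliability(up: float, total: float,
                scheduled_down: float = 0.0, unknown: float = 0.0) -> float:
    """Like availability, but scheduled downtime is not counted against it."""
    expected = total - scheduled_down - unknown
    return up / expected if expected > 0 else 0.0


def underperforming(values: dict, threshold: float) -> list:
    """Return the names whose monthly value is below the given threshold."""
    return [name for name, value in sorted(values.items()) if value < threshold]
```

For example, under these assumptions a Resource Centre that was up for 600 of 720 hours, with the remaining 120 hours declared as scheduled downtime, would have a reliability of 1.0 but a lower availability.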

Coordination of Operations Security

Security is recognised as an important aspect of e-Infrastructures and requires co-ordination between the EGI participants at various levels, in particular for the prevention and handling of incidents. Various EGI central groups carry out this co-ordination role. The Security Policy Group (SPG) is responsible for developing security policies. The Software Vulnerability Group (SVG) aims to eliminate existing software vulnerabilities from the deployed infrastructure and prevent the introduction of new ones. The EGI Computer Security Incident Response Team (CSIRT) is responsible for co-ordinating operational security in the areas of security incident response, security monitoring, and security training and dissemination, as well as for carrying out security drills (cyber-security exercises) to improve the response to future incidents.

Coordination of Interoperation

EGI.eu coordination is necessary to ensure a successful interoperation of the various stakeholders: Resource Centres, Technology Providers, the EGI.eu Technical Manager and the EGI repository managers.

Coordination of Staged Rollout and Related Support Tools

New technology releases made available to EGI are verified to ensure that they meet the original requirements and are subsequently deployed gradually in the production environment (staged rollout). Verification takes place by deploying the software and assessing it against the published criteria. Updates of deployed software need to be gradually adopted in production after internal verification. This process is implemented in EGI through staged rollout, i.e. through the early deployment of a new component by a selected list of candidate Resource Centres. The successful verification of a new component is a precondition for declaring the software ready for deployment. Given the scale of EGI, change management requires careful coordination to ensure that every new capability is verified by a representative pool of candidate sites, to supervise the responsiveness of the candidate sites and ensure that the staged rollout progresses well without introducing unnecessary delays, and to review the reports produced. It also ensures the planning of resources according to the foreseen release schedules from the Technology Providers.

Coordination of Requirements Gathering

A transparent requirement processing system is needed so that the user or operations community can submit requirements or share them within the whole EGI community. All of these requirements are investigated, analysed and prioritised within a transparent and structured process. The prioritised requirements can then be acted upon by other parties as appropriate. Depending on the domain and potential impact, identified needs might be met by the User Support Teams or Operations within EGI, or by technology providers external to EGI, be they community-based, project-based or commercial. The progress and outcomes of whichever solutions are adopted will be fed back to the requesting community on a regular basis.

Coordination of Documentation

EGI.eu is responsible for the maintenance and development of operational documentation, procedures, best practices, etc. EGI.eu coordinates this community activity, which is needed to connect partners with specialised expertise.

NGI International Tasks

The NGI International Tasks are the responsibility of the individual NGI, which must deliver each task to a satisfactory level. They are funded through the NGI's own budget, currently with a contribution from the EC through the EGI-InSPIRE project. EGI.eu staff coordinate the staff undertaking the NGI International Tasks but have no managerial control over them.

Community Engagement

NGI International Liaison

The role of the NGI International Liaisons is new to the non-operational activities in EGI, but replicates a similar model that has proven to be successful in the EGI operations community. It recognises the complexity and diversity of individual NGIs, yet also the need for each NGI to be encapsulated through a management structure for the purpose of providing consistent and integrated Europe-wide activity. It is not necessarily the role of the NGI International Liaisons to undertake any of the following tasks themselves, but instead to make sure the appropriate individuals or teams within the NGI respond to any particular non-operational issue or activity that is requested. These issues or activities may include matters of policy, strategy, dissemination, training, outreach, events, etc., but will have a focus on new communities and sustainability. The NGI International Liaisons may need to access other technical expertise within the NGI to carry out these tasks, using resources from SA1 and from outside the project. One of the key functions of the NGI International Liaisons is to identify technical expertise within an individual NGI that can be brought to bear on issues of importance to the EGI community as a whole.

Distributed Competency Centre

Distributed competency centres have been established across the NGIs by providing a web-based registry of the human skills and technical assets that reside within the NGIs and can be accessed by the EGI community. The recorded skills and assets focus on the NA2 tasks of communication and marketing, strategic planning and policy support, community outreach, and technical outreach to new communities. Each NGI records the effort it contributes locally to activities undertaken as part of the EGI community in conjunction with the other NA2 tasks. Such activities include helping new communities integrate their applications into the infrastructure through exemplar ‘proofs of concept’. These could involve a workshop to establish community priorities (through TNA2.4); technical effort (porting new applications to the infrastructure, or integrating the applications into portals, workflow engines or other services, using effort contributed by the NGIs and coordinated by TNA2.5); communication and marketing (using skills in TNA2.2) to tell the target communities about the exemplar; and possibly updating policies (from TNA2.3) to establish new modes of operation within the production infrastructure.

Operations

NGI Activity Management

NGIs are responsible for coordinating internal operational activities and for participating in the OMB for coordination at the EGI level.

A Secure Infrastructure

The aim of this task is to address the various operational security-related risks and to maintain the availability of EGI services. This task covers all aspects of operational security including Security Incident Coordination and Security Vulnerability Handling. It relies on the GOCDB Security contact information and the security related policy work done under NA2.

Service Deployment

This task ensures that new software releases (for operational tools, and global and site services) are deployed safely and reliably without any degradation of service to the production grid infrastructure, while maintaining interoperability with other grid infrastructures. This is achieved through a managed staged rollout of middleware and operational tools. In collaboration with NGIs and end-user communities, new software releases are deployed to build operational and user experience.

Infrastructure for Grid Management

The purpose of this task is the deployment of the infrastructure for Grid management consisting of a set of services and tools needed by the NGI/EIRO Operations Centres for the running of the Grid software services, for Grid monitoring (including SLA and security monitoring), and ongoing Grid management. At the core of this infrastructure is a set of monitoring tools to be deployed in all NGIs to monitor their sites. Above this will sit higher-level monitoring of global services and automated measurement of various service and site-reliability metrics.

Accounting

This task provides a reliable record of the usage of the infrastructure for users, VOs, NGI and EGI management. Access to data is restricted according to agreed policies and NGI/EIRO privacy laws. Overall, this task provides: securely and reliably run accounting repositories for EGI, and if desired at the NGI-level; a portal to provide on-demand visualisation and/or data download. Developments needed to account for additional resources to support new business models are described in JRA1.

Helpdesk Infrastructure

This task is linked to the central EGI Helpdesk available to all NGIs and related support projects. NGIs integrate their own national helpdesk into EGI's through an agreed interface, or use the EGI Helpdesk remotely. Standard procedures for handling tickets, passing them between helpdesks and escalating them are established based on the experiences from previous projects.

Support Teams

This task brings together the various teams of people handling support issues for users, sites and the network within the production infrastructure. It does not merge them into a common team as the skills required differ, but it ensures the infrastructure is in place and the teams are trained and resourced and all the required documentation is in place.

Providing a Reliable Grid Infrastructure

This task ensures that sites and operational and middleware services are functional, reliable, and responsive. It achieves this through subtasks on: production grid services, interoperability, best practices and service level agreements. It also has dependencies on other subtasks which manage the human support teams, security, helpdesks, and the monitoring and management infrastructure.


PY2 Tasks can be found here: EGI-InSPIRE-Tasks-Y2