EGI-InSPIRE:Tasks-Y2




Introduction

This page contains the description of the EGI global and international tasks as updated according to EGI-InSPIRE milestones MS115 and MS116. The services are grouped into the following categories: 'Governance and Management' for coordination; 'Community Engagement' for outreach and support of user community interaction; 'Platforms' for technical activities; and 'Operations and Tools' for infrastructure services and tools.

Global Tasks

The EGI Global Tasks are the responsibility of the EGI.eu organisation and are undertaken by EGI.eu staff in Amsterdam or by staff based at participant and associated participant institutions funded through EGI.eu, the NGIs and the EC through the EGI-InSPIRE project for the benefit and use of all.

Governance

The strategic direction of the EGI ecosystem and the collaboration between the individual activities are the responsibility of the EGI Council. The Council also acts as the senior decision-making and supervisory authority of EGI.eu and as the organisation representing the EGI collaboration.

Council

The EGI Council is the senior governance body of the EGI collaboration, which is established through the EGI.eu foundation based in Amsterdam. The EGI Council meets regularly throughout the year, and as required by its statutes [R37] and its Terms of Reference [R38] and standing orders, to govern the strategic development and activity of EGI. It is required to approve the annual accounts and budget for EGI.eu, which includes the fees paid by the participants to cover the running costs of EGI.eu and the EGI Global Tasks delivered within the community.

Executive Board

The EGI.eu Executive Board comprises six elected representatives from the EGI Council and the EGI Council Chair, who also chairs the EGI.eu Executive Board meetings. The EGI.eu Director and Deputy Director are permanent observers. The Board meets every two weeks by phone, with quarterly face-to-face (F2F) meetings to prepare the annual budget, review the annual accounts, and supervise and advise the Director on the day-to-day running of the organisation. Additional F2F meetings are held during the year to discuss the agenda for EGI Council meetings and any other urgent issues.

Strategic Planning and Policy

Resulting from the first project review, the name of this task was expanded from ‘Policy Development’ to ‘Strategic Planning and Policy Support’. The new task description includes activities at the strategic level that were started during the first year and are now structured to better reflect team activities and support the EGI strategy development. This activity is led by the EGI.eu Strategy and Policy Team (SPT) and encompasses the development of strategies and policies within and external to EGI.eu relating to governance, standardisation and integration with other infrastructures. The team also develops EGI’s strategic response and alignment to EU policy and EC initiatives, such as EU2020, the Digital Agenda and the online ERA, and supports the boards and committees within EGI that draft policies and procedures for evolving the technical infrastructure. The main objectives are to analyse strategic themes and trends globally and in Europe and produce documents and reports to inform the EGI management bodies and wider community to support the decision-making process; liaise with other projects and organisations, including industry and international policy bodies to establish collaboration agreements and monitor progress; organise meetings and workshops on key themes that are strategic to EGI and attend relevant events and conferences; and support the formulation and development of policies and procedures by the EGI policy groups (e.g. security, technology coordination, operations management).

Finance and Secretariat

An organisation needs a secretariat to support its governance functions, but also to support the community and the staff it employs. Within EGI.eu, support is provided during Council and Executive Board meetings, and community support is provided through a range of IT services offered to local staff and to the collaboration (e.g. website, wiki, meeting planner, mailing lists, document server, timesheet tool). In addition, the community organises two large meetings a year (the User/Community and Technical Forums) to continue building collaborations within EGI, and a number of additional workshops as required to support the community's activities.

Technical Management

User Community Board (UCB)

The UCB is a forum in which representatives from self-organised virtual research communities (VRCs) meet on a regular basis to review and agree on the prioritisation of the emerging requirements for their use of EGI resources. The VRC model encourages researchers to identify and communicate with others in their field in order to capture the needs particular to their field of expertise and articulate them to EGI.

Operations Management Board (OMB)

The OMB drives future developments in the operations area by making sure that operations evolve with the needs of the community and by supporting the integration of new resources and middleware platforms (e.g. desktop grids, virtual machines, high performance computing). It does this by providing coordination and management and by developing policies and procedures for the operational services that are integrated into the production infrastructure through the operational support of distributed operations teams. Coordination of software deployment and feedback gathering is delivered through fortnightly operations meetings.

Technical Coordination Board (TCB)

The TCB coordinates the interactions that EGI has with its technology providers. This involves combining the prioritised requirements from the operations and end-user communities into a technology roadmap. Elements from this roadmap are sourced from technology providers within the EGI community into the Unified Middleware Distribution (UMD). Before their inclusion into UMD, these components are verified against the original requirements to ensure that these have been met.

Technology Roadmapping

Maintaining the technology roadmap for EGI requires the collection, prioritisation and analysis of requirements from the user and operations communities. From these requirements, new features are sourced from technology providers currently known to EGI, or from open-source or commercial technology providers. Components developed within the EGI community to provide bespoke functionality that the production infrastructure needs and that cannot be sourced elsewhere are captured within the UMD Roadmap. This continuously evolving document translates user requirements and technology evolution into a roadmap describing the functional aspects, release dates, maintenance support, acceptance criteria and dependencies for software components that are offered to the Resource Infrastructure Providers for installation.

Community Engagement

Marketing and Communication

This activity is coordinated by EGI.eu on behalf of the European NGIs and projects, and other international partners. The aim is to communicate the work of EGI and its user communities. Target audiences for the dissemination outputs include new and existing user communities, journalists, the general public, grid research and standards communities, resource providers, collaborating projects, decision makers and governmental representatives. Means for dissemination include the project website, wiki site, materials and publications, media and public relations, social media channels and attendance at events in order to market EGI to new users.

Community Outreach

Regularly bringing EGI stakeholders together is vital in enabling collaborations within the community and provides an opportunity to showcase EGI's achievements internally and to new user communities. The Community Outreach team of EGI.eu organises two community-wide meetings a year and, in collaboration with NGIs and user communities, several small, targeted events and workshops.

Technical Outreach to New Communities

Converting a potential new user community to being an actual user community requires substantial effort and planning at the European and national level. This may include identifying which resources will be used within the production infrastructure, ensuring the integration of new resources into EGI, porting applications to an EGI platform, deploying new services to meet the needs of new communities, training new communities, etc. A team of three at EGI.eu provides coordination for this activity and works with the NGI International Liaisons and their national partners in VT projects to ensure that a coordinated, systematic and strategic approach is taken.

Community Technical Services

NGI Coordination

This is a registry system in which leads identified within potential new communities are registered by the NGIs and by EGI.eu staff, and in which the main discussions the EGI community has with these leads can be recorded. These records enable the community to identify topics for technical engagement with new communities.

Software Acceptance Criteria

Based on the prioritised requirements obtained from the operations and end-user communities, software acceptance criteria are defined to capture the key functional and non-functional features expected from the delivered technologies. The Quality Criteria are reviewed regularly on the basis of collected feedback, such as regular peer reviews, Software Verification, Staged Rollout, and infrastructure incidents collected by the DMSU.

Software Verification

Before software is published for production use in the UMD section of the EGI Software Repository, delivered software is verified against the published Quality Criteria, where applicable. Software Verification entails deploying the software in a controlled testbed and checking the functional requirements encoded in the Quality Criteria. Verification reports are written and published for any interested party to use as required.

Software Repository

The software repository provides the coordination needed by EGI for the release of software, e.g. the UMD, into production. Technology providers can contribute their software components to the repository, which manages the workflow as the software components are validated to ensure they meet the defined quality criteria and are then placed into staged rollout.

Application Database

The EGI Applications Database (AppDB) stores tailor-made computing applications for scientists, and grid application developer tools for software developers. It embraces all scientific fields, from resources that simulate exotic excitation modes in physics to applications for complex protein sequence analysis. Storing pre-made applications and reusable tools means that scientists and grid application developers can achieve their goals with EGI in a shorter time. The aim for AppDB is twofold: 1) to inspire scientists and developers of DCI applications to use EGI and its resources due to the immediate availability of the software that they need to use; and 2) to avoid duplication of effort across the user and user support communities.

Training Marketplace

The training services are aimed at supporting cooperation between trainers and users in different localities and projects by connecting the groups through the activities that are established within the NGIs and scientific clusters. The goal is to enable users to achieve better scientific performance when using EGI and to guide the establishment of self-sustainable user communities. The services provided include a list of training events, which allows trainers to advertise their training events and to be made aware of other training events being run within the community. The marketplace also includes a map of these training events, a repository of training materials and other resources, and a web gadget that can be used to embed customised instances of these services into different websites.

Core Services

Auxiliary core services are needed for the smooth running of Infrastructure Services. Examples of such services are the VOMS service and VO membership management for infrastructural VOs (DTEAM, OPS), the provisioning of middleware services needed by the monitoring infrastructure (e.g. top-BDII and WMS), and the catch-all CA.

Operations and Tools

Infrastructure Services and Tools

Message Broker Network

EGI provides a network of brokers as a common messaging infrastructure for the exchange of information between operational tools and other systems.

Monitoring

The Monitoring Infrastructure is a distributed service based on Nagios and messaging. The central services include systems such as the MyEGI portal for the visualisation of information, and a set of databases for the persistent storage of information about test results, availability statistics, monitoring profiles and aggregated topology information. The central services need to interact with the local monitoring infrastructures operated by the Resource Infrastructure Providers.

Service Availability Monitoring

The Service Availability Monitoring Infrastructure is a distributed service based on Nagios and messaging. The central services include systems such as the MyEGI portal for the visualisation of information, and a set of databases for the persistent storage of information about test results, availability statistics, monitoring profiles and aggregated topology information. The central services need to interact with the local monitoring infrastructures operated by the Resource Infrastructure Providers.

Security Monitoring

Security monitoring is an important part of security in a distributed infrastructure. One of the EGI CSIRT activities is to provide EGI, NGI and site security staff with tools and procedures to contain security incidents and to monitor sites for weaknesses that could lead to an incident. Tools have been and continue to be developed to allow monitoring at both site and NGI level, as well as at EGI level by CSIRT members themselves. EGI CSIRT collects various pieces of information on the infrastructure, using security probes and sensors developed by EGI CSIRT members. Data collected by these probes (e.g. whether a site is running a vulnerable version of some software) is displayed in a visualisation tool, known as the Security Dashboard, to provide high-level overviews to staff at various levels according to their authorisation. This includes sufficient detail to allow staff to resolve any issues detected. Members of the EGI CSIRT can view all details and, if necessary, follow up with sites to help them address any security issue. The system also archives information to allow the evaluation of security trends. Further functions, such as security metrics and monthly or quarterly security reports, are being developed.

Network Monitoring

EGI is a highly distributed networked infrastructure of grid services that relies on network connectivity for remote job submission, data transfer and data access. Tools are therefore needed for network troubleshooting and performance monitoring.

Operations Portal

EGI.eu provides a central portal for the operations community that offers a bundle of different capabilities, such as the broadcast tool, VO management facilities, a security dashboard and an operations dashboard that is used to display information about failing monitoring probes and to open tickets to the Resource Centres affected. The dashboard also supports the central grid oversight activities. It is fully interfaced with the EGI Helpdesk and the monitoring system through messaging. It is a critical component as it is used by all EGI Operations Centres to provide support to the respective Resource Centres.

Accounting

The EGI Accounting Infrastructure is distributed. At a central level it includes the repositories for the persistent storage of usage records, and a portal for the visualisation of accounting information. The central databases are populated through individual usage records published by the Resource Centres, or through the publication of summarised usage records. The Accounting Infrastructure is essential in a service-oriented business model to record usage information. Accounting data needs to be validated and regularly published centrally.
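As a rough illustration of the difference between individual and summarised usage records, the following Python sketch aggregates individual records into one summary per Resource Centre, VO and month. The record fields (`site`, `vo`, `month`, `cpu_hours`) and the function name `summarise` are assumptions made for this example and are not taken from the actual EGI accounting schema or tools.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class UsageRecord:
    """Hypothetical, simplified usage record (real records carry more fields)."""
    site: str         # Resource Centre that ran the job
    vo: str           # Virtual Organisation the job belongs to
    month: str        # accounting period, "YYYY-MM"
    cpu_hours: float  # CPU time consumed by the job


def summarise(records):
    """Aggregate individual records into one summary per (site, vo, month)."""
    totals = defaultdict(lambda: {"jobs": 0, "cpu_hours": 0.0})
    for record in records:
        key = (record.site, record.vo, record.month)
        totals[key]["jobs"] += 1
        totals[key]["cpu_hours"] += record.cpu_hours
    return dict(totals)


# Illustrative data only; site names and values are made up.
records = [
    UsageRecord("SITE-A", "ops", "2012-01", 1.5),
    UsageRecord("SITE-A", "ops", "2012-01", 2.5),
    UsageRecord("SITE-A", "dteam", "2012-01", 4.0),
]
summary = summarise(records)
```

Publishing summaries rather than individual records reduces the volume of data sent to the central repositories, at the cost of losing per-job detail centrally.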

Helpdesk

EGI provides support to users and operators through a distributed helpdesk with central coordination (GGUS). The central helpdesk provides a single interface for support. The central system is interfaced to a variety of other ticketing systems at the NGI level in order to allow a bi-directional exchange of tickets (for example, those opened locally can be passed to the central instance or other areas, while user and operational problem tickets can be opened centrally and subsequently routed to the NGI local support infrastructures).

GOCDB

EGI relies on a central registry (GOCDB) to record information about different entities such as the Operations Centres, the Resource Centres, service endpoints, and the contact information and roles of the people responsible for operations at different levels. GOCDB is a source of information for many other operational tools, such as the broadcast tool, the Aggregated Topology Provider, the Accounting Portal, etc.

Metrics Portal

The Metrics Portal is the tool for the registration of EGI-InSPIRE metrics.

Support

EGI.eu coordinates and supervises operations and network support activities provided by the individual NGIs to ensure that operational issues are properly handled at both Resource Centre and NGI level. It is also responsible for handling Resource Centre suspension in case of operational issues.

1st Level: Ticket Process Management

Support issues are routed to NGI support teams through the EGI helpdesk. Some of these requests may be related to specific support units, but other issues relating to users' use of the e-infrastructure will require human intervention from either an operational or a user support perspective.

2nd Level: Deployed Middleware Support Unit

The Deployed Middleware Support Unit (DMSU) provides technical support for incidents involving deployed grid middleware. When processing support tickets assigned by Ticket Process Management (TPM), the DMSU assesses whether changing the middleware configuration or deployment can mitigate the described incident. In conjunction with 3rd level expert support provided by Technology Providers, the DMSU assesses whether the reported incident constitutes a persistent software problem that requires fixing through software update cycles. Occupying this pivotal position within the grid middleware support infrastructure, the DMSU is empowered to actively assign and maintain the prioritisation of patch development and publication in software updates.

Network Support

EGI provides network support for the resolution of end-to-end network performance issues.

Operations Management and Coordination

Operations Coordination

See OMB.

Grid Oversight (COD)

EGI.eu central Grid oversight activities are intended to supervise the activity performed locally by the Regional Operator on Duty (ROD) teams of the EGI Operations Centres. Central Grid Oversight assists existing ROD teams in user and operations support, checks the monthly performance delivered by Resource Centres and NGIs/EIROs, is responsible for certifying new Operations Centres, and provides training to new ROD teams. The quality of the support work delivered by the ROD teams is measured through a ROD performance index that is computed on a monthly basis. Central Grid Oversight is responsible for taking appropriate action if the metrics indicate that a ROD team is not functioning properly.

Availability/Reliability Management

Availability/Reliability Management is responsible for overseeing the monthly service levels delivered at different levels by Resource Centres, by Resource Infrastructure Providers and centrally by EGI.eu. In case of low performance, the service providers are generally contacted and asked to provide plans for improving their services. In case of extended underperformance, Resource Centres are suspended. This service is also responsible for producing updated performance reports when problems with the computations are reported.
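The escalation described above can be sketched in Python as follows. This is only an illustration: the function name `monthly_action`, the 70% threshold and the three-month limit are assumptions chosen for the example and are not values stated on this page.

```python
def monthly_action(history, threshold=0.70, suspend_after=3):
    """Decide the follow-up for one Resource Centre.

    history: monthly availability fractions (0.0 to 1.0), oldest first.
    threshold and suspend_after are hypothetical example values.
    """
    if not history or history[-1] >= threshold:
        return "ok"

    # Count consecutive months below the threshold, ending with the latest.
    streak = 0
    for value in reversed(history):
        if value >= threshold:
            break
        streak += 1

    if streak >= suspend_after:
        return "suspend"
    return "request_improvement_plan"
```

For example, under these assumed values a Resource Centre with a single month below the threshold would be asked for an improvement plan, while three consecutive months below it would lead to suspension.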

Coordination of Operations Security

Security is recognised as an important aspect of e-Infrastructures and requires co-ordination between the EGI participants at various levels, in particular for the prevention and handling of incidents. Various EGI central groups carry out this co-ordination role. The Security Policy Group (SPG) is responsible for developing security policies. The Software Vulnerability Group (SVG) aims to eliminate existing software vulnerabilities from the deployed infrastructure and prevent the introduction of new ones. The EGI Computer Security Incident Response Team (CSIRT) is responsible for co-ordinating operational security in the areas of security incident response, security monitoring, security training and dissemination, as well as for carrying out security drills (cyber-security exercises) to improve the response to future incidents.

Coordination of Interoperation

EGI.eu coordination is necessary to ensure a successful interoperation of the various stakeholders: Resource Centres, Technology Providers, the EGI.eu Technical Manager and the EGI repository managers.

Coordination of Staged Rollout and Related Support Tools

New technology releases made available to EGI are verified to ensure that they meet the original requirements and are subsequently deployed gradually in the production environment (staged rollout). Verification takes place by deploying the software and assessing it against the published criteria. Updates of deployed software need to be gradually adopted in production after internal verification. This process is implemented in EGI through staged rollout, i.e. through the early deployment of a new component by a selected list of candidate Resource Centres. The successful verification of a new component is a precondition for declaring the software ready for deployment. Given the scale of EGI, change management requires careful coordination to ensure that every new capability is verified by a representative pool of candidate sites, to supervise the responsiveness of the candidate sites, to ensure that the staged rollout progresses well without introducing unnecessary delays, and to review the reports produced. It also ensures the planning of resources according to the foreseen release schedules from the Technology Providers.
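The release workflow described above (verification, then staged rollout at candidate Resource Centres, then production) can be modelled as a small state machine. The Python sketch below is illustrative only: the stage names, the functions `next_stage` and `run_release`, and the rule that every early adopter must report success are assumptions made for this example, not a description of the actual EGI tooling.

```python
from enum import Enum


class Stage(Enum):
    SUBMITTED = "submitted"
    VERIFIED = "verified"
    STAGED_ROLLOUT = "staged_rollout"
    PRODUCTION = "production"
    REJECTED = "rejected"


# Order in which a successful release moves through the workflow.
_ORDER = [Stage.SUBMITTED, Stage.VERIFIED, Stage.STAGED_ROLLOUT, Stage.PRODUCTION]


def next_stage(stage, passed):
    """Advance a release by one step; a failed step rejects the release."""
    if stage in (Stage.PRODUCTION, Stage.REJECTED):
        return stage  # terminal states do not change
    if not passed:
        return Stage.REJECTED
    return _ORDER[_ORDER.index(stage) + 1]


def run_release(verification_ok, early_adopter_reports):
    """Walk one release through the workflow.

    early_adopter_reports maps a (hypothetical) site name to True/False.
    """
    stage = next_stage(Stage.SUBMITTED, verification_ok)
    if stage is Stage.REJECTED:
        return stage
    stage = next_stage(stage, True)  # verified releases enter staged rollout
    if not early_adopter_reports:
        return stage  # still waiting for early adopter reports
    # Assumption: every early adopter must report success.
    return next_stage(stage, all(early_adopter_reports.values()))
```

A release that fails verification never reaches the early adopters, which reflects the statement that successful verification is a precondition for declaring the software ready for deployment.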

Coordination of Requirements Gathering

A transparent requirement processing system is needed so that the user or operations community can submit requirements or share them within the whole EGI community. All of these requirements are investigated, analysed and prioritised within a transparent and structured process. The prioritised requirements can then be acted upon by other parties as appropriate. Depending on the domain and potential impact, identified needs might be met by the User Support Teams or Operations within EGI, or by technology providers external to EGI, be they community-based, project-based or commercial. The progress and outcomes of whichever solutions are adopted will be fed back to the requesting community on a regular basis.

Coordination of Documentation

EGI.eu is responsible for the maintenance and development of operational documentation, procedures, best practices, etc. EGI.eu coordinates this community activity, which is needed to connect partners with specialised expertise.

NGI International Tasks

Each individual NGI is responsible for delivering the NGI International Tasks to a satisfactory level. The tasks are funded through the NGI's own budget, currently with a contribution from the EC through the EGI-InSPIRE project. Staff in EGI.eu coordinate the staff undertaking the NGI International Tasks; they have no managerial control over them.

External Relations

Policy Development

Local policy development activities are integrated with those taking place within the EGI.eu Policy Development Team, which supports the development of policies and procedures at a European level. It is the local partner who implements policies and procedures locally. The NGIs' responsibilities therefore include implementing EGI policies and procedures, developing EGI policies and procedures through participation in EGI policy groups, communicating with national governments and national research councils about policy priorities for the DCIs, establishing agreements with Resource Centres, and drafting national policies and procedures that are aligned with those of EGI.

Dissemination

NGIs promote their work and that of EGI to their local national audiences. Therefore, while the external liaison functions at a European level are coordinated by EGI.eu, NGIs are focused on dissemination and liaison at the regional and national level. NGIs also provide EGI representation at local and regional events. NGIs active on the international front are considered to represent themselves, but are of course free to propose coordination of any international activities with EGI.eu. NGIs report news stories and interesting user community events in their local area to the central EGI.eu team for further dissemination. They also get involved by providing people to be at these events. In addition, some of the NGI dissemination activities include publicising local success stories in suitable media, creating materials for various audiences (from politicians to scientists), writing up success stories, pointing potential users in the right direction, etc.

User Services

Requirements Gathering

While new requirements are gathered centrally, the collection of new requirements starts in the NGIs and EIROs. They have the contacts with the users and operations staff that are using or operating the EGI resources on a daily basis and can identify issues that need to be resolved.

Application Database

The application database provides a mechanism for users to discover which applications are in use, or are being ported to use the production infrastructure. NGI staff have a vital role to play in adding new entries and keeping entries up to date as they work with their respective user communities.

Training Marketplace

Many NGIs are able to provide generic or specific training courses to help user communities use EGI resources. The training marketplace provides a means of enabling the coordination that NGIs need to do locally in collaboration with other NGIs to support particular user communities.

Consultancy

The staff within NGIs represent an excellent source of local expertise for new users or new sites wishing to make use of e-Infrastructure. This expertise can be disseminated through training, but more frequently requires in-depth, one-to-one work with particular applications or user groups.

Operations and Tools

Infrastructure and Tools

NGI infrastructure services and tools support day-to-day operations. They are detailed in the following sections. Additional systems that can be deployed by NGIs are the regional configuration repository and the regional operations portal. These systems are optional, depending on the needs of the NGIs.

NGI Monitoring Infrastructure

The EGI Monitoring Infrastructure is distributed. The NGI Monitoring Infrastructure is responsible for running periodic functionality checks. Results are stored and displayed locally through NGI portals, and are collected centrally at EGI level to provide an overall view of the status of the EGI Resource Infrastructure.

Accounting Infrastructure

Each Resource Centre collects Usage Records. Depending on the customisable set-up chosen by the NGI, the data gathered can be published directly in the central databases, or alternatively can be stored persistently at NGI level and summarised for publication at EGI level. NGIs are responsible for validating the data gathered and for supervising the record publication process to make sure that records are regularly collected centrally.

NGI Helpdesk

An NGI support system fully integrated with the central instance – GGUS – is often required to support local users and Resource Centre administrators. This is typically required by medium and large NGIs. For small-scale NGIs operating a limited number of Resource Centres, the local support system can be simply implemented centrally through a dedicated support unit.

Grid Services

Core Services for VOs

Core middleware services for user information discovery, authentication, workflow management, file cataloguing etc., are often provided by NGIs to support users and the local Infrastructure Services. The actual set of services operated can vary, and depends on the scale of the NGI and on the number of VOs supported.

Staged Rollout

While EGI.eu is responsible for the coordination and supervision of the process, individual Resource Centres are requested to participate in staged rollout as early adopters, for proper verification of newly deployed software releases in the production infrastructure.

Gathering Middleware Requirements

While new operations requirements are gathered centrally, the collection of new operational requirements starts in the NGIs/EIROs and the Resource Centres. Requirements are periodically gathered and assessed by the Operations Management Board.

Support

EGI.eu coordinates and supervises operations and network support activities provided by the individual NGIs to ensure that operational issues are properly handled at both Resource Centre and NGI level. It is also responsible for handling Resource Centre suspension in case of operational issues. First level support, Ticket Process Management (TPM), is provided through the EGI helpdesk, which routes support issues to NGI support teams. Some of these requests may be related to specific support units, but other issues relating to e-Infrastructure usage will require human intervention from either the operational or the user support perspective. Second level support, the Deployed Middleware Support Unit (DMSU), provides technical support for incidents involving deployed grid middleware. When processing support tickets assigned by TPM, the DMSU assesses whether the described incident can be mitigated by changing the middleware configuration or deployment. In conjunction with 3rd level expert support provided by Technology Providers, the DMSU assesses whether the reported incident constitutes a persistent software problem that requires fixing through software update cycles. Occupying this pivotal position within the grid middleware support infrastructure, the DMSU is empowered to actively assign and maintain the prioritisation of patch development and publication in software updates.

Operations and Coordination

Grid Oversight

The Regional Operations team is responsible for detecting problems, coordinating the diagnosis, and monitoring the problems through to a resolution. It monitors sites in its region, reacts to problems identified by the monitors (either directly or indirectly), provides support to Resource Centre administrators as needed, contributes to the knowledge base, and informs oversight bodies in cases of non-reactive or non-responsive Resource Centres.

Service Level Management

NGIs are responsible for supervising the levels of service delivered both at Resource Centre level, for the services providing access to resources, and at NGI level, for collective services provided by the NGIs, adhering to the requirements of the Resource Centre OLA and the Resource Infrastructure Provider OLA.

Security Management

NGIs contribute to software vulnerability assessment and to internal Computer Security Incident Response activities, are responsible for appointing a security officer, and provide security support to their Resource Centre administrators.

Operations Management

NGIs are responsible for coordinating internal operational activities and for participating in the OMB for coordination at the EGI level.