The wiki is deprecated and due to be decommissioned by the end of September 2022.
The content is being migrated to other platforms; new updates will be ignored and lost.
If needed you can get in touch with EGI SDIS team using operations @ egi.eu.

Difference between revisions of "Instructions for centrally-provided services Service Providers"

From EGIWiki
(Deprecate the page)
 
(99 intermediate revisions by 3 users not shown)
Line 1: Line 1:
{{Template:Op menubar}} {{Template:Tools menubar}} {{Template:Under_construction}}  
{{Template:Deprecated}}


{{TOC_right}}  
{{Template:Op menubar}} {{Template:Tools menubar}}  


<br>
{{TOC_right}}


Container for instructions for EGI tools developers
= Initial activities  =


=== General<br>  ===
'''Goal: Introducing a new centrally-provided service.'''
Every new centrally-provided service developed to support EGI operations needs to contact EGI Operations (operations@egi.eu) and provide the following data:


*name
# Name of the service
*provide support email  
# Support email  
*point a leader  
# Name and contact details of the Service Provider Team leader  
*EGI url
# Description of the service - purpose
*Register in GOC&nbsp;DB under Egi.eu NGI
# License


<br>
The following steps need to be performed with the help of the EGI Operations team:


to define<br>
# [[EGI Collaboration tools#EGI.eu_domain|Creating a sub domain in egi.eu zone]]
# Registering the endpoints in GOCDB under EGI.eu NGI
# Creating a category in the "Requirements" queue in the EGI RT tracker and an RT dashboard to receive and handle service requests
# Creating a GGUS Support Unit to receive and handle incidents (define level of quality of support - default: Medium)
# Negotiating and signing an OLA with EGI Foundation
# [[Copyright#For_websites_.28EGI.2C_EGI-Engage.29|Acknowledging EGI]]
# Creating a wiki entry for the service with the relevant information
#* Service name
#* Category, Short description of the service
#* URL
#* Contact
#* GGUS Support Unit
#* GOCDB entry
#* link to related OLA
#* link to documentation
#* license
#* provider name
#* link to source code
#* Change management:
#** link to EGI RT dashboard for the service
#** (if applicable) internal bug/task tracking facilities
#** (if applicable) link to [[OTAG|Operations Tool Advisory Group]] (OTAG) team
#* Release and Deployment management:
#** Release schedule
#** Release notes
#** (if applicable) Roadmap
#** URL of test instance
# Subscribing to the Tool-admins mailing list
# Obtaining access from EGI to the notification@egi.eu email address


*Bug/task tracking facilities
= Ongoing activities  =
*Internal communication channe
*Support communication channels<br>


<br>
== Change management  ==


<br>
'''Goal: To ensure changes are planned, approved, implemented and reviewed in a controlled manner to avoid adverse impact of changes to services or the customers receiving services. '''


=== Requirements gathering and testing<br> ===
[[Image:CM.png|600px|CM.png]]


*add category in requirements RT tracker and create RT dashboard
=== Record  ===
*Test instance url
*request an OTAG if needed (https://wiki.egi.eu/wiki/OTAG): ''The OTAG mandate is to help developers with requirement prioritisation and the release process of operational tools. OTAG provides forums to discuss the evolution of the tools to meet the expressed needs of the EGI community. It has representation from all end-user groups, depending on the tool.''


<br>
# The '''EGI recording system''' is [http://rt.egi.eu EGI RT].
# '''EGI users''' are instructed to submit change requests in EGI RT.
# The team can use an '''internal tracker''', but keeping EGI RT and the internal tracker consistent is the responsibility of the development team.
# If the service has '''customers other than EGI,''' the development team is responsible for informing EGI about change requests submitted by those customers.
# Minimum set of statuses for an entry:
#* New - newly recorded in system
#* Accepted - accepted by OTAG/OMB
#* Rejected - rejected by OTAG/OMB
#* In progress/Open - Development team is working on the request
#* Resolved - released; closing the ticket is the responsibility of the development team
#* Stalled - on hold
=== Classify  ===


=== Monitoring<br>  ===
# All change requests should be '''classified in a consistent manner'''. Classification in EGI RT is based on the service the request relates to.
# Service providers should define a list of '''standard change''' requests. A standard change is recurrent, well known, has been documented in a procedure, follows a pre-defined, relatively risk-free path, and is the accepted response to a specific requirement or set of circumstances, with authority effectively given in advance of implementation. A standard change request '''does not need approval''' to be implemented.
# An '''emergency change''' for operational tools is a highest-priority change related to a '''security vulnerability'''. It can be implemented '''without approval''' but will be '''subject to a post-review'''.


*Ava/rel threshold defined with EGI Operations<br>
=== Assess and Approve  ===
*Develop monitoring probe <br>


=== Documentation<br>  ===
# Each change request should be commented on by the development team with an assessment of the work needed to implement the change.
# '''Every change''' which is not a standard change or an emergency change '''should be assessed''' (prioritised) '''and approved''' both internally, among the Product Teams (PTs), and with the OMB.
# Where needed, a dedicated [[OTAG|Operations Tool Advisory Group]] (OTAG) can be set up. The OTAG mandate is to help developers with requirement prioritisation and the release process of operational services. OTAG provides forums to discuss the service evolution to meet the expressed needs of the EGI community. It has representation from all end-user groups, depending on the service.
# Minimum set of values for prioritisation:
#* None (in RT - 0)
#* Low (in RT - 1)
#* Normal (in RT - 2)
#* High (in RT - 3)
#* Immediate/emergency (in RT - 4)


*wiki page with '''Release schedule,&nbsp;''' '''Release notes''', '''Roadmap,&nbsp;''' '''Related OLA''',&nbsp;<br>
=== Implement, release and deploy  ===


=== Support - incident handling  ===
This phase takes place in the Release and Deployment process (see below).


*declare quality of support
=== Review  ===
*Support is provided in following language: English
*create GGUS&nbsp;SU
*<span>Monday and Friday</span>
*<span>8 h per day</span>


Support is provided via the GGUS portal [GGUS], which is the single point of contact for infrastructure users to access the EGI Service Desk. The EGI Service Desk within GGUS is organized in Support Units. Every Support Unit is responsible for one or more services. The number and definition of the EGI Support Units in GGUS is not regulated by this OLA and can change at any time to fulfil the EGI Incident and Problem Management requirements.<br>
# Every release is subject to a '''post-implementation review'''. Within one week, customers provide feedback by answering the following questions:
#* Were release notes and documentation sufficient?
#* Is the service working as expected?


=== Development<br> ===
== Release and deployment management ==


broadcast <br>
'''Goal: To bundle changes into a release, so that these changes can be tested and deployed to the production environment together.'''


=== Planned maintenance windows or interruptions  ===
[[Image:RnDM.png|700px|RnDM.png]]


Downtime in GOC&nbsp;DB
=== Plan Release  ===


To be communicated in a timely manner i.e. 24 hours before, to the Customer through the Broadcast Tool [BT]. Typical duration is up to 24 hours otherwise needs to be justified.<br>
# Release schedules should be defined, including the frequency of releases (e.g. every 1-2 months).
# The scope of every release should be defined.
# Releases should be planned so that the OMB can be informed at least one week in advance.
#* All planned releases should be presented during the monthly OMB meeting prior to their release to production.
#* Emergency releases should be presented at the first available OMB meeting, post-release.


{| class="wikitable"
=== Build Release  ===
|-
| <br>
| This manual provides information on how to manage central operational tool unscheduled downtimes.
|}


=== ===
# Due to the open-source nature of the developed software, each service should have and use '''publicly available code repositories,''' using a Version Control System (VCS) like Git, Subversion or CVS
#* The preferred Version Control System is Git using [https://github.com/ GitHub] to ease collaborations
# All new releases should correspond to a new '''tag''' in the VCS
#* If intended to be part of the UMD distribution, all tools should be packaged in an OS-native format (rpm for the RH family, deb for Debian)
#** packaging standards and policies should be followed (e.g. [http://en.wikipedia.org/wiki/Filesystem_Hierarchy_Standard FHS], [https://fedoraproject.org/wiki/EPEL/GuidelinesAndPolicies EPEL-GuidelinesAndPolicies], [https://www.debian.org/doc/debian-policy/ Debian Policies])
# Distribution
#* source code archived as a tarball or as binaries through EGI AppDB or attached to GitHub repositories
#** if intended to be part of UMD - new releases should be distributed as binaries through the UMD repository


https://wiki.egi.eu/wiki/MAN04_Tool_Intervention_Management<br>
=== Test Release  ===


=== Security  ===
# Testing of the software, prior to its release, is a '''responsibility of the development team.'''
# Where a dedicated '''OTAG''' exists it can be involved in the testing activity.
# '''Major or particularly important releases''' should pass acceptance tests performed by customer representatives
# When the development phase of a new release is completed, an announcement should be sent to all the Service Provider teams and to all the actors that should be involved in the release testing (OTAGs). The announcement should contain:
#* release notes, changelog, installation and configuration steps to apply the update, as well as known issues
#* documentation links
#* detailed test plan
#* all the information needed by the [https://wiki.egi.eu/wiki/EGI_Quality_Criteria_Dissemination conformance criteria].
#* an indication of the expected release date; the release date and the kind of testing will depend on each specific release and on its importance.
# '''If a test fails''' a report will be produced and the release sent back to development to restart the cycle. Tests will include a documentation review and a documentation update if needed. The test phase can be performed internally in the development team if no other tools or services are affected.


The following rules for information security and data protection apply:
* The Provider must define and abide by an information security and data protection policy related to the service being provided.
* This must meet all requirements of any relevant EGI policies or procedures [POL] and also must be compliant with the relevant national legislation.
=== Document  ===


3 INTERACTIONS OF THE PRODUCT TEAMS

OTPTs need to interact with each other inside the activity, as well as with other project activities, primarily with SA1, the main customer of the operational tools.

Interaction among OTPTs is guaranteed by the activity management through periodic phone conferences and face-to-face meetings. Activity progress will be tracked using EGI.eu facilities such as the RT system [31], and will be reported in periodic reports. A dedicated mailing list (under the egi.eu domain) is available for the activity as well.

Interaction with external bodies is fundamental in order to get input, feedback and new requirements for the developed tools. Two advisory groups are foreseen by the project description of work: the Operational Tools Advisory Group (OTAG) and the User Services Advisory Group (USAG):
* Being composed of representatives from the operations community, from the middleware developers and from the JRA1 activity, the OTAG will be the main supervisory group for the development progress and the place where technical discussion about the evolution of the tools will take place.
* The USAG has representatives from the EGI user communities and will focus on the requirements for the complete set of services run by the project, but could also impact the operational tools; for example, end user requirements could be addressed to the GGUS Helpdesk.

In order to create the proper schedule for the development, it will be important that the outcome of the advisory groups is a single prioritized table. The prioritization is an important step, and possible conflicts should be resolved when multiple requirements from different groups impact the same tool. This is not expected to happen frequently and will be analyzed case by case by the management of the activity together with the advisory groups, possibly escalating the problems to the Activity Management Board if needed.

The development progress of new features, requested and approved by the advisory groups, will be tracked using the project tracking system [31].

The representation of the activity within other project bodies and the reporting on the status of the activity are the responsibility of the activity manager.

[[Image:Ops_tool_release_process.png]]
# The '''development team is responsible for creation and maintenance of the documentation''', instructions and manuals related to the service in collaboration with the EGI Operations team.  
# Before each release documentation should be checked and updated where needed.


4 OUTPUT OF THE PRODUCT TEAMS

Each tool will be released as a standalone package. PTs are autonomous in the development, but the release schedule and roadmap will be discussed and agreed both internally and with other project actors if needed (i.e. SA1).

Testing and documenting the released packages are responsibilities of the PTs under the supervision of the activity management. If a new release of an operational tool affects other tools or middleware installed in the production infrastructure, then a test plan will be discussed among the PTs and/or with the SA1 activity. To ease the information exchange and to discuss test plans, a release procedure, which is described below and depicted in Figure 2, will be applied by every PT.

Figure 2 - Operational Tools Release Procedure

When the development of a new release is completed (T0), a first announcement is broadcast to all the PTs and to all the actors that should be involved in the release testing. The announcement has to contain release notes, documentation links, a detailed test plan, an indication of the expected release date and all the information needed by the conformance criteria set by the SA2 activity for the software providers of the project. More information on conformance criteria can be found in the project milestone MS503. The expected release date and the kind of testing will depend on each specific release and on its importance.

After the T0 announcement the testing phase will take place according to the test plan. If a test fails, a report will be produced and the release sent back to development to restart the cycle. Tests will include a documentation review and a documentation update if needed. The test phase can be performed internally to the PT if no other tools or services are affected. When all the tests are passed and no more development/testing cycles are needed, the testing phase is concluded (T1) and a second release announcement will be broadcast to the consumers of the new release (i.e. SA1) using the communication facilities offered by the Operations Portal. This second announcement will contain the actual release date (TR), the release notes, the documentation links and a document describing the testing phase details. The release can result in an immediate installation on the production instances for the centralized tools or will follow a deployment process, with a possible initial testing phase on a selected number of production instances according to the SA1 needs (StagedRollout; for further details refer to the project milestone MS402).

Each tool will maintain its own documentation on the web, and links to these web pages will be maintained on the project wiki by the activity management in order to have a single access point to all the tools documentation. The activity management will also supervise the need for and the editing of cross-tool documentation to support the integrated usage of multiple components.

4.1 Strengthening the test phase and monitoring the software quality

Testing of the released software is a responsibility of the product teams; however, the manpower available for each OTPT does not allow for fully independent testing. In almost all PTs the developers also act as testers, even if, when possible, the same person does not perform both activities on the same piece of code. To mitigate this situation it will be important for a PT to discuss the testing plan within the whole JRA1 activity to agree on contributions to testing from other PTs. Moreover, in case of major or particularly important releases, contributions can also be found outside the activity, mainly inside the operations community.

The quality of the operational tools releases will be constantly monitored through a set of metrics that will include, for example, the number of bugs found in certification and in production, as well as the time needed to fix critical and non-critical bugs by each PT. These metrics are currently under definition. If the monitoring activity shows that the software quality of some operational tools needs to be improved, it will be reported to the project management and actions will be undertaken in order to strengthen the testing phase, i.e. increase the number of external and independent testers to be found both internally to the partners developing the tools and externally within the community.

Given the limited manpower, the prioritization work performed by the supervising bodies will be particularly important in order not to waste development and testing efforts.
=== Inform  ===


# '''Information about the next release''' should be communicated:
#* '''during the OMB meeting''' (at least one week before release).
#** A one-slide summary should be sent to operations@egi.eu
#* through the '''Monthly Operational broadcast'''
#** Content of the broadcast should be sent to operations-support@mailman.egi.eu


=== Deploy Release  ===


# For '''changes of high impact and high risk''', the steps required to reverse an unsuccessful change or remedy any negative effects shall be defined.


=== Review Release  ===
# '''Each release should be monitored for success or failure''' and the results shall be analysed internally.
== Incident and Problem management  ==
'''Goal: To restore agreed service level within the agreed time after the occurrence of an incident, and to investigate the root causes of (recurring) incidents in order to avoid future recurrence of incidents by resolving the underlying problem, or to ensure workarounds are available. '''
=== Incident and requests management  ===
Support should be provided:
* via the [https://ggus.eu GGUS] HelpDesk, which is the single point of contact for infrastructure users to access the EGI Service Desk.
* (for incidents) according to the declared level of [[FAQ GGUS-QoS-Levels|quality of support]] (default: Medium)
*Support is provided in '''English'''
* '''Support is available'''
** from Monday to Friday
** 8 h per day
* All incidents and requests should be '''assigned to the proper Support Unit and [[FAQ GGUS-Ticket-Priority|prioritised]]''' according to the suitable scheme.
=== Problems management  ===
# In case of '''recurring incidents''' which cannot be solved by the development team, a GGUS ticket should be opened for the "Operations" Support Unit. The EGI Operations team will coordinate the investigation of the root cause in order to avoid future recurrence of the incidents.
#* Any existing GGUS tickets which may help investigation should be marked as a child of the created ticket.
=== Planned maintenance windows or interruptions  ===
'''Planned maintenance''' should be:
* declared in GOCDB in a timely manner, i.e. '''24 hours before'''
* with a typical duration of up to 24 hours; otherwise it should be justified
'''Unscheduled interruptions '''should be managed according to [[MAN04 Tool Intervention Management|MAN04]].
=== Security incidents  ===
All security incidents should be registered according to [[EGI CSIRT:Incident reporting|EGI_CSIRT:Incident_reporting]].
=== Monitoring  ===
The development team is '''responsible''' for development, maintenance and support of the service '''monitoring probes.'''
Availability and reliability thresholds should be defined with the EGI Operations team, depending on the criticality of the service.
== Information Security management  ==
'''Goal: to manage information security effectively through all activities performed to deliver and manage services, so that confidentiality, integrity and accessibility of relevant assets are preserved'''
The following rules for information security and data protection apply:
* The Provider must define and abide by an information security and data protection policy related to the service being provided;
* This must meet all requirements of any relevant [http://www.egi.eu/about/policy/policies_procedures.html EGI policies or procedures] and also must be compliant with the relevant national legislation.
== Customer relationship management  ==
'''Goal:''' '''Establish and maintain a good relationship with customers receiving service'''
Development teams need to interact with each other inside EGI activities, and primarily with the [[OMB|OMB]], the main customer of the operational tools.
'''Interactions between Development teams''' are guaranteed through periodic phone conferences and face-to-face meetings. A dedicated mailing list is available as well.
'''Interactions with customers''' are guaranteed through periodic OMB meetings and (where needed) a dedicated [[OTAG|Operations Tool Advisory Group]]. The OTAG mandate is to help developers with requirement prioritisation and the release process of operational tools. OTAG provides forums to discuss the evolution of the tools to meet the expressed needs of the EGI community. It has representation from all end-user groups, depending on the tool.


[[Category:Tools]]
<br>

Latest revision as of 15:20, 18 September 2019

This article is deprecated and should no longer be used, but is still available for reference.






Initial activities

Goal: Introducing a new centrally-provided service. Every new centrally-provided service developed to support EGI operations needs to contact EGI Operations (operations@egi.eu) and provide the following data:

  1. Name of the service
  2. Support email
  3. Name and contact details of the Service Provider Team leader
  4. Description of the service - purpose
  5. License

The following steps need to be performed with the help of the EGI Operations team:

  1. Creating a sub domain in egi.eu zone
  2. Registering the endpoints in GOCDB under EGI.eu NGI
  3. Creating a category in the "Requirements" queue in the EGI RT tracker and an RT dashboard to receive and handle service requests
  4. Creating a GGUS Support Unit to receive and handle incidents (define level of quality of support - default: Medium)
  5. Negotiating and signing an OLA with EGI Foundation
  6. Acknowledging EGI
  7. Creating a wiki entry for the service with the relevant information
    • Service name
    • Category, Short description of the service
    • URL
    • Contact
    • GGUS Support Unit
    • GOCDB entry
    • link to related OLA
    • link to documentation
    • license
    • provider name
    • link to source code
    • Change management:
      • link to EGI RT dashboard for the service
      • (if applicable) internal bug/task tracking facilities
      • (if applicable) link to Operations Tool Advisory Group (OTAG) team
    • Release and Deployment management:
      • Release schedule
      • Release notes
      • (if applicable) Roadmap
      • URL of test instance
  8. Subscribing to the Tool-admins mailing list
  9. Obtaining access from EGI to the notification@egi.eu email address

Ongoing activities

Change management

Goal: To ensure changes are planned, approved, implemented and reviewed in a controlled manner to avoid adverse impact of changes to services or the customers receiving services.

CM.png

Record

  1. The EGI recording system is EGI RT.
  2. EGI users are instructed to submit change requests in EGI RT.
  3. The team can use an internal tracker, but keeping EGI RT and the internal tracker consistent is the responsibility of the development team.
  4. If the service has customers other than EGI, the development team is responsible for informing EGI about change requests submitted by those customers.
  5. Minimum set of statuses for an entry:
    • New - newly recorded in system
    • Accepted - accepted by OTAG/OMB
    • Rejected - rejected by OTAG/OMB
    • In progress/Open - Development team is working on the request
    • Resolved - released; closing the ticket is the responsibility of the development team
    • Stalled - on hold

Classify

  1. All change requests should be classified in a consistent manner. Classification in EGI RT is based on the service the request relates to.
  2. Service providers should define a list of standard change requests. A standard change is recurrent, well known, has been documented in a procedure, follows a pre-defined, relatively risk-free path, and is the accepted response to a specific requirement or set of circumstances, with authority effectively given in advance of implementation. A standard change request does not need approval to be implemented.
  3. An emergency change for operational tools is a highest-priority change related to a security vulnerability. It can be implemented without approval but will be subject to a post-review.
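
The classification rules above can be sketched in a few lines of Python. This is only an illustration, not part of any EGI tooling: the names ChangeKind, needs_prior_approval and needs_post_review are hypothetical, and the label NORMAL for "neither standard nor emergency" is an assumption, since the text does not name that category.

```python
from enum import Enum


class ChangeKind(Enum):
    """Hypothetical labels for the three cases described in the text."""
    STANDARD = "standard"    # recurrent, documented, authority given in advance
    EMERGENCY = "emergency"  # highest priority, related to a security vulnerability
    NORMAL = "normal"        # assumed name for every other change


def needs_prior_approval(kind: ChangeKind) -> bool:
    """Standard and emergency changes can be implemented without approval."""
    return kind is ChangeKind.NORMAL


def needs_post_review(kind: ChangeKind) -> bool:
    """The text explicitly requires a post-review for emergency changes."""
    return kind is ChangeKind.EMERGENCY
```

For example, needs_prior_approval(ChangeKind.STANDARD) is False, while a change that is neither standard nor emergency has to go through assessment and approval first.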

Assess and Approve

  1. Each change request should be commented on by the development team with an assessment of the work needed to implement the change.
  2. Every change which is not a standard change or an emergency change should be assessed (prioritised) and approved both internally, among the Product Teams (PTs), and with the OMB.
  3. Where needed, a dedicated Operations Tool Advisory Group (OTAG) can be set up. The OTAG mandate is to help developers with requirement prioritisation and the release process of operational services. OTAG provides forums to discuss the service evolution to meet the expressed needs of the EGI community. It has representation from all end-user groups, depending on the service.
  4. Minimum set of values for prioritisation:
    • None (in RT - 0)
    • Low (in RT - 1)
    • Normal (in RT - 2)
    • High (in RT - 3)
    • Immediate/emergency (in RT - 4)
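
As a minimal sketch, the priority scale above maps naturally onto an integer enumeration whose values are the RT numbers from the list. The class name Priority, the helper most_urgent_first and the sample request titles are hypothetical and only illustrate how a team might order its own backlog.

```python
from enum import IntEnum


class Priority(IntEnum):
    """Values are the RT numbers given in the list above."""
    NONE = 0
    LOW = 1
    NORMAL = 2
    HIGH = 3
    IMMEDIATE = 4  # "Immediate/emergency" in the list


def most_urgent_first(requests):
    """Sort (title, Priority) pairs so the highest RT value comes first."""
    return sorted(requests, key=lambda item: item[1], reverse=True)


# Hypothetical backlog used only to show the ordering.
backlog = [
    ("update documentation links", Priority.LOW),
    ("patch security vulnerability", Priority.IMMEDIATE),
    ("add new report view", Priority.NORMAL),
]
ordered_titles = [title for title, _ in most_urgent_first(backlog)]
```

Using IntEnum keeps the RT number and the human-readable name together, so Priority(3) and Priority.HIGH refer to the same level.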

Implement, release and deploy

This phase takes place in the Release and Deployment process (see below).

Review

  1. Every release is subject to a post-implementation review. Within one week, customers provide feedback by answering the following questions:
    • Were release notes and documentation sufficient?
    • Is the service working as expected?

Release and deployment management

Goal: To bundle changes into a release, so that these changes can be tested and deployed to the production environment together.

RnDM.png

Plan Release

  1. Release schedules should be defined, including the frequency of releases (e.g. every 1-2 months).
  2. The scope of every release should be defined.
  3. Releases should be planned so that the OMB can be informed at least one week in advance.
    • All planned releases should be presented during the monthly OMB meeting prior to their release to production.
    • Emergency releases should be presented at the first available OMB meeting, post-release.
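
The one-week notice rule can be checked mechanically when planning a release. The sketch below is an assumption-laden illustration: the function name omb_notice_sufficient and its parameters are hypothetical, and "one week" is read as seven calendar days.

```python
from datetime import date, timedelta

# "at least one week in advance", read here as seven calendar days (assumption)
OMB_NOTICE = timedelta(days=7)


def omb_notice_sufficient(informed_on: date, release_on: date,
                          emergency: bool = False) -> bool:
    """Return True if the OMB was informed early enough for this release.

    Emergency releases are presented to the OMB after the release,
    so the advance-notice rule does not apply to them.
    """
    if emergency:
        return True
    return release_on - informed_on >= OMB_NOTICE
```

A release planned for 8 September that was announced on 1 September passes the check; the same release announced on 2 September does not, unless it is an emergency release.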

Build Release

  1. Due to the open-source nature of the developed software, each service should have and use publicly available code repositories, using a Version Control System (VCS) like Git, Subversion or CVS
    • The preferred Version Control System is Git using GitHub to ease collaborations
  2. All new releases should correspond to a new tag in the VCS
    • If intended to be part of the UMD distribution, all tools should be packaged in an OS-native format (rpm for the RH family, deb for Debian)
  3. Distribution
    • source code archived as a tarball or as binaries through EGI AppDB or attached to GitHub repositories
      • if intended to be part of UMD - new releases should be distributed as binaries through the UMD repository

Test Release

  1. Testing of the software, prior to its release, is a responsibility of the development team.
  2. Where a dedicated OTAG exists it can be involved in the testing activity.
  3. Major or particularly important releases should pass acceptance tests performed by customer representatives
  4. When the development phase of a new release is completed, an announcement should be sent to all the Service Provider teams and to all the actors that should be involved in the release testing (OTAGs). The announcement should contain:
    • release notes, changelog, installation and configuration steps to apply the update, as well as known issues
    • documentation links
    • detailed test plan
    • all the information needed by the conformance criteria.
    • an indication of the expected release date; the release date and the kind of testing will depend on each specific release and on its importance.
  5. If a test fails a report will be produced and the release sent back to development to restart the cycle. Tests will include a documentation review and a documentation update if needed. The test phase can be performed internally in the development team if no other tools or services are affected.

Document

  1. The development team is responsible for creation and maintenance of the documentation, instructions and manuals related to the service in collaboration with the EGI Operations team.
  2. Before each release documentation should be checked and updated where needed.

Inform

  1. Information about the next release should be communicated:
    • during the OMB meeting (at least one week before release).
      • A one-slide summary should be sent to operations@egi.eu
    • through the Monthly Operational broadcast
      • Content of the broadcast should be sent to operations-support@mailman.egi.eu

Deploy Release

  1. For changes of high impact and high risk, the steps required to reverse an unsuccessful change or remedy any negative effects shall be defined.

Review Release

  1. Each release should be monitored for success or failure and the results shall be analysed internally.

Incident and Problem management

Goal: To restore agreed service level within the agreed time after the occurrence of an incident, and to investigate the root causes of (recurring) incidents in order to avoid future recurrence of incidents by resolving the underlying problem, or to ensure workarounds are available.

Incident and requests management

Support should be provided:

  • via the GGUS HelpDesk (https://ggus.eu), which is the single point of contact for infrastructure users to access the EGI Service Desk.
  • (for incidents) according to the declared level of quality of support (default: Medium)
  • Support is provided in English
  • Support is available
    • from Monday to Friday
    • 8 h per day
  • All incidents and requests should be assigned to the proper Support Unit and prioritised according to the suitable scheme.

Problems management

  1. In case of recurring incidents which cannot be solved by the development team, a GGUS ticket should be opened for the "Operations" Support Unit. The EGI Operations team will coordinate the investigation of the root cause in order to avoid future recurrence of the incidents.
    • Any existing GGUS tickets which may help investigation should be marked as a child of the created ticket.

Planned maintenance windows or interruptions

Planned maintenance should be:

  • declared in GOCDB in a timely manner, i.e. 24 hours before
  • with a typical duration of up to 24 hours; otherwise it should be justified
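
The two conditions for planned maintenance can be expressed as a small validation helper. This is a sketch under stated assumptions, not a GOCDB feature: check_planned_maintenance, its parameters and the wording of the returned messages are all hypothetical.

```python
from datetime import datetime, timedelta

MIN_NOTICE = timedelta(hours=24)            # declare at least 24 hours before
TYPICAL_MAX_DURATION = timedelta(hours=24)  # longer windows need a justification


def check_planned_maintenance(declared_at: datetime, starts_at: datetime,
                              ends_at: datetime, justification: str = "") -> list:
    """Return a list of problems; an empty list means both rules are met."""
    problems = []
    if starts_at - declared_at < MIN_NOTICE:
        problems.append("declared less than 24 hours before the start")
    if ends_at - starts_at > TYPICAL_MAX_DURATION and not justification.strip():
        problems.append("longer than 24 hours without a justification")
    return problems
```

A window declared two days ahead and lasting four hours returns an empty list, while a 30-hour window declared the evening before is flagged on both counts unless a justification is supplied.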

Unscheduled interruptions should be managed according to MAN04.

Security incidents

All security incidents should be registered according to EGI_CSIRT:Incident_reporting.

Monitoring

The development team is responsible for development, maintenance and support of the service monitoring probes.

Availability and reliability thresholds should be defined with the EGI Operations team, depending on the criticality of the service.

Information Security management

Goal: to manage information security effectively through all activities performed to deliver and manage services, so that confidentiality, integrity and accessibility of relevant assets are preserved

The following rules for information security and data protection apply:

  • The Provider must define and abide by an information security and data protection policy related to the service being provided;
  • This must meet all requirements of any relevant EGI policies or procedures and also must be compliant with the relevant national legislation.

Customer relationship management

Goal: Establish and maintain a good relationship with customers receiving service

Development teams need to interact with each other inside EGI activities, and primarily with the OMB, the main customer of the operational tools.

Interactions between Development teams are guaranteed through periodic phone conferences and face-to-face meetings. A dedicated mailing list is available as well.

Interactions with customers are guaranteed through periodic OMB meetings and (where needed) a dedicated Operations Tool Advisory Group. The OTAG mandate is to help developers with requirement prioritisation and the release process of operational tools. OTAG provides forums to discuss the evolution of the tools to meet the expressed needs of the EGI community. It has representation from all end-user groups, depending on the tool.