The EGI Change Management (CHM) Process Introduction and Overview

This is the public homepage of the EGI Change Management Process. Change management within EGI's production IT environment is extremely important in ensuring high-quality delivery of IT services.

The purpose of the IT Change Management Policy is to manage higher risk changes in a planned and predictable manner in order to assess risks, assign resources, and minimize any potential negative impact to services. This is done by requiring change owners to prepare and submit a Jira ticket including information about the change, which is then considered by the Change Advisory Board (CAB). The CAB is a group of technical and strategic experts, with membership decided by the Services and Solution Board, who are tasked with reviewing proposed change requests and approving or rejecting them.

The CAB meets to assess and approve changes and is coordinated on the egi-cab@mailman.egi.eu mailing list. 

Here is a brief introduction to the different change management procedures. More details may be found in the CHM Procedure Pages and CHM Risk Page.

Normal changes

The basic procedure for a normal change is as follows:

  1. For higher risk changes (score >4), the Change Requester (usually the Service Supplier - see below) opens a Jira ticket. Lower risk changes do not need to be recorded unless the change can affect other services under EGI Change Control, or unless the Change Requester feels that there is benefit from doing so.
  2. The risk of the change (risk = likelihood × impact of something going wrong) should be recorded in the ticket in preparation for the CAB review. Further details about evaluating risk may be found on the CHM Risk Page.
  3. If the change is urgent, the Change Requester should send an email to EGI-CAB to convene the CAB, which reviews the change with the Change Requester present. Once the change is approved, the decision is recorded on the ticket (along with the planned intervention date) and the change may proceed.
  4. The change should be implemented following Release and Deployment Management. After the change, the Change Owner should update the Jira ticket with the intervention date and a comment about the outcome of the change.
  5. The change is reviewed at the next CAB and the ticket is closed, with the actual intervention date recorded if it differs from the planned intervention date.
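
The risk calculation and the ticket rule from steps 1 and 2 can be sketched in a few lines of Python. This is only an illustration: the `ChangeRequest` class, its field names, and the use of integer ratings are assumptions made for the example, and the actual rating scales are defined on the CHM Risk Page rather than here. Only the formula (likelihood times impact) and the threshold (score above 4) come from the procedure above.

```python
from dataclasses import dataclass

# Threshold from step 1: a score above 4 marks a higher risk change.
HIGH_RISK_THRESHOLD = 4


@dataclass
class ChangeRequest:
    """Hypothetical record of a proposed change (names are illustrative)."""
    title: str
    likelihood: int  # assumed integer rating: how likely something goes wrong
    impact: int      # assumed integer rating: how severe the consequences are
    affects_other_services: bool = False

    @property
    def risk(self) -> int:
        # risk = likelihood x impact, as in step 2
        return self.likelihood * self.impact

    def needs_ticket(self) -> bool:
        # Higher risk changes must be recorded in Jira. Lower risk changes
        # must be recorded only if they can affect other services under
        # EGI Change Control. (A requester may still open a ticket
        # voluntarily; that discretion is not modelled here.)
        return self.risk > HIGH_RISK_THRESHOLD or self.affects_other_services
```

For example, a change rated likelihood 3 and impact 2 scores 6 and needs a ticket, while one rated 2 and 2 scores 4 and does not, unless it can affect another service under EGI Change Control.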

Standard changes

In addition, repeated changes of a similar type may be approved by the CAB as a standard change. Once a change of that type has been registered as a normal change and executed without problems, subsequent changes of the same type do not require explicit approval (or review) by the CAB; it is sufficient for the Service Instance Owner to submit a Jira ticket recording the change and confirming that it is a standard change. After the change, the Service Instance Owner can review it by adding a comment to the ticket saying whether the change was successful, and then close the ticket. The list of standard changes is provided below.

Emergency changes

Sometimes changes need to be made to address a critical situation (e.g. a patch to fix a newly discovered vulnerability) and there may be insufficient time to follow the normal change procedure. In that case, the procedure is as follows:

  1. The Change Requester opens a Jira ticket.
  2. CHM staff approve the change.
  3. The change should be implemented following Release and Deployment Management (RDM1). After the change, the Change Owner should update the Jira ticket with the intervention date and a comment about the outcome of the change.
  4. The change is reviewed at the next CAB and the ticket is closed, with the actual intervention date recorded if it differs from the planned intervention date.
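
Normal and emergency changes share the same ticket lifecycle and differ mainly in who approves them. The sketch below models that lifecycle as a small state machine. The state names and the `advance` and `approver` helpers are hypothetical, chosen for illustration; they are not fields of the actual Jira workflow, and rejection of a change is deliberately left out.

```python
from enum import Enum


class TicketState(Enum):
    """Illustrative ticket states read off the numbered steps above."""
    OPENED = "opened"
    APPROVED = "approved"
    IMPLEMENTED = "implemented"
    CLOSED = "closed"


# Allowed forward transitions; rejection is not modelled in this sketch.
NEXT_STATE = {
    TicketState.OPENED: TicketState.APPROVED,       # CAB or CHM staff approve
    TicketState.APPROVED: TicketState.IMPLEMENTED,  # via Release and Deployment Management
    TicketState.IMPLEMENTED: TicketState.CLOSED,    # after review at the next CAB
}


def approver(is_emergency: bool) -> str:
    # Emergency changes are approved by CHM staff, normal changes by the CAB.
    return "CHM staff" if is_emergency else "CAB"


def advance(state: TicketState) -> TicketState:
    """Move a ticket to its next state, refusing to go past CLOSED."""
    if state not in NEXT_STATE:
        raise ValueError(f"ticket in state {state.value!r} cannot advance")
    return NEXT_STATE[state]
```

In both procedures the review at the next CAB happens after implementation, so an emergency change still ends up in front of the CAB even though it was approved without it.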

Services that fall under EGI Change Control

This is the list of services that are under the scope of the central EGI Change Management process (note that federated EGI services are expected to be under the Change Management process of the service supplier's SMS):

Service | Service Supplier
Accounting repository (Computing and Grid) | UKRI
Accounting Portal | CESGA
Application Database (AppDB) | IASA
Check-in | GRNET
Collaboration Tools (Document Repository, Indico, Mailing lists, Mediawiki, RT, SSO) | EGI Foundation
Configuration Database (GOCDB) | UKRI
DataHub | CYFRONET
Helpdesk (GGUS) | KIT
Infrastructure Manager | UPV-GRyCAP
Messaging Service (AMS) | GRNET
Notebooks | CESNET
Replay | CESNET
Operations Portal | CC-IN2P3
Service Monitoring (ARGO) | GRNET, CC-IN2P3
Software Distribution | UKRI
Workload Manager | CC-IN2P3/CNRS

Standard Changes

Service | Title | Description | Change Request Reference
Collaboration Tools | Reboot of a VM following a regular OS update | Rebooting Collaboration Tools VMs following regular OS updates. | IMSCHM-28
DataHub | Upgrade Onedata on the EGI DataHub | Upgrade of the EGI DataHub Onezone. | IMSCHM-50
DataHub | Upgrade Oneprovider on the EGI DataHub | Upgrade of the EGI DataHub Oneprovider | IMSCHM-277
Helpdesk (GGUS) | Add new VO | Add new VO name to 'Concerned VO' list in GGUS. | IMSCHM-248
Helpdesk (GGUS) | Remove VO | Remove VO name from 'Concerned VO' list in GGUS. | IMSCHM-252
Helpdesk (GGUS) | Add new support unit | Add new Support Unit to Helpdesk | IMSCHM-253
Notebooks | Enable a new Virtual Organization (VO) | Allowing access to the EGI Notebooks platform to all members of the specified new VO. | IMSCHM-64
Notebooks | Access to a new CVMFS repository | Enable access to a new CVMFS repository | IMSCHM-276
Confluence | Upgrade of Confluence version | Upgrade of Confluence version | IMSCHM-99
Confluence | Upgrade of kernel/os version | Upgrade of kernel/os version | IMSCHM-99
Jira (part of Collab Tools) | Minor and patch release updates | Minor and patch release updates of Jira instance | IMSCHM-244
Infrastructure Manager | Minor and patch release updates | Minor and patch release updates of Infrastructure Manager | IMSCHM-242
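
Because a standard change is identified by its service and title, the list above can be treated as a simple lookup table. The following sketch shows one way a Service Instance Owner's tooling might check whether a planned change is already approved as standard. The dictionary contents are copied from the list above; the `STANDARD_CHANGES` name and the `standard_change_reference` helper are hypothetical and not part of any EGI tool.

```python
from typing import Optional

# (service, title) -> change request reference, copied from the list above.
STANDARD_CHANGES = {
    ("Collaboration Tools", "Reboot of a VM following a regular OS update"): "IMSCHM-28",
    ("DataHub", "Upgrade Onedata on the EGI DataHub"): "IMSCHM-50",
    ("DataHub", "Upgrade Oneprovider on the EGI DataHub"): "IMSCHM-277",
    ("Helpdesk (GGUS)", "Add new VO"): "IMSCHM-248",
    ("Helpdesk (GGUS)", "Remove VO"): "IMSCHM-252",
    ("Helpdesk (GGUS)", "Add new support unit"): "IMSCHM-253",
    ("Notebooks", "Enable a new Virtual Organization (VO)"): "IMSCHM-64",
    ("Notebooks", "Access to a new CVMFS repository"): "IMSCHM-276",
    ("Confluence", "Upgrade of Confluence version"): "IMSCHM-99",
    ("Confluence", "Upgrade of kernel/os version"): "IMSCHM-99",
    ("Jira (part of Collab Tools)", "Minor and patch release updates"): "IMSCHM-244",
    ("Infrastructure Manager", "Minor and patch release updates"): "IMSCHM-242",
}


def standard_change_reference(service: str, title: str) -> Optional[str]:
    """Return the approving change request, or None if not a standard change."""
    return STANDARD_CHANGES.get((service, title))
```

A result of `None` means the change is not on the list and should go through the normal (or emergency) procedure instead.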

Quality of Change


Quality | Definition
Failed | The change did not complete successfully and had to be rolled back or worked around by following unplanned procedures. Details shall be recorded in the CR ticket.
Problematic | Implementation of the change did not proceed entirely according to the plan, but the problems were overcome and the change was ultimately successful. Details of the problems shall be recorded in the CR ticket.
Successful | Implementation of the change went according to the plan as described in the CR.
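
The three quality levels can be read as a two-question decision: did the change complete successfully, and did it follow the plan? The sketch below encodes that reading. The `ChangeQuality` enum and the `classify` function, including its parameter names, are illustrative assumptions rather than part of the CHM tooling.

```python
from enum import Enum


class ChangeQuality(Enum):
    FAILED = "Failed"
    PROBLEMATIC = "Problematic"
    SUCCESSFUL = "Successful"


def classify(completed_successfully: bool, followed_plan: bool) -> ChangeQuality:
    """Map the outcome of a change onto the quality levels defined above."""
    if not completed_successfully:
        # Rolled back, or worked around by following unplanned procedures.
        return ChangeQuality.FAILED
    if not followed_plan:
        # Deviated from the plan, but the problems were overcome.
        return ChangeQuality.PROBLEMATIC
    return ChangeQuality.SUCCESSFUL
```

For the first two outcomes the definitions also require details to be recorded in the CR ticket, which a real workflow would need to enforce separately.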


Change management operated by other organisations

The EGI Change Management is a centralised process for the EGI Federation. If organisations are providing EGI branded services and are already running their own internal Change Management process, they may continue to do so if their process meets the essential requirements of ISO20k with respect to change management:

  • Is there a systematic way of evaluating the risk for changes?
  • Is there a procedure within the organisation for approving high-risk changes?
  • Are high-risk changes recorded (who implemented the change, when was it implemented and what was its outcome)?

In addition to the above, if any changes are planned that have the potential to impact other EGI branded services, then the EGI Change Management process should be informed in advance by submission of a ticket to the Jira queue linked above.

EGI should keep track of organisations running their own internal Change Management process and should periodically run a lightweight audit to ensure that the above requirements are being met.

Contact

If you have any questions relating to EGI Change Management, please contact Matthew Viljoen (matthew DOT viljoen AT egi.eu).