EGI-InSPIRE:NSRW IMPLEMENTATION RT

From EGIWiki



General Comments / Assumptions

  • External Technology Providers will use GGUS to submit tickets
  • Internal Technology Providers will use RT to submit tickets
  • Emergency Releases (External and Internal) will be handled only by RT
  • This is based on Feedback_from_M._Drescher

MDavid: I think "Emergency Releases" should also start from GGUS tickets. We will have to check whether this is too much overhead, but the urgency can already be expressed as "High priority" in GGUS.

(Custom) fields

EmergencyRelease

This is a check-box field used to flag a release as an emergency release, which implies that the release should go straight to production.

MDavid: This should be evaluated on a case-by-case basis. I was planning that it may go through a very fast staged rollout.

Status

Existing field. Takes the general RT status of the ticket.

Allowed values:

  • new - Ticket is created, but not assigned yet. Initial state once synched from GGUS.
  • open - Ticket is assigned to a person or group (does that feature exist in RT?) and work is undertaken
  • rejected - The ticket is faulty, invalid, duplicate or otherwise something that must not inflict work upon potential assignees.
  • resolved - The request has been resolved with an outcome (and auditable trace of documentation etc.!). Causes the ticket to be synchronized back to GGUS.
  • deleted - does not apply
  • stalled - does not apply(?)


RolloutProgress

The RolloutProgress reflects both the phases of the SW release workflow, and the final outcome.

Allowed values: see Workflow

QualityCriteria Verification Report

This field holds a link to the report of the QC verification process.


StageRollout Report

This field holds a link to the consolidated report summarising individual reports of all EAs.


Repository URL

Holds the currently valid URL to download the installation package. So, if the ticket has the current RolloutProgress of "StageRollout" (or equivalent), the URL should point to the package in the StageRollout repository.

Release metadata

I envision this "field" not as a real field but as an attachment to the ticket that holds the XML data for the release referred to in this ticket (I did not know how else to describe it). For better automation, the attachment should always have the same name (for example, "release.xml") and be attached through the WS integration with GGUS (if possible).

The contents of this XML file are currently under discussion, so I will not go into them here.


Owner

The owner of the ticket should change from person to person, depending on the progress through the workflow. I am not a fan of assigning the ticket to a group of people, as this often leads to confusion as to who should pick up the ticket. For now I would like to see group leaders, such as the Task leaders for QC verification (TSA2.3, Carlos Fernandez), and the Task Leader for SR (TSA1.3, Mario David) as standing assignees who then delegate the work appropriately, and collect and consolidate the reports (see other custom fields, and workflow comments).

MDavid: Agreed, and that is what I thought initially.

Watchers

I would like a list of watchers added to the tickets to ensure proper monitoring. Again, I think the task leaders for QC verification and SR are mandatory watchers, as well as the task leader for repository management (TSA2.4, Kostas). I also would want the members of said tasks (TSA2.4 and TSA2.3 in particular) as watchers. The activity leader for SA2 (currently me, Michel) should be a mandatory watcher as well.

I am thinking of having the task leader of TSA2.5, Michael, watch this queue as well, so as to give the DMSU a forecast of what is in the current pipeline of patches that may be rolled out into production.

I would also like to include the group of early adopters in SR as watchers, to give them a heads-up on what is going on. But that needs more discussion, which perhaps Mario David may lead.

MDavid: early-adopters-XXX should be notified when the status changes to "StageRollout". Since watchers are individuals, I think we will need some field that, when set (somehow), triggers a notification mail to the respective mailing list. The tricky part may be that only those accepting the staged rollout should then be included as watchers.

Associated Major Release OR Associated Software

Provides the association of the given release, with a higher level "object" (i.e. "Major release" or even a "Software"). This object should be characterized by a set of static/agreed attributes.

One example of such an attribute is the location in the production repository where each release associated with an "object" should be placed.

A more pragmatic example: since the release ca-policy-egi-core-1.37-1 is associated with the ca-1.0 major release, ca-policy-egi-core-1.37-1 should be placed at http://repository.egi.eu/sw/production/CAs/.

[From my point of view, these attributes (there is more than one) should be decided within EGI and agreed (maybe) with the provider. They should be defined at the major release level (or at least at the software level), and they should be updated, for example, with each major release in the sequence.]

Workflow

REPO-STATE-Diagram.png

RolloutProgress has the following states:

  • Unverified (initial value): RT notifies Repo and QCVFY
  • InVerification: QCVFY performs verification; no action from RT or Repo
  • WaitingForResponse: waiting for a response from the TP; no action from RT or Repo
  • StageRollout: RT notifies Repo and the SR team
  • Deferred: special state used by the Repo to signal that there is another release in SR
  • Rejected: RT notifies Repo and TP, with the reason for the outcome
  • Production: RT notifies Repo, TP and the SA1 Release Manager
  • Failed: special state used by the Repo to signal that there is a failure

Two new states are to be added. Both will work exactly like Rejected but are recorded with different semantics, as follows:

  • Ignored - the submitted target does not match the UMD supported targets.
  • Not published - the product passed the EGI provisioning process but, for the given reason, was not published.

Note: When ignoring, rejecting, or not publishing a product, the RT ticket has to be resolved.
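
The state list above can be read as a small lookup table. The following is a minimal sketch of that reading, not the actual RT implementation: the Python names (NOTIFY, RESOLVE_ON, on_state_change) are hypothetical, and the notification targets are simply copied from the list above.

```python
# Hypothetical sketch: parties RT notifies when a ticket enters a RolloutProgress state.
# The targets come from the state list above; the data structure is an assumption.
NOTIFY = {
    "Unverified": ["Repo", "QCVFY"],
    "InVerification": [],        # QCVFY verifies; no action from RT or Repo
    "WaitingForResponse": [],    # waiting for the TP; no action from RT or Repo
    "StageRollout": ["Repo", "SR Team"],
    "Deferred": [],              # set by the Repo: another release is in SR
    "Rejected": ["Repo", "TP"],
    "Production": ["Repo", "TP", "SA1 Release Manager"],
    "Failed": [],                # set by the Repo to signal a failure
}

# The two proposed states work exactly like Rejected.
NOTIFY["Ignored"] = list(NOTIFY["Rejected"])
NOTIFY["Not published"] = list(NOTIFY["Rejected"])

# States after which the RT ticket has to be resolved (see the note above).
RESOLVE_ON = {"Ignored", "Rejected", "Not published"}


def on_state_change(new_state):
    """Return (parties to notify, whether the ticket must be resolved)."""
    if new_state not in NOTIFY:
        raise ValueError(f"unknown RolloutProgress value: {new_state!r}")
    return NOTIFY[new_state], new_state in RESOLVE_ON
```

Keeping the proposed states as copies of Rejected makes the "same behaviour, different semantics" rule explicit while still allowing them to diverge later.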



[Image: Repo view of the state diagram]

This image describes the Repo view of the above state diagram in more detail.

Notifications

After testing the notification mechanisms, we (Marios and I) have realized that we are not completely sure who should be notified, and when, during the ticket lifespan. The current implementation follows our discussions during December. The respective groups are added as ticket CCs when the Repo returns a CommunicationStatus equal to OK and the RolloutProgress is as shown in the table below. Note that sw-rel-admin is always notified, as AdminCc, of all changes to tickets in the sw-rel queue.

| RolloutProgress | Groups to be notified |
|---|---|
| Submitted | |
| Unverified | sw-rel-qc |
| In verification | |
| Waiting for response | |
| Deferred | |
| StageRollout | sw-rel-sr |
| Production | sw-rel-production |
| Rejected | |

The notification mechanism works nicely. However, we have realized that the table above is probably not sufficient, and we might want a full matrix of RolloutProgress states and groups to be notified. That would probably be implemented by purging the CCs on each CommunicationStatus == OK && RolloutProgress change and adding the groups from scratch, which is not a big deal from the implementation point of view.
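
The "purge and re-add" idea can be sketched as follows. This is only an illustration in Python, not the RT scrip itself: the function name rebuild_ccs and the plain-list representation of the CCs are assumptions, while the group names come from the table above. Note that the sketch drops every existing CC, including any that were added by hand.

```python
# Hypothetical sketch of rebuilding the ticket CCs from scratch.
# Group names are taken from the notification table; everything else is assumed.
GROUPS_BY_PROGRESS = {
    "Submitted": [],
    "Unverified": ["sw-rel-qc"],
    "In verification": [],
    "Waiting for response": [],
    "Deferred": [],
    "StageRollout": ["sw-rel-sr"],
    "Production": ["sw-rel-production"],
    "Rejected": [],
}


def rebuild_ccs(current_ccs, old_progress, new_progress, communication_status):
    """Return the CC list that should be on the ticket after an update."""
    # Only act when the Repo answered OK and the RolloutProgress really changed.
    if communication_status != "OK" or new_progress == old_progress:
        return list(current_ccs)
    # Purge all CCs and add the groups for the new state from scratch.
    return list(GROUPS_BY_PROGRESS.get(new_progress, []))
```

Extending the table to a full matrix would then only mean filling in the empty lists; sw-rel-admin is not touched here because it is handled as AdminCc.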

Progress / Issues

Requests

| Requested By | Description | Response | Status |
|---|---|---|---|
| Carlos | Block the changes in one ticket along its workflow ("rolloutprogress"), e.g. a ticket in StageRollout should not change to "In verification". | This is generally feasible. The question is how to trigger the mechanism that would block the rollout progress changes. Is there something already stored within the ticket or its custom fields that can be used to determine that the rollout progress should be blocked? | Implemented |
| Carlos | Record the time at which the ticket enters each "rolloutprogress" state. | RT keeps track of the time worked on a ticket, due dates and so on. However, I'm afraid that these mechanisms are not enough to keep track of the time spent in the different rollout progress states. I guess we'll have to keep track of this in a separate custom field. Do you have a format that would suit your needs for storing the progress? Would something like CSV be fine? | Pending |
| Kostas | The default RolloutProgress value for a new ticket should be Submitted. | Was already in place. I found a bug in my code, which did not sanitize the RolloutProgress when the input was "no value" (i.e., null). I have changed the sanitization condition from if ($self->TicketObj->FirstCustomFieldValue('RolloutProgress')) to unless ($self->TicketObj->FirstCustomFieldValue('RolloutProgress') eq 'Submitted'). | Implemented |
| Kostas | When I changed the ticket from Submitted to Unverified, I got the following error: "Could not add new custom field value: Permission Denied" | This was actually a more general issue: RT did not follow the ModifyCustomField rights quite correctly when modifying the ticket. | Solved |
| Kostas | E-mail notifications should be sent to the corresponding teams depending on the RolloutProgress state, e.g. InVerification should notify Carlos's team, etc. | | Implemented |
| DavidG | A label "Delete" would help clarify the meaning of the tick box next to the "ReleaseMetadata" file name when modifying the custom values of an existing ticket. It now just has a tick box and the name of the file ("release.xml"), and the function of the tick box is left for the reader to decide. | ACK. Makes sense. I had to hack this directly in the RT source. It should be clear now. | Implemented |
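
For the pending request about recording the time spent in each RolloutProgress state, the CSV idea raised in the response could look roughly like this. It is a sketch only: the "state,ISO timestamp" row format and both function names are hypothetical, and the CSV text is assumed to live in a separate custom field as suggested above.

```python
import csv
import io
from datetime import datetime


def append_transition(history_csv, state, when):
    """Append one 'state,timestamp' row to the CSV text kept in the custom field."""
    buf = io.StringIO()
    buf.write(history_csv)
    csv.writer(buf, lineterminator="\n").writerow([state, when.isoformat()])
    return buf.getvalue()


def time_in_states(history_csv):
    """Return the seconds spent in each state that the ticket has already left."""
    rows = [(state, datetime.fromisoformat(stamp))
            for state, stamp in csv.reader(io.StringIO(history_csv))]
    totals = {}
    # Each state lasts from its own timestamp until the next row's timestamp.
    for (state, start), (_, end) in zip(rows, rows[1:]):
        totals[state] = totals.get(state, 0.0) + (end - start).total_seconds()
    return totals
```

The state the ticket is currently in has no end time yet, so it is deliberately left out of the totals.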

Progress

A POST API has been developed to receive POST requests originating from the EGI RT system. The information required is: Ticket_Id, Current_Rollout_Progress, Previous_Rollout_Progress and a Yes/No flag (Emergency_Release) that indicates whether the release is an emergency release. The API has the following functionality:

  • It validates (to some extent) the data received and, in case of a failure (i.e. erroneous data), responds with an error message to the POST request made by RT.
  • It is also responsible for communicating the error to the main Repo system; the latter appends an explanatory message (as a comment) to the corresponding RT ticket.
  • All messages received through the API are stored in a DB (more specifically, in a table called Queue) for further processing by the Repo system (see below).
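
The validation step could be sketched as below. This is an illustration under stated assumptions rather than the Repo code: the four parameter names are the ones listed above, but the set of accepted RolloutProgress spellings, the error wording, the response shape and the use of a Python list in place of the Queue table are all hypothetical.

```python
# Hypothetical validation of the POST parameters sent by RT.
# The accepted spellings below are an assumption, not the values RT actually sends.
KNOWN_PROGRESS = {
    "Submitted", "Unverified", "InVerification", "WaitingForResponse",
    "StageRollout", "Deferred", "Rejected", "Production", "Failed",
}

REQUIRED = (
    "Ticket_Id",
    "Current_Rollout_Progress",
    "Previous_Rollout_Progress",
    "Emergency_Release",
)


def validate_post(params):
    """Return a list of error messages; an empty list means the data is accepted."""
    errors = [f"missing field: {name}" for name in REQUIRED if name not in params]
    if errors:
        return errors
    if not str(params["Ticket_Id"]).isdigit():
        errors.append("Ticket_Id must be numeric")
    for name in ("Current_Rollout_Progress", "Previous_Rollout_Progress"):
        if params[name] not in KNOWN_PROGRESS:
            errors.append(f"{name} has an unknown value: {params[name]!r}")
    if params["Emergency_Release"] not in ("Yes", "No"):
        errors.append("Emergency_Release must be Yes or No")
    return errors


def handle_post(params, queue):
    """Answer the POST and, if the data is valid, store it for the Repo daemon."""
    errors = validate_post(params)
    if errors:
        return {"status": "error", "errors": errors}
    queue.append(dict(params))  # stands in for an INSERT into the Queue table
    return {"status": "ok"}
```

Returning all errors at once, rather than stopping at the first, gives the explanatory ticket comment everything it needs in one round trip.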

The Repo system (let's call it the Repo daemon) is responsible for the following:

  • It periodically watches the Queue for new requests.
  • Upon receiving a new request, it performs all the necessary checks and actions.
  • For example, when a new request concerns the transition of a given release from Submitted -> Unverified, the Repo daemon:
      • connects to the EGI RT system over the RT REST interface and acquires the necessary data
      • performs the necessary checks on the newly arrived data, based on a checksums XML file that is provided by RT
      • checks that the mandatory fields are present
      • based on the RolloutProgress transition type, retrieves the necessary configuration values from the database (e.g. if the transition is Submitted -> Unverified, the target repository should be /sw/unverified, and so on…)
      • downloads the release.xml (metadata) file from RT
      • parses the XML and inserts the data into the database (currently, it parses "only" the high-level <Release> section, as provided in the NRMS_New_Release_MetaDATA_schema DTD schema)
      • creates a scratch structure, into which it downloads the software using the rsync URL provided by the SP (currently only rsync is supported)
      • does the necessary housekeeping
      • performs the necessary moves. If, for example, the transition is Submitted -> Unverified, the newly submitted release appears in the unverified repository, following the structure /sw/unverified/<DistributionShortName>/<Version>/ (for the <DistributionShortName> and <Version> values, we use those included in the release.xml [an example is available at [1]])

NOTE: The unverified area/repository is not publicly visible; accessing it requires an EGI SSO account.

  • In case of a failure, either technical [e.g. a MySQL problem] or logical [e.g. an unaccepted transition from StageRollout -> Unverified], the Repo daemon appends a comment with an "explanatory" message to the corresponding EGI RT ticket and sets the ticket's RolloutProgress field to 'Failed'.
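
The path construction and the failure handling described above can be sketched together. This is a simplified illustration, not the Repo daemon: it assumes that release.xml contains <DistributionShortName> and <Version> elements somewhere below its root, it knows only the one transition-to-repository mapping stated in the text, and the names TARGET_REPO, target_path and process are hypothetical. Fetching data from RT, checksums, rsync and the database are left out.

```python
import xml.etree.ElementTree as ET

# Hypothetical configuration: target repository per accepted transition.
# Only Submitted -> Unverified => /sw/unverified is stated in the text.
TARGET_REPO = {("Submitted", "Unverified"): "/sw/unverified"}


def target_path(previous, current, release_xml):
    """Return the repository path for a transition, or raise ValueError."""
    base = TARGET_REPO.get((previous, current))
    if base is None:
        raise ValueError(f"unaccepted transition {previous} -> {current}")
    root = ET.fromstring(release_xml)
    name = root.findtext(".//DistributionShortName")
    version = root.findtext(".//Version")
    if not name or not version:
        raise ValueError("release.xml lacks DistributionShortName or Version")
    return f"{base}/{name.strip()}/{version.strip()}/"


def process(request, release_xml):
    """Handle one Queue entry and report what would be written back to RT."""
    try:
        path = target_path(request["previous"], request["current"], release_xml)
        return {"RolloutProgress": request["current"], "path": path}
    except (ValueError, ET.ParseError) as exc:
        # Logical or technical failure: comment on the ticket and mark it Failed.
        return {"RolloutProgress": "Failed", "comment": str(exc)}
```

Routing every failure through one place keeps the "explanatory comment plus Failed" behaviour uniform for both logical and technical errors.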

Issues

To continue our work and extend the functionality offered by the Repo, some issues should be clarified (most of them are also included in Questions_from_the_Developers):

  1. Should the Repo be able to manage both incremental (delta) and non-incremental releases?
    1. MDavid: Yes. As I see it, a major release is a "non-incremental" release, and all others are "delta" releases containing only newer packages, which should go to the "updates" repo associated with the given major release.
  2. Do we need or want more than one release of the same software in the "stage rollout" phase at a time, or should releases enter this phase sequentially? If we allow several releases of the same software in the "stage rollout" phase, will they be approved/rejected independently, or does rejecting/approving one automatically reject/approve the other(s)?
    1. MDavid: There should be only one release at a time for any major release. However, there may be two releases at the same time if they refer to two major releases, which are independent of each other.
    2. MichelD: Yes, this is indeed a requirement, as EMI communicated that they will support more than one major release at a time, given its lifetime and support contracts.
  3. What is the purpose of the transition from Accepted (Production) to Rejected? Does it mean that the last release inserted into production for a given distribution should be rejected (rolled back)?
    1. MDavid: "accepted" is the state in which at least one early adopter has accepted the staged rollout, and thus it may go to "rejected" if it fails the staged rollout. (I think!)
    2. MichelD: I very much hope that a release in the rollout workflow only gets accepted once all EA reports have been evaluated, and not as soon as the first EA report is positive! The semantics here are that as soon as the RT ticket is set to "Accepted" (by Mario), the Repo moves the software components to the appropriate areas (updates for minor and revision releases, a new area for a major release, etc.).
      As for the mentioned transition, I think it is misplaced and should not happen. I think it tries to model the use case of software components being phased out from production (e.g. gLite 3.1 components).
  4. Since the release.xml file should not be modified after a release has been submitted, the Repo downloads the release.xml from RT only when the transition is Submitted -> Unverified. Do we agree on this?
    1. MDavid: Agreed, and the decision is "yes" on my part. What do others think?
    2. MichelD: Yes.
  5. The Repo ignores both the QualityCriteriaVerificationReport and the StageRolloutReport. These reports are useful only to the verifiers.
    1. MDavid: Correct. The action to pass from one repo to another is triggered either by the RT state change or eventually by the person responsible for a given phase, i.e. in RT, not in the Repo.
    2. MichelD: The important thing is that the ticket change from "In verification"/"Waiting for response" to "StageRollout"/"Rejected" is done manually, and the actions on the Repo (either providing the release to the StageRollout area, or cleaning up) are performed automatically, triggered by the ticket state change. Likewise for the StageRollout phase.
  6. Do we need one more value in the release.xml file (NRMS_New_Release_MetaDATA_schema) that clearly defines whether the release is a major, minor or update release? (One option is to parse the <Version> value, but I do not think this is a robust solution; moreover, will this versioning scheme be provided in a common manner by all SPs?)
    1. MichelD: As agreed at the F2F, we will use a distinct numerical version scheme.
  7. Some validation rules should be added to RT. At the least, only the accepted transitions should be allowed, as described in the state transition diagram (NSRW_IMPLEMENTATION_RT).
  8. I assume that we need one more, intermediate level of discrimination in the release.xml schema in order to describe the SoftwareComponents in a given release. I mean something like:
        <Release>
            <SoftwareComponentA>
                <packageA.1></packageA.1>
                <packageA.2></packageA.2>
                <!-- ... -->
            </SoftwareComponentA>
            <SoftwareComponentB>
                <packageB.1></packageB.1>
                <!-- ... -->
            </SoftwareComponentB>
            <!-- ... -->
        </Release>
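
To illustrate how a consumer such as the Repo could walk the proposed two-level layout, here is a minimal sketch. It assumes exactly the nesting shown above (components directly under <Release>, packages directly under each component); the element names are the placeholders from the proposal, and the function name components is hypothetical.

```python
import xml.etree.ElementTree as ET


def components(release_xml):
    """Map each child element of <Release> to the tag names of its own children."""
    root = ET.fromstring(release_xml)
    return {comp.tag: [pkg.tag for pkg in comp] for comp in root}
```

A real schema would more likely use fixed element names with name attributes, but that is a design decision still open in the discussion above.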

Fields in RT Ticket

| Field | Description | Values | Read | Write | Status |
|---|---|---|---|---|---|
| RepositoryURL | Holds a pointer to the release in the EGI Repo. | | All | Repo | Implemented |
| Sync-Protocol | Indicates which protocol is to be used to download the data | rsync (later http, ftp will be added); default: rsync; null: no | All | SW-Provider | Implemented |
| Sync-URL | Points to the repo from which we download the release to our repository | null: no | All | SW-Provider | Implemented |
| AssociatedMajorRelease | Used to define which Major Release this Release belongs to | See above; null: no | All | SW-Provider | Implemented |
| ReleaseType | Indicates whether the given release should be handled by the Repo as NonIncremental or as Incremental. MDavid: I think the default should be Incremental, because they will be more frequent | NonIncremental, Incremental; default: NonIncremental; null: no | All | SW-Provider | Implemented |
| EmergencyRelease | See above | No, Yes; default: No; null: no | All | SW-Provider | Implemented |
| Status | See above | See above; default: new; null: no | All | Verification teams | Implemented |
| RolloutProgress | See above | See above; default: Submitted; null: no | All | Verification teams, Repo (in case of a failure) | Implemented |
| Verification-Report | See above | See above; null: no | All | Verification teams | Implemented |
| StageRollout-Report | See above | See above; null: no | All | Stage-Rollout Verification team | Implemented |
| Release-metadata | See above | See above; null: no | All | SW-Provider | Implemented |
| Owner | See above | See above; null: no | All | - (??) | Implemented |
| Watchers | See above | See above; null: no | All | - (??) | Implemented |


Legend

  • Proposed: Field proposed by the developers awaiting implementation
  • Pending: Approved by SA2, awaiting implementation
  • Implemented: Field already implemented in RT

Metrics for the REPO

Metrics from SLA with TPs

| Metric | Description | Status |
|---|---|---|
| M.REPO-1 | Number of releases delivered to EGI per month | Pending |
| M.REPO-2 | Number of releases that passed the quality criteria verification per month | Pending |
| M.REPO-3 | Number of releases that passed StageRollout verification per month | Pending |
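
As a rough illustration of how the three M.REPO metrics could be computed once they are implemented, the sketch below counts releases per month from a list of records. Everything about the input is an assumption: the record keys (delivered, passed_qc, passed_sr), the ISO date strings, and the choice to attribute each release to the month in which it was delivered.

```python
from collections import Counter


def repo_metrics(releases):
    """Count, per month, releases delivered, passing QC verification, and passing StageRollout."""
    delivered, passed_qc, passed_sr = Counter(), Counter(), Counter()
    for rel in releases:
        month = rel["delivered"][:7]  # 'YYYY-MM' prefix of an ISO date string
        delivered[month] += 1
        if rel["passed_qc"]:
            passed_qc[month] += 1
        if rel["passed_sr"]:
            passed_sr[month] += 1
    return {
        "M.REPO-1": dict(delivered),
        "M.REPO-2": dict(passed_qc),
        "M.REPO-3": dict(passed_sr),
    }
```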


EGI SA1.3 Metrics

| Metric ID | Metric | Status |
|---|---|---|
| M.SA1.3-1 | Number of EA teams that did staged rollout for any given release | Pending |
| M.SA1.3-2 | Number of releases with success in staged rollout | Pending |
| M.SA1.3-3 | Number of releases with reject in staged rollout | Pending |
| M.SA1.3-4 | Time taken from StagedRollout to Accepted | Pending |
| M.SA1.3-5 | Time taken from Accepted to report published of the staged rollout | Pending |
| M.SA1.3-6 | Number of releases a given EA participated per month | Pending |
| M.SA1.3-7 | Number of releases a given NGI (aggregate M.SA1.3-6) participated per month | Pending |

References

EMI Release Plan

Questions from the Developers