EGI-InSPIRE:NSRW IMPLEMENTATION RT
General Comments / Assumptions
- External Technology Providers will use GGUS to submit tickets
- Internal Technology Providers will use RT to submit tickets
- Emergency Releases (External and Internal) will be handled only by RT
- This is based on Feedback_from_M._Drescher
MDavid I think "Emergency Releases" should also start from GGUS tickets. We will have to check whether this is too much overhead, but the urgency can already be expressed as "High priority" in GGUS.
EmergencyRelease
This is a check-box field used to flag a release as an emergency one, which implies that the release should go straight to production.
MDavid this should be evaluated on a case-by-case basis; I was planning that it may go through a very fast staged rollout.
Status
Existing field. Takes the general RT status of the ticket.
- new - Ticket is created, but not assigned yet. Initial state once synched from GGUS.
- open - Ticket is assigned to a person or group (does that feature exist in RT?) and work is undertaken
- rejected - The ticket is faulty, invalid, duplicate or otherwise something that must not inflict work upon potential assignees.
- resolved - The request has been resolved with an outcome (and auditable trace of documentation etc.!). Causes the ticket to be synchronized back to GGUS.
- deleted - does not apply
- stalled - does not apply(?)
RolloutProgress
The RolloutProgress reflects both the phases of the SW release workflow, and the final outcome.
Allowed values: see Workflow
QualityCriteria Verification Report
This field holds a link to the report of the QC verification process.
StageRollout Report
This field holds a link to the consolidated report summarising individual reports of all EAs.
RepositoryURL
Holds the currently valid URL from which to download the installation package. For example, if the ticket's current RolloutProgress is "StageRollout" (or equivalent), the URL should point to the package in the StageRollout repository.
ReleaseMetadata
I envision this "field" not as a real field but as an attachment to the ticket that holds the XML data for the release referred to in this ticket; I did not know how else to describe it. For better automation, the attachment should always have the same name (for example, "release.xml") and be attached through the WS integration with GGUS (if possible).
The contents of this XML file are currently under discussion, so I will not drill into them right now.
Owner
The owner of the ticket should change from person to person, depending on the progress through the workflow. I am not a fan of assigning the ticket to a group of people, as this often leads to confusion as to who should pick up the ticket. For now I would like to see group leaders, such as the Task leaders for QC verification (TSA2.3, Carlos Fernandez), and the Task Leader for SR (TSA1.3, Mario David) as standing assignees who then delegate the work appropriately, and collect and consolidate the reports (see other custom fields, and workflow comments).
MDavid Agree, and that is what I thought initially.
Watchers
I would like a list of watchers added to the tickets to ensure proper monitoring. Again, I think the task leaders for QC verification and SR are mandatory watchers, as well as the task leader for repository management (TSA2.4, Kostas). I would also want the members of those tasks (TSA2.4 and TSA2.3 in particular) as watchers. The activity leader for SA2 (currently me, Michel) should be a mandatory watcher as well.
I would also like the task leader of TSA2.5, Michael, to watch this queue, so as to give the DMSU a forecast of which patches are currently in the pipeline and may be rolled out into production.
Additionally, I would like the group of early adopters in SR to be included as watchers, to give them a heads-up on what is going on. That needs more discussion, which Mario David may lead.
MDavid early-adopters-XXX should be notified when the status changes to "StageRollout". Since watchers are individuals, I think we will need a field that, when set, triggers a notification mail to the respective mailing list. A possibly tricky part is that only those accepting the staged rollout will then be included as watchers.
Associated Major Release OR Associated Software
Associates the given release with a higher-level "object" (i.e. a "Major release" or even a "Software"). This object should be characterized by a set of static, agreed attributes.
One such attribute is the location in the production repository where each release associated with the "object" should be populated.
A more pragmatic example: since the release ca-policy-egi-core-1.37-1 is associated with the ca-1.0 major release, it should be populated at http://repository.egi.eu/sw/production/CAs/.
[From my point of view, these attributes (there is more than one) should be decided within EGI and possibly agreed with the provider. They should be defined at the major release level (or at least at the software level) and updated, for example, following the major release sequence.]
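The association can be pictured as a lookup from a release's AssociatedMajorRelease value to a record of static, agreed attributes. The following is a minimal Python sketch under assumptions: the `MajorRelease` record, its `production_url` attribute and the `production_location()` helper are hypothetical names, and only the ca-1.0 entry and its URL come from the example above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MajorRelease:
    """Static, agreed attributes of a major release (hypothetical schema)."""
    name: str
    production_url: str  # where every associated release is populated


# Hypothetical registry; only the ca-1.0 entry is taken from the text.
MAJOR_RELEASES = {
    "ca-1.0": MajorRelease("ca-1.0", "http://repository.egi.eu/sw/production/CAs/"),
}


def production_location(release: str, associated_major_release: str) -> str:
    """Return the production location a release inherits from its major release."""
    major = MAJOR_RELEASES.get(associated_major_release)
    if major is None:
        raise ValueError(f"{release}: unknown major release {associated_major_release!r}")
    return major.production_url


print(production_location("ca-policy-egi-core-1.37-1", "ca-1.0"))
# prints http://repository.egi.eu/sw/production/CAs/
```

Because the location lives on the major-release record, updating it once affects every release associated with that major release.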
RolloutProgress has the following states:
- Unverified (initial value): RT notifies Repo and QCVFY
- InVerification: QCVFY performs verification; no action from RT or Repo
- WaitingForResponse: waiting for a response from the TP; no action from RT or Repo
- StageRollout: RT notifies Repo and the SR team
- Deferred: special state used by the Repo to signal that another release is in SR
- Rejected: RT notifies Repo and the TP with the outcome and the reason
- Production: RT notifies Repo, the TP and the SA1 Release Manager
- Failed: special state used by the Repo to signal a failure
Two new states are to be added. Both work exactly like Rejected but are recorded with different semantics:
- Ignored - the submitted target does not match the UMD supported targets.
- Not published - the product passed the EGI provisioning process but, for the given reason, was not published.
Note: When ignoring, rejecting, or not publishing a product, the RT ticket has to be resolved.
This image describes the Repo view of the above state diagram in more detail.
After testing the notification mechanisms, we (Marios and I) realized that we are not completely sure who should be notified, and when, during the ticket lifespan. The current implementation follows our discussions in December. The respective groups are added as ticket CCs when the Repo returns a CommunicationStatus equal to OK and the RolloutProgress is as depicted in the table below. Note that sw-rel-admin is always notified, as AdminCc, of all changes to tickets in the sw-rel queue.
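The RolloutProgress states and their notifications can be captured as a small table keyed by state. A minimal Python sketch, assuming RT looks up whom to notify whenever the state changes: the enum and the `on_enter()` helper are hypothetical, the notified parties are copied from the state list, and giving Deferred and Failed no RT notification is an assumption (the Repo sets those itself). Permitted transitions are not encoded, because the state diagram is not reproduced in this text.

```python
from enum import Enum


class RolloutProgress(Enum):
    UNVERIFIED = "Unverified"
    IN_VERIFICATION = "InVerification"
    WAITING_FOR_RESPONSE = "WaitingForResponse"
    STAGE_ROLLOUT = "StageRollout"
    DEFERRED = "Deferred"
    REJECTED = "Rejected"
    PRODUCTION = "Production"
    FAILED = "Failed"
    IGNORED = "Ignored"
    NOT_PUBLISHED = "Not published"


# Parties RT notifies when a ticket enters each state (from the state list).
# Deferred and Failed are set by the Repo itself; no RT notification is assumed.
NOTIFY: dict[RolloutProgress, frozenset[str]] = {
    RolloutProgress.UNVERIFIED: frozenset({"Repo", "QCVFY"}),
    RolloutProgress.IN_VERIFICATION: frozenset(),
    RolloutProgress.WAITING_FOR_RESPONSE: frozenset(),
    RolloutProgress.STAGE_ROLLOUT: frozenset({"Repo", "SR team"}),
    RolloutProgress.DEFERRED: frozenset(),
    RolloutProgress.REJECTED: frozenset({"Repo", "TP"}),
    RolloutProgress.PRODUCTION: frozenset({"Repo", "TP", "SA1 Release Manager"}),
    RolloutProgress.FAILED: frozenset(),
}
# Ignored and Not published work exactly like Rejected.
NOTIFY[RolloutProgress.IGNORED] = NOTIFY[RolloutProgress.REJECTED]
NOTIFY[RolloutProgress.NOT_PUBLISHED] = NOTIFY[RolloutProgress.REJECTED]

# Outcomes for which the note says the RT ticket has to be resolved.
MUST_RESOLVE = {RolloutProgress.REJECTED, RolloutProgress.IGNORED,
                RolloutProgress.NOT_PUBLISHED}


def on_enter(state: RolloutProgress) -> tuple[frozenset[str], bool]:
    """Return (parties to notify, whether the ticket must be resolved)."""
    return NOTIFY[state], state in MUST_RESOLVE
```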
|RolloutProgress||Groups to be notified|
|Waiting for response|
The notification mechanism works nicely. However, we have realized that the table above is probably not sufficient and that we might want a full matrix of RolloutProgress states and groups to be notified. That would probably be implemented by purging the CCs on each RolloutProgress change with CommunicationStatus == OK and adding the groups from scratch, which is not a big deal from the implementation point of view.
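The purge-and-rebuild approach can be sketched in a few lines of Python. This is an illustration under assumptions: the full matrix has not been agreed yet, so `NOTIFICATION_MATRIX` holds placeholder group names only, and `rebuild_ccs()` is a hypothetical helper rather than an actual RT scrip. The AdminCc (sw-rel-admin) is set at queue level and is therefore left out.

```python
# Placeholder matrix: the real RolloutProgress -> groups mapping is still to be agreed.
NOTIFICATION_MATRIX: dict[str, set[str]] = {
    "Unverified": {"qc-verification-group"},
    "StageRollout": {"stage-rollout-group"},
    "Production": {"release-manager-group"},
}


def rebuild_ccs(current_ccs: set[str], communication_status: str,
                old_progress: str, new_progress: str) -> set[str]:
    """Return the ticket CCs after a RolloutProgress update.

    CCs are only rebuilt when the Repo reported CommunicationStatus == OK and
    the RolloutProgress actually changed; otherwise the existing CCs are kept.
    """
    if communication_status != "OK" or old_progress == new_progress:
        return set(current_ccs)
    # Purge everything and start from scratch, so that no group from the
    # previous state lingers on the ticket.
    return set(NOTIFICATION_MATRIX.get(new_progress, set()))
```

Rebuilding from scratch keeps the ticket CCs a pure function of the current state, so the matrix can be extended without worrying about stale CCs from earlier states.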
Progress / Issues
|Carlos||Block changes of a ticket that go against its workflow ("RolloutProgress"), e.g. a ticket in StageRollout should not change to "InVerification"||This is generally feasible. The question is how to trigger the mechanism that blocks the rollout progress changes. Is there something already stored in the ticket or its custom fields that can be used to decide that a rollout progress change should be blocked?||Implemented|
|Carlos||Record the time at which the ticket enters each RolloutProgress state||RT keeps track of the time worked on a ticket, due dates and so on. However, I'm afraid these mechanisms are not enough to track the time spent in the different rollout progress states. I guess we'll have to keep track of this in a separate custom field. Is there a format that would suit your needs for storing the progress? Would something like CSV be fine?||Pending|
|Kostas||The default RolloutProgress value for a new ticket should be "Submitted"||This was already in place. I found a bug in my code, which did not sanitize a RolloutProgress with "no value" (i.e., null) as input. I changed the sanitization condition from if ($self->TicketObj->FirstCustomFieldValue('RolloutProgress')) to unless ($self->TicketObj->FirstCustomFieldValue('RolloutProgress') eq 'Submitted').||Implemented|
|Kostas||When I changed the ticket from Submitted to Unverified I got the following error: "Could not add new custom field value: Permission Denied"||This was actually a more general issue where RT did not apply the ModifyCustomField rights correctly when modifying the ticket.||Solved|
|Kostas||E-mail notifications should be sent to the corresponding teams depending on the RolloutProgress state, e.g. when in InVerification it should notify Carlos's team, etc.||Implemented|
|DavidG||A "Delete" label would help clarify the meaning of the tick box next to the "ReleaseMetadata" file name when modifying the custom values of an existing ticket. Currently there is just a tick box and the name of the file ("release.xml"), leaving the function of the tick box up to the reader.||ACK. Makes sense. I had to hack this directly in the RT source. It should be clear now.||Implemented|
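Regarding the CSV question raised in the table: one possible layout, sketched here purely as an assumption, is a custom field holding one `timestamp,state` line per RolloutProgress change, with ISO 8601 timestamps. The hypothetical helpers below append such a line and derive the time spent in each state, which is also the raw material for time-based metrics such as M.SA1.3-4.

```python
import csv
import io
from datetime import datetime, timedelta


def append_transition(history: str, state: str, when: datetime) -> str:
    """Append one 'timestamp,state' line to the CSV text kept in the custom field."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerow([when.isoformat(), state])
    return history + buf.getvalue()


def time_in_states(history: str, now: datetime) -> dict[str, timedelta]:
    """Sum the time spent in each state; the latest state is counted up to `now`."""
    rows = [(datetime.fromisoformat(ts), state)
            for ts, state in csv.reader(io.StringIO(history))]
    # Each state lasts until the next recorded change (or until `now`).
    ends = [start for start, _ in rows[1:]] + [now]
    totals: dict[str, timedelta] = {}
    for (start, state), end in zip(rows, ends):
        totals[state] = totals.get(state, timedelta()) + (end - start)
    return totals
```

Going through the `csv` module rather than plain string joins keeps state names containing spaces or commas (such as "Not published") safely quoted.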
A POST API has been developed to receive POST requests originating from the EGI RT system. The required information is: Ticket_Id, Current_Rollout_Progress, Previous_Rollout_Progress, and a Yes/No flag that indicates whether it is an Emergency_Release. The API has the functionality to:
- Validate (to some extent) the received data and, in case of a failure (i.e. erroneous data), respond to RT's POST request with an error message.
- Communicate the error to the main Repo system, which then appends an explanatory message (as a comment) to the corresponding RT ticket.
- Store all messages received through the API in a DB (more specifically, in a table called Queue) for further processing by the Repo system (see below).
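A minimal sketch of this validate-then-enqueue behaviour, using Python's standard `sqlite3` module in place of the Repo's actual database. The form field names follow the list above; the set of accepted state names, the Queue column layout, the `processed` flag and the HTTP-style status codes are assumptions, and forwarding errors to the Repo (which comments on the RT ticket) is omitted.

```python
import sqlite3

# Assumed set of valid RolloutProgress values (from the states listed earlier).
STATES = {"Submitted", "Unverified", "InVerification", "WaitingForResponse",
          "StageRollout", "Deferred", "Rejected", "Production", "Failed",
          "Ignored", "Not published"}


def init_queue(db: sqlite3.Connection) -> None:
    """Create the Queue table (hypothetical column layout)."""
    db.execute("""CREATE TABLE IF NOT EXISTS Queue (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        ticket_id INTEGER NOT NULL,
        current_rollout_progress TEXT NOT NULL,
        previous_rollout_progress TEXT NOT NULL,
        emergency_release INTEGER NOT NULL,
        processed INTEGER NOT NULL DEFAULT 0)""")


def handle_post(db: sqlite3.Connection, form: dict[str, str]) -> tuple[int, str]:
    """Validate one POST from RT and enqueue it on success. Returns (status, message)."""
    errors = []
    try:
        ticket_id = int(form.get("Ticket_Id", ""))
        if ticket_id <= 0:
            raise ValueError
    except (TypeError, ValueError):
        errors.append("Ticket_Id must be a positive integer")
    for key in ("Current_Rollout_Progress", "Previous_Rollout_Progress"):
        if form.get(key) not in STATES:
            errors.append(f"{key} has unknown value {form.get(key)!r}")
    if form.get("Emergency_Release") not in ("Yes", "No"):
        errors.append("Emergency_Release must be Yes or No")
    if errors:
        # This message is sent back as the response to RT's POST request.
        return 400, "; ".join(errors)
    db.execute(
        "INSERT INTO Queue (ticket_id, current_rollout_progress,"
        " previous_rollout_progress, emergency_release) VALUES (?, ?, ?, ?)",
        (ticket_id, form["Current_Rollout_Progress"],
         form["Previous_Rollout_Progress"], int(form["Emergency_Release"] == "Yes")),
    )
    db.commit()
    return 200, "queued"
```

Only validated requests reach the Queue, so the Repo daemon polling it can assume well-formed rows.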
The Repo system (let's call it the Repo daemon) is responsible for the following:
- Periodically watching the Queue for new requests
- Performing all the necessary checks and actions upon receiving a new request
- For example, if the new request concerns the transition of a given release from Submitted -> Unverified, the Repo daemon:
- connects to the EGI RT system over the RT REST interface and retrieves the necessary data
- checks the newly arrived data against a checksums XML file provided by RT
- checks that the mandatory fields are present
- based on the RolloutProgress transition type, retrieves the necessary configuration values from the database (e.g. if the transition is Submitted -> Unverified, the target repository is /sw/unverified, and so on)
- downloads the release.xml (Metadata) file from the RT
- parses the XML and inserts the data into the database (currently it parses only the high-level <Release> section, as provided in the NRMS_New_Release_MetaDATA_schema DTD)
- creates a scratch area into which it downloads the software using the rsync URL provided by the SP (currently only rsync is supported)
- does the necessary housekeeping
- performs the necessary moves. For example, if the transition is Submitted -> Unverified, the newly submitted release appears in the unverified repository under /sw/unverified/<DistributionShortName>/<Version>/ (the <DistributionShortName> and <Version> values are taken from release.xml [an example is available at ])
NOTE: The unverified area/repository is not publicly visible; accessing it requires an EGI SSO account.
- In case of a failure, either technical [e.g. a MySQL problem] or logical [e.g. a disallowed transition from StageRollout -> Unverified], the Repo daemon appends a comment with an explanatory message to the corresponding EGI RT ticket and sets the ticket's RolloutProgress field to 'Failed'.
- To continue our work and extend the functionality offered by the Repo, some issues should be clarified (most of them are also included in Questions_from_the_Developers):
- Should the Repo be able to manage both incremental (delta) and non-incremental releases?
- MDavid yes. As I see it, a major release is a "non-incremental" release, and all others are "delta" releases containing only newer packages, which should go to the "updates" repo associated with the given major release.
- Do we need/want more than one release of the same software in the "stage rollout" phase, or should they enter this phase sequentially? If we want several releases of the same software in the "stage rollout" phase, will they be approved/rejected independently (i.e. does rejecting/approving one automatically reject/approve the other(s))?
- MDavid there should be only one release at a time for any major release. However, there may be 2 releases at the same time if they refer to two major releases that are independent of each other.
- MichelD: Yes, this is indeed a requirement, as EMI communicated that they will support more than one major release at a time, given their lifetimes and support contracts.
- What is the purpose of the transition from Accepted (Production) to Rejected? Does it mean that the "last release" inserted into production for a given distribution should be rejected (rolled back)?
- MDavid "accepted" is the state in which at least one early adopter has accepted the staged rollout, so it may go to "rejected" if it fails the staged rollout. (I think!)
- MichelD: I very much hope that a release in the rollout workflow only gets accepted once all EA reports have been evaluated, and not as soon as the first EA report is positive! The semantics here are that as soon as the RT ticket is set to "Accepted" (by Mario), the Repo moves the software components to the appropriate areas (updates for minor and revision releases, a new area for a major release, etc.).
As for the mentioned transition, I think it is misplaced and should not happen. I think it tries to model the use case of software components being phased out from production (e.g. gLite 3.1 components).
- Since the release.xml file should not be modified after a release has been submitted, the Repo downloads release.xml from RT only on the Submitted -> Unverified transition. Do we agree on this?
- MDavid Agreed, and the decision is "yes" on my part. What do others think?
- MichelD: Yes.
- The Repo ignores both the QualityCriteriaVerificationReport and the StageRolloutReport. These reports are useful only to the verifiers.
- MDavid correct, moving a release from one repo to another is triggered either by the RT state change or by the person responsible for a given phase, i.e. in RT, not in the repo.
- MichelD: The important thing is that the ticket change from "In verification"/"Waiting for response" to "StageRollout"/"Rejected" is done manually, and the actions on the Repo (either providing the release to the StageRollout area, or cleaning up) are triggered automatically by the ticket state change. Likewise for the StageRollout phase.
- Do we need one more value in the release.xml file (NRMS_New_Release_MetaDATA_schema) that clearly defines whether the release is a major, minor or update release? (One option is to parse the <Version> value, but I think this is not a robust solution; moreover, will this <Version> scheme be provided in a common manner by all SPs?)
- MichelD: As agreed at the F2F we will use a distinct numerical version scheme.
- Some validation rules should be added to RT. At the very least, only the accepted transitions, as described in the state transition diagram (NSRW_IMPLEMENTATION_RT), should be allowed.
- I assume we need one more, intermediate, level of discrimination in the release.xml schema to describe the SoftwareComponents in a given release. I mean something like:
<Release>
  <SoftwareComponentA>
    <packageA.1></packageA.1>
    <packageA.2></packageA.2>
    ...
  </SoftwareComponentA>
  <SoftwareComponentB>
    <packageB.1></packageB.1>
    ...
  </SoftwareComponentB>
  ...
</Release>
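The Submitted -> Unverified processing performed by the Repo daemon, as described earlier in this section, can be sketched including its failure path. This is a minimal Python illustration under several assumptions: `RTClient` stands in for whatever wraps the RT REST interface (its method names are hypothetical), the transition-to-target configuration that the text keeps in the database is a hard-coded dict containing only the /sw/unverified example, and <DistributionShortName> and <Version> are assumed to be direct children of <Release>.

```python
import xml.etree.ElementTree as ET
from pathlib import PurePosixPath
from typing import Optional, Protocol


class RTClient(Protocol):
    """Hypothetical wrapper around the RT REST interface."""
    def comment(self, ticket_id: int, text: str) -> None: ...
    def set_rollout_progress(self, ticket_id: int, value: str) -> None: ...


# Stand-in for the per-transition configuration kept in the Repo database.
TRANSITION_TARGETS = {("Submitted", "Unverified"): "/sw/unverified"}


def target_dir(previous: str, current: str, release_xml: str) -> PurePosixPath:
    """Build <target>/<DistributionShortName>/<Version>/ or raise ValueError."""
    base = TRANSITION_TARGETS.get((previous, current))
    if base is None:
        raise ValueError(f"unaccepted transition {previous} -> {current}")
    release = ET.fromstring(release_xml)
    parts = [release.findtext("DistributionShortName"), release.findtext("Version")]
    for part in parts:
        # Mandatory fields must be present and must not escape the target directory.
        if not part or "/" in part or part in (".", ".."):
            raise ValueError(f"invalid path component in release.xml: {part!r}")
    return PurePosixPath(base, *parts)


def handle_request(rt: RTClient, ticket_id: int, previous: str, current: str,
                   release_xml: str) -> Optional[PurePosixPath]:
    """Resolve the destination for one queued request; on failure, report it in RT."""
    try:
        return target_dir(previous, current, release_xml)
    except (ValueError, ET.ParseError) as err:
        # Logical or parse failure: explain it on the ticket and mark it Failed.
        rt.comment(ticket_id, f"Repo daemon: {err}")
        rt.set_rollout_progress(ticket_id, "Failed")
        return None
```

The rsync download, checksum checks and actual file moves are left out; the sketch only shows how the destination is derived and how a failure is reported back to the ticket.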
Fields in RT Ticket
|RepositoryURL||Holds a pointer to the release in the EGI Repo.||All||Repo||Implemented|
|Sync-Protocol||Indicates which protocol is used to download the data||rsync (http and ftp to be added later)
|Sync-URL||Points to the repo from which we download the release into our repository||null: no||All||SW-Provider||Implemented|
|AssociatedMajorRelease||Used to define which Major Release this Release belongs to||See above
|ReleaseType||Indicates whether the given release should be handled by the Repo as NonIncremental or Incremental. MDavid: I think the default should be Incremental, because incremental releases will be more frequent||NonIncremental, Incremental
|EmergencyRelease||See Above||No, Yes
|Status||See Above||See Above
|RolloutProgress||See Above||See Above
Repo (in case of a failure)
|Verification-Report||See Above||See Above
|StageRollout-Report||See Above||See Above
|All||Stage-Rollout Verification team||Implemented|
|Release-metadata||See Above||See Above
|Owner||See Above||See Above
|Watchers||See Above||See Above
- Proposed: Field proposed by the developers, awaiting implementation
- Pending: Approved by SA2, awaiting implementation
- Implemented: Field already implemented in RT
Metrics for the REPO
Metrics from SLA with TPs
|M.REPO-1||Number of releases delivered to EGI per month||Pending|
|M.REPO-2||Number of releases that passed the quality criteria verification per month||Pending|
|M.REPO-3||Number of releases that passed StageRollout verification per month||Pending|
EGI SA1.3 Metrics
|M.SA1.3-1||Number of EA teams that did staged rollout for any given release||Pending|
|M.SA1.3-2||Number of releases with success in staged rollout||Pending|
|M.SA1.3-3||Number of releases with reject in staged rollout||Pending|
|M.SA1.3-4||Time taken from StageRollout to Accepted||Pending|
|M.SA1.3-5||Time taken from Accepted to publication of the staged rollout report||Pending|
|M.SA1.3-6||Number of releases a given EA participated in per month||Pending|
|M.SA1.3-7||Number of releases a given NGI participated in per month (aggregate of M.SA1.3-6)||Pending|
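Most of these metrics could be derived from a log of RolloutProgress changes recorded as (ticket, state entered, time) tuples. The sketch below is hypothetical: the event format is assumed, and which state counts as "delivered", "passed" or "accepted" for each metric is still to be agreed (for instance, counting entries into StageRollout for M.REPO-2 is only an assumption). "Accepted" appears in the discussion above but not in the listed RolloutProgress values, so it is used here as a placeholder name.

```python
from collections import Counter
from datetime import datetime, timedelta
from typing import Iterable, Optional

Event = tuple[str, str, datetime]  # (ticket id, RolloutProgress entered, time)


def entries_per_month(events: Iterable[Event], state: str) -> Counter:
    """Count tickets entering `state` per (year, month); each ticket counts once."""
    seen: set[str] = set()
    counts: Counter = Counter()
    for ticket, entered, when in sorted(events, key=lambda e: e[2]):
        if entered == state and ticket not in seen:
            seen.add(ticket)
            counts[(when.year, when.month)] += 1
    return counts


def time_between(events: Iterable[Event], ticket: str,
                 start_state: str, end_state: str) -> Optional[timedelta]:
    """Time from the ticket's first entry into start_state to its next entry into end_state."""
    start: Optional[datetime] = None
    for t, entered, when in sorted(events, key=lambda e: e[2]):
        if t != ticket:
            continue
        if start is None and entered == start_state:
            start = when
        elif start is not None and entered == end_state:
            return when - start
    return None  # the ticket never completed the interval
```

Under these assumptions, M.SA1.3-4 for one ticket would be `time_between(events, ticket, "StageRollout", "Accepted")`, and the per-month counts of M.REPO-1 to M.REPO-3 would be `entries_per_month` over the corresponding states.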