Difference between revisions of "EGI-InSPIRE:NSRW IMPLEMENTATION RT"


Revision as of 16:27, 22 October 2010

General Comments / Assumptions

  • External Technology Providers will use GGUS to submit tickets
  • Internal Technology Providers will use RT to submit tickets
  • Emergency Releases (External and Internal) will be handled only by RT
  • This is based on Feedback_from_M._Drescher

MDavid I think "Emergency Releases" should also start from GGUS tickets. We will have to check whether this is too much overhead, but the urgency can already be expressed as "High priority" in GGUS.

(Custom) fields

EmergencyRelease

This is a check-box field used to flag a release as an emergency one, which implies that the release should go straight to production.

MDavid this should be evaluated on a case-by-case basis; I was planning that it may go through a very fast staged rollout.

Status

Existing field. Takes the general RT status of the ticket.

Allowed values:

  • new - Ticket is created, but not assigned yet. Initial state once synched from GGUS.
  • open - Ticket is assigned to a person or group (does that feature exist in RT?) and work is undertaken
  • rejected - The ticket is faulty, invalid, duplicate or otherwise something that must not inflict work upon potential assignees.
  • resolved - The request has been resolved with an outcome (and auditable trace of documentation etc.!). Causes the ticket to be synchronized back to GGUS.
  • deleted - does not apply
  • stalled - does not apply(?)


RolloutProgress

The RolloutProgress reflects both the phases of the SW release workflow, and the final outcome.

Allowed values:

  • Submitted - This is the initial value of the ticket till the new release is downloaded in the "Unverified" Repo to be checked.
  • Unverified - The pertinent release has not yet been verified against the Quality Criteria (QC), but is available for that in the "Unverified" repository.
  • In verification - The release is under assessment against the QC.
  • Waiting for response - The QC verification officer is waiting for a response from the technology provider for one or more minor, uncritical issues of the package (missing documentation link, for example). Note that the state may oscillate between "In verification" and "Waiting for response". Each state change is recorded hence available for metrics.
  • StageRollout - The release has been successfully verified, and a link to the final verification report in the document database has been provided in the ticket.
  • Accepted - The EA have tested the release in their production infrastructure (open issue what is the testing process? Is this anywhere formalised?), and written reports of each EA were aggregated and consolidated by the assigned release manager. The consolidated report is stored in the document database with references to the individual EA reports. A link to the aggregated report is given in the ticket. The aggregated report also MUST report any warnings issued by any EA during the StageRollout phase of the release.
  • Rejected - The EA have tested the release in their production infrastructure. Same reporting requirements as for "Accepted" apply. A release may also fail the QC verification. In that case a report giving the reasons for the failure of the verification must be present.
  • Failed - Verifying the release has failed, due to technical issues, such as repository unavailability, network timeouts, etc. The detailed reason is given in the message when changing the ticket status. This most likely happens when moving the release from one repository to another, e.g. from "Unverified" to "StageRollout".
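
The allowed values above can be captured as an enumeration, so that code handling the custom field does not depend on free-form strings. The following is only a sketch: the class name and the `parse_progress` helper are hypothetical, and the tolerant matching (ignoring case and spaces, so that spellings such as "InVerification" are accepted) is an assumption rather than agreed behaviour.

```python
from enum import Enum


class RolloutProgress(Enum):
    """RolloutProgress values, with labels copied from the list above."""
    SUBMITTED = "Submitted"
    UNVERIFIED = "Unverified"
    IN_VERIFICATION = "In verification"
    WAITING_FOR_RESPONSE = "Waiting for response"
    STAGE_ROLLOUT = "StageRollout"
    ACCEPTED = "Accepted"
    REJECTED = "Rejected"
    FAILED = "Failed"


def parse_progress(label: str) -> RolloutProgress:
    """Map a field label to a member, ignoring case and spaces (assumption)."""
    key = label.replace(" ", "").lower()
    for state in RolloutProgress:
        if state.value.replace(" ", "").lower() == key:
            return state
    raise ValueError(f"unknown RolloutProgress value: {label!r}")
```

For example, `parse_progress("InVerification")` returns `RolloutProgress.IN_VERIFICATION`, while an unknown label raises `ValueError`.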

QualityCriteria Verification Report

This field holds a link to the report of the QC verification process.


StageRollout Report

This field holds a link to the consolidated report summarising individual reports of all EAs.


Repository URL

Holds the currently valid URL to download the installation package. So, if the ticket has the current RolloutProgress of "StageRollout" (or equivalent), the URL should point to the package in the StageRollout repository.

Release metadata

This "field" I envision not really as a field, but as an attachment to the ticket that holds the XML data for the release referred to in this ticket. Didn't know how to describe it. For better automation the attachment should always have the same name (for example, "release.xml") and be attached through the WS integration with GGUS (if possible).

The contents of this XML file are currently under discussion, so I will not drill into it right now.


Owner

The owner of the ticket should change from person to person, depending on the progress through the workflow. I am not a fan of assigning the ticket to a group of people, as this often leads to confusion as to who should pick up the ticket. For now I would like to see group leaders, such as the Task leaders for QC verification (TSA2.3, Carlos Fernandez), and the Task Leader for SR (TSA1.3, Mario David) as standing assignees who then delegate the work appropriately, and collect and consolidate the reports (see other custom fields, and workflow comments).

MDavid Agree, and that is what I thought initially.

Watchers

I would like a list of watchers added to the tickets to ensure proper monitoring. Again, I think the task leaders for QC verification and SR are mandatory watchers, as well as the task leader for repository management (TSA2.4, Kostas). I also would want the members of said tasks (TSA2.4 and TSA2.3 in particular) as watchers. The activity leader for SA2 (currently me, Michel) should be a mandatory watcher as well.

I am thinking of the task leader of TSA2.5, Michael, also watching this queue, so as to give the DMSU a forecast of what is in the current pipeline of patches that may be rolled out into production.

I also would like to include the group of early adopters in SR to be watchers, to get their heads up for what's going on. But that needs more discussion that perhaps Mario David may lead.

MDavid early-adopters-XXX should be notified when the status passes to -> "StageRollout". Since watchers are individuals, I think we will need some field that when set (somehow) triggers a notification mail to the respective mlist, now maybe a tricky part is then that only the ones accepting the staged rollout will be included as watchers.

Associated Major Release OR Associated Software

Provides the association of the given release, with a higher level "object" (i.e. "Major release" or even a "Software"). This object should be characterized by a set of static/agreed attributes.

For example, the location on the production repository, where each release associated with an "object" should be populated.

One more (more pragmatic) example: since the release ca-policy-egi-core-1.37-1 is associated with the ca-1.0 major release, ca-policy-egi-core-1.37-1 should be populated at http://repository.egi.eu/sw/production/CAs/.

[From my point of view, these kinds of attributes (there is more than one) should be decided within EGI and agreed (maybe) with the provider. They should be defined at a major release level (or at least at a software level) and they should be updated, for example on a major release sequence]

Workflow

The workflow itself seems fine. We have three distinct phases, i.e.

  • Preparation
  • Verification
  • StageRollout

We have enough details available to flesh out all three phases, but I am still a bit unclear about the handover between all three phases. I consider those handovers very important as the responsibility for the ticket changes from one task/group to another.

Having said that, I would like to explicitly state that the group leader for the Verification Phase has the authority to change the RolloutProgress from "Unverified" to "Rejected", "Waiting for response" or "StageRollout". Likewise, the group leader for the StageRollout has the authority to change the RolloutProgress from "StageRollout" to "Accepted" or "Rejected". The status of the ticket may also be changed accordingly (e.g. to "Resolved").

 

 

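Rollout Progress transitions (From ↓ / To →) and Location in the Repo

From ↓ / To → | Submit | Unverified | In verification | Waiting for response | StageRollout | Accepted | Rejected | Location in the Repo
Submit | | | | | | Emerg. Rel. | | -
Unverified | | | | | | | | unverified
In verification | | | | | | | | unverified
Waiting for response | | | | | | | | unverified
StageRollout | | | | | | | | stagerollout
Accepted | | | | | | | | production
Rejected | | | | | | | | rejected (??)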


MDavid you can (and should) have the transition "In verification" -> "StageRollout" since "Waiting for response" may not happen.
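
The transitions mentioned so far can be written down as a lookup table and checked before a RolloutProgress change is accepted. This is a sketch of one reading of the text above, not the agreed state diagram: the `ALLOWED` table and the `is_allowed` function are hypothetical, "In verification" -> "Rejected" is assumed from the remark that a release may fail the QC verification, and the emergency shortcut from "Submitted" to "Accepted" is assumed from the EmergencyRelease description and the "Emerg. Rel." entry in the table.

```python
# Transitions named in the text; anything not listed is treated as not allowed.
ALLOWED = {
    "Submitted": {"Unverified"},
    "Unverified": {"In verification", "Waiting for response",
                   "StageRollout", "Rejected"},
    "In verification": {"Waiting for response", "StageRollout", "Rejected"},
    "Waiting for response": {"In verification"},
    "StageRollout": {"Accepted", "Rejected"},
}


def is_allowed(previous: str, current: str, emergency: bool = False) -> bool:
    """Return True if the RolloutProgress change previous -> current is permitted."""
    # Assumption: an emergency release may go straight to production.
    if emergency and previous == "Submitted" and current == "Accepted":
        return True
    return current in ALLOWED.get(previous, set())
```

With this table, `is_allowed("In verification", "StageRollout")` is true, while `is_allowed("StageRollout", "In verification")` is false, which is the kind of blocking requested under Requests below.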

Progress / Issues

Requests

Requested By | Description | Response | Status
Carlos | block the changes in one ticket along its workflow ("rolloutprogress"). e.g. one ticket in StageRollout should not change to "In verification" | This is generally feasible. The question is how to trigger the mechanism, which would block the rollout progress changes. Is there something we already have stored within the ticket or ticket custom fields which can be used to distinguish that the rollout progress should be blocked? | Pending
Carlos | Record the time the ticket goes to different "rolloutprogress" | RT keeps track of the time being worked on a ticket, due dates and so on. However, I'm afraid that these mechanisms are not enough to keep track of time spent in different rollout progress states. I guess that we'll have to keep track of this in a separate custom field. Do you have some format that would suit your needs to store the progress? Would something like CSV be fine? | Pending
Kostas | Default Value RolloutProgress for a new ticket should be submitted | | Pending
Kostas | when I changed the ticket from submitted to unverified I get the following error "Could not add new custom field value: Permission Denied" | | Pending
Kostas | E-mail notification to be sent to the corresponding teams depending on the RolloutProgress State e.g when in InVerification it should notify Carlo's team etc. | | Pending
DavidG | a label "Delete" would help clarifying the meaning of the tick box next to the "ReleaseMetadata" file name, when modifying the custom values of an existing ticket. It now just has a tickbox and the name of the file ("release.xml"). But the function of the tick box is left up to the reader to decide ... | | Pending

Progress

A POST API has been developed to receive POST requests originating from the EGI RT system. The information required is: Ticket_Id, Current_Rollout_Progress, Previous_Rollout_Progress and a Yes/No flag that indicates whether it is an Emergency_Release. The API has the functionality to:

  • Validate (to some level) the data received and, in case of a failure (i.e. erroneous data), respond with an error message to the POST request made by RT.
  • Communicate the error to the main Repo system; the latter appends an explanatory message (as a comment) to the corresponding RT ticket.
  • Store all the messages received through the API in a DB (more specifically, in a table called Queue) for further processing by the Repo system (see below).
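
The validation step could look roughly like the sketch below. Only the four parameter names come from the description above; the function name, the error wording, and the idea that the request arrives as a flat mapping of strings are assumptions made for illustration.

```python
from typing import Dict, List

KNOWN_STATES = {
    "Submitted", "Unverified", "In verification", "Waiting for response",
    "StageRollout", "Accepted", "Rejected", "Failed",
}


def validate_request(form: Dict[str, str]) -> List[str]:
    """Return a list of error messages; an empty list means the data looks valid."""
    errors = []
    # Assumption: RT ticket ids are purely numeric.
    if not form.get("Ticket_Id", "").isdigit():
        errors.append("Ticket_Id must consist of digits")
    for field in ("Current_Rollout_Progress", "Previous_Rollout_Progress"):
        if form.get(field) not in KNOWN_STATES:
            errors.append(f"{field} is missing or not a known RolloutProgress value")
    if form.get("Emergency_Release") not in ("Yes", "No"):
        errors.append("Emergency_Release must be Yes or No")
    return errors
```

A request that yields an empty list would be stored in the Queue table; any other result would be returned to RT as the error response.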

The Repo system (let's call it the Repo daemon) is responsible for:

  • Periodically watching the Queue for any new request
  • Upon receiving a new request, performing all the necessary checks and actions
  • For example, when the new request concerns a transition of a given release from Submitted -> Unverified, the Repo daemon:
      • makes a connection to the EGI RT system over the RT REST interface and acquires the necessary data
      • performs the necessary checks on the newly arrived data, based on a checksums XML file that is provided by RT
      • checks that the mandatory fields are present
      • based on the RolloutProgress transition type, retrieves the necessary configuration values from the database (e.g. if the transition is Submitted -> Unverified, the target repository should be /sw/unverified, and so on…)
      • downloads the release.xml (metadata) file from RT
      • parses the XML and inserts the data into the database (currently, it parses "only" the <Release> high-level section, as provided in the DTD schema at https://wiki.egi.eu/wiki/NRMS_New_Release_MetaDATA_schema)
      • creates a scratch structure, into which it downloads the software using the rsync URL provided by the SP (currently **only** rsync is supported)
      • does the necessary housekeeping
      • performs the necessary movements. If, for example, the transition is Submitted -> Unverified, the newly submitted release appears in the unverified repository, following the structure /sw/unverified/<DistributionShortName>/<Version>/ (for the <DistributionShortName> and <Version> values, we use those included in the release.xml [an example is available at [1]])

NOTE: The unverified area/repository is not publicly visible; accessing it requires an EGI SSO account.

  • In case of a failure, either a technical one [e.g. a MySQL problem] or a logical one [e.g. an unaccepted transition from StageRollout -> Unverified], the Repo daemon appends a comment with an "explanatory" message to the corresponding EGI RT ticket, and sets the ticket's RolloutProgress field to 'Failed'.
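
The last steps of the list above (deriving the target directory from release.xml and reporting a failure back to the ticket) can be sketched as follows. This is not the daemon's actual code: the function and class names are hypothetical, the real configuration lives in the database rather than in a dictionary, and the sketch assumes that <DistributionShortName> and <Version> are direct children of the <Release> root element.

```python
import posixpath
import xml.etree.ElementTree as ET

# Only the Submitted -> Unverified target is stated in the text above.
TARGET_ROOTS = {("Submitted", "Unverified"): "/sw/unverified"}


class RepoError(Exception):
    """A logical failure that should be reported back to the RT ticket."""


def target_directory(release_xml: str, previous: str, current: str) -> str:
    """Build /sw/<area>/<DistributionShortName>/<Version>/ for a transition."""
    root_dir = TARGET_ROOTS.get((previous, current))
    if root_dir is None:
        raise RepoError(f"unaccepted transition {previous} -> {current}")
    release = ET.fromstring(release_xml)
    parts = []
    for tag in ("DistributionShortName", "Version"):
        value = (release.findtext(tag) or "").strip()
        # Reject empty values and anything that could escape the target area.
        if not value or "/" in value or value in (".", ".."):
            raise RepoError(f"release.xml has no usable <{tag}> value")
        parts.append(value)
    return posixpath.join(root_dir, *parts) + "/"


def handle(release_xml: str, previous: str, current: str):
    """Return (new RolloutProgress, ticket comment) for one queued request."""
    try:
        path = target_directory(release_xml, previous, current)
    except (RepoError, ET.ParseError) as exc:
        return "Failed", f"Repo daemon: {exc}"
    return current, f"Release placed in {path}"
```

Returning the pair instead of calling RT directly keeps the sketch free of any assumption about the RT REST calls; the caller would write the comment and the new RolloutProgress value to the ticket.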

Issues

  1. To continue our work and extend the functionality offered by the Repo, some issues should be clarified (most of them are also included at [2]):
  2. Should the Repo be able to manage both incremental (delta) and non-incremental releases?
    1. MDavid yes, as I see it a major release is a "non-incremental" release, and all other are "delta" only newer packages that should go to the "updates" repo associated to any given major release
  3. Do we need/want more than one release of the same software in the "stage rollout" phase, or should they enter this phase sequentially? If we want many releases of the same software in the "stage rollout" phase, will they be approved/rejected independently, or does rejecting/approving one automatically reject/approve the other(s)?
    1. MDavid there should be only one release at a time for any major release. Though, there might be 2 releases at the same time if they refer to two major releases, which are independent of each other.
  4. What is the purpose of the transition from Accepted (Production) to Rejected? Does it mean that the "last release" inserted in production for a given distribution should be rejected (rollback)?
    1. MDavid "accepted" is the state that at least one early adopter has accepted the staged rollout and thus it may go to "rejected" if it fails the staged rollout. (I think!)
  5. Based on the fact that the release.xml file should not be modified after a release has been submitted, the Repo downloads the release.xml from the RT, only when the transition is from Submitted -> Unverified. Agree on this?
    1. MDavid Agreed and decision is "yes" on my part, what do others think?.
  6. The Repo ignores both the QualityCriteriaVerificationReport and the StageRolloutReport. These reports are useful only to the verifiers.
    1. MDavid correct, the action to pass from one repo to another is performed either by the RT state change or eventually by the person responsible for a given phase, i.e. in RT not in the repo.
  7. Do we need one more value in the release.xml file ([3]) that clearly defines whether the release is a major, minor or update one? (One option we have is to parse the <Version> value, but I think that this is not a robust solution; on the other hand, will this <Version>ing schema be provided in a common manner by all SPs?)
  8. Some validation rules should be added to the RT. At least, only the accepted transitions should be allowed, as those described in the state transition diagram ([4]).
  9. I assume that we need one more, intermediate, level of discrimination in the release.xml schema, in order to describe the SoftwareComponents in a given release. I mean something like:
        <Release>
            <SoftwareComponentA>
                <packageA.1></packageA.1>
                <packageA.2></packageA.2>
                .............................................
            </SoftwareComponentA>
            <SoftwareComponentB>
                <packageB.1></packageB.1>
                .............................................
            </SoftwareComponentB>
            .............................................
        </Release>
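
If such an intermediate level were added, the Repo could group packages per software component with very little code. The sketch below assumes exactly the nesting proposed above (components as direct children of <Release>, packages as direct children of a component); the function name is hypothetical and the element names are placeholders, since the real schema is still under discussion.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def packages_by_component(release_xml: str) -> Dict[str, List[str]]:
    """Map each component element name to the names of its package elements."""
    release = ET.fromstring(release_xml)
    return {
        component.tag: [package.tag for package in component]
        for component in release
    }
```

Applied to the structure above (without the dotted placeholder lines), this gives one entry per SoftwareComponent with its packages in document order.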

Fields in RT Ticket

Field | Description | Values | Read | Write | Status
RepositoryURL | Holds a Pointer to the Release in EGI Repo. | | All | Repo | Implemented
Sync-Protocol | indicates which protocol to be used to download the data | rsync (later http, ftp will be added); default: rsync; null: no | All | SW-Provider | Implemented
Sync-URL | Points to the repo from which we download the release to our repository | null: no | All | SW-Provider | Implemented
AssociatedMajorRelease | Used to define into which Major Release this Release belongs to | See above; null: no | All | SW-Provider | pending
ReleaseType | Indicates whether the given release, should be handled by the Repo as NonIncremental or as an Incremental one. MDavid: I think the default: Incremental because they will be more frequent | NonIncremental, Incremental; default: NonIncremental; null: no | All | SW-Provider | Implemented
EmergencyRelease | See Above | No, Yes; default: No; null: no | All | SW-Provider | Implemented
Status | See Above | See Above; default: new; null: no | All | Verification teams | Implemented
RolloutProgress | See Above | See Above; default: Submitted; null: no | All | Verification teams, Repo (in case of a failure) | Implemented
Verification-Report | See Above | See Above; null: no | All | Verification teams | Implemented
StageRollout-Report | See Above | See Above; null: no | All | Stage-Rollout Verification team | Implemented
Release-metadata | See Above | See Above; null: no | All | SW-Provider | Implemented
Owner | See Above | See Above; null: no | All | - (??) | Implemented
Watchers | See Above | See Above; null: no | All | - (??) | Implemented
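
The defaults and value lists for the fields written by the SW-Provider can be applied mechanically when a ticket is submitted. The sketch below covers only four of those fields and uses the defaults listed in the table; the function name, the rule layout, and the error wording are hypothetical.

```python
from typing import Dict, List, Tuple

# Defaults and allowed values copied from the table above.
PROVIDER_FIELDS = {
    "Sync-Protocol": {"default": "rsync", "allowed": {"rsync"}},
    "Sync-URL": {"default": None, "allowed": None},
    "ReleaseType": {"default": "NonIncremental",
                    "allowed": {"NonIncremental", "Incremental"}},
    "EmergencyRelease": {"default": "No", "allowed": {"No", "Yes"}},
}


def normalise(submitted: Dict[str, str]) -> Tuple[Dict[str, str], List[str]]:
    """Fill in defaults and collect errors for empty or disallowed values."""
    result, errors = {}, []
    for name, rule in PROVIDER_FIELDS.items():
        value = submitted.get(name) or rule["default"]
        if value is None:
            errors.append(f"{name} must not be empty")
        elif rule["allowed"] is not None and value not in rule["allowed"]:
            errors.append(f"{name} has unsupported value {value!r}")
        else:
            result[name] = value
    return result, errors
```

A submission that only sets Sync-URL therefore ends up with rsync, NonIncremental and No for the remaining fields, matching the defaults in the table.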


Legend

  • Proposed: Field proposed by the developers awaiting implementation
  • Pending: Approved by SA2, awaiting implementation
  • Implemented: Field already implemented in RT

Metrics for the REPO

Metrics from SLA with TPs

Metric | Description | Status
M.REPO-1 | Number of releases delivered to EGI Per month | Pending
M.REPO-2 | Number of releases that passed the quality criteria verification Per Month. | Pending
M.REPO-3 | Number of releases that passed StageRollout verification Per month. | Pending


EGI SA2 Metrics

Metric ID | Metric | Task | Status
M.SA2-1 | Number of software components recorded in the UMD Roadmap | TSA2.1 | Pending
M.SA2-2 | Number of UMD Roadmap Capabilities defined through validation criteria | TSA2.2 | Pending
M.SA2-3 | Number of software incidents found in production that result in changes to quality criteria | TSA2.2 | Pending
M.SA2-4 | Number of new releases validated against defined criteria | TSA2.3 | Pending
M.SA2-5 | Mean time taken to validate a release | TSA2.3 | Pending
M.SA2-6 | Number of releases failing validation | TSA2.3 | Pending
M.SA2-7 | Number of new releases contributed into the Software Repository from all types of software providers | TSA2.4 | Pending
M.SA2-8 | Number of unique visitors to the Software Repository | TSA2.4 | Pending
M.SA2-9 | Number of releases downloaded from the Software Repository | TSA2.4 | Pending
M.SA2-10 | Number of tickets assigned to DMSU | TSA2.5 | Pending
M.SA2-11 | Mean time to resolve DMSU tickets | TSA2.5 | Pending

EGI SA1.3 Metrics

Metric ID | Metric | Status
M.SA1.3-1 | Number of EA teams that did staged rollout for any given release | Pending
M.SA1.3-2 | Number of releases with success in staged rollout | Pending
M.SA1.3-3 | Number of releases with reject in staged rollout | Pending
M.SA1.3-4 | Time taken from StagedRollout to Accepted | Pending
M.SA1.3-5 | Time taken from Accepted to report published of the staged rollout | Pending
M.SA1.3-6 | Number of releases a given EA participated per month | Pending
M.SA1.3-7 | Number of releases a given NGI (aggregate M.SA1.3-6) participated per month | Pending
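
Time-based metrics such as M.SA1.3-4 (and likewise M.SA2-5) can be derived once every RolloutProgress change is recorded with a timestamp. The sketch below assumes the history of a ticket is available as a list of (timestamp, new state) pairs; that representation and the function name are hypothetical, since the storage format for the state changes has not been decided.

```python
from datetime import datetime, timedelta
from typing import List, Optional, Tuple

History = List[Tuple[datetime, str]]


def time_between(history: History, start_state: str,
                 end_state: str) -> Optional[timedelta]:
    """Time from the first entry into start_state to the next entry into end_state."""
    started = None
    for when, state in sorted(history):
        if started is None and state == start_state:
            started = when
        elif started is not None and state == end_state:
            return when - started
    # Either state was never reached, so the metric is undefined for this ticket.
    return None
```

For a ticket that entered "StageRollout" on 4 October and "Accepted" on 7 October, `time_between(history, "StageRollout", "Accepted")` yields a `timedelta` of three days; averaging these values over tickets would give the mean figures.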

References

EMI Release Plan

Questions from the Developers