EGI IGTF Release Process

From EGIWiki
Revision as of 21:33, 28 November 2010 by Davidg (talk | contribs) (Procedure)

This page is in draft

About this distribution

This page describes the procedure for announcing and propagating a new release of the EGI Trust Anchor distribution. The EGI Policy on Approved Certification Authorities describes the set of trust anchors accepted by EGI and put forward for consideration by the NGIs to install as the 'agreed set' of Identity Management trust anchors. For the time being, only PKI trust anchors ('Certification Authorities') are included in this release.

The announcement and distribution of trust anchors within EGI and the NGIs should be done in a well-defined order, both to ensure consistent deployment and to ensure that the monitoring of the NGIs and sites matches the currently recommended version of the trust anchors.

Transition process

For historical reasons, most EGI sites (i.e. all those that do not already have their own local or national distribution) use a meta-package called 'lcg-CA' to implement the EGI policy on approved CAs. This package is scheduled to go away in the future, but for the time being it is provided as a transitional package in the repository. New sites can install the new 'ca-policy-egi-core' package directly. Sites that upgrade need to do nothing special apart from changing the repository URL. For them, 'lcg-CA' will pull in the 'ca-policy-egi-core' package as well as the 'ca-policy-lcg' package, both of which are for the time being contained in the new repository for compatibility purposes.

For the time being both policies are the same, i.e., the same dependencies are included in both RPMs and installing either one of them will trigger the installation of the same set of CAs. The EGI specific set of endorsed CAs is encoded in the 'ca-policy-egi-core' meta-package. However, it is not guaranteed that in the future the EGI and wLCG policies will stay aligned, so sites that relied on the 'lcg-CA' package distributed by EGEE to also comply with the wLCG policies on accepted CAs should keep this in mind. Also, the acceptable CA policy for your organisation, country or region may be different from the EGI one: for example, your organisation or NGI may have approved additional CAs to support training activities on your infrastructure, or have additional CAs for your country or your site! As a result, you may want to install additional CAs that are not on the EGI core list.

Teams involved

  • IGTF/EUGridPMA Liaison: David Groep (egi-igtf-liaison@nikhef.nl)
  • Repository and SR Team via RT: Kostas Koumantaros, Mario David, Michel Drescher
  • Monitoring/SAM Team (central monitoring configuration centre): Emir Imamagic <eimamagi@srce.hr>, Konstantin Skaburskas <same-devel@cern.ch>
  • NGI Nagios updates: via NGI operational contacts
  • Regional follow-up: NGI contacts and all sites (via CIC portal)
  • Security oversight and determination of grace period: EGI CSIRT/Mingchao Ma and Sven Gabriel

Procedure

  1. David Groep (under EGI contract O-E-15/SA1) will build the EGI-specific trust anchor package, currently called 'ca-policy-egi-core' and dependencies, and make that available on a special site only intended for INTERNAL use by the deployment process (there is also a ca-policy-lcg package). This "special" distribution contains:
    • all CAs accredited under the IGTF "classic", "mics", and "slcs" profiles (following the Policy on Accepted CAs)
    • the "ca-policy-egi-core" as well as the legacy/compatibility packages "lcg-CA" and its corresponding "ca-policy-lcg" meta package
    • any other EGI specific CAs or exception as decided by EGI (currently none)
    • the patch RPM for the Apache mod_ssl bug (the "dummycas" RPM)
    • the appropriate release.xml, meta-data, the readme file and repo headers
    This distribution will be unit tested before being released. These RPMs are made available on the dedicated EGI-IGTF liaison web site (not on production sites at that step):
     wget -r http://egi-igtf.ndpf.info/distribution/egi/
    or using rsync from rsync://egi-igtf.ndpf.info/egi-igtf/egi/
  2. David will submit an RT ticket to the sw-rel queue
    • the RT URL is https://rt.egi.eu/rt/Ticket/Create.html?Queue=24.
    • the ticket includes the release.xml NRMS file that is produced as part of the repository at http://egi-igtf.ndpf.info/distribution/egi/current/ca-policy-egi-core-VERSION-RELEASE.nrms (also called just release.xml).
    • the ticket subject is standard: "CA update, version X.Y.Z-R", with a comment inside saying: "please, follow the EGI-IGTF release process at https://wiki.egi.eu/wiki/EGI_IGTF_Release_Process".
    • the RT ticket that triggers the updates (which can be sent at any time) should always include a reminder not to start after mid-week, and preferably to start on a Monday.
    • the ticket also includes the EGI-specific change log in the release file. This change log is intended for direct distribution to the EGI sites.
    • in RT, the monitoring team should be involved via a CC to eimamagi@srce.hr, and the security officers via mingchao.ma@stfc.ac.uk, sveng@nikhef.nl.
    This triggers the import of the distribution into the "unverified" area of the EGI Repository, which starts the staged-rollout process via the RT link as explained in the NSRW (New Software Release Workflow).
    To ensure a smooth, parallel update of the repository and the monitoring/Nagios tests, non-urgent updates should be started before Wednesday at mid-day.
  3. New ticket (with ticket ID) is announced by David Groep (egi-igtf-liaison@nikhef.nl) (following on or forwarding the EUGridPMA-Announce list) to
    • the EGI CSIRT via egi-csirt-team@mailman.egi.eu
    • the EGI MW unit via david@lip.pt, jorge@lip.pt (or later sw-rollout-management@mailman.egi.eu)
    • the monitoring teams to update the SAM CE probes via project-eu-egee-middleware-iteam@cern.ch (Konstantin.Skaburskas@cern.ch, eimamagi@srce.hr)
    • Mail paste list: egi-csirt-team@mailman.egi.eu, david@lip.pt, jorge@lip.pt, project-eu-egee-middleware-iteam@cern.ch, eimamagi@srce.hr, Konstantin.Skaburskas@cern.ch, noc-managers@mailman.egi.eu, mingchao.ma@stfc.ac.uk, sveng@nikhef.nl, egi-igtf-liaison@nikhef.nl
    • Either the EGI CSIRT, the EGI Operational Security Coordinators, or the MW unit can declare this update as urgent.
  4. The RT ticket will trigger the import into unverified, after which the initial acceptance process will check compliance with the package requirements and upload the report.
    • The Initial QA must be done within a few hours.
    • This will bring the package to .../sw/stagerollout/
  5. At the transition to .../sw/stagerollout/
    • LIP quickly verifies whether the package works in an at-most-one-day SR process, but does not yet close the ticket. If SR is successful, the ticket is assigned to the Monitoring team.
    • the monitoring team updates the old-style WN probes (CE-sft-caver) as described below, based on the data in the SR repository, to update the legacy CE tests if those are still in production use, as both changes are independent. The new-style probes use dynamic data and do not need to be updated.
  6. If staged roll-out is successful the next steps should be done in rapid succession by the monitoring team due to the structure of the CAdist-probe logic:
  7. Monitoring team:
    • updates the ca_dist.dat file using the new CA RPM list (see below!)
    • updates the ca_dist.conf file with the absolute calendar update timings used in Nagios (see below as well), in accordance with the EGI-CSIRT or liaison recommendation (8 days for regular, or 1 day for urgent)
    • builds and releases the updated org.sam probes via a very-quick SW process in parallel to the update. This will be the new sam release
  8. The monitoring team informs the NGI Operations managers to update the Nagios instances via noc-managers@mailman.egi.eu
  9. The monitoring team now closes the RT ticket, which will trigger the over-write of the .../sw/production/cas/1/ directory.
    • Please now verify that the update was non-incremental and that the whole contents of the ".../production/cas/1/" directory has indeed been replaced at  http://repository.egi.eu/sw/production/cas/1/.
    • If the repository has not updated properly by itself, contact the repo team (Kostas Koumantaros at <kkoum@GRNET.GR>) to have it fixed. It should be a non-incremental upgrade from version 1 to version 1, and the URL http://repository.egi.eu/sw/production/cas/1/current/ MUST point to the new release!
  10. On closure:
    • the IGTF liaison (DavidG) updates the EGI_IGTF_Release wiki page with installation information to list the new version number.
    • The Release Manager (Mario David/LIP?) sends an announcement with the change log (from the GGUS ticket) to the NGI contacts, stating that the repository contains new content, through the Operations Portal (select "To ROC Managers", "To Production Site Admin"), with a CC to the gLite EMT. Template emails including email subjects can be found here. The change log to be included in the mail is part of the GGUS ticket, and can also be found at http://egi-igtf.ndpf.info/distribution/egi/current/ as file ca-policy-egi-core-readme-X.Y.txt (with X.Y the version number).
    • the EGI-CSIRT verifies that the repository links and release notes of the CA related web pages ([1] and EGI_IGTF_Release) have been updated and that the relevant broadcast of step 8 is achieved by looking at https://cic.gridops.org/index.php?section=vo&page=broadcast_archive (limit the dates to the last few days, and fill "Criterion 3" with "CA").
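
The ticket conventions of step 2 (standard subject line, and non-urgent updates started before Wednesday mid-day) can be checked mechanically before a ticket is filed. The following is a minimal shell sketch and not part of the official process: the function names ticket_subject and may_start_nonurgent are hypothetical, and the day and hour arguments are assumed to come from `date +%u` and `date +%H`. Weekend days are treated as "too late", which is an interpretation of "not after mid-week".

```shell
#!/bin/sh
# Hypothetical helpers illustrating the step 2 conventions; not an official tool.

# Build the standard ticket subject "CA update, version X.Y.Z-R".
# $1 = version (e.g. 1.38), $2 = release number (e.g. 1)
ticket_subject() {
    printf 'CA update, version %s-%s\n' "$1" "$2"
}

# Succeed (exit status 0) if a non-urgent update may still be started this
# week, i.e. it is before Wednesday mid-day.
# $1 = ISO day of week, 1 = Monday ... 7 = Sunday (as printed by `date +%u`)
# $2 = hour of day, 00-23 (as printed by `date +%H`)
may_start_nonurgent() {
    dow=$1
    hour=${2#0}                  # strip one leading zero ("08" becomes "8")
    [ -z "$hour" ] && hour=0     # a bare "0" was stripped to the empty string
    if [ "$dow" -lt 3 ]; then
        return 0                 # Monday or Tuesday
    elif [ "$dow" -eq 3 ] && [ "$hour" -lt 12 ]; then
        return 0                 # Wednesday morning
    fi
    return 1                     # Wednesday afternoon or later in the week
}
```

A possible use is `may_start_nonurgent "$(date +%u)" "$(date +%H)" && ticket_subject 1.38 1`, which prints the subject line only when the update can still be started this week.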

Initial transition process

Previously, the distribution was released from the URL

http://glitesoft.cern.ch/LCG-CAs/current/RPMS.production/

which was owned and managed by the gLite EMT. This URL will now change, so a clear explanation must be sent to all sites when going to release 1.38. This is also the proper time to announce the new symlinked/OpenSSL1 format.

The new repository data for yum is

[EGI-trustanchors]
name=EGI-trustanchors
baseurl=http://repository.egi.eu/sw/production/cas/1/current/
gpgkey=http://repository.egi.eu/sw/production/cas/1/GPG-KEY-EUGridPMA-RPM-3
gpgcheck=1
enabled=1

which is (for 1.38) also available from http://repository.egi.eu/sw/production/cas/1/current/repo-files/egi-trustanchors.repo.
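
Sites that provision their configuration by script could write the stanza above with a here-document instead of downloading it. This is only a sketch: the function name write_egi_repo is hypothetical, the file name egi-trustanchors.repo is taken from the URL above, and /etc/yum.repos.d is assumed to be the yum repository directory on the target host.

```shell
#!/bin/sh
# Hypothetical helper: write the EGI trust anchor repository definition.
# $1 = target directory (defaults to /etc/yum.repos.d, assumed to be the
#      directory yum reads repository files from on this host)
write_egi_repo() {
    dir=${1:-/etc/yum.repos.d}
    # The quoted 'EOF' keeps the shell from expanding anything in the stanza.
    cat > "$dir/egi-trustanchors.repo" <<'EOF'
[EGI-trustanchors]
name=EGI-trustanchors
baseurl=http://repository.egi.eu/sw/production/cas/1/current/
gpgkey=http://repository.egi.eu/sw/production/cas/1/GPG-KEY-EUGridPMA-RPM-3
gpgcheck=1
enabled=1
EOF
}
```

Calling write_egi_repo without arguments needs write access to the default directory; passing a scratch directory first makes it easy to inspect the result.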

For sites that are updating, a simple

yum clean all
yum update lcg-CA

is sufficient. They will then get ca-policy-egi-core and the other compat packages in one go. New sites can do

yum install ca-policy-egi-core

and be done with it.

This information will also be given on the release page https://wiki.egi.eu/wiki/EGI_IGTF_Release.

Notes on the CAdist probe

  • the list of required packages is based on parsing the old-style Yum header.info file. Once yum-arch has been completely replaced, this file should be created by hand and contain at least this one line:
     0:ca_policy_igtf-classic-([\d.]+)-.*
    where [\d.]+ contains a version number like "1.35"
  • the release date of the EGI distribution is taken from the Last-Modified header sent by the web server hosting the header.info file
  • The latest IGTF version and release date are taken from the CHANGES file at http://dist.eugridpma.info/distribution/igtf/current/CHANGES, which the utility expects to be in a specific format (see sources)
  • the grace period is always 7 days -- there is no means to enforce critical updates from the 'outside' of a site
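
As an illustration of the first and last notes, the version match and the fixed grace period can be reproduced in a few lines of shell. This is a sketch under assumptions and not the probe's own code: the function names are hypothetical, the Perl-style `\d` is written as `[0-9]` so that `sed -E` accepts it, and the release time is assumed to be available as seconds since the epoch.

```shell
#!/bin/sh
# Hypothetical helpers mirroring the probe notes; not the probe's own code.

# Read header.info lines on stdin and print the version captured by
#   0:ca_policy_igtf-classic-([\d.]+)-.*
# (written with [0-9.] here because sed -E has no \d class).
classic_version() {
    sed -nE 's/^0:ca_policy_igtf-classic-([0-9.]+)-.*/\1/p'
}

# Print the end of the fixed 7-day grace period as seconds since the epoch.
# $1 = release time in seconds since the epoch
grace_deadline() {
    echo $(( $1 + 7 * 24 * 60 * 60 ))
}
```

For example, `classic_version < header.info` prints one version per matching line and nothing when the required line is missing, which is a quick way to check a hand-written header.info file.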

Notes on the SAM tests

  • The SAM tests need to be updated, specifically "lcg-sam-client-sensors" and "grid-monitoring-probes-org.sam", and the updated versions need to be installed in production:
Hi Emir,

I've found back the instructions for generating those magic Python
dictionaries for the worker-node CA probes.

You can do it in the following steps:

- download the CE-sft-caver probe:
   wget http://svnweb.cern.ch/guest/sam/trunk/probes/src/wnjob/org.sam/probes/org.sam/sam/CE-sft-caver 

- download the to-be-latest RPMs in a separate directory:
    mkdir input && cd input && \
    wget -q -nH -nd -c -r -l 1 \
      http://repository.egi.eu/sw/stagerollout/cas/1/1.XX-R/current/RPMS/

  or if you want to use the production one
    wget -q -nH -nd -c -r -l 1 \
      http://repository.egi.eu/sw/production/cas/1/current/RPMS/

- REMOVE the 'dummyca' RPM:
    rm dummy-ca-certs-20090630-1.noarch.rpm

- move back up (keeps things clean)
    cd ..

- generate the dictionaries
    ./CE-sft-caver -a input -T
  (or use the "-u" option for an urgent update of 1 day)

- do something useful with the generated files

- update the RT ticket and allow it to be closed (thus installing
 the stagerollout/ version into production/)

However, there is a fundamental problem with the probe: it uses OpenSSL
to calculate the hashes, so the generated dictionaries work *either*
with OpenSSL 0.x, OR with OpenSSL1, but NOT with both. That is a
limitation of the probe itself.
So, generate the ca_data.dat file on a machine that matches the target
of the CE probes!

Anyway, the old-style probe has to go away, because it deals with
'removed' CAs in the wrong way. It will start complaining even if the
CA is still technically OK ...
Long live the new probe!
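
Two points from the message above lend themselves to small pre-flight checks: removing the dummy CA RPM from the download directory, and noting which OpenSSL generation the generating machine runs. The following shell sketch is an illustration only. The function names are hypothetical, the glob dummy-ca-certs-*.rpm is an assumption that generalises the single file name given in the message, and the version string is assumed to start with "OpenSSL" followed by the version number.

```shell
#!/bin/sh
# Hypothetical pre-flight helpers for the dictionary generation; a sketch only.

# Remove any dummy CA RPM from the download directory.
# $1 = directory holding the downloaded RPMs (e.g. input)
remove_dummy_rpm() {
    # -f keeps rm quiet when no file matches the pattern
    rm -f "$1"/dummy-ca-certs-*.rpm
}

# Classify an `openssl version` string as "0.x", "1.x" or "unknown".
# $1 = version string, e.g. the output of: openssl version
openssl_generation() {
    case $1 in
        "OpenSSL 0."*) echo "0.x" ;;
        "OpenSSL 1."*) echo "1.x" ;;
        *)             echo "unknown" ;;
    esac
}
```

Running `openssl_generation "$(openssl version)"` on the machine that generates the dictionaries shows which hash style they will carry, so it can be compared with the machines targeted by the CE probes.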