Talk:SPG:Drafts:Virtualisation Policy

From EGIWiki

Issues and discussion

Comments from Dave Kelsey (16 June 2011)

Here is an extract from an email I sent to the HEPiX VM working group on 10th April 2010. I am sending it in case it helps in defining our VM classification.

There have been two attempts (AFAIK) to describe the overall model. The first was the work by the NL BiG Grid working group: https://wiki.nbic.nl/images/f/f6/ProgressReport-1.0.pdf

The report defines the following three classes of VM images.

  • Class 1: Site provided, similar to current worker nodes.
  • Class 2: From trustworthy sources, running as part of the trusted network fabric.
  • Class 3: Generated by end-users running in a DMZ.

JSPG at its recent meeting (but without actually considering the BiG Grid report) defined three models as follows:

  • Model 1: The Computer Centre view. Increase the number of worker nodes by virtualising them. Fully controlled by the Site with full access to the batch system and network file system. Neither the VO nor the User has root access.
  • Model 2: The VO view. Images produced by a small number of trusted people on behalf of the VO. Similar to some aspects of the Amazon EC2 services and/or the CERNVM project. User probably needs root access to the VM instance to monitor and maintain their environment. This may be OK if the VM does not have access to the site batch system or site file system.
  • Model 3: Individual users are producing their own images. Difficult to see how this could be done in a trustworthy way except for full containment of the running image.

There is a pretty close match between the two classification schemes, the one difference perhaps being whether the image provider has root access to running instances of Class 2/Model 2.

JSPG in its discussions decided to concentrate on model 1 and noted that model 2 needed more discussion to define exactly what it was.

Comments from Peter Solagna (27 June 2011)

Definitions and VM types

VM operator

Does this mean that the VM operator is the entity that started the VM, that can stop it and that has root access to the machine? Is everyone who has root access to the machine an operator, or can a user have root access to the machine without being its operator? I would state explicitly in the definitions who has root (superuser) access to the virtual machines and who does not. I agree that the operator has the responsibility to patch/update running machines (which can run for months), but I would share this responsibility with the Endorsers, who should provide images that are as up to date and patched as possible.

From the meeting: Yes, the operator is really the person with root access. The document has been clarified to incorporate this.

Site originated VM

I would replace "is not directly visible to users" with "virtualization is not directly accessible by users". Maybe this is the wrong wording, but I just want this definition to also cover the case where a user chooses the machine he wants to instantiate from a list of images provided directly by the site (e.g. different OSes).

From the meeting: Change incorporated.

Point 2, bullet 4

"there should be no installed accounts, host/service certificates, ssh keys or user credentials of any form in an image;" (Does not apply to all the classes above, needs to be reviewed.) I can't understand the reasons for applying this rule in the different scenarios. I think that SSH keys are necessary to let the operator or the user access the root account in the VM.

From the meeting: The separation between the endorser and the operator has been emphasised and the structure of the document revised accordingly. Further clarifications have also been made regarding credentials, as public SSH keys and private SSH keys involve different categories of risk.

Point 2

(the numbering restarted after the bullets)

From the meeting: Fixed.

as defined in the VM Image Catalogue document.
Is the document defined somewhere else? I would add the definition in the glossary.

From the meeting: The document is being worked on by the HEPiX Virtualisation Working Group, and is used here as a reference for good practice. A pointer to the final version of the document should be added as it becomes available.

Point 4

You must remove images from the approved list whenever a problem is found, e.g. a new security update is required. This removal must also be recorded locally in your auditable history.
Should this be the "list of endorsed images" mentioned in point 8?

From the meeting: Fixed.

Comments from Riccardo Brunetti (29 June 2011)

Definitions and VM types

Globally Unique Identifier

Is this supposed to be available in some public repository? If so, where? There could also be different kinds of images; for example, images for simple data block devices. Are these supposed to be flagged by a different GUID?

VM Operator

In my opinion, if the VM Operator is responsible for the patching, vulnerability management and logging capability of a VM, it can never be different from the Endorser. I can't see any reason to place those requirements on the Operator when we are defining the role of an Endorser who is responsible for confirming "that a particular VM image has been produced according to the requirements of this policy" and for stating "that the image can be trusted."

In fact, those responsibilities are correctly defined in the Policy Requirements on the Endorser.

Policy Requirements on the Endorser

Bullets 5-6

I still can't understand these two points. Do we require the VM images to be world-readable, or are we saying that the endorser should assume that the image can be inspected? If this is a requirement, why? And who is supposed to inspect them? What if a VM image contains copyrighted software or a procedure that the VO wants to keep private? Why should we lose those potential use cases as long as the image is not causing harm to the grid? Moreover, it is always possible to instantiate a VM that meets this requirement and then read-protect the filesystem after accessing it as administrator.

I already commented on bullet 6: this would exclude the possibility of having VMs for services that require certificates.


Comments from Andrea Chierici (19 July 2011)

Just some trivial corrections:

  • In several "Use case classification" sub-paragraphs you refer to numbers, but the paragraphs are not numbered at all.

Romain: Fixed.

  • last line of "Endorser: Third party, VM operator: Third Party", "...Endorser and is may decide..."

Fixed.

Now for a request of clarification: In "Policy Requirements on the VM Operator" in point 1 you say: "You are responsible to fulfil all the operational security and incident response requirements expressed in other policies"

This sounds too generic to me. Which other policies are we talking about? Local? Community? EGI? Where can I find those policies?

Riccardo Brunetti: The main policies we are referring to are those adopted inside the EGI community, concerning the site operations rules and procedures and the VO management. You can find most of them on our NGI_IT web portal:

http://www.italiangrid.org/grid_operations/grid-security/policies
http://www.italiangrid.org/grid_operations/grid-security/operational-procedures

and all of them on the SPG web site

https://wiki.egi.eu/wiki/SPG:Documents

Obviously, the local policies are also important and should be taken into account too. I think that the general attitude of the SPG policies has always been to define some common rules that can be a complement of those that might already exist.


Comments from Steven Newhouse (2nd August 2011)

  1. Endorser Definition: I see endorsers being appointed by the infrastructure in addition to VOs and Sites (resource centres now...)

Romain: "infrastructure" added to the definition.

  2. Third Party Definition: This term is used in the document but not defined.

Romain: Definition added.

  3. In "Endorser: Third party, VM operator: resource centre", reference is made to section 3.3, but it is not clear where this section is.

Romain: Text deleted.

  4. In Policy Requirements point 4: what is the Image Catalogue document? Where is it defined?

Romain: Reference deleted.

  5. In point 5: The phrase "full list of OS/packages/versions in VM Base Image and VO Environment" seems to be tied to particular OS types and a particular VM usage/preparation model. It is not clear to me what policy requirement needs to be conveyed here.

Romain: Phrasing modified and made more general.

Comments from Jules Wolfrat (31 August 2011)

  1. I have a problem with the different use cases. I think it makes a difference who takes the two roles, so I don't think the same rules can apply independently of the use case. For instance, a resource centre can be a legal body and may already have agreements with NGIs, etc. about its obligations. Expressed another way, should you not require a third-party VM operator to accept the same agreements and policies as a resource centre? Then I could agree, for instance, that a team is responsible for the VM operator role. In the first place, it is the user who must trust the VM Operator.

Romain: Clarified in the text, and through discussions in the meeting. It is important to emphasise that other policy documents apply.

  2. Rule 8 for the endorser, on the requirement to have a security vulnerability assessment process in place: what requirements are there for this process? There should be some minimum requirements.

Romain: Clarified in the text, by replacing "assessment" with "patching".

Comments from Riccardo Brunetti (22 September 2011)

The first one concerns the introduction: "This policy does not compel ....". It seems important to have this kind of statement in the document in order to have it accepted by the community, but it also severely limits the document's usefulness. Put another way: if a site can instantiate VMs without complying with this policy, what is this policy for? SPG: The introduction has been reworded.

The second point is related to bullet 4 of the endorser requirements: "you must provide and maintain a list of endorsed VM images". The document should contain at least a reference to some guidelines that must be followed to produce and maintain that list, or state that the details about VM image lists, metadata and so on are outside the scope of this document and will need to be specified elsewhere. SPG: Added "The related guidelines will be made available by the infrastructure organisation."

The third one concerns bullet 8: "You are responsible for handling all problems related to the distribution....": Can this realistically be stated given the specific laws and restrictions of the different nations? Who is responsible for what may depend on the specific laws that apply to the Grid participants. We can place some responsibilities on the endorser here, but are we sure that this reflects the real-life situation? In some places the law could say that the responsibility lies with the operator anyway. In any case, the acceptance and operation by a resource centre of even an officially endorsed VM image is subject to applicable local laws and policies. It is also explicitly recognized that a resource centre (or VM operator) may at any time revoke the possibility to operate a VM image. SPG: Responsibilities are a complex area. It is important that endorsers consider their responsibilities. "problems" was replaced by "issues" for improved clarity.

Comments from the Life Sciences Community (12 September 2011)

What is most apparent is the lack of explanation of "why" this policy is like it is currently.

I find it difficult to "accept" many of the rules (read: restrictions) that are imposed, as they put an unnecessary burden on Endorsers, especially for smaller communities. Compared with the policies of commercial offerings, it is not very appealing.

Of course I understand where this document comes from, but I don't think it is in the broad interest of EGI.

SPG: Amazon's EULAs are complex (see: https://aws.amazon.com/agreement/). The introduction of the document has been reworded to clarify its intent.

Old (HEPiX) policy text

Policy on the Endorsement of Virtual Machine Images

Introduction

This document describes the security-related policy requirements for the generation and endorsement of trusted virtual machine (VM) images for use on the Grid.

The aim is to enable Grid Sites to trust and instantiate endorsed VM images that have been generated elsewhere.

The virtualisation model addressed here is the use of virtual Grid worker nodes that act in a similar way to real worker nodes. Virtualisation provides an efficient way of managing different worker node configurations, e.g. the operating system used and, importantly, different pre-configured application environments for the VOs. The model addressed here, therefore, simply provides a different way of running authorized VO work, transparent to the end user, exactly the same as if the user payload was running on a real worker node. There should be no need to place more restrictions on virtual worker nodes running endorsed images as defined by this policy than on real worker nodes in terms of access to trusted local services at the site.

This policy does not compel Sites to instantiate images endorsed in accordance with this policy nor limit the rights of a Site to decide to instantiate a VM image generated by any other non-compliant procedures, should they so desire. The Site is still bound by all applicable Grid security policies and is required to consider the security implications of such an action on other Grid participants.

Definitions

The following terms are defined.

  • VM base image: A VM image, including a complete operating system and all general middleware, libraries, compilers, programmes and utilities. All kernel and root-level configurations, including any that may be VO-specific, are included here.
  • VO environment: The VO-specific middleware, application software, libraries, utilities, data and configuration which may be necessary to provide the appropriate environment for use by members of a VO. No kernel modifications or root-level configurations are included here.
  • VM complete image: The VM image resulting from the combination of the VM base image and the VO environment (if any).
  • Globally Unique Identifier: A unique identifier for a VM complete image.
  • Endorser: An individual who confirms that a particular VM complete image has been produced according to the requirements of this policy and states that the image can be trusted.

Policy Requirements

An Endorser should be one of a limited number of authorised and trusted individuals appointed either by a VO or a Site. The appointing VO or Site must assume responsibility for the actions of the Endorser and must ensure that he/she is aware of the requirements of this policy.

Policy Requirements on the Endorser

By acting as an Endorser you agree to the conditions laid down in this document and other referenced documents, which may be revised from time to time.

  1. You are held responsible by the Grid and by the Sites for checking and confirming that a VM complete image has been produced according to the requirements of this policy and that there is no known reason, security-related or otherwise, why it should not be trusted.
  2. You recognise that VM base images, VO environments and VM complete images must be generated according to current best practice, the details of which may be documented elsewhere by the Grid.
    These include but are not limited to:
    • any image generation tool used must be fully patched and up to date;
    • all operating system security patches must be applied to all images and be up to date;
    • images are assumed to be world-readable and as such must not contain any confidential information;
    • there should be no installed accounts, host/service certificates, ssh keys or user credentials of any form in an image;
    • images must be configured such that they do not prevent Sites from meeting the fine-grained monitoring and control requirements defined in the Grid Security Traceability and Logging policy to allow for security incident response;
    • the image must not prevent Sites from implementing local authorisation and/or policy decisions, e.g. blocking the running of Grid work for a particular user.
  3. You must disclose to the Grid or to any Site on request the procedures and practices you use for checking and endorsing images.
  4. You must provide and maintain an up to date digitally signed list of your currently endorsed images together with the metadata relating to each VM image, as defined in the VM Image Catalogue document.
  5. You must keep an auditable history of every image endorsed, including the Globally Unique Identifier, date/time of generation and full list of OS/packages/versions in both the VM Base Image and VO Environment. This must be made available to sites on demand.
  6. You must remove images from the approved list whenever a problem is found, e.g. a new security update is required. This removal must also be recorded locally in your auditable history.
  7. You are responsible for handling all problems related to the inclusion of any licensed software in a VM image. You shall ensure that any software included in a VM image which is used for its intended purposes complies with applicable license conditions, and you shall hold the Site running the image free and harmless from any liability with respect thereto.
  8. You must assist the Grid in security incident response and must have a security vulnerability assessment process in place.
  9. You recognise that the Grid, the Sites, and/or the VOs reserve the right to block any endorsed image or terminate any instance of a virtual machine and associated user workload for administrative, operational or security reasons.
  10. You recognise that if a Site runs an image which no longer appears on your list of endorsed images, you are not responsible for any consequences of this beyond the time of your removal of the image from the list.
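The record-keeping requirements above call for a digitally signed list of currently endorsed images and an auditable history that also records removals. They could be sketched roughly as follows. This is a hypothetical illustration and not part of the policy. The class, method and field names are invented, and an HMAC over a shared secret key stands in for a real digital signature (for example an X.509-based signature), which Python's standard library does not provide.

```python
import hashlib
import hmac
import json
import uuid
from datetime import datetime, timezone


class EndorsedImageList:
    """Hypothetical endorser-side records: current endorsed list plus append-only history."""

    def __init__(self, signing_key: bytes):
        self._key = signing_key
        self.endorsed = {}   # GUID -> metadata for images currently endorsed
        self.history = []    # append-only audit trail; entries are never deleted

    def endorse(self, base_packages, vo_packages):
        # Assign a globally unique identifier and record generation time and package lists.
        guid = str(uuid.uuid4())
        entry = {
            "guid": guid,
            "generated": datetime.now(timezone.utc).isoformat(),
            "base_image_packages": sorted(base_packages),
            "vo_environment_packages": sorted(vo_packages),
        }
        self.endorsed[guid] = entry
        self.history.append({"action": "endorsed", **entry})
        return guid

    def revoke(self, guid, reason):
        # Drop the image from the current list, but keep a record of the removal.
        self.endorsed.pop(guid)
        self.history.append({
            "action": "removed",
            "guid": guid,
            "time": datetime.now(timezone.utc).isoformat(),
            "reason": reason,
        })

    def signed_list(self):
        # Serialise deterministically so the same list always yields the same tag.
        entries = sorted(self.endorsed.values(), key=lambda e: e["guid"])
        payload = json.dumps(entries, sort_keys=True)
        # HMAC-SHA256 is a stand-in for a public-key signature in this sketch.
        tag = hmac.new(self._key, payload.encode(), hashlib.sha256).hexdigest()
        return payload, tag


def verify(payload, tag, key):
    """Site-side check that the list has not been altered since it was tagged."""
    expected = hmac.new(key, payload.encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, tag)
```

In this sketch a site would reject any list whose tag fails verification. A real deployment would more likely use public-key signatures, so that sites can verify the list without holding the endorser's secret.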