
Difference between revisions of "EGI CSIRT:Critical Vulnerability Handling"

Revision as of 15:01, 13 April 2011


Title: EGI-CSIRT Critical Vulnerability Handling
Version: 1.0
Document link: https://wiki.egi.eu/wiki/SEC03
Last modified: 16:16, 16 March 2011 (UTC)
Policy Group Acronym: OMB, EGI-CSIRT
Policy Group Name: Operations Management Board, EGI Computer Security Incident Response Team
Contact Person: L. Cornwall, M. Ma
Document Status: APPROVED
Approved Date: March 15 2011
Procedure Statement: After a problem has been assessed as critical and a solution is available, sites are required to take action. This document primarily defines the procedure from that point on: how sites are asked to take action, and what steps are taken if they do not respond or do not take action.

If a site fails to take action, this may lead to site suspension.


EGI-CSIRT Critical Vulnerability Handling

In order to prevent incidents, it is important to ensure that operational action is taken in a timely manner when a security problem has been found and a solution identified. A critical security problem is one where urgent action is considered necessary for both individual sites and the infrastructure as a whole to be secure. The most common type of critical security problem is a software vulnerability that has been found and assessed as ‘critical’. After a problem has been assessed as critical and a solution is available, sites are required to take action. This document primarily defines the procedure from that point on: how sites are asked to take action, and what steps are taken if they do not respond or do not take action. If a site fails to take action, this may lead to site suspension by removing the site from the resource information system, as defined in the appropriate policy documents.

Definitions

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119. "SHOULD" and "SHOULD NOT" are to be interpreted as in the EUGridPMA and IGTF: "do it this way or explain to the group why you are not doing it this way and gain mutual agreement". In this case the explanation should be given to other members of the EGI CSIRT and to others involved, such as specific sites.

Intended Audience

This document is intended for grid site security contacts and site administrators in order to inform them what to expect and what is expected of them when they are asked to carry out an action to deal with a critical security problem. It is also intended to ensure that CSIRT members and the Security Officer on duty know what is expected of them. This document is also intended for Operations Management in order to give them visibility of the process, so that they may approve it.

Contact Points

  • site-security-contacts at mailman.egi.eu: this address reaches the security contacts at all grid sites. The mailing list is automatically populated from the GOCDB.
  • ngi-security-contacts at mailman.egi.eu: this is the security contacts list of all Resource Infrastructure Providers, again automatically populated from the GOCDB.
  • noc-managers at mailman.egi.eu: this address reaches the Operations Managers of the Resource Infrastructure Providers. This group is more commonly known as the Operations Management Board, abbreviated to OMB in most places in this document.

Introduction

The procedure for handling incidents by the EGI Computer Security Incident Response Team (CSIRT) is described in the EGI Incident handling procedure. The procedure for handling software vulnerabilities (particularly in Grid Middleware which is part of the EGI UMD) by the EGI Software Vulnerability Group (SVG) is described in the Vulnerability issue handling process. These two procedures have been approved together as Milestone MS405. This document describes the procedure for ensuring that any critical security vulnerability is addressed in a timely manner. The most common case is likely to result from a software vulnerability assessed as critical, either in Grid Middleware or in other software. Other critical vulnerabilities may occur, such as a configuration problem which the EGI CSIRT team assesses as critical. This procedure also aims to give sites adequate warning before they are removed from the EGI resource information system if they fail to make their sites secure. The procedure is written to be agnostic about the systems used to establish site status, find information on sites, or track progress. This document focuses on the effects on sites and on CSIRT interaction with sites. Details of the actual systems and what to do are maintained on the CSIRT private Wiki. More details of the procedure carried out by CSIRT, including work carried out prior to establishing what action(s) should be taken, are available in the EGI Critical Vulnerability Handling procedure (https://documents.egi.eu/public/ShowDocument?docid=282).

CSIRT actions and responsibilities

New critical vulnerability operational problem

Note that actions that fall outside of working hours [R6] are on a ‘best efforts’ basis. CSIRT actions may be carried out by the EGI CSIRT Security Officer on Duty, or by any member of the CSIRT team as agreed within the team.

Heads up issued

When a critical vulnerability has been identified, EGI CSIRT MAY send a ‘Heads Up’ to sites. This is to inform sites of the problem and that urgent action may be requested in the coming hours or days. This is OPTIONAL.

If sent, it SHOULD refer to any public information on the vulnerability, and why it is a problem in the EGI environment.

Find solution to problem

EGI CSIRT MUST define what actions should be taken. This may be to install a new version of software that does not contain the vulnerability, or to make a configuration change that mitigates or removes the vulnerability. In exceptional circumstances, when no solution is found in a reasonable timescale, this may be to suspend or stop running certain software or services. If such action results in severe service interruption, an explicit authorization from the Operations Management Board (OMB) (noc-managers at mailman.egi.eu) must be obtained prior to recommending that such action is taken.

Send advisory with 7 day deadline

The EGI-CSIRT Security-Officer-On-Duty MUST send an advisory. This advisory MUST state what action is to be taken by sites in order to eliminate the critical security problem. The covering letter MUST include the deadline for carrying out the action. The deadline MUST be at least 7 calendar days after the advisory is issued. If the deadline falls on a Friday, weekend, or common public holiday, the deadline SHOULD be set to the first working day after allowing 7 days. It MUST also be clear that if sites do not carry out this action, and do not respond, then site suspension is likely. It should be clear that CSIRT will help if necessary, and that if sites do not understand what to do or need any help they MUST seek help, either by contacting EGI-CSIRT or through appropriate, more local support.
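The deadline rule above can be sketched in Python. This is only an illustration, not part of the procedure: the function name `advisory_deadline` and the `holidays` parameter are hypothetical, and the sketch assumes one reading of the rule, namely that the date 7 calendar days after issue is moved forward until it is neither a Friday, a weekend day, nor a listed public holiday.

```python
from datetime import date, timedelta

MIN_DAYS = 7  # minimum number of calendar days between advisory and deadline
FRIDAY = 4    # date.weekday(): Monday is 0, so Friday is 4, Saturday 5, Sunday 6


def advisory_deadline(issued, holidays=frozenset()):
    """Return the earliest acceptable deadline for an advisory issued on `issued`.

    `holidays` is a hypothetical set of dates treated as common public holidays.
    """
    deadline = issued + timedelta(days=MIN_DAYS)
    # Assumed interpretation: skip Fridays, weekends, and public holidays.
    while deadline.weekday() >= FRIDAY or deadline in holidays:
        deadline += timedelta(days=1)
    return deadline
```

Under this reading, an advisory issued on Friday 11 March 2011 would get a deadline of Monday 21 March 2011, because 18 March is a Friday and the following two days are a weekend.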

For widespread problems, the letter and advisory MUST be sent to site-security-contacts at mailman.egi.eu and ngi-security-contacts at mailman.egi.eu and copied to noc-managers at mailman.egi.eu.

For problems which only affect a small number of individual sites, the sites can be informed individually of the problem.

3 days before the deadline

For each site that is still vulnerable, at least 3 calendar days before the deadline, CSIRT MUST open a ticket in the EGI tracker. The ticket may be opened 4 or 5 days before the deadline if the third day before the deadline falls on a weekend or public holiday. The ticket MUST be sent to the site administrator, site security contact, and NGI security contact.

Tickets MUST include information that all sites might need to handle the issue. The Security-Officer-On-Duty then might add dedicated information for sites needing additional information during the follow up.

The Tickets MUST state that failure to act to resolve this problem will lead to site suspension shortly after the deadline.
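The timing of the ticket can be sketched as follows. The function name `latest_ticket_date` and the `holidays` parameter are hypothetical; the sketch assumes that when the date 3 calendar days before the deadline is a weekend day or public holiday, the ticket is opened on the nearest earlier day that is neither.

```python
from datetime import date, timedelta

SATURDAY = 5  # date.weekday(): Monday is 0, so Saturday is 5 and Sunday is 6


def latest_ticket_date(deadline, holidays=frozenset()):
    """Return the latest date on which the per-site ticket should be opened."""
    candidate = deadline - timedelta(days=3)
    # Move earlier while the candidate is a weekend day or a public holiday.
    while candidate.weekday() >= SATURDAY or candidate in holidays:
        candidate -= timedelta(days=1)
    return candidate
```

With a deadline of Wednesday 23 March 2011, the date 3 days earlier is a Sunday, so this sketch returns Friday 18 March 2011, which is 5 days before the deadline.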

For each vulnerable site

EGI CSIRT MUST check daily whether the site has acted. If it has, CSIRT SHOULD close the ticket and include ‘Thank you for addressing this problem’. If not, CSIRT MUST send a reminder.

24 hours before deadline

EGI CSIRT SHOULD produce a list of the sites that have not updated, and send this to noc-managers at mailman.egi.eu. Each of these sites will also receive a final warning as the first step of the site suspension process.

If all sites have updated, CSIRT SHOULD inform noc-managers at mailman.egi.eu, and there is no need for further action.
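The daily check and the report sent 24 hours before the deadline can be sketched as two small functions. All names here (`daily_followup`, `omb_message`, the action labels, and the site names in the usage note) are hypothetical; how CSIRT actually determines whether a site has acted is outside the scope of this sketch.

```python
def daily_followup(has_acted):
    """Map each site to the next CSIRT action.

    `has_acted` maps a site name to True if the site has carried out the action.
    """
    return {
        site: "close_ticket" if acted else "send_reminder"
        for site, acted in has_acted.items()
    }


def omb_message(has_acted):
    """Build the text sent to the OMB 24 hours before the deadline."""
    outstanding = sorted(site for site, acted in has_acted.items() if not acted)
    if not outstanding:
        return "All sites have updated; no further action is needed."
    return "Sites that have not updated: " + ", ".join(outstanding)
```

For instance, `omb_message({"SITE-A": True, "SITE-B": False})` would name only SITE-B as outstanding.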

Respond to any request and communicate with sites

CSIRT MUST respond to any request for more information or help from sites, and do all they can to help individual sites remove the critical security problem. Solutions other than the recommended one may be agreed between the site and CSIRT (see Appendix I).

Re-introduction of critical security problem

Sometimes an ‘old’ critical security problem may apparently re-appear from the monitoring of site security. Sometimes a site that does not register as vulnerable (see definition) may appear vulnerable later in the run up to the 7 day deadline. It is not expected that this will be a widespread problem. In both these cases, sites are given a minimum of 48 hours to act and resolve the problem before the site is suspended. A similar procedure may also be used to deal with any other serious security problem at an individual site.

CSIRT must open a ticket in EGI tracker

CSIRT MUST open a ticket in the EGI tracker. This MUST be sent to the site administrator, site security contact, and NGI security contact. This ticket MUST include a reference to the previous advisory, SHOULD have the previous advisory appended, and allow the site a minimum of 48 hours to act and resolve the problem. Longer may be given to take account of a weekend or public holiday.
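The 48-hour minimum can be sketched in the same way. The function name `reintroduction_deadline` and the `holidays` parameter are hypothetical, and the sketch assumes that "longer may be given" means pushing the deadline forward in whole days until it no longer falls on a weekend day or public holiday; the procedure itself leaves this to CSIRT's judgement.

```python
from datetime import datetime, timedelta

MIN_NOTICE = timedelta(hours=48)  # minimum time a site is given to act
SATURDAY = 5                      # datetime.weekday(): Monday is 0


def reintroduction_deadline(opened, holidays=frozenset()):
    """Return a deadline at least 48 hours after the ticket was opened.

    `holidays` is a hypothetical set of `date` objects for public holidays.
    """
    deadline = opened + MIN_NOTICE
    # Assumed extension rule: avoid weekend days and public holidays.
    while deadline.weekday() >= SATURDAY or deadline.date() in holidays:
        deadline += timedelta(days=1)
    return deadline
```

A ticket opened at 10:00 on Thursday 17 March 2011 would, under this assumption, have a deadline of 10:00 on Monday 21 March 2011.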

24 hours before deadline

After 24 hours, CSIRT MUST check whether the site has acted. If it has, CSIRT SHOULD close the ticket and include ‘Thank you for addressing this problem’. If not, CSIRT MUST send a reminder and SHOULD inform the OMB (noc-managers at mailman.egi.eu) of the problem. The site will also receive a final warning as the first step of the suspension process.

Respond to any requests and communicate with sites

CSIRT MUST respond to any request for more information or help from sites, and do all they can to help individual sites remove the critical security problem. Solutions other than the recommended one may be agreed between the site and CSIRT (see Appendix I).

Procedure for site suspension

In the case where sites fail to act on a critical security problem the site suspension procedure MAY be invoked at the discretion of EGI CSIRT. The EGI Grid Site Operations Policy allows sites to be suspended by removing the site from the resource information system. Normally this procedure will be invoked in parallel to the steps defined here to handle a critical security problem. In extreme circumstances (e.g. where an individual site has behaved in a reckless manner) it may be invoked independently.

24 hours before Site Suspension

The following 2 steps MUST be carried out at approximately the same time, but may be an hour or two apart. They MUST be carried out at least 24 hours before site suspension is carried out.

Final reminder

The EGI CSIRT MUST notify the affected site's security contact and NGI security officers that unless they carry out the recommended action by the deadline the site will be suspended. Clearly state that failure to comply with the recommendations/advisories sent earlier will lead to site suspension. State that this is a final reminder. It MUST be made clear that if there is a problem carrying out the recommended action CSIRT will try and help to find a solution. Attempts MUST be made to find a solution with the site if at all possible, and site suspension should only be invoked in the case of no response or failure to find a way to prevent it.

Inform the Operations Management Board (OMB)

CSIRT MUST send a summary to the OMB (noc-managers@mailman.egi.eu) with the following information:

  • Which steps were taken by EGI-CSIRT.
  • For each site which is still vulnerable
    • The name of the site
    • If the site has simply not responded, state this.
    • If the site has stated why the recommendations could not be followed, include this.
    • Include any relevant information on plans/alternative mitigation for the site.
    • Include whether CSIRT is recommending suspension for the site.
  • State that if the situation does not change EGI CSIRT plans to suspend the site on the deadline.
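The contents of the OMB summary listed above can be captured in a small data structure. This is a sketch only: the class `SiteReport`, its field names, the function `omb_summary`, and the exact wording of the generated text are all hypothetical and are not prescribed by this procedure.

```python
from dataclasses import dataclass


@dataclass
class SiteReport:
    """Per-site information for the OMB summary (hypothetical field names)."""
    name: str
    responded: bool
    reason: str = ""        # why the recommendations could not be followed
    mitigation: str = ""    # plans or alternative mitigation, if any
    recommend_suspension: bool = True


def omb_summary(steps_taken, reports):
    """Render the summary as plain text, one block per vulnerable site."""
    lines = ["Steps taken by EGI-CSIRT:"]
    lines += [f"  - {step}" for step in steps_taken]
    for report in reports:
        lines.append(f"Site: {report.name}")
        if not report.responded:
            lines.append("  The site has not responded.")
        if report.reason:
            lines.append(f"  Reason given: {report.reason}")
        if report.mitigation:
            lines.append(f"  Plans/alternative mitigation: {report.mitigation}")
        verdict = "yes" if report.recommend_suspension else "no"
        lines.append(f"  Suspension recommended: {verdict}")
    lines.append(
        "If the situation does not change, EGI CSIRT plans to suspend "
        "the site(s) on the deadline."
    )
    return "\n".join(lines)
```

Keeping one record per site makes it straightforward to confirm that every item required in the summary has been filled in before it is sent.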

On the site suspension deadline

If EGI CSIRT takes the decision to suspend a site, EGI CSIRT MUST inform the OMB (noc-managers at mailman.egi.eu) about the decision including which sites are being suspended and why. Send all these sites a notification that site suspension is about to be carried out and why.

Allowing more time for some specific sites

In some circumstances the deadline for specific sites may be delayed to allow more time to carry out suitable actions. This will be at the discretion of CSIRT. CSIRT will always work with the sites to try to prevent site suspension, and to find an acceptable solution that can be carried out in the near future so that there is no need for site suspension. Sites may also be given more time at the request of the OMB.

Not carrying out site suspension

Site suspension will not be carried out if the majority of the OMB state that it is not to be carried out. Site suspension will not be carried out if the COO states that it is not to be carried out. Note that once they have been informed, it is up to the COO and/or the OMB to take action if they do not wish site suspension to happen. Approval of the procedure will imply that site suspension will happen according to the procedure unless the COO and/or the OMB take action to prevent it. Site suspension is seen as a last resort, and if the EGI CSIRT can find a solution with the site then site suspension will not be carried out.
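The conditions that stop a suspension can be summarised as a single predicate. The function name `suspension_proceeds` and its parameters are hypothetical, and the sketch assumes that "the majority of the OMB" means more than half of all OMB members.

```python
def suspension_proceeds(omb_objections, omb_size, coo_objects, solution_agreed):
    """Return True if site suspension goes ahead under this procedure.

    omb_objections:  number of OMB members stating suspension must not happen
    omb_size:        total number of OMB members (assumed basis for a majority)
    coo_objects:     True if the COO states suspension must not happen
    solution_agreed: True if CSIRT has found a solution with the site
    """
    if coo_objects:
        return False
    if 2 * omb_objections > omb_size:  # strict majority of the OMB objects
        return False
    if solution_agreed:
        return False
    # By default, suspension happens unless someone acts to prevent it.
    return True
```

The default of `True` reflects the point made above: approval of the procedure implies that suspension goes ahead unless the COO or the OMB acts.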

Carry out Site suspension

Site suspension WILL be carried out by the EGI CSIRT co-ordinator or deputy. Site suspension is carried out by changing the status of the site in the GOCDB to ‘suspended’. All sites which are suspended MUST be recorded.

Getting suspended sites back into the infrastructure

If a suspended site has fixed an issue and wishes to be integrated back into the Production Resource of the EGI Infrastructure they should contact their ROD. The ROD will then set the site to ‘Uncertified’ and then the site certification procedure takes place. The ROD MUST request that the EGI CSIRT provides information to the ROD on the suitability of the site to be integrated back into the Production Resource of the EGI Infrastructure. The EGI CSIRT Team will inform the ROD when they have verified that the site has taken appropriate action to address the security issues; and that CSIRT agrees that there is no reason they are aware of to prevent the site being certified.
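The path a suspended site follows back into production can be sketched as a small state machine. This is not the GOCDB interface: the `SiteStatus` enum, the `ALLOWED` table, and the `change_status` function are hypothetical, and only the three states mentioned in this document are modelled.

```python
from enum import Enum


class SiteStatus(Enum):
    CERTIFIED = "certified"
    SUSPENDED = "suspended"
    UNCERTIFIED = "uncertified"


# Transitions described in this procedure (hypothetical simplification).
ALLOWED = {
    (SiteStatus.CERTIFIED, SiteStatus.SUSPENDED),    # CSIRT suspends the site
    (SiteStatus.SUSPENDED, SiteStatus.UNCERTIFIED),  # ROD restarts certification
    (SiteStatus.UNCERTIFIED, SiteStatus.CERTIFIED),  # certification completed
}


def change_status(current, new, csirt_verified=False):
    """Return the new status, or raise ValueError if the change is not allowed."""
    if (current, new) not in ALLOWED:
        raise ValueError(f"transition {current.value} -> {new.value} not allowed")
    if new is SiteStatus.CERTIFIED and not csirt_verified:
        raise ValueError("CSIRT has not yet verified the site's action")
    return new
```

The `csirt_verified` flag stands for the information the ROD requests from EGI CSIRT before the site is certified again.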

Sites are referred to the Operational Procedures for NGIs and sites to find further information.

Sites view and responsibilities

Site security is the responsibility of the site

Sites are responsible for their own security. CSIRT and other security groups in EGI exist to help keep sites as secure as possible. Sites MUST carry out some actions recommended by CSIRT in order for the site to remain part of the Grid, i.e. for the site information to remain in the resource information system.

Sites will be informed of critical security problems, and given time to act

If a critical security problem has been identified, sites will be informed of what they need to do. Normally this will initially be a general e-mail and advisory sent to all sites. Sites SHOULD act to eliminate critical security problems as quickly as possible. Sites MUST act before the deadline, which will be at least 7 days away for any new problem. Individual sites will always be given at least 48 hours to act to resolve an issue if an old problem is re-introduced or an unsafe configuration is found.

Sites should contact CSIRT or their NGI security contact if they cannot carry out the recommended action

If sites do not understand the advisory, or have problems acting on it, sites SHOULD contact CSIRT or their NGI security contact for help. If a site wishes to carry out a different action to that recommended by CSIRT, the site MUST contact CSIRT or their NGI security contact. CSIRT must agree that the alternative action by the site is satisfactory. If a site is unable to act or needs more time, the site MUST contact CSIRT or their NGI security contact and attempt to agree a solution. CSIRT and the NGI security contacts are available to help sites find a satisfactory solution.

Sites will receive at least 3 notifications before site suspension is invoked

New problems

Sites will at least receive an initial 7 day deadline notification and at least 2 further individual reminders including a final warning before site suspension is invoked. Sites will normally receive a reminder 3 days before the deadline and daily until the deadline, if they do not carry out the appropriate action. They will also receive a final notification that site suspension is about to proceed.
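The notification sequence for a new problem can be sketched as a list of dated events for a site that never acts. The function name `new_problem_notifications` and the event labels are hypothetical, and the sketch simplifies by working in whole days and by treating the message sent the day before the deadline as the final warning.

```python
from datetime import date, timedelta


def new_problem_notifications(issued, deadline):
    """List (date, label) notifications for a site that never acts."""
    events = [(issued, "advisory with deadline")]
    day = deadline - timedelta(days=3)  # first individual reminder
    while day < deadline:
        is_last = day == deadline - timedelta(days=1)
        events.append((day, "final warning" if is_last else "reminder"))
        day += timedelta(days=1)
    return events
```

For an advisory issued on 14 March 2011 with a deadline of 22 March 2011, this yields the advisory, reminders on 19 and 20 March, and a final warning on 21 March, which is consistent with the minimum of 3 notifications.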

Old problems, and unsafe configuration problems

For the re-introduction of old problems, or unsafe site configurations, sites will receive an initial notification giving them at least 48 hours to act to resolve the problem, a reminder 24 hours later, and a final notification that site suspension is about to proceed.

Sites will receive a final warning at least 24 hours before site suspension

If a site fails to carry out appropriate action, sites will receive a warning at least 24 hours before site suspension.

If sites are suspended, they will need to follow the certification procedure to re-join the Grid

Sites are referred to the Operational Procedures for NGIs and sites to find how to get back to the certified state. Sites will need to ensure that they have carried out the appropriate action requested by the EGI CSIRT. If a suspended site wishes to become part of the Grid again they should indicate to their ROD that they want to be part of the Production Resource of the EGI Infrastructure again. The ROD then sets the site to ‘Uncertified’ and then the site certification procedure takes place. The ROD MUST ask the EGI CSIRT to verify that they are satisfied with the action the site has taken, and that CSIRT agrees that there is no reason they are aware of to prevent the site being certified.

OMB view

The OMB will be informed when action is requested to resolve a critical security problem

The notification and advisory, along with the deadline, will be copied to the OMB.

The OMB will be informed at least 24 hours before site suspension

At least 24 hours before the deadline, the OMB will be informed of how many sites have failed to carry out the requested action (see case 1 or case 2) and that site suspension will result if appropriate action is not carried out.

The OMB and/or the COO may take the decision that sites are not suspended

A majority decision of the OMB and/or the COO may overrule site suspension. Note that once they have been informed, it is up to the COO and/or the OMB to take action if they do not wish site suspension to happen. Approval of the procedure will imply that site suspension will happen according to the procedure unless the COO and/or the OMB take action to prevent it.

The OMB will be informed if any sites get suspended

The OMB and the COO will be informed if any sites actually get suspended. This should include the name(s) of sites, the date and time they are suspended, and why.

APPENDIX I. Notes on acceptable actions by sites

This list is not exhaustive, but is intended to provide some guidelines on which types of actions are acceptable to resolve a critical security problem and which are not.

Prior to CSIRT recommendations

Sites may take any reasonable action they wish when a critical security problem has been identified but no action has yet been recommended by the CSIRT team. This may include:

  • Installing a version of software which has been re-compiled using the site's own patch
  • Suspending services
  • Closing the site
  • Configuration changes that mitigate the problem

Different sites may be affected in different ways by a critical security problem, depending on their own situation, and are allowed to make their own judgements on what action to take prior to a recommended solution being available.

After CSIRT recommendations

Sites must take action. Acceptable actions include:

  • The action recommended by CSIRT
  • For a critical software vulnerability,
    • Update to a version which does not contain the vulnerability
    • Carry out their own re-build with the vendor's patch to resolve the vulnerability
    • Carry out other vendor recommended action
  • Action agreed with CSIRT
  • An explanation of why the site is not vulnerable which is satisfactory to CSIRT

Note that if a site thinks they have taken appropriate action yet CSIRT considers them vulnerable they MUST explain why they think they are not vulnerable. A solution MUST be found that is acceptable to CSIRT.

Actions which CSIRT is unlikely to see as acceptable

  • If monitoring suggests the site is vulnerable even though the site has acted:
    • Fail to respond.
    • Fail to explain why they do not think their system is vulnerable.
    • Fail to explain what alternative actions have been carried out to deal with the critical security problem.

Revision History