EGI CSIRT:Incident reporting

From EGIWiki
Revision as of 12:36, 28 February 2012 by Ndias (talk | contribs) (Follow-up message)



How to report a security incident

Please follow the EGI incident response procedure to report a security incident to abuse at egi.eu. The sections below explain parts of that procedure and provide message templates.


Initial HEADS-UP message

This template is aimed at notifying the grid participants soon after the incident has been discovered (heads-up), as described in Step 2 of the incident response procedure.

FROM: <you>
TO: <site-security-contacts@mailman.egi.eu/abuse@egi.eu>
SUBJECT: Security incident suspected at <site> [EGI-<DATE>] TLP:AMBER
** AMBER Information – Limited Distribution                        **
** This may be shared with trusted security teams on a need-to-know basis **
** see https://wiki.egi.eu/wiki/EGI_CSIRT:TLP for distribution restrictions **
Dear security contacts,
A suspected security incident has been detected at <site>.
Summary of the information available so far:
<Ex: A malicious SSH connection was detected from 012.012.012.012. The extent of the incident is
unclear for now, and more information will be published in the coming hours as forensics are
progressing at our site. However, all sites should check for successful SSH connections from
012.012.012.012 as a precautionary measure.>
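
Sites receiving such a heads-up need a quick way to check their own logs for successful SSH connections from the reported address. The following is a minimal Python sketch of that check, not part of the EGI procedure. It assumes OpenSSH-style syslog entries like the example log line in the follow-up template on this page. The function name `find_accepted_logins` and the sample log lines are hypothetical, and a real check would read the site's actual sshd log files instead of a list.

```python
import re

# Matches sshd "Accepted <method> for <user> from <ip>" entries, following the
# log line format used in the follow-up template on this page.
ACCEPTED = re.compile(
    r"sshd\[\d+\]: Accepted (?P<method>\S+) for (?P<user>\S+) from (?P<ip>\S+)"
)


def find_accepted_logins(lines, suspect_ip):
    """Return (user, method, raw_line) for each accepted login from suspect_ip."""
    hits = []
    for line in lines:
        match = ACCEPTED.search(line)
        if match and match.group("ip") == suspect_ip:
            hits.append(
                (match.group("user"), match.group("method"), line.rstrip("\n"))
            )
    return hits


# Hypothetical sample data using the placeholder address from the template.
sample_log = [
    "Mar 24 11:59:58 grid-ui-101 sshd[13890]: Failed password for root from 012.012.012.012 port 4242 ssh2",
    "Mar 24 12:00:09 grid-ui-101 sshd[13896]: Accepted password for root from 012.012.012.012 port 4242 ssh2",
    "Mar 24 12:05:00 grid-ui-101 sshd[13901]: Accepted publickey for alice from 123.123.123.125 port 5151 ssh2",
]

for user, method, line in find_accepted_logins(sample_log, "012.012.012.012"):
    print(f"{user} ({method}): {line}")
```

Only "Accepted" entries are reported. Failed attempts from the same address are ignored here, although they may still be worth noting in a reply to the security contacts.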

Follow-up message

This template can be used to provide a detailed view of the incident, and may be completed and reposted as the investigation progresses.

FROM: <you>
TO: <site-security-contacts@mailman.egi.eu/abuse@egi.eu>
SUBJECT: Security incident suspected at <site> [EGI-<DATE>] TLP:AMBER
** AMBER Information – Limited Distribution                        **
** This may be shared with trusted security teams on a need-to-know basis **
** see https://wiki.egi.eu/wiki/EGI_CSIRT:TLP for distribution restrictions **

Dear security contacts,

A security incident has been detected at <site>.

- Short summary of the incident
<Provide a high-level overview of the incident>

- Host(s) affected
<List of compromised hosts and/or hosts running suspicious user code.
ex: grid-worker-node-124.mysite.org (123.123.123.123)>

- Host(s) used as a local entry point to the site (ex: UI or WMS IP address)
<The host that the attacker is likely to have used to access the site.
ex: grid-ui-101.mysite.org (123.123.123.124)>

- Remote IP address(es) of the attacker
<The remote host from which the attacker is likely to have connected.
ex: 123.adsl.somecorp.com (012.012.012.012)>

- Evidence of the compromise, including timestamps (ex: suspicious files or log entry)
<Ex: the attacker logged in as root from 123.adsl.somecorp.com. Times are UTC:
Mar 24 12:00:09 grid-ui-101 sshd[13896]: Accepted password for root from 012.012.012.012>

- What was lost, details of the attack
<Provide available details on the extent of the compromise. Ex:
System logs revealed the attacker guessed the root password of grid-ui-101 on Mar 24 12:00:09
(UTC) after hundreds of attempts. Then, the attacker [...] etc.>

- If available and relevant, the list of other sites possibly affected
<Ex: firewall logs reveal suspicious SSH connections from the compromised node to
grid-ui.friendlysite.org on Mar 24 13:01:03 (UTC). friendlysite.org has been contacted.>

- Possible vulnerabilities exploited by the attacker
<Ex: the attacker exploited a weak root password and gained further access by exploiting
CVE-2009-1234 against [...] etc.>

- Actions taken to resolve the incident
<Ex: Disk images have been saved, hosts have been reinstalled from scratch with new, strong root
passwords, and SSH has been configured to prevent "root" logins with password.>

- Recommendations for other sites, actions suggested
<Ex: Sites should check and report any successful SSH connection from grid-ui-101 between Mar 24
12:00:09 (UTC) and Mar 24 17:00:00 (UTC).
It is also recommended to avoid direct SSH access, and to configure sshd with "PermitRootLogin
without-password".>

- Timeline of the incident
<Ex:
2009-03-24 09:12:43 UTC Multiple SSH connection attempts from 012.012.012.012
2009-03-24 12:00:09 UTC Attacker connects as root on grid-ui-101.mysite.org from 012.012.012.012
2009-03-24 13:01:03 UTC SSH scan from grid-ui-101 against grid-ui.friendlysite.org
[...]
2009-03-24 15:00:00 UTC Site security team investigating
2009-03-24 15:34:00 UTC EGI security contacts informed [...]>
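
The timeline entries in the template above are sorted chronologically and expressed in UTC. When events are collected from hosts logging in different time zones, a small helper can normalise them. The following is a minimal Python sketch, assuming each event is a pair of a timezone-aware `datetime` and a description. The function name `format_timeline` and the sample events are illustrative only.

```python
from datetime import datetime, timedelta, timezone


def format_timeline(events):
    """Sort (datetime, description) pairs and render one UTC line per event."""
    for when, _ in events:
        if when.tzinfo is None:
            # Naive timestamps are ambiguous, so refuse to guess their zone.
            raise ValueError("timeline events need timezone-aware datetimes")
    lines = []
    for when, what in sorted(events, key=lambda event: event[0]):
        utc = when.astimezone(timezone.utc)
        lines.append(f"{utc:%Y-%m-%d %H:%M:%S} UTC {what}")
    return "\n".join(lines)


# Hypothetical events: one recorded in UTC+1 local time, one already in UTC.
local = timezone(timedelta(hours=1))
events = [
    (datetime(2009, 3, 24, 16, 0, 0, tzinfo=local),
     "Site security team investigating"),
    (datetime(2009, 3, 24, 12, 0, 9, tzinfo=timezone.utc),
     "Attacker connects as root on grid-ui-101.mysite.org"),
]

print(format_timeline(events))
```

Rejecting naive timestamps is a deliberate choice in this sketch: silently assuming a time zone could misplace an event in the timeline.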

About the EGI security incident handling procedure

EGI-CSIRT developed the EGI security incident handling procedure. The document has been approved by the EGI OMB and PMB. EGI sites must follow this procedure when handling a security incident.

The "Security Incident Handling Procedure" defines site and incident coordinator responsibilities when handling Grid-related security incidents. We strongly encourage our security contacts and system administrators to keep a printed copy of this procedure.

    Site Incident Response checklist: [[1]]
    Incident Response Flowchart: [[2]]

Below is a summary of the actions a site should take when handling a security incident:

1. Immediately inform your local security team, your NGI Security Officer and the EGI CSIRT via abuse at egi.eu. This step MUST be completed within 4 hours after the suspected incident has been discovered. You are encouraged to use the templates provided on this page.

2. Do NOT reboot or power off the host. If no support is available shortly, try to contain the incident whenever feasible, provided that your local security procedure permits it and you are sufficiently familiar with the host/service to take responsibility for this action. For instance, unplug all connections (network, storage, etc.) to the host. Carefully note down every action you take, with a timestamp; this record will be very important for later analysis, and also if the incident ends up in a legal case. This step SHOULD be completed as soon as possible, and MUST be completed within one working day after the suspected incident has been discovered.

3. Confirm the incident, with assistance from your local security team and the EGI CSIRT.

4. If applicable, announce downtime for the affected hosts in accordance with the EGI operational procedures, with “Security operations in progress” as the reason. If applicable, this step MUST be completed within one working day after the suspected incident has been discovered.

5. Perform appropriate analysis and take necessary corrective actions as per Appendix A. The objective is to understand the source and the cause of the incident, the affected credentials and services, and the possible implications for the infrastructure. Throughout step 5, requests from the EGI CSIRT MUST be followed up within 4 hours.

6. Coordinate with your local security team and the EGI CSIRT to send an incident closure report within 1 month following the incident to all the sites via site-security-contacts at mailman.egi.eu, including lessons learnt and resolution. This report should be labelled AMBER or higher, according to the Traffic Light Protocol.

7. Restore the service and, if needed, update the service documentation and procedures to prevent recurrence.
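
The time limits in the steps above can be tracked with a few lines of code. The following is a minimal Python sketch, not an official tool. It computes the 4-hour reporting deadline of step 1, the 4-hour reply deadline of step 5 and the closure report deadline of step 6. It assumes that "1 month" means 30 days counted from the time of discovery, which the procedure summary does not specify. The "one working day" limits of steps 2 and 4 are left out because they depend on the local working calendar. All names are illustrative.

```python
from datetime import datetime, timedelta, timezone

REPORT_WINDOW = timedelta(hours=4)       # step 1: initial report
CSIRT_REPLY_WINDOW = timedelta(hours=4)  # step 5: reply to EGI CSIRT requests
CLOSURE_WINDOW = timedelta(days=30)      # step 6: ASSUMPTION, "1 month" = 30 days


def incident_deadlines(discovered_at):
    """Return the latest times for the initial report and the closure report."""
    return {
        "initial_report": discovered_at + REPORT_WINDOW,
        # ASSUMPTION: the month is counted from the time of discovery.
        "closure_report": discovered_at + CLOSURE_WINDOW,
    }


def csirt_reply_deadline(request_received_at):
    """Return the latest time to follow up on an EGI CSIRT request (step 5)."""
    return request_received_at + CSIRT_REPLY_WINDOW


# Hypothetical discovery time, in UTC.
discovered = datetime(2009, 3, 24, 15, 0, 0, tzinfo=timezone.utc)
for name, deadline in incident_deadlines(discovered).items():
    print(f"{name}: {deadline:%Y-%m-%d %H:%M:%S} UTC")
```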