The wiki is deprecated and due to be decommissioned by the end of September 2022.
The content is being migrated to other platforms; new updates will be ignored and lost.
If needed, you can get in touch with the EGI SDIS team using operations @ egi.eu.

EGI CSIRT:Security challenges

{{Egi-csirt-header}}
{{New-Egi-csirt-header}}
 
{{DeprecatedAndMovedTo|new_location=https://confluence.egi.eu/display/EGIBG/Security+challenges}}
= Security challenges: what are they about? =
 
The goals of the security drills are:
* to investigate whether sufficient information is available to conduct an audit trace as part of an incident response, and to ensure that appropriate communication channels are available.
* to assess the incident response capabilities of the involved security teams.
* to evaluate the efficiency of the various incident response operations aiming at containment.
* to trigger and improve the collaboration of the full incident response chain, involving security teams from the RCs, NGIs, EGI, VOs and CAs.
 
 
== Scenario: Stolen Credentials ==
A common problem in distributed environments is that user credentials get compromised, resulting in illicit usage of resources.

This might happen as a result of brute-force attacks on weak passwords, lost/stolen hardware, phishing, or an earlier incident in which this data was harvested by the attacker.
In addition, in the Cloud environment, we quite often see users choose an insecure (default) configuration for the services they install, or introduce other vulnerabilities, which are then quickly exploited by automated attacks constantly targeting all systems connected to the internet.

Stolen or brute-forced (SSH) credentials in distributed environments carry the additional risk that such incidents can spread rapidly, affecting multiple resource centres in multiple countries. Therefore proper access management is crucial in incident response.
In the EGI infrastructure, access to resources is usually controlled based on X.509 certificates.
 
X.509 access management can happen at different levels. Each action has a certain delay until it takes effect and a certain scope:
* Resource Centre / service level: takes effect immediately; bans the user at the RC/service.
* Suspend DN at VOMS: takes up to 1 week; already issued voms-proxies remain valid, but no new proxies will be issued. The scope is VO-wide; the certificate could still be used within other VOs.
* CA revokes certificate: takes effect globally when the new CRLs are loaded by the services, up to 48 hours. The certificate will then not be accepted at any service.
* The FedCloud user management may not be fully integrated with the central suspension and may therefore require manual intervention by the RC admins to make sure that the DN in question cannot access the interfaces to start/stop/delete VMs.
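The levels listed above can be compared by how quickly they take effect. The following is a minimal sketch, assuming the worst-case delays stated in the list; the class and function names are hypothetical and only serve the illustration.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class SuspensionLevel:
    """One way of blocking a certificate DN (illustrative model only)."""
    name: str
    max_delay: timedelta  # worst-case time until the action takes effect
    scope: str


# Delays and scopes as stated in the list above.
LEVELS = [
    SuspensionLevel("VOMS DN suspension", timedelta(weeks=1), "VO-wide"),
    SuspensionLevel("CA certificate revocation", timedelta(hours=48), "global"),
    SuspensionLevel("RC/service-level ban", timedelta(0), "single RC/service"),
]


def by_speed(levels):
    """Return the levels ordered from fastest to slowest worst-case delay."""
    return sorted(levels, key=lambda lvl: lvl.max_delay)


fastest_first = by_speed(LEVELS)
for lvl in fastest_first:
    print(f"{lvl.name}: up to {lvl.max_delay}, scope: {lvl.scope}")
```

The ordering illustrates why the RC/service-level ban is the first containment step, while the slower actions widen the scope afterwards.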
 
Since suspension at the RC/service level is effective immediately, it is crucial that the RC security teams, as well as the VO security teams managing access to their resources, are trained to suspend a reported malicious certificate DN on all of their systems, to stop all running processes related to that DN, and to trace an IP/VM back to the controlling DN.
 
At the same time the state of the VM in question should be preserved for later investigation and further access to it suspended.
 
== Security challenges: what is expected from sites? ==

=== What is important to bear in mind? ===
 
The sites contacted for a challenge are asked to follow the normal security incident response procedure and to react as if the incident were real, with the following two exceptions:
<pre>
      1. No sanctions must be applied against the Virtual
        Organization (VO) that was used to submit the job / start the VM.
        In case of of
 
      2. All "multi-destination" alerts must be addressed to
        the e-mail list which has been designated for the test:
 
                    abuse(at)egi.eu
 
        for Security Service Challenges. Instead, insert the
        originally intended "multi-destination" address(es) in
        the body of your message.
        Make sure to have the string:
                   
                    [SSC]
 
        in the subject of the message.
</pre>
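The second exception can be sketched in code. This is an illustration only, assuming Python's standard `email` package; the function name `build_ssc_alert` and the example addresses are hypothetical, and the list address is the one written as abuse(at)egi.eu above.

```python
from email.message import EmailMessage

SSC_LIST = "abuse@egi.eu"  # written as abuse(at)egi.eu in the text above
SSC_TAG = "[SSC]"


def build_ssc_alert(sender, subject, body, intended_recipients):
    """Build a challenge alert (hypothetical helper).

    The message goes only to the designated list, the subject carries the
    [SSC] tag, and the originally intended multi-destination addresses are
    listed in the body instead of the headers.
    """
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = SSC_LIST
    msg["Subject"] = subject if SSC_TAG in subject else f"{SSC_TAG} {subject}"
    listing = "\n".join(f"  {addr}" for addr in intended_recipients)
    msg.set_content(f"Originally intended recipients:\n{listing}\n\n{body}")
    return msg


alert = build_ssc_alert(
    "csirt@site.example",
    "Suspicious VM activity",
    "Details follow.",
    ["security@ngi.example"],
)
print(alert["Subject"])
```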
 
== Scope of the SSC / Information to be gathered at the sites ==
In this challenge the following basic incident response activities will be evaluated:
* Communications: provide timely information to be used in incident response
* Containment:
** Suspend DNs from accessing, starting, or deleting a VM
** Snapshot a live VM associated with a reported IP, including its memory
* Traceability:
** IP based: given a time-stamp and an IP, find the DN using a VM under the IP in question
** DN based: given a DN, find the IPs associated with VMs running under the DN in question
* Forensics:
** Network connections of an IP in time range X
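The two traceability lookups can be sketched as follows. This is a minimal illustration, assuming a site keeps records of which DN ran a VM under which IP and when; the `VMRecord` layout, the DNs and the IPs are hypothetical, and real cloud middleware stores this information differently.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class VMRecord:
    """Hypothetical accounting record for one VM."""
    dn: str
    ip: str
    started: datetime
    stopped: Optional[datetime]  # None means the VM is still running


def dns_for_ip(records, ip, when):
    """IP based: DNs that had a VM under `ip` at time `when`."""
    return sorted({
        r.dn for r in records
        if r.ip == ip
        and r.started <= when
        and (r.stopped is None or when < r.stopped)
    })


def ips_for_dn(records, dn):
    """DN based: IPs of VMs currently running under `dn`."""
    return sorted({r.ip for r in records if r.dn == dn and r.stopped is None})


RECORDS = [
    VMRecord("CN=Alice Example,O=Example Org", "192.0.2.10",
             datetime(2022, 1, 10, 8, 0), datetime(2022, 1, 12, 8, 0)),
    VMRecord("CN=Bob Example,O=Example Org", "192.0.2.10",
             datetime(2022, 1, 13, 9, 0), None),
    VMRecord("CN=Bob Example,O=Example Org", "192.0.2.11",
             datetime(2022, 1, 14, 9, 0), None),
]

print(dns_for_ip(RECORDS, "192.0.2.10", datetime(2022, 1, 11, 12, 0)))
print(ips_for_dn(RECORDS, "CN=Bob Example,O=Example Org"))
```

The time-stamp matters for the IP based lookup because an IP can be reused by VMs of different DNs over time.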
 
=== For an initial response and first directions, try to find answers to the following questions ===
 
*NETWORK:
- Is network monitoring data (e.g. netflows) available?
- Are there any other suspicious connections open to/from a reported IP or a VM running under a reported DN?
  If so, to which IPs?
- Which DNs own the VMs associated with the reported IP?
 
*CONTAINMENT:
- From where (IPs) was the VM created?
- From where (IPs) did logins to the VM happen?
- To which VO is the user/certificate affiliated?
- Which grid certificates (DNs) are involved in this test incident?
    # Example: DN-1: CN=John Doe, O=<SomeInstitute>,O=<Something>, ...
- Since when has the VM been running?
    # Example: YYYY:MM:DD hh:mm
    Date:
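
The network questions above amount to filtering flow data by IP and time range. The sketch below assumes flow records have already been exported into simple tuples; the `Flow` layout and the addresses are hypothetical and do not correspond to any particular netflow tool.

```python
from datetime import datetime
from typing import NamedTuple


class Flow(NamedTuple):
    """Hypothetical, simplified flow record."""
    start: datetime
    src: str
    dst: str
    dst_port: int


def peers_of(flows, ip, t_from, t_to):
    """Return the remote IPs that talked to/from `ip` within [t_from, t_to]."""
    peers = set()
    for f in flows:
        if not (t_from <= f.start <= t_to):
            continue
        if f.src == ip:
            peers.add(f.dst)
        elif f.dst == ip:
            peers.add(f.src)
    return sorted(peers)


FLOWS = [
    Flow(datetime(2022, 3, 1, 10, 5), "192.0.2.10", "198.51.100.7", 22),
    Flow(datetime(2022, 3, 1, 10, 20), "203.0.113.5", "192.0.2.10", 443),
    Flow(datetime(2022, 3, 1, 13, 0), "192.0.2.10", "203.0.113.99", 80),
    Flow(datetime(2022, 3, 1, 10, 30), "198.51.100.7", "203.0.113.5", 53),
]

window_peers = peers_of(
    FLOWS, "192.0.2.10",
    datetime(2022, 3, 1, 10, 0), datetime(2022, 3, 1, 11, 0),
)
print(window_peers)
```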
 
Sites should provide this information to the security teams as soon as possible, and at the latest within one working day.
The time needed to pass this information to EGI-CSIRT by replying to the alarm mail will be measured and evaluated.
 
== What is the normal security incident response procedure? ==
 
This exercise will also test the current [https://wiki.egi.eu/wiki/SEC01 Incident Response Procedure], and here in particular [https://wiki.egi.eu/wiki/SEC01#Incident_Analysis_Guideline step 5], which covers the information collected for the coordinated incident response.
 
Please try to follow this procedure where possible, and note/report any problems with it.
 
<pre>     
          PLEASE REMEMBER THAT FOR THE CHALLENGE
          THE PROCEDURE IS APPLIED WITH RESTRICTIONS
          AS STATED IN THE PREVIOUS SECTION.
          For questions please contact: fedcloud-ssc(at)mailman.egi.eu
</pre>
 
More information about EGI security procedures (flowchart, formal document, forensic howto, ...) can be found here: https://wiki.egi.eu/wiki/EGI_CSIRT:Policies
 
Please also visit our [[Forensic Howto]] wiki pages. If you want to contribute, just send your input to egi-csirt-team(at)mailman.egi.eu.
 
== Evaluation - Report generation ==
 
We distinguish between

1) Measurable per-site operations (with target times):
#initial feedback: 4h
#find malicious jobs/processes and stop them: 4h
#ban the problematic certificate: 4h
#contain the malicious binary and send it to the incident coordinator: 24h

These will be measured by the ssc-monitor, and the scores the sites get are
calculated according to the formula stated on the wiki page.
Times are relative to the alarm sent to the site. We try to make sure that the
alarms are sent during office hours (09:00 - 18:00, local time).
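
How the target times relate to the alarm time can be sketched as below. This only checks whether each operation was completed within its target; it is not the scoring formula used by the ssc-monitor, which is not reproduced here. The operation keys and the function name are hypothetical.

```python
from datetime import datetime, timedelta

# Target times from the list above, relative to the alarm sent to the site.
TARGETS = {
    "initial_feedback": timedelta(hours=4),
    "stop_malicious_processes": timedelta(hours=4),
    "ban_certificate": timedelta(hours=4),
    "contain_and_send_binary": timedelta(hours=24),
}


def check_targets(alarm_time, completed):
    """Map each operation to True if it was completed within its target.

    `completed` maps operation keys to completion time-stamps; operations
    missing from it count as not met.
    """
    result = {}
    for op, target in TARGETS.items():
        done = completed.get(op)
        result[op] = done is not None and done - alarm_time <= target
    return result


alarm = datetime(2022, 3, 1, 10, 0)
done = {
    "initial_feedback": datetime(2022, 3, 1, 11, 30),
    "ban_certificate": datetime(2022, 3, 1, 15, 0),
    "contain_and_send_binary": datetime(2022, 3, 2, 9, 0),
}
report = check_targets(alarm, done)
print(report)
```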
 
== Participating sites ==
 
Currently the following sites can be used for the SSC:
# (Format: GOC-Name  NGI-NAME VO=FedCloud)
BEgrid-BELNET
CESNET-MetaCloud
CYFRONET-CLOUD
FZJ
IISAS-FedCloud
IN2P3-IRES
INFN-CATANIA-STACK
INFN-PADOVA-STACK
RECAS-BARI
TR-FC1-ULAKBIM
 
== Post processing, clean up ==
 
As part of the incident handling, Grid authorizations may have been withdrawn from the DN that was used to submit the job. When the incident response procedure is complete, the test operator will explicitly request restoration of any such authorizations to their original state.
 
== FedCloud-SSC Evaluation Form ==
[[File:Fedcloud-report-table.png|800px]]
 
= De-briefing =
 
When the challenge has been completed at a representative number of sites, the test operator will ask for de-briefing input from the participating sites. The material submitted will be used to compile a report. The report will be circulated to the contributors for comments before being presented to the EGI-CSIRT.

Latest revision as of 18:05, 14 February 2023