The wiki is deprecated and due to be decommissioned by the end of September 2022.
The content is being migrated to other platforms; new updates will be ignored and lost.
If needed, you can get in touch with the EGI SDIS team at operations @ egi.eu.

EGI CSIRT:Security challenges

From EGIWiki
Latest revision as of 02:21, 9 July 2019





Security challenges: what are they about?

The goals of the security drills are:

  • to investigate whether sufficient information is available to conduct an audit trace as part of an incident response, and to ensure that appropriate communication channels are available.
  • to assess the incident response capabilities of the involved security teams.
  • to evaluate the efficiency of the various incident response operations aiming at containment.
  • to trigger and improve the collaboration of the full incident response chain, involving security teams from the RCs, NGIs, EGI, VOs and CAs.


Scenario: Stolen Credentials

A common problem in distributed environments is that user credentials are compromised, resulting in illicit use of resources.

This might happen as a result of brute-force attacks on weak passwords, lost/stolen hardware, phishing, or an earlier incident in which this data was harvested by the attacker. In addition, in the Cloud environment, we quite often see users choose insecure (default) configurations for the services they install, or introduce other vulnerabilities, which are then quickly exploited by automated attacks that constantly target all systems connected to the internet.

Stolen or brute-forced (SSH) credentials in distributed environments carry the additional risk that such incidents can spread rapidly, affecting multiple resource centres in multiple countries. Proper access management is therefore crucial in incident response. In the EGI infrastructure, access to resources is usually controlled based on X.509 certificates.

X.509 access management can happen at different levels; each action has a certain delay until it takes effect, and a certain scope.

  • Resource Centre (RC) / service level: takes effect immediately; bans the user at that RC or service only.
  • Suspension of the DN at VOMS: takes up to 1 week; already issued VOMS proxies remain valid, but no new proxies will be issued. The scope is VO wide, so the certificate could still be used within other VOs.
  • Revocation of the certificate by the CA: takes effect when the new CRLs are loaded by the services, up to 48 hours; the scope is global. The certificate will then not be accepted at any service.
  • The FedCloud user management may not be fully integrated in the central suspension and therefore requires some manual intervention by the RC admins to make sure that the DN in question cannot access the interfaces to start/stop/delete VMs.
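
The trade-off between delay and scope in the list above can be made explicit with a small sketch. This is an illustration only: the `SuspensionLevel` type and the `effective_within` helper are hypothetical names, "up to 1 week" is taken as 168 hours, and the FedCloud case is left out because it depends on manual action by the RC admins.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SuspensionLevel:
    name: str
    max_delay_hours: float  # worst-case delay until the action takes effect
    scope: str


# Values transcribed from the list above; "up to 1 week" is assumed to mean 168 hours.
LEVELS = [
    SuspensionLevel("RC/service ban", 0, "single resource centre or service"),
    SuspensionLevel("VOMS DN suspension", 168, "one VO"),
    SuspensionLevel("CA certificate revocation", 48, "global"),
]


def effective_within(levels, hours):
    """Return the names of the levels whose worst-case delay fits into `hours`."""
    return [lvl.name for lvl in levels if lvl.max_delay_hours <= hours]


if __name__ == "__main__":
    # Only the local ban is guaranteed to be in effect a few hours after the alert.
    print(effective_within(LEVELS, 4))
    print(effective_within(LEVELS, 48))
```

Under these assumptions, only the ban at the RC/service level is certain to be in effect within the first hours of an incident, which is why the local suspension is the step being trained.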

Since suspension at the RC service level is effective immediately, it is crucial that the RC security teams, as well as the VO security teams, who manage access to their resources are trained to suspend a reported malicious certificate DN on all of their systems, to stop all running processes related to that DN, and to trace an IP/VM back to the controlling DN.

At the same time the state of the VM in question should be preserved for later investigation and further access to it suspended.

Security challenges: what is expected from sites?

Rules

The sites contacted for a challenge are asked to follow the normal security incident response procedure, and to react as if the incident were real, with the following two exceptions:

      1. No sanctions must be applied against the Virtual
         Organization (VO) that was used to submit the job / start the VM.

      2. All "multi-destination" alerts must be addressed to
         the e-mail list which has been designated for
         Security Service Challenges:

                     abuse(at)egi.eu

         Do not send them to the originally intended
         "multi-destination" address(es); instead, list those
         address(es) in the body of your message.
         Make sure to have the string:

                     [SSC]

         in the subject of the message.
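
As an illustration of rule 2, the following sketch builds an alert that goes only to the designated challenge list, carries the `[SSC]` tag in the subject and names the originally intended recipients in the body. It uses Python's standard `email.message` module; the function name `build_ssc_alert`, its parameters and the `example.org` addresses are assumptions made for this example, not part of any EGI tooling.

```python
from email.message import EmailMessage

SSC_TAG = "[SSC]"


def build_ssc_alert(sender, challenge_list, intended_recipients, subject, text):
    """Build a challenge alert addressed to the designated list only."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = challenge_list
    # Prepend the tag unless the subject already contains it.
    msg["Subject"] = subject if SSC_TAG in subject else f"{SSC_TAG} {subject}"
    # The real "multi-destination" addresses are only mentioned in the body.
    header = "Originally intended recipients: " + ", ".join(intended_recipients)
    msg.set_content(header + "\n\n" + text)
    return msg


if __name__ == "__main__":
    alert = build_ssc_alert(
        sender="site-security@example.org",        # hypothetical address
        challenge_list="ssc-list@example.org",      # placeholder for the designated list
        intended_recipients=["csirt@example.org"],  # hypothetical address
        subject="Suspicious VM activity",
        text="Details of the test incident follow.",
    )
    print(alert["Subject"])
```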

Scope of the SSC / Information to be gathered at the sites

In this challenge the following basic Incident Response activities will be evaluated:

  • Communications: provide timely information to be used in Incident Response
  • Containment:
    • Suspend DNs from accessing, starting, deleting a VM
    • Snapshot a live VM associated to a reported IP, including its memory
  • Traceability:
    • IP based: given a time-stamp and an IP, find a DN using a VM under the IP in question.
    • DN based: given a DN, find the IPs associated to VMs running under the DN in question
  • Forensics
    • Network connections of IP in time range X
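
The two traceability lookups can be sketched as queries over a list of VM records. This is a minimal illustration that assumes a site can export, for each VM, the owning DN, the assigned IP and the start and stop times; the `VmRecord` type, the function names and the sample data are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class VmRecord:
    dn: str
    ip: str
    started: datetime
    stopped: Optional[datetime]  # None means the VM is still running


def dns_for_ip(records, ip, when):
    """IP based: DNs whose VM held `ip` at the time-stamp `when`."""
    return sorted({
        r.dn for r in records
        if r.ip == ip and r.started <= when and (r.stopped is None or when < r.stopped)
    })


def ips_for_dn(records, dn):
    """DN based: IPs of all VMs recorded for `dn`."""
    return sorted({r.ip for r in records if r.dn == dn})


# Made-up sample data using documentation-range IP addresses.
RECORDS = [
    VmRecord("CN=John Doe, O=Example", "192.0.2.10",
             datetime(2019, 3, 1, 9, 0), datetime(2019, 3, 1, 17, 0)),
    VmRecord("CN=Jane Roe, O=Example", "192.0.2.10",
             datetime(2019, 3, 2, 9, 0), None),
    VmRecord("CN=John Doe, O=Example", "192.0.2.11",
             datetime(2019, 3, 2, 9, 0), None),
]

if __name__ == "__main__":
    print(dns_for_ip(RECORDS, "192.0.2.10", datetime(2019, 3, 1, 12, 0)))
    print(ips_for_dn(RECORDS, "CN=John Doe, O=Example"))
```

The time-stamp matters in the IP based lookup because the same address can be reassigned to a VM owned by a different DN later on.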

For an initial response and first directions try to find answers to the following questions

  • NETWORK:
    - Are there any other suspicious connections open to/from a reported IP or jobs running under a reported DN?
      If so, to which IPs?
    - What are the DNs associated with the reported IP?
  • CONTAINMENT:
    - If possible suspend
    - From where (IPs) were the jobs submitted?
    - From where (IPs) did l.
    - To which VO is the user/certificate affiliated?
    - Which grid-certificates (DN) are involved in this test-incident?
      # Example: DN-1: CN=John Doe, O=<SomeInstitute>,O=<Something>, ...
    - Since when has the VM been running?
      # Example: YYYY:MM:DD hh:mm
      Date:

The sites should provide the security teams with this information as soon as possible, and at the latest within one working day. The time needed to pass this information to EGI-CSIRT by replying to the alarm mail will be measured and evaluated.

What is the normal security incident response procedure?

This exercise will also test the current Incident Response Procedure, and here in particular step 5, which covers the information collected for the coordinated incident response.

Please try to follow this procedure where possible, and note and report any problems with it.

      
           PLEASE REMEMBER THAT FOR THE CHALLENGE
           THE PROCEDURE IS APPLIED WITH RESTRICTIONS
           AS STATED IN THE PREVIOUS SECTION.
           For questions please contact: ssc(at)mailman.egi.eu

More information about EGI security procedures (flowchart, formal document, forensic howto, ...) can be found here: https://wiki.egi.eu/wiki/EGI_CSIRT:Policies

Please also visit our Forensic Howto wiki pages. If you want to contribute, just send your input to irtf(at)mailman.egi.eu.

Scores, Evaluation - Report generation

We distinguish between

1) Measurable per site operations (with target times):

  1. initial feedback: 4h
  2. find malicious jobs/processes and stop them: 4h
  3. ban the problematic certificate: 4h
  4. contain the malicious binary and send it to the incident coordinator: 24h

These will be measured by the ssc-monitor, and the scores the sites get are calculated according to the formula stated on the wiki page. Times are relative to the timestamp in the alarm ticket sent to the site; we try to make sure that the alarms are sent during office hours (09:00 - 18:00, local time).
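
A site can check its own timing against the target times listed above. The sketch below only compares elapsed time with the targets; it does not reproduce the scoring formula, which is not given here. The operation labels and the `check_targets` helper are hypothetical names chosen for this example.

```python
from datetime import datetime, timedelta

# Target times from the list of measurable per site operations.
TARGETS = {
    "initial feedback": timedelta(hours=4),
    "stop malicious jobs/processes": timedelta(hours=4),
    "ban certificate": timedelta(hours=4),
    "send malicious binary": timedelta(hours=24),
}


def check_targets(alarm_time, completed):
    """Map each operation to (elapsed, met_target).

    `alarm_time` is the timestamp in the alarm ticket; `completed` maps an
    operation name to the time it was finished. Operations that are missing
    from `completed` are reported as (None, False).
    """
    result = {}
    for operation, target in TARGETS.items():
        done = completed.get(operation)
        if done is None:
            result[operation] = (None, False)
        else:
            elapsed = done - alarm_time
            result[operation] = (elapsed, elapsed <= target)
    return result


if __name__ == "__main__":
    alarm = datetime(2019, 3, 4, 10, 0)
    done = {
        "initial feedback": datetime(2019, 3, 4, 11, 30),
        "ban certificate": datetime(2019, 3, 4, 15, 0),
    }
    for operation, (elapsed, ok) in check_targets(alarm, done).items():
        print(operation, elapsed, "met" if ok else "missed or open")
```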

and 2) per site Forensic operations:

  1. All sites will receive "malware" with unique per site artifacts such as UUIDs, URLs, IPs, etc. Finding them may require more advanced forensic operations, such as memory analysis. These findings should be reported to IRTF within the SSC incident ticket as soon as they are found.

The reported artifacts will be extracted from the transactions in the ticket. The available scores will decrease over time.

Post processing, clean up

As part of the incident handling, Grid authorizations may have been withdrawn from the DN that was used to submit the job. When the incident response procedure is complete, the test operator will explicitly request restoration of any such authorizations to their original state.

SSC Evaluation Form

[Figure: LHCb score table template]

De-briefing

When the challenge has been completed on a representative number of Sites, the test operator will ask for de-briefing input from the participating Sites. Material submitted will be used to edit a report. The report will be circulated to the contributors for comments before being presented to the EGI-CSIRT.

Communication Template Debriefing

Dear all, 
thank you for your contributions to the SSC-19.03.

This message is to inform you that the SSC-19.03 is now over. You should receive the site report in the next few days.


As a clean-up step we would now ask the challenged sites to restore any credentials that may have been banned, in particular:

   CN=Amelie Caillet, CN=700993, CN=acaillet, OU=Users, OU=Organic Units, DC=cern, DC=ch

and
 
   CN=Cindy Denis, CN=759002, CN=cidenis, OU=Users, OU=Organic Units, DC=cern, DC=ch


Remark: We are not sure whether it is worth also asking the sites to clean up the worker nodes where the payload ran. This should have happened automatically in the meantime; the sites where we saw long-living bots were informed to kill them (in May ;-))