The wiki is deprecated and due to be decommissioned by the end of September 2022.
The content is being migrated to other supports, new updates will be ignored and lost.
If needed you can get in touch with EGI SDIS team using operations @ egi.eu.

Difference between revisions of "EGI CSIRT:Central emergency suspension"

From EGIWiki
Line 1: Line 1:
{{New-Egi-csirt-header}} {{TOC_right}}  
{{New-Egi-csirt-header}} {{TOC_right}}  
<br>
This page describes the status of the implementation of the EGI&nbsp;Central emergency suspension infrastructure.
<br>


== Central emergency suspension procedure  ==
== Central emergency suspension procedure  ==
Line 11: Line 5:
The document describing the central emergency suspension procedure is available at [https://documents.egi.eu/secure/ShowDocument?docid=1018 EGI CSIRT Operational Procedure for Compromised Certificates]. <br>  
The document describing the central emergency suspension procedure is available at [https://documents.egi.eu/secure/ShowDocument?docid=1018 EGI CSIRT Operational Procedure for Compromised Certificates]. <br>  


== Argus Infrastructure Deployment <br> ==
== Argus Infrastructure Deployment  ==
 
=== Argus Deployment  ===


*'''Central Argus Instance''' at CERN  
*'''Central Argus Instance''' at CERN  
*'''NGI Argus Instance''': [https://goc.egi.eu/portal/index.php?Page_Type=Service_Group&id=1184 EGI CoreArgus Service Group]<br>  
*'''NGI Argus Instance''': [https://goc.egi.eu/portal/index.php?Page_Type=Service_Group&id=1184 EGI CoreArgus Service Group]<br>  
**All NGIs should run an Argus instance  
** All NGIs should run an Argus instance  
**NGIs that don't have a Site/RC that uses Argus don't need to run an Argus service  
** NGIs that don't have a Site/RC that uses Argus don't need to run an Argus service  
**NGI Argus instance should be registered in GOC&nbsp;DB&nbsp;with service type <span style="vertical-align: middle;">emi.ARGUS                               </span>
** NGI Argus instance should be registered in GOC&nbsp;DB&nbsp;with service type ngi.ARGUS
**The NGI-Argus servers have to be configured/maintained carefully. A potential attacker getting privileged access to this system could block all jobs that are submitted to the sites using this NGI-Argus service.  
** The NGI-Argus servers have to be configured/maintained carefully. A potential attacker getting privileged access to this system could block all jobs that are submitted to the sites using this NGI-Argus service.  
** NGI-Argus systems contain personal data and shall limit access to this service to the site Argus (or Argus-like) systems in the NGI.
** NGI-Argus systems contain personal data and shall limit access to this service to the site Argus (or Argus-like) systems in the NGI.
** ACLs can be constructed by pulling the list of egi.Argus'es for the respective NGI from goc-db
** ACLs can be constructed by pulling the list of egi.Argus'es for the respective NGI from goc-db
*'''Site Argus Instance'''  
*'''Site Argus Instance'''  
**Sites in the NGIs pull policies from NGI Argus  
** Sites in the NGIs pull policies from NGI Argus  
**Small sites that don't have the expertise to run a local Argus could use the NGI Argus   
** Small sites that don't have the expertise to run a local Argus could use the NGI Argus   
**Site Argus instance should be registered in GOC&nbsp;DB&nbsp;with service type <span style="vertical-align: middle;">emi.ARGUS                               </span>
** Site Argus instance should be registered in GOC&nbsp;DB&nbsp;with service type emi.ARGUS
 
* Non Argus Sites/RCs  
=== Non Argus Infrastructures/NGIs/RCs  ===
** Pull the list directly from NGI-Argus, feed it into their fabric management, deploy it at all services at the RC  
 
** Scripts Documentation available at [http://wiki.nikhef.nl/grid/Argus_Global_Banning_Setup_Overview Nikhef wiki Argus_Global_Banning_Setup_Overview ]
*Non Argus Sites/RCs  
**Pull the list directly from NGI-Argus, feed it into their fabric management, deploy it at all services at the RC  
**Scripts Documentation available at [http://wiki.nikhef.nl/grid/Argus_Global_Banning_Setup_Overview Nikhef wiki Argus_Global_Banning_Setup_Overview ]


== Argus Monitoring<br> ==
== Argus Monitoring  ==


'''Goal:''' Nagios probe for NGI Argus run centrally (secmon.egi.eu)
=== NGI Argus Monitoring ===


'''Note:''' *ONLY* the <span style="color:#FF0000"> NGI-Argus servers (ngi.argus service type) should accept Nagios probes </span>.  
The ''eu.egi.Argus-DNs'' metric checks if an Argus server is properly configured and still pulling suspension information from the Central Argus Instance.


'''Note:''' <span style="color:#FF0000">'''Site-Argus systems''' must not </span> expose this service to the internet.  
Every day the Central Argus Instance suspends a new DN: the probe verifies if this DN is present on the NGI argus.


'''List of Services to monitor with Nagios''': [https://goc.egi.eu/portal/index.php?Page_Type=Service_Group&id=1184 Goc-DB NGI-Argus Servers]
The return values of that probe can indicate the following problems:


*Probe: https://rt.egi.eu/guest/Ticket/Attachment/354893/1515343/argus-fetch.py
{| class="wikitable"
|-
! Return value
! Problem
! Potential solution
|-
| ARGUS WARN - connection error
| The probe was not able to connect to the Argus server
| Please make sure that the argus pap port (8150) is accessible remotely from argo-mon.egi.eu, argo-mon2.egi.eu and argo-mon-test.cro-ngi.hr
|-
| ARGUS WARN - Authorization error
| The probe was able to connect but was denied access
| Please make sure that the <nowiki>"POLICY_READ_LOCAL|POLICY_READ_REMOTE|CONFIGURATION_READ"</nowiki> permissions are given to "/DC=EU/DC=EGI/C=HR/O=Robots/O=SRCE/CN=Robot:argo-egi@cro-ngi.hr" and "/DC=EU/DC=EGI/C=GR/O=Robots/O=Greek Research and Technology Network/CN=Robot:argo-egi@grnet.gr"
|-
| ARGUS CRIT - Expected DN not found!
| The probe didn't find a recent DN in the Argus configuration
| Please check your argus logs to see what is blocking the synchronization
|-
| ARGUS WARN - Found outdated DN
| The probe only found an outdated DN and not the current one
| Please check your argus logs to see what is delaying the synchronization. The synchronization delay might be too long
|-
| ARGUS OK - Found expected DN
| Everything is good!
|
|}


(The main modification is the addition of a loop: instead of listing the "default" PAP, it's first listing all the PAPs using "getAllPaps" on "/pap/services/PAPManagementService?wsdl"
For more details on the Argus configuration see below.


Note: as discussed, I believe, during one of our meetings, the getAllPaps requires the ListPapsOperation right.)
=== Site Monitoring ===


<br> '''What to monitor:'''  
Site Arguses (or equivalent solutions) should not be exposed to the internet and thus cannot be monitored directly.
The EGI CSIRT is considering submitting jobs from suspended DNs, but such monitoring of the sites' emergency suspension systems is not yet in place.


*System UP
== Argus Support ==
**Fetch the suspension list from those argus servers
**Try to submit a job with a suspended DN - this would only look at a single component where the proxy-certificates are used. We need to look at gacl/l,scas at CE, WMS, SEs (perhaps more).
*Last update of ban information fetched from the central instance at CERN. - will not be run against argus services, here we only want to monitor that the ban information gets updated.
 
== Argus Support<br>  ==


Support is provided through [[GGUS:ARGUS FAQ|ARGUS&nbsp;Support unit]] in GGUS  
Support is provided through [[GGUS:ARGUS FAQ|ARGUS&nbsp;Support unit]] in GGUS  


<br>
== Documentation  ==
 
#INFN supports PAP component
#*Could take PDP + PEPd on board if e.g. INDIGO-DataCloud gets approved
#NIKHEF supports C clients
#*Used e.g. by gLExec
#EGI
#*Release management, staged rollout, deployment<br>campaigns
#*1st and 2nd level support
#*Scale testing with partner sites
#**MW Readiness Validation activity <br>
 
Potential new partners<br>
 
#CESNET
#*Testing, maybe development
#UNICORE
#*Connection via CANL
#ARC
#*Client needs fixing
 
== Documentation<br> ==
 
Documentation on possible problems and solutions with certain deployment scenarios is in [http://wiki.nikhef.nl/grid/Argus_Global_Banning_Setup_Overview Nikhef wiki, Argus Global Banning Setup Overview]
 
== Workplan  ==
 
Members:
 
*Sven Gabriel (EGI&nbsp;CSIRT)
*Małgorzata Krakowian (EGI Operations)
*Peter Solagna (EGI Operations)
*Cristina Aiftimiei (EGI Operations)
*Emir Imamagic (Monitoring)
*V. Brillaut (Monitoring probes)<br>
 
<br><br>
 
#NGI Argus Services are deployed (coordinated by EGI Operations, action on NGIs, ggus tickets opened) '''DONE'''
#Information of the NGI Argus services is in the appropriate format in goc db (action on goc-db/NGIs, coordinated by EGI Operations)'''DONE'''
#Monitoring that NGI-Argus services have updated banning information, monitoring results available to EGI-CSIRT for example via security dashboard (coordinated by EGI Operations, action on Nagios Monitoring group) Remark: probe is available from V. Brillaut
#Test if ban information propagates to the sites services: CE/SE/WMS (action on EGI-CSIRT)
#?<br>
 
<br>
 
<br>


<br>
Documentation on possible problems and solutions with certain deployment scenarios is in [http://wiki.nikhef.nl/grid/Argus_Global_Banning_Setup_Overview Nikhef wiki, Argus Global Banning Setup Overview]

Revision as of 16:35, 30 May 2017




Central emergency suspension procedure

The document describing the central emergency suspension procedure is available at EGI CSIRT Operational Procedure for Compromised Certificates.

Argus Infrastructure Deployment

  • Central Argus Instance at CERN
  • NGI Argus Instance: EGI CoreArgus Service Group
    • All NGIs should run an Argus instance
    • NGIs that don't have a Site/RC that uses Argus don't need to run an Argus service
    • NGI Argus instance should be registered in GOC DB with service type ngi.ARGUS
    • The NGI-Argus servers have to be configured and maintained carefully. A potential attacker getting privileged access to this system could block all jobs that are submitted to the sites using this NGI-Argus service.
    • NGI-Argus systems contain personal data and shall limit access to this service to the site Argus (or Argus-like) systems in the NGI.
    • ACLs can be constructed by pulling the list of egi.Argus'es for the respective NGI from goc-db
  • Site Argus Instance
    • Sites in the NGIs pull policies from NGI Argus
    • Small sites that don't have the expertise to run a local Argus could use the NGI Argus
    • Site Argus instance should be registered in GOC DB with service type emi.ARGUS
  • Non Argus Sites/RCs
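
The ACL bullet above can be illustrated with a minimal sketch. It assumes that the list of site Argus endpoints has already been pulled from GOC DB as XML. The element names (SERVICE_ENDPOINT, HOSTNAME, SERVICE_TYPE, ROC_NAME), the sample hosts and the helper name site_argus_hosts are illustrative assumptions, not a confirmed GOC DB schema or an existing tool.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample of a GOC DB service-endpoint listing.
# The XML layout and all host/NGI names are assumptions for illustration.
SAMPLE_XML = """
<results>
  <SERVICE_ENDPOINT>
    <HOSTNAME>argus.site-b.example.org</HOSTNAME>
    <SERVICE_TYPE>emi.ARGUS</SERVICE_TYPE>
    <ROC_NAME>NGI_EXAMPLE</ROC_NAME>
  </SERVICE_ENDPOINT>
  <SERVICE_ENDPOINT>
    <HOSTNAME>argus.site-a.example.org</HOSTNAME>
    <SERVICE_TYPE>emi.ARGUS</SERVICE_TYPE>
    <ROC_NAME>NGI_EXAMPLE</ROC_NAME>
  </SERVICE_ENDPOINT>
  <SERVICE_ENDPOINT>
    <HOSTNAME>argus.ngi.example.org</HOSTNAME>
    <SERVICE_TYPE>ngi.ARGUS</SERVICE_TYPE>
    <ROC_NAME>NGI_EXAMPLE</ROC_NAME>
  </SERVICE_ENDPOINT>
  <SERVICE_ENDPOINT>
    <HOSTNAME>argus.elsewhere.example.net</HOSTNAME>
    <SERVICE_TYPE>emi.ARGUS</SERVICE_TYPE>
    <ROC_NAME>NGI_OTHER</ROC_NAME>
  </SERVICE_ENDPOINT>
</results>
"""


def site_argus_hosts(xml_text, ngi, service_type="emi.ARGUS"):
    """Return the sorted hostnames of the site Argus services of one NGI."""
    root = ET.fromstring(xml_text.strip())
    hosts = set()
    for endpoint in root.iter("SERVICE_ENDPOINT"):
        same_type = endpoint.findtext("SERVICE_TYPE", "").strip() == service_type
        same_ngi = endpoint.findtext("ROC_NAME", "").strip() == ngi
        host = endpoint.findtext("HOSTNAME", "").strip()
        if same_type and same_ngi and host:
            hosts.add(host)
    return sorted(hosts)


# The resulting list is the set of hosts the NGI Argus would allow to connect.
allowed = site_argus_hosts(SAMPLE_XML, "NGI_EXAMPLE")
```

How the resulting host list is turned into firewall rules or service ACLs depends on the local setup and is left out here.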

Argus Monitoring

NGI Argus Monitoring

The eu.egi.Argus-DNs metric checks if an Argus server is properly configured and still pulling suspension information from the Central Argus Instance.

Every day the Central Argus Instance suspends a new DN: the probe verifies if this DN is present on the NGI argus.

The return values of that probe can indicate the following problems:

{| class="wikitable"
|-
! Return value
! Problem
! Potential solution
|-
| ARGUS WARN - connection error
| The probe was not able to connect to the Argus server
| Please make sure that the argus pap port (8150) is accessible remotely from argo-mon.egi.eu, argo-mon2.egi.eu and argo-mon-test.cro-ngi.hr
|-
| ARGUS WARN - Authorization error
| The probe was able to connect but was denied access
| Please make sure that the <nowiki>"POLICY_READ_LOCAL|POLICY_READ_REMOTE|CONFIGURATION_READ"</nowiki> permissions are given to "/DC=EU/DC=EGI/C=HR/O=Robots/O=SRCE/CN=Robot:argo-egi@cro-ngi.hr" and "/DC=EU/DC=EGI/C=GR/O=Robots/O=Greek Research and Technology Network/CN=Robot:argo-egi@grnet.gr"
|-
| ARGUS CRIT - Expected DN not found!
| The probe didn't find a recent DN in the Argus configuration
| Please check your argus logs to see what is blocking the synchronization
|-
| ARGUS WARN - Found outdated DN
| The probe only found an outdated DN and not the current one
| Please check your argus logs to see what is delaying the synchronization. The synchronization delay might be too long
|-
| ARGUS OK - Found expected DN
| Everything is good!
|
|}
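
The three DN-related return values above can be sketched as a small decision function. This is not the code of the actual probe: the function name classify, its parameters and the sample DNs are hypothetical, and the connection and authorization errors are left out because they arise before any DN is compared. The exit codes follow the usual Nagios plugin convention (0 OK, 1 WARNING, 2 CRITICAL).

```python
# Usual Nagios plugin exit codes.
OK, WARNING, CRITICAL = 0, 1, 2


def classify(argus_dns, current_dn, older_dns):
    """Map the DNs found on an NGI Argus to an (exit_code, message) pair.

    argus_dns  -- set of suspended DNs read from the NGI Argus
    current_dn -- DN suspended most recently by the Central Argus Instance
    older_dns  -- DNs suspended on earlier days (hypothetical input)
    """
    if current_dn in argus_dns:
        return OK, "ARGUS OK - Found expected DN"
    if any(dn in argus_dns for dn in older_dns):
        # The server synchronized at some point but is lagging behind.
        return WARNING, "ARGUS WARN - Found outdated DN"
    return CRITICAL, "ARGUS CRIT - Expected DN not found!"


# Example with made-up DNs: the NGI Argus only knows yesterday's test DN.
code, message = classify(
    {"/DC=example/CN=test-suspension-yesterday"},
    "/DC=example/CN=test-suspension-today",
    ["/DC=example/CN=test-suspension-yesterday"],
)
```

A probe built this way would print the message and exit with the code, so that the monitoring system can tell a lagging synchronization from a missing one.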

For more details on the Argus configuration see below.

Site Monitoring

Site Arguses (or equivalent solutions) should not be exposed to the internet and thus cannot be monitored directly. The EGI CSIRT is considering submitting jobs from suspended DNs, but such monitoring of the sites' emergency suspension systems is not yet in place.

Argus Support

Support is provided through ARGUS Support unit in GGUS

Documentation

Documentation on possible problems and solutions with certain deployment scenarios is in Nikhef wiki, Argus Global Banning Setup Overview