
EGI-CSIRT:TDG/best pract

Revision as of 16:25, 16 October 2012 by Krakow



Best Practices

Protecting Administrative Credentials

Credentials to access systems or Grid services should be well protected to prevent attackers from gaining access to the system or one of its services.

For instance, for Grid services, the certificate private key (generally stored in userkey.pem or *.p12 files) is the secret part of the information representing the identity of its owner. It must remain readable only by its owner. If the private key becomes known to an attacker, he/she will be able to impersonate the owner of the certificate on the Grid. While protecting private keys is the responsibility of their owners, site administrators are encouraged, when allowed, to periodically search for publicly readable private keys on their hosts. Unprotected and publicly readable private keys should be reported to the relevant CA so that the corresponding certificates can be revoked.
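
The periodic search mentioned above could be sketched as follows. This is a minimal illustration, not an official EGI-CSIRT tool: the function name find_exposed_keys is hypothetical, the file name patterns are the two mentioned in the text, and "publicly readable" is assumed here to mean readable by group or others.

```python
import fnmatch
import os
import stat

# File name patterns taken from the text above; extend as needed for your site.
KEY_PATTERNS = ("userkey.pem", "*.p12")


def find_exposed_keys(root):
    """Return a sorted list of files under `root` whose names look like
    private keys and whose permissions allow group or others to read them."""
    exposed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not any(fnmatch.fnmatch(name, p) for p in KEY_PATTERNS):
                continue
            path = os.path.join(dirpath, name)
            try:
                mode = os.lstat(path).st_mode
            except OSError:
                continue  # file vanished or cannot be inspected
            if not stat.S_ISREG(mode):
                continue  # skip symlinks and special files
            if mode & (stat.S_IRGRP | stat.S_IROTH):
                exposed.append(path)
    return sorted(exposed)
```

Calling find_exposed_keys("/home") would list candidate files for manual review. The check only looks at the files' own permission bits, so a key inside a directory that others cannot enter would still be listed.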

SSH brute-force attacks are also very common, so it is recommended to use SSH key authentication instead of password authentication when authenticating against remote SSH servers. Once the SSH public key is stored on the remote server, it is possible to authenticate against it by using the corresponding SSH private key, which should itself be protected by a passphrase.

Again, this mechanism is effective only if the private key is protected by a good passphrase!
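
On the server side, one way to check whether password authentication has actually been switched off is to inspect the OpenSSH server configuration. The sketch below is an assumption-laden illustration: the function name password_auth_disabled is hypothetical, it looks only at the global part of the configuration text it is given (it stops at the first Match block and does not follow Include directives), and it relies on OpenSSH allowing password authentication when the keyword is not set.

```python
def password_auth_disabled(config_text):
    """Return True if the global section of an sshd_config explicitly sets
    PasswordAuthentication to "no"."""
    for raw in config_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split()
        keyword = parts[0].lower()  # keywords are case-insensitive
        if keyword == "match":
            break  # only global settings are examined in this sketch
        if keyword == "passwordauthentication" and len(parts) > 1:
            # sshd uses the first value it obtains for each keyword
            return parts[1] == "no"
    return False  # keyword absent: password authentication stays enabled
```

A typical call would be password_auth_disabled(open("/etc/ssh/sshd_config").read()), where the path is the usual location and may differ on your system.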

Business Continuity

A critical area of the system documentation is the business continuity plan. This document is part of the system risk analysis and covers the event of a major disaster.

The aim of this document is to provide the detailed procedures that need to be followed in order to maintain a service when a major disaster occurs. For instance, hot spare backup machines, backup tapes or an off-site mirror can help restore a service promptly when required. Ideally, a business continuity plan contains enough detail to be followed and fully executed by non-expert staff.

It is far from easy to define a reasonable business continuity plan. Every aspect of the plan needs to be carefully defined to ensure that it can be used at any time. It therefore needs to be defined and agreed in advance in order to be fully effective.

Updating the CRLs

Even though digital certificates have a limited lifetime, the information in a certificate can become invalid before the certificate expires, e.g. when the corresponding private key is compromised or lost, or when the affiliation of the certificate owner changes. In such a case, the certificate must be marked as invalid so that it is no longer accepted, and the issuing CA must revoke it. To make information about revoked certificates widely available, each standard CA regularly publishes certificate revocation lists (CRLs), which identify the certificates it has revoked. Each machine in a Grid must maintain a local copy of the CRLs of all CAs it trusts. Proper checking of the CRLs must always be an integral part of certificate verification.

The gLite distribution contains a cron script (fetch-crl) that automates the periodic retrieval of CRLs from all CAs. Always make sure the script is installed and enabled. CRLs installed on a local system are automatically checked by most gLite services. Other services, however, must be explicitly configured to perform CRL checks, so always consult the documentation, especially when configuring a third-party application such as Apache.
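
A simple sanity check that the periodic retrieval is still working is to look for local CRL files that have not been refreshed recently. The sketch below rests on assumptions not stated in this page: the function name stale_crls is hypothetical, CRL files are assumed to carry the .r0 suffix (on many Grid hosts they are kept in /etc/grid-security/certificates), and the 24-hour threshold is arbitrary. The file modification time is only a proxy; the sketch does not parse the CRL's own validity fields.

```python
import os
import time


def stale_crls(cert_dir, max_age_hours=24.0, now=None):
    """Return the sorted names of *.r0 files in `cert_dir` whose modification
    time is older than `max_age_hours`."""
    now = time.time() if now is None else now
    cutoff = now - max_age_hours * 3600
    stale = []
    for entry in os.scandir(cert_dir):
        # Assumption: CRL files are regular files ending in ".r0"
        if entry.is_file() and entry.name.endswith(".r0"):
            if entry.stat().st_mtime < cutoff:
                stale.append(entry.name)
    return sorted(stale)
```

A non-empty result from stale_crls("/etc/grid-security/certificates") would suggest checking whether the fetch-crl cron job is installed, enabled and able to reach the CAs.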

System Documentation

Writing system documentation and keeping it up to date is an ongoing task, which gives the system manager an overview of the system at any time regarding:

  • The purpose of the system
  • The hardware configuration
  • The filesystem layout
  • The network configuration, including firewall rules
  • The various groups and users
  • The recovery procedures
  • The business continuity plan
  • The system backup
  • The list of people having privileged access
  • The logs configuration
  • The change control procedures
  • The system monitoring
  • The system patching
  • The file integrity checking
  • The dependency on other systems

If a security incident occurs, having online and up-to-date system documentation can greatly reduce the impact of the incident on the system and on the organisation it belongs to.