
Agenda-16-04-2012


Detailed agenda: Grid Operations Meeting 16 April 2012 14h00 Amsterdam time

EVO direct link Pwd: gridops
EVO details Indico page


1. Middleware releases and staged rollout

1.1 EMI-1 release status

1.2 Staged Rollout

2. Operational Issues

2.1 BDII instability

  • On April 12th, several Site-BDIIs in Ibergrid and NGI_IT malfunctioned.
  • Symptoms:
    • Site-BDIIs failed SAM probes, and attempts to restart them failed
    • High CPU usage by the LDAP services, even after a restart
    • Both gLite 3.2 and EMI services were affected
    • Similar problems may have affected the Top-BDIIs: the logs report that, during the same period, some of the Top-BDIIs in the Ibergrid HA cluster were removed because they were not responsive.
  • During the same hours a GEANT network problem was reported, caused in particular by a router in Geneva (12 April 2012, around 16:00 CET)

Possible connection between the problems (G.Borges):

  • The GEANT problem was reported as intermittent
  • Connections with the clients, broken because of the network, remained pending (default timeout: 60 seconds)
  • Clients reconnected to the BDIIs, multiplying the number of connections to the server and producing an effect similar to a DoS attack
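
The pile-up described above can be illustrated with a small back-of-the-envelope simulation. This is only a sketch under stated assumptions, not a model of the actual BDII or LDAP server behaviour: the 60-second timeout comes from the notes above, while the number of clients, the retry interval, the outage duration, and the function name simulate_stale_connections are hypothetical values chosen for illustration. It assumes each broken connection stays open on the server until the timeout expires, while every client opens a fresh connection at each retry.

```python
from collections import Counter


def simulate_stale_connections(clients, outage_seconds,
                               server_timeout=60, retry_interval=5):
    """Return the number of connections held open at each second.

    Assumptions (illustrative only): during the outage every client
    reconnects every `retry_interval` seconds, and each broken
    connection is kept open for `server_timeout` seconds.
    """
    total_seconds = outage_seconds + server_timeout
    expiring_at = Counter()  # second -> connections that time out then
    open_now = 0
    history = []
    for t in range(total_seconds):
        # Release connections whose timeout has just elapsed.
        open_now -= expiring_at.pop(t, 0)
        # While the network is flaky, every client retries periodically.
        if t < outage_seconds and t % retry_interval == 0:
            open_now += clients
            expiring_at[t + server_timeout] += clients
        history.append(open_now)
    return history


# Hypothetical scenario: 100 clients, a 2-minute intermittent outage.
history = simulate_stale_connections(clients=100, outage_seconds=120)
peak = max(history)
print(f"peak open connections: {peak}")  # → peak open connections: 1200
```

With these hypothetical numbers, 100 clients end up holding 1200 connections open at the peak, i.e. the load is multiplied by roughly server_timeout / retry_interval, and the backlog only drains one full timeout after the last retry.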

For all NGIs:

  • Check the Site-BDII and Top-BDII logs and report similar problems
  • If possible, upgrade BDII instances to BDII Site 1.1.0
    • This version adds a dependency on OpenLDAP 2.4 (increased stability) and reduces memory and disk usage
    • This may not solve this specific issue

4. AOB

4.1 Next meetings