Agenda-16-04-2012
Detailed agenda: Grid Operations Meeting 16 April 2012 14h00 Amsterdam time
EVO direct link | Pwd: gridops
EVO details | Indico page
1. Middleware releases and staged rollout
1.1 EMI-1 release status
1.2 Staged Rollout
2. Operational Issues
2.1 BDII instability
- On April 12th several Site-BDIIs in Ibergrid and NGI_IT were malfunctioning:
- Symptoms:
- Site-BDIIs failing SAM probes; attempts to restart them failed
- High CPU usage by the LDAP service, persisting even after a restart
- Both gLite 3.2 and EMI services were affected
- Similar problems may have affected the Top-BDIIs: the logs show that during the same period some of the Top-BDIIs in the Ibergrid HA cluster were removed because they were not responsive.
- During the same hours a GEANT network problem was reported, caused by a router in Geneva (12 April 2012, around 16:00 CET)
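The symptoms above (SAM probes failing, Top-BDIIs dropped from the Ibergrid HA cluster as unresponsive) suggest a quick first check an operator can run by hand. The sketch below is a simplified, hypothetical reachability test, not the SAM probe itself: it only checks whether the BDII's LDAP port accepts a TCP connection within a timeout. It assumes the conventional BDII port 2170; the function name `bdii_port_responds` is illustrative.

```python
import socket
import sys

# Assumption: 2170 is the conventional BDII LDAP port; adjust for your setup.
BDII_PORT = 2170


def bdii_port_responds(host: str, port: int = BDII_PORT, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds.

    This only checks that the port accepts connections; it does not issue
    an LDAP query, so it is a much weaker test than a real SAM probe.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts and timeouts
        # (socket.timeout is a subclass of OSError).
        return False


if __name__ == "__main__":
    # Usage: python check_bdii.py host1 host2 ...
    for host in sys.argv[1:]:
        status = "OK" if bdii_port_responds(host) else "NOT RESPONDING"
        print(f"{host}:{BDII_PORT} {status}")
```

A successful connection does not prove the BDII is healthy: the kernel can complete the TCP handshake while an overloaded LDAP server never answers, so a more faithful check would also issue an LDAP query (for example with `ldapsearch`) under a timeout.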
Possible connection between the problems (G. Borges):
- The GEANT problem was reported as intermittent
- Client connections broken by the network problem remained pending on the server until the default 60-second timeout expired
- Clients reconnected to the BDIIs, multiplying the number of open connections on the server and producing an effect similar to a DoS attack
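The mechanism proposed above can be illustrated with a toy model. The sketch below is a hypothetical simulation, not a measurement of the incident: the number of clients and their retry interval are invented parameters, and only the 60-second timeout comes from the report. Each client opens a new connection every `retry_interval_s` seconds, and the server keeps each broken connection open for `hold_time_s` seconds before timing it out.

```python
from __future__ import annotations


def pending_connections(num_clients: int, retry_interval_s: int,
                        hold_time_s: int, duration_s: int) -> list[int]:
    """Count server-side connections open at each second of the simulation.

    Hypothetical model: every client retries every `retry_interval_s`
    seconds (staggered so retries are spread evenly), and each broken
    connection occupies the server for `hold_time_s` seconds.
    """
    open_until: list[int] = []  # expiry time of each open connection
    counts = []
    for t in range(duration_s):
        # Drop connections whose timeout has expired.
        open_until = [expiry for expiry in open_until if expiry > t]
        # Each client retries on its own staggered schedule.
        for client in range(num_clients):
            if (t + client) % retry_interval_s == 0:
                open_until.append(t + hold_time_s)
        counts.append(len(open_until))
    return counts


# 100 hypothetical clients retrying every 10 s, 60 s server-side timeout.
print(max(pending_connections(100, 10, 60, 120)))  # 600
# Same clients, but broken connections dropped after 1 s.
print(max(pending_connections(100, 10, 1, 120)))   # 10
```

The steady-state count follows Little's law, L = λW: 100 clients retrying every 10 s give λ = 10 new connections per second, and holding each for W = 60 s leaves about 600 open at once, versus 10 if broken connections were released quickly. Under this model, shortening the server-side timeout or slowing client retries reduces the pile-up proportionally.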
For all NGIs:
- Check the Site-BDII and Top-BDII logs and report any similar problems
- If possible, upgrade BDII instances to BDII Site 1.1.0
- This version adds a dependency on OpenLDAP 2.4 (for increased stability) and reduces memory and disk usage
- However, the upgrade may not solve this specific issue
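When auditing many hosts for the recommended upgrade, a small version comparison helps decide which instances are older than BDII Site 1.1.0. The sketch below is hypothetical: it assumes the installed version is available as a plain dotted string, possibly with an RPM-style release suffix (for example as reported by `rpm -q`); the package name and how the string is obtained are left to the site.

```python
def _parse_version(version: str) -> list[int]:
    """Turn '1.0.2' or '1.0.2-1.el5' into [1, 0, 2]."""
    core = version.split("-", 1)[0]  # drop an RPM-style release suffix
    return [int(part) for part in core.split(".")]


def needs_bdii_upgrade(installed: str, recommended: str = "1.1.0") -> bool:
    """Return True if `installed` is older than the recommended release."""
    a, b = _parse_version(installed), _parse_version(recommended)
    # Pad with zeros so that '1.1' and '1.1.0' compare as equal.
    width = max(len(a), len(b))
    a += [0] * (width - len(a))
    b += [0] * (width - len(b))
    return a < b


for v in ["1.0.2", "1.1.0-1.el5", "1.2"]:
    print(v, "upgrade" if needs_bdii_upgrade(v) else "ok")
```

Padding before comparing avoids treating "1.1" as older than "1.1.0", which a plain tuple comparison would do. Versions with non-numeric components would need a full RPM-style comparison instead.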