
Difference between revisions of "Agenda-16-04-2012"


Revision as of 12:28, 16 April 2012




Detailed agenda: Grid Operations Meeting 16 April 2012 14h00 Amsterdam time

EVO direct link Pwd: gridops
EVO details Indico page


1. Middleware releases and staged rollout

1.1 EMI-1 release status

1.2 Staged Rollout

  • Currently no products in staged rollout.
  • LFC_oracle 1.8.2 in the final stage of Verification by one Tier1.
  • IGE SAGA adaptors (http://www.saga-project.org/) are currently under verification. We need to know whether any sites are interested in taking part in their staged rollout; one site has already expressed interest.

Upcoming EMI update 15 (see Cristina's presentation)

EMI2 is scheduled for the 7th of May.

2. Operational Issues

2.1 BDII instability

  • On April 12th several Site-BDIIs in Ibergrid and NGI_IT were malfunctioning:
  • Symptoms:
    • Site-BDIIs failing SAM probes, failing attempts to restart them
    • High CPU usage of LDAP services, even after the restart
    • gLite3.2 and EMI services affected
    • Similar problems may have affected the Top-BDIIs: the logs report that during the same period some of the Top-BDIIs in the HA cluster of Ibergrid were removed because they were not responsive.
  • During the same hours a GEANT network problem was reported, caused in particular by a router in Geneva (12th April 2012, around 16:00 CET)

Possible connection between the problems (G.Borges):

  • GEANT problem reported as intermittent
  • Connections with clients, broken because of the network problem, remained pending (default timeout 60 seconds)
  • Clients reconnected to the BDIIs, multiplying the number of connections to the server and producing an effect similar to a DoS attack
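
The amplification described above can be illustrated with a back-of-envelope model. This is only a sketch: the 60-second timeout comes from the notes above, while the number of clients and the retry interval are hypothetical values chosen for illustration, and the function name is not part of any BDII tooling.

```python
import math


def peak_pending_connections(clients: int, retry_interval_s: float,
                             timeout_s: float = 60.0) -> int:
    """Upper bound on connections left pending on one BDII.

    Assumes every client opens a new connection each `retry_interval_s`
    seconds and that each broken connection stays pending until the
    timeout expires.
    """
    if clients < 0 or retry_interval_s <= 0 or timeout_s <= 0:
        raise ValueError("clients must be >= 0; intervals must be positive")
    # Connections opened within the last `timeout_s` seconds are still pending.
    per_client = math.ceil(timeout_s / retry_interval_s)
    return clients * per_client


# Hypothetical numbers: 200 clients retrying every 5 seconds.
print(peak_pending_connections(clients=200, retry_interval_s=5))  # 2400
```

Under these assumptions a modest client population can hold more than ten times its usual number of connections, which would be consistent with the load pattern suggested above.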

For all NGIs:

  • Assess the status of Site-BDIIs and Top-BDIIs and report similar problems
  • If possible, upgrade BDII instances to BDII Site 1.1.0
    • This version adds a dependency on OpenLDAP 2.4 (increased stability) and reduces memory and disk usage
    • Note: this may not solve this specific issue, but it will improve general BDII performance
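
As a first step in assessing BDII status, a simple reachability check can be scripted. The sketch below only tests whether the LDAP port accepts a TCP connection within a timeout; it assumes the standard BDII port 2170, and the function name and default timeout are illustrative choices. A successful connection does not prove that the published information is correct, so an actual LDAP query would still be needed for that.

```python
import socket


def ldap_port_responds(host: str, port: int = 2170,
                       timeout_s: float = 5.0) -> bool:
    """Return True if a TCP connection to the BDII LDAP port succeeds in time."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False
```

For example, `ldap_port_responds("site-bdii.example.org")` would check a Site-BDII, where the host name is a placeholder.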

2.2 VOMS Admin fails to notify about expiring membership

A bug was recently found that affects how VOMS Admin sends warning messages when a user's membership is about to expire. The affected VOMS Admin versions are:

  • gLite 3.2: versions 2.5.3-1 and 2.5.5-1
  • EMI 1: VOMS Admin 2.6.1
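
Sites that track installed versions in scripts could encode the list above as a lookup. This is a minimal sketch: the dictionary and function names are hypothetical, and the comparison is an exact string match, so a version reported with a different release suffix would need to be normalised first.

```python
# Affected versions as listed above, keyed by middleware distribution.
AFFECTED_VOMS_ADMIN = {
    "gLite 3.2": {"2.5.3-1", "2.5.5-1"},
    "EMI 1": {"2.6.1"},
}


def is_affected(distribution: str, version: str) -> bool:
    """Return True if this VOMS Admin version is on the affected list."""
    return version.strip() in AFFECTED_VOMS_ADMIN.get(distribution, set())
```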

4. AOB

4.1 Next meetings