
Agenda-16-04-2012


Detailed agenda: Grid Operations Meeting 16 April 2012 14h00 Amsterdam time

EVO direct link Pwd: gridops
EVO details Indico page


1. Middleware releases and staged rollout

1.1 EMI release status

1.2 Staged Rollout

  • Currently no products in staged rollout.
  • LFC_oracle 1.8.2 in the final stage of Verification by one Tier1.
  • IGE SAGA adaptors (http://www.saga-project.org/) are currently under verification, but we need to know whether any sites are interested in taking part in their staged rollout. One site has already expressed interest.

Upcoming EMI update 15 (see Cristina's presentation)

EMI2 is scheduled for the 7th of May.

2. Operational Issues

2.1 BDII instability

  • On April 12th several Site-BDIIs in Ibergrid and NGI_IT were malfunctioning.
  • Symptoms:
    • Site-BDIIs failing SAM probes; attempts to restart them also failed
    • High CPU usage of LDAP services, even after the restart
    • gLite 3.2 and EMI services affected
    • Similar problems may have affected the Top-BDIIs: the logs show that during the same period some of the Top-BDIIs in the Ibergrid HA cluster were removed because they were not responsive.
  • During the same hours a GEANT network problem was reported, caused in particular by a router in Geneva (12 April 2012, around 16:00 CET)

Possible connection between the problems (G.Borges):

  • GEANT problem reported as intermittent
  • Client connections broken by the network problem remained pending (default timeout: 60 seconds)
  • Clients reconnected to the BDIIs, multiplying the number of connections to the server and producing an effect similar to a DoS attack
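
The amplification described above can be illustrated with a toy model. The sketch below is not taken from BDII or OpenLDAP; it assumes that every client retries at a fixed interval (the hypothetical parameter `retry_interval_s`) and that each broken connection stays pending until the 60-second timeout mentioned above expires. All numbers in the usage example are illustrative, not measurements from the incident.

```python
def pending_connections(clients, timeout_s, retry_interval_s, elapsed_s):
    """Count connections still held by the server after `elapsed_s` seconds.

    Toy model (assumption, not measured behaviour): every client opens a
    new connection at t = 0, retry_interval_s, 2 * retry_interval_s, ...
    Each connection breaks at once but stays pending on the server until
    its timeout expires.
    """
    pending_per_client = 0
    t = 0
    while t <= elapsed_s:
        # A connection opened at time t is still pending if it has not
        # yet reached the timeout.
        if elapsed_s - t < timeout_s:
            pending_per_client += 1
        t += retry_interval_s
    return clients * pending_per_client


# Illustrative values: 100 clients, 60 s timeout, one retry every 5 s,
# observed two minutes after the network problem starts.
print(pending_connections(100, 60, 5, 120))  # -> 1200
```

Under these assumptions the server holds 1200 pending connections instead of the 100 it would hold if each client connected only once, which is the kind of multiplication that could explain the high load.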

For all NGIs:

  • Assess the status of Site-BDIIs and Top-BDIIs and report similar problems
  • If possible, upgrade BDII instances to BDII Site 1.1.0
    • This version adds a dependency on OpenLDAP 2.4 (increased stability) and reduces memory and disk usage
    • Note: this may not solve this specific issue, but it will improve general BDII performance

2.2 VOMS Admin fails to notify about expiring membership

2.2.1 Description of the problem

A bug was found affecting the VOMS Admin process of sending warning messages when user membership is about to expire. The VOMS Admin versions affected by the bug are:

  • gLite 3.2: versions 2.5.3-1 and 2.5.5-1
  • EMI 1: VOMS Admin 2.6.1

These VOMS Admin versions enforce user membership expiration (by default every 12 months). The bug prevents the warning email from being sent before a user's membership expires; VO managers are notified only once the user has been suspended, after the membership has expired.

2.2.2 Possible workarounds

  1. Extension of VO membership up to 30 September 2012
    • For users whose membership is expiring before 30 Sep 2012
    • This will allow VOMS administrators to upgrade their services once the bug fix is available
    • instructions
  2. Manual notification of the list of users with expiring membership
    • Every month, VOMS server administrators should produce, for every VO affected by the bug, a report of the users whose membership expires during that month, and communicate the list to the VO managers.
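
The monthly report in the second workaround could be produced along the following lines. This is only a sketch: it assumes the administrator has already exported, by whatever means their VOMS installation offers, a list of (user DN, expiry date) pairs for one VO. The function name, the data layout and the example DNs are hypothetical and are not part of VOMS Admin.

```python
from datetime import date


def expiring_in_month(members, year, month):
    """Return the DNs whose membership expires in the given month.

    `members` is an iterable of (dn, expiry_date) pairs (assumed export
    format). The result is ordered by expiry date, earliest first.
    """
    hits = [
        (expiry, dn)
        for dn, expiry in members
        if expiry.year == year and expiry.month == month
    ]
    return [dn for expiry, dn in sorted(hits)]


# Made-up example data for one VO.
members = [
    ("/DC=org/DC=example/CN=Alice", date(2012, 5, 20)),
    ("/DC=org/DC=example/CN=Bob", date(2012, 6, 3)),
    ("/DC=org/DC=example/CN=Carol", date(2012, 5, 2)),
]

# List to send to the VO manager for May 2012.
for dn in expiring_in_month(members, 2012, 5):
    print(dn)
```

Running the report once per month and per affected VO gives the VO managers the advance warning that the buggy versions fail to send.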

2.2.3 AuP grace period

For the VOs using the VOMS releases listed above, users are asked to re-sign the VO AuP as an additional step to renew their membership. The default grace period is currently 24 hours; VOMS administrators are strongly advised to extend it to 7 days. A warning email is sent to users at the beginning of the grace period, so this configuration change gives them more time to complete this step. Users who do not re-sign the AuP are suspended at the end of the grace period.

2.3 New GOCDB roles in production

New roles were rolled into production on April 10th.

GOCDB users were automatically assigned to the new roles:

  • At Site level
    • Site Administrator -> Site Operations Manager (to preserve the users' previous permissions)
    • Security Officer -> Site Security Officer
  • At Regional level
    • Regional Operations Staff -> "Regional Staff (ROD)"
    • "Deputy Regional Manager" -> "NGI Operations Deputy Manager"
    • "Regional Manager" -> "NGI Operations Manager"
    • "Security Officer" -> "NGI Security Officer"
    • No users have been automatically assigned to the new role "Regional First Line Support"
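
For scripts that still refer to the old role names, the assignments listed above can be written down as a lookup table. This is a sketch of the mapping as described in this agenda, not GOCDB's internal data model; the `ROLE_MIGRATION` and `migrated_role` names and the "site"/"regional" level keys are assumptions made for illustration.

```python
# Old role -> new role, keyed by (level, old role name) because
# "Security Officer" maps to a different new role at each level.
ROLE_MIGRATION = {
    ("site", "Site Administrator"): "Site Operations Manager",
    ("site", "Security Officer"): "Site Security Officer",
    ("regional", "Regional Operations Staff"): "Regional Staff (ROD)",
    ("regional", "Deputy Regional Manager"): "NGI Operations Deputy Manager",
    ("regional", "Regional Manager"): "NGI Operations Manager",
    ("regional", "Security Officer"): "NGI Security Officer",
}


def migrated_role(level, old_role):
    """Return the new role name, or None if no automatic mapping is listed."""
    return ROLE_MIGRATION.get((level, old_role))


print(migrated_role("site", "Security Officer"))      # -> Site Security Officer
print(migrated_role("regional", "Security Officer"))  # -> NGI Security Officer
```

The new "Regional First Line Support" role does not appear as a value in the table, since no users were assigned to it automatically.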

Important: nothing has changed for the operations tools after the switch to the new roles. If you notice any different behavior in the tools, please report it in a GGUS ticket.

2.3.1 Problems reported and solved

  • ROD teams were not able to access the ROD dashboard
    • Bug fixed and rolled into production on Friday April 13th by the Operations Portal team
  • New roles were not showing in the User details view of the GOCDB GUI
    • Bug fixed and rolled into production on Friday April 13th by the GOCDB team

4. AOB

4.1 Next meetings