Detailed agenda: Grid Operations Meeting 07 January 2013

1. Middleware releases and staged rollout

1.1. Update on the status of EMI updates

1.2. Staged Rollout

2. Operational Issues

2.1 Status of unsupported middleware update

2.2 Updates from DMSU

lcg-gt problems with dCache SEs

Details GGUS #90807

The current version of lcg-util and gfal (1.13.9-0) returns the following error, apparently only when used against dCache SEs:

$ lcg-gt -D srmv2 -T srmv2 srm:// gsiftp
[ERROR] No request token returned with SRMv2

With an older version of lcg-utils, such as the one deployed in gLite, lcg-gt works fine. Nagios therefore does not detect this problem, because it is still gLite-based (lcg_util-1.11.16-2 and GFAL-client-1.11.16-2).

The developers are investigating this issue.

LFC-Oracle problem

Details GGUS #90701

The error is occurring with EMI2 emi-lfc_oracle-1.8.5-1.el5 and Oracle 11:

#lfc-ls lfc-1-kit:/grid

send2nsd: NS002 - send error : client_establish_context: The server had a problem while authenticating our connection
lfc-1-kit:/grid: Could not secure the connection

Experts suspect it is caused by using an Oracle 11 client while the LFC code was compiled against the Oracle 10 API. The LFC developers expect to provide rpms built against Oracle 11 shortly.

list-match problem with EMI2 WMS

Details GGUS #90240

Some CEs have enabled only a group or a role in their queues, not the entire VO:

GlueCEAccessControlBaseRule: VOMS:/gridit/ansys
GlueCEAccessControlBaseRule: VOMS:/gridit/ansys/Role=SoftwareManager

so, when your primary attribute is:

attribute : /gridit/ansys/Role=NULL/Capability=NULL

if you use an EMI-2 WMS, you cannot match those resources (whereas you can with an EMI-1 WMS).

The problem seems to lie in the value of WmsRequirements in the file /etc/glite-wms/glite_wms.conf: the filter set in that variable differs from the one used in the EMI-1 WMS. The developers are investigating.
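The expected matching between a primary FQAN and the published access rules can be sketched in shell. This is a simplified illustration only, not the filter the WMS actually applies through WmsRequirements: it assumes that a rule authorises a user when the rule value equals the FQAN with its /Role=NULL and /Capability=NULL parts removed. The function names normalise_fqan and fqan_matches_rule are hypothetical.

```shell
#!/bin/sh
# Simplified sketch (assumption): exact matching of a VOMS FQAN against a
# GlueCEAccessControlBaseRule value. Not the real WMS matchmaking filter.

# normalise_fqan FQAN
# Drop the trailing /Capability=NULL and /Role=NULL parts of an FQAN.
normalise_fqan() {
    printf '%s\n' "$1" | sed -e 's|/Capability=NULL$||' -e 's|/Role=NULL$||'
}

# fqan_matches_rule FQAN RULE
# Exit status 0 if RULE (e.g. "VOMS:/gridit/ansys") equals the normalised FQAN.
fqan_matches_rule() {
    fqan=$(normalise_fqan "$1")
    rule=${2#VOMS:}          # strip the "VOMS:" prefix of the rule
    [ "$fqan" = "$rule" ]
}

# Example with the values quoted above
primary='/gridit/ansys/Role=NULL/Capability=NULL'
for rule in 'VOMS:/gridit/ansys' 'VOMS:/gridit/ansys/Role=SoftwareManager'; do
    if fqan_matches_rule "$primary" "$rule"; then
        echo "match:    $rule"
    else
        echo "no match: $rule"
    fi
done
```

Under this assumption the primary attribute above is authorised by the group rule VOMS:/gridit/ansys, which is the behaviour reported for the EMI-1 WMS.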

proxy renewal problems on EMI1 WMS

Details GGUS #89801

Under some circumstances, ICE cannot renew the user credentials because glite-wms-ice-proxy-renew processes hang. The cause is believed to be this Savannah bug, which is already fixed in EMI2.

Problems with aliased DNS names of myproxy

Details GGUS #89105

DNS aliases of a myproxy server (e.g. used to implement round-robin load balancing and/or high availability) may cause proxy renewal to fail unless all DNS aliases, including the canonical name, are listed in the SubjectAltNames extension of the myproxy server's host certificate.

The failure does not always appear (it depends on several conditions, such as the version of globus); nevertheless, sites are encouraged to use certificates that cover all the DNS aliases.
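A site can check its certificate coverage with a small shell helper. The sketch below is illustrative: the function name missing_sans and the host names are hypothetical, and it only does exact comparisons against the DNS entries of the SubjectAltName extension (wildcard entries are not handled). It reads the text form of a certificate as printed by `openssl x509 -noout -text`.

```shell
#!/bin/sh
# missing_sans NAME...
# stdin : text dump of a certificate (openssl x509 -noout -text)
# args  : every DNS name clients may use (canonical name plus all aliases)
# Prints each name that is NOT present as a "DNS:<name>" SubjectAltName entry.
missing_sans() {
    cert_text=$(cat)
    for name in "$@"; do
        # SAN entries are printed as a comma-separated list of "DNS:<name>"
        if ! printf '%s\n' "$cert_text" |
             tr ',' '\n' |
             sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' |
             grep -qxF "DNS:$name"; then
            printf '%s\n' "$name"
        fi
    done
}
```

Example usage, assuming the host certificate is stored in /etc/grid-security/hostcert.pem and the names shown are placeholders for the site's own:

```shell
openssl x509 -in /etc/grid-security/hostcert.pem -noout -text |
    missing_sans myproxy.example.org px1.example.org px2.example.org
```

An empty output means every listed name is covered by the certificate.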

EMI-2 WN: yaim bug for cleanup-grid-accounts

Details GGUS #90486

Because of a bug, the cleanup-grid-accounts procedure does not work properly, so the disk space used on WNs may grow.

The yaim function config_lcgenv unsets $INSTALL_ROOT, so the path used by the cleanup-grid-accounts cron job is not valid:

# cat /etc/cron.d/cleanup-grid-accounts
36 3 * * * root /sbin/ -v >> /var/log/cleanup-grid-accounts.log 2>&1
# tail /var/log/cleanup-grid-accounts.log
/bin/sh: /sbin/ No such file or directory
# ls -l /sbin/
ls: /sbin/ No such file or directory
# ls -l /usr/sbin/
-rwxr-xr-x 1 root root 6747 May 16  2012 /usr/sbin/

Until the fix is released to production, a workaround is to correct the path in the cleanup-grid-accounts cron file, for example:

# cat /etc/cron.d/cleanup-grid-accounts
16 3 * * * root /usr/sbin/ -v >> /var/log/cleanup-grid-accounts.log 2>&1

Another workaround is also possible:

/opt/glite/yaim/bin/yaim -r -s -n WN -n TORQUE_client -n GLEXEC_wn -f config_users

This does not execute config_lcgenv, therefore $INSTALL_ROOT is set correctly (to /usr).
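After applying either workaround, it may be useful to confirm that the command referenced by the cron file actually exists. The following is a minimal sketch, assuming the usual /etc/cron.d layout (five time fields, a user, then the command); the function name check_cron_commands is hypothetical, and entries such as @reboot are not handled.

```shell
#!/bin/sh
# check_cron_commands FILE
# For each job line of a /etc/cron.d style file, print the command path
# (7th field) when it is not an executable regular file.
check_cron_commands() {
    awk '
        /^[ \t]*#/ || /^[ \t]*$/ { next }   # skip comments and blank lines
        $1 ~ /=/                 { next }   # skip VAR=value settings
        NF >= 7                  { print $7 }
    ' "$1" |
    while read -r cmd; do
        [ -f "$cmd" ] && [ -x "$cmd" ] || printf 'missing: %s\n' "$cmd"
    done
}
```

Running `check_cron_commands /etc/cron.d/cleanup-grid-accounts` prints nothing when the path in the cron file is valid.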

3. AOB

3.2 Next meeting

4. Minutes

