Agenda-28-01-2013
Detailed agenda: Grid Operations Meeting 28 January 2013
Audio conference link | No password |
Audio conference details | Indico page |
1. Middleware releases and staged rollout
1.1. Update on the status of EMI updates
- EMI-2 update #8, 28 Jan 2013
- EMI-1 update #23, 28 Jan 2013
- Future updates: EMI planned updates
  - WMS and CREAM are currently scheduled for the February updates
1.2. Staged Rollout
- For UMD-1 (UMD-1.9.0) we have 2 products:
  - GridSite 1.7.25, which is under verification
  - L&B 3.2.9 in Staged Rollout (same version in UMD-2 production)
- For UMD-2 (UMD-2.4.0):
  - Several products ready for production:
    - CREAM 1.14.2
    - CREAM Torque 2.0.0-2
    - BLAH 1.18.2
    - All ARC 2.0.1 components
    - gLite MPI 1.4.0
    - Globus RLS 5.2.2
    - Globus MyProxy 5.2.2
    - GridWay 5.12.0
  - In Staged Rollout:
    - WMS 3.4.0 (some issues found)
    - Globus default security 5.2.2
    - Security Integration 2.2.1
    - GridSite 1.7.25
- Other products:
  - CA update, version 1.52-1 (under SR), to be released on 30-01-2013
  - SAM-Update 20 (SR done)
- New products:
  - IGE gridSAM (in verification)
2. Operational Issues
2.1. Status of unsupported middleware update
2.2. Updates from DMSU
lcg-gt problems with dCache SEs
Details GGUS #90807
The current versions of lcg-util and gfal (1.13.9-0) return the following error, apparently only when used with dCache SEs:
$ lcg-gt -D srmv2 -T srmv2 srm://srm.triumf.ca/dteam/generated/2013-01-25/filed56b1d3e-76f8-4f5a-9b32-94e6d038ab4b gsiftp
gsiftp://dpool13.triumf.ca:2811/generated/2013-01-25/filed56b1d3e-76f8-4f5a-9b32-94e6d038ab4b
[ERROR] No request token returned with SRMv2
With an older version of lcg-util, such as the one deployed in gLite, lcg-gt works fine. This is why Nagios does not detect the problem: it is still gLite-based (lcg_util-1.11.16-2 and GFAL-client-1.11.16-2).
The developers are investigating this issue.
LFC-Oracle problem
Details GGUS #90701
The error is occurring with EMI2 emi-lfc_oracle-1.8.5-1.el5 and Oracle 11:
# lfc-ls lfc-1-kit:/grid
send2nsd: NS002 - send error : client_establish_context: The server had a problem while authenticating our connection
lfc-1-kit:/grid: Could not secure the connection
Experts suspect it is due to the use of the Oracle 11 client when the LFC code has been compiled against the Oracle 10 API. The LFC developers expect to provide RPMs built against Oracle 11 shortly.
list-match problem with EMI2 WMS
Details GGUS #90240
Some CEs have enabled only a group or a role in their queues, not the entire VO:
GlueCEAccessControlBaseRule: VOMS:/gridit/ansys
GlueCEAccessControlBaseRule: VOMS:/gridit/ansys/Role=SoftwareManager
so, when your primary attribute is:
attribute : /gridit/ansys/Role=NULL/Capability=NULL
if you use an EMI-2 WMS, you cannot match those resources (with an EMI-1 WMS you can).
The problem seems to lie in the value of WmsRequirements in the file /etc/glite-wms/glite_wms.conf: the filter set in that variable differs from the one used in the EMI-1 WMS. The developers are investigating.
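The expected matching behaviour can be illustrated with a minimal sketch. This is not the WmsRequirements filter from glite_wms.conf: the function names are hypothetical, and the rule that a NULL role and capability are dropped before comparing the FQAN with the access rule is a simplifying assumption made for illustration.

```shell
# Illustrative only: NOT the WMS filter. Function names are hypothetical.

# Strip the "/Capability=NULL" and "/Role=NULL" suffixes from a VOMS FQAN
# (assumption: NULL role/capability carry no information for matching).
normalize_fqan() {
    fqan=$1
    fqan=${fqan%/Capability=NULL}
    fqan=${fqan%/Role=NULL}
    printf '%s\n' "$fqan"
}

# Return 0 if the access rule ($1) admits the primary FQAN ($2).
acbr_matches() {
    rule=$1
    fqan=$(normalize_fqan "$2")
    case $rule in
        # Group or role rule: must equal the normalized FQAN.
        VOMS:*) [ "${rule#VOMS:}" = "$fqan" ] ;;
        # VO-wide rule: must equal the first path component of the FQAN.
        VO:*)   vo=${fqan#/}; vo=${vo%%/*}; [ "${rule#VO:}" = "$vo" ] ;;
        *)      return 1 ;;
    esac
}
```

With the primary attribute quoted above, `acbr_matches "VOMS:/gridit/ansys" "/gridit/ansys/Role=NULL/Capability=NULL"` succeeds, while the same call with the `Role=SoftwareManager` rule fails. Under these assumptions the group-level rule alone should be enough for the resource to match.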
proxy renewal problems on EMI1 WMS
Details GGUS #89801
Under some circumstances, ICE cannot renew the user credentials because glite-wms-ice-proxy-renew processes hang. The suspected cause is this Savannah bug, which is already fixed in EMI-2.
Problems with aliased DNS names of myproxy
Details GGUS #89105
DNS aliases of a myproxy server (used, for example, to implement round-robin load balancing and/or high availability) may cause problems for proxy renewal when not all of the DNS aliases, including the canonical name, are listed in the SubjectAltNames extension of the myproxy server's host certificate.
The failure may not always appear (it depends on multiple conditions, such as the Globus version). Sites are nevertheless encouraged to use certificates that cover all the DNS aliases.
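A site could check its certificate coverage with a small sketch like the one below. The function names and the example host names are hypothetical, /etc/grid-security/hostcert.pem is assumed to be the host certificate location, and wildcard entries are not handled.

```shell
# Print the DNS entries found in "openssl x509 -noout -text" output (stdin),
# one per line.
list_dns_sans() {
    tr ',' '\n' | sed -n 's/^[[:space:]]*DNS:\([^[:space:]]*\).*$/\1/p'
}

# $1: newline-separated SAN list; remaining arguments: host names to check.
# Prints every host name that is NOT in the SAN list (case-insensitive,
# whole-line comparison; wildcard SANs are not expanded).
missing_aliases() {
    sans=$1
    shift
    for name in "$@"; do
        printf '%s\n' "$sans" | grep -Fqxi -- "$name" || printf '%s\n' "$name"
    done
}

# Example (assumed certificate path, hypothetical host names):
#   sans=$(openssl x509 -in /etc/grid-security/hostcert.pem -noout -text | list_dns_sans)
#   missing_aliases "$sans" myproxy.example.org px1.example.org px2.example.org
```

Any name printed by `missing_aliases` is an alias or canonical name that the certificate does not cover.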
EMI-2 WN: yaim bug for cleanup-grid-accounts
Details GGUS #90486
Because of a bug, the cleanup-grid-accounts procedure does not work properly, so the disk space used on WNs may keep growing.
The yaim function config_lcgenv unsets $INSTALL_ROOT, so the path used by the cleanup-grid-accounts cron job is not valid:
# cat /etc/cron.d/cleanup-grid-accounts
PATH=/sbin:/bin:/usr/sbin:/usr/bin
36 3 * * * root /sbin/cleanup-grid-accounts.sh -v >> /var/log/cleanup-grid-accounts.log 2>&1
# tail /var/log/cleanup-grid-accounts.log
/bin/sh: /sbin/cleanup-grid-accounts.sh: No such file or directory
# ls -l /sbin/cleanup-grid-accounts.sh
ls: /sbin/cleanup-grid-accounts.sh: No such file or directory
# ls -l /usr/sbin/cleanup-grid-accounts.sh
-rwxr-xr-x 1 root root 6747 May 16 2012 /usr/sbin/cleanup-grid-accounts.sh
Until the fix is released to production, a workaround is to correct the path in the cleanup-grid-accounts cron file, for example:
# cat /etc/cron.d/cleanup-grid-accounts
PATH=/sbin:/bin:/usr/sbin:/usr/bin
16 3 * * * root /usr/sbin/cleanup-grid-accounts.sh -v >> /var/log/cleanup-grid-accounts.log 2>&1
Another workaround is also possible:
/opt/glite/yaim/bin/yaim -r -s -n WN -n TORQUE_client -n GLEXEC_wn -f config_users
This does not execute config_lcgenv, therefore $INSTALL_ROOT is set correctly (to /usr).
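The first workaround, correcting the path in the cron file, could be scripted along the following lines. This is a sketch only: the function name is hypothetical, and it assumes the cron entry has exactly the form shown above. Note that a later yaim reconfiguration may overwrite the file again.

```shell
# Replace /sbin/cleanup-grid-accounts.sh with /usr/sbin/cleanup-grid-accounts.sh
# in the cron file given as $1. Running it twice is harmless.
fix_cleanup_cron() {
    cron_file=$1
    # Work on a temporary copy outside /etc/cron.d so cron never sees it.
    tmp=$(mktemp) || return 1
    sed 's|root /sbin/cleanup-grid-accounts\.sh|root /usr/sbin/cleanup-grid-accounts.sh|' \
        "$cron_file" > "$tmp" &&
        cat "$tmp" > "$cron_file"   # overwrite in place, keeping owner and mode
    status=$?
    rm -f "$tmp"
    return $status
}

# Example, run as root on the WN:
#   fix_cleanup_cron /etc/cron.d/cleanup-grid-accounts
```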
Currently active surveys
- [new] Survey for sites: Use of configuration management tools in the EGI resource centres, deadline March 1st 2013
- Survey for NOC Managers: Federation of NGI services and central coordination