The wiki is deprecated and due to be decommissioned by the end of September 2022.
The content is being migrated elsewhere; new updates will be ignored and lost.
If needed, you can get in touch with the EGI SDIS team at operations @ egi.eu.

Difference between revisions of "EGI-InSPIRE:DMSU digests"




Latest revision as of 18:26, 9 January 2015




This page was a proof of concept only. It is no longer maintained and has been superseded by Middleware_issues_and_solutions.


The purpose of this page is to briefly describe and index issues solved within DMSU that are likely to have a broader impact on EGI Operations. For such issues it is worth gathering digests, outlining workarounds, and providing pointers to further details.

Jul 17, 2011

VOMS server fails with high number of VOs

The VOMS server of gLite 3.2 is more memory-hungry and starts failing when configured to serve more than about 10 VOs.

Set the -XX:MaxPermSize parameter of CATALINA_OPTS to at least 512m in /etc/tomcat5/tomcat5.conf:

 CATALINA_OPTS="-Xmx1508M -server -Dsun.net.client.defaultReadTimeout=240000 -XX:MaxPermSize=512m"

and add

 * soft nofile 2048
 * hard nofile 2048

into /etc/security/limits.conf.
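
To check that a given CATALINA_OPTS string meets the 512m threshold, something like the following could be used. It is a minimal sketch: `max_perm_mb` is a hypothetical helper name, and it only understands values written with an `m` or `M` suffix, as in the example above.

```shell
#!/bin/sh
# Sketch only: max_perm_mb is a hypothetical helper, not part of VOMS or Tomcat.
# It prints the numeric part of -XX:MaxPermSize=<N>m, or nothing if the flag
# is absent or uses a different unit suffix.
max_perm_mb() {
    printf '%s\n' "$1" | sed -n 's/.*-XX:MaxPermSize=\([0-9][0-9]*\)[mM].*/\1/p'
}

opts='-Xmx1508M -server -Dsun.net.client.defaultReadTimeout=240000 -XX:MaxPermSize=512m'
size=$(max_perm_mb "$opts")

if [ -n "$size" ] && [ "$size" -ge 512 ]; then
    echo "MaxPermSize OK (${size}m)"
else
    echo "MaxPermSize missing or below 512m"
fi
```

The open-files limit from limits.conf can be checked separately with `ulimit -n` in a fresh login session.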

GGUS ticket #72136


EMI GFAL does not work with multiple BDIIs

Unlike previous versions released in gLite, lcg-util 1.11.18, released with EMI 1.0.0, does not support setting multiple BDII endpoints in the LCG_GFAL_INFOSYS variable.

There is no known workaround besides using just one BDII.
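
Where an existing environment already lists several endpoints, the single-BDII workaround could be applied by keeping only the first one. The sketch below assumes the endpoints are comma-separated; `first_bdii` is a hypothetical helper name and the host names are placeholders.

```shell
#!/bin/sh
# Sketch only: first_bdii is a hypothetical helper.
# Assumption: multiple endpoints are separated by commas.
first_bdii() {
    # Strip everything from the first comma onwards.
    printf '%s\n' "${1%%,*}"
}

# Placeholder endpoints for illustration.
LCG_GFAL_INFOSYS='bdii1.example.org:2170,bdii2.example.org:2170'
LCG_GFAL_INFOSYS=$(first_bdii "$LCG_GFAL_INFOSYS")
export LCG_GFAL_INFOSYS
echo "$LCG_GFAL_INFOSYS"   # prints bdii1.example.org:2170
```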

GGUS ticket #72196

CREAM CE with SGE leaks memory

BUpdaterSGE, released with gLite update #30, leaks memory. A workaround is to restart blahparser at least once a day through cron:

 /opt/glite/etc/init.d/glite-ce-blahparser restart
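
A cron entry for the daily restart might look like the fragment below. The file name and the 03:00 schedule are arbitrary choices for illustration; only the restart command comes from the workaround above.

```text
# /etc/cron.d/blahparser-restart  (hypothetical file name)
# Restart the BLAH parser once a day at 03:00 (time chosen arbitrarily).
0 3 * * * root /opt/glite/etc/init.d/glite-ce-blahparser restart >/dev/null 2>&1
```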

Until a fix makes it through the official release path, one is attached to ticket #72494.

GGUS tickets #72494 #72325

Jun 30, 2011

Growing LB database

The cron job responsible for regular purging of the LB database fails in gLite 3.2, causing the database to grow indefinitely.

The problem results from wrong default settings of the purge program arguments (a long-standing bug), which YAIM does not override in these versions.

A fix is available with lb-client 5.0.5-1, released with EMI-1.

A workaround is the following modification to /opt/glite/sbin/glite-lb-export.sh:

 @@ -55,7 +55,7 @@
  # directory with exported data (file per job)
  GLITE_LB_EXPORT_JOBSDIR=${GLITE_LB_EXPORT_JOBSDIR:-$GLITE_LOCATION_VAR/lbexport}
  # purge args (timeouts)
 -GLITE_LB_EXPORT_PURGE_ARGS=${GLITE_LB_EXPORT_PURGE_ARGS:---cleared 2d --aborted 2w --cancelled 2w --other 60d}
 +GLITE_LB_EXPORT_PURGE_ARGS=${GLITE_LB_EXPORT_PURGE_ARGS:---cleared 2d --aborted 15d --cancelled 15d --other 60d}
  # Book Keeping Server
  GLITE_LB_SERVER_PORT=${GLITE_LB_SERVER_PORT:-9000}
  GLITE_LB_EXPORT_BKSERVER=${GLITE_LB_EXPORT_BKSERVER:-localhost:$GLITE_LB_SERVER_PORT}
 

GGUS ticket #67151
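
The modified line in glite-lb-export.sh uses the shell's `${VAR:-default}` expansion, so the default only applies when GLITE_LB_EXPORT_PURGE_ARGS is unset or empty. The sketch below illustrates that behaviour in isolation; `purge_args` is a hypothetical helper and the override values are made up. Whether the purge cron job actually sees a variable set this way depends on how it is started, which is an assumption not verified here.

```shell
#!/bin/sh
# Sketch only: purge_args is a hypothetical helper that mirrors the
# default-expansion pattern used in glite-lb-export.sh.
purge_args() {
    printf '%s\n' "${GLITE_LB_EXPORT_PURGE_ARGS:---cleared 2d --aborted 15d --cancelled 15d --other 60d}"
}

# Variable unset: the built-in default is used.
unset GLITE_LB_EXPORT_PURGE_ARGS
purge_args

# Variable set: the value from the environment wins (illustrative values).
GLITE_LB_EXPORT_PURGE_ARGS='--cleared 1d --aborted 7d --cancelled 7d --other 30d'
purge_args
```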


Jun 28, 2011

Insufficient heuristics in reversing DN

Jobs submitted to CREAM through WMS get aborted with a reason such as:

 Transfer to CREAM failed due to exception: Failed to create a delegation id
 for job https://grid-lb3.desy.de:9000/1RMsuRv7r8Whlgr41N7enA: reason is Client
 'O=GermanGrid,OU=DESY,CN=Alexander Fomenko' is not issuer of proxy
 'O=GermanGrid,OU=DESY,CN=Alexander Fomenko,CN=proxy,CN=proxy'

(visible through glite-wms-job-status)

This happens with DNs starting with "O=" and is due to bug #83426 in the trustmanager component.
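
To spot users who may be hit by this, a DN can be tested for the "O=" prefix. This is a rough sketch: `dn_affected` is a hypothetical helper, and accepting the slash-separated form ("/O=...") alongside the comma-separated form shown in the error message is an assumption.

```shell
#!/bin/sh
# Sketch only: dn_affected is a hypothetical helper.
# Returns success (0) when the DN starts with "O=" (comma form) or
# "/O=" (slash form, accepted here as an assumption).
dn_affected() {
    case "$1" in
        O=*|/O=*) return 0 ;;
        *)        return 1 ;;
    esac
}

if dn_affected 'O=GermanGrid,OU=DESY,CN=Alexander Fomenko'; then
    echo "DN starts with O=, may be affected"
fi
```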

GGUS tickets #71434, #71436

logrotate not really rotating logs for ARC components

The daemons keep writing to the rotated files (those with the .1 suffix).

This is a known bug in the current ARC release and is going to be fixed in the next one. A workaround is to send SIGHUP to the A-REX service and restart gridftp after rotating the files.
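
One way to apply the workaround automatically is a logrotate `postrotate` hook, sketched below. The log paths, the pid file location and the gridftp init script name are all assumptions and need to be adapted to the local installation; only the SIGHUP and the gridftp restart come from the workaround described above.

```text
# /etc/logrotate.d/arc-workaround  (hypothetical file name)
# Assumed paths: adjust log files, pid file and init script to the local setup.
/var/log/arc/grid-manager.log /var/log/arc/gridftpd.log {
    weekly
    missingok
    sharedscripts
    postrotate
        # Signal A-REX with SIGHUP after rotation (pid file path assumed).
        kill -HUP "$(cat /var/run/arched.pid)" 2>/dev/null || true
        # Restart gridftp so it stops writing to the rotated file (script name assumed).
        /etc/init.d/gridftpd restart >/dev/null 2>&1 || true
    endscript
}
```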

GGUS #71901

Failed Nagios probes of ARC CE make it appear unavailable

The problem is caused by a stale jobid file left by a failed test, which prevents the Nagios probe from submitting a new job.

The problem seriously affects the site state in Gridview, and hence also the computed site availability metrics.

Currently the responsibility for the probe is unclear; the EMI/ARC team refuses to fix it.

GGUS #70997