EGI-InSPIRE:DMSU digests
Latest revision as of 19:26, 9 January 2015
This page was only a proof of concept. It is no longer maintained and has been superseded by Middleware_issues_and_solutions.
The purpose of this page is to briefly describe and index issues solved within DMSU that are likely to have a broader impact on EGI Operations. For such issues it is worth gathering digests that outline workarounds and provide pointers to further details.
Jul 17, 2011
VOMS server fails with high number of VOs
The VOMS server in gLite 3.2 is more memory-hungry and starts failing when configured to serve more than approximately 10 VOs.
As a workaround, set the -XX:MaxPermSize parameter of CATALINA_OPTS in /etc/tomcat5/tomcat5.conf to at least 512m:
CATALINA_OPTS="-Xmx1508M -server -Dsun.net.client.defaultReadTimeout=240000 -XX:MaxPermSize=512m"
Then add the following lines to /etc/security/limits.conf:
* soft nofile 2048
* hard nofile 2048
GGUS ticket #72136
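To check whether a host already has the VOMS workaround applied, the following is a minimal sketch. The helper name `permsize_mb` is hypothetical (not part of gLite); it only parses a CATALINA_OPTS string and prints the -XX:MaxPermSize value in MB, assuming the value uses a k/m/g suffix or a plain byte count.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of gLite): print the -XX:MaxPermSize value
# from a CATALINA_OPTS string, converted to MB. Returns 1 if it is absent.
permsize_mb() {
  local opts="$1" tok val
  for tok in $opts; do            # rely on word splitting of the option string
    case "$tok" in
      -XX:MaxPermSize=*)
        val="${tok#-XX:MaxPermSize=}"
        case "$val" in
          *[gG]) echo $(( ${val%[gG]} * 1024 )) ;;
          *[mM]) echo "${val%[mM]}" ;;
          *[kK]) echo $(( ${val%[kK]} / 1024 )) ;;
          *)     echo $(( val / 1048576 )) ;;   # plain byte count
        esac
        return 0 ;;
    esac
  done
  return 1
}
```

For example, take the CATALINA_OPTS line from /etc/tomcat5/tomcat5.conf (e.g. with `grep '^CATALINA_OPTS=' /etc/tomcat5/tomcat5.conf`), pass its quoted value to `permsize_mb`, and compare the result with 512. The open-file limits from /etc/security/limits.conf apply to new login sessions, so they are best checked with `ulimit -n` in a fresh session of the Tomcat user.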
EMI GFAL does not work with multiple BDIIs
Unlike the previous versions released in gLite, lcg-util 1.11.18, released with EMI 1.0.0, does not support setting multiple BDII endpoints in the LCG_GFAL_INFOSYS variable.
There is no known workaround other than using a single BDII.
GGUS ticket #72196
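A client environment that normally lists several BDIIs can be reduced to one. The sketch below assumes the usual comma-separated host:port format of LCG_GFAL_INFOSYS and keeps only the first entry; the function name `first_bdii` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print only the first endpoint of a comma-separated
# LCG_GFAL_INFOSYS value (assumed format: "host1:2170,host2:2170").
first_bdii() {
  # Keep everything before the first comma; a single entry is returned as is.
  printf '%s\n' "${1%%,*}"
}

# Restrict the current environment to a single BDII, if the variable is set.
if [ -n "${LCG_GFAL_INFOSYS:-}" ]; then
  LCG_GFAL_INFOSYS=$(first_bdii "$LCG_GFAL_INFOSYS")
  export LCG_GFAL_INFOSYS
fi
```

To affect the client environment, the snippet has to be sourced (not executed) in the shell or job wrapper that runs the lcg-util commands. Keeping only one entry gives up failover to the other BDIIs.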
CREAM CE with SGE leaks memory
BUpdaterSGE, released with gLite update #30, leaks memory. A workaround is to restart the blahparser at least once a day via cron:
/opt/glite/etc/init.d/glite-ce-blahparser restart
A fix is attached to #72494 and can be used until it makes its way through the official release path.
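One way to schedule the daily restart is an entry under /etc/cron.d. The sketch below writes such an entry; the helper name `install_blahparser_cron`, the suggested file name and the 03:00 schedule are assumptions chosen for illustration. The target path is a parameter so the entry can be reviewed before it is installed as root.

```bash
#!/usr/bin/env bash
# Hypothetical helper: write a cron.d entry that restarts blahparser daily.
# $1 = target file (e.g. /etc/cron.d/glite-ce-blahparser-restart, as root).
install_blahparser_cron() {
  local target="$1"
  cat > "$target" <<'EOF'
# Work around the BUpdaterSGE memory leak (gLite update #30):
# restart blahparser once a day at 03:00.
0 3 * * * root /opt/glite/etc/init.d/glite-ce-blahparser restart >/dev/null 2>&1
EOF
  chmod 0644 "$target"            # cron.d entries should not be group/world-writable
}
```

The cron.d format includes a user field (`root` here), unlike a personal crontab edited with `crontab -e`.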
Jun 30, 2011
Growing LB database
The cron job responsible for regularly purging the LB database fails in gLite 3.2, so the database grows indefinitely.
The problem is caused by wrong default arguments to the purge program (a long-standing bug), which YAIM does not override in these versions.
A fix is available in lb-client 5.0.5-1, released with EMI-1.
A workaround is the following modification to /opt/glite/sbin/glite-lb-export.sh:
@@ -55,7 +55,7 @@
 # directory with exported data (file per job)
 GLITE_LB_EXPORT_JOBSDIR=${GLITE_LB_EXPORT_JOBSDIR:-$GLITE_LOCATION_VAR/lbexport}
 # purge args (timeouts)
-GLITE_LB_EXPORT_PURGE_ARGS=${GLITE_LB_EXPORT_PURGE_ARGS:---cleared 2d --aborted 2w --cancelled 2w --other 60d}
+GLITE_LB_EXPORT_PURGE_ARGS=${GLITE_LB_EXPORT_PURGE_ARGS:---cleared 2d --aborted 15d --cancelled 15d --other 60d}
 # Book Keeping Server
 GLITE_LB_SERVER_PORT=${GLITE_LB_SERVER_PORT:-9000}
 GLITE_LB_EXPORT_BKSERVER=${GLITE_LB_EXPORT_BKSERVER:-localhost:$GLITE_LB_SERVER_PORT}
GGUS ticket #67151
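The same change to glite-lb-export.sh can be applied non-interactively. This is a sketch, assuming the default line in the installed script matches the one in the diff and that GNU sed (with `-i`) is available; the helper name `patch_lb_purge_args` is hypothetical. A backup is made only once, and a second run changes nothing because the pattern no longer matches.

```bash
#!/usr/bin/env bash
# Hypothetical helper: replace the 2w timeouts in the default purge
# arguments of glite-lb-export.sh with 15d, keeping a one-time .orig backup.
# $1 = script path (normally /opt/glite/sbin/glite-lb-export.sh).
patch_lb_purge_args() {
  local script="$1"
  # Keep the first backup only, so re-running does not overwrite it.
  [ -e "$script.orig" ] || cp -p "$script" "$script.orig" || return 1
  sed -i 's/--aborted 2w --cancelled 2w/--aborted 15d --cancelled 15d/' "$script"
}
```

Because the script reads `${GLITE_LB_EXPORT_PURGE_ARGS:-...}`, setting GLITE_LB_EXPORT_PURGE_ARGS in the environment of the purge job would also override the default without editing the file; where that environment is defined depends on the installation.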
Jun 28, 2011
Insufficient heuristics in reversing DN
Jobs submitted to CREAM through the WMS are aborted with a reason like:
Transfer to CREAM failed due to exception: Failed to create a delegation id for job https://grid-lb3.desy.de:9000/1RMsuRv7r8Whlgr41N7enA: reason is Client 'O=GermanGrid,OU=DESY,CN=Alexander Fomenko' is not issuer of proxy 'O=GermanGrid,OU=DESY,CN=Alexander Fomenko,CN=proxy,CN=proxy'
(visible through glite-wms-job-status)
This happens for DNs starting with "O=" and is caused by bug #83426 in the trustmanager component.
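To spot potentially affected users, and to illustrate the relationship the error message refers to (the client DN should equal the proxy subject with its proxy components removed), the sketch below strips trailing `CN=proxy` components and checks whether the remaining DN starts with `O=`. It is a simplified illustration, not trustmanager's actual logic: it only handles the legacy `CN=proxy` components seen in the message above (not RFC 3820 proxies), and both function names are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: remove trailing legacy "CN=proxy" components from a
# proxy subject, in comma form (...,CN=proxy) or OpenSSL slash form (.../CN=proxy).
proxy_owner_dn() {
  local dn="$1"
  while :; do
    case "$dn" in
      *,CN=proxy) dn="${dn%,CN=proxy}" ;;
      */CN=proxy) dn="${dn%/CN=proxy}" ;;
      *) break ;;
    esac
  done
  printf '%s\n' "$dn"
}

# Hypothetical helper: succeed if the DN's first component is O=...,
# i.e. the kind of DN reported as affected by bug #83426.
dn_starts_with_o() {
  local dn="${1#/}"               # ignore the leading slash of the slash form
  case "$dn" in
    O=*) return 0 ;;
    *)   return 1 ;;
  esac
}
```

For instance, `dn_starts_with_o "$(proxy_owner_dn "$subject")"` can be applied to the subject printed by `voms-proxy-info -subject`.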
logrotate not really rotating logs for ARC components
The daemons keep writing to the rotated files (those with the .1 suffix).
This is a known bug in the current ARC release and is going to be fixed in the next one. A workaround is to send SIGHUP to the A-REX service and restart gridftp after rotating the files.
GGUS #71901
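The ARC logrotate workaround can be hooked into a logrotate `postrotate` script. The sketch below assumes A-REX records its PID in a file and that gridftp is restarted by a command passed as arguments; the helper name `arc_postrotate`, and any PID file path or init script name used with it, are assumptions to check against the local installation.

```bash
#!/usr/bin/env bash
# Hypothetical postrotate helper for the ARC logrotate workaround.
# $1     = file containing the A-REX PID
# $2...  = command that restarts gridftp (e.g. an init script call)
arc_postrotate() {
  local pidfile="$1" pid
  shift
  if [ -r "$pidfile" ]; then
    pid=$(cat "$pidfile")
    # Send SIGHUP to A-REX, as the workaround prescribes.
    kill -HUP "$pid" 2>/dev/null || echo "A-REX (pid $pid) is not running" >&2
  else
    echo "A-REX pid file $pidfile not found" >&2
  fi
  "$@"                            # then restart gridftp
}
```

In a logrotate stanza the file would be sourced and the function called between `postrotate` and `endscript`, with the local PID file and the local gridftp restart command as arguments.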
The problem is caused by a stale jobid file, left behind by a failed test, which prevents the Nagios probe from submitting a new job.
The problem seriously affects the site state in Gridview, and hence also the computed site availability metrics.
Responsibility for the probe is currently unclear; the EMI/ARC team refuses to fix it.
GGUS #70997