Information System Open Issues

From EGIWiki
Revision as of 10:33, 16 November 2011 by Apaolini (talk | contribs)

List of current problems affecting BDII and the Information System in general

Known Issues

Known issues affecting the BDII version 1.0.1

Frequent Problems

  • Performance/Load issues: memory leak, stuck processes
    • often solved by installing openldap2.4, but some cases still need to be investigated thoroughly
    • openldap2.4 isn't installed by default yet
    • Recommendation for site-bdii:
      • small/medium sites: at least one instance with 2 cores, 10 GB of hard disk space, 4 GB of RAM, no XEN virtualization (see later).
      • medium/large sites: more than one instance, behind DNS round robin. The same setup used for top-bdii is recommended (details at EGI Operations Manuals 5)
  • Past issues:
    • bdii user change from "edguser" to "ldap"
    • enabling the bdii daemon and disabling the ldap one at machine boot

Glue2 Information

  • It is still unclear when Glue2 will be used (GSTAT still checks Glue1.3 information), so we do not know how coherent the published Glue2 information is.
    • We should check the information already published and find any anomalies in advance
  • Operational issue: Glue2 is case sensitive, so if the case of the site name published by the site-bdii differs from the one set in the Giis_Url field on GOC-DB, that site won't be published by top-BDIIs
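The case mismatch described above can be spotted with a plain string comparison. The following is a minimal sketch only: the function names are hypothetical, and it assumes the Giis_Url has the form ldap://host:2170/mds-vo-name=SITE-NAME,o=grid with a lowercase "mds-vo-name" key.

```shell
# Extract the site name from a GIIS URL of the ASSUMED form
#   ldap://host:2170/mds-vo-name=SITE-NAME,o=grid
site_name_from_giis_url() {
    url="$1"
    name="${url#*mds-vo-name=}"   # drop everything up to and including the key
    name="${name%%,*}"            # drop the ",o=grid" suffix
    printf '%s\n' "$name"
}

# Compare the GOC-DB name with the name published by the site-bdii.
# Prints "match", "case-mismatch" (same letters, different case) or "different".
compare_site_names() {
    gocdb="$1"
    published="$2"
    gocdb_lc="$(printf '%s' "$gocdb" | tr '[:upper:]' '[:lower:]')"
    published_lc="$(printf '%s' "$published" | tr '[:upper:]' '[:lower:]')"
    if [ "$gocdb" = "$published" ]; then
        echo "match"
    elif [ "$gocdb_lc" = "$published_lc" ]; then
        echo "case-mismatch"
    else
        echo "different"
    fi
}
```

A "case-mismatch" result is the situation in which, according to the note above, a Glue2 top-BDII would not publish the site. URLs without the "mds-vo-name=" key are not handled by this sketch.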

Tickets in open status

glite-BDII hangs under heavy network load

GGUS 71578

  • the installation of openldap2.4 should have solved the problem, but the ticket is still "on hold" because of a feature request to detect the hang through the status command.
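Until the status command detects a hang, one possible workaround is an external probe that fails when a query does not return within a time limit. This is only a sketch and not the fix requested in the ticket: the function names are hypothetical, it assumes the GNU coreutils "timeout" utility is available (older distributions may lack it), and it assumes the BDII listens on localhost port 2170 with base o=grid.

```shell
# Run a command with a time limit; succeed only if it exits 0 in time.
# Relies on GNU coreutils "timeout" (ASSUMPTION: installed on the host).
probe_with_timeout() {
    limit="$1"
    shift
    timeout "$limit" "$@" >/dev/null 2>&1
}

# Hypothetical wrapper: anonymous base-scope query against the local BDII.
# Port 2170 and base "o=grid" are ASSUMPTIONS, not taken from the ticket.
bdii_responds() {
    probe_with_timeout "${1:-10}" \
        ldapsearch -x -LLL -H ldap://localhost:2170 -b o=grid -s base 1.1
}
```

For example, "bdii_responds 15 || echo 'BDII not answering'" could be run from cron or a monitoring probe; a hung slapd would make the query time out and the check fail.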

EMI1 top-bdii bug: missing /etc/bdii/gip/glite-info-site-defaults.conf

GGUS 72561

  • Issue still present (it is reported in the Known Issues page), also for the site-bdii. When the defaults file is missing, the provider falls back to a derived site name:
[root@topbdii01 ~]# cat /var/lib/bdii/gip/provider/glite-info-provider-service-bdii-top
# Check for the existence of the configuration file.
if [ -f ${DEFAULTS} ]; then
    source ${DEFAULTS}
fi

SITE_NAME=${SITE_NAME:-$(hostname -d)}
  • as a consequence, the site name is derived from the topbdii host name (via hostname -d), which is wrong
    • this causes critical errors in GSTAT
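The fallback in the excerpt relies on the shell's ${VAR:-default} expansion. The sketch below reproduces that logic in isolation to show when the default is used. The function name is hypothetical, and the fallback command is passed as an argument (standing in for "hostname -d") so the behaviour can be tried without touching a real host.

```shell
# Resolve the site name the way the excerpt does: source the defaults
# file if it exists, otherwise use the output of a fallback command.
#   $1: path to the defaults file
#   $2: fallback command, a stand-in for "hostname -d" (ASSUMPTION)
resolve_site_name() {
    defaults="$1"
    fallback_cmd="$2"
    (
        # Subshell, so variables from the sourced file do not leak out.
        unset SITE_NAME
        if [ -f "$defaults" ]; then
            . "$defaults"
        fi
        # ${VAR:-x} yields x when VAR is unset or empty.
        printf '%s\n' "${SITE_NAME:-$($fallback_cmd)}"
    )
}
```

With the defaults file present and defining SITE_NAME, the configured name is printed; with the file missing, the fallback output is printed instead, which is the wrong value reported in the ticket.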

Ramdisk space usage on top BDII

GGUS 73102

  • ramdisk space usage kept increasing until the ramdisk filled up after about two or three weeks (top BDII on a XEN virtual machine)
    • adding dncachesize to /etc/bdii/bdii-top-slapd.conf and setting it to twice the value of cachesize slows down the growth in memory usage, but does not cap it
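The "twice the cachesize" rule can be derived from the existing configuration instead of being computed by hand. A minimal sketch, assuming the slapd configuration file contains a line of the form "cachesize N"; the function name is hypothetical and the file is only read, never modified.

```shell
# Print a dncachesize directive equal to twice the first cachesize
# value found in the given slapd configuration file.
suggest_dncachesize() {
    awk '$1 == "cachesize" { print "dncachesize " ($2 * 2); exit }' "$1"
}
```

For example, "suggest_dncachesize /etc/bdii/bdii-top-slapd.conf" prints the line to add next to the existing cachesize directive. As noted above, this only slows the growth; it does not bound it.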

bdii crash

GGUS 73406

  • another performance issue solved by installing openldap2.4
    • asked EMI to consider this version the default (it isn't yet)

emi-bdii-top-1.0.0-1.sl5.x86_64 packaging issue

GGUS 73823

2 issues in this ticket:

  1. "chkconfig bdii on" to start up the BDII at boot => solved using the new glite-yaim-bdii, released with EMI 1 Update 6
  2. the file /opt/glite/etc/gip/top-urls.conf still ends up in '/opt/glite/etc' rather than '/etc/'.

It is created in

> rpm -qf /etc/glite/glite-info-update-endpoints.conf

bdii memory leak

GGUS 73840

  • updated to glite-BDII_top 3.2.12 and increased the VM memory to 4 GB. The situation improved, but after a few days the memory usage still keeps increasing.
    • it was suggested to comment out the following lines in the file /etc/bdii/DB_CONFIG:
#set_flags DB_LOG_INMEMORY
#set_flags DB_TXN_NOSYNC

After this change the memory usage was reduced considerably; it is still unclear which process causes the increase in I/O load
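Commenting out the two flags can be scripted with sed. The sketch below is an illustration under assumptions: the function name is hypothetical, and it writes the modified content to standard output rather than editing the file in place, so the result can be reviewed before replacing /etc/bdii/DB_CONFIG.

```shell
# Print the given DB_CONFIG with the two set_flags lines commented out.
# Lines that are already commented ("#set_flags ...") are left untouched
# because the patterns are anchored at the start of the line.
comment_out_db_flags() {
    sed -e 's/^set_flags[[:space:]]\{1,\}DB_LOG_INMEMORY/#&/' \
        -e 's/^set_flags[[:space:]]\{1,\}DB_TXN_NOSYNC/#&/' "$1"
}
```

For example, "comment_out_db_flags /etc/bdii/DB_CONFIG > /tmp/DB_CONFIG.new" produces a candidate file to compare with the original. The BDII would presumably need a restart for the change to take effect.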

bdii freshness problem

GGUS 76256

  • EMI site-bdii, VMware machine, openldap2.4 installed: nagios freshness check is failing

bdii tmpfs fills up and silently fails

GGUS 76337

  • The tmpfs filesystem used by the top level bdii slowly fills up over time. Once it is completely full, no more updates are done, but 'service bdii status' still reports OK.
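Since the status command keeps reporting OK, the fill level of the tmpfs can be monitored separately. A minimal sketch, assuming POSIX "df -P" output; the function names are hypothetical, and the mount point is passed as an argument because this page does not name it.

```shell
# Print the usage of the filesystem holding the given path as an
# integer percentage, parsed from the "Capacity" column of "df -P".
fs_usage_percent() {
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Succeed when the usage is at or above the threshold (default 90).
fs_nearly_full() {
    used="$(fs_usage_percent "$1")"
    [ "$used" -ge "${2:-90}" ]
}
```

A cron job or Nagios probe could call "fs_nearly_full MOUNTPOINT 90" and raise an alarm before the filesystem is completely full and updates silently stop.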