Difference between revisions of "Information System Open Issues"

 
(5 intermediate revisions by 3 users not shown)
Line 1: Line 1:
 +
[[Category: Grid Operations Meetings]]
 
List of current problems affecting BDII and the Information System in general
 
List of current problems affecting BDII and the Information System in general
  
 
== Known Issues ==
 
== Known Issues ==
  
[https://wiki.egi.eu/wiki/UMD-1:UMD-1.3.0#emi.bdii-site.sl5.x86_64 Known issues] affecting the BDII version 1.0.1
+
[[UMD-1:UMD-1.3.0#emi.bdii-site.sl5.x86_64| Known issues]] affecting the BDII version 1.0.1
  
 
== Frequent Problems ==
 
== Frequent Problems ==
Line 12: Line 13:
 
** Recommendation for site-bdii:
 
** Recommendation for site-bdii:
 
*** small/medium sites: at least one instance with 2 cores, 10GB Hard disk space, 4 GB RAM, no XEN virtualization (see later).
 
*** small/medium sites: at least one instance with 2 cores, 10GB Hard disk space, 4 GB RAM, no XEN virtualization (see later).
*** medium/large sites: more than one instance, DNS round robin based. The same setup used for top-bdii is recommended (details at [https://wiki.egi.eu/wiki/MAN05 EGI Operations Manuals 5])
+
*** medium/large sites: more than one instance, DNS round robin based. The same setup used for top-bdii is recommended (details at [[MAN05| EGI Operations Manuals 5]])
 
** Recommendation for top-bdii:
 
** Recommendation for top-bdii:
*** More than one instance, DNS round robin based, details about top-bdii high availability setup at [https://wiki.egi.eu/wiki/MAN05 EGI Operations Manuals 5])
+
*** More than one instance, DNS round robin based, details about top-bdii high availability setup at [[MAN05| EGI Operations Manuals 5]])
 
* Past issues:
 
* Past issues:
 
** bdii user change from "edguser" to "ldap"
 
** bdii user change from "edguser" to "ldap"
Line 93: Line 94:
  
 
*EMI site-bdii, VMware machine, openldap2.4 installed: nagios freshness check is failing
 
*EMI site-bdii, VMware machine, openldap2.4 installed: nagios freshness check is failing
 +
** '''SOLVED''': the time on the VM machine was not synchronized
  
 
=== bdii tmpfs fills up and silently fails ===
 
=== bdii tmpfs fills up and silently fails ===
 
[https://ggus.eu/tech/ticket_show.php?ticket=76337 GGUS 76337]
 
[https://ggus.eu/tech/ticket_show.php?ticket=76337 GGUS 76337]
  
*The tmpfs filesystem used by the top level bdii slowly fills up over time. Once completely full no updates are done, but 'service bdii status' still reports OK.
+
*The tmpfs filesystem used by the top level bdii slowly fills up over time. Once completely full no updates are done, but 'service bdii status' still reports OK, but the org.bdii.Freshness check report a problem. This probe has been recently added to the site-bdii instance probes used in the SAM framework.

Latest revision as of 11:49, 20 December 2012

List of current problems affecting BDII and the Information System in general

Known Issues

Known issues affecting the BDII version 1.0.1

Frequent Problems

  • Performance/Load issues: memory leak, stuck processes
    • often solved by installing openldap2.4, but some cases still need thorough investigation
    • openldap2.4 isn't installed by default yet
    • Recommendation for site-bdii:
      • small/medium sites: at least one instance with 2 cores, 10GB Hard disk space, 4 GB RAM, no XEN virtualization (see later).
      • medium/large sites: more than one instance, DNS round robin based. The same setup used for top-bdii is recommended (details at EGI Operations Manuals 5)
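    • Recommendation for top-bdii:
      • More than one instance, DNS round robin based; details about the top-bdii high availability setup are in EGI Operations Manuals 5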
  • Past issues:
    • bdii user change from "edguser" to "ldap"
    • enabling the bdii daemon and disabling the ldap daemon at machine boot

Glue2 Information

  • It is still unclear when Glue2 will be used (GSTAT still checks Glue1.3 information), so we don't know how coherent the published information is.
    • We should check the information already published and find any anomalies in advance
  • Operational issue: since Glue2 is case sensitive, if the case of the site name published by the site-bdii differs from the one set in the Giis_Url field on GOC-DB, that site won't be published by top-BDIIs
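
The case-sensitivity problem above can be caught before it affects publication. The following is a minimal sketch, assuming the two site names have already been obtained (from the site-bdii and from the Giis_Url field on GOC-DB) and are passed in as plain strings; compare_site_names is a hypothetical helper, not part of any BDII or GOC-DB tooling.

```shell
#!/bin/sh
# Hypothetical helper: compare the site name published by the site-bdii
# with the one registered in GOC-DB. How the two values are retrieved is
# outside the scope of this sketch.
compare_site_names() {
    published="$1"
    registered="$2"
    if [ "$published" = "$registered" ]; then
        # Exact, case-sensitive match: fine for Glue2.
        echo "match"
    elif [ "$(printf '%s' "$published" | tr '[:upper:]' '[:lower:]')" = \
           "$(printf '%s' "$registered" | tr '[:upper:]' '[:lower:]')" ]; then
        # Same name, different case: the situation described above.
        echo "case-mismatch"
    else
        echo "different"
    fi
}

# Example with made-up site names:
compare_site_names "Example-Site" "EXAMPLE-SITE"   # prints: case-mismatch
```

A "case-mismatch" result is the one worth acting on, since the names look identical to a human reader but not to a case-sensitive comparison.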

Tickets in open status

glite-BDII hangs under heavy network load

GGUS 71578

  • the installation of openldap2.4 should have solved the problem, but the ticket is still in "on hold" status because of a feature request to detect the hanging issue using the status command.

EMI1 top-bdii bug: missing /etc/bdii/gip/glite-info-site-defaults.conf

GGUS 72561

  • The issue is still present (it is reported on the Known Issues page), also for the site-bdii; the consequence can be seen in the provider script:
[root@topbdii01 ~]# cat /var/lib/bdii/gip/provider/glite-info-provider-service-bdii-top
....
DEFAULTS=/etc/bdii/gip/glite-info-site-defaults.conf
...
# Check for the existence of the configuration file.
if [ -f ${DEFAULTS} ]; then
source ${DEFAULTS}
fi

SITE_NAME=${SITE_NAME:-$(hostname -d) }
....
  • because the defaults file is missing, the site name falls back to the output of "hostname -d", i.e. the DNS domain of the top-bdii host, which is wrong
    • this causes critical errors in GSTAT
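
The fallback behaviour in the provider excerpt can be reproduced in isolation. This is only a sketch of the same shell pattern: resolve_site_name is a hypothetical helper, and the second argument stands in for the output of "hostname -d" so that the example does not depend on the local host configuration.

```shell
#!/bin/sh
# Hypothetical helper reproducing the pattern used by the provider:
# source the defaults file if it exists, otherwise fall back to a default.
resolve_site_name() {
    defaults="$1"   # path to the defaults file (may not exist)
    fallback="$2"   # stand-in for the output of `hostname -d`
    (
        # Run in a subshell so sourced variables do not leak to the caller.
        unset SITE_NAME
        if [ -f "$defaults" ]; then
            . "$defaults"
        fi
        # ${VAR:-default} expands to the default when VAR is unset or empty.
        echo "${SITE_NAME:-$fallback}"
    )
}

# With the file missing, the fallback value is used:
resolve_site_name /nonexistent/defaults.conf example.org   # prints: example.org
```

When the defaults file exists and sets SITE_NAME, that value is printed instead, which is the behaviour the provider is expected to have.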

Ramdisk space usage on top BDII

GGUS 73102

  • ramdisk space usage kept increasing until the ramdisk was full after about two or three weeks (top BDII on a XEN virtual machine)
    • adding dncachesize to /etc/bdii/bdii-top-slapd.conf and setting it to twice the value of cachesize slows the growth in memory usage, but does not cap it
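
As an illustration of the workaround above, the sketch below reads the first cachesize directive from a slapd configuration file and prints a dncachesize line with twice that value. suggest_dncachesize is a hypothetical helper; it only prints the suggested line and does not modify the file, and it assumes the file contains a single "cachesize <number>" directive of interest.

```shell
#!/bin/sh
# Hypothetical helper: print a dncachesize directive set to twice the
# first cachesize value found in the given slapd configuration file.
suggest_dncachesize() {
    conf="$1"
    # Take the value of the first line whose first field is "cachesize".
    size=$(awk '$1 == "cachesize" { print $2; exit }' "$conf")
    # Fail if no cachesize directive was found.
    [ -n "$size" ] || return 1
    echo "dncachesize $(( $size * 2 ))"
}

# Possible usage, with the path taken from the ticket:
# suggest_dncachesize /etc/bdii/bdii-top-slapd.conf
```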

bdii crash

GGUS 73406

  • another performance issue solved by installing openldap2.4
    • EMI was asked to make this version the default (it isn't yet)

emi-bdii-top-1.0.0-1.sl5.x86_64 packaging issue

GGUS 73823

2 issues in this ticket:

  1. "chkconfig bdii on" to start up the BDII at boot => solved using the new glite-yaim-bdii, released with EMI 1 Update 6
  2. the file /opt/glite/etc/gip/top-urls.conf still ends up in '/opt/glite/etc' rather than '/etc/'.

It is created in

/etc/glite/glite-info-update-endpoints.conf
> rpm -qf /etc/glite/glite-info-update-endpoints.conf
glite-info-update-endpoints-2.0.7-1.el5.noarch


bdii memory leak

GGUS 73840

  • updated to glite-BDII_top 3.2.12 and increased the VM memory to 4 GB. The situation improved, but after a few days the memory usage still keeps increasing.
    • it was suggested to comment out the following lines in /etc/bdii/DB_CONFIG:
#set_flags DB_LOG_INMEMORY
#set_flags DB_TXN_NOSYNC

After this change the memory usage was reduced considerably; it is still unclear which process causes the increase in I/O load.
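
The DB_CONFIG change described in this ticket could be previewed with a small sed wrapper. This is a sketch only: comment_out_db_flags is a hypothetical helper that writes the modified content to standard output and leaves the original file untouched, so the result can be reviewed before anything is replaced.

```shell
#!/bin/sh
# Hypothetical helper: print the given DB_CONFIG file with the two
# set_flags lines from the ticket commented out. Lines that are already
# commented are left unchanged because they do not start with set_flags.
comment_out_db_flags() {
    sed -e 's/^[[:space:]]*set_flags[[:space:]]\{1,\}DB_LOG_INMEMORY/#&/' \
        -e 's/^[[:space:]]*set_flags[[:space:]]\{1,\}DB_TXN_NOSYNC/#&/' "$1"
}

# Possible usage: review the output, then replace the file by hand.
# comment_out_db_flags /etc/bdii/DB_CONFIG > /tmp/DB_CONFIG.new
```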

bdii freshness problem

GGUS 76256

  • EMI site-bdii, VMware machine, openldap2.4 installed: nagios freshness check is failing
    • SOLVED: the time on the VM machine was not synchronized

bdii tmpfs fills up and silently fails

GGUS 76337

  • The tmpfs filesystem used by the top level bdii slowly fills up over time. Once it is completely full no updates are done and 'service bdii status' still reports OK, although the org.bdii.Freshness check reports a problem. This probe has recently been added to the site-bdii instance probes used in the SAM framework.
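
Since the status command does not notice a full tmpfs, a simple usage check on the filesystem can act as an early warning. The sketch below is an assumption-laden example: fs_usage_percent and check_fs_usage are hypothetical helpers, and the mount point is a parameter because the ticket summary does not state where the tmpfs is mounted.

```shell
#!/bin/sh
# Hypothetical helper: print the usage percentage (without the % sign)
# of the filesystem holding the given path, using POSIX `df -P` output.
fs_usage_percent() {
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Hypothetical helper: return 1 and print a warning when usage reaches
# the threshold, return 2 when the usage cannot be determined.
check_fs_usage() {
    mount_point="$1"
    threshold="$2"
    used=$(fs_usage_percent "$mount_point")
    [ -n "$used" ] || return 2
    if [ "$used" -ge "$threshold" ]; then
        echo "WARNING: $mount_point is ${used}% full"
        return 1
    fi
    echo "OK: $mount_point is ${used}% full"
}

# Possible usage from cron, with the real tmpfs mount point substituted:
# check_fs_usage /path/to/bdii/tmpfs 90
```

A check of this kind complements the org.bdii.Freshness probe: the probe reports that the data is stale, while the usage check can warn before the filesystem is actually full.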