The wiki is deprecated and due to be decommissioned by the end of September 2022.
The content is being migrated to other platforms; new updates will be ignored and lost.
If needed you can get in touch with EGI SDIS team using operations @ egi.eu.

KEDB

This page provides a central database of Known Errors (KEDB), i.e. problems whose underlying cause has already been identified and for which a workaround is available. Known errors are shared on the EGI wiki for the following reasons:

  • known errors are tracked here for use by the HelpDesk team, especially 1st and 2nd line support, so that known issues can be referenced in new GGUS tickets reporting related incidents
  • the HelpDesk team can propose new known errors, add them to the shared KEDB and use them during shifts
  • users, VO members and RC/OC operators can be referred to known errors as long as these are in use
  • workarounds can be referenced/reported together with each known error (a known error is included in this page even if no workaround is available yet)
  • an ID is provided so that each known error can be easily identified
  • a quick reference to an incident is always associated with a known error, so that readers can switch to a concrete example of an incident related to the known error

This page can also host problems whose root cause is known but for which no workaround is available yet, in order to proactively provide a reference to the EGI HelpDesk.

Known errors and corresponding workarounds are also provided:

  • by Technology Providers in the Release Notes of their products,
  • by service providers that are part of EGI in the documentation provided for their services

In these cases, the known errors are not replicated in the central KEDB or in any other central source of information, because keeping the original source and a central copy aligned would cause synchronisation issues and unnecessary maintenance overhead. Instead, references to the original sources are provided in the Product Documentation Guidepost.

  • Each KE is flagged as OPEN if no workarounds are available and no solution is provided.
  • If workaround/mitigation is in place, but no solution has been provided, it is marked as WORKAROUND.
  • If solution is provided, it is flagged as SOLVED.
  • From time to time, Known Errors are moved to the Archived Known Errors section below, once we are certain that they no longer affect the infrastructure. At that point the problem can be considered completely "closed".
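The flagging rules above can be expressed as a small decision function. This is an illustrative sketch only: the names `KEStatus` and `ke_status` are hypothetical, and treating an empty field or "n/a" as "not provided" is an assumption.

```python
from enum import Enum
from typing import Optional


class KEStatus(Enum):
    """Status flags used in the KEDB (archiving is a separate, manual step)."""
    OPEN = "OPEN"
    WORKAROUND = "WORKAROUND"
    SOLVED = "SOLVED"


def ke_status(workaround: Optional[str], solution: Optional[str]) -> KEStatus:
    """Derive the flag of a known error from its Workaround and Solution fields.

    Assumption: None, empty strings and "n/a" count as "not provided".
    """
    def provided(field: Optional[str]) -> bool:
        return bool(field) and field.strip().lower() not in ("", "n/a")

    if provided(solution):        # a solution always wins
        return KEStatus.SOLVED
    if provided(workaround):      # mitigation in place, no solution yet
        return KEStatus.WORKAROUND
    return KEStatus.OPEN          # neither workaround nor solution


print(ke_status(None, None).value)                             # OPEN
print(ke_status("pin the previous package version", "").value)  # WORKAROUND
print(ke_status("n/a", "fixed in a later release").value)       # SOLVED
```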

Updating the KEDB:

Please use the following template (just copy/paste):

== [$CREATION-DATE-(YYYY/MM/DD)] $TITLE (OPEN/WORKAROUND/SOLVED) ==
*'''Services affected:''' 
*'''Middleware products:'''
*'''Entities impacted:''' 
*'''Description:''' 
*'''References:''' 
*'''Mitigations/Workarounds:''' 
*'''Solution:''' 
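When many entries have to be prepared, the template can also be filled programmatically. A minimal sketch, assuming the fields are collected as keyword arguments; `render_entry` and its field names are hypothetical and not part of any EGI tool. The date format follows the template (YYYY/MM/DD).

```python
import datetime

# The KEDB wiki template, with placeholders for each field.
TEMPLATE = """== [{date}] {title} ({status}) ==
*'''Services affected:''' {services}
*'''Middleware products:''' {middleware}
*'''Entities impacted:''' {entities}
*'''Description:''' {description}
*'''References:''' {references}
*'''Mitigations/Workarounds:''' {workarounds}
*'''Solution:''' {solution}
"""

FIELDS = ("services", "middleware", "entities", "description",
          "references", "workarounds", "solution")


def render_entry(created: datetime.date, title: str, status: str, **fields: str) -> str:
    """Fill the template; fields that are not given are left empty."""
    unknown = set(fields) - set(FIELDS)
    if unknown:
        raise ValueError(f"unknown template fields: {sorted(unknown)}")
    if status not in ("OPEN", "WORKAROUND", "SOLVED"):
        raise ValueError(f"invalid status: {status}")
    values = {name: fields.get(name, "") for name in FIELDS}
    return TEMPLATE.format(date=created.strftime("%Y/%m/%d"), title=title,
                           status=status, **values)


print(render_entry(datetime.date(2020, 7, 13),
                   "xrootd-voms mishandles DN and VOMS attributes", "OPEN",
                   services="Online Storage", references="GGUS 147668"))
```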


Known Errors not solved yet

Date | Name | Services affected | Middleware products | Entities impacted | Description | References | Mitigations/Workarounds | Solution | Status
2020-07-13 | xrootd-voms mishandles DN and VOMS attributes | Online Storage | xrootd, DPM | users, Resource Centres | the VOMS module in the xrootd-voms package shipped with xrootd >= 4.12.2 changes space characters in DNs and VOMS attributes to tab characters; this can lead to various permission problems and to the creation of unexpected new user entries in DPM instances offering an xrootd endpoint | GGUS 147668 | use an older version with vomsxrd | fixed in xrootd 4.12.3, released in EPEL but not yet in UMD | OPEN
2018-07-12 | CREAM CE fails to start after updating canl-java and bouncycastle | High-Throughput Compute | CREAM-CE | users, Resource Centres | problem with the new bouncycastle and canl-java versions, plus a bug in the YAIM script; the ticket was closed as "unsolved" in February 2020 since CREAM-CE is end-of-life | GGUS 136074; CREAM-CE 1.16.7 release notes | additional update steps on Scientific Linux 6, listed below the table | the bouncycastle/canl-java problem has been addressed in UMD 4.7.1; the bug in the YAIM script has not been fixed yet | WORKAROUND

For the 2018-07-12 CREAM CE entry, the update process on Scientific Linux 6 requires the following additional steps:

  • After the update, remove the packages bouncycastle-mail and bouncycastle (version 1.46).
  • Manually delete any broken links in /var/lib/tomcat6/webapps/ce-cream/WEB-INF/lib and /usr/share/tomcat6/lib.
  • If YAIM is used for the configuration, create the symbolic link /usr/share/java/bcprov.jar pointing to /usr/share/java/bcprov-1.58.jar and re-run the configurator.
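The manual step of deleting broken links in the two Tomcat directories can be checked with a short script before anything is removed. This sketch only lists dangling symbolic links (non-recursively); `find_broken_links` is a hypothetical helper, and the actual deletion is left commented out on purpose.

```python
import os
from pathlib import Path
from typing import Iterable, List

# Directories named in the Scientific Linux 6 update steps for CREAM-CE.
CREAM_LIB_DIRS = (
    "/var/lib/tomcat6/webapps/ce-cream/WEB-INF/lib",
    "/usr/share/tomcat6/lib",
)


def find_broken_links(dirs: Iterable[str]) -> List[Path]:
    """Return symbolic links directly inside `dirs` whose target does not exist."""
    broken: List[Path] = []
    for d in dirs:
        base = Path(d)
        if not base.is_dir():
            continue  # skip directories that are not present on this host
        for entry in sorted(base.iterdir()):
            # is_symlink() inspects the link itself; exists() follows it.
            if entry.is_symlink() and not entry.exists():
                broken.append(entry)
    return broken


if __name__ == "__main__":
    for link in find_broken_links(CREAM_LIB_DIRS):
        print(f"broken link: {link} -> {os.readlink(link)}")
        # link.unlink()  # uncomment to delete, once the list has been reviewed
```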

[2020-07-13] xrootd-voms mishandles user certificate DN and VOMS attributes (OPEN)

  • Services affected: Online Storage
  • Middleware products: xrootd, DPM
  • Entities impacted: users, Resource Centres
  • Description: the VOMS module in the xrootd-voms package shipped with xrootd >= 4.12.2 changes space characters in DNs and VOMS attributes to tab characters; this can lead to various permission problems and to the creation of unexpected new user entries in DPM instances offering an xrootd endpoint
  • References: GGUS 147668
  • Mitigations/Workarounds: use an older version with vomsxrd (instead of xrootd-voms)
  • Solution: fixed in xrootd 4.12.3, released in EPEL but not yet in UMD
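To spot user entries affected by this problem, one can look for tab characters in the stored identities. The sketch below assumes the DNs have already been exported as plain strings (how to extract them from DPM is not shown); `find_mangled` and `shadow_entries` are hypothetical helpers, and the sample DNs are made up.

```python
from typing import Iterable, List, Tuple


def find_mangled(identities: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (stored, restored) pairs for DNs/attributes containing tab characters.

    Assumes tabs only appear because spaces were replaced; review the result
    before acting on it, since a genuine tab would also be "restored".
    """
    return [(s, s.replace("\t", " ")) for s in identities if "\t" in s]


def shadow_entries(identities: Iterable[str]) -> List[str]:
    """Mangled entries whose space-restored form also exists (likely duplicate users)."""
    stored = list(identities)
    known = set(stored)
    return [s for s in stored if "\t" in s and s.replace("\t", " ") in known]


dns = [  # made-up example DNs
    "/DC=org/DC=example/CN=Jane Doe",
    "/DC=org/DC=example/CN=Jane\tDoe",
    "/DC=org/DC=example/CN=John\tSmith",
]
print(shadow_entries(dns))  # ['/DC=org/DC=example/CN=Jane\tDoe']
```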

Archived Known Errors

Date | Name | Services affected | Middleware products | Entities impacted | Description | References | Mitigations/Workarounds | Solution | Status
2019-09-01 | BDII update & base64-encoded LDIF entries | High-Throughput Compute, Online Storage, Archive Storage | BDII | users, Resource Centres | bdii-update does not handle base64-encoded values in LDIF correctly; values that require base64 encoding (e.g. non-ASCII characters or passwords) do not normally appear in BDII information, but bdii-update should handle them anyway | GGUS 142928 | n/a | the fix https://github.com/EGI-Foundation/bdii/pull/21 has been released in UMD 4.12.3 | SOLVED
2019-07-25 | CREAM-CE on CentOS 7: could not create connection to database server | High-Throughput Compute | CREAM-CE | Resource Centres | the Puppet recipe for CREAM-CE requires a DB server but does not fully handle its configuration (not even the parameters set by Puppet on the CREAM side), so manual checking and adjustment may be necessary | GGUS 142425 | in /etc/glite-ce-cream/cream-config.xml, set url="jdbc:mysql://localhost:3306/..." instead of url="jdbc:mysql://lgdce01.jinr.ru:3306/..." | same setting as the workaround | SOLVED
2018-05-24 | Incompatible updates in EPEL break CREAM and voms-clients-java | High-Throughput Compute | CREAM, WN, UI | users, Resource Centres | EPEL has published incompatible updates of the voms-api-java and canl-java packages; new versions of the packages relying on them are not yet available in the UMD repository | GGUS 135307; GGUS 135414; CREAM-186 | configure "exclude=voms-api-java* canl-java*" for the EPEL repository, and keep (or roll back to) voms-api-java and canl-java from the UMD repository until a new UMD release using the changed EPEL packages is available | released in UMD 4.7.1 (http://repository.egi.eu/2018/07/24/release-umd-4-7-1/) | SOLVED
2017-04-12 | CREAM-CE connection problems in SLC 6.9 | High-Throughput Compute | CREAM in UMD3/UMD4 for SL6 | CREAM-based Resource Centres | connection error after upgrading to the latest packages | GGUS 127656 | upgrade Java to java-1.7.0-openjdk 1.7.0.95 | upgrade Java to java-1.7.0-openjdk 1.7.0.95 | SOLVED
2017-02-10 | apt returns "Unable to find expected entry 'main/binary-i386/Packages'" on CMD-OS for Trusty | Cloud Compute | CMD-OS 1.0.0 for Ubuntu Trusty | Resource Centres using CMD-OS (OpenStack-based cloud sites) | the sources.list distributed with cmd-os-release makes apt default to i386 instead of amd64, preventing it from fetching the correct repositories; as a consequence apt cannot fetch the Ubuntu Trusty CMD-OS repository and returns "Unable to find expected entry 'main/binary-i386/Packages' in Release file (Wrong sources.list entry or malformed file)" | email reported to the UMD team list | manually add [arch=amd64] to the repository line in sources.list | released in CMD-OS 1.1.1 | SOLVED
2016-12-13 | Services using JGlobus fail with RFC proxies from certificates from some CAs | Online Storage | dCache < v2.14, BeStMan | | services using JGlobus fail with RFC proxies that have the Non-Repudiation key usage flag set, e.g. those created by the usual voms-proxy-init from a Grid Canada certificate | GGUS 124650 | a two-stage proxy (a plain RFC proxy first, then voms-proxy-init -noregen) works | dCache >= 2.14, unknown for BeStMan; the problem can be archived after July 2017, when dCache 2.13 will be considered out of production and >= 2.16 mandatory for the overall EGI infrastructure | SOLVED
2016-10-10 | canL upgrade of UMD 3.14.4 and UMD 4.2.1 can break proxy renewal on CREAM | High-Throughput Compute | CREAM | all VOs using proxy renewal on CREAM | after the canL upgrade, CREAM services can stop submitting jobs that use proxy renewal | UMD 4.2.1 post-mortem; CNAF CE; WLCG meeting | to minimise the impact of disruptions caused by upgrades, RCs should use a minimal OS installation to reduce conflicts, should not upgrade all instances of the same service at the same time, and should not upgrade all services automatically | remove the dracut-fips package before upgrading | SOLVED
2016-09-06 | clock skew on client FedCloud UI causes authentication problems on well-working sites | Cloud Compute | FedCloud UI (HOWTO11) | all users of the FedCloud command-line tools (HOWTO11) | the problems are observed when (re)using a VM as UI to access FedCloud resources; restarting a VM from a snapshot or suspension can easily lead to out-of-date CRLs and clock skew on the VM | GGUS 119839; GGUS 120343; GGUS 123580; GGUS 125530 | to mitigate, run fetch-crl and ntpdate pool.ntp.org, which usually fixes the problem (if the root causes are CRLs and clock skew); as a workaround, run fetch-crl at start time and every 6 hours, and install and configure ntp to avoid clock skew | CRLs are fetched at installation time and the fetch-crl package includes a cron job to update them; the UI also runs it at boot. For clock skew, ntpd has been added to the image users usually start the VM from. Eventually all running VMs will have fetch-crl and ntp correctly configured. | SOLVED
2016-06-29 | umd-release-3.0.1 rpm has stopped working, preventing new installations of UMD3 | Validated Software and Repository | UMD 3.14.2 | all services based on UMD3 | the umd-release package, used for adding the UMD3 repository to an existing installation, no longer works | GGUS 122424 | yum can be invoked with the "--nogpgcheck" option | new versions of umd-release shipped with UMD 3.14.7 and UMD 4.3.2 fix this both for new installations and updates | SOLVED

[2019-09-01] BDII update & base64 encoded ldif entries (SOLVED)

  • Services affected: High-Throughput Compute, Online Storage, Archive Storage
  • Middleware products: BDII
  • Entities impacted: users, Resource Centres
  • Description: bdii-update does not handle base64-encoded values in LDIF correctly; values that require base64 encoding (e.g. non-ASCII characters or passwords) do not normally appear in BDII information, but bdii-update should handle them anyway
  • References: GGUS 142928
  • Mitigations/Workarounds: n/a
  • Solution: the fix https://github.com/EGI-Foundation/bdii/pull/21 has been released in UMD 4.12.3
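In LDIF (RFC 2849), an attribute written with a double colon (`attr:: value`) carries a base64-encoded value, and long lines may be folded onto continuation lines that start with a single space. The following sketch shows the parsing a tool like bdii-update has to get right; it is not the bdii-update code, `parse_ldif_record` is a hypothetical helper, and the GLUE2 attribute names in the test are only illustrative.

```python
import base64
from typing import List, Tuple


def parse_ldif_record(record: str) -> List[Tuple[str, str]]:
    """Return (attribute, value) pairs for one LDIF record, decoding base64 values.

    Covers a subset of RFC 2849: line folding, comments, plain values
    ("attr: value") and base64 values ("attr:: <base64>"). URL values
    ("attr:< url") and change records are not handled.
    """
    # Unfold: a line starting with a single space continues the previous line.
    lines: List[str] = []
    in_comment = False
    for raw in record.splitlines():
        if raw.startswith(" "):
            if not in_comment and lines:
                lines[-1] += raw[1:]
        elif raw.startswith("#"):
            in_comment = True   # comment lines (and their continuations) are skipped
        elif raw:
            in_comment = False
            lines.append(raw)

    pairs: List[Tuple[str, str]] = []
    for line in lines:
        name, sep, rest = line.partition(":")
        if not sep:
            raise ValueError(f"not an attribute line: {line!r}")
        if rest.startswith(":"):
            # Base64-encoded value, e.g. non-ASCII text; decoded as UTF-8.
            value = base64.b64decode(rest[1:].strip()).decode("utf-8")
        elif rest.startswith("<"):
            raise NotImplementedError(f"URL values are not supported: {line!r}")
        else:
            value = rest.lstrip(" ")
        pairs.append((name, value))
    return pairs
```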

[2019-07-25] CREAM-CE on CentOS 7: could not create connection to database server (SOLVED)

  • Services affected: High-Throughput Compute
  • Middleware products: CREAM
  • Entities impacted: Resource Centres
  • Description: the Puppet recipe for CREAM-CE requires a DB server but does not fully handle its configuration (not even the parameters set by Puppet on the CREAM side), so manual checking and adjustment may be necessary
  • References: GGUS 142425
  • Mitigations/Workarounds: in /etc/glite-ce-cream/cream-config.xml, set url="jdbc:mysql://localhost:3306/..." instead of url="jdbc:mysql://lgdce01.jinr.ru:3306/..." (the CE host name in the reported case)
  • Solution: in /etc/glite-ce-cream/cream-config.xml, set url="jdbc:mysql://localhost:3306/..." instead of url="jdbc:mysql://lgdce01.jinr.ru:3306/..."
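The manual change can be scripted. This sketch assumes the data source is configured through a `url="jdbc:mysql://HOST:PORT/..."` attribute, as in the setting above, and rewrites only the host part; `point_jdbc_to_localhost` and `rewrite_config` are hypothetical helpers, and the rest of the URL is left untouched.

```python
import re
import shutil

# Host part of url="jdbc:mysql://HOST:PORT/..." attributes.
_JDBC_MYSQL_HOST = re.compile(r'(url="jdbc:mysql://)([^:/"]+)(:\d+/)')


def point_jdbc_to_localhost(config_text: str) -> str:
    """Return config_text with every MySQL JDBC URL host replaced by localhost."""
    return _JDBC_MYSQL_HOST.sub(r"\g<1>localhost\g<3>", config_text)


def rewrite_config(path: str = "/etc/glite-ce-cream/cream-config.xml") -> bool:
    """Rewrite the file in place, keeping a .bak copy; return True if it changed."""
    with open(path, encoding="utf-8") as fh:
        original = fh.read()
    updated = point_jdbc_to_localhost(original)
    if updated == original:
        return False
    shutil.copy2(path, path + ".bak")  # keep the previous configuration
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(updated)
    return True
```

After the change, the CREAM service still has to be restarted for the new URL to take effect.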

[2018-05-24] Incompatible update in EPEL breaks CREAM and voms-clients (SOLVED)

  • Services affected: High-Throughput Compute
  • Middleware products: CREAM, WN, UI
  • Entities impacted: users, Resource Centres
  • Description: EPEL has published incompatible updates of the voms-api-java and canl-java packages; new versions of the packages relying on them are not yet available in the UMD repository
  • References: GGUS 135307, GGUS 135414, CREAM-186
  • Mitigations/Workarounds: configure "exclude=voms-api-java* canl-java*" for the EPEL repository, and keep (or roll back to) voms-api-java and canl-java from the UMD repository until a new UMD release using the changed EPEL packages is available
  • Solution: released into UMD http://repository.egi.eu/2018/07/24/release-umd-4-7-1/
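The exclusion goes into the EPEL section of the yum repository file (typically /etc/yum.repos.d/epel.repo on a standard EPEL setup, an assumption). Yum repository files are INI-style, so a sketch using Python's configparser can add the patterns idempotently; `add_exclude` is a hypothetical helper, and configparser drops comments when it writes the file back.

```python
import configparser
import io
from typing import Sequence


def add_exclude(repo_text: str, section: str = "epel",
                patterns: Sequence[str] = ("voms-api-java*", "canl-java*")) -> str:
    """Return repo_text with `patterns` appended to the exclude= option of `section`."""
    # No interpolation, so $basearch/$releasever and %-escapes are kept verbatim.
    parser = configparser.ConfigParser(interpolation=None, delimiters=("=",))
    parser.optionxform = str  # keep option names exactly as written
    parser.read_string(repo_text)
    if not parser.has_section(section):
        raise KeyError(f"no [{section}] section in repository file")
    excludes = parser.get(section, "exclude", fallback="").split()
    for pattern in patterns:
        if pattern not in excludes:  # idempotent: do not add twice
            excludes.append(pattern)
    parser.set(section, "exclude", " ".join(excludes))
    out = io.StringIO()
    parser.write(out, space_around_delimiters=False)  # yum-style key=value
    return out.getvalue()
```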

[2017-04-12] CREAM-CE connection problems in SLC 6.9 (SOLVED)

  • Services affected: High-Throughput Compute
  • Middleware products: CREAM in UMD3/UMD4 for SL6
  • Entities impacted: CREAM based Resource Centres
  • Description: connection error after upgrading to the latest packages, e.g.:

    $ glite-ce-job-submit -a -r cream-ce.kipt.kharkov.ua:8443/cream-pbs-cms test.jdl
    2017-04-10 20:37:36,236 FATAL - Connection to service [https://cream-ce.kipt.kharkov.ua:8443/ce-cream/services/gridsite-delegation] failed: FaultString=[SSL error] - FaultCode=[SOAP-ENV:Client] - FaultSubCode=[SOAP-ENV:Client] - FaultDetail= [SSL authentication failed in tcp_connect(): check password, key file, and ca file.]

    $ openssl s_client -connect cream-ce.kipt.kharkov.ua:8443
    140600093460296:error:14082174:SSL routines:SSL3_CHECK_CERT_AND_ALGORITHM:dh key too small:s3_clnt.c:3345:
    Server Temp Key: DH, 768 bits

  • References: GGUS 127656
  • Mitigations/Workarounds: upgrade Java to java-1.7.0-openjdk 1.7.0.95
  • Solution: upgrade Java to java-1.7.0-openjdk 1.7.0.95
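The `openssl s_client` output in this entry points at the cause: the server offers a 768-bit ephemeral DH key, which the client rejects as too small. A sketch for scanning saved `s_client` output from several CEs might look as follows; `server_temp_key` and `dh_too_small` are hypothetical helpers, and the 1024-bit threshold is an assumption (the minimum actually enforced depends on the client's OpenSSL version).

```python
import re
from typing import Optional, Tuple

# e.g. "Server Temp Key: DH, 768 bits" or "Server Temp Key: ECDH, P-256, 256 bits"
_TEMP_KEY = re.compile(r"Server Temp Key:\s*([^,\n]+),.*?(\d+)\s*bits")


def server_temp_key(s_client_output: str) -> Optional[Tuple[str, int]]:
    """Return (key type, size in bits) from `openssl s_client` output, if present."""
    match = _TEMP_KEY.search(s_client_output)
    if match is None:
        return None
    return match.group(1).strip(), int(match.group(2))


def dh_too_small(s_client_output: str, min_bits: int = 1024) -> bool:
    """True if the server's ephemeral key is plain DH and smaller than min_bits."""
    key = server_temp_key(s_client_output)
    return key is not None and key[0] == "DH" and key[1] < min_bits
```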

[2017-02-10] apt returns "Unable to find expected entry 'main/binary-i386/Packages'" on CMD-OS for Trusty (SOLVED)

  • Services affected: Cloud Compute
  • Middleware products: CMD-OS 1.0.0 for Ubuntu Trusty
  • Entities impacted: Resource Centres using CMD-OS (OpenStack based cloud sites)
  • Description: sources.list distributed with cmd-os-release makes apt default to i386 instead of amd64, preventing from fetching the correct repositories; as a consequence apt cannot fetch Ubuntu Trusty CMD-OS repository and returns "Unable to find expected entry 'main/binary-i386/Packages' in Release file (Wrong sources.list entry or malformed file)"
  • References: Email reported to UMD team list
  • Mitigations/Workarounds: manually add [arch=amd64] to the repository line in sources.list, e.g.:

    $ cat CMD-OS-1-base.list
    # CMD-OS-1-base
    deb [arch=amd64] http://repository.egi.eu/sw/production/cmd-os/1/ubuntu/ trusty main

  • Solution: released in CMD-OS 1.1.1
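When many hosts are affected, the edit can be applied line by line. A sketch, assuming the file uses one-line `deb URI suite component` entries; `add_arch` and `fix_sources_list` are hypothetical helpers, and lines that already carry an options block (`[...]`) are left unchanged rather than merged.

```python
import re

# One-line-style entry without an options block: "deb URI suite component..."
_DEB_NO_OPTIONS = re.compile(r"^(deb(?:-src)?)\s+(?!\[)(\S.*)$")


def add_arch(line: str, arch: str = "amd64") -> str:
    """Insert "[arch=...]" into a deb/deb-src line that has no options block."""
    match = _DEB_NO_OPTIONS.match(line.strip())
    if match is None:
        return line  # comments, blank lines and lines that already have [...]
    return f"{match.group(1)} [arch={arch}] {match.group(2)}"


def fix_sources_list(text: str, arch: str = "amd64") -> str:
    """Apply add_arch to every line of a sources.list file."""
    return "\n".join(add_arch(ln, arch) for ln in text.splitlines()) + "\n"
```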

[2016-12-13] Services using JGlobus fail with RFC proxies from certificates from some CAs (SOLVED)

  • Services affected: Online Storage
  • Middleware products: dCache < v2.14, BeStMan
  • Entities impacted:
  • Description: Services using JGlobus fail with RFC proxies having Non-Repudiation key usage flag set, e.g. those created by usual voms-proxy-init from Grid Canada certificate
  • References: https://ggus.eu/?mode=ticket_info&ticket_id=124650
  • Mitigations/Workarounds: a two-stage proxy (a plain RFC proxy first, then voms-proxy-init -noregen) works
  • Solution: dCache>=2.14, unknown for BeStMan; problem can be archived after July 2017, when dCache 2.13 will be considered out of production and >=2.16 mandatory for the overall EGI infrastructure
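To check whether a certificate sets this flag, one can inspect the Key Usage extension in the output of `openssl x509 -noout -text -in <certificate>`. A sketch, assuming OpenSSL's usual text layout (the extension name on one line, the flags on the next); `has_non_repudiation` is a hypothetical helper, and "Content Commitment" is also accepted because it is the newer X.509 name for the same bit.

```python
import re

# "X509v3 Key Usage: [critical]" followed by the comma-separated flags on the next line.
_KEY_USAGE = re.compile(r"X509v3 Key Usage:[^\n]*\n\s*([^\n]+)")


def has_non_repudiation(x509_text: str) -> bool:
    """True if the Key Usage flags in `openssl x509 -text` output include Non-Repudiation."""
    match = _KEY_USAGE.search(x509_text)
    if match is None:
        return False  # no Key Usage extension printed
    flags = [flag.strip() for flag in match.group(1).split(",")]
    return "Non Repudiation" in flags or "Content Commitment" in flags
```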

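[2016-10-10] canL upgrade of UMD 3.14.4 and UMD 4.2.1 can break proxy renewal on CREAM (SOLVED)

  • Services affected: High-Throughput Compute
  • Middleware products: CREAM
  • Entities impacted: all VOs using proxy renewal on CREAM
  • Description: after the canL upgrade, CREAM services can stop submitting jobs that use proxy renewal
  • References: UMD 4.2.1 post-mortem, CNAF CE, WLCG meeting
  • Mitigations/Workarounds: to minimise the impact of disruptions caused by upgrades, RCs should use a minimal OS installation to reduce conflicts, should not upgrade all instances of the same service at the same time, and should not upgrade all services automatically
  • Solution: remove the dracut-fips package before upgrading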

[2016-09-06] clock skew on client FedCloud UI causes authentication problems on well-working sites (SOLVED)

  • Services affected: Cloud Compute
  • Middleware products: FedCloud UI https://wiki.egi.eu/wiki/HOWTO11
  • Entities impacted: All users using FedCloud command line tools https://wiki.egi.eu/wiki/HOWTO11
  • Description: the problems are observed when (re)using a VM as UI to access FedCloud resources; restarting a VM from a snapshot or suspension can easily lead to out-of-date CRLs and clock skew on the VM
  • References: this problem was recurring: https://ggus.eu/index.php?mode=ticket_info&ticket_id=119839 https://ggus.eu/index.php?mode=ticket_info&ticket_id=120343 https://ggus.eu/index.php?mode=ticket_info&ticket_id=123580 https://ggus.eu/index.php?mode=ticket_info&ticket_id=125530
  • Mitigations/Workarounds: to mitigate, run fetch-crl and ntpdate pool.ntp.org, which usually fixes the problem (if the root causes are CRLs and clock skew). As a workaround, run fetch-crl at start time and every 6 hours, and install and configure ntp to avoid clock skew.

  • Solution: CRLs are fetched at installation time, and the fetch-crl package includes a cron job for updating them; the UI also runs it at boot. For clock skew, the image needed ntpd, which has been added to the image users usually start the VM from. Eventually all running VMs will have fetch-crl and ntp correctly configured.
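To confirm that clock skew is the culprit, the offset from an NTP server can be estimated with the standard NTP formula offset = ((t1 - t0) + (t2 - t3)) / 2, as used when querying pool.ntp.org. The sketch below sends a single SNTP client request; it only measures the skew and is no substitute for running ntpd, and `query_offset` and the other names are hypothetical.

```python
import socket
import struct
import time

# Seconds between the NTP epoch (1900-01-01) and the Unix epoch (1970-01-01).
NTP_UNIX_DELTA = 2208988800


def ntp_to_unix(seconds: int, fraction: int) -> float:
    """Convert a 64-bit NTP timestamp (32-bit seconds, 32-bit fraction) to Unix time."""
    return seconds - NTP_UNIX_DELTA + fraction / 2**32


def clock_offset(t0: float, t1: float, t2: float, t3: float) -> float:
    """NTP clock offset ((t1 - t0) + (t2 - t3)) / 2.

    t0 = client send, t1 = server receive, t2 = server send, t3 = client receive.
    A positive value means the local clock is behind the server.
    """
    return ((t1 - t0) + (t2 - t3)) / 2


def query_offset(server: str = "pool.ntp.org", timeout: float = 5.0) -> float:
    """Send one SNTP client request and return the estimated local clock offset."""
    request = b"\x1b" + bytes(47)  # LI=0, version 3, mode 3 (client)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        t0 = time.time()
        sock.sendto(request, (server, 123))
        reply, _ = sock.recvfrom(48)
        t3 = time.time()
    # Receive timestamp: bytes 32-39; transmit timestamp: bytes 40-47.
    rx_s, rx_f, tx_s, tx_f = struct.unpack("!4I", reply[32:48])
    return clock_offset(t0, ntp_to_unix(rx_s, rx_f), ntp_to_unix(tx_s, tx_f), t3)
```

Calling `query_offset()` on a freshly resumed VM and getting an offset of many seconds would support the clock-skew explanation.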

[2016-06-29] umd-release-3.0.1 rpm has stopped working, preventing new installations of UMD3 (SOLVED)

  • Services affected: Validated Software and Repository
  • Middleware products: UMD 3.14.2
  • Entities impacted: All services based on UMD3
  • Description: the umd-release package, used for adding the UMD3 repository to an existing installation, no longer works
  • References: https://ggus.eu/?mode=ticket_info&ticket_id=122424
  • Mitigations/Workarounds: yum can be invoked with the "--nogpgcheck" option
  • Solution: new versions of umd-release shipped with UMD 3.14.7 and UMD 4.3.2 fix this both for new installations and updates