The wiki is deprecated and due to be decommissioned by the end of September 2022.
The content is being migrated elsewhere; new updates will be ignored and lost.
If needed you can get in touch with EGI SDIS team using operations @ egi.eu.

Tools/Manuals/TS108



Software installation tags not published

Background

For each supported VO "xyz" a site should normally define a variable VO_XYZ_SW_DIR giving the path to an area that is shared between all the worker nodes and in which the VO can install VO-specific software. Such an area must be writable only by the software managers (traditionally the "sgm" users) of that VO, who therefore need to be mapped to a set of privileged ("sgm") accounts for that VO.

When a software manager has installed a particular software package, it is normal (but not required) to let each CE of the site publish the availability of that software, so that a user job can be matched to sites that have the software it requires installed. The software manager can use the lcg-tags or lcg-ManageVOTag utility to let a CE publish a tag for each distinct software package that is available on that CE, according to conventions mostly defined by the VO itself.

The tags are stored in text files named $vo.list that are traditionally located under /opt/edg/var/info/$vo for LCG-CE and gLite-CREAM services, while /opt/glite/var/info/$subcluster/$vo may be used at sites that have multiple batch system subclusters or have EMI-CREAM services. (A subcluster is a disjoint set of worker nodes of sufficiently homogeneous type: same CPU architecture and OS, although the CPU speed, the amount of memory or the disk space per machine might vary.)
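
The path conventions described above can be summarized in a small lookup helper. This is only a minimal sketch in Python: the function name candidate_tag_files and the VO and subcluster names used with it are hypothetical, and the actual layout at a given site may differ from the two conventions named in the text.

```python
from pathlib import PurePosixPath
from typing import List, Optional

# Base directories taken from the text above; a site may use others.
LEGACY_BASE = PurePosixPath("/opt/edg/var/info")
SUBCLUSTER_BASE = PurePosixPath("/opt/glite/var/info")


def candidate_tag_files(vo: str, subcluster: Optional[str] = None) -> List[str]:
    """Return the conventional locations of the $vo.list file for a VO.

    The traditional location is always listed; the per-subcluster
    location is added only when a subcluster name is given.
    """
    paths = [LEGACY_BASE / vo / f"{vo}.list"]
    if subcluster:
        paths.append(SUBCLUSTER_BASE / subcluster / vo / f"{vo}.list")
    return [str(p) for p in paths]
```

Calling candidate_tag_files("xyz") yields only the traditional location, while candidate_tag_files("xyz", "sc1") also lists the per-subcluster one; which of them a CE actually reads depends on the service type and site configuration.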

When a site has multiple CEs in front of the same batch system subcluster, those CEs ought to share the directories in which tags may be published, because it greatly simplifies the process of installing or removing packages and updating the list of tags correspondingly. Otherwise the software manager will need to loop over the relevant set of CEs and adjust the list of tags for each one separately.
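
When the CEs do not share their tag directories, one way to spot a CE that was skipped is to compare the tag sets of all CEs in front of the same subcluster. The sketch below assumes the tags have already been collected into a dictionary keyed by CE host name (for example by reading each CE's $vo.list); how they are collected is left out, and the function name, host names and tag names are hypothetical.

```python
from typing import Dict, Iterable, Set


def missing_tags_per_ce(tags_by_ce: Dict[str, Iterable[str]]) -> Dict[str, Set[str]]:
    """For each CE, return the tags published by another CE but not by it.

    CEs that already publish every known tag are omitted from the result.
    """
    tag_sets = {ce: set(tags) for ce, tags in tags_by_ce.items()}
    # Union of the tags seen on any CE of the subcluster.
    all_tags: Set[str] = set().union(*tag_sets.values())
    result: Dict[str, Set[str]] = {}
    for ce, tags in tag_sets.items():
        missing = all_tags - tags
        if missing:
            result[ce] = missing
    return result
```

An empty result means all CEs publish the same tags. A non-empty result only shows that the lists differ; it does not tell which CE holds the correct list.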

Diagnosis

If the tags are not published for a particular CE, the cause can be:

  • The tag management command was not run for the CE in question. In particular, the site may have multiple CEs that do not share the tag directories.
  • The tag management command failed, e.g. due to incorrect ownership or permissions for the relevant $vo.list file or due to the use of an unprivileged proxy (without "sgm" role) by the software manager. The command would have reported an error in such cases.
  • The tag directory for the VO suffered some problem, e.g. it was not preserved when the CE was re-installed, or it sits in a file system that has stale NFS file handles or is currently not mounted.
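
Some of the local causes listed above can be checked directly on the CE. The following is a minimal sketch in Python that inspects a single $vo.list file. The function name check_tag_file, the optional expected_uid parameter and the choice to flag world-writable files are assumptions made for illustration, not part of any EGI tool. It cannot detect an unprivileged proxy or a command that was never run.

```python
import errno
import os
import stat
from typing import List, Optional


def check_tag_file(path: str, expected_uid: Optional[int] = None) -> List[str]:
    """Return human-readable problems found for one $vo.list file."""
    problems: List[str] = []
    directory = os.path.dirname(path) or "."

    # A missing directory may mean it was lost on re-install or that
    # the file system holding it is not mounted.
    try:
        os.stat(directory)
    except OSError as exc:
        if exc.errno == errno.ESTALE:
            problems.append(f"stale NFS file handle on {directory}")
        else:
            problems.append(f"tag directory not accessible: {directory}")
        return problems

    try:
        info = os.stat(path)
    except OSError as exc:
        if exc.errno == errno.ENOENT:
            problems.append(f"tag file missing: {path}")
        else:
            problems.append(f"tag file not accessible: {path}")
        return problems

    if not stat.S_ISREG(info.st_mode):
        problems.append(f"not a regular file: {path}")
    if info.st_mode & stat.S_IWOTH:
        problems.append(f"tag file is world-writable: {path}")
    # Ownership is only checked when the caller says which uid to expect.
    if expected_uid is not None and info.st_uid != expected_uid:
        problems.append(f"tag file owned by uid {info.st_uid}, expected {expected_uid}")
    return problems
```

An empty list means that none of these particular checks failed; the tag management command itself may still have failed for other reasons, in which case the error it reported is the better starting point.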