FAQ VO Service Availability Monitoring
Frequently Asked Questions & Troubleshooting
General
Before proceeding, check if your question is not already answered in
Installation
perl-DBD-MySQL dependency problems?
For installation problems related to the perl-DBD-MySQL package, check that the repository priorities are set as explained in the release notes. The correct package should be fetched from the RPM FORGE EXTRA repository.
perl-SOAP-Lite dependency problems?
Fri Mar 11 13:13:54 CET 2011 : Can't locate Class/Inspector.pm in @INC (@INC contains: /usr/lib64/perl5/site_perl/5.8.8/x86_64-linux-thread-multi /usr/lib/perl5/site_perl/5.8.8 /usr/lib/perl5/site_perl /usr/lib64/perl5/vendor_perl/5.8.8/x86_64-linux-thread-multi /usr/lib/perl5/vendor_perl/5.8.8 /usr/lib/perl5/vendor_perl/5.8.5 /usr/lib/perl5/vendor_perl /usr/lib64/perl5/5.8.8/x86_64-linux-thread-multi /usr/lib/perl5/5.8.8 .) at /usr/lib/perl5/vendor_perl/5.8.8/SOAP/Lite.pm line 435.
Fri Mar 11 13:13:54 CET 2011 : BEGIN failed--compilation aborted at /usr/lib/perl5/vendor_perl/5.8.8/SOAP/Lite.pm line 435.
Fri Mar 11 13:13:54 CET 2011 : Compilation failed in require at /usr/bin/voms2htpasswd line 12.
Fri Mar 11 13:13:54 CET 2011 : BEGIN failed--compilation aborted at /usr/bin/voms2htpasswd line 12.
If you see the error above, check that the "perl-Class-Inspector" package is installed. If it is not, install it:
$ yum install 'perl(Class::Inspector)'
Configuration
How to monitor more than one VO
Include a whitespace-separated VO list in your YAIM configuration file for the following variables:
VO="vo1 vo2"
NCG_VO="vo1 vo2"
and define VOMS_SERVERS, VOMSES, VOMS_CA_DN and WMS_HOSTS for each VO:
# VOMS server definition for vo1
VO_<vo1>_VOMS_SERVERS="'vomss://voms.your.domain:8443/voms/vo1?/vo1/'"
# VOMSES server definition for vo1
VO_<vo1>_VOMSES="'vo1 voms.your.domain 15000 /C=PT/O=LIPCA/O=LIP/OU=Lisboa/CN=voms.your.domain vo1'"
# DN of the CA which issued the VOMS certificate
VO_<vo1>_VOMS_CA_DN="/C=PT/O=LIPCA/CN=LIP Certification Authority"
# WMS used to submit jobs to vo1
VO_<vo1>_EU_WMS_HOSTS="wms.your.domain"

# VOMS server definition for vo2
VO_<vo2>_VOMS_SERVERS="'vomss://voms.your.domain:8443/voms/vo2?/vo2/'"
# VOMSES server definition for vo2
VO_<vo2>_VOMSES="'vo2 voms.your.domain 15001 /C=PT/O=LIPCA/O=LIP/OU=Lisboa/CN=voms.your.domain vo2'"
# DN of the CA which issued the VOMS certificate
VO_<vo2>_VOMS_CA_DN="/C=PT/O=LIPCA/CN=LIP Certification Authority"
# WMS used to submit jobs to vo2
VO_<vo2>_EU_WMS_HOSTS="wms.your.domain"
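With several VOs it is easy to forget one of the per-VO variables. The sketch below lists the ones that are missing from a YAIM configuration file. Several things here are assumptions rather than documented behaviour: the function names vo_prefix and missing_vo_vars are invented for this example, the variable prefix is assumed to be the VO name in upper case with '.' and '-' turned into '_', and the suffix list follows the prose above (the example spells the WMS variable with an extra EU_, so adjust the list to whatever your file really uses).

```shell
# Derive the YAIM variable prefix for a VO name.
# Assumed convention: upper-case the name, map '.' and '-' to '_'.
vo_prefix() {
    printf 'VO_%s\n' "$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]' | tr '.-' '__')"
}

# Print every expected per-VO variable that is empty or unset in the
# given YAIM file. Exit status: 0 = all set, 1 = something missing.
# The body runs in a subshell, so sourcing the file does not affect
# the calling shell.
missing_vo_vars() (
    . "$1" || exit 2
    status=0
    for vo in $NCG_VO; do
        prefix=$(vo_prefix "$vo")
        # Suffix list is an assumption; edit it to match your setup.
        for suffix in VOMS_SERVERS VOMSES VOMS_CA_DN WMS_HOSTS; do
            eval "value=\${${prefix}_${suffix}:-}"
            if [ -z "$value" ]; then
                echo "${prefix}_${suffix}"
                status=1
            fi
        done
    done
    exit $status
)

# Usage (not executed here; the path is a placeholder):
#   missing_vo_vars /path/to/your/yaim/configuration/file
```

The file is sourced as shell code, so only run this against configuration files you trust.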
Can I start 2 different proxies to submit jobs to the different VOs?
Yes. You can have a different proxy for each VO. Just use a different user certificate when creating each MyProxy credential. For example:
# For vo1
$ export X509_USER_CERT=~/.globus/usercert-vo1.pem
$ export X509_USER_KEY=~/.globus/userkey-vo1.pem
$ myproxy-init -l nagios -s $PX_HOST -k NagiosRetrieve-NAGIOS_HOSTNAME-vo1 -c 1000 -x -Z "NAGIOS_HOSTNAME_DN"

# For vo2
$ export X509_USER_CERT=~/.globus/usercert-vo2.pem
$ export X509_USER_KEY=~/.globus/userkey-vo2.pem
$ myproxy-init -l nagios -s $PX_HOST -k NagiosRetrieve-NAGIOS_HOSTNAME-vo2 -c 1000 -x -Z "NAGIOS_HOSTNAME_DN"
The same principle applies to any other VO supported by that instance. You can of course use the same user certificate if it is a member of multiple VOs. An easier solution is to use robot certificates.
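When the list of VOs grows, the same commands can be wrapped in a loop. The following is a minimal sketch, not a supported tool. It assumes the certificate and key files follow the usercert-<vo>.pem / userkey-<vo>.pem naming used above, and that the NAGIOS_HOSTNAME and NAGIOS_HOSTNAME_DN placeholders are available as shell variables. The function name init_vo_proxies and the MYPROXY_INIT override (set it to echo for a dry run) are made up for this example.

```shell
# Create one MyProxy credential per VO named on the command line.
# Expects PX_HOST, NAGIOS_HOSTNAME and NAGIOS_HOSTNAME_DN to be set.
# MYPROXY_INIT is a hypothetical override; "echo" gives a dry run.
# The body runs in a subshell, so the exports do not leak out.
init_vo_proxies() (
    for vo in "$@"; do
        X509_USER_CERT="$HOME/.globus/usercert-$vo.pem"
        X509_USER_KEY="$HOME/.globus/userkey-$vo.pem"
        export X509_USER_CERT X509_USER_KEY
        ${MYPROXY_INIT:-myproxy-init} -l nagios -s "$PX_HOST" \
            -k "NagiosRetrieve-$NAGIOS_HOSTNAME-$vo" \
            -c 1000 -x -Z "$NAGIOS_HOSTNAME_DN" || exit 1
    done
)

# Usage (not executed here):
#   MYPROXY_INIT=echo init_vo_proxies vo1 vo2   # print the commands only
#   init_vo_proxies vo1 vo2                     # really create the credentials
```

The loop stops at the first VO for which myproxy-init fails, so a wrong passphrase does not go unnoticed.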
Can a catch-all VO SAM provide a dedicated VO view?
The Nagios web interface was never designed for clear presentation. However, it offers a service group view: for each VO, NCG generates a service group that aggregates all VO-dependent checks. For example:
Can I configure VO SAM to use a unique LFC and central SE for all VOs?
Yes. Include the following definitions in your YAIM configuration variables. This implicitly assumes that the unique LFC and the central SE support all monitored VOs.
# LFC and SE definitions
JOBSUBMIT_WN_LFC=lfc-allvos.my.domain
JOBSUBMIT_WN_SE_REP=se-allvos.my.domain
Can I configure VO dependent LFCs and central SEs in a VO SAM?
There is a way to do this, though it is slightly more complicated. Make sure that you do not have lines like the following
/etc/ncg/ncg-localdb.d/jobsubmit:MODIFY_METRIC_PARAMETER!org.sam.CREAMCE-JobState!--wn-lfc!lfc.my.domain
/etc/ncg/ncg-localdb.d/jobsubmit:MODIFY_METRIC_PARAMETER!org.sam.CE-JobState!--wn-lfc!lfc.my.domain
/etc/ncg/ncg-localdb.d/jobsubmit:MODIFY_METRIC_PARAMETER!org.sam.CREAMCE-JobState!--wn-se-rep!se.my.domain
/etc/ncg/ncg-localdb.d/jobsubmit:MODIFY_METRIC_PARAMETER!org.sam.CE-JobState!--wn-se-rep!se.my.domain
anywhere in /etc/ncg/*localdb*, and put the following instead:
VO_ATTRIBUTE!vo1!WN_SE_REP!se-vo1.my.domain
VO_ATTRIBUTE!vo2!WN_SE_REP!se-vo2.my.domain
MODIFY_METRIC_ATTRIBUTE!org.sam.CE-JobState!WN_SE_REP!--wn-se-rep
VO_ATTRIBUTE!vo1!WN_LFC!lfc-vo1.my.domain
VO_ATTRIBUTE!vo2!WN_LFC!lfc-vo2.my.domain
MODIFY_METRIC_ATTRIBUTE!org.sam.CE-JobState!WN_LFC!--wn-lfc
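To find leftover global overrides before adding the per-VO attributes, a grep along these lines may be enough. This is a sketch: the function name find_global_lfc_se_overrides is invented for the example, and the pattern covers only the four line shapes quoted above.

```shell
# List localdb lines that set --wn-lfc or --wn-se-rep globally for the
# CE-JobState / CREAMCE-JobState metrics. Matches are printed as
# file:line. Exit status: 0 if any were found, 1 if none.
find_global_lfc_se_overrides() {
    grep -H -r -E \
        'MODIFY_METRIC_PARAMETER!org\.sam\.(CREAMCE|CE)-JobState!--wn-(lfc|se-rep)!' \
        "$@"
}

# Usage (not executed here):
#   find_global_lfc_se_overrides /etc/ncg/*localdb*
```

Any line it prints should be removed before the VO_ATTRIBUTE and MODIFY_METRIC_ATTRIBUTE entries are added.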
How can I provide access to a non-allowed member?
Depending on the NCG_ROLE selected at configuration time, different users may have different permissions to access the VO SAM services. To grant access to a given user, add the user's DN to /etc/voms2htpasswd-static.d/YAIM-ops-monitor.conf
How to switch off importing admin DNs?
# switch off importing admin DNs (see SAM-1434) (optional variable)
NCG_CONTACTS_USE_GOCDB=false
How do I run VO SAM tests with a specific FQAN ?
You can set
VO_ENMR_EU_NCG_DEFAULT_VO_FQAN="YOUR FQAN"
in your YAIM configuration file (the prefix VO_ENMR_EU in this example follows the standard YAIM VO notation; replace it with the prefix of your own VO), and start your proxy as normal:
$ export X509_USER_CERT=~/.globus/usercert-vo1.pem
$ export X509_USER_KEY=~/.globus/userkey-vo1.pem
$ myproxy-init -l nagios -s $PX_HOST -k NagiosRetrieve-NAGIOS_HOSTNAME-vo1 -c 1000 -x -Z "NAGIOS_HOSTNAME_DN"
How to change the email notification header?
# optional - change of notification header (SAM-1130):
NCG_NOTIFICATION_HEADER="YOUR HEADER"
How to enable the use of ROBOT certificates?
# optional - use of robot certificates (SAM-1180):
NCG_USE_ROBOT_CERT=true
# Robot cert and key can be different for each VO
# and standard Yaim VO notation is used
VO_OPS_ROBOT_CERT=/etc/nagios/globus/robot-cert.pem
VO_OPS_ROBOT_KEY=/etc/nagios/globus/robot-key.pem
How to monitor uncertified sites?
To monitor uncertified sites, you need a dedicated TopBDII (holding the information for those sites) and a dedicated WMS. The uncertified sites to be monitored must also be listed explicitly:
# optional - add uncertified gLite sites (SAM-1143)
UNCERTIFIED_SITES="SiteA SiteB SiteC"
UNCERTIFIED_WMS=wms.uncert.org
UNCERTIFIED_BDII=bdii.uncert.org
How to switch host checks off/on?
# switch host checks off/on (see SAM-1173) (optional variable)
NCG_CHECK_HOSTS=1
My VO is global, do I still have to define a list of NGIs for my VO instance?
No. You can define
NCG_GOCDB_ROC_NAME=ALL
This approach is not perfect, however, since it bootstraps all hosts and not only those relevant to the VO. The feature to bootstrap only the hosts relevant to the VO is not yet implemented; it is planned for Update-12 (sometime in June).
Run-Time
ATP synchronization fails while running YAIM
Check the ATP log file (/var/log/atp.log) to find the cause of the problem. This can happen when network latency is too high for the ATP synchronization timeout. In that case, set ATP_SYNC_TIMEOUT to a higher value (e.g. ATP_SYNC_TIMEOUT=1200; the variable is only used by SAM 10 or higher). For earlier versions you need to edit the YAIM ATP function file directly: /opt/glite/yaim/functions/config_atp
NCG configuration fails while running YAIM
Check the NCG log file (/var/log/ncg.log) to find the cause of the problem. This can be due to a bad configuration file (/etc/ncg/ncg.conf) that YAIM generated from incorrect configuration variables. Double-check your YAIM configuration file.
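For both log files, a small filter can save some scrolling. The sketch below is only a convenience: the function name recent_log_errors is made up, and the keywords it searches for (error, timeout, fail) are a guess, since the exact wording of the ATP and NCG log messages is not described here.

```shell
# Show the last N lines (default 20) of a log file that mention an
# error, a timeout or a failure. The keyword list is an assumption.
recent_log_errors() {
    grep -i -E 'error|timeout|fail' "$1" | tail -n "${2:-20}"
}

# Usage (not executed here):
#   recent_log_errors /var/log/atp.log
#   recent_log_errors /var/log/ncg.log 50
```

If nothing is printed, read the end of the full log instead, as the relevant message may use different wording.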
A working example
Future improvements
- In the next SAM release, NCG will automatically go over the whole infrastructure and look for nodes which support the defined VOs.
- Define a VO profile containing only VO-dependent metrics. Please check SAM-1178