FAQ VO Service Availability Monitoring

From EGIWiki




Frequently Asked Questions & Troubleshooting

General

Before proceeding, check if your question is not already answered in


Installation

perl-DBD-MySQL dependency problems?

For installation problems related to the perl-DBD-MySQL package, check that the repository priorities are set as explained in the release notes. The proper package should be fetched from the RPM FORGE EXTRA repository (see the RPM FORGE repository).


perl-SOAP-Lite dependency problems?

Fri Mar 11 13:13:54 CET 2011 : Can't locate Class/Inspector.pm in @INC 
(@INC contains: /usr/lib64/perl5/site_perl/5.8.8/x86_64-linux-thread-multi /usr/lib/perl5/site_perl/5.8.8 /usr/lib/perl5/site_perl 
/usr/lib64/perl5/vendor_perl/5.8.8/x86_64-linux-thread-multi /usr/lib/perl5/vendor_perl/5.8.8 /usr/lib/perl5/vendor_perl/5.8.5 
/usr/lib/perl5/vendor_perl /usr/lib64/perl5/5.8.8/x86_64-linux-thread-multi /usr/lib/perl5/5.8.8 .) at /usr/lib/perl5/vendor_perl/5.8.8/SOAP/Lite.pm line 435.
Fri Mar 11 13:13:54 CET 2011 : BEGIN failed--compilation aborted at /usr/lib/perl5/vendor_perl/5.8.8/SOAP/Lite.pm line 435.
Fri Mar 11 13:13:54 CET 2011 : Compilation failed in require at /usr/bin/voms2htpasswd line 12.
Fri Mar 11 13:13:54 CET 2011 : BEGIN failed--compilation aborted at /usr/bin/voms2htpasswd line 12.

Check that the "perl-Class-Inspector" package is installed. If it is not, install it:

$ yum install 'perl(Class::Inspector)'
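
To confirm that the module named in the error message can be loaded, a quick check along the following lines may help. This is only a sketch: the helper name perl_module_available is hypothetical, and it assumes the perl found in PATH is the one used by voms2htpasswd.

```shell
# Return success (exit status 0) if the named Perl module can be loaded.
# perl_module_available is a hypothetical helper, not part of SAM.
perl_module_available() {
    perl -M"$1" -e 1 >/dev/null 2>&1
}

# Report the status of the module named in the error message.
if perl_module_available Class::Inspector; then
    echo "Class::Inspector is available"
else
    echo "Class::Inspector is missing; install perl-Class-Inspector"
fi
```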


Configuration

How to monitor more than one VO

Include a whitespace-separated list of VOs in your YAIM configuration file for the following variables:

VO="vo1 vo2"
NCG_VO="vo1 vo2"

and define the VOMS_SERVERS, VOMSES, VOMS_CA_DN and EU_WMS_HOSTS variables for each VO:

# VOMS server definition for vo1
VO_<vo1>_VOMS_SERVERS="'vomss://voms.your.domain:8443/voms/vo1?/vo1/'"
# VOMSES server definition for vo1
VO_<vo1>_VOMSES="'vo1 voms.your.domain 15000 /C=PT/O=LIPCA/O=LIP/OU=Lisboa/CN=voms.your.domain vo1'"
# DN of the CA which issued the VOMS Certificate
VO_<vo1>_VOMS_CA_DN="/C=PT/O=LIPCA/CN=LIP Certification Authority"
# WMS used to submit jobs to vo1
VO_<vo1>_EU_WMS_HOSTS="wms.your.domain"

# VOMS server definition for vo2
VO_<vo2>_VOMS_SERVERS="'vomss://voms.your.domain:8443/voms/vo2?/vo2/'"
# VOMSES server definition for vo2
VO_<vo2>_VOMSES="'vo2 voms.your.domain 15001 /C=PT/O=LIPCA/O=LIP/OU=Lisboa/CN=voms.your.domain vo2'"
# DN of the CA which issued the VOMS Certificate
VO_<vo2>_VOMS_CA_DN="/C=PT/O=LIPCA/CN=LIP Certification Authority"
# WMS used to submit jobs to vo2
VO_<vo2>_EU_WMS_HOSTS="wms.your.domain"
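
With several VOs it is easy to forget one of the per-VO variables. The sketch below lists the per-VO variables that are unset or empty once the YAIM configuration file has been sourced. The helper names vo_var_name and missing_vo_vars are hypothetical. The sketch also assumes that VO names are written in upper case in variable names, with '.' and '-' replaced by '_' (as in VO_ENMR_EU_... for enmr.eu).

```shell
# Convert a VO name to the form used in YAIM variable names.
# Assumption: upper case, with '.' and '-' replaced by '_'.
vo_var_name() {
    printf '%s' "$1" | tr '[:lower:]' '[:upper:]' | tr '.-' '__'
}

# Print the name of every required per-VO variable that is unset or empty.
# $1 is a whitespace-separated VO list, e.g. the value of VO.
# Note: eval is used for the indirect lookup, so pass trusted VO names only.
missing_vo_vars() {
    for vo in $1; do
        prefix="VO_$(vo_var_name "$vo")"
        for suffix in VOMS_SERVERS VOMSES VOMS_CA_DN EU_WMS_HOSTS; do
            eval "value=\${${prefix}_${suffix}:-}"
            [ -n "$value" ] || echo "${prefix}_${suffix}"
        done
    done
}

# Possible usage (the file name site-info.def is an assumption):
#   . ./site-info.def
#   missing_vo_vars "$VO"
```

An empty output means that all four variables are defined for every VO in the list.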


Can I start 2 different proxies to submit jobs to the different VOs?

Yes. You can have a different proxy for each VO: use a different user certificate when creating each MyProxy credential. For example:

# For vo1
$ export X509_USER_CERT=~/.globus/usercert-vo1.pem
$ export X509_USER_KEY=~/.globus/userkey-vo1.pem
$ myproxy-init -l nagios -s $PX_HOST -k NagiosRetrieve-NAGIOS_HOSTNAME-vo1 -c 1000 -x -Z "NAGIOS_HOSTNAME_DN"

# For vo2
$ export X509_USER_CERT=~/.globus/usercert-vo2.pem
$ export X509_USER_KEY=~/.globus/userkey-vo2.pem
$ myproxy-init -l nagios -s $PX_HOST -k NagiosRetrieve-NAGIOS_HOSTNAME-vo2 -c 1000 -x -Z "NAGIOS_HOSTNAME_DN"

The same principle applies to any other VO supported by that instance. You can of course use the same user certificate if it is a member of multiple VOs. An easier solution is to use robot certificates.
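
When an instance supports many VOs, the per-VO commands can be generated rather than typed by hand. The sketch below only prints the commands, so they can be reviewed before being run. It assumes the certificate naming scheme used in the example above (usercert-<vo>.pem and userkey-<vo>.pem). The function name print_myproxy_commands is hypothetical, and NAGIOS_HOSTNAME and NAGIOS_HOSTNAME_DN remain placeholders to be replaced.

```shell
# Print (without running) the commands that create the MyProxy credential
# for one VO. print_myproxy_commands is a hypothetical helper.
print_myproxy_commands() {
    vo=$1
    printf 'export X509_USER_CERT=~/.globus/usercert-%s.pem\n' "$vo"
    printf 'export X509_USER_KEY=~/.globus/userkey-%s.pem\n' "$vo"
    printf 'myproxy-init -l nagios -s $PX_HOST -k NagiosRetrieve-NAGIOS_HOSTNAME-%s -c 1000 -x -Z "NAGIOS_HOSTNAME_DN"\n' "$vo"
}

# Print the commands for each monitored VO.
for vo in vo1 vo2; do
    print_myproxy_commands "$vo"
done
```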


Can a catch-all VO SAM provide a dedicated VO view?

The Nagios web interface was never designed for clear presentation. However, it offers a service group view: for each VO, NCG generates a service group that aggregates all VO-dependent checks.


Can I configure VO SAM to use a unique LFC and central SE for all VOs?

Yes. Include the following definitions in your YAIM configuration variables. This implicitly assumes that the unique LFC and the central SE support all monitored VOs.

# LFC and SE definitions
JOBSUBMIT_WN_LFC=lfc-allvos.my.domain
JOBSUBMIT_WN_SE_REP=se-allvos.my.domain


Can I configure VO dependent LFCs and central SEs in a VO SAM?

There is a way to do this, though it is slightly more complicated. Make sure that you do not have lines like these:

/etc/ncg/ncg-localdb.d/jobsubmit:MODIFY_METRIC_PARAMETER!org.sam.CREAMCE-JobState!--wn-lfc!lfc.my.domain
/etc/ncg/ncg-localdb.d/jobsubmit:MODIFY_METRIC_PARAMETER!org.sam.CE-JobState!--wn-lfc!lfc.my.domain
/etc/ncg/ncg-localdb.d/jobsubmit:MODIFY_METRIC_PARAMETER!org.sam.CREAMCE-JobState!--wn-se-rep!se.my.domain
/etc/ncg/ncg-localdb.d/jobsubmit:MODIFY_METRIC_PARAMETER!org.sam.CE-JobState!--wn-se-rep!se.my.domain

anywhere in /etc/ncg/*localdb*, and put the following instead:

VO_ATTRIBUTE!vo1!WN_SE_REP!se-vo1.my.domain
VO_ATTRIBUTE!vo2!WN_SE_REP!se-vo2.my.domain
MODIFY_METRIC_ATTRIBUTE!org.sam.CE-JobState!WN_SE_REP!--wn-se-rep
VO_ATTRIBUTE!vo1!WN_LFC!lfc-vo1.my.domain
VO_ATTRIBUTE!vo2!WN_LFC!lfc-vo2.my.domain
MODIFY_METRIC_ATTRIBUTE!org.sam.CE-JobState!WN_LFC!--wn-lfc
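
One way to look for leftover fixed --wn-lfc and --wn-se-rep settings is a recursive grep over the localdb files. The sketch below is a suggestion rather than part of SAM: the function name find_fixed_wn_overrides is hypothetical, and the pattern is modelled on the four lines shown above.

```shell
# List hard-coded --wn-lfc / --wn-se-rep overrides in the given files or
# directories. Matches are printed as file:line:text; nothing is printed
# (and the exit status is non-zero) when no such line exists.
find_fixed_wn_overrides() {
    grep -rnE 'MODIFY_METRIC_PARAMETER!.*!--wn-(lfc|se-rep)!' "$@" 2>/dev/null
}

# Possible usage:
#   find_fixed_wn_overrides /etc/ncg/*localdb*
```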


How can I provide access to a non-allowed member?

Depending on the NCG_ROLE selected at configuration time, users may have different permissions to access the VO SAM services. To grant access to a given user, add the user's DN to /etc/voms2htpasswd-static.d/YAIM-ops-monitor.conf


How to switch off importing admin DNs?

# switch off importing admin DNs (see SAM-1434) (optional variable)
NCG_CONTACTS_USE_GOCDB=false

How do I run VO SAM tests with a specific FQAN?

You can set

VO_ENMR_EU_NCG_DEFAULT_VO_FQAN="YOUR FQAN"

in your YAIM configuration file, and start your proxy as usual:

$ export X509_USER_CERT=~/.globus/usercert-vo1.pem
$ export X509_USER_KEY=~/.globus/userkey-vo1.pem
$ myproxy-init -l nagios -s $PX_HOST -k NagiosRetrieve-NAGIOS_HOSTNAME-vo1 -c 1000 -x -Z "NAGIOS_HOSTNAME_DN"


How to change the email notification header?

# optional - change of notification header (SAM-1130):
NCG_NOTIFICATION_HEADER="YOUR HEADER"


How to enable the use of ROBOT certificates?

# optional - use of robot certificates (SAM-1180):
NCG_USE_ROBOT_CERT=true
# Robot cert and key can be different for each VO
# and standard Yaim VO notation is used
VO_OPS_ROBOT_CERT=/etc/nagios/globus/robot-cert.pem
VO_OPS_ROBOT_KEY=/etc/nagios/globus/robot-key.pem


How to monitor uncertified sites?

To monitor uncertified sites, you need a dedicated TopBDII (containing the information for those sites) and a dedicated WMS. The uncertified sites to be monitored must also be listed:

# optional - add uncertified gLite sites (SAM-1143)
UNCERTIFIED_SITES="SiteA SiteB SiteC"
UNCERTIFIED_WMS=wms.uncert.org
UNCERTIFIED_BDII=bdii.uncert.org


How to switch host checks off/on?

# switch host checks off/on (see SAM-1173) (optional variable)
NCG_CHECK_HOSTS=1


My VO is global, do I still have to define a list of NGIs for my VO instance?

No. You can define

NCG_GOCDB_ROC_NAME=ALL

However, this approach is not perfect, since it bootstraps all hosts and not only those relevant to the VO. Bootstrapping only the hosts relevant to the VO is not yet implemented; it is planned for Update-12 (sometime in June).


Run-Time

ATP synchronization fails while running YAIM

Check the ATP log file (/var/log/atp.log) to find the cause of the problem. This can happen when network latency is too high for the ATP synchronization timeout. Set ATP_SYNC_TIMEOUT to a higher value (e.g. ATP_SYNC_TIMEOUT=1200; only in use for SAM 10 or higher). For previous versions, you need to change the YAIM ATP function file directly: /opt/glite/yaim/functions/config_atp


NCG configuration fails while running YAIM

Check the NCG log file (/var/log/ncg.log) to find the cause of the problem. This can be caused by a bad configuration file (/etc/ncg/ncg.conf) generated from incorrect YAIM configuration variables. Double-check your YAIM configuration file.
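
For both the ATP and the NCG log files, a small helper can narrow the output to the most recent problem lines. This is a minimal sketch: the function name recent_log_errors is hypothetical, and filtering on the word "error" is an assumption about how these logs word their messages, so adjust the keyword to what the logs actually contain.

```shell
# Print the last N lines (default 20) of a log file that mention "error",
# case-insensitively. The keyword is an assumption about the log wording.
recent_log_errors() {
    grep -i 'error' "$1" 2>/dev/null | tail -n "${2:-20}"
}

# Possible usage:
#   recent_log_errors /var/log/atp.log
#   recent_log_errors /var/log/ncg.log 50
```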


A working example


Future improvements

  • In the next SAM release, NCG will automatically go over the whole infrastructure and look for nodes that support the defined VOs.
  • Define a VO profile containing only VO-dependent metrics. See SAM-1178.