HOWTO04 Site Certification Manual tests
Check the functionality of the grid elements
Make sure that the site's GIIS URL is published in the top-level BDII (information system) that your NGI will use for the certification.
Note that the examples here use the Italian NGI and sites. Please substitute YOUR OWN NGI and site credentials when running the test.
lcg-CE checks
Verify the authentication and authorization on the CE by running a simple command, for example:
$ globus-job-run inaf-ce-01.ct.pi2s2.it /bin/hostname
(You could also use /usr/bin/whoami or any other simple command.)
Check if the lcg-CE gridftp server is working
$ globus-url-copy -dbg -v -vb file:/home/csys/goncalo/teste.txt gsiftp://ce02.lip.pt/tmp/txt
$ uberftp ce02.lip.pt
If the batch system is PBS, check the worker nodes (WNs) with the following command:
$ globus-job-run pbs-enmr.cerm.unifi.it /usr/bin/pbsnodes -a
Verify that the batch system works. Make sure that the queue you query really exists and that your VO is enabled on it. For example:
$ globus-job-run ce02.lip.pt:2119/jobmanager-fork /bin/pwd
$ globus-job-run ce-cyb.ca.infn.it/jobmanager-lcglsf -queue poncert /bin/pwd
Check the DGAS processes on the CE (for example with ps ax | grep dgas).
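A minimal sketch of the process check, assuming the DGAS daemons have "dgas" somewhere in their command line. The function name count_dgas is hypothetical and only used for illustration.

```shell
# Count DGAS-related processes in `ps ax`-style text read from stdin.
# The pattern '[d]gas' matches the string "dgas" but not the text
# "[d]gas", so the grep process itself is never counted.
count_dgas() {
    grep -c '[d]gas'
}

# Typical use on the CE:
#   ps ax | count_dgas
```

A result of 0 suggests that no DGAS process is running on the CE.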
Cream-CE checks
Point your browser to
https://<hostname-of-cream-ce>:8443/ce-cream/services
A page with a link to the CREAM WSDL should be shown.
Try a gsiftp transfer (e.g. using globus-url-copy or uberftp) towards that CREAM CE. E.g.:
$ globus-url-copy gsiftp://<hostname-of-cream-ce>/opt/glite/yaim/etc/versions/ig-yaim file:/tmp/ig-version-test
Try the following command:
$ glite-ce-allowed-submission <hostname-of-cream-ce>:8443
It should report:
Job Submission to this CREAM CE is enabled
Try a submission to Cream-CE using the glite-ce-job-submit command, e.g.:
$ /bin/cat sleep.jdl
[
executable="/bin/sleep";
arguments="1";
]
The general form of the submission command is:
$ glite-ce-job-submit -a -r <hostname-of-cream-ce>:8443/<queue> sleep.jdl
For example:
$ glite-ce-job-submit -a -r ce-cr-02.ts.infn.it:8443/cream-lsf-cert sleep.jdl
https://ce-cr-02.ts.infn.it:8443/CREAM127814374
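The JDL file used for this test can be generated from the shell. The following is a minimal sketch; the helper name write_sleep_jdl is hypothetical, and the JDL content simply mirrors the sleep example shown here.

```shell
# Write a minimal "sleep" JDL file.
#   $1: path of the JDL file to create
#   $2: number of seconds the job should sleep
write_sleep_jdl() {
    cat > "$1" <<EOF
[
executable="/bin/sleep";
arguments="$2";
]
EOF
}

# Example: create the one-second test job, then submit it as described.
#   write_sleep_jdl sleep.jdl 1
```

A longer duration gives you enough time to exercise other operations on the job while it is still running.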
Check the status of that job, which should eventually be DONE-OK:
$ glite-ce-job-status https://ce-cr-02.ts.infn.it:8443/CREAM127814374
2010-07-27 11:55:37,986 WARN - No configuration file suitable for loading. Using built-in configuration
****** JobID=[https://ce-cr-02.ts.infn.it:8443/CREAM127814374]
Status = [DONE-OK]
ExitCode = [0]
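When the check is scripted, the status value can be pulled out of the command output. This is a sketch that assumes the output contains a "Status = [...]" field as in the example; the function name cream_status is hypothetical.

```shell
# Read glite-ce-job-status output on stdin and print only the value
# found between the brackets of the "Status = [...]" field.
cream_status() {
    sed -n 's/.*Status[[:space:]]*=[[:space:]]*\[\([^]]*\)\].*/\1/p'
}

# Typical use (job ID as returned by glite-ce-job-submit):
#   glite-ce-job-status <job-id> | cream_status
```

The printed value can then be compared with DONE-OK in a test script.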
Try a submission to that CE using the glite-ce-job-submit command, and then try to cancel the job (using the glite-ce-job-cancel command):
$ /bin/cat sleep2.jdl
[
executable="/bin/sleep";
arguments="1000";
]
$ glite-ce-job-submit -a -r cecream-cyb.ca.infn.it:8443/cream-lsf-poncert sleep2.jdl
https://cecream-cyb.ca.infn.it:8443/CREAM126335182
$ glite-ce-job-cancel https://cecream-cyb.ca.infn.it:8443/CREAM126335182
$ glite-ce-job-status https://cecream-cyb.ca.infn.it:8443/CREAM126335182
2010-07-27 12:18:26,973 WARN - No configuration file suitable for loading. Using built-in configuration
****** JobID=[https://cecream-cyb.ca.infn.it:8443/CREAM126335182]
Status = [CANCELLED]
ExitCode = []
Description = [Cancelled by user]
ARC CE checks
For a new ARC CE you can easily use the predefined ARC CE checks from the monitoring host of your NGI.
Check the status of the CE with:
$ /usr/libexec/grid-monitoring/probes/org.ndgf/ARCCE-status -H <CE hostname> -x /etc/nagios/globus/userproxy.pem-ops
Status is active
Test gsiftp:
$ /usr/libexec/grid-monitoring/probes/org.ndgf/ARCCE-auth -H <CE hostname> -x /etc/nagios/globus/userproxy.pem-ops
gsiftp OK
Check the version of the CAs:
$ /usr/libexec/grid-monitoring/probes/org.ndgf/ARCCE-caver -H <CE hostname> -x /etc/nagios/globus/userproxy.pem-ops
version = 1.38 - All CAs present
Check the versions of ARC and Globus:
$ /usr/libexec/grid-monitoring/probes/org.ndgf/ARCCE-softver -H <CE hostname> -x /etc/nagios/globus/userproxy.pem-ops
nordugrid-arc-0.8.3.1, globus-5.0.3
Copy a file:
$ /usr/libexec/grid-monitoring/probes/org.ndgf/ARCCE-gridftp -H <CE hostname> -x /etc/nagios/globus/userproxy.pem-ops
Job finished successfully
Submit a test job:
$ /usr/libexec/grid-monitoring/probes/org.ndgf/ARCCE-jobsubmit -H <CE hostname> --vo ops -x /etc/nagios/globus/userproxy.pem-ops
Job submission successful
Check the LFC:
$ /usr/libexec/grid-monitoring/probes/org.ndgf/ARCCE-lfc -H <CE hostname> -x /etc/nagios/globus/userproxy.pem-ops
Job finished successfully
Check the SRM:
$ /usr/libexec/grid-monitoring/probes/org.ndgf/ARCCE-srm -H <CE hostname> -x /etc/nagios/globus/userproxy.pem-ops
Job finished successfully
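Since all ARC CE probes share the same options, the command lines can be generated in one loop. The sketch below only prints the commands and executes nothing; the function name arc_probe_commands and the hostname ce.example.org are hypothetical, while the probe directory and proxy path are the ones used in the checks above.

```shell
PROBE_DIR=/usr/libexec/grid-monitoring/probes/org.ndgf
PROXY=/etc/nagios/globus/userproxy.pem-ops

# Print one probe command line per ARC CE check for the CE given in $1.
arc_probe_commands() {
    ce=$1
    for probe in status auth caver softver gridftp jobsubmit lfc srm; do
        extra=""
        # Only the job submission probe is shown with a VO option.
        if [ "$probe" = "jobsubmit" ]; then
            extra=" --vo ops"
        fi
        echo "$PROBE_DIR/ARCCE-$probe -H $ce$extra -x $PROXY"
    done
}

# Example: review the commands first, then run them on the monitoring host.
#   arc_probe_commands ce.example.org
```

Printing the commands before running them makes it easy to review or log exactly what will be executed against the new CE.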
SE checks
Check if the gridftp server on the SE works:
$ uberftp inaf-se-01.ct.pi2s2.it
For a StoRM SE: check if the SRM client works (the published information shows the right port to use):
$ /opt/storm/srm-clients/bin/clientSRM ping -e httpg://sunstorm.cnaf.infn.it:8444
============================================================
Sending Ping request to: httpg://sunstorm.cnaf.infn.it:8444
============================================================
Request status:
statusCode="SRM_SUCCESS"(0)
explanation="SRM server successfully contacted"
============================================================
SRM Response:
versionInfo="v2.2"
otherInfo (size=2)
[0] key="backend_type"
[0] value="StoRM"
[1] key="backend_version"
[1] value="<FE:1.5.0-1.sl4><BE:1.5.3-4.sl4>"
============================================================
Try to write to the SE. Make sure your UI points to an information system that publishes the SE (you may use your certification BDII).
1) Set a top-level BDII that publishes the SE you have to test
$ export LCG_GFAL_INFOSYS=<TopBDII hostname>:2170
2) Copy a file from the local filesystem to the SE, registering it in the LFC. The command output returns a SURL that you can use later for other tests.
A SURL is a path of the form: srm://srm01.ncg.ingrid.pt/ibergrid/iber/generated/2011-02-01/file4034a935-8d7a-48f4-914f-16f2634d4802
$ lcg-cr -v --vo <VO> -d <Your SE> -l lfn:/grid/<VO>/test.txt file:</path/to/your/local/file>
3) Create a new replica on another SE (to check the third-party transfer between two SEs)
$ lcg-rep -v --vo <VO> -d <Other SE> <SURL>
4) List Replicas
$ lcg-lr -v --vo <VO> lfn:/grid/<VO>/test.txt
5) Delete all replicas
$ lcg-del -v --vo <VO> -a <guid>
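Steps 3 and 5 reuse values printed by step 2. The sketch below shows one way to pick them out of saved command output. It assumes that the output contains a token starting with srm:// (the SURL) and a token starting with guid: (the GUID); the exact layout of the real output is not shown in this manual, and all function names are hypothetical.

```shell
# Print the first SURL (srm://...) found in the text read from stdin.
extract_surl() {
    grep -o 'srm://[^[:space:]]*' | head -n 1
}

# Print the first GUID token (guid:...) found in the text read from stdin.
extract_guid() {
    grep -o 'guid:[0-9A-Fa-f-]*' | head -n 1
}

# Print the SE hostname contained in the SURL given as $1.
surl_host() {
    rest=${1#srm://}      # drop the scheme
    rest=${rest%%/*}      # drop the path
    echo "${rest%%:*}"    # drop an optional port
}

# Typical use, with the output of step 2 saved in lcg-cr.log:
#   surl=$(extract_surl < lcg-cr.log)
#   guid=$(extract_guid < lcg-cr.log)
```

The hostname printed by surl_host is a quick way to confirm that the file really landed on the SE under test.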
Job submission
Submit a test job to either the lcg-CE or the Cream-CE through the WMS, i.e. using the glite-wms-job-submit command. If the site supports MPI, also submit an MPI test job. The NGI_IT certification WMS is gridit-cert-wms.cnaf.infn.it
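A minimal sketch of a WMS test job follows. It assumes a simple hostname job and pins the job to the CE under test through a Requirements expression on GlueCEUniqueID; the helper name write_wms_jdl, the file names and the CE identifier in the usage comment are hypothetical.

```shell
# Write a minimal WMS JDL that runs /bin/hostname on one specific CE.
#   $1: path of the JDL file to create
#   $2: CE identifier, e.g. <hostname-of-cream-ce>:8443/<queue>
write_wms_jdl() {
    cat > "$1" <<EOF
Executable = "/bin/hostname";
StdOutput = "std.out";
StdError = "std.err";
OutputSandbox = {"std.out", "std.err"};
Requirements = other.GlueCEUniqueID == "$2";
EOF
}

# Typical use on the UI (-a requests automatic proxy delegation):
#   write_wms_jdl hostname.jdl "<hostname-of-cream-ce>:8443/<queue>"
#   glite-wms-job-submit -a hostname.jdl
```

Pinning the job to one CE keeps the WMS from matching it to a different, already certified resource.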
Registration into 1st level HLR
NOTE: this step is needed only if your infrastructure uses DGAS as its accounting system.
After the site has entered production, its resources need to be registered in the HLR. Ask the site admins to open a ticket to the HLR administrators, giving them the following information:
- grid queue names, in the form:
  - gridit-ce-001.cnaf.infn.it:2119/jobmanager-lcgpbs-cert
- non-grid queue names, in the form:
  - hostname:queue
- Name, surname and certificate subject of each site admin
- Certificate subject of the Computing Element
Finally, the site admins have to open a ticket to the DGAS support unit asking to enable the forwarding of accounting data from the 2nd level HLR to APEL.
Certification Job
The test job checks several things, such as the environment on the WN and the installed RPMs. It also performs some replica management tests. With a "grep TEST" you get a summary of the results; in case of errors, look in detail at what has gone wrong.
As already mentioned, if the site supports any flavour of MPI, launch an MPI test job as well. Don't forget to set a reasonable value in CPUNumber: the most important thing is that your job starts running quickly.
If you want less output in the .out and .err files, comment out the following line in the file mpi-start-wrapper.sh:
export I2G_MPI_START_DEBUG=1
A successful output will look like the following one (extract):
[...]
mpi-start [DEBUG ]: using user supplied startup : '/opt/mpich-1.2.7p1/bin/mpirun '
mpi-start [DEBUG ]: => MPI_SPECIFIC_PARAMS=
mpi-start [DEBUG ]: => I2G_MPI_PRECOMMAND=
mpi-start [DEBUG ]: => MPIEXEC=/opt/mpich-1.2.7p1/bin/mpirun
mpi-start [DEBUG ]: => I2G_MACHINEFILE_AND_NP=-machinefile /tmp/tmp.iBypc12521 -np 6
mpi-start [DEBUG ]: => I2G_MPI_APPLICATION=/home/dteam022/globus-tmp.t3-wn-13.11955.0/https_3a_2f_2falbalonga.cnaf.infn.it_3a9000_2fI06uWaKi1evxL3tTF-DTOg/hello
mpi-start [DEBUG ]: => I2G_MPI_APPLICATION_ARGS=
mpi-start [DEBUG ]: /opt/mpich-1.2.7p1/bin/mpirun -machinefile /tmp/tmp.iBypc12521 -np 6 /home/dteam022/globus-tmp.t3-wn-13.11955.0/https_3a_2f_2falbalonga.cnaf.infn.it_3a9000_2fI06uWaKi1evxL3tTF-DTOg/hello
Process 4 on t3-wn-37.pn.pd.infn.it out of 6
Process 3 on t3-wn-34.pn.pd.infn.it out of 6
Process 1 on t3-wn-13.pn.pd.infn.it out of 6
Process 2 on t3-wn-34.pn.pd.infn.it out of 6
Process 5 on t3-wn-37.pn.pd.infn.it out of 6
Process 0 on t3-wn-13.pn.pd.infn.it out of 6
[...]
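To verify such an output automatically, you can check that every MPI rank reported in. The sketch below assumes that the job prints one line of the form "Process N on HOST out of M" per rank, as in the extract above; the function name check_mpi_ranks is hypothetical.

```shell
# Read the job's standard output on stdin and verify that the number of
# distinct ranks seen equals the total announced in the "out of M" part.
check_mpi_ranks() {
    awk '
        $1 == "Process" && $3 == "on" && $5 == "out" && $6 == "of" {
            seen[$2] = 1      # remember each rank once
            total = $7        # total number of processes announced
        }
        END {
            n = 0
            for (r in seen) n++
            if (total > 0 && n == total) {
                print "OK: " n " of " total " ranks reported"
                exit 0
            }
            print "FAIL: " n " of " (total + 0) " ranks reported"
            exit 1
        }'
}

# Typical use on the retrieved output file:
#   check_mpi_ranks < <job-output-file>
```

The function exits with a non-zero status when ranks are missing, so it can be used directly in a certification script.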
Revision history
Version | Authors | Date | Comments |
---|---|---|---|
1.0 | Alessandro Paolini | 2010-12-15 | first draft |
1.1 | Alessandro Paolini | 2010-12-16 | added links to certification job pages |
1.2 | Alessandro Paolini | 2011-06-08 | added some other lcg-utils test |