Difference between revisions of "HOWTO04 Site Certification Manual tests"
Revision as of 17:42, 14 December 2010
Check the functionality of the grid elements
lcg-CE checks
Verify authentication and authorization on the CE by running a simple command (/bin/hostname, /usr/bin/whoami, or anything else you like):
$ globus-job-run inaf-ce-01.ct.pi2s2.it /bin/hostname
If the batch system is PBS, check the WNs, e.g.:
$ globus-job-run pbs-enmr.cerm.unifi.it /usr/bin/pbsnodes -a
Verify that the batch system works. Make sure that the queue you query really exists and that your VO is enabled on it. For example:
$ globus-job-run ce-cyb.ca.infn.it/jobmanager-lcglsf -queue poncert /bin/pwd
Check that the DGAS processes are running on the CE (with ps ax | grep dgas).
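A plain ps ax | grep dgas usually lists the grep command itself as well. The following is a minimal sketch that filters it out; the function name dgas_procs is hypothetical and not part of DGAS.

```shell
# Filter a "ps ax" listing (read from stdin) down to DGAS processes,
# dropping the grep command itself.
# Returns non-zero if no DGAS process is found.
dgas_procs() {
    grep -i 'dgas' | grep -v 'grep'
}

# Usage on the CE:
#   ps ax | dgas_procs || echo "no DGAS processes found"
```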
Cream-CE checks
Open your browser to
https://<hostname-of-cream-ce>:8443/ce-cream/services
A page with a link to the CREAM WSDL should be shown.
Try a gsiftp transfer (e.g. using globus-url-copy or uberftp) towards that CREAM CE, e.g.:
$ globus-url-copy gsiftp://<hostname-of-cream-ce>/opt/glite/yaim/etc/versions/ig-yaim file:/tmp/ig-version-test
Try the following command:
$ glite-ce-allowed-submission <hostname-of-cream-ce>:8443
It should report:
Job Submission to this CREAM CE is enabled
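If you want to script this check, a small sketch could look like the one below. It only tests for the message quoted above; the function name submission_enabled is hypothetical.

```shell
# Succeeds only if the text on stdin contains the message
# that the manual expects from glite-ce-allowed-submission.
submission_enabled() {
    grep -q 'Job Submission to this CREAM CE is enabled'
}

# Usage (replace the placeholder with the real CE hostname):
#   glite-ce-allowed-submission "<hostname-of-cream-ce>:8443" | submission_enabled \
#       && echo "submission enabled"
```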
Try a submission to the Cream-CE using the glite-ce-job-submit command, e.g.:
$ /bin/cat sleep.jdl
[
executable="/bin/sleep";
arguments="1";
]
$ glite-ce-job-submit -a -r <hostname-of-cream-ce>:8443/<queue> sleep.jdl
$ glite-ce-job-submit -a -r ce-cr-02.ts.infn.it:8443/cream-lsf-cert sleep.jdl
https://ce-cr-02.ts.infn.it:8443/CREAM127814374
Check the status of that job, which should eventually be DONE-OK:
$ glite-ce-job-status https://ce-cr-02.ts.infn.it:8443/CREAM127814374
2010-07-27 11:55:37,986 WARN - No configuration file suitable for loading. Using built-in configuration
****** JobID=[https://ce-cr-02.ts.infn.it:8443/CREAM127814374]
Status = [DONE-OK]
ExitCode = [0]
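Since the job may take a while to reach DONE-OK, the status check can be repeated in a loop. The sketch below assumes the output format shown above ("Status = [...]"); the function names, the 10 second interval and the default of 30 attempts are arbitrary choices, not part of the CREAM CLI.

```shell
# Print the value of the "Status = [...]" field found in
# glite-ce-job-status output read from stdin.
cream_status() {
    sed -n 's/.*Status *= *\[\([^]]*\)\].*/\1/p' | head -n 1
}

# Poll a job until it is DONE-OK. Gives up when the job reaches another
# final state or after max_tries attempts (default 30, 10 s apart).
wait_done_ok() {
    job_id=$1
    max_tries=${2:-30}
    i=0
    while [ "$i" -lt "$max_tries" ]; do
        status=$(glite-ce-job-status "$job_id" 2>/dev/null | cream_status)
        case $status in
            DONE-OK) return 0 ;;
            DONE-FAILED|ABORTED|CANCELLED)
                echo "job ended as $status" >&2
                return 1 ;;
        esac
        i=$((i + 1))
        sleep 10
    done
    echo "job still not DONE-OK after $max_tries checks" >&2
    return 1
}

# Usage:
#   wait_done_ok https://ce-cr-02.ts.infn.it:8443/CREAM127814374
```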
Try a submission to that CE using the glite-ce-job-submit command, and then try to cancel the job (using the glite-ce-job-cancel command):
$ /bin/cat sleep2.jdl
[
executable="/bin/sleep";
arguments="1000";
]
$ glite-ce-job-submit -a -r cecream-cyb.ca.infn.it:8443/cream-lsf-poncert sleep2.jdl
https://cecream-cyb.ca.infn.it:8443/CREAM126335182
$ glite-ce-job-cancel https://cecream-cyb.ca.infn.it:8443/CREAM126335182
$ glite-ce-job-status https://cecream-cyb.ca.infn.it:8443/CREAM126335182
2010-07-27 12:18:26,973 WARN - No configuration file suitable for loading. Using built-in configuration
****** JobID=[https://cecream-cyb.ca.infn.it:8443/CREAM126335182]
Status = [CANCELLED]
ExitCode = []
Description = [Cancelled by user]
SE checks
Check that the gridftp server on the SE works (NOTE: this command is no longer present on the SL5 UI):
$ edg-gridftp-ls gsiftp://inaf-se-01.ct.pi2s2.it/
Check that the SRM client works (the right port to use can be found in the published information):
$ clientSRM ping -e httpg://sunstorm.cnaf.infn.it:8444
================================================
Sending Ping request to: httpg://sunstorm.cnaf.infn.it:8444
================================================
Request status:
statusCode="SRM_SUCCESS"(0)
explanation="SRM server successfully contacted"
================================================
SRM Response:
versionInfo="v2.2"
otherInfo (size=2)
[0] key="backend_type"
[0] value="StoRM"
[1] key="backend_version"
[1] value="<FE:1.5.0-1.sl4><BE:1.5.3-4.sl4>"
================================================
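To use the ping in a script, the status code can be pulled out of the output shown above. This is only a sketch based on that output format; the function name srm_status_code is hypothetical.

```shell
# Print the statusCode value (e.g. SRM_SUCCESS) found in
# clientSRM output read from stdin.
srm_status_code() {
    sed -n 's/.*statusCode="\([^"]*\)".*/\1/p' | head -n 1
}

# Usage:
#   code=$(clientSRM ping -e httpg://sunstorm.cnaf.infn.it:8444 | srm_status_code)
#   [ "$code" = "SRM_SUCCESS" ] || echo "SRM ping failed: $code"
```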
If you want, try writing to the SE. Make sure your UI points to an information system (IS) that contains the SE (you may use our certification BDII), e.g.:
$ export LCG_GFAL_INFOSYS=gridit-bdii-01.cnaf.infn.it:2170
$ lcg-cr -v --vo glast.org -d storm-fe-cg.cr.cnaf.infn.it -l lfn:/grid/glast.org/wfug.jdl file:/home/paolini/rank.jdl
$ lcg-del -v --vo glast.org -a <guid>
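The guid needed by lcg-del is the one reported by lcg-cr. The sketch below assumes that lcg-cr prints it on standard output in the form guid:..., which you should verify on your UI; the function name extract_guid is hypothetical.

```shell
# Print the last "guid:..." token found in the text read from stdin.
extract_guid() {
    grep -o 'guid:[0-9A-Fa-f-]*' | tail -n 1
}

# Usage (same file and SE names as in the commands of this section):
#   guid=$(lcg-cr --vo glast.org -d storm-fe-cg.cr.cnaf.infn.it \
#            -l lfn:/grid/glast.org/wfug.jdl file:/home/paolini/rank.jdl | extract_guid)
#   lcg-del -v --vo glast.org -a "$guid"
```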
Job submission
Submit a test job to either the lcg-CE or the Cream-CE through the WMS, i.e. using the glite-wms-job-submit command. If the site supports MPI, also submit an MPI test job. Our certification WMS is gridit-cert-wms.cnaf.infn.it.
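A possible way to prepare and submit such a test job is sketched below. The JDL content, the file names and the function name write_wms_test_jdl are only illustrative, and the WMProxy endpoint URL (port 7443, path glite_wms_wmproxy_server) is an assumption based on the usual WMS setup; check it against your UI configuration.

```shell
# Write a minimal JDL for a WMS test job to the file given as argument.
write_wms_test_jdl() {
    cat > "$1" <<'EOF'
[
  Executable = "/bin/hostname";
  StdOutput = "std.out";
  StdError = "std.err";
  OutputSandbox = {"std.out", "std.err"};
]
EOF
}

# Usage:
#   write_wms_test_jdl test-wms.jdl
#   glite-wms-job-submit -a \
#       -e https://gridit-cert-wms.cnaf.infn.it:7443/glite_wms_wmproxy_server \
#       -o jobid.txt test-wms.jdl
#   glite-wms-job-status -i jobid.txt
```

Adding -r followed by a CE queue identifier to the submit command forces the job onto the CE under test instead of leaving the choice to the WMS.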
Registration into 1st level HLR
After the site has entered production, its resources need to be registered in the HLR. Ask the site-admins to open a ticket to the HLR administrators, providing the following information:
* grid queue names, in the form:
  o gridit-ce-001.cnaf.infn.it:2119/jobmanager-lcgpbs-cert
* non-grid queue names, in the form:
  o hostname:queue
* Name, surname and certificate subject of each site-admin
* Certificate subject of the Computing Element
Finally, the site-admins have to open a ticket to the DGAS support unit asking to enable the forwarding of accounting data from the 2nd level HLR to APEL.
Certification Job
The test job checks several things, such as the environment on the WN and the installed rpms. It also performs some replica management tests.
With a "grep TEST" you get a summary of the results; in case of errors, you have to look in detail at what went wrong.
As already said, if the site supports any flavour of MPI, launch an MPI test job. Don't forget to set a reasonable value for CPUNumber: what matters is that your job starts running soon.
If you want less output in the .out and .err files, comment out the following line in the file mpi-start-wrapper.sh:
export I2G_MPI_START_DEBUG=1
A successful output will look like the following one (extract)
[...]
mpi-start [DEBUG ]: using user supplied startup : '/opt/mpich-1.2.7p1/bin/mpirun '
mpi-start [DEBUG ]: => MPI_SPECIFIC_PARAMS=
mpi-start [DEBUG ]: => I2G_MPI_PRECOMMAND=
mpi-start [DEBUG ]: => MPIEXEC=/opt/mpich-1.2.7p1/bin/mpirun
mpi-start [DEBUG ]: => I2G_MACHINEFILE_AND_NP=-machinefile /tmp/tmp.iBypc12521 -np 6
mpi-start [DEBUG ]: => I2G_MPI_APPLICATION=/home/dteam022/globus-tmp.t3-wn-13.11955.0/https_3a_2f_2falbalonga.cnaf.infn.it_3a9000_2fI06uWaKi1evxL3tTF-DTOg/hello
mpi-start [DEBUG ]: => I2G_MPI_APPLICATION_ARGS=
mpi-start [DEBUG ]: /opt/mpich-1.2.7p1/bin/mpirun -machinefile /tmp/tmp.iBypc12521 -np 6 /home/dteam022/globus-tmp.t3-wn-13.11955.0/https_3a_2f_2falbalonga.cnaf.infn.it_3a9000_2fI06uWaKi1evxL3tTF-DTOg/hello
Process 4 on t3-wn-37.pn.pd.infn.it out of 6
Process 3 on t3-wn-34.pn.pd.infn.it out of 6
Process 1 on t3-wn-13.pn.pd.infn.it out of 6
Process 2 on t3-wn-34.pn.pd.infn.it out of 6
Process 5 on t3-wn-37.pn.pd.infn.it out of 6
Process 0 on t3-wn-13.pn.pd.infn.it out of 6
[...]
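To check quickly that every MPI process has reported, the "Process K on HOST out of N" lines can be counted. This is a sketch that assumes the hello program prints exactly the line format shown in the extract above; the function name mpi_all_ranks_reported and the file name in the usage are hypothetical.

```shell
# Succeeds if stdin contains one "Process K on HOST out of N" line
# for every rank K from 0 to N-1.
mpi_all_ranks_reported() {
    awk '
        $1 == "Process" && $3 == "on" && $(NF-2) == "out" {
            seen[$2] = 1      # remember this rank
            n = $NF           # total number of processes
        }
        END {
            if (n == "") exit 1               # no matching line at all
            for (r = 0; r < n; r++)
                if (!(r in seen)) exit 1      # a rank is missing
            exit 0
        }'
}

# Usage (the name of the job output file depends on your JDL):
#   mpi_all_ranks_reported < mpi-test.out && echo "all ranks reported"
```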