Difference between revisions of "HOWTO04 Site Certification Manual tests"
Revision as of 11:23, 15 December 2010
Check the functionality of the grid elements
Be sure that its GIIS url is contained in the BDII you use for certification
lcg-CE checks
Verify the authentication and authorization on CE
$ globus-job-run inaf-ce-01.ct.pi2s2.it /bin/hostname (or /usr/bin/whoami, or whatever you want!!)
Check if the lcg-CE gridftp server is working
$ globus-url-copy -dbg -v -vb file:/home/csys/goncalo/teste.txt gsiftp://ce02.lip.pt/tmp/txt
$ uberftp ce02.lip.pt
In case of pbs, check the WNs with the following command:
$ globus-job-run pbs-enmr.cerm.unifi.it /usr/bin/pbsnodes -a
Verify the functioning of the batch system: be careful that the queue you are querying really exists, and your VO is enabled on it. For example:
$ globus-job-run ce02.lip.pt:2119/jobmanager-fork /bin/pwd
$ globus-job-run ce-cyb.ca.infn.it/jobmanager-lcglsf -queue poncert /bin/pwd
Check that the dgas processes are running on the CE (with ps ax | grep dgas)
Cream-CE checks
Open your browser to
https://<hostname-of-cream-ce>:8443/ce-cream/services
A page with link to the CREAM WSDL should be shown
Try a gsiftp transfer (e.g. using globus-url-copy or uberftp) towards that CREAM CE. E.g.:
$ globus-url-copy gsiftp://<hostname-of-cream-ce>/opt/glite/yaim/etc/versions/ig-yaim file:/tmp/ig-version-test
Try the following command:
$ glite-ce-allowed-submission <hostname-of-cream-ce>:8443
It should report:
Job Submission to this CREAM CE is enabled
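When this check is scripted, the expected message can be matched instead of read by eye. The following is only a minimal sketch: the function name cream_submission_enabled is hypothetical, and it assumes the command prints exactly the message quoted above.

```shell
# Hypothetical helper (not part of gLite): succeeds only if the text read
# from stdin contains the message quoted in this manual.
cream_submission_enabled() {
    grep -q "Job Submission to this CREAM CE is enabled"
}

# Usage (replace the placeholder with the real host name):
#   glite-ce-allowed-submission <hostname-of-cream-ce>:8443 | cream_submission_enabled \
#       && echo "submission enabled"
```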
Try a submission to Cream-CE using the glite-ce-job-submit command, e.g.:
$ /bin/cat sleep.jdl
[
executable="/bin/sleep";
arguments="1";
]
$ glite-ce-job-submit -a -r <hostname-of-cream-ce>:8443/<queue> test.jdl
$ glite-ce-job-submit -a -r ce-cr-02.ts.infn.it:8443/cream-lsf-cert sleep.jdl
https://ce-cr-02.ts.infn.it:8443/CREAM127814374
Check the status of that job, which eventually should be DONE-OK
$ glite-ce-job-status https://ce-cr-02.ts.infn.it:8443/CREAM127814374
2010-07-27 11:55:37,986 WARN - No configuration file suitable for loading. Using built-in configuration
****** JobID=[1]
Status = [DONE-OK]
ExitCode = [0]
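If the status has to be checked from a script, the value inside "Status = [...]" can be extracted from the command output. This is a sketch under the assumption that the output keeps the "Status = [VALUE]" form shown above; the function name cream_job_status is hypothetical.

```shell
# Hypothetical helper: print the first job status found in the
# glite-ce-job-status output read from stdin, e.g. DONE-OK or CANCELLED.
cream_job_status() {
    sed -n 's/.*Status[[:space:]]*=[[:space:]]*\[\([^]]*\)\].*/\1/p' | head -n 1
}

# Usage:
#   status=$(glite-ce-job-status https://ce-cr-02.ts.infn.it:8443/CREAM127814374 | cream_job_status)
#   [ "$status" = "DONE-OK" ] && echo "job finished correctly"
```

The same helper works for the cancellation test, where the expected value is CANCELLED.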
Try a submission to that CE using the glite-ce-job-submit command, and then try to cancel the job (using the glite-ce-job-cancel command)
$ /bin/cat sleep2.jdl
[
executable="/bin/sleep";
arguments="1000";
]
$ glite-ce-job-submit -a -r cecream-cyb.ca.infn.it:8443/cream-lsf-poncert sleep2.jdl
https://cecream-cyb.ca.infn.it:8443/CREAM126335182
$ glite-ce-job-cancel https://cecream-cyb.ca.infn.it:8443/CREAM126335182
$ glite-ce-job-status https://cecream-cyb.ca.infn.it:8443/CREAM126335182
2010-07-27 12:18:26,973 WARN - No configuration file suitable for loading. Using built-in configuration
****** JobID=[2]
Status = [CANCELLED]
ExitCode = []
Description = [Cancelled by user]
SE checks
Check if the gridftp server on the SE works:
$ uberftp inaf-se-01.ct.pi2s2.it
For STORM SE: check if SRM client works (on the published information you can find the right port to use)
$ /opt/storm/srm-clients/bin/clientSRM ping -e httpg://sunstorm.cnaf.infn.it:8444
============================================================
Sending Ping request to: httpg://sunstorm.cnaf.infn.it:8444
============================================================
Request status:
statusCode="SRM_SUCCESS"(0)
explanation="SRM server successfully contacted"
============================================================
SRM Response:
versionInfo="v2.2"
otherInfo (size=2)
[0] key="backend_type"
[0] value="StoRM"
[1] key="backend_version"
[1] value="<FE:1.5.0-1.sl4><BE:1.5.3-4.sl4>"
============================================================
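To test the ping result from a script, the status code can be pulled out of the output. A minimal sketch, assuming the output contains a statusCode="..." field as in the example above; the function name srm_status_code is hypothetical.

```shell
# Hypothetical helper: print the first statusCode value found in the
# clientSRM output read from stdin (SRM_SUCCESS for a working endpoint).
srm_status_code() {
    sed -n 's/.*statusCode="\([^"]*\)".*/\1/p' | head -n 1
}

# Usage:
#   code=$(/opt/storm/srm-clients/bin/clientSRM ping -e httpg://sunstorm.cnaf.infn.it:8444 | srm_status_code)
#   [ "$code" = "SRM_SUCCESS" ] || echo "SRM ping failed: $code"
```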
Try to write on the SE. Be sure your UI is pointing to an IS that contains the SE (you may use your certification BDII), e.g.
$ export LCG_GFAL_INFOSYS=gridit-bdii-01.cnaf.infn.it:2170
$ lcg-cr -v --vo glast.org -d storm-fe-cg.cr.cnaf.infn.it -l lfn:/grid/glast.org/wfug.jdl file:/home/paolini/rank.jdl
$ lcg-del -v --vo glast.org -a <guid>
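The <guid> passed to lcg-del is the one returned by lcg-cr. The sketch below captures it so that the test file can be removed without copying the value by hand. It assumes that lcg-cr prints the identifier as a token of the form guid:<hexadecimal digits and dashes>; the function name extract_guid is hypothetical.

```shell
# Hypothetical helper: print the last "guid:..." token found in the text
# read from stdin. Assumes lcg-cr reports the new file's GUID in this form.
extract_guid() {
    grep -o 'guid:[0-9A-Fa-f-]*' | tail -n 1
}

# Usage:
#   guid=$(lcg-cr -v --vo glast.org -d storm-fe-cg.cr.cnaf.infn.it \
#            -l lfn:/grid/glast.org/wfug.jdl file:/home/paolini/rank.jdl | extract_guid)
#   [ -n "$guid" ] && lcg-del -v --vo glast.org -a "$guid"
```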
Job submission
Submit a test job to either lcg-CE or Cream-CE through the WMS, i.e. using the glite-wms-job-submit command. If the site supports MPI, also submit an MPI test job. Our certification WMS is gridit-cert-wms.cnaf.infn.it
Registration into 1st level HLR
Once the site has entered production, its resources must be registered in the HLR. Ask the site-admins to open a ticket to the HLR administrators, providing the following information:
* grid queue names, in the form:
  o gridit-ce-001.cnaf.infn.it:2119/jobmanager-lcgpbs-cert
* non-grid queue names, in the form:
  o hostname:queue
* Name, surname and certificate subject of each site-admin
* Certificate subject of the Computing Element
Eventually, the site-admins have to open a ticket to the DGAS support unit asking to enable the forwarding of accounting data from the 2nd level HLR to APEL
Certification Job
The test job checks several things, such as the environment on the WN and the rpms installed. It also performs some replica management tests. With a "grep TEST" you may get a summary of the results: in case of errors, you have to look in detail at what went wrong!
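The summary step can be wrapped so that each matching line carries its line number, which makes it easier to jump to the details in the full output. This is only a sketch: the function name test_summary and the file name std.out are hypothetical, and the exact wording of the TEST lines depends on the certification job.

```shell
# Hypothetical helper: list every line containing TEST in the given job
# output file, prefixed by its line number (grep -n).
test_summary() {
    grep -n 'TEST' "$1"
}

# Usage (std.out is a placeholder for the retrieved job output file):
#   test_summary std.out
```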
As already said, if the site supports any flavour of MPI, launch an MPI test job. Don't forget to set a reasonable value for CPUNumber: what matters is that your job starts running soon
If you want less output in the .out and .err files, comment out the following line in the file mpi-start-wrapper.sh
export I2G_MPI_START_DEBUG=1
A successful output will look like the following one (extract)
[...]
mpi-start [DEBUG ]: using user supplied startup : '/opt/mpich-1.2.7p1/bin/mpirun '
mpi-start [DEBUG ]: => MPI_SPECIFIC_PARAMS=
mpi-start [DEBUG ]: => I2G_MPI_PRECOMMAND=
mpi-start [DEBUG ]: => MPIEXEC=/opt/mpich-1.2.7p1/bin/mpirun
mpi-start [DEBUG ]: => I2G_MACHINEFILE_AND_NP=-machinefile /tmp/tmp.iBypc12521 -np 6
mpi-start [DEBUG ]: => I2G_MPI_APPLICATION=/home/dteam022/globus-tmp.t3-wn-13.11955.0/https_3a_2f_2falbalonga.cnaf.infn.it_3a9000_2fI06uWaKi1evxL3tTF-DTOg/hello
mpi-start [DEBUG ]: => I2G_MPI_APPLICATION_ARGS=
mpi-start [DEBUG ]: /opt/mpich-1.2.7p1/bin/mpirun -machinefile /tmp/tmp.iBypc12521 -np 6 /home/dteam022/globus-tmp.t3-wn-13.11955.0/https_3a_2f_2falbalonga.cnaf.infn.it_3a9000_2fI06uWaKi1evxL3tTF-DTOg/hello
Process 4 on t3-wn-37.pn.pd.infn.it out of 6
Process 3 on t3-wn-34.pn.pd.infn.it out of 6
Process 1 on t3-wn-13.pn.pd.infn.it out of 6
Process 2 on t3-wn-34.pn.pd.infn.it out of 6
Process 5 on t3-wn-37.pn.pd.infn.it out of 6
Process 0 on t3-wn-13.pn.pd.infn.it out of 6
[...]
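A quick way to confirm that every MPI process reported back is to count the distinct ranks in the "Process N on <host> out of M" lines and compare the count with M. The sketch below assumes the hello program prints exactly that line format, one process per line; the function name mpi_ranks_complete is hypothetical.

```shell
# Hypothetical helper: read job output from stdin and succeed only if the
# number of distinct ranks seen equals the declared process count M.
mpi_ranks_complete() {
    awk '
        /^[ \t]*Process [0-9]+ on .* out of [0-9]+[ \t]*$/ {
            seen[$2] = 1     # $2 is the rank number N
            total = $NF      # last field is the declared count M
        }
        END {
            n = 0
            for (r in seen) n++
            # exit status 0 only when at least one line matched and all ranks appeared
            exit !(total > 0 && n == total)
        }
    '
}

# Usage (std.out is a placeholder for the retrieved job output file):
#   mpi_ranks_complete < std.out && echo "all MPI processes answered"
```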