
HOWTO04 Site Certification Manual tests

Revision as of 17:51, 14 December 2010

Check the functionality of the grid elements

lcg-CE checks

Verify the authentication and authorization on CE

$ globus-job-run inaf-ce-01.ct.pi2s2.it /bin/hostname
(or /usr/bin/whoami, or any other command you like)

Check if the lcg-CE gridftp server is working

globus-url-copy -dbg -v -vb file:/home/csys/goncalo/teste.txt gsiftp://ce02.lip.pt/tmp/txt
uberftp ce02.lip.pt


If the batch system is PBS, check the WNs, e.g.:

$ globus-job-run pbs-enmr.cerm.unifi.it /usr/bin/pbsnodes -a

Verify that the batch system works: make sure that the queue you are querying really exists and that your VO is enabled on it. For example:


$ globus-job-run ce-cyb.ca.infn.it/jobmanager-lcglsf -queue poncert /bin/pwd

Check that the DGAS processes are running on the CE (with a ps ax | grep dgas)
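
The process check above can be wrapped in a small helper if you want to script it. This is only a sketch: the function name has_dgas_process is made up for illustration, and it only assumes that the DGAS processes show "dgas" somewhere in their command line.

```shell
# Hypothetical helper: reads a "ps ax" listing on stdin and succeeds
# if at least one line mentions dgas.
has_dgas_process() {
    # The [d] trick keeps this grep from matching its own command line,
    # which "ps ax" may list while the pipeline is running.
    grep -q '[d]gas'
}

# Usage on the CE:
#   ps ax | has_dgas_process && echo "dgas is running" || echo "dgas is NOT running"
```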

Cream-CE checks

Open your browser to

https://<hostname-of-cream-ce>:8443/ce-cream/services

A page with a link to the CREAM WSDL should be shown

Try a gsiftp transfer (e.g. using globus-url-copy or uberftp) towards that CREAM CE. E.g.:

$ globus-url-copy gsiftp://<hostname-of-cream-ce>/opt/glite/yaim/etc/versions/ig-yaim file:/tmp/ig-version-test

Try the following command:

$ glite-ce-allowed-submission <hostname-of-cream-ce>:8443

It should report:

Job Submission to this CREAM CE is enabled
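
If you run this check from a script, you can test for the expected message instead of reading it by eye. A minimal sketch, assuming the message is printed exactly as shown above; the function name submission_enabled is hypothetical.

```shell
# Hypothetical helper: reads the output of glite-ce-allowed-submission
# on stdin and succeeds only if the expected message is present.
submission_enabled() {
    grep -q 'Job Submission to this CREAM CE is enabled'
}

# Usage:
#   glite-ce-allowed-submission <hostname-of-cream-ce>:8443 | submission_enabled \
#       && echo "OK" || echo "submission not enabled (or CE not reachable)"
```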

Try a submission to Cream-CE using the glite-ce-job-submit command, e.g.:

$ /bin/cat sleep.jdl

[ executable="/bin/sleep"; arguments="1"; ]

$ glite-ce-job-submit -a -r <hostname-of-cream-ce>:8443/<queue> sleep.jdl

$ glite-ce-job-submit -a -r ce-cr-02.ts.infn.it:8443/cream-lsf-cert sleep.jdl
https://ce-cr-02.ts.infn.it:8443/CREAM127814374

Check the status of that job, which should eventually be DONE-OK

$ glite-ce-job-status https://ce-cr-02.ts.infn.it:8443/CREAM127814374
2010-07-27 11:55:37,986 WARN - No configuration file suitable for loading. Using built-in configuration

       Status        = [DONE-OK]
       ExitCode      = [0]
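
To check the status from a script, the Status field can be pulled out of the output shown above. This is a sketch that assumes the "Status = [...]" layout of the example; the function name cream_job_status is made up for illustration.

```shell
# Hypothetical helper: reads glite-ce-job-status output on stdin and
# prints the value found between the brackets of the "Status" line.
cream_job_status() {
    sed -n 's/^[[:space:]]*Status[[:space:]]*=[[:space:]]*\[\(.*\)][[:space:]]*$/\1/p'
}

# Usage:
#   glite-ce-job-status <job-id> | cream_job_status
```

Lines that do not start with "Status", such as the WARN line or ExitCode, are simply ignored.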

Try a submission to that CE using the glite-ce-job-submit command, and then try to cancel it (using the glite-ce-job-cancel command)

$ /bin/cat sleep2.jdl

[ executable="/bin/sleep"; arguments="1000"; ]

$ glite-ce-job-submit -a -r cecream-cyb.ca.infn.it:8443/cream-lsf-poncert sleep2.jdl
https://cecream-cyb.ca.infn.it:8443/CREAM126335182

$ glite-ce-job-cancel https://cecream-cyb.ca.infn.it:8443/CREAM126335182

$ glite-ce-job-status https://cecream-cyb.ca.infn.it:8443/CREAM126335182
2010-07-27 12:18:26,973 WARN - No configuration file suitable for loading. Using built-in configuration

       Status        = [CANCELLED]
       ExitCode      = []
       Description   = [Cancelled by user]
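
The submit, cancel and status steps all need the same job ID, so when scripting them it helps to capture the ID printed by glite-ce-job-submit. A minimal sketch, assuming the ID has the https://<host>:<port>/CREAM<digits> form seen in the examples; the function name cream_job_id is hypothetical.

```shell
# Hypothetical helper: reads glite-ce-job-submit output on stdin and
# prints the first CREAM job ID found in it.
cream_job_id() {
    grep -o 'https://[^[:space:]]*/CREAM[0-9]*' | head -n 1
}

# Usage:
#   id=$(glite-ce-job-submit -a -r <hostname-of-cream-ce>:8443/<queue> sleep2.jdl | cream_job_id)
#   glite-ce-job-cancel "$id"
#   glite-ce-job-status "$id"
```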

SE checks

Check if the gridftp server on the SE works (NOTE: this command is no longer present on the SL5 UI):

$ edg-gridftp-ls gsiftp://inaf-se-01.ct.pi2s2.it/

Check if the SRM client works (the right port to use can be found in the published information)

$ clientSRM ping -e httpg://sunstorm.cnaf.infn.it:8444

================================================

Sending Ping request to: httpg://sunstorm.cnaf.infn.it:8444

================================================

Request status:

 statusCode="SRM_SUCCESS"(0)
 explanation="SRM server successfully contacted"
================================================

SRM Response:

 versionInfo="v2.2"
 otherInfo (size=2)
   [0] key="backend_type"
   [0] value="StoRM"
   [1] key="backend_version"
   [1] value="<FE:1.5.0-1.sl4><BE:1.5.3-4.sl4>"
================================================
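
In a script, the ping result can be reduced to its status code. This sketch assumes the statusCode="..." line appears as in the output above; the function name srm_status_code is made up for illustration.

```shell
# Hypothetical helper: reads "clientSRM ping" output on stdin and
# prints the value of statusCode, e.g. SRM_SUCCESS.
srm_status_code() {
    sed -n 's/.*statusCode="\([^"]*\)".*/\1/p'
}

# Usage:
#   [ "$(clientSRM ping -e httpg://sunstorm.cnaf.infn.it:8444 | srm_status_code)" = "SRM_SUCCESS" ] \
#       && echo "SRM OK"
```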

If you want, try to write on the SE. Make sure your UI points to an information system (IS) that contains the SE (you may use our certification BDII), e.g.:

$ export LCG_GFAL_INFOSYS=gridit-bdii-01.cnaf.infn.it:2170
$ lcg-cr -v --vo glast.org -d storm-fe-cg.cr.cnaf.infn.it -l lfn:/grid/glast.org/wfug.jdl file:/home/paolini/rank.jdl
$ lcg-del -v --vo glast.org -a <guid>
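
The <guid> passed to lcg-del is the one returned by lcg-cr. A small sketch for capturing it in a script follows; it assumes that lcg-cr prints the GUID as a "guid:..." token on its standard output (check this against your client version), and the function name lcg_guid is hypothetical.

```shell
# Hypothetical helper: reads lcg-cr output on stdin and prints the
# last "guid:..." token found (assumed output format, see above).
lcg_guid() {
    grep -o 'guid:[0-9a-fA-F-]*' | tail -n 1
}

# Usage:
#   guid=$(lcg-cr -v --vo glast.org -d storm-fe-cg.cr.cnaf.infn.it \
#            -l lfn:/grid/glast.org/wfug.jdl file:/home/paolini/rank.jdl | lcg_guid)
#   lcg-del -v --vo glast.org -a "$guid"
```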

Job submission

Submit a test job to either the lcg-CE or the Cream-CE through the WMS, i.e. using the glite-wms-job-submit command. If the site supports MPI, also submit an MPI test job. Our certification WMS is gridit-cert-wms.cnaf.infn.it
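
A minimal test job for the WMS could look like the sketch below. The JDL content, the file names (wms-test.jdl, jobid.txt) and the function name write_wms_test_jdl are only illustrative assumptions, not the official certification job.

```shell
# Hypothetical helper: writes a very small JDL that runs /bin/hostname
# and brings back its standard output and standard error.
write_wms_test_jdl() {
    cat > "$1" <<'EOF'
[
  Executable = "/bin/hostname";
  StdOutput = "std.out";
  StdError = "std.err";
  OutputSandbox = {"std.out", "std.err"};
]
EOF
}

# Usage (on a UI configured to use the certification WMS):
#   write_wms_test_jdl wms-test.jdl
#   glite-wms-job-submit -a -o jobid.txt wms-test.jdl
#   glite-wms-job-status -i jobid.txt
#   glite-wms-job-output -i jobid.txt
```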

Registration into 1st level HLR

After the site has entered production, its resources need to be registered in the HLR. Ask the site-admins to open a ticket to the HLR administrators, passing them the following information:

   * Grid queue names, in the form:
         o gridit-ce-001.cnaf.infn.it:2119/jobmanager-lcgpbs-cert 
   * Non-grid queue names, in the form:
         o hostname:queue 
   * Name, surname and certificate subject of each site-admin
   * Certificate subject of the Computing Element 

Finally, the site-admins have to open a ticket to the DGAS support unit asking to enable the forwarding of accounting data from the 2nd level HLR to APEL

Certification Job

The test job checks several things, such as the environment on the WN and the rpms installed. It also performs some replica management tests. With a "grep TEST" you can get a summary of the results: in case of errors, you have to look in detail at what went wrong!

As already said, if the site supports any flavour of MPI, launch an MPI test job. Don't forget to set a reasonable value for CPUNumber: the important thing is that your job starts running soon.

If you want less output in the .out and .err files, comment out the following line in the file mpi-start-wrapper.sh

export I2G_MPI_START_DEBUG=1
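
If you prefer not to edit the wrapper by hand, the line can be commented out with sed. This is just a sketch: the function name disable_mpi_start_debug is hypothetical, and it assumes the line appears in the wrapper exactly as shown above (optionally indented).

```shell
# Hypothetical helper: comments out "export I2G_MPI_START_DEBUG=1"
# in the wrapper script given as first argument.
disable_mpi_start_debug() {
    wrapper=$1
    tmp=$(mktemp) || return 1
    # Prefix the matching line with "# "; all other lines are left untouched.
    # Writing back with cat keeps the permissions of the original file.
    sed 's/^[[:space:]]*export I2G_MPI_START_DEBUG=1/# &/' "$wrapper" > "$tmp" \
        && cat "$tmp" > "$wrapper"
    status=$?
    rm -f "$tmp"
    return $status
}

# Usage:
#   disable_mpi_start_debug mpi-start-wrapper.sh
```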

A successful output will look like the following one (extract)

[...]
mpi-start [DEBUG ]: using user supplied startup : '/opt/mpich-1.2.7p1/bin/mpirun '
mpi-start [DEBUG ]: => MPI_SPECIFIC_PARAMS=
mpi-start [DEBUG ]: => I2G_MPI_PRECOMMAND=
mpi-start [DEBUG ]: => MPIEXEC=/opt/mpich-1.2.7p1/bin/mpirun
mpi-start [DEBUG ]: => I2G_MACHINEFILE_AND_NP=-machinefile /tmp/tmp.iBypc12521 -np 6
mpi-start [DEBUG ]: => I2G_MPI_APPLICATION=/home/dteam022/globus-tmp.t3-wn-13.11955.0/https_3a_2f_2falbalonga.cnaf.infn.it_3a9000_2fI06uWaKi1evxL3tTF-DTOg/hello
mpi-start [DEBUG ]: => I2G_MPI_APPLICATION_ARGS=
mpi-start [DEBUG ]: /opt/mpich-1.2.7p1/bin/mpirun -machinefile /tmp/tmp.iBypc12521 -np 6 /home/dteam022/globus-tmp.t3-wn-13.11955.0/https_3a_2f_2falbalonga.cnaf.infn.it_3a9000_2fI06uWaKi1evxL3tTF-DTOg/hello
Process 4 on t3-wn-37.pn.pd.infn.it out of 6
Process 3 on t3-wn-34.pn.pd.infn.it out of 6
Process 1 on t3-wn-13.pn.pd.infn.it out of 6
Process 2 on t3-wn-34.pn.pd.infn.it out of 6
Process 5 on t3-wn-37.pn.pd.infn.it out of 6
Process 0 on t3-wn-13.pn.pd.infn.it out of 6
[...]
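
A quick way to confirm that every MPI process reported in is to count the "Process N on <host> out of M" lines in the job output. The sketch below assumes the output file has one such line per process, as printed by the hello test program; the function name check_mpi_ranks is made up for illustration.

```shell
# Hypothetical helper: reads the job output on stdin and succeeds only if
# each rank from 0 to M-1 appears exactly once, where M is the "out of M" value.
check_mpi_ranks() {
    awk '
        /^Process [0-9]+ on .* out of [0-9]+/ {
            rank = $2          # rank number of this process
            total = $NF        # total number of processes
            seen[rank]++
            n++
        }
        END {
            if (n == 0 || n != total) exit 1
            for (i = 0; i < total; i++)
                if (seen[i] != 1) exit 1
        }
    '
}

# Usage:
#   check_mpi_ranks < <job-output-file> && echo "all MPI processes answered"
```

All other lines, including the mpi-start debug lines, are ignored by the check.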