The wiki is deprecated and due to be decommissioned by the end of September 2022.
The content is being migrated to other platforms; new updates will be ignored and lost.
If needed, you can get in touch with the EGI SDIS team using operations @ egi.eu.

Difference between revisions of "HOWTO04 Site Certification Manual tests"


Revision as of 15:10, 5 October 2011

Check the functionality of the grid elements

Make sure that the site's GIIS URL is included in the top-level BDII/Information System that your NGI will use for the certification.
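One way to verify this from a UI is to query the top-level BDII over LDAP. The following is only a minimal sketch: the helper name site_in_top_bdii and the LDAPSEARCH override are hypothetical, and it assumes the usual GLUE 1.x layout in which a site is published under mds-vo-name=<SITE>,mds-vo-name=local,o=grid on port 2170.

```shell
# Hypothetical helper, not part of the original procedure.
# Returns 0 if the top-level BDII publishes an entry for the given site.
# Assumption: the site is published under
#   mds-vo-name=<SITE>,mds-vo-name=local,o=grid   (port 2170)
site_in_top_bdii() {
    bdii_host=$1
    site=$2
    # LDAPSEARCH can be overridden, e.g. to test without a real BDII.
    "${LDAPSEARCH:-ldapsearch}" -x -LLL \
        -H "ldap://$bdii_host:2170" \
        -b "mds-vo-name=$site,mds-vo-name=local,o=grid" \
        -s base '(objectClass=*)' 1.1 2>/dev/null | grep -q '^dn:'
}

# Usage (placeholders, substitute your own values):
#   site_in_top_bdii <TopBDII hostname> <SITE-NAME> && echo "site is published"
```

The attribute list 1.1 asks the server for no attributes, so only the dn line is returned when the entry exists.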


Note that the examples here use the Italian NGI and sites. Please substitute YOUR OWN NGI and site credentials when running the test.

lcg-CE checks

Verify authentication and authorization on the CE by running a simple command, e.g.:

$ globus-job-run inaf-ce-01.ct.pi2s2.it /bin/hostname 

(You could also use /usr/bin/whoami or any other simple command.)

Check if the lcg-CE gridftp server is working:

$ globus-url-copy -dbg -v -vb file:/home/csys/goncalo/teste.txt gsiftp://ce02.lip.pt/tmp/txt
$ uberftp ce02.lip.pt


If the batch system is PBS, check the worker nodes (WNs) with the following command:

$ globus-job-run pbs-enmr.cerm.unifi.it /usr/bin/pbsnodes -a

Verify that the batch system works. Make sure that the queue you are querying really exists and that your VO is enabled on it. For example:

$ globus-job-run ce02.lip.pt:2119/jobmanager-fork /bin/pwd
$ globus-job-run ce-cyb.ca.infn.it/jobmanager-lcglsf -queue poncert /bin/pwd 
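When several job managers or CEs have to be checked, the same kind of command can be wrapped in a small loop. This is only a sketch: the function name check_ce_contacts and the JOB_RUN override are hypothetical, and it relies on globus-job-run returning a non-zero exit code on failure.

```shell
# Hypothetical helper: run /bin/hostname on each contact string and
# report OK/FAIL. Returns non-zero if at least one contact failed.
check_ce_contacts() {
    failed=0
    for contact in "$@"; do
        # JOB_RUN can be overridden, e.g. to test without a grid UI.
        if "${JOB_RUN:-globus-job-run}" "$contact" /bin/hostname >/dev/null 2>&1; then
            echo "OK   $contact"
        else
            echo "FAIL $contact"
            failed=1
        fi
    done
    return "$failed"
}

# Usage (contact strings taken from the examples above):
#   check_ce_contacts ce02.lip.pt:2119/jobmanager-fork inaf-ce-01.ct.pi2s2.it
```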

Check that the DGAS processes are running on the CE (e.g. with ps ax | grep dgas).

Cream-CE checks

Open your browser to

https://<hostname-of-cream-ce>:8443/ce-cream/services

A page with a link to the CREAM WSDL should be shown.

Try a gsiftp transfer (e.g. using globus-url-copy or uberftp) towards that CREAM CE, e.g.:

$ globus-url-copy gsiftp://<hostname-of-cream-ce>/opt/glite/yaim/etc/versions/ig-yaim file:/tmp/ig-version-test  

Try the following command:

$ glite-ce-allowed-submission <hostname-of-cream-ce>:8443

It should report:

Job Submission to this CREAM CE is enabled  
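If this check has to be scripted, the expected message can be matched with grep. A minimal sketch, in which the function name cream_submission_enabled and the ALLOWED_CMD override are hypothetical:

```shell
# Hypothetical helper: returns 0 only if the CREAM CE reports that
# job submission is enabled (message text as shown above).
cream_submission_enabled() {
    # ALLOWED_CMD can be overridden, e.g. to test without a grid UI.
    "${ALLOWED_CMD:-glite-ce-allowed-submission}" "$1:8443" 2>/dev/null |
        grep -q 'Job Submission to this CREAM CE is enabled'
}

# Usage:
#   cream_submission_enabled <hostname-of-cream-ce> || echo "submission not enabled"
```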

Try a submission to Cream-CE using the glite-ce-job-submit command, e.g.:

$ /bin/cat sleep.jdl 
 
[ 
executable="/bin/sleep"; 
arguments="1"; 
] 
$ glite-ce-job-submit -a -r <hostname-of-cream-ce>:8443/<queue> sleep.jdl
$ glite-ce-job-submit -a -r ce-cr-02.ts.infn.it:8443/cream-lsf-cert sleep.jdl 
https://ce-cr-02.ts.infn.it:8443/CREAM127814374

Check the status of that job, which should eventually be DONE-OK:

$ glite-ce-job-status https://ce-cr-02.ts.infn.it:8443/CREAM127814374
2010-07-27 11:55:37,986 WARN - No configuration file suitable for loading. Using built-in configuration

******  JobID=[https://ce-cr-02.ts.infn.it:8443/CREAM127814374]
       Status        = [DONE-OK]
       ExitCode      = [0]
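To use the status in a script, the value between the square brackets can be extracted from the output. This is a sketch that assumes the output layout shown above; the function name cream_job_status is hypothetical.

```shell
# Hypothetical helper: read glite-ce-job-status output on stdin and
# print the value of the first "Status = [...]" line.
cream_job_status() {
    sed -n 's/^[[:space:]]*Status[[:space:]]*=[[:space:]]*\[\([^]]*\)].*$/\1/p' |
        head -n 1
}

# Usage:
#   glite-ce-job-status <job ID> | cream_job_status
```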

Submit another job to that CE using the glite-ce-job-submit command, and then try to cancel it (using the glite-ce-job-cancel command):

$ /bin/cat sleep2.jdl 
 
[ 
executable="/bin/sleep"; 
arguments="1000"; 
] 
$ glite-ce-job-submit -a -r cecream-cyb.ca.infn.it:8443/cream-lsf-poncert sleep2.jdl
https://cecream-cyb.ca.infn.it:8443/CREAM126335182
$ glite-ce-job-cancel https://cecream-cyb.ca.infn.it:8443/CREAM126335182
$ glite-ce-job-status https://cecream-cyb.ca.infn.it:8443/CREAM126335182
2010-07-27 12:18:26,973 WARN - No configuration file suitable for loading. Using built-in configuration

******  JobID=[https://cecream-cyb.ca.infn.it:8443/CREAM126335182]
       Status        = [CANCELLED]
       ExitCode      = []
       Description   = [Cancelled by user]

ARC CE checks

A first test can be done using ARC's ngstat command:

$ export X509_USER_PROXY=/etc/nagios/globus/userproxy.pem-ops
$ export LD_LIBRARY_PATH=/opt/nordugrid/lib64:/opt/nordugrid/lib
$ /opt/nordugrid/bin/ngstat -q -l -c <CE hostname> -t 20
...
... plenty of output
...

If a monitoring host of your NGI is available, then the probes can easily be executed from there:

Check the status of the CE with:

$ /usr/libexec/grid-monitoring/probes/org.ndgf/ARCCE-status -H <CE hostname> -x /etc/nagios/globus/userproxy.pem-ops
Status is active

Test gsiftp:

$ /usr/libexec/grid-monitoring/probes/org.ndgf/ARCCE-auth -H <CE hostname> -x /etc/nagios/globus/userproxy.pem-ops
gsiftp OK

Check the version of the CAs:

$ /usr/libexec/grid-monitoring/probes/org.ndgf/ARCCE-caver -H <CE hostname> -x /etc/nagios/globus/userproxy.pem-ops
version = 1.38 - All CAs present

Check the versions of ARC and Globus:

$ /usr/libexec/grid-monitoring/probes/org.ndgf/ARCCE-softver -H <CE hostname> -x /etc/nagios/globus/userproxy.pem-ops
nordugrid-arc-0.8.3.1, globus-5.0.3

Copy a file:

$ /usr/libexec/grid-monitoring/probes/org.ndgf/ARCCE-gridftp -H <CE hostname> -x /etc/nagios/globus/userproxy.pem-ops
Job finished successfully

Submit a test job:

$ /usr/libexec/grid-monitoring/probes/org.ndgf/ARCCE-jobsubmit -H <CE hostname> --vo ops -x /etc/nagios/globus/userproxy.pem-ops
Job submission successful

Check the LFC:

$ /usr/libexec/grid-monitoring/probes/org.ndgf/ARCCE-lfc -H <CE hostname> -x /etc/nagios/globus/userproxy.pem-ops
Job finished successfully

Check the SRM:

$ /usr/libexec/grid-monitoring/probes/org.ndgf/ARCCE-srm -H <CE hostname> -x /etc/nagios/globus/userproxy.pem-ops
Job finished successfully

Before continuing, you may want to make sure that the probes for all the services that the CE intends to offer actually succeed.
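The probes listed above can also be run in one go. The following is only a sketch: the function name run_arc_probes and the PROBE_DIR override are hypothetical, and it assumes that each probe follows the Nagios convention of returning 0 when the check is OK.

```shell
# Hypothetical helper: run the ARC CE probes shown above against one CE.
# Returns non-zero if at least one probe fails.
run_arc_probes() {
    host=$1
    proxy=${2:-/etc/nagios/globus/userproxy.pem-ops}
    # PROBE_DIR can be overridden, e.g. to test with stub probes.
    probe_dir=${PROBE_DIR:-/usr/libexec/grid-monitoring/probes/org.ndgf}
    failed=0
    for probe in status auth caver softver gridftp jobsubmit lfc srm; do
        if [ "$probe" = jobsubmit ]; then
            # The job submission probe additionally takes the VO.
            set -- -H "$host" --vo ops -x "$proxy"
        else
            set -- -H "$host" -x "$proxy"
        fi
        if ! "$probe_dir/ARCCE-$probe" "$@"; then
            echo "ARCCE-$probe failed" >&2
            failed=1
        fi
    done
    return "$failed"
}

# Usage:
#   run_arc_probes <CE hostname>
```

If the CE does not offer some of the services (for example LFC or SRM access), remove the corresponding names from the loop.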

SE checks

Check if the gridftp server on the SE works:

$ uberftp inaf-se-01.ct.pi2s2.it 

For a StoRM SE: check if the SRM client works (the right port to use can be found in the published information):

$ /opt/storm/srm-clients/bin/clientSRM ping -e httpg://sunstorm.cnaf.infn.it:8444
============================================================
Sending Ping request to: httpg://sunstorm.cnaf.infn.it:8444
============================================================
Request status:
  statusCode="SRM_SUCCESS"(0)
  explanation="SRM server successfully contacted"
============================================================
SRM Response:
  versionInfo="v2.2"
  otherInfo (size=2)
    [0] key="backend_type"
    [0] value="StoRM"
    [1] key="backend_version"
    [1] value="<FE:1.5.0-1.sl4><BE:1.5.3-4.sl4>"
============================================================
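In a script, the ping result can be checked by looking for the SRM_SUCCESS status code in the output. A minimal sketch, assuming the output format shown above; the function name srm_ping_ok is hypothetical.

```shell
# Hypothetical helper: read clientSRM ping output on stdin and
# return 0 only if the request status is SRM_SUCCESS.
srm_ping_ok() {
    grep -q 'statusCode="SRM_SUCCESS"'
}

# Usage:
#   /opt/storm/srm-clients/bin/clientSRM ping -e httpg://<SE hostname>:8444 | srm_ping_ok
```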


Try to write to the SE. Make sure your UI is pointing to an information system (IS) that contains the SE (you may use your certification BDII).

1) Set a top-BDII that publishes the SE you have to test

$ export LCG_GFAL_INFOSYS=<TopBDII hostname>:2170

2) Copy a file from the local filesystem to the SE, registering it in the LFC. The output of this command returns a SURL that you can use later for other tests.

A SURL is a path of the type: srm://srm01.ncg.ingrid.pt/ibergrid/iber/generated/2011-02-01/file4034a935-8d7a-48f4-914f-16f2634d4802

$ lcg-cr -v --vo <VO> -d <Your SE> -l lfn:/grid/<VO>/test.txt file:</path/to/your/local/file> 

3) Create a new replica on another SE (to check the third-party transfer between two SEs)

$ lcg-rep -v --vo <VO> -d <Other SE> <SURL> 

4) List Replicas

$ lcg-lr -v --vo <VO> lfn:/grid/<VO>/test.txt 

5) Delete all replicas

$ lcg-del -v --vo <VO> -a <guid>
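Steps 2 to 5 can be chained so that the sequence stops at the first failure. This is a sketch under stated assumptions: the helper names step and se_write_test and the DRY_RUN switch are hypothetical, and the LFN is passed to lcg-rep and lcg-del in place of the SURL and the guid, on the assumption that both commands accept an LFN there.

```shell
# Hypothetical helper: print a command, then run it unless DRY_RUN=1.
step() {
    echo "+ $*"
    if [ "${DRY_RUN:-0}" = 1 ]; then
        return 0
    fi
    "$@"
}

# Hypothetical helper: write, replicate, list and delete a test file.
# Assumption: lcg-rep and lcg-del accept the LFN instead of SURL/guid.
se_write_test() {
    vo=$1
    se=$2
    other_se=$3
    local_file=$4
    lfn="lfn:/grid/$vo/test.txt"
    step lcg-cr -v --vo "$vo" -d "$se" -l "$lfn" "file:$local_file" &&
    step lcg-rep -v --vo "$vo" -d "$other_se" "$lfn" &&
    step lcg-lr -v --vo "$vo" "$lfn" &&
    step lcg-del -v --vo "$vo" -a "$lfn"
}

# Usage (LCG_GFAL_INFOSYS must already be set as in step 1):
#   DRY_RUN=1 se_write_test <VO> <Your SE> <Other SE> </path/to/your/local/file>
```

Running it first with DRY_RUN=1 only prints the commands, which makes it easy to check the arguments before touching the SE.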

Job submission

Submit a test job to either the lcg-CE or the Cream-CE through the WMS, i.e. using the glite-wms-job-submit command. If the site supports MPI, also submit an MPI test job. The NGI_IT certification WMS is gridit-cert-wms.cnaf.infn.it.

Registration into 1st level HLR

NOTE: this step is needed only if your infrastructure uses DGAS as its accounting system

After the site has entered production, its resources need to be registered in the HLR. Ask the site admins to open a ticket to the HLR administrators, passing them the following information:

  • grid queue names, in the form:
    • gridit-ce-001.cnaf.infn.it:2119/jobmanager-lcgpbs-cert
  • non-grid queue names, in the form:
    • hostname:queue
  • Name, surname and certificate subject of each site admin
  • Certificate subject of the Computing Element

Finally, the site admins have to open a ticket to the DGAS support unit asking to enable the forwarding of accounting data from the 2nd level HLR to APEL.

Certification Job

The test job checks several things, such as the environment on the WN and the installed rpms. It also performs some replica management tests. With a "grep TEST" you can get a summary of the results: in case of errors, you have to look in detail at what went wrong.

As already said, if the site supports any flavour of MPI, launch an MPI test job as well (see SiteCertMan/MPI Job Cert).

Don't forget to set a reasonable value for CPUNumber: what matters most is that your job starts running quickly.

If you want less output in the .out and .err files, comment out the following line in the file mpi-start-wrapper.sh:

export I2G_MPI_START_DEBUG=1 

A successful output will look like the following (extract):

[...] 
mpi-start [DEBUG  ]: using user supplied startup : '/opt/mpich-1.2.7p1/bin/mpirun ' 
mpi-start [DEBUG  ]: => MPI_SPECIFIC_PARAMS= 
mpi-start [DEBUG  ]: => I2G_MPI_PRECOMMAND= 
mpi-start [DEBUG  ]: => MPIEXEC=/opt/mpich-1.2.7p1/bin/mpirun 
mpi-start [DEBUG  ]: => I2G_MACHINEFILE_AND_NP=-machinefile /tmp/tmp.iBypc12521 -np 6 
mpi-start [DEBUG  ]: => I2G_MPI_APPLICATION=/home/dteam022/globus-tmp.t3-wn-13.11955.0/https_3a_2f_2falbalonga.cnaf.infn.it_3a9000_2fI06uWaKi1evxL3tTF-DTOg/hello 
mpi-start [DEBUG  ]: => I2G_MPI_APPLICATION_ARGS= 
mpi-start [DEBUG  ]: /opt/mpich-1.2.7p1/bin/mpirun -machinefile /tmp/tmp.iBypc12521 -np 6 /home/dteam022/globus-tmp.t3-wn-13.11955.0/https_3a_2f_2falbalonga.cnaf.infn.it_3a9000_2fI06uWaKi1evxL3tTF-DTOg/hello 
Process 4 on t3-wn-37.pn.pd.infn.it out of 6 
Process 3 on t3-wn-34.pn.pd.infn.it out of 6 
Process 1 on t3-wn-13.pn.pd.infn.it out of 6 
Process 2 on t3-wn-34.pn.pd.infn.it out of 6 
Process 5 on t3-wn-37.pn.pd.infn.it out of 6 
Process 0 on t3-wn-13.pn.pd.infn.it out of 6 
[...]
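A quick way to check such an output is to verify that every rank reported. This is a sketch that assumes the "Process N on host out of M" lines printed by the hello test program above; the function name mpi_ranks_complete is hypothetical.

```shell
# Hypothetical helper: read the job output on stdin and return 0 only if
# the number of distinct ranks equals the total in "out of M".
mpi_ranks_complete() {
    awk '
        $1 == "Process" && $3 == "on" && $(NF-1) == "of" {
            seen[$2] = 1      # rank number
            total = $NF + 0   # M in "out of M"
        }
        END {
            n = 0
            for (r in seen) n++
            exit (total > 0 && n == total) ? 0 : 1
        }'
}

# Usage (the .out file name depends on your JDL):
#   mpi_ranks_complete < <job output file> && echo "all ranks reported"
```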

Back to Site Certification GIIS Check HOWTO03

Back to Resource Centre registration and certification procedure PROC09

Revision history

Version  Authors             Date        Comments
1.0      Alessandro Paolini  2010-12-15  first draft
1.1      Alessandro Paolini  2010-12-16  added links to certification job pages
1.2      Alessandro Paolini  2011-06-08  added some other lcg-utils test