
HOWTO04 Site Certification Manual tests

=Check the functionality of the grid elements=
{{Template: Op menubar}} {{Template:Doc_menubar}}


Be sure that the site's GIIS URL is contained in the BDII you use for certification, that is, a top-BDII that also contains uncertified sites, which each NGI can provide and use for site certification purposes.
{{DeprecatedAndMovedTo|new_location=https://docs.egi.eu/providers/operations-manuals/howto04_site_certification_manual_tests/}}


Note that the examples here use the Italian NGI and sites.  Please substitute '''YOUR OWN''' NGI and site credentials when running the test.
[[Category:Operations_Manuals]]
 
==lcg-CE checks==
 
lcg-CE is the name of one of the computing elements currently supported; glite-CE was another one, which no longer exists.
 
Verify authentication and authorization on the CE by running a simple command, e.g.:
 
'''''$ globus-job-run inaf-ce-01.ct.pi2s2.it /bin/hostname'''''
(you could also use /usr/bin/whoami or any other simple command)
 
Check that the lcg-CE gridftp server is working:
 
'''''$ globus-url-copy -dbg -v -vb file:/home/csys/goncalo/teste.txt gsiftp://ce02.lip.pt/tmp/txt'''''
 
'''''$ uberftp ce02.lip.pt'''''
 
 
If the batch system is PBS, check the WNs with the following command:
 
'''''$ globus-job-run pbs-enmr.cerm.unifi.it /usr/bin/pbsnodes -a'''''
 
Verify that the batch system works: make sure that the queue you are querying really exists and that your VO is enabled on it. For example:
 
'''''$ globus-job-run ce02.lip.pt:2119/jobmanager-fork /bin/pwd'''''
 
'''''$ globus-job-run ce-cyb.ca.infn.it/jobmanager-lcglsf -queue poncert /bin/pwd'''''
 
Check that the DGAS processes are running on the CE (e.g. with ''ps ax | grep dgas'').
 
==Cream-CE checks==
 
Point your browser to:
 
'''''<nowiki>https://<hostname-of-cream-ce>:8443/ce-cream/services</nowiki>'''''

A page with links to the CREAM WSDL should be shown.
 
Try a gsiftp transfer (e.g. using globus-url-copy or uberftp) towards that CREAM CE, e.g.:
 
'''''$ globus-url-copy gsiftp://<hostname-of-cream-ce>/opt/glite/yaim/etc/versions/ig-yaim file:/tmp/ig-version-test''''' 
 
Try the following command:
 
'''''$ glite-ce-allowed-submission <hostname-of-cream-ce>:8443'''''
 
It should report:
 
Job Submission to this CREAM CE is enabled 
 
Try a submission to Cream-CE using the glite-ce-job-submit command, e.g.:
 
$ /bin/cat sleep.jdl
 
[
executable="/bin/sleep";
arguments="1";
]
 
'''''$ glite-ce-job-submit -a -r <hostname-of-cream-ce>:8443/<queue> sleep.jdl''''' 
 
'''''$ glite-ce-job-submit -a -r ce-cr-02.ts.infn.it:8443/cream-lsf-cert sleep.jdl'''''
<nowiki>https://ce-cr-02.ts.infn.it:8443/CREAM127814374</nowiki>
 
Check the status of that job, which should eventually be DONE-OK:
 
'''''$ glite-ce-job-status <nowiki>https://ce-cr-02.ts.infn.it:8443/CREAM127814374</nowiki>'''''
2010-07-27 11:55:37,986 WARN - No configuration file suitable for loading. Using built-in configuration
******  JobID=<nowiki>[https://ce-cr-02.ts.infn.it:8443/CREAM127814374]</nowiki>
        Status        = [DONE-OK]
        ExitCode      = [0]
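The status check above can also be scripted. The following is a minimal sketch, assuming the output format shown in the example above; the function name cream_job_status is hypothetical and is not part of the gLite clients. It reads the output of glite-ce-job-status from standard input and prints the content of the Status field.

```shell
# Hypothetical helper (not part of the gLite clients): print the value of
# the "Status" field found in glite-ce-job-status output read from stdin.
# Assumes lines of the form "        Status        = [DONE-OK]".
cream_job_status() {
    sed -n 's/^[[:space:]]*Status[[:space:]]*=[[:space:]]*\[\(.*\)][[:space:]]*$/\1/p'
}
```

Piping the output of glite-ce-job-status into this function should print DONE-OK for a job that completed like the one shown above.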

Submit another job to that CE using the glite-ce-job-submit command, and then try to cancel it (using the glite-ce-job-cancel command):
 
$ /bin/cat sleep2.jdl
 
[
executable="/bin/sleep";
arguments="1000";
]
 
$ glite-ce-job-submit -a -r cecream-cyb.ca.infn.it:8443/cream-lsf-poncert sleep2.jdl
<nowiki>https://cecream-cyb.ca.infn.it:8443/CREAM126335182</nowiki>
 
$ glite-ce-job-cancel <nowiki>https://cecream-cyb.ca.infn.it:8443/CREAM126335182</nowiki>
 
$ glite-ce-job-status <nowiki>https://cecream-cyb.ca.infn.it:8443/CREAM126335182</nowiki>
2010-07-27 12:18:26,973 WARN - No configuration file suitable for loading. Using built-in configuration
******  JobID=<nowiki>[https://cecream-cyb.ca.infn.it:8443/CREAM126335182]</nowiki>
        Status        = [CANCELLED]
        ExitCode      = []
        Description  = [Cancelled by user]
 
==SE checks==
 
Check that the gridftp server on the SE works:
 
$ uberftp inaf-se-01.ct.pi2s2.it
 
For a StoRM SE, check that the SRM client works (the right port to use can be found in the published information):
 
$ /opt/storm/srm-clients/bin/clientSRM ping -e httpg://sunstorm.cnaf.infn.it:8444
============================================================
Sending Ping request to: httpg://sunstorm.cnaf.infn.it:8444
============================================================
Request status:
  statusCode="SRM_SUCCESS"(0)
  explanation="SRM server successfully contacted"
============================================================
SRM Response:
  versionInfo="v2.2"
  otherInfo (size=2)
    [0] key="backend_type"
    [0] value="StoRM"
    [1] key="backend_version"
    [1] value="<FE:1.5.0-1.sl4><BE:1.5.3-4.sl4>"
============================================================
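
When the ping is run from a script, its result can be checked automatically. This is a minimal sketch, assuming the output format shown above; the function name srm_ping_ok is hypothetical and is not part of the StoRM clients. It reads the clientSRM output from standard input and succeeds only if the request status is SRM_SUCCESS.

```shell
# Hypothetical helper (not part of the StoRM clients): exit with status 0
# only if the clientSRM output read from stdin reports SRM_SUCCESS.
# Assumes a line of the form:   statusCode="SRM_SUCCESS"(0)
srm_ping_ok() {
    grep -q 'statusCode="SRM_SUCCESS"'
}
```

For example, the output of the clientSRM ping command above could be piped into srm_ping_ok, and the exit status tested with an if statement.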
 
 
<span style="background:yellow">Add examples with srm commands (e.g. srmls, srmcp and srmrm)</span>
 
Try to write to the SE. Make sure that your UI points to an information system (IS) that contains the SE (you may use your certification BDII), e.g.:
 
$ export LCG_GFAL_INFOSYS=gridit-bdii-01.cnaf.infn.it:2170
 
$ lcg-cr -v --vo glast.org -d storm-fe-cg.cr.cnaf.infn.it -l lfn:/grid/glast.org/wfug.jdl file:/home/paolini/rank.jdl
$ lcg-del -v --vo glast.org -a <guid>
 
==Job submission==
 
Submit a test job to either '''lcg-CE''' or '''Cream-CE''' through the '''WMS''', i.e. using the '''glite-wms-job-submit''' command. If the site supports MPI, also submit an MPI test job. Our certification WMS is gridit-cert-wms.cnaf.infn.it.
 
==Registration into 1st level HLR==
 
'''NOTE: this step is needed only if your infrastructure uses DGAS as its accounting system'''
 
After the site has entered production, its resources need to be registered in the HLR.
Ask the site admins to open a ticket to the HLR administrators, providing the following information:
 
* grid queue names, in the form:
** gridit-ce-001.cnaf.infn.it:2119/jobmanager-lcgpbs-cert
 
* non-grid queue names, in the form:
** hostname:queue
 
* Name, surname and certificate subject of each site admin
* Certificate subject of the Computing Element
 
Finally, the site admins have to open a ticket to the DGAS support unit asking to enable the forwarding of accounting data from the 2nd level HLR to APEL.
 
==Certification Job==
 
The [[Cert_Job|test job]] checks several things, such as the environment on the WN and the installed rpms. It also performs some replica management tests.
With a "grep TEST" you can get a summary of the results: in case of errors, you have to look in detail at what has gone wrong.
 
As already mentioned, if the site supports any flavour of MPI, launch an MPI test job, like <span style="background:yellow">[[SiteCertMan/MPI_Job_Cert|this one]]</span>.
 
Don't forget to set a reasonable value for ''CPUNumber'': what matters is that your job starts running soon.
 
If you want less output in the .out and .err files, comment out the following line in the file mpi-start-wrapper.sh:
 
export I2G_MPI_START_DEBUG=1
 
A successful output will look like the following (extract):
 
[...]
mpi-start [DEBUG  ]: using user supplied startup : '/opt/mpich-1.2.7p1/bin/mpirun '
mpi-start [DEBUG  ]: => MPI_SPECIFIC_PARAMS=
mpi-start [DEBUG  ]: => I2G_MPI_PRECOMMAND=
mpi-start [DEBUG  ]: => MPIEXEC=/opt/mpich-1.2.7p1/bin/mpirun
mpi-start [DEBUG  ]: => I2G_MACHINEFILE_AND_NP=-machinefile /tmp/tmp.iBypc12521 -np 6
mpi-start [DEBUG  ]: => I2G_MPI_APPLICATION=/home/dteam022/globus-tmp.t3-wn-13.11955.0/https_3a_2f_2falbalonga.cnaf.infn.it_3a9000_2fI06uWaKi1evxL3tTF-DTOg/hello
mpi-start [DEBUG  ]: => I2G_MPI_APPLICATION_ARGS=
mpi-start [DEBUG  ]: /opt/mpich-1.2.7p1/bin/mpirun -machinefile /tmp/tmp.iBypc12521 -np 6 /home/dteam022/globus-tmp.t3-wn-13.11955.0/https_3a_2f_2falbalonga.cnaf.infn.it_3a9000_2fI06uWaKi1evxL3tTF-DTOg/hello
Process 4 on t3-wn-37.pn.pd.infn.it out of 6
Process 3 on t3-wn-34.pn.pd.infn.it out of 6
Process 1 on t3-wn-13.pn.pd.infn.it out of 6
Process 2 on t3-wn-34.pn.pd.infn.it out of 6
Process 5 on t3-wn-37.pn.pd.infn.it out of 6
Process 0 on t3-wn-13.pn.pd.infn.it out of 6
[...]
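
Checking that every MPI process reported back can be automated. This is a minimal sketch, assuming that the test program prints one line per process in the form "Process <rank> on <host> out of <N>", as in the extract above; the function name mpi_ranks_ok is hypothetical. It reads the job output from standard input and succeeds only if every rank from 0 to N-1 appears.

```shell
# Hypothetical helper: exit with status 0 only if every rank 0..N-1 appears
# in lines of the form "Process <rank> on <host> out of <N>" read from stdin.
mpi_ranks_ok() {
    awk '
        $1 == "Process" && $3 == "on" && $5 == "out" && $6 == "of" {
            total = $7                    # N as reported by the job
            if (!($2 in seen)) { seen[$2] = 1; count++ }
        }
        END {
            if (count == 0) exit 1        # no MPI process lines found
            for (r = 0; r < total; r++)
                if (!(r in seen)) exit 1  # a rank is missing
            exit 0
        }
    '
}
```

The .out file of the MPI test job can be redirected into mpi_ranks_ok; all other lines, such as the mpi-start debug messages, are ignored.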
 
Back to [[SiteCertMan/GIIS_BDII_check]]
 
Back to [[SiteCertMan#Site_certification_procedure]]
 
= Revision history =
{| border="1" cellspacing="0" cellpadding="5" align="center"
! Version
! Authors
! Date
! Comments
|-
| 1.0
| Alessandro Paolini
| 2010-12-15
| first draft
|-
| 1.1
| Alessandro Paolini
| 2010-12-16
| added links to certification job pages
|}

Latest revision as of 14:51, 25 August 2021