HOWTO04 Site Certification Manual tests
Revision as of 10:27, 29 August 2014
This page provides instructions on how to manually test the functionality of the grid and cloud elements.
This check is mandatory for sites that want to be included in the EGI Production infrastructure.
Check the functionality of the grid elements
Be sure that the site's GIIS URL is contained in the Top level BDII/Information System your NGI will use for your certification.
Note that the examples here use the Italian NGI and sites. Please substitute YOUR OWN NGI and site credentials when running the test.
Cream-CE checks
Open your browser to
https://<hostname-of-cream-ce>:8443/ce-cream/services
A page with a link to the CREAM WSDL should be shown.
Try a gsiftp transfer (e.g. using globus-url-copy or uberftp) towards that CREAM CE. E.g.:
$ globus-url-copy gsiftp://<hostname-of-cream-ce>/opt/glite/yaim/etc/versions/ig-yaim file:/tmp/ig-version-test
Try the following command:
$ glite-ce-allowed-submission <hostname-of-cream-ce>:8443
It should report:
Job Submission to this CREAM CE is enabled
Try a submission to Cream-CE using the glite-ce-job-submit command, e.g.:
$ /bin/cat sleep.jdl
[
executable="/bin/sleep";
arguments="1";
]
$ glite-ce-job-submit -a -r <hostname-of-cream-ce>:8443/<queue> sleep.jdl
$ glite-ce-job-submit -a -r ce-cr-02.ts.infn.it:8443/cream-lsf-cert sleep.jdl
https://ce-cr-02.ts.infn.it:8443/CREAM127814374
Check the status of that job, which eventually should be DONE-OK
$ glite-ce-job-status https://ce-cr-02.ts.infn.it:8443/CREAM127814374
2010-07-27 11:55:37,986 WARN - No configuration file suitable for loading. Using built-in configuration
****** JobID=[https://ce-cr-02.ts.infn.it:8443/CREAM127814374]
Status = [DONE-OK]
ExitCode = [0]
Try a submission to that CE using the glite-ce-job-submit command, and then try to cancel the job (using the glite-ce-job-cancel command):
$ /bin/cat sleep2.jdl
[
executable="/bin/sleep";
arguments="1000";
]
$ glite-ce-job-submit -a -r cecream-cyb.ca.infn.it:8443/cream-lsf-poncert sleep2.jdl
https://cecream-cyb.ca.infn.it:8443/CREAM126335182
$ glite-ce-job-cancel https://cecream-cyb.ca.infn.it:8443/CREAM126335182
$ glite-ce-job-status https://cecream-cyb.ca.infn.it:8443/CREAM126335182
2010-07-27 12:18:26,973 WARN - No configuration file suitable for loading. Using built-in configuration
****** JobID=[https://cecream-cyb.ca.infn.it:8443/CREAM126335182]
Status = [CANCELLED]
ExitCode = []
Description = [Cancelled by user]
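When the status check is repeated from a script, the job state can be extracted from the command output instead of being read by eye. The following is only a minimal sketch: it assumes the "Status = [...]" layout shown in the examples above, and the function name cream_job_state is illustrative, not part of the gLite clients.

```shell
# Read glite-ce-job-status output on stdin and print the first job state found.
# Assumption: the state appears as "Status = [VALUE]", as in the examples above.
cream_job_state() {
    sed -n 's/.*Status *= *\[\([^]]*\)\].*/\1/p' | head -n 1
}
```

It could be used, for example, as: glite-ce-job-status <job-id> | cream_job_state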
ARC CE checks
A first test can be done using ARC's ngstat command:
$ export X509_USER_PROXY=/etc/nagios/globus/userproxy.pem-ops
$ export LD_LIBRARY_PATH=/opt/nordugrid/lib64:/opt/nordugrid/lib
$ /opt/nordugrid/bin/ngstat -q -l -c <CE hostname> -t 20
...
... plenty of output ...
If a monitoring host of your NGI is available, then the probes can easily be executed from there:
Check the status of the CE with:
$ /usr/libexec/grid-monitoring/probes/org.ndgf/ARCCE-status -H <CE hostname> -x /etc/nagios/globus/userproxy.pem-ops
Status is active
Test gsiftp:
$ /usr/libexec/grid-monitoring/probes/org.ndgf/ARCCE-auth -H <CE hostname> -x /etc/nagios/globus/userproxy.pem-ops
gsiftp OK
Test the versions of the CAs:
$ /usr/libexec/grid-monitoring/probes/org.ndgf/ARCCE-caver -H <CE hostname> -x /etc/nagios/globus/userproxy.pem-ops
version = 1.38 - All CAs present
Check the versions of ARC and Globus:
$ /usr/libexec/grid-monitoring/probes/org.ndgf/ARCCE-softver -H <CE hostname> -x /etc/nagios/globus/userproxy.pem-ops
nordugrid-arc-0.8.3.1, globus-5.0.3
Copy a file:
$ /usr/libexec/grid-monitoring/probes/org.ndgf/ARCCE-gridftp -H <CE hostname> -x /etc/nagios/globus/userproxy.pem-ops
Job finished successfully
Submit a test job:
$ /usr/libexec/grid-monitoring/probes/org.ndgf/ARCCE-jobsubmit -H <CE hostname> --vo ops -x /etc/nagios/globus/userproxy.pem-ops
Job submission successful
Check the LFC:
$ /usr/libexec/grid-monitoring/probes/org.ndgf/ARCCE-lfc -H <CE hostname> -x /etc/nagios/globus/userproxy.pem-ops
Job finished successfully
Check the SRM:
$ /usr/libexec/grid-monitoring/probes/org.ndgf/ARCCE-srm -H <CE hostname> -x /etc/nagios/globus/userproxy.pem-ops
Job finished successfully
Before continuing, make sure that the probes for all services that the CE intends to offer actually succeed.
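Running the probes one after the other can be scripted. The sketch below loops over the probe names used above and prints a one-line summary for each. It assumes that the probes follow the usual Nagios convention of exiting with a non-zero status on failure; the function name run_arc_probes and the PROBE_DIR variable are illustrative and not part of the probe package.

```shell
# Directory containing the ARCCE-* probes (default taken from the examples above).
PROBE_DIR=${PROBE_DIR:-/usr/libexec/grid-monitoring/probes/org.ndgf}

# Usage: run_arc_probes <CE hostname> <proxy file>
# Prints OK/FAIL per probe and returns the number of failed probes.
run_arc_probes() {
    host=$1
    proxy=$2
    failed=0
    for probe in status auth caver softver gridftp jobsubmit lfc srm; do
        # Only the job submission probe takes the VO option in the examples above.
        if [ "$probe" = jobsubmit ]; then
            set -- --vo ops
        else
            set --
        fi
        # Assumption: a non-zero exit status means the probe failed.
        if "$PROBE_DIR/ARCCE-$probe" -H "$host" "$@" -x "$proxy" >/dev/null 2>&1; then
            echo "OK   ARCCE-$probe"
        else
            echo "FAIL ARCCE-$probe"
            failed=$((failed + 1))
        fi
    done
    return "$failed"
}
```

For example: run_arc_probes <CE hostname> /etc/nagios/globus/userproxy.pem-ops. Probes for services the CE does not offer will show up as FAIL and can be ignored.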
SE checks
Check that the gridftp server on the SE works:
$ uberftp inaf-se-01.ct.pi2s2.it
For a StoRM SE: check that the SRM client works (the published information shows the right port to use)
$ /opt/storm/srm-clients/bin/clientSRM ping -e httpg://sunstorm.cnaf.infn.it:8444
============================================================
Sending Ping request to: httpg://sunstorm.cnaf.infn.it:8444
============================================================
Request status:
statusCode="SRM_SUCCESS"(0)
explanation="SRM server successfully contacted"
============================================================
SRM Response:
versionInfo="v2.2"
otherInfo (size=2)
[0] key="backend_type"
[0] value="StoRM"
[1] key="backend_version"
[1] value="<FE:1.5.0-1.sl4><BE:1.5.3-4.sl4>"
============================================================
Try to write to the SE. Make sure your UI points to an information system (IS) that contains the SE (you may use your certification BDII).
1) Set a top-BDII that publishes the SE you want to test
$ export LCG_GFAL_INFOSYS=<TopBDII hostname>:2170
2) Copy a file from the local filesystem to the SE, registering it in the LFC. The command output returns a SURL that you can use later for other tests.
A SURL is a path of the type: srm://srm01.ncg.ingrid.pt/ibergrid/iber/generated/2011-02-01/file4034a935-8d7a-48f4-914f-16f2634d4802
$ lcg-cr -v --vo <VO> -d <Your SE> -l lfn:/grid/<VO>/test.txt file:</path/to/your/local/file>
3) Create a new replica on another SE (to check the third-party transfer between two SEs)
$ lcg-rep -v --vo <VO> -d <Other SE> <SURL>
4) List Replicas
$ lcg-lr -v --vo <VO> lfn:/grid/<VO>/test.txt
5) Delete all replicas
$ lcg-del -v --vo <VO> -a <guid>
Job submission
Submit a test job to either lcg-CE or Cream-CE through the WMS, i.e. using the glite-wms-job-submit command. If the site supports MPI, also submit an MPI test job. The NGI_IT certification WMS is gridit-cert-wms.cnaf.infn.it
Registration into 1st level HLR
NOTE: this step is needed if your infrastructure uses DGAS as its accounting system
Once the site has entered production, its resources need to be registered in the HLR. Ask the site admins to open a ticket to the HLR administrators, passing them the following information:
- grid queue names, in the form:
- gridit-ce-001.cnaf.infn.it:2119/jobmanager-lcgpbs-cert
- non-grid queue names, in the form:
- hostname:queue
- Name, surname and certificate subject of each site admin
- Certificate subject of the Computing Element
Finally, the site admins have to open a ticket to the DGAS support unit asking to enable the forwarding of accounting data from the 2nd level HLR to APEL
Certification Job
The test job checks several things, such as the environment on the WN and the installed RPMs. It also performs some replica management tests. Running "grep TEST" on the job output gives a summary of the results; in case of errors, check in detail what went wrong.
As noted above, if the site supports any flavour of MPI, launch an MPI test job as well.
Don't forget to set a reasonable value for CPUNumber: what matters most is that your job starts running quickly.
If you want less output in the .out and .err files, comment out the following line in the file mpi-start-wrapper.sh:
export I2G_MPI_START_DEBUG=1
A successful output will look like the following (extract):
[...]
mpi-start [DEBUG ]: using user supplied startup : '/opt/mpich-1.2.7p1/bin/mpirun '
mpi-start [DEBUG ]: => MPI_SPECIFIC_PARAMS=
mpi-start [DEBUG ]: => I2G_MPI_PRECOMMAND=
mpi-start [DEBUG ]: => MPIEXEC=/opt/mpich-1.2.7p1/bin/mpirun
mpi-start [DEBUG ]: => I2G_MACHINEFILE_AND_NP=-machinefile /tmp/tmp.iBypc12521 -np 6
mpi-start [DEBUG ]: => I2G_MPI_APPLICATION=/home/dteam022/globus-tmp.t3-wn-13.11955.0/https_3a_2f_2falbalonga.cnaf.infn.it_3a9000_2fI06uWaKi1evxL3tTF-DTOg/hello
mpi-start [DEBUG ]: => I2G_MPI_APPLICATION_ARGS=
mpi-start [DEBUG ]: /opt/mpich-1.2.7p1/bin/mpirun -machinefile /tmp/tmp.iBypc12521 -np 6 /home/dteam022/globus-tmp.t3-wn-13.11955.0/https_3a_2f_2falbalonga.cnaf.infn.it_3a9000_2fI06uWaKi1evxL3tTF-DTOg/hello
Process 4 on t3-wn-37.pn.pd.infn.it out of 6
Process 3 on t3-wn-34.pn.pd.infn.it out of 6
Process 1 on t3-wn-13.pn.pd.infn.it out of 6
Process 2 on t3-wn-34.pn.pd.infn.it out of 6
Process 5 on t3-wn-37.pn.pd.infn.it out of 6
Process 0 on t3-wn-13.pn.pd.infn.it out of 6
[...]
Globus checks
These checks should be executed depending on the services registered in GOCDB under a Resource Centre. Not all services are compulsory for a RC, but upon registration of new ones, the corresponding tests should be executed.
GSISSH
Initialize grid proxy and check if GSISSH works:
$ grid-proxy-init
$ gsissh USER@HOST -p 2222 /bin/date
(Debug with: gsissh USER@HOST -vvv -p 2222 /bin/date)
GridFTP
Check if upload works:
$ globus-url-copy file:/tmp/test.txt gsiftp://HOST:2811/tmp/test.txt
(Debug with: globus-url-copy -dbg -v -vb file:/tmp/test.txt gsiftp://HOST:2811/tmp/test.txt)
Check if download works:
$ globus-url-copy gsiftp://HOST:2811/tmp/test.txt file:/tmp/test.txt
(Debug with: globus-url-copy -dbg -v -vb gsiftp://HOST:2811/tmp/test.txt file:/tmp/test.txt)
Delete the remote file:
$ uberftp HOST 'rm /tmp/test.txt'
(Debug with: uberftp HOST 'rm /tmp/test.txt' -debug 3)
GRAM
Check authentication:
$ globusrun -a -r HOST:2119
Check job submission:
$ globusrun -s -r HOST:2119 '&(executable="/bin/date")'
Unicore checks
This testing manual assumes that the test instance has not been added to the “Global” registry. A “Global” registry does not have to be global (for the whole infrastructure): it is a registry used by a group of sites which work together. For example, each Resource Infrastructure Provider can have its own “Global” registry.
It is suggested to add the instance to the “Global” registry only after it has been tested and works properly. For this reason, these instructions refer to the local registry.
Preliminary testing
After installation and configuration, start all the services and check that they function properly. To avoid errors and warnings in the logs, first start the TSI and the Gateway, and then Unicore/X (which requires the other two servers to operate).
The first step is to verify that the log files of all services are properly configured and that the services are running. Logs for Unicore/X and Gateway are in the standard locations /var/log/unicore/unicorex/unicorex.log and /var/log/unicore/gateway/gateway.log. If there is no log file, check the file /var/log/unicore/unicorex/unicorex-startup.log or /var/log/unicore/gateway/gateway-startup.log: those files contain the servers' standard output and can be useful in case of generic, system-wide issues such as a missing Java virtual machine.
Log files should be checked carefully for warnings and errors. They should show only the information about the start of the service, without any warnings (the WARN label) or errors (the ERROR label).
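A quick way to spot such entries is to count the lines carrying the WARN or ERROR label. The following is a rough sketch, assuming the labels appear as separate words on each log line (as log4j normally writes them); the function name count_log_problems is illustrative.

```shell
# Usage: count_log_problems <log file>
# Prints the number of lines containing the WARN or ERROR label.
# Returns 0 if there are none, 1 if some were found, 2 if the file is unreadable.
count_log_problems() {
    [ -r "$1" ] || return 2
    # grep -c exits non-zero when the count is 0, hence the "|| true".
    n=$(grep -cwE 'WARN|ERROR' "$1" || true)
    echo "$n"
    [ "$n" -eq 0 ]
}
```

For example: count_log_problems /var/log/unicore/unicorex/unicorex.log. A non-zero count only indicates that the log needs to be read; it does not replace reading it.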
In case of problems, proceed according to the information found in the log files. If it is unclear, increase the logging detail (for Unicore/X and Gateway). This is set in the files /etc/unicore/gateway/logging.properties and /etc/unicore/unicorex/logging.properties. UNICORE uses the log4j logging subsystem. After changing the logging parameters, it is not necessary to restart the component.
After the successful initialization of all services, you can begin to test them in practice. Connect to the site via any UNICORE client (URC or UCC). Since the registration of the newly created VSite in the global registry was initially turned off, use the local registry.
The local registry address is: https://GATEWAY_ADDRESS/VSITE_NAME/services/Registry?res=default_registry.
For testing, it is recommended to execute a script that displays the user name. This should be the user associated with the certificate.
Testing using the URC
- Testing should start with setting up the user's credentials.
- A local registry should be added in the URC Grid Browser view.
- The registry contents should be listed by double-clicking on its node. It is worth enabling the display of all sites by clicking the "Show" button in the Grid Browser and selecting "All services" from the list. If you see a red cross on a service, click on it and check the details of the error message in the URC and the error on the server side.
- If all services are available, you can submit a job. At the same time, it is recommended to monitor the Unicore/X and TSI logs for errors.
Testing using the UCC
- Configure UCC credentials
- Configure the registry in UCC preferences file (the registry property).
- Invoke:
./ucc shell
ucc> connect
You can access 1 target system(s).
ucc> list-sites
VSITE_NAME https://GATEWAY_ADDRESS/VSITE_NAME/services/TargetSystemService?res=XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX
ucc> list-storages
SHARE https://GATEWAY_ADDRESS/VSITE_NAME/services/services/StorageManagement?res=default_storage
ucc> ls u6://SHARE
/a5063ea0-ecbe-4097-9abc-f55ec9437376
/3f501d37-5851-4c9e-a1da-5ad7b9f16633
/3bf169c2-2149-4564-b827-0b6560a3dd35
...
ucc> list-applications
Applications on target system <VSITE_NAME>
R 2.10.0
BLAST 2.2.22
POVRay 3.6.1
...
The output should be similar to the above.
Then test the file transfer:
ucc> put-file -s LOCAL_FILE_PATH -t https://GATEWAY_ADDRESS/VSITE_NAME/services/StorageManagement?res=default_storage#TARGET_FILE_NAME
and job submission:
ucc> run -s VSITE_NAME JOB_FILE_PATH.u
SUCCESSFUL exit code: 0
If an error occurs, you can add the "-v" flag to any of these commands to increase UCC verbosity. As in the URC case, it is advised to simultaneously monitor the Unicore/X and TSI log files.
After testing
If testing was successful, you can unlock the registration system in the global registry.
QCG checks
QCG Computing checks
The tests of the QCG-Computing service presented here use qcg-comp, the client program for QCG-Computing, which may be installed from the provided RPMs. To connect to QCG-Computing, a grid proxy must be created.
Generate user’s proxy:
$ grid-proxy-init
Your identity: /C=PL/O=GRID/O=PSNC/CN=Mariusz Mamonski
Enter GRID pass phrase for this identity:
Creating proxy ............................ Done
Your proxy is valid until: Fri Jun 10 06:23:32 2011
Query the QCG-Computing service:
$ qcg-comp -G | xmllint --format - # xmllint is used only to present the result in a more readable way
<bes-factory:FactoryResourceAttributesDocument xmlns:bes-factory="http://schemas.ggf.org/bes/2006/08/bes-factory">
… a lot of information …
</bes-factory:FactoryResourceAttributesDocument>
Submit a sample job:
$ qcg-comp -c -J /opt/plgrid/qcg/share/qcg-comp/doc/examples/date.xml
Activity Id: ccb6b04a-887b-4027-633f-412375559d73
Query its status:
$ qcg-comp -s -a ccb6b04a-887b-4027-633f-412375559d73
status = Executing
$ qcg-comp -s -a ccb6b04a-887b-4027-633f-412375559d73
status = Finished
exit status = 0
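The final status check can also be done in a script. The sketch below reads the status output on stdin and succeeds only when the job has finished with exit status 0. It assumes the "status = ..." and "exit status = ..." lines shown above; the function name qcg_finished_ok is illustrative and not part of the QCG client.

```shell
# Read "qcg-comp -s" output on stdin.
# Returns 0 only if "status = Finished" and "exit status = 0" were both seen.
qcg_finished_ok() {
    awk -F ' *= *' '
        { sub(/^[ \t]+/, ""); sub(/[ \t\r]+$/, "") }   # trim surrounding blanks
        $1 == "status"      { st = $2 }
        $1 == "exit status" { ex = $2 }
        END { exit !(st == "Finished" && ex == "0") }
    '
}
```

For example: qcg-comp -s -a <activity id> | qcg_finished_ok && echo "job OK"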
QCG Notification checks
The tests of QCG-Notification require the qcg-ntf-client program to be installed on the system. The program is provided as an RPM package.
Create a sample subscription:
$ qcg-ntf-client -d -S "cons=http://127.0.0.1:2212 top=http://schemas.qoscosgrid.org/comp/2011/04/notification/topic;//*;Full"
...
INF May 17 14:15:51 1128 0xa0262720 [qcg-client-gsoa] Subscribed, subRef: '810917963'
...
Remove the created subscription:
$ qcg-ntf-client -d -U "id=810917963"
...
INF May 17 14:41:48 3318 0xa0262720 [qcg-client-gsoa] Unsubscribed: '810917963'
…
Checking the connection with QCG-Computing:
In one shell, run ‘tail -f’ on the QCG-Computing log file, and in another, submit a sample job using the qcg-comp program (as described above). Check the tail output for error messages about sending notifications. For example, the following lines mean that connection problems occurred:
$ tail -f /opt/qcg/var/log/qcg-comp/qcg-compd.log
INF Oct 04 10:55:33 18929 0x2adadc2abe30 [notification_ws] Sending notify: 320f014c-3181-4daf-bbd9-1824b7d8216a -> Queued
NOT Oct 04 10:55:33 18929 0x2adadc2abe30 [.....ntf_client] FaultCode: 'SOAP-ENV:Client'
NOT Oct 04 10:55:33 18929 0x2adadc2abe30 [.....ntf_client] FaultString: 'smcm:ActivityState'
NOT Oct 04 10:55:33 18929 0x2adadc2abe30 [.....ntf_client] FaultDetail: '<SOAP-ENV:Detail xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/">connect failed in tcp_connect()</SOAP-ENV:Detail>'
ERR Oct 04 10:55:33 18929 0x2adadc2abe30 [notification_ws] Failed to send notification to http://grass1.man.poznan.pl:19011/
QCG Broker checks
The basic tests of the QCG-Broker service can be carried out with qcg-simple-client, which provides a set of commands for interacting with QCG-Broker. qcg-simple-client may be installed from RPMs.
Create a sample job description:
$ cat > sleep.qcg << EOF
#!/bin/bash
#QCG queue=plgrid
#QCG host=nova.wcss.wroc.pl
#QCG persistent
sleep 30
EOF
Submit a job:
$ qcg-sub sleep.qcg
https://qcg-broker.man.poznan.pl:8443/qcg/services/
/C=PL/O=GRID/O=PSNC/CN=qcg-broker/qcg-broker.man.poznan.pl
Your identity: C=PL,O=GRID,O=PSNC,CN=Bartosz Bosak
Enter GRID pass phrase for this identity:
Creating proxy, please wait...
Proxy verify OK
Your proxy is valid until Tue Mar 12 14:50:27 CET 2013
UserDN = /C=PL/O=GRID/O=PSNC/CN=Bartosz Bosak
ProxyLifetime = 24 Days 23 Hours 59 Minutes 58 Seconds
jobId = J1360936230540__0152
Check the job statuses:
$ qcg-info
https://qcg-broker.man.poznan.pl:8443/qcg/services/
/C=PL/O=GRID/O=PSNC/CN=qcg-broker/qcg-broker.man.poznan.pl
UserDN = /C=PL/O=GRID/O=PSNC/CN=Bartosz Bosak
ProxyLifetime = 24 Days 23 Hours 59 Minutes 49 Seconds
Command translated to: "task_info" "J1360936230540__0152" "task"
Note:
UserDN: /C=PL/O=GRID/O=PSNC/CN=Bartosz Bosak
TaskType: SINGLE
SubmissionTime: Fri Feb 15 14:50:31 CET 2013
FinishTime:
ProxyLifetime: PT0S
Status: PREPROCESSING
StatusDesc:
StartTime: Fri Feb 15 14:50:33 CET 2013
Allocation:
HostName: nova.wcss.wroc.pl
ProcessesCount: 1
ProcessesGroupId:
Status: PREPROCESSING
StatusDescription:
SubmissionTime: Fri Feb 15 14:50:32 CET 2013
FinishTime:
LocalSubmissionTime: Fri Feb 15 14:50:37 CET 2013
LocalStartTime:
LocalFinishTime:
$ qcg-info
https://qcg-broker.man.poznan.pl:8443/qcg/services/
/C=PL/O=GRID/O=PSNC/CN=qcg-broker/qcg-broker.man.poznan.pl
UserDN = /C=PL/O=GRID/O=PSNC/CN=Bartosz Bosak
ProxyLifetime = 24 Days 23 Hours 59 Minutes 23 Seconds
Command translated to: "task_info" "J1360936230540__0152" "task"
Note:
UserDN: /C=PL/O=GRID/O=PSNC/CN=Bartosz Bosak
TaskType: SINGLE
SubmissionTime: Fri Feb 15 14:50:31 CET 2013
FinishTime:
ProxyLifetime: PT0S
Status: RUNNING
StatusDesc:
StartTime: Fri Feb 15 14:50:33 CET 2013
Allocation:
HostName: nova.wcss.wroc.pl
ProcessesCount: 1
ProcessesGroupId:
Status: RUNNING
StatusDescription:
SubmissionTime: Fri Feb 15 14:50:32 CET 2013
FinishTime:
LocalSubmissionTime: Fri Feb 15 14:50:37 CET 2013
LocalStartTime: Fri Feb 15 14:50:47 CET 2013
LocalFinishTime:
$ qcg-info
https://qcg-broker.man.poznan.pl:8443/qcg/services/
/C=PL/O=GRID/O=PSNC/CN=qcg-broker/qcg-broker.man.poznan.pl
UserDN = /C=PL/O=GRID/O=PSNC/CN=Bartosz Bosak
ProxyLifetime = 24 Days 23 Hours 56 Minutes 10 Seconds
Command translated to: "task_info" "J1360936230540__0152" "task"
Note:
UserDN: /C=PL/O=GRID/O=PSNC/CN=Bartosz Bosak
TaskType: SINGLE
SubmissionTime: Fri Feb 15 14:50:31 CET 2013
FinishTime: Fri Feb 15 14:52:17 CET 2013
ProxyLifetime: PT0S
Status: FINISHED
StatusDesc:
StartTime: Fri Feb 15 14:50:33 CET 2013
Allocation:
HostName: nova.wcss.wroc.pl
ProcessesCount: 1
ProcessesGroupId:
Status: FINISHED
StatusDescription:
SubmissionTime: Fri Feb 15 14:50:32 CET 2013
FinishTime: Fri Feb 15 14:52:12 CET 2013
LocalSubmissionTime: Fri Feb 15 14:50:37 CET 2013
LocalStartTime: Fri Feb 15 14:50:47 CET 2013
LocalFinishTime: Fri Feb 15 14:52:09 CET 2013
Check the functionality of the cloud elements
Cloud Compute (OCCI) checks
NOTE: A prerequisite for running the following commands is the installation of the EGI CLI environment, according to this guide.
- Go to the AppDB and look for a generic OS image that is a member of the Federated Cloud VO (which all sites need to support) and/or of any other VO supported by the site, e.g. the CentOS 6 minimal image (https://appdb.egi.eu/store/software/centos.6.minimal)
- Check that the site is visible in the AppDB "Availability and Usage" panel for the image. If not, the site has probably not registered the FedCloud VO in its middleware (vmcatcher) or has not properly configured the BDII provider script.
- On the AppDB "Availability and Usage" panel, click on the Site name, then on the latest VM Image version, select the resource template with the smallest quantity of resources (RAM & CPU) and click on the "get IDs" button on the right of the resource template. You will get the "Site Endpoint", "Template ID" and "OCCI ID". Save these values since they will be needed in the next steps.
- Generate a set of random keys for your user (it is not required to set a passphrase for the keys, since these are just temporary keys for the test)
[spinto@w43asd ~]$ ssh-keygen -t rsa -b 2048 -f tempkey
- Create a simple contextualization script, to set up access keys on the machine and test contextualization
[spinto@w43asd ~]$ cat << EOF > ctx.txt
Content-Type: multipart/mixed; boundary="===============4393449873403893838=="
MIME-Version: 1.0

--===============4393449873403893838==
Content-Type: text/x-shellscript; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment; filename="deploy.sh"

#!/bin/bash
echo "OK" > /tmp/deployment.log

--===============4393449873403893838==
Content-Type: text/cloud-config; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment; filename="userdata.txt"

#cloud-config
users:
  - name: testadm
    sudo: ALL=(ALL) NOPASSWD:ALL
    lock-passwd: true
    ssh-import-id: testadm
    ssh-authorized-keys:
      - `cat tempkey.pub`

--===============4393449873403893838==--
EOF
- Create a proxy registered for the fedcloud VO (or any other VO you want to check for the site)
[spinto@w43asd ~]$ voms-proxy-init -dont_verify_ac -rfc -voms fedcloud.egi.eu
Enter GRID pass phrase for this identity:
Contacting voms1.egee.cesnet.cz:15002 [/DC=org/DC=terena/DC=tcs/C=CZ/O=CESNET/CN=voms1.egee.cesnet.cz] "fedcloud.egi.eu"...
Remote VOMS server contacted succesfully.
Created proxy in /tmp/x509up_u500.
Your proxy is valid until Thu Aug 14 05:23:26 CEST 2014
- Try to describe OS templates
[spinto@w43asd ~]$ occi --endpoint https://carach5.ics.muni.cz:11443 --auth x509 --voms --user-cred /tmp/x509up_u500 -a describe -r os_tpl
[...]
[[ http://occi.carach5.ics.muni.cz/occi/infrastructure/os_tpl#uuid_egi_compss_debian_7_0_x86_64_0001_fedcloud_dukan_74 ]]
title: EGI-COMPSs-Debian-7.0-x86_64-0001@fedcloud-dukan
term: uuid_egi_compss_debian_7_0_x86_64_0001_fedcloud_dukan_74
location: /mixin/os_tpl/uuid_egi_compss_debian_7_0_x86_64_0001_fedcloud_dukan_74/
[...]
where the site endpoint is the one you retrieved from AppDB. You can also check that the OCCI ID provided by AppDB is listed in the interface reply.
- Try to describe the resource templates
[spinto@w43asd ~]$ occi --endpoint https://carach5.ics.muni.cz:11443 --auth x509 --voms --user-cred /tmp/x509up_u500 -a describe -r resource_tpl
[...]
[[ http://schema.fedcloud.egi.eu/occi/infrastructure/resource_tpl#small ]]
title: Small Instance - 1 core and 2 GB RAM
term: small
location: /mixin/resource_tpl/small/
[...]
You can check here if the Resource ID provided by AppDB is present in the interface reply.
- Create a block storage entity
[spinto@w43asd ~]$ occi --endpoint https://carach5.ics.muni.cz:11443 --auth x509 --voms --user-cred /tmp/x509up_u500 -a create -r storage -t occi.storage.size='num(1)' -t occi.core.title='volatile-disk-test1'
https://carach5.ics.muni.cz:11443/storage/195
Take note of the new storage block ID
- Check that the storage block has been created correctly via
[spinto@w43asd ~]$ occi --endpoint https://carach5.ics.muni.cz:11443 --auth x509 --voms --user-cred /tmp/x509up_u500 -a describe -r /storage/195
- Start a VM (with block storage attached to it)
[spinto@w43asd ~]$ occi --endpoint https://carach5.ics.muni.cz:11443/ --auth x509 --voms --user-cred /tmp/x509up_u500 --action create --resource compute --context user_data=file://$PWD/ctx.txt --mixin=os_tpl#uuid_metacloud_scilinux_6_5_x86_64_0001_fedcloud_dukan_73 --mixin=resource_tpl#small --attribute occi.core.title=cert-test --link /storage/195
https://carach5.ics.muni.cz:11443/compute/44093
where the mixins are the OCCI ID and Resource ID as provided by AppDB, the argument of --link is the ID of the storage created above, and --context user_data=file:// contains the path to the contextualization script created above.
- Check that the VM comes online within a reasonable time by polling the OCCI compute interface until the VM reaches the 'active' state
[spinto@w43asd ~]$ occi --endpoint https://carach5.ics.muni.cz:11443/ --auth x509 --voms --user-cred /tmp/x509up_u500 --action describe -r /compute/44093
[...]
occi.compute.state = active
[...]
The startup should take no more than 1-2 minutes on average (5 minutes at most); otherwise there may be some problem with the site.
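The polling can be scripted with a timeout. The sketch below repeatedly runs whatever describe command it is given and stops as soon as the output contains "occi.compute.state = active" or the timeout is reached. The function name wait_for_active and its argument order are illustrative; the elapsed time is only approximate, since the run time of the describe command itself is not counted.

```shell
# Usage: wait_for_active <timeout seconds> <interval seconds> <command...>
# Returns 0 once the command output reports the 'active' state, 1 on timeout.
wait_for_active() {
    timeout=$1
    interval=$2
    shift 2
    elapsed=0
    while :; do
        # Run the supplied describe command and look for the active state.
        if "$@" 2>/dev/null | grep -q 'occi\.compute\.state *= *active'; then
            echo "active after ${elapsed}s"
            return 0
        fi
        [ "$elapsed" -ge "$timeout" ] && break
        sleep "$interval"
        elapsed=$((elapsed + interval))
    done
    echo "not active after ${elapsed}s" >&2
    return 1
}
```

For example, to allow the 5 minutes mentioned above: wait_for_active 300 10 occi --endpoint https://carach5.ics.muni.cz:11443/ --auth x509 --voms --user-cred /tmp/x509up_u500 --action describe -r /compute/44093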
- Take note of the external IP of your VM from the "compute describe"
[spinto@w43asd ~]$ occi --endpoint https://carach5.ics.muni.cz:11443/ --auth x509 --voms --user-cred /tmp/x509up_u500 --action describe -r /compute/44093
[...]
occi.networkinterface.address = 147.32.3.54
[...]
- Check that the IP provided by the "compute describe" operation above is an external public IP. If not, manually request an external IP via
[spinto@w43asd ~]$ occi --endpoint https://carach5.ics.muni.cz:11443/ --auth x509 --voms --user-cred /tmp/x509up_u500 --action link -r /compute/44093 -j /network/public
and then check, using the command above, that the service now provides a public IP
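Whether an address is public can be checked mechanically. The following sketch only tests for the RFC 1918 private ranges (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16); other non-public ranges, such as link-local addresses, are not covered. The function name is_private_ipv4 is illustrative.

```shell
# Usage: is_private_ipv4 <IPv4 address>
# Returns 0 if the address is in an RFC 1918 private range, 1 otherwise.
is_private_ipv4() {
    case $1 in
        10.*|192.168.*) return 0 ;;
        172.1[6-9].*|172.2[0-9].*|172.3[01].*) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example: is_private_ipv4 147.32.3.54 || echo "public address"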
- SSH to the machine via the provided IP and the temporary key and check that you have sudo rights and that the contextualization has been applied correctly
[spinto@w43asd ~]$ ssh -i tempkey testadm@147.32.3.54
[testadm@cert-test ~]$ sudo su -
[root@cert-test ~]$ cat /tmp/deployment.log
OK
- Check that the disk has been properly attached to the VM (there should be a 1GB unmounted disk visible by the VM)
[root@cert-test ~]$ fdisk -l
[...]
Disk /dev/xvdd: 1073 MB, 1073741824 bytes
255 heads, 63 sectors/track, 130 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000000
[...]
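On a VM with several disks, the expected one can be picked out of the fdisk listing by its size. The sketch below prints the devices whose size in bytes matches the requested value; it assumes "Disk <device>: ..., <N> bytes" lines as in the output above, and the function name disks_of_size is illustrative.

```shell
# Usage: fdisk -l | disks_of_size <size in bytes>
# Prints the device name of every disk whose reported size matches.
disks_of_size() {
    awk -v want="$1" '
    $1 == "Disk" {
        for (i = 3; i <= NF; i++)
            # The size in bytes is the field just before the word "bytes".
            if ($i ~ /^bytes,?$/ && $(i - 1) == want) {
                dev = $2
                sub(/:$/, "", dev)
                print dev
            }
    }'
}
```

For the 1 GB disk created above: fdisk -l | disks_of_size 1073741824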
- Detach the block storage from the VM
[spinto@w43asd ~]$ occi --endpoint https://carach5.ics.muni.cz:11443/ --auth x509 --voms --user-cred /tmp/x509up_u500 --action unlink -r /compute/44093 --link /storage/195
- Check now that the disk is not visible anymore in the VM
[root@cert-test ~]$ fdisk -l
- Now try to attach the block storage entity to the VM while it is running
[spinto@w43asd ~]$ occi --endpoint https://carach5.ics.muni.cz:11443/ --auth x509 --voms --user-cred /tmp/x509up_u500 --action link -r /compute/44093 --link /storage/195
- Check that the block storage is visible again from the VM
[root@cert-test ~]$ fdisk -l
[...]
Disk /dev/xvdd: 1073 MB, 1073741824 bytes
255 heads, 63 sectors/track, 130 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000000
[...]
- Detach and delete the block storage
[spinto@w43asd ~]$ occi --endpoint https://carach5.ics.muni.cz:11443/ --auth x509 --voms --user-cred /tmp/x509up_u500 --action unlink -r /compute/44093 --link /storage/195
[spinto@w43asd ~]$ occi --endpoint https://carach5.ics.muni.cz:11443/ --auth x509 --voms --user-cred /tmp/x509up_u500 --action delete -r /storage/195
- Delete the VM
[spinto@w43asd ~]$ occi --endpoint https://carach5.ics.muni.cz:11443/ --auth x509 --voms --user-cred /tmp/x509up_u500 --action delete -r /compute/44093
- Check that you do not have VMs running on the site anymore via
[spinto@w43asd ~]$ occi --endpoint https://carach5.ics.muni.cz:11443/ --auth x509 --voms --user-cred /tmp/x509up_u500 --action describe -r compute
Cloud Storage (CDMI) checks
NOTE: A prerequisite for running the following commands is the installation of the EGI CLI environment, according to this guide.
- Create a proxy registered for the fedcloud VO (or any other VO you want to check for the site)
[spinto@w43asd ~]$ voms-proxy-init -dont_verify_ac -rfc -voms fedcloud.egi.eu
Enter GRID pass phrase for this identity:
Contacting voms1.egee.cesnet.cz:15002 [/DC=org/DC=terena/DC=tcs/C=CZ/O=CESNET/CN=voms1.egee.cesnet.cz] "fedcloud.egi.eu"...
Remote VOMS server contacted succesfully.
Created proxy in /tmp/x509up_u500.
Your proxy is valid until Thu Aug 14 05:23:26 CEST 2014
- List the content of the repository:
[user@client]# ./bcdmi -e https://prisma-swift.ba.infn.it:8080/ list /
- Create a test folder:
[user@client]# ./bcdmi -e https://prisma-swift.ba.infn.it:8080/ mkdir test
{
  "completionStatus": "Complete",
  "objectName": "test/",
  "capabilitiesURI": "/cdmi/AUTH_113d9a9a671944648722e890ecb94d36/cdmi_capabilities/container/",
  "parentURI": "/cdmi/AUTH_113d9a9a671944648722e890ecb94d36/",
  "objectType": "application/cdmi-container",
  "metadata": {}
}
- Create a test file:
[user@client]# echo "TEST OK" > testfile
- Upload the file:
[user@client]# ./bcdmi -e https://prisma-swift.ba.infn.it:8080/ put -T testfile test/test.txt
- Try to download the file:
[user@client]# ./bcdmi -e https://prisma-swift.ba.infn.it:8080/ get test/test.txt -o testfile.downloaded
[user@client]# cat testfile.downloaded
TEST OK
- Delete the file:
[user@client]# ./bcdmi -e https://prisma-swift.ba.infn.it:8080/ delete test/test.txt
- Check that the file is not present anymore
[user@client]# ./bcdmi -e https://prisma-swift.ba.infn.it:8080/ list test/
- Upload the file again:
[user@client]# ./bcdmi -e https://prisma-swift.ba.infn.it:8080/ put -T testfile test/test.txt
- Delete a folder and all its files:
[user@client]# ./bcdmi -e https://prisma-swift.ba.infn.it:8080/ delete -r test/
- Check that the folder does not exist anymore
[user@client]# ./bcdmi -e https://prisma-swift.ba.infn.it:8080/ list /
Back to Site Certification GIIS Check HOWTO03
Back to Resource Centre registration and certification procedure PROC09
Revision history
Version | Authors | Date | Comments |
---|---|---|---|
1.0 | Alessandro Paolini | 2010-12-15 | first draft |
1.1 | Alessandro Paolini | 2010-12-16 | added links to certification job pages |
1.2 | Alessandro Paolini | 2011-06-08 | added some other lcg-utils test |
1.3 | Malgorzata Krakowian | 2012-10-15 | added Globus and Unicore check instructions |
1.4 | Malgorzata Krakowian | 2013-03-01 | added QCG check instructions |
1.5 | Salvatore Pinto | 2013-08-14 | added cloud compute (OCCI) and cloud storage (CDMI) instructions |