EGI QC Specific

Information model / Information discovery

site-BDII, top-BDII, glite-CLUSTER, arc-infosys fall into this category.

Refer to the generic criteria on GlueSchema compliance (EGI_QC6_Testing, Information Model section).
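
Beyond the generic criteria, a quick sanity check is to query the service's LDAP endpoint directly and confirm that entries are published under both GlueSchema versions. A minimal sketch, assuming a hypothetical host name and the standard BDII port 2170:

  # Hypothetical BDII host; port 2170 and the o=grid / o=glue bases are the usual defaults.
  BDII_HOST=bdii.example.org

  # GlueSchema 1.3 tree: expect GlueService (and GlueCE/GlueSE) entries
  ldapsearch -x -LLL -H ldap://$BDII_HOST:2170 -b o=grid '(objectClass=GlueService)' GlueServiceUniqueID

  # GlueSchema 2.0 tree: expect GLUE2 entries
  ldapsearch -x -LLL -H ldap://$BDII_HOST:2170 -b o=glue '(objectClass=GLUE2Service)' GLUE2ServiceID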

Job Execution Appliances

This category covers Computing Elements products (CREAM, ARC-CE, QCG-COMP, ...)

Interaction with the batch system

Job execution appliances must be able to perform basic job management operations in a batch system:

  • create new jobs,
  • retrieve the status of the jobs submitted by the appliance,
  • cancel jobs, and
  • (optionally) hold and resume jobs

The Appliance may perform these operations for individual jobs or for sets of jobs in order to improve its performance (e.g. instead of querying the status of each individual job, perform a single query for all jobs submitted by the appliance).

Verification must be performed for at least one of the following batch systems:

  • Torque/PBS
  • SGE/OGE
  • SLURM
  • LSF

How to test

  • Submit simple jobs (e.g. sleep for a couple of minutes) to the Job Execution Appliance and check:
    • the jobs are correctly executed in the batch system
    • the status of the job is retrieved correctly and in a timely manner (i.e. status may not be updated in real-time, but it should be available within a short period of time)
    • cancelling a job in the Appliance removes the job from the batch system
  • Submit jobs with some input/output files and assure that the files are correctly transferred.

Sample jobs for some CEs are available at https://github.com/egi-qc/qc-tests/tree/master/tests/jobexecution
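
For illustration, a minimal sketch of this flow for a CREAM CE using the glite-ce-* client commands (the endpoint, queue and job ID below are placeholders; ARC and QCG have equivalent client tools):

  CE=cream.example.org:8443/cream-pbs-ops   # hypothetical endpoint: host:port/batch-queue

  # sleep.jdl: a trivial CREAM JDL payload
  #   [
  #   Executable = "/bin/sleep";
  #   Arguments = "120";
  #   ]

  glite-ce-job-submit -a -r $CE sleep.jdl          # -a: automatic proxy delegation; prints the job ID
  JOBID='https://cream.example.org:8443/CREAM123456789'   # placeholder for the returned ID
  glite-ce-job-status $JOBID                       # should progress to REALLY-RUNNING and then DONE-OK
  # on the CE: qstat (Torque) should show the corresponding batch job
  glite-ce-job-cancel --noint $JOBID               # afterwards the job must be gone from qstat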

Underlying system information

You should check that the information published by the CE is updated regularly and reflects the actual state of the resource (i.e. values are not set to static defaults). The information updates should not compromise the availability of the service.

There is no single (and easy) way to test this feature, since it depends on the dynamic state of the resource being tested. A simple case is the following:

  • Get the current number of free CPUs (or running/queued jobs)
  • Submit several long jobs (sleeping for a long time) and monitor the information system
  • Check that the number of free CPUs (or running/queued jobs) is updated in the information system. Take into account that the change may not be immediate.
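
The attributes to watch can be queried directly from the CE's resource BDII; a minimal sketch, assuming a GlueSchema 1.3 publisher on the standard port (host name is a placeholder):

  CE_HOST=cream.example.org   # hypothetical CE host running a resource BDII on port 2170

  # Repeat before and after submitting the sleep jobs; the values should change.
  ldapsearch -x -LLL -H ldap://$CE_HOST:2170 -b o=grid '(objectClass=GlueCE)' \
      GlueCEStateFreeCPUs GlueCEStateRunningJobs GlueCEStateWaitingJobs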

Multi-node/multi-core jobs

Job Execution Appliances should support multi-node/multi-core jobs. Different support modes are considered:

  • multi-slot request: the job specifies the number of slots, which are allocated following a default policy defined by the site (e.g. filling up machines, using free slots of any machine, etc.)
  • single-machine multi-core request: the job specifies the number of required slots, which are allocated within a single machine.
  • multi-node multi-core request: the job specifies the number of cores and the number of hosts to use (e.g. 4 cores on each of 2 different hosts)
  • exclusive request: the job requests exclusive use of the allocated hosts.

How to test

Submit jobs for testing the different modes listed above and check in the batch system that the allocated slots are as specified.

Sample jobs for some CEs are available at https://github.com/egi-qc/qc-tests/tree/master/tests/jobexecution/
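
As an illustration, the modes above map onto CREAM/WMS-style JDL attributes roughly as follows (a hedged sketch; attribute support varies with the CE version and the batch-system integration):

  // multi-slot request: 8 slots, placement left to the site default policy
  CpuNumber = 8;

  // single-machine multi-core: all 8 slots on one host
  CpuNumber = 8;
  SMPGranularity = 8;

  // multi-node multi-core: 2 hosts with 4 cores each
  CpuNumber = 8;
  HostNumber = 2;
  SMPGranularity = 4;

  // exclusive request: the allocated hosts are reserved for this job only
  WholeNodes = true;
  HostNumber = 2;

After submission, qstat -f (Torque) or the equivalent batch command should show the slot layout that was requested.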

Parallel jobs with mpi-start

mpi-start should be able to detect the batch system and execute parallel jobs with different MPI implementations.

How to test

Submit mpi-start jobs with different slot requirements (see possible cases in the multi-node/multi-core test), using different parallel jobs (dummy, MPI and OpenMP), and check that:

  • mpi-start detects the batch system
  • input files and executables are transferred to the nodes involved in the job
  • MPI execution works without issues

Sample tests are available at https://github.com/egi-qc/qc-tests/blob/master/jobexecution/cream/paralleljobs.sh
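
A hedged sketch of a wrapper script used as the job payload (the binary name is a placeholder; the slot layout is requested in the JDL as in the previous test). Older mpi-start releases are driven by I2G_MPI_* environment variables, while recent ones also accept command-line flags:

  #!/bin/bash
  # Alternative 1: environment-variable interface (older mpi-start releases)
  export I2G_MPI_TYPE=openmpi          # MPI flavour; mpich2, etc. also possible
  export I2G_MPI_APPLICATION=./cpi     # hypothetical MPI binary from the input sandbox
  mpi-start

  # Alternative 2: flag interface (newer releases); -v prints the detected scheduler
  # mpi-start -v -t openmpi ./cpi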

Storage Management Appliances

This category covers Storage Element products (DPM, dCache, StoRM).

SRM compliance

Execute tests with an SRM client that:

  • pings the SRM interface
  • creates a directory
  • puts a file in that directory using different transfer methods (gsiftp, http, file)
  • gets the file back
  • copies the file
  • moves the file
  • removes the file
  • deletes the remaining files and the directory

How to test

Sample test using the StoRM SRM client is available at https://github.com/egi-qc/qc-tests/tree/master/storage/srm-test.sh
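
The linked test drives the StoRM clientSRM tool through that sequence; an abbreviated, hedged sketch (endpoint and SURL are placeholders, and the exact option letters should be checked against "clientSRM <command> -h" for the installed version):

  ENDPOINT=httpg://storm.example.org:8444
  SURL=srm://storm.example.org:8444/ops.vo.ibergrid.eu/qc

  clientSRM ping -e $ENDPOINT                  # ping the SRM interface
  clientSRM mkdir -e $ENDPOINT -s $SURL/dir    # create a directory
  clientSRM ptp -e $ENDPOINT -s $SURL/dir/f1   # prepare-to-put; transfer the file to the returned TURL
  clientSRM ptg -e $ENDPOINT -s $SURL/dir/f1   # prepare-to-get; fetch the file back from the TURL
  clientSRM mv -e $ENDPOINT -s $SURL/dir/f1 -t $SURL/dir/f2   # move the file
  clientSRM rm -e $ENDPOINT -s $SURL/dir/f2    # remove the file
  clientSRM rmdir -e $ENDPOINT -s $SURL/dir    # remove the directory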

lcg-utils test

Perform various operations using the lcg-* commands that use the SRM interface.

How to test

A sample test is available at https://github.com/egi-qc/qc-tests/tree/master/storage/lcg-test.sh
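
A hedged sequence with the basic lcg-* data-management commands (VO and SE host are placeholders; LCG_GFAL_INFOSYS must point at a top-BDII so the commands can resolve the SE):

  export LCG_GFAL_INFOSYS=top-bdii.example.org:2170
  VO=ops.vo.ibergrid.eu
  SE=storm.example.org

  GUID=$(lcg-cr --vo $VO -d $SE file:/tmp/qc-test.txt)   # copy to the SE and register; prints a guid
  lcg-lr --vo $VO $GUID                                  # list the replicas (SURLs) of the file
  lcg-cp --vo $VO $GUID file:/tmp/qc-test-back.txt       # copy it back through the SRM interface
  lcg-del --vo $VO -a $GUID                              # delete all replicas and the catalogue entry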

WebDAV

If the SE supports WebDAV, execute the following operations:

  • create directory
  • list directory
  • put file
  • get file
  • copy file
  • move file
  • remove file
  • remove directory

How to test

A sample test is available at https://github.com/egi-qc/qc-tests/tree/master/storage/webdav-test.sh
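
A hedged sketch using plain curl with an X.509 proxy (the endpoint is a placeholder; depending on the SE, a proxy-aware curl build or a different authentication method may be needed):

  URL=https://se.example.org:8443/webdav/ops.vo.ibergrid.eu/qc   # hypothetical endpoint
  PROXY=/tmp/x509up_u$(id -u)
  CURL="curl --cert $PROXY --key $PROXY --capath /etc/grid-security/certificates"

  $CURL -X MKCOL $URL/dir                                  # create directory
  $CURL -X PROPFIND -H 'Depth: 1' $URL/dir                 # list directory
  $CURL -T /tmp/qc-test.txt $URL/dir/f1                    # put file
  $CURL -o /tmp/back.txt $URL/dir/f1                       # get file
  $CURL -X COPY -H "Destination: $URL/dir/f2" $URL/dir/f1  # copy file
  $CURL -X MOVE -H "Destination: $URL/dir/f3" $URL/dir/f2  # move file
  $CURL -X DELETE $URL/dir/f1                              # remove file
  $CURL -X DELETE $URL/dir/                                # remove directory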

VOMS

VOMS testing in verification should perform the following actions:

  • Create a VO (ops.vo.ibergrid.eu) and a manager for that VO (usually the verifier)
  • Add new users to the VO via the command line and via the web interface
  • Generate RFC and SHA-2 proxies using UMD-2 and UMD-3 client tools.
  • Assure that the rest of the services available (CEs, SEs, ...) in the verification testbed still handle the generated proxies. Check other sections of this document for possible tests.
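
Once the VO and users are in place, proxy generation can be checked along these lines (a hedged sketch; the vomses configuration for the new VO is assumed to be installed on the client):

  voms-proxy-init --voms ops.vo.ibergrid.eu -rfc   # RFC-style proxy with VOMS extensions
  voms-proxy-info -all                             # shows the proxy type, VO and FQANs

  # group/role selection, assuming a Role=VO-Admin was defined for the VO
  voms-proxy-init --voms ops.vo.ibergrid.eu:/ops.vo.ibergrid.eu/Role=VO-Admin
  voms-proxy-info -fqan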

Job Scheduling

This category covers WMS, GridWay and qcg-broker.

Interaction with job execution appliances

No standard interface is enforced for Job Scheduling appliances. A job scheduling appliance may be able to manage work items in one or more kinds of Job Execution appliances; support is expected for at least one of the following:

  • ARC-CE gridFTP
  • CREAM
  • EMI-ES
  • Globus GRAM5
  • UNICORE
  • QCG-comp

The appliance must be able to perform the following operations against the supported Job Execution interfaces:

  • create new jobs,
  • retrieve the status of the jobs submitted by the appliance,
  • cancel jobs, and
  • (optionally) hold and resume jobs

The Appliance may perform these operations for individual jobs or for sets of jobs in order to improve its performance (e.g. instead of querying the status of each individual job, perform a single query for all jobs submitted by the appliance).

Any information needed for scheduling jobs is expected to be discovered through the Information Discovery Appliances available in UMD, which use GlueSchema 1.3 or GlueSchema 2.0 with an LDAP interface.

How to test

  • (If supported) Perform a list-match for jobs with no requirements. This should return a list with all available resources.
  • Submit simple jobs (e.g. sleep for a couple of minutes) to the Job Scheduling Appliance and check:
    • the jobs are correctly executed in the execution appliance (CE)
    • the status of the job is retrieved correctly and in a timely manner (i.e. status may not be updated in real-time, but it should be available within a short period of time)
    • cancelling a job in the Appliance removes the job from the underlying system
  • Submit jobs with some input/output files and assure that the files are correctly transferred.
  • If the appliance supports it, submit:
    • DAG jobs
    • Parametric jobs
    • Job Collections

Sample jobs for some Schedulers are available at https://github.com/egi-qc/qc-tests/tree/master/jobscheduling
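
For WMS, the flow above can be exercised with the glite-wms-* client commands; a hedged sketch (sleep.jdl as in the Job Execution tests):

  glite-wms-job-list-match -a sleep.jdl          # -a: automatic delegation; lists all matching CEs
  glite-wms-job-submit -a -o jobids sleep.jdl    # store the returned job ID in the file "jobids"
  glite-wms-job-status -i jobids                 # should reach Running and then Done (Success)
  glite-wms-job-output -i jobids --dir /tmp/out  # retrieve the output sandbox
  glite-wms-job-cancel --noint -i jobids         # for the cancellation test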

Multi-node/multi-core jobs

Job Scheduling Appliances should also support multi-node/multi-core jobs. See the multi-node/multi-core test under Job Execution Appliances for the supported modes. Sample jobs are available at https://github.com/egi-qc/qc-tests/tree/master/jobscheduling

WMS

For WMS check:

  • Proxy renewal features work (submit a long job with a short renewable proxy and assure that it completes)
  • Proxies with multiple roles/groups are supported
  • Proxies with long chains are supported (such as the ones created by myproxy: C=[...]/CN=proxy/CN=proxy/CN=proxy/...)
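
For the proxy-renewal check, a hedged sketch (the MyProxy host is a placeholder): register a renewable credential, create a deliberately short proxy, and submit a job that outlives it:

  myproxy-init -s myproxy.example.org -d -n                 # store a renewable credential (DN-based, no passphrase)
  voms-proxy-init --voms ops.vo.ibergrid.eu --valid 00:30   # 30-minute proxy

  # in the JDL, point the WMS at the MyProxy server:
  #   MyProxyServer = "myproxy.example.org";
  # then submit a job sleeping well over 30 minutes and check that it still
  # reaches Done (Success) instead of aborting with an expired proxy.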

Interactive Job

Interactive job capability is provided by gsissh and gsisshterm.

With a configured system, the verifier should check that:

  • the user is able to log in to the remote system (using the gsissh client or gsisshterm)
  • files can be copied using gsiscp

and that the logins are properly logged.
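
A hedged sketch (the host is a placeholder; a valid proxy is assumed, and the port depends on how gsisshd is configured):

  gsissh gsissh.example.org hostname                 # remote command execution / login
  gsiscp /tmp/qc-test.txt gsissh.example.org:/tmp/   # file copy over GSI
  # on the server, the mapped local account and login should appear in the
  # sshd logs (e.g. /var/log/secure on RH-based systems)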

Client Tools

UIs, WNs and the client tools of the different products fit in this category. Services are tested through their client tools; refer to the tests of the individual services for samples.

Other products

ARGUS

Testing of ARGUS should at least include defining two different resources:

  • http://test30.egi.cesga.es/policy, with a policy that allows any user from the ops, dteam or ops.vo.ibergrid.eu VOs to perform any action (".*")
  • http://test30.egi.cesga.es/deny, with a policy that lets users from ops.vo.ibergrid.eu and ops in, but bans dteam users.

These policies can be tested against other components of the testbed.
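
The policies can also be queried directly with the Argus pepcli client before trying them from other services; a hedged sketch (the PEPd endpoint is a placeholder and the action id is only illustrative):

  PEPD=https://argus.example.org:8154/authz   # hypothetical Argus PEP daemon

  # should return Decision: Permit for an ops, dteam or ops.vo.ibergrid.eu proxy
  pepcli --pepd $PEPD --resourceid http://test30.egi.cesga.es/policy \
         --actionid http://glite.org/xacml/action/execute \
         --certchain /tmp/x509up_u$(id -u)

  # should return Deny for a dteam proxy, Permit for ops / ops.vo.ibergrid.eu
  pepcli --pepd $PEPD --resourceid http://test30.egi.cesga.es/deny \
         --actionid http://glite.org/xacml/action/execute \
         --certchain /tmp/x509up_u$(id -u)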

myproxy

Check:

  • storing (and retrieving) user credentials (with and without VOMS extensions)
  • renewal of credentials for long-running jobs with a WMS
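
A hedged store/retrieve round-trip (server name and username are placeholders); the renewal part is covered by the WMS proxy-renewal test above:

  myproxy-init -s myproxy.example.org -l qcuser    # store a credential under the name "qcuser"
  myproxy-info -s myproxy.example.org -l qcuser    # confirm that it is stored
  myproxy-logon -s myproxy.example.org -l qcuser   # retrieve a fresh short-lived proxy

  # repeat the round-trip starting from a VOMS proxy (voms-proxy-init first)
  # and inspect the retrieved credential with voms-proxy-info -all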

APEL

APEL is covered by the generic criteria on Accounting. Verification consists of installation, configuration, and assuring that job records are created and transmitted.

glexec

glexec takes Grid credentials as input, authenticates and authorizes them, creates a new execution sandbox, and executes the given command as the switched identity.

Testing of glexec by verifiers should include:

  • using ARGUS for authorization policies
  • testing both allowed and denied users
  • checking that the actions are properly logged

A simple job for testing glexec is available at https://github.com/egi-qc/qc-tests/tree/master/glexec
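
A hedged sketch of the identity-switch check, run on a worker node with glexec installed (paths are placeholders; glexec reads the target credential from GLEXEC_CLIENT_CERT):

  export X509_USER_PROXY=/tmp/pilot-proxy.pem          # invoking (pilot) credential
  export GLEXEC_CLIENT_CERT=/tmp/target-user-proxy.pem # proxy of the identity to switch to

  /usr/sbin/glexec /usr/bin/id
  # allowed user: prints the uid/gid of the mapped account, exit code 0
  # banned user: non-zero exit code, and the denial must show up in the
  # glexec/ARGUS logs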

Products without specific tests

FTS, LFC, AMGA and GridSite do not yet have specific tests defined. In this case, the verifier should contact the verifiers mailing list to decide how to proceed.