Difference between revisions of "EGI QC Specific"
Revision as of 12:47, 25 October 2013
Information model / Information discovery
site-BDII, top-BDII, glite-CLUSTER fall into this category.
Refer to the generic criteria on GlueSchema compliance.
Job Execution Appliances
This category covers Computing Element products (CREAM, ARC-CE, QCG-COMP, ...)
Interaction with the batch system
Job execution appliances must be able to perform basic job management operations in a batch system:
- create new jobs,
- retrieve the status of the jobs submitted by the appliance,
- cancel jobs, and
- (optionally) hold and resume jobs
The Appliance may perform these operations for individual jobs or for sets of jobs in order to improve its performance (e.g. when retrieving the status, issue a single query for all jobs submitted by the appliance instead of querying each job individually).
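The bulk status retrieval described above can be sketched as follows. This is only an illustration under stated assumptions, not the interface of any real CE: `FakeBatchSystem`, `Appliance` and all their methods are hypothetical names, and the in-memory batch system merely stands in for Torque/PBS, SGE/OGE, SLURM or LSF.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class FakeBatchSystem:
    """In-memory stand-in for a real batch system (hypothetical)."""
    jobs: dict = field(default_factory=dict)
    queries: int = 0  # number of status queries received so far
    _ids: count = field(default_factory=count, repr=False)

    def submit(self, owner, command):
        job_id = f"job-{next(self._ids)}"
        self.jobs[job_id] = {"owner": owner, "command": command, "state": "QUEUED"}
        return job_id

    def cancel(self, job_id):
        self.jobs[job_id]["state"] = "CANCELLED"

    def status_all(self, owner):
        """One query that returns the state of every job of the given owner."""
        self.queries += 1
        return {j: d["state"] for j, d in self.jobs.items() if d["owner"] == owner}


class Appliance:
    """Hypothetical job execution appliance on top of the fake batch system."""

    def __init__(self, batch, name="ce"):
        self.batch, self.name = batch, name

    def create(self, command):
        return self.batch.submit(self.name, command)

    def cancel(self, job_id):
        self.batch.cancel(job_id)

    def statuses(self):
        # A single bulk query instead of one query per submitted job.
        return self.batch.status_all(self.name)


batch = FakeBatchSystem()
ce = Appliance(batch)
ids = [ce.create("sleep 120") for _ in range(3)]
ce.cancel(ids[1])
states = ce.statuses()
print(states, "queries:", batch.queries)
```

However many jobs the appliance has submitted, the batch system sees one status query per polling cycle, which is the performance gain the criterion allows for.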
Verification must be performed for at least one of the following batch systems:
- Torque/PBS
- SGE/OGE
- SLURM
- LSF
How to test
- Submit simple jobs (e.g. sleep for a couple of minutes) to the Job Execution Appliance and check:
  - the jobs are correctly executed in the batch system
  - the status of the job is retrieved correctly and in a timely manner (i.e. status may not be updated in real-time, but it should be available within a short period of time)
  - cancelling jobs in the Appliance removes them from the batch system
- Submit jobs with some input/output files and ensure that the files are correctly transferred.
Sample jobs for some CEs are available at https://github.com/enolfc/egi-qc/tree/master/tests/jobexecution
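The "timely manner" check can be automated with a polling helper such as the sketch below. It assumes the tester supplies a callable that returns the job state as reported by the appliance; `wait_for_state`, the state names and the timeout values are illustrative choices, not part of the sample tests linked above.

```python
import time


def wait_for_state(get_state, wanted, timeout=60.0, interval=1.0):
    """Poll get_state() until it returns a state in `wanted`.

    Returns (state, elapsed_seconds). Raises TimeoutError if no wanted
    state has been reported once `timeout` seconds have passed.
    """
    start = time.monotonic()
    while True:
        state = get_state()
        elapsed = time.monotonic() - start
        if state in wanted:
            return state, elapsed
        if elapsed >= timeout:
            raise TimeoutError(f"still {state!r} after {elapsed:.0f}s")
        time.sleep(interval)


# Demo with a canned sequence of reported states instead of a real appliance.
reported = iter(["QUEUED", "QUEUED", "RUNNING"])
state, elapsed = wait_for_state(lambda: next(reported), {"RUNNING", "DONE"},
                                timeout=5, interval=0)
print(state)
```

In a real test, `get_state` would wrap the status command of the appliance under verification, and the timeout would encode what the verifier considers "a short period of time".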
Multi-node/multi-core jobs
Job Execution Appliances should support multi-node/-core jobs. Different support modes are considered:
- multi-slot request: the job specifies the number of slots, which will be allocated following a default policy defined by the site (e.g. filling up machines, using free slots of any machine, etc.)
- single-machine multi-core request: the job specifies the number of required slots, which are allocated within a single machine.
- multi-node multi-core request: the job specifies the number of cores and the number of hosts to use (e.g. 4 cores on 2 different hosts)
- exclusive request: the job request specifies the hosts to be used exclusively.
How to test
Submit jobs for testing the different modes listed above and check in the batch system that the allocated slots are as specified.
Sample jobs for some CEs are available at https://github.com/enolfc/egi-qc/tree/master/tests/jobexecution/
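Checking that "the allocated slots are as specified" can be expressed as a small predicate over the allocation read from the batch system. The sketch below is one possible reading of the four modes, with two explicit assumptions: in the multi-node mode the slot count is taken per host, and exclusive use is taken to mean that the job holds every slot of each of its hosts. The function name and arguments are hypothetical.

```python
def allocation_matches(mode, allocation, *, slots=0, hosts=0, capacity=None):
    """Return True if `allocation` satisfies the request.

    allocation: dict mapping host name -> slots granted to the job.
    capacity:   dict mapping host name -> total slots of that host
                (only needed for the exclusive mode).
    """
    total = sum(allocation.values())
    if mode == "multi-slot":
        # Any distribution is fine, only the total matters.
        return total == slots
    if mode == "single-machine":
        return len(allocation) == 1 and total == slots
    if mode == "multi-node":
        # Assumption: `slots` is the number of cores wanted on each host.
        return len(allocation) == hosts and all(n == slots for n in allocation.values())
    if mode == "exclusive":
        # Assumption: exclusive means the job holds all slots of each host.
        return len(allocation) == hosts and all(
            allocation[h] == capacity[h] for h in allocation
        )
    raise ValueError(f"unknown mode: {mode}")


print(allocation_matches("multi-node", {"wn1": 4, "wn2": 4}, slots=4, hosts=2))
```

The allocation dictionary would be filled from the output of the batch system's own job inspection tools, which differ per batch system and are not shown here.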
Parallel jobs (with mpi-start)
mpi-start should be able to detect the batch system and execute parallel jobs with different MPI implementations.
How to test
Submit mpi-start jobs with different slot requirements (see possible cases in the multi-node/multi-core test), using different parallel jobs (dummy, MPI and OpenMP), and check that:
- mpi-start detects the batch system
- input files and executables are transferred to the nodes involved in the job
- MPI execution works without issues
Sample tests are available at XXX
Storage Management Appliances
This category covers Storage Element products (DPM, dCache, StoRM, ARC-SE, ...)
SRM compliance
Execute tests with an SRM client that:
- pings the SRM interface
- creates a directory
- puts a file in that directory using different transfer methods (gsiftp, http)
- gets back the file
- copies the file
- moves the file
- removes the file
- deletes files and directory
How to test
Sample test using the StoRM SRM client is available at https://github.com/enolfc/qc-tests/tree/master/storage/srm-test.sh
lcg-utils test
Perform various operations using the lcg-* commands that use the SRM interface.
How to test
Sample test is available at https://github.com/enolfc/qc-tests/tree/master/storage/lcg-test.sh
WebDAV
If the SE supports WebDAV, execute the following operations:
- create directory
- list directory
- put file
- get file
- copy file
- move file
- remove file
- remove directory
How to test
Sample test at https://github.com/enolfc/qc-tests/tree/master/storage/webdav-test.sh
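For orientation, the eight operations listed above map onto standard HTTP/WebDAV methods (RFC 4918). The sketch below only builds the request sequence and sends nothing; the base URL, directory name and file name are hypothetical, and a real run would additionally need a client authenticated against the SE (for example with an X.509 proxy), which is not shown.

```python
from urllib.parse import urljoin


def webdav_plan(base_url, dirname="qc-test", filename="file.txt"):
    """Return the WebDAV requests as (method, url, headers) tuples."""
    base = base_url.rstrip("/") + "/"
    directory = urljoin(base, dirname + "/")
    original = urljoin(directory, filename)
    copied = original + ".copy"
    moved = original + ".moved"
    return [
        ("MKCOL", directory, {}),                      # create directory
        ("PROPFIND", directory, {"Depth": "1"}),       # list directory
        ("PUT", original, {}),                         # put file
        ("GET", original, {}),                         # get file
        ("COPY", original, {"Destination": copied}),   # copy file
        ("MOVE", copied, {"Destination": moved}),      # move file
        ("DELETE", original, {}),                      # remove file
        ("DELETE", directory, {}),                     # remove directory and its contents
    ]


plan = webdav_plan("https://se.example.org/dav")
for method, url, headers in plan:
    print(method, url, headers)
```

Keeping the plan as data makes it easy to replay the same sequence against several SEs and to compare the response codes step by step.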
VOMS
Basic VOMS functionality...
Configure one VO at the server and
- XXX
QCG
Job Scheduling
This category covers WMS, GridWay and qcg-broker.
Interaction with job execution appliances
No standard interface is enforced for Job Scheduling appliances. A job scheduling appliance may be able to manage work items in one or more kinds of Job Execution appliances; support is expected for at least one of the following:
- ARC-CE gridFTP
- CREAM
- EMI-ES
- Globus GRAM5
- UNICORE
- QCG-comp
The appliance must be able to perform the following operations against the supported Job Execution interfaces:
- create new jobs,
- retrieve the status of the jobs submitted by the appliance,
- cancel jobs, and
- (optionally) hold and resume jobs
The Appliance may perform these operations for individual jobs or for sets of jobs in order to improve its performance (e.g. when retrieving the status, issue a single query for all jobs submitted by the appliance instead of querying each job individually).
Any information needed for performing scheduling of jobs is expected to be discovered through the Information Discovery Appliances available in UMD, which use GlueSchema 1.3 or GlueSchema 2.0 with LDAP interface.
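As an illustration of such a discovery query, the sketch below assembles an `ldapsearch` command line for a BDII. It assumes the conventional BDII setup (port 2170, base `o=grid` for GlueSchema 1.3 and `o=glue` for GlueSchema 2.0); the host name is hypothetical and the attribute selection is only an example of what a scheduler might ask for.

```python
# Base DN, filter and example attributes per GlueSchema version.
GLUE_QUERIES = {
    "1.3": ("o=grid", "(objectClass=GlueCE)",
            ["GlueCEUniqueID", "GlueCEStateStatus"]),
    "2.0": ("o=glue", "(objectClass=GLUE2ComputingEndpoint)",
            ["GLUE2EndpointURL", "GLUE2EndpointInterfaceName"]),
}


def ldapsearch_command(host, version="2.0", port=2170):
    """Build the argument list for an anonymous ldapsearch against a BDII."""
    base, ldap_filter, attributes = GLUE_QUERIES[version]
    return ["ldapsearch", "-x", "-LLL",
            "-H", f"ldap://{host}:{port}",
            "-b", base, ldap_filter, *attributes]


command = ldapsearch_command("bdii.example.org", version="1.3")
print(" ".join(command))
```

The list can be passed to `subprocess.run` on a machine that has the OpenLDAP client tools installed; the output would then be parsed to find the Job Execution endpoints available for scheduling.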
How to test
- (If supported) Perform a list-match for jobs with no requirements. This should return a list with all available resources.
- Submit simple jobs (e.g. sleep for a couple of minutes) to the Job Scheduling Appliance and check:
  - the jobs are correctly executed in the execution appliance (CE)
  - the status of the job is retrieved correctly and in a timely manner (i.e. status may not be updated in real-time, but it should be available within a short period of time)
  - cancelling jobs in the Appliance removes the job in the underlying system
- Submit jobs with some input/output files and ensure that the files are correctly transferred.
- If the appliance supports it, submit:
  - DAG jobs
  - Parametric jobs
  - Job Collections
Sample jobs for some Schedulers are available at XXX
Multi-node/multi-core jobs
Job Scheduling Appliances should also support multi-node/-core jobs. Check the JobScheduling section for more information. Sample jobs are available at XXX
WMS
Proxy renewal
long proxy - proxy renewal - multiple role/group proxy support
Interactive Job
Client Tools
UI, ARC client,
The User interface is a collection of clients