
EGI QC Specific


Information model / Information discovery

site-BDII, top-BDII, glite-CLUSTER fall into this category.

Refer to the generic criteria on GlueSchema compliance.
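Beyond the generic criteria, a quick sanity check of the LDAP endpoints can be sketched with ldapsearch; the host names below are placeholders:

 # GlueSchema 1.3 branch (base o=grid) on a site-BDII or top-BDII
 ldapsearch -x -LLL -H ldap://site-bdii.example.org:2170 -b o=grid \
     '(objectClass=GlueCE)' GlueCEUniqueID

 # GLUE 2.0 branch (base o=glue)
 ldapsearch -x -LLL -H ldap://top-bdii.example.org:2170 -b o=glue \
     '(objectClass=GLUE2ComputingService)' GLUE2ServiceID

Both queries should return entries for every service published by the BDII under test.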

Job Execution Appliances

This category covers Computing Element products (CREAM, ARC-CE, QCG-COMP, ...).

Interaction with the batch system

Job execution appliances must be able to perform basic job management operations in a batch system:

  • create new jobs,
  • retrieve the status of the jobs submitted by the appliance,
  • cancel jobs, and
  • (optionally) hold and resume jobs

The Appliance may perform these operations for individual jobs or for sets of jobs in order to improve its performance (e.g. instead of querying the status of each individual job, issue a single query for all jobs submitted by the appliance).

Verification must be performed for at least one of the following batch systems:

  • Torque/PBS
  • SGE/OGE
  • SLURM
  • LSF

How to test

  • Submit simple jobs (e.g. sleep for a couple of minutes) to the Job Execution Appliance and check:
    • the jobs are correctly executed in the batch system
    • the status of the job is retrieved correctly and in a timely manner (i.e. status may not be updated in real-time, but it should be available within a short period of time)
    • cancelling a job in the Appliance removes the job from the batch system
  • Submit jobs with some input/output files and ensure that the files are correctly transferred.
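A minimal sketch of such a test for a CREAM CE, assuming a valid VOMS proxy; the endpoint cream.example.org:8443/cream-pbs-ops and the <JOBID> placeholder are hypothetical. First, a trivial JDL description (sleep.jdl) that sleeps for two minutes:

 [
   Executable = "/bin/sleep";
   Arguments = "120";
   StdOutput = "std.out";
   StdError = "std.err";
   OutputSandbox = {"std.out", "std.err"};
 ]

Then submit, poll and cancel it:

 glite-ce-job-submit -a -r cream.example.org:8443/cream-pbs-ops sleep.jdl
 glite-ce-job-status <JOBID>   # repeat until the status reaches REALLY-RUNNING / DONE-OK
 glite-ce-job-cancel <JOBID>   # then check on the batch side (e.g. qstat) that the job is gone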

Sample jobs for some CEs are available at https://github.com/enolfc/egi-qc/tree/master/tests/jobexecution

Multi-node/multi-core jobs

Job Execution Appliances should support multi-node/-core jobs. Different support modes are considered:

  • multi-slot request: the job specifies the number of slots, which are allocated following a default policy defined by the site (e.g. filling up machines, using free slots of any machine, etc.)
  • single-machine multi-core request: the job specifies the number of required slots, which are allocated within a single machine.
  • multi-node multi-core request: the job specifies the number of cores and the number of hosts to use (e.g. 4 cores on 2 different hosts)
  • exclusive request: the job specifies that the allocated hosts are to be used exclusively.

How to test

Submit jobs for testing the different modes listed above and check in the batch system that the allocated slots are as specified.
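As an illustration, the four modes map onto CREAM JDL attributes roughly as follows; this is a sketch, and which attributes are honoured depends on the CE version and the batch system configuration:

 // multi-slot request: 4 slots, placement left to the site default policy
 CpuNumber = 4;

 // single-machine multi-core request: 4 slots on a single host
 CpuNumber = 4;
 SMPGranularity = 4;

 // multi-node multi-core request: 4 cores spread over 2 hosts
 CpuNumber = 4;
 HostNumber = 2;

 // exclusive request: reserve the allocated hosts entirely for the job
 WholeNodes = true;
 HostNumber = 2;

After submission, the allocation can be cross-checked on the batch side (e.g. the exec_host field of qstat -f on Torque).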

Sample jobs for some CEs are available at https://github.com/enolfc/egi-qc/tree/master/tests/jobexecution/

Parallel jobs (with mpi-start)

mpi-start should be able to detect the batch system and execute parallel jobs with different MPI implementations.

How to test

Submit mpi-start jobs with different slot requirements (see possible cases in the multi-node/multi-core test), using different parallel jobs (dummy, MPI and OpenMP), and check that:

  • mpi-start detects the batch system
  • input files and executables are transferred to the nodes involved in the job
  • MPI execution works without issues
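A minimal sketch of a wrapper submitted as the job executable; the binary names are illustrative and assume the code was compiled for the MPI flavour requested:

 #!/bin/bash
 # mpi-start inspects the environment to detect the batch system
 # (Torque/PBS, SGE, SLURM, ...) and distributes files to the allocated nodes.

 # "dummy" parallel job: run hostname through the MPI starter on every slot
 mpi-start -t openmpi /bin/hostname

 # real MPI job (./cpi assumed to be an Open MPI binary)
 mpi-start -t openmpi ./cpi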

Sample tests are available at XXX


Storage Management Appliances

This category covers Storage Element products (DPM, dCache, StoRM, ARC-SE, ...).

SRM compliance

Execute tests with an SRM client that:

  • pings the SRM interface
  • creates a directory
  • puts a file in that directory using different transfer methods (gsiftp, http)
  • gets the file back
  • copies the file
  • moves the file
  • removes the file
  • deletes the remaining files and the directory

How to test

Sample test using the StoRM SRM client is available at https://github.com/enolfc/qc-tests/tree/master/storage/srm-test.sh
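Alternatively, the same operations can be sketched with the dCache SRM command-line clients; the endpoint and paths below are placeholders, and the exact option spellings should be checked against the installed client version:

 SURL=srm://se.example.org:8444/dpm/example.org/home/ops
 srmping  $SURL                                           # ping the SRM interface
 srmmkdir $SURL/qc-test                                   # create a directory
 srmcp -2 file:////tmp/test.txt $SURL/qc-test/test.txt    # put (SRM v2.2)
 srmcp -2 $SURL/qc-test/test.txt file:////tmp/test.copy   # get the file back
 srmmv  $SURL/qc-test/test.txt $SURL/qc-test/moved.txt    # move
 srmrm  $SURL/qc-test/moved.txt                           # remove the file
 srmrmdir $SURL/qc-test                                   # delete the directory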

lcg-utils test

Perform various operations using the lcg-* commands that use the SRM interface.

How to test

Sample test is available at https://github.com/enolfc/qc-tests/tree/master/storage/lcg-test.sh
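A minimal sketch with the lcg-* tools, assuming VO ops, an LFC in the testbed, and hypothetical host names:

 export LCG_GFAL_INFOSYS=top-bdii.example.org:2170
 export LFC_HOST=lfc.example.org

 # copy a local file to the SE and register it in the catalogue
 lcg-cr --vo ops -d se.example.org -l lfn:/grid/ops/qc-test file:/tmp/test.txt
 lcg-lr --vo ops lfn:/grid/ops/qc-test                        # list the replicas
 lcg-cp --vo ops lfn:/grid/ops/qc-test file:/tmp/test.copy    # get it back
 lcg-del --vo ops -a lfn:/grid/ops/qc-test                    # delete all replicas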

WebDAV

If the SE supports WebDAV, execute the following operations:

  • create directory
  • list directory
  • put file
  • get file
  • copy file
  • move file
  • remove file
  • remove directory

How to test

Sample test at https://github.com/enolfc/qc-tests/tree/master/storage/webdav-test.sh
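A sketch of the operations with curl, authenticating with the X.509 proxy; the endpoint is a placeholder and X509_USER_PROXY is assumed to point at a valid VOMS proxy:

 DAV=https://se.example.org:8443/dpm/example.org/home/ops
 CURL="curl --cert $X509_USER_PROXY --key $X509_USER_PROXY --capath /etc/grid-security/certificates"

 $CURL -X MKCOL $DAV/qc-test                                  # create directory
 $CURL -X PROPFIND -H "Depth: 1" $DAV/qc-test                 # list directory
 $CURL -T /tmp/test.txt $DAV/qc-test/test.txt                 # put file
 $CURL -o /tmp/test.copy $DAV/qc-test/test.txt                # get file
 $CURL -X COPY -H "Destination: $DAV/qc-test/copy.txt" $DAV/qc-test/test.txt
 $CURL -X MOVE -H "Destination: $DAV/qc-test/moved.txt" $DAV/qc-test/copy.txt
 $CURL -X DELETE $DAV/qc-test/test.txt                        # remove file
 $CURL -X DELETE $DAV/qc-test/moved.txt
 $CURL -X DELETE $DAV/qc-test/                                # remove directory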

VOMS

VOMS testing in verification should cover the following actions:

  • Create a VO (ops.vo.ibergrid.eu) and a manager for that VO (usually the verifier)
  • Add new users to the VO via the command line and via the web interface
  • Generate RFC and SHA-2 proxies using UMD-2 and UMD-3 client tools.
  • Ensure that the rest of the services available (CEs, SEs, ...) in the verification testbed still handle the generated proxies. Check other sections of this document for possible tests.
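A minimal sketch of the proxy generation step; note that the UMD-3 (VOMS 3.x) clients sign proxies with SHA-2 by default, while -rfc forces the RFC 3820 proxy format on the older clients:

 voms-proxy-init --voms ops.vo.ibergrid.eu -rfc
 voms-proxy-info -all      # check the proxy type and the VOMS attribute certificate
 # then reuse the proxy against the other testbed services,
 # e.g. a job submission as in the Job Execution section above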

QCG

Job Scheduling

This category covers WMS, GridWay and qcg-broker.

Interaction with job execution appliances

No standard interface is enforced for Job Scheduling appliances. A job scheduling appliance may be able to manage work items in one or more kinds of Job Execution appliances; support is expected for at least one of the following:

  • ARC-CE gridFTP
  • CREAM
  • EMI-ES
  • Globus GRAM5
  • UNICORE
  • QCG-comp

The appliance must be able to perform the following operations against the supported Job Execution interfaces:

  • create new jobs,
  • retrieve the status of the jobs submitted by the appliance,
  • cancel jobs, and
  • (optionally) hold and resume jobs

The Appliance may perform these operations for individual jobs or for sets of jobs in order to improve its performance (e.g. instead of querying the status of each individual job, issue a single query for all jobs submitted by the appliance).

Any information needed for scheduling jobs is expected to be discovered through the Information Discovery Appliances available in UMD, which publish GlueSchema 1.3 or GlueSchema 2.0 over an LDAP interface.

How to test

  • (If supported) Perform a list-match for jobs with no requirements. This should return a list with all available resources.
  • Submit simple jobs (e.g. sleep for a couple of minutes) to the Job Scheduling Appliance and check:
    • the jobs are correctly executed in the execution appliance (CE)
    • the status of the job is retrieved correctly and in a timely manner (i.e. status may not be updated in real-time, but it should be available within a short period of time)
    • cancelling a job in the Appliance removes the job from the underlying system
  • Submit jobs with some input/output files and ensure that the files are correctly transferred.
  • If the appliance supports it, submit:
    • DAG jobs
    • Parametric jobs
    • Job Collections

Sample jobs for some Schedulers are available at XXX
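For a WMS, a minimal sketch with the glite-wms-job-* commands, reusing the sleep.jdl from the Job Execution section; the <JOBID> placeholder stands for the identifier returned on submission:

 glite-wms-job-list-match -a sleep.jdl    # should list all matching resources
 glite-wms-job-submit -a sleep.jdl        # prints the job id
 glite-wms-job-status <JOBID>             # Waiting -> Scheduled -> Running -> Done
 glite-wms-job-output <JOBID>             # retrieve the output sandbox
 glite-wms-job-cancel <JOBID>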

Multi-node/multi-core jobs

Job Scheduling Appliances should also support multi-node/multi-core jobs. Check the multi-node/multi-core jobs subsection under Job Execution Appliances for more information. Sample jobs are available at XXX

WMS

Proxy renewal

  • long proxy
  • proxy renewal
  • multiple role/group proxy support
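A sketch of a renewal check, assuming a MyProxy server is part of the testbed; host names and the one-week lifetime are illustrative:

 # store a long-lived credential that the WMS can use for renewal
 myproxy-init -s myproxy.example.org -d -n -c 168
 # create a short VOMS proxy for submission
 voms-proxy-init --voms ops.vo.ibergrid.eu --valid 2:00
 # in the JDL, point the WMS at the MyProxy server:
 #   MyProxyServer = "myproxy.example.org";
 # then submit a job that outlives the short proxy and check it still completes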

Interactive Job

Interactive job capability is provided by gsissh and gsisshterm.

With a configured system, the verifier should check that they:

  • are able to log in to the remote system (using the gsissh client or gsisshterm)
  • can copy files using gsiscp

and that the logins are properly logged.
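A minimal sketch, assuming a VOMS proxy and a hypothetical host running GSI-enabled OpenSSH:

 voms-proxy-init --voms ops.vo.ibergrid.eu
 gsissh gsissh.example.org hostname          # interactive login / remote command
 gsiscp /tmp/test.txt gsissh.example.org:    # file copy
 # on the server, check the sshd log (e.g. /var/log/secure) for the
 # certificate DN and the mapped local account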

Client Tools

UI, ARC client

The User Interface is a collection of client tools.

Other products

FTS

LFC

Apel

glexec

glexec takes Grid credentials as input, authenticates and authorizes them, creates a new execution sandbox, and executes the given command as the switched identity.

Testing of glexec by verifiers should include:

  • using ARGUS for authz policies
  • test both allowed and denied users
  • check that the actions are properly logged

A simple job for testing glexec is available at https://github.com/enolfc/qc-tests/tree/master/glexec
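A minimal sketch of a manual check on a worker node; the proxy path is a placeholder:

 # proxy of the payload user whose identity glexec should switch to
 export GLEXEC_CLIENT_CERT=/tmp/payload_proxy.pem
 export GLEXEC_SOURCE_PROXY=/tmp/payload_proxy.pem

 /usr/sbin/glexec /usr/bin/id -a   # allowed user: prints the mapped uid/gid
 echo $?                           # denied user: non-zero glexec error code
 # cross-check the glexec log and the ARGUS server log for matching entries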

Unicore

myproxy

AMGA

gridftp