Difference between revisions of "USG Simple Job Cycle"
Line 7: | Line 7: | ||
---- | ---- | ||
You will find an explanation of how to use the job management commands to prepare and submit a simple job, monitor its status and retrieve the output. | |||
<div class="sect2" title="Job commands"><div class="titlepage"><div><div> | |||
=== Job commands === | |||
</div></div></div> | |||
Most jobs are submitted to the EGI Grid infrastructure through the gLite WMS (Workload Management System). This is a Grid meta-scheduling service that matches job requirements to the capabilities of resources and chooses the most appropriate resource for a particular job. <br> | |||
The commands used for handling jobs on the Grid are: | |||
The commands used for handling jobs on the Grid are: | |||
<div class="informaltable"> | <div class="informaltable"> | ||
{| border="1" | {| border="1" | ||
|- | |- | ||
| Submitting a job | | Submitting a job | ||
| <code class="code">glite-wms-job-submit</code> | | <code class="code">glite-wms-job-submit</code> | ||
|- | |- | ||
| Checking job status | | Checking job status | ||
| <code class="code">glite-wms-job-status</code> | | <code class="code">glite-wms-job-status</code> | ||
|- | |- | ||
| Retrieving job output | | Retrieving job output | ||
| <code class="code">glite-wms-job-output</code> | | <code class="code">glite-wms-job-output</code> | ||
|- | |- | ||
| Checking job history | | Checking job history | ||
| <code class="code">glite-wms-job-logging-info</code> | | <code class="code">glite-wms-job-logging-info</code> | ||
|- | |- | ||
| Listing compatible resources | | Listing compatible resources | ||
| <code class="code">glite-wms-job-list-match</code> | | <code class="code">glite-wms-job-list-match</code> | ||
|- | |- | ||
| Delegate a proxy | | Delegate a proxy | ||
| <code class="code">glite-wms-job-delegate-proxy </code> | | <code class="code">glite-wms-job-delegate-proxy </code> | ||
|} | |} | ||
</div> | </div> | ||
All job management commands require the user to have a valid proxy to work, as all interactions with the WMS are secure and authenticated. | All job management commands require the user to have a valid proxy to work, as all interactions with the WMS are secure and authenticated. | ||
</div><div title="Proxy Delegation | </div><div class="sect2" title="Proxy Delegation"><div class="titlepage"><div><div> | ||
=== Proxy Delegation === | === Proxy Delegation === | ||
</div></div></div> | </div></div></div> | ||
The gLite WMS allows more flexible proxy delegation than previous meta-scheduling services, requires explicit delegation of a proxy. Consequently, the gLite WMS has an additional command to do this delegation (<code class="code">glite-wms-job-delegate-proxy</code>) and required delegation parameters for the <code class="code">glite-wms-job-submit</code> and <code class="code">glite-wms-job-list-match</code> commands. | The gLite WMS allows more flexible proxy delegation than previous meta-scheduling services, requires explicit delegation of a proxy. Consequently, the gLite WMS has an additional command to do this delegation (<code class="code">glite-wms-job-delegate-proxy</code>) and required delegation parameters for the <code class="code">glite-wms-job-submit</code> and <code class="code">glite-wms-job-list-match</code> commands. | ||
<div title="Automatic Delegation | <div class="sect3" title="Automatic Delegation"><div class="titlepage"><div><div> | ||
==== Automatic Delegation ==== | ==== Automatic Delegation ==== | ||
</div></div></div> | </div></div></div> | ||
To use automatic delegation, one simply needs to add the <code class="code">-a</code> option when using the <code class="code">glite-wms-job-submit</code> and <code class="code">glite-wms-job-list-match</code> commands. This will transmit your active proxy to the WMS system and use it for the job. | To use automatic delegation, one simply needs to add the <code class="code">-a</code> option when using the <code class="code">glite-wms-job-submit</code> and <code class="code">glite-wms-job-list-match</code> commands. This will transmit your active proxy to the WMS system and use it for the job. | ||
</div><div title="Manual Delegation | </div><div class="sect3" title="Manual Delegation"><div class="titlepage"><div><div> | ||
==== Manual Delegation ==== | ==== Manual Delegation ==== | ||
</div></div></div> | </div></div></div> | ||
You may want to use different proxies for different classes of jobs to allow those jobs to have different authorizations (e.g. via the use of VOMS groups or roles). To manually delegate a proxy, create a local proxy with the groups and roles you want with the <code class="code">voms-proxy-init</code> command. To delegate this proxy, do the following: | You may want to use different proxies for different classes of jobs to allow those jobs to have different authorizations (e.g. via the use of VOMS groups or roles). To manually delegate a proxy, create a local proxy with the groups and roles you want with the <code class="code">voms-proxy-init</code> command. To delegate this proxy, do the following: | ||
<pre class="command">glite-wms-job-delegate-proxy -d myproxy</pre> | <pre class="command">glite-wms-job-delegate-proxy -d myproxy</pre> | ||
Replace "myproxy" with a descriptive name of your choice. You can have many different, named proxies delegated to the WMS system. This proxy can then be used for a particular job submission via the <code class="code">-d</code> option. For example, | Replace "myproxy" with a descriptive name of your choice. You can have many different, named proxies delegated to the WMS system. This proxy can then be used for a particular job submission via the <code class="code">-d</code> option. For example, | ||
<pre class="command">glite-wms-job-submit -d myproxy MyJob.jdl</pre> | <pre class="command">glite-wms-job-submit -d myproxy MyJob.jdl</pre> | ||
This will use the proxy identified by the name "myproxy" for the submitted job. | This will use the proxy identified by the name "myproxy" for the submitted job. | ||
</div></div><div title="Preparing a Simple Job | </div></div><div class="sect2" title="Preparing a Simple Job"><div class="titlepage"><div><div> | ||
=== Preparing a Simple Job === | === Preparing a Simple Job === | ||
</div></div></div> | </div></div></div> | ||
The main purpose of the Grid is to allow people to run their programs in the most efficient way. In Grid terminology, a job is a unit of work intended as a program which starts, reads some data, does some calculation, produces an output and finishes. A job is described by a file, expressed in a language called Job Description Language, often referred to as a JDL file. A JDL file contains information such as | The main purpose of the Grid is to allow people to run their programs in the most efficient way. In Grid terminology, a job is a unit of work intended as a program which starts, reads some data, does some calculation, produces an output and finishes. A job is described by a file, expressed in a language called Job Description Language, often referred to as a JDL file. A JDL file contains information such as | ||
<div class="itemizedlist"> | <div class="itemizedlist"> | ||
* | *the name of the program to run; | ||
the name of the program to run; | |||
* | *the input files read by the program; | ||
the input files read by the program; | |||
* | *the output files produced by the program; | ||
the output files produced by the program; | |||
* | *the requirements to be satisfied by the host which is going to execute the program. | ||
the requirements to be satisfied by the host which is going to execute the program. | </div> | ||
</div> | The Grid is made up, among other things, of <span class="emphasis">''Computing Elements (CE)''</span>, which correspond physically to clusters of computers located in a computer centre. Users submit their jobs to the <span class="emphasis">''WMS''</span>, which looks at the corresponding JDL files and dispatches the jobs to the best available resources satisfying the job requirements, where the adequacy of a resource is measured by an estimate of the time to wait from job submission to job execution (which can be large, in case the resource has already a long queue of jobs to process). | ||
The Grid is made up, among other things, of <span class="emphasis">''Computing Elements (CE)''</span>, which correspond physically to clusters of computers located in a computer centre. Users submit their jobs to the <span class="emphasis">''WMS''</span>, which looks at the corresponding JDL files and dispatches the jobs to the best available resources satisfying the job requirements, where the adequacy of a resource is measured by an estimate of the time to wait from job submission to job execution (which can be large, in case the resource has already a long queue of jobs to process). | <div class="sect3" title="The Job Description Language"><div class="titlepage"><div><div> | ||
<div title="The Job Description Language | ==== The Job Description Language ==== | ||
==== The Job Description Language ==== | </div></div></div> | ||
</div></div></div> | The Job Description Language is a high-level language used to describe jobs. A JDL file is a text file containing a series of key-value pairs with the format: | ||
The Job Description Language is a high-level language used to describe jobs. A JDL file is a text file containing a series of key-value pairs with the format: | <pre class="template">attribute = expression;</pre> | ||
<pre class="template">attribute = expression;</pre> | where every pair is terminated by a semi-colon and can span several lines. Comments in a JDL file must be preceded by a pound (#) or enclosed between /* and */. | ||
where every pair is terminated by a semi-colon and can span several lines. Comments in a JDL file must be preceded by a pound (#) or enclosed between /* and */. | |||
In the following examples, the most important JDL attributes will be explained. | In the following examples, the most important JDL attributes will be explained. | ||
</div><div title="The simplest possible job | </div><div class="sect3" title="The simplest possible job"><div class="titlepage"><div><div> | ||
==== The simplest possible job ==== | ==== The simplest possible job ==== | ||
</div></div></div> | </div></div></div> | ||
The most basic example of a Grid job consists in executing a simple command and retrieving its output. This JDL file shows code to do this: | The most basic example of a Grid job consists in executing a simple command and retrieving its output. This JDL file shows code to do this: | ||
<pre class="program"># example.jdl | <pre class="program"># example.jdl | ||
Executable = "/bin/hostname"; | Executable = "/bin/hostname"; | ||
StdOutput = "std.out"; | StdOutput = "std.out"; | ||
StdError = "std.err";</pre> | StdError = "std.err";</pre> | ||
The <code class="code">Executable</code> attribute specifies the command to run in the job. The <code class="code">StdOutput</code> attribute specifies the file where the standard output of the job will be written, and similarly the <code class="code">StdError</code> attribute specifies the file where the standard error of the job will be written. | The <code class="code">Executable</code> attribute specifies the command to run in the job. The <code class="code">StdOutput</code> attribute specifies the file where the standard output of the job will be written, and similarly the <code class="code">StdError</code> attribute specifies the file where the standard error of the job will be written. | ||
The above JDL file does not allow to get back the output of the job. This is done via the concept of <span class="emphasis">''sandbox''</span>. | The above JDL file does not allow to get back the output of the job. This is done via the concept of <span class="emphasis">''sandbox''</span>. | ||
</div><div title="The sandbox | </div><div class="sect3" title="The sandbox"><div class="titlepage"><div><div> | ||
==== The sandbox ==== | ==== The sandbox ==== | ||
</div></div></div> | </div></div></div> | ||
The sandbox mechanism allows users to specify what files should be sent together with the job from the UI to the execution host, and sent back from to execution host to UI upon successful completion of the job. This is achieved by using the <code class="code">InputSandbox</code> and the <code class="code">OutputSandbox</code> attributes, respectively. | The sandbox mechanism allows users to specify what files should be sent together with the job from the UI to the execution host, and sent back from to execution host to UI upon successful completion of the job. This is achieved by using the <code class="code">InputSandbox</code> and the <code class="code">OutputSandbox</code> attributes, respectively. | ||
The <code class="code">InputSandbox</code> attribute contains typically the executable to be run, when it is not already present on the execution host, and any other file possibly needed for the executable to run and which is located on the User Interface. For example: | The <code class="code">InputSandbox</code> attribute contains typically the executable to be run, when it is not already present on the execution host, and any other file possibly needed for the executable to run and which is located on the User Interface. For example: | ||
<pre class="program">InputSandbox = {"/home/doe/test.sh", "fileA", "fileB"};</pre> | <pre class="program">InputSandbox = {"/home/doe/test.sh", "fileA", "fileB"};</pre> | ||
Relative paths are relative to the current directory when the user submits the job to the Grid. Wildcards (the * character) can be used, but it is forbidden to specify two or more files with the same file name, even if their paths are different. | Relative paths are relative to the current directory when the user submits the job to the Grid. Wildcards (the * character) can be used, but it is forbidden to specify two or more files with the same file name, even if their paths are different. | ||
The <code class="code">OutputSandbox</code> attribute contains the files for the standard output and the standard error, plus any other file created by the job executable that the user wants to keep. For example: | The <code class="code">OutputSandbox</code> attribute contains the files for the standard output and the standard error, plus any other file created by the job executable that the user wants to keep. For example: | ||
<pre class="program">OutputSandbox = {"std.out", "std.err", "output.data"};</pre> | <pre class="program">OutputSandbox = {"std.out", "std.err", "output.data"};</pre> | ||
All the files in the output sandbox must be expressed as relative paths. Their absolute paths cannot be known in advance, as the jobs are executed in temporary directories on the execution host. | All the files in the output sandbox must be expressed as relative paths. Their absolute paths cannot be known in advance, as the jobs are executed in temporary directories on the execution host. | ||
It is very important that the files in the sandboxes are limited both in size and in number. This is because every sandbox file has to be separately transferred from the UI to the WMS and then to the execution host, or vice versa; jobs having a lot of files or very big files in the sandboxes have an impact on the performance of the WMS. | It is very important that the files in the sandboxes are limited both in size and in number. This is because every sandbox file has to be separately transferred from the UI to the WMS and then to the execution host, or vice versa; jobs having a lot of files or very big files in the sandboxes have an impact on the performance of the WMS. | ||
</div></div><div title="Other important attributes | </div></div><div class="sect2" title="Other important attributes"><div class="titlepage"><div><div> | ||
=== Other important attributes === | === Other important attributes === | ||
</div></div></div> | </div></div></div> | ||
The <code class="code">Arguments</code> attribute can be used to specify command-line arguments for the command specified in the <code class="code">Executable</code> attribute. For example: | The <code class="code">Arguments</code> attribute can be used to specify command-line arguments for the command specified in the <code class="code">Executable</code> attribute. For example: | ||
<pre class="program">Executable = "/bin/echo"; | <pre class="program">Executable = "/bin/echo"; | ||
Arguments = "Hello world!";</pre> | Arguments = "Hello world!";</pre> | ||
Special characters such ", &, |, \, <, > must be escaped by preceding them with a \. This means that the parser which processes the JDL commands knows when to regard a special character as part of a string, as in other parsers. | Special characters such ", &, |, \, <, > must be escaped by preceding them with a \. This means that the parser which processes the JDL commands knows when to regard a special character as part of a string, as in other parsers. | ||
</div><div title="The "Hello World | </div><div class="sect2" title="The "Hello World"><div class="titlepage"><div><div> | ||
=== The "Hello World" job === | === The "Hello World" job === | ||
</div></div></div> | </div></div></div> | ||
The following job executes a command to print the string "Hello world!" to the standard output. The executable is called <code class="code">test.sh</code> and is a shell script with execution flag enabled: | The following job executes a command to print the string "Hello world!" to the standard output. The executable is called <code class="code">test.sh</code> and is a shell script with execution flag enabled: | ||
<pre class="program">#!/bin/sh | <pre class="program">#!/bin/sh | ||
# test.sh | # test.sh | ||
echo $*</pre> | echo $*</pre> | ||
while the JDL file is: | while the JDL file is: | ||
<pre class="program"># test.jdl | <pre class="program"># test.jdl | ||
Line 128: | Line 122: | ||
StdError = "std.err"; | StdError = "std.err"; | ||
InputSandbox = {"test.sh"}; | InputSandbox = {"test.sh"}; | ||
OutputSandbox = {"std.out", "std.err"};</pre></div><div title="Submit a job | OutputSandbox = {"std.out", "std.err"};</pre></div><div class="sect2" title="Submit a job"><div class="titlepage"><div><div> | ||
=== Submit a job === | === Submit a job === | ||
</div></div></div> | </div></div></div> | ||
The submission of a job is accomplished by executing the command: | The submission of a job is accomplished by executing the command: | ||
<pre class="command">$ glite-wms-job-submit [-o out_file] <jdl></pre> | <pre class="command">$ glite-wms-job-submit [-o out_file] <jdl></pre> | ||
If successful, the command returns a string which identifies the job from that moment on (referred to as the <span class="emphasis">''jobId''</span>). The jobId has the format: | If successful, the command returns a string which identifies the job from that moment on (referred to as the <span class="emphasis">''jobId''</span>). The jobId has the format: | ||
<pre class="response">https://<wmproxy>[:port]/unique_string</pre> | <pre class="response">https://<wmproxy>[:port]/unique_string</pre> | ||
where <code class="code"><wmproxy></code> is the host name of the WMProxy server, usually the machine to which the job was submitted. | where <code class="code"><wmproxy></code> is the host name of the WMProxy server, usually the machine to which the job was submitted. | ||
The optional <code class="code">-o <out_file></code> appends the jobId to a file, which is rather convenient way to keep track of the submitted jobs. | The optional <code class="code">-o <out_file></code> appends the jobId to a file, which is rather convenient way to keep track of the submitted jobs. | ||
It is worth to noting that the WMS server chosen to submit the job depends on the configuration of the User Interface. | It is worth to noting that the WMS server chosen to submit the job depends on the configuration of the User Interface. | ||
<div title="Example: simple job submission | <div class="sect3" title="Example: simple job submission"><div class="titlepage"><div><div> | ||
==== Example: simple job submission ==== | ==== Example: simple job submission ==== | ||
</div></div></div> | </div></div></div> | ||
You can try now to submit the job described by the <code class="code">test.sh</code> and <code class="code">test.jdl</code>, with the following command: | You can try now to submit the job described by the <code class="code">test.sh</code> and <code class="code">test.jdl</code>, with the following command: | ||
<pre class="command">$ glite-wms-job-submit -a test.jdl</pre> | <pre class="command">$ glite-wms-job-submit -a test.jdl</pre> | ||
You will get an output similar to: | You will get an output similar to: | ||
<pre class="response">Connecting to the service https://prod-wms-01.pd.infn.it:7443/glite_wms_wmproxy_server | <pre class="response">Connecting to the service https://prod-wms-01.pd.infn.it:7443/glite_wms_wmproxy_server | ||
Line 154: | Line 148: | ||
https://prod-wms-01.pd.infn.it:9000/0rbdA50itN26nZJJEGGXnw | https://prod-wms-01.pd.infn.it:9000/0rbdA50itN26nZJJEGGXnw | ||
========================================================================== | ========================================================================== | ||
</pre> | </pre> | ||
if the job submission was successful. | if the job submission was successful. | ||
</div></div><div title="The status of a job | </div></div><div class="sect2" title="The status of a job"><div class="titlepage"><div><div> | ||
=== The status of a job === | === The status of a job === | ||
</div></div></div> | </div></div></div> | ||
The status of a job can be obtained using the command: | The status of a job can be obtained using the command: | ||
<pre class="template">$ glite-wms-job-status <jobId></pre> | <pre class="template">$ glite-wms-job-status <jobId></pre> | ||
or: | or: | ||
<pre class="template">$ glite-wms-job-status -i <file_path></pre> | <pre class="template">$ glite-wms-job-status -i <file_path></pre> | ||
where <code class="code"><jobId></code> and <code class="code"><file_path></code> are, respectively, a <code class="code">jobId</code> and a file containing a list of <code class="code">jobId</code>s (like the file produced by the <code class="code">-o</code> option of <code class="code">glite-wms-job-submit</code>). | where <code class="code"><jobId></code> and <code class="code"><file_path></code> are, respectively, a <code class="code">jobId</code> and a file containing a list of <code class="code">jobId</code>s (like the file produced by the <code class="code">-o</code> option of <code class="code">glite-wms-job-submit</code>). | ||
There are many possible states a job may find itself in. The following table has a brief description of their meaning. | There are many possible states a job may find itself in. The following table has a brief description of their meaning. | ||
<div class="informaltable"> | <div class="informaltable"> | ||
{| border="1" | {| border="1" | ||
|- | |- | ||
! Status | ! Status | ||
! Description | ! Description | ||
|- | |- | ||
| Submitted | | Submitted | ||
| The job has been accepted by the grid | | The job has been accepted by the grid | ||
|- | |- | ||
| Waiting | | Waiting | ||
| An appropriate resource has yet to be selected | | An appropriate resource has yet to be selected | ||
|- | |- | ||
| Ready | | Ready | ||
| An appropriate resource has been selected | | An appropriate resource has been selected | ||
|- | |- | ||
| Scheduled | | Scheduled | ||
| The job has been correctly submitted to the remote resource | | The job has been correctly submitted to the remote resource | ||
|- | |- | ||
| Running | | Running | ||
| The job is currently running | | The job is currently running | ||
|- | |- | ||
| Done | | Done | ||
| The job has finished (either correctly or not) | | The job has finished (either correctly or not) | ||
|- | |- | ||
| Cleared | | Cleared | ||
| The job's output has been retrieved | | The job's output has been retrieved | ||
|- | |- | ||
| Aborted | | Aborted | ||
| The job failed and has been terminated | | The job failed and has been terminated | ||
|- | |- | ||
| Cancelled | | Cancelled | ||
| The job has been cancelled | | The job has been cancelled | ||
|} | |} | ||
</div> | </div> | ||
The correct sequence of states for a good job is 'Submitted'-'Waiting'-'Ready'-'Scheduled'-'Running'-'Done'. Due to the internal latencies of the system, even for the shortest job, a few minutes will be needed to see 'Done' status and be able to recover the output. | The correct sequence of states for a good job is 'Submitted'-'Waiting'-'Ready'-'Scheduled'-'Running'-'Done'. Due to the internal latencies of the system, even for the shortest job, a few minutes will be needed to see 'Done' status and be able to recover the output. | ||
It is impossible to see the status of another users' jobs. | It is impossible to see the status of another users' jobs. | ||
<div title="Example: Checking the status of a job | <div class="sect3" title="Example: Checking the status of a job"><div class="titlepage"><div><div> | ||
==== Example: Checking the status of a job ==== | ==== Example: Checking the status of a job ==== | ||
</div></div></div> | </div></div></div> | ||
This command retrieves the job status of the previously submitted job, in section Submit a job: | This command retrieves the job status of the previously submitted job, in section Submit a job: | ||
<pre class="command">$ glite-wms-job-status https://prod-wms-01.pd.infn.it:9000/KJ3A7ooH3nNUK5M1jzBLuA</pre> | <pre class="command">$ glite-wms-job-status https://prod-wms-01.pd.infn.it:9000/KJ3A7ooH3nNUK5M1jzBLuA</pre> | ||
If the job ended correctly, the result will be similar to: | If the job ended correctly, the result will be similar to: | ||
<pre class="response">************************************************************* | <pre class="response">************************************************************* | ||
BOOKKEEPING INFORMATION: | BOOKKEEPING INFORMATION: | ||
Status info for the Job : https://prod-wms-01.pd.infn.it:9000/KJ3A7ooH3nNUK5M1jzBLuA | Status info for the Job : https://prod-wms-01.pd.infn.it:9000/KJ3A7ooH3nNUK5M1jzBLuA | ||
Current Status: Done (Success) | Current Status: Done (Success) | ||
*************************************************************</pre></div></div><div title="Recovering Results | *************************************************************</pre></div></div><div class="sect2" title="Recovering Results"><div class="titlepage"><div><div> | ||
=== Recovering Results === | === Recovering Results === | ||
</div></div></div> | </div></div></div> | ||
The 'standard' job result retrieval command is: | The 'standard' job result retrieval command is: | ||
<pre class="programlisting">$ glite-wms-job-output < job identifier ></pre> | <pre class="programlisting">$ glite-wms-job-output < job identifier ></pre> | ||
where the<code class="code">job identifier</code> is the job identifier string returned from the <code class="code">glite-wms-job-submit</code> command above. The result of the <code class="code">glite-wms-job-status</code> command for this job must indicate that the status of the job is "Done" before results can be retrieved. | where the<code class="code">job identifier</code> is the job identifier string returned from the <code class="code">glite-wms-job-submit</code> command above. The result of the <code class="code">glite-wms-job-status</code> command for this job must indicate that the status of the job is "Done" before results can be retrieved. | ||
For this example the job result recovery command is: | For this example the job result recovery command is: | ||
<pre class="command">glite-wms-job-output https://prod-wms-01.pd.infn.it:9000/fHJ43bqB-q_9gIGILxDTbg</pre> | <pre class="command">glite-wms-job-output https://prod-wms-01.pd.infn.it:9000/fHJ43bqB-q_9gIGILxDTbg</pre> | ||
After issuing this command, you should see: | After issuing this command, you should see: | ||
<pre class="response">Connecting to the service https://193.206.210.111:7443/glite_wms_wmproxy_server | <pre class="response">Connecting to the service https://193.206.210.111:7443/glite_wms_wmproxy_server | ||
================================================================================ | ================================================================================ | ||
Line 233: | Line 227: | ||
/grid/users/loomis/JobOutput/loomis_fHJ43bqB-q_9gIGILxDTbg | /grid/users/loomis/JobOutput/loomis_fHJ43bqB-q_9gIGILxDTbg | ||
================================================================================ | ================================================================================ | ||
</pre> | </pre> | ||
As a default, the output of the job is recovered to your home machine under the <code class="code">/tmp</code> directory in a directory named by the scheme <code class="code">< user id >_< job id ></code>. In this example, the output directory was overridden by a local configuration file that specified the "OutputStorage" parameter to be "/grid/users/loomis/JobOutput". It could equally as well been specified using the <code class="code">--dir</code> option. | As a default, the output of the job is recovered to your home machine under the <code class="code">/tmp</code> directory in a directory named by the scheme <code class="code">< user id >_< job id ></code>. In this example, the output directory was overridden by a local configuration file that specified the "OutputStorage" parameter to be "/grid/users/loomis/JobOutput". It could equally as well been specified using the <code class="code">--dir</code> option. | ||
</div><div title="Retrieving job history | </div><div class="sect2" title="Retrieving job history"><div class="titlepage"><div><div> | ||
=== Retrieving job history === | === Retrieving job history === | ||
</div></div></div> | </div></div></div> | ||
The commands used to retrieve the history of a job are: | The commands used to retrieve the history of a job are: | ||
<div class="informaltable"> | <div class="informaltable"> | ||
{| border="1" | {| border="1" | ||
|- | |- | ||
| See job history | | See job history | ||
| <code class="code">glite-wms-job-logging-info</code> | | <code class="code">glite-wms-job-logging-info</code> | ||
|} | |} | ||
</div> | </div> | ||
All job management commands require the user to have a valid proxy to work, as all interactions with the WMS are secure and authenticated. | All job management commands require the user to have a valid proxy to work, as all interactions with the WMS are secure and authenticated. | ||
<div title="Retrieve the history of a job | <div class="sect3" title="Retrieve the history of a job"><div class="titlepage"><div><div> | ||
==== Retrieve the history of a job ==== | ==== Retrieve the history of a job ==== | ||
</div></div></div> | </div></div></div> | ||
Sometimes, it is useful to know the history of a job through the WMS from submission to end, for example to better understand the reason of a failure, when <code class="code">glite-wms-job-status</code> is not enough. The history, also called <span class="emphasis">''logging information''</span> of a job can be retrieved by doing: | Sometimes, it is useful to know the history of a job through the WMS from submission to end, for example to better understand the reason of a failure, when <code class="code">glite-wms-job-status</code> is not enough. The history, also called <span class="emphasis">''logging information''</span> of a job can be retrieved by doing: | ||
<pre class="command">$ glite-wms-job-logging-info [-v <verbosity>] <jobId></pre> | <pre class="command">$ glite-wms-job-logging-info [-v <verbosity>] <jobId></pre> | ||
or | or | ||
<pre class="command">$ glite-wms-job-logging-info [-v <verbosity>] -i <file_path></pre> | <pre class="command">$ glite-wms-job-logging-info [-v <verbosity>] -i <file_path></pre> | ||
where <code class="code"><jobId></code> and <code class="code"><file_path></code> are respectively a jobId and a file containing a list of jobIds (like the file produced by the <code class="code">-o</code> option of <code class="code">glite-wms-job-submit</code>). The optional parameter <code class="code">-v <verbosity></code> is used to set the amount of detail in the output, and it is advisable to use <code class="code">-v 3</code> to get the most complete information. | where <code class="code"><jobId></code> and <code class="code"><file_path></code> are respectively a jobId and a file containing a list of jobIds (like the file produced by the <code class="code">-o</code> option of <code class="code">glite-wms-job-submit</code>). The optional parameter <code class="code">-v <verbosity></code> is used to set the amount of detail in the output, and it is advisable to use <code class="code">-v 3</code> to get the most complete information. | ||
</div></div><div title="Job-compatible Resources | </div></div><div class="sect2" title="Job-compatible Resources"><div class="titlepage"><div><div> | ||
=== Job-compatible Resources === | === Job-compatible Resources === | ||
</div></div></div> | </div></div></div> | ||
It is possible to see which Computing Element (CE) satisfies the requirements for a job by using the command | It is possible to see which Computing Element (CE) satisfies the requirements for a job by using the command | ||
<pre class="template">$ glite-wms-job-list-match -a [--rank] <jdl> | <pre class="template">$ glite-wms-job-list-match -a [--rank] <jdl> | ||
</pre> | </pre> | ||
where <code class="code"><jdl></code> is the JDL file describing the job to be submitted. The <code class="code">--rank</code> option also prints the rank of the CEs. The rank is a number expressing the adequacy of a CE (the higher, the better). | where <code class="code"><jdl></code> is the JDL file describing the job to be submitted. The <code class="code">--rank</code> option also prints the rank of the CEs. The rank is a number expressing the adequacy of a CE (the higher, the better). | ||
A JDL can specify the resources that it needs using the "Requirements" attribute, which is a Boolean ClassAd expression. To have a job scheduled to run on a given CE, this requirements expression must evaluate to true on the given CE. The evaluation is performed by the Workload Management System (WMS) during the match-making phase. | A JDL can specify the resources that it needs using the "Requirements" attribute, which is a Boolean ClassAd expression. To have a job scheduled to run on a given CE, this requirements expression must evaluate to true on the given CE. The evaluation is performed by the Workload Management System (WMS) during the match-making phase. | ||
This is an example of a requirements expression: | This is an example of a requirements expression: | ||
<pre class="template">Requirements = other.GlueCEInfoLRMSType == "PBS" && | <pre class="template">Requirements = other.GlueCEInfoLRMSType == "PBS" && | ||
other.GlueCEInfoTotalCPUs > 2 && Member("IDL1.7",other.GlueHostApplicationSoftwareRunTimeEnvironment); | other.GlueCEInfoTotalCPUs > 2 && Member("IDL1.7",other.GlueHostApplicationSoftwareRunTimeEnvironment); | ||
</pre> | </pre> | ||
The above expression requires a CE whose local resource manager is PBS, which has at least 2 CPUs, and where the IDL software version 1.7 is installed. | The above expression requires a CE whose local resource manager is PBS, which has at least 2 CPUs, and where the IDL software version 1.7 is installed. | ||
</div> | </div> | ||
[[Category:Operations_Manuals]] |
Latest revision as of 15:37, 10 January 2013
Main | EGI.eu operations services | Support | Documentation | Tools | Activities | Performance | Technology | Catch-all Services | Resource Allocation | Security |
Documentation menu: | Home • | Manuals • | Procedures • | Training • | Other • | Contact ► | For: | VO managers • | Administrators |
You will find an explanation of how to use the job management commands to prepare and submit a simple job, monitor its status and retrieve the output.
Job commands
Most jobs are submitted to the EGI Grid infrastructure through the gLite WMS (Workload Management System). This is a Grid meta-scheduling service that matches job requirements to the capabilities of resources and chooses the most appropriate resource for a particular job.
The commands used for handling jobs on the Grid are:
Submitting a job | glite-wms-job-submit
|
Checking job status | glite-wms-job-status
|
Retrieving job output | glite-wms-job-output
|
Checking job history | glite-wms-job-logging-info
|
Listing compatible resources | glite-wms-job-list-match
|
Delegate a proxy | glite-wms-job-delegate-proxy
|
All job management commands require the user to have a valid proxy to work, as all interactions with the WMS are secure and authenticated.
Proxy Delegation
The gLite WMS allows more flexible proxy delegation than previous meta-scheduling services, requires explicit delegation of a proxy. Consequently, the gLite WMS has an additional command to do this delegation (glite-wms-job-delegate-proxy
) and required delegation parameters for the glite-wms-job-submit
and glite-wms-job-list-match
commands.
Automatic Delegation
To use automatic delegation, one simply needs to add the -a
option when using the glite-wms-job-submit
and glite-wms-job-list-match
commands. This will transmit your active proxy to the WMS system and use it for the job.
Manual Delegation
You may want to use different proxies for different classes of jobs to allow those jobs to have different authorizations (e.g. via the use of VOMS groups or roles). To manually delegate a proxy, create a local proxy with the groups and roles you want with the voms-proxy-init
command. To delegate this proxy, do the following:
glite-wms-job-delegate-proxy -d myproxy
Replace "myproxy" with a descriptive name of your choice. You can have many different, named proxies delegated to the WMS system. This proxy can then be used for a particular job submission via the -d
option. For example,
glite-wms-job-submit -d myproxy MyJob.jdl
This will use the proxy identified by the name "myproxy" for the submitted job.
Preparing a Simple Job
The main purpose of the Grid is to allow people to run their programs in the most efficient way. In Grid terminology, a job is a unit of work intended as a program which starts, reads some data, does some calculation, produces an output and finishes. A job is described by a file, expressed in a language called Job Description Language, often referred to as a JDL file. A JDL file contains information such as
- the name of the program to run;
- the input files read by the program;
- the output files produced by the program;
- the requirements to be satisfied by the host which is going to execute the program.
The Grid is made up, among other things, of Computing Elements (CE), which correspond physically to clusters of computers located in a computer centre. Users submit their jobs to the WMS, which looks at the corresponding JDL files and dispatches the jobs to the best available resources satisfying the job requirements, where the adequacy of a resource is measured by an estimate of the time to wait from job submission to job execution (which can be large, in case the resource has already a long queue of jobs to process).
The Job Description Language
The Job Description Language is a high-level language used to describe jobs. A JDL file is a text file containing a series of key-value pairs with the format:
attribute = expression;
where every pair is terminated by a semi-colon and can span several lines. Comments in a JDL file must be preceded by a pound (#) or enclosed between /* and */.
In the following examples, the most important JDL attributes will be explained.
The simplest possible job
The most basic example of a Grid job consists in executing a simple command and retrieving its output. This JDL file shows code to do this:
# example.jdl Executable = "/bin/hostname"; StdOutput = "std.out"; StdError = "std.err";
The Executable
attribute specifies the command to run in the job. The StdOutput
attribute specifies the file where the standard output of the job will be written, and similarly the StdError
attribute specifies the file where the standard error of the job will be written.
The above JDL file does not allow to get back the output of the job. This is done via the concept of sandbox.
The sandbox
The sandbox mechanism allows users to specify what files should be sent together with the job from the UI to the execution host, and sent back from to execution host to UI upon successful completion of the job. This is achieved by using the InputSandbox
and the OutputSandbox
attributes, respectively.
The InputSandbox
attribute contains typically the executable to be run, when it is not already present on the execution host, and any other file possibly needed for the executable to run and which is located on the User Interface. For example:
InputSandbox = {"/home/doe/test.sh", "fileA", "fileB"};
Relative paths are relative to the current directory when the user submits the job to the Grid. Wildcards (the * character) can be used, but it is forbidden to specify two or more files with the same file name, even if their paths are different.
The OutputSandbox
attribute contains the files for the standard output and the standard error, plus any other file created by the job executable that the user wants to keep. For example:
OutputSandbox = {"std.out", "std.err", "output.data"};
All the files in the output sandbox must be expressed as relative paths. Their absolute paths cannot be known in advance, as the jobs are executed in temporary directories on the execution host.
It is very important that the files in the sandboxes are limited both in size and in number. This is because every sandbox file has to be separately transferred from the UI to the WMS and then to the execution host, or vice versa; jobs having a lot of files or very big files in the sandboxes have an impact on the performance of the WMS.
Other important attributes
The Arguments
attribute can be used to specify command-line arguments for the command specified in the Executable
attribute. For example:
Executable = "/bin/echo"; Arguments = "Hello world!";
Special characters such ", &, |, \, <, > must be escaped by preceding them with a \. This means that the parser which processes the JDL commands knows when to regard a special character as part of a string, as in other parsers.
The "Hello World" job
The following job executes a command to print the string "Hello world!" to the standard output. The executable is called test.sh
and is a shell script with execution flag enabled:
#!/bin/sh # test.sh echo $*
while the JDL file is:
# test.jdl Executable = "test.sh"; Arguments = "Hello world!"; StdOutput = "std.out"; StdError = "std.err"; InputSandbox = {"test.sh"}; OutputSandbox = {"std.out", "std.err"};
Submit a job
The submission of a job is accomplished by executing the command:
$ glite-wms-job-submit [-o out_file] <jdl>
If successful, the command returns a string which identifies the job from that moment on (referred to as the jobId). The jobId has the format:
https://<wmproxy>[:port]/unique_string
where <wmproxy>
is the host name of the WMProxy server, usually the machine to which the job was submitted.
The optional -o <out_file>
appends the jobId to a file, which is rather convenient way to keep track of the submitted jobs.
It is worth to noting that the WMS server chosen to submit the job depends on the configuration of the User Interface.
Example: simple job submission
You can try now to submit the job described by the test.sh
and test.jdl
, with the following command:
$ glite-wms-job-submit -a test.jdl
You will get an output similar to:
Connecting to the service https://prod-wms-01.pd.infn.it:7443/glite_wms_wmproxy_server ====================== glite-wms-job-submit Success ====================== The job has been successfully submitted to the WMProxy Your job identifier is: https://prod-wms-01.pd.infn.it:9000/0rbdA50itN26nZJJEGGXnw ==========================================================================
if the job submission was successful.
The status of a job
The status of a job can be obtained using the command:
$ glite-wms-job-status <jobId>
or:
$ glite-wms-job-status -i <file_path>
where <jobId>
and <file_path>
are, respectively, a jobId
and a file containing a list of jobId
s (like the file produced by the -o
option of glite-wms-job-submit
).
There are many possible states a job may find itself in. The following table has a brief description of their meaning.
Status | Description |
---|---|
Submitted | The job has been accepted by the grid |
Waiting | An appropriate resource has yet to be selected |
Ready | An appropriate resource has been selected |
Scheduled | The job has been correctly submitted to the remote resource |
Running | The job is currently running |
Done | The job has finished (either correctly or not) |
Cleared | The job's output has been retrieved |
Aborted | The job failed and has been terminated |
Cancelled | The job has been cancelled |
The correct sequence of states for a good job is 'Submitted'-'Waiting'-'Ready'-'Scheduled'-'Running'-'Done'. Due to the internal latencies of the system, even for the shortest job, a few minutes will be needed to see 'Done' status and be able to recover the output.
It is impossible to see the status of another users' jobs.
Example: Checking the status of a job
This command retrieves the job status of the previously submitted job, in section Submit a job:
$ glite-wms-job-status https://prod-wms-01.pd.infn.it:9000/KJ3A7ooH3nNUK5M1jzBLuA
If the job ended correctly, the result will be similar to:
************************************************************* BOOKKEEPING INFORMATION: Status info for the Job : https://prod-wms-01.pd.infn.it:9000/KJ3A7ooH3nNUK5M1jzBLuA Current Status: Done (Success) *************************************************************
Recovering Results
The 'standard' job result retrieval command is:
$ glite-wms-job-output < job identifier >
where thejob identifier
is the job identifier string returned from the glite-wms-job-submit
command above. The result of the glite-wms-job-status
command for this job must indicate that the status of the job is "Done" before results can be retrieved.
For this example the job result recovery command is:
glite-wms-job-output https://prod-wms-01.pd.infn.it:9000/fHJ43bqB-q_9gIGILxDTbg
After issuing this command, you should see:
Connecting to the service https://193.206.210.111:7443/glite_wms_wmproxy_server ================================================================================ JOB GET OUTPUT OUTCOME Output sandbox files for the job: https://prod-wms-01.pd.infn.it:9000/fHJ43bqB-q_9gIGILxDTbg have been successfully retrieved and stored in the directory: /grid/users/loomis/JobOutput/loomis_fHJ43bqB-q_9gIGILxDTbg ================================================================================
As a default, the output of the job is recovered to your home machine under the /tmp
directory in a directory named by the scheme < user id >_< job id >
. In this example, the output directory was overridden by a local configuration file that specified the "OutputStorage" parameter to be "/grid/users/loomis/JobOutput". It could equally as well been specified using the --dir
option.
Retrieving job history
The commands used to retrieve the history of a job are:
See job history | glite-wms-job-logging-info
|
All job management commands require the user to have a valid proxy to work, as all interactions with the WMS are secure and authenticated.
Retrieve the history of a job
Sometimes, it is useful to know the history of a job through the WMS from submission to end, for example to better understand the reason of a failure, when glite-wms-job-status
is not enough. The history, also called logging information of a job can be retrieved by doing:
$ glite-wms-job-logging-info [-v <verbosity>] <jobId>
or
$ glite-wms-job-logging-info [-v <verbosity>] -i <file_path>
where <jobId>
and <file_path>
are respectively a jobId and a file containing a list of jobIds (like the file produced by the -o
option of glite-wms-job-submit
). The optional parameter -v <verbosity>
is used to set the amount of detail in the output, and it is advisable to use -v 3
to get the most complete information.
Job-compatible Resources
It is possible to see which Computing Element (CE) satisfies the requirements for a job by using the command
$ glite-wms-job-list-match -a [--rank] <jdl>
where <jdl>
is the JDL file describing the job to be submitted. The --rank
option also prints the rank of the CEs. The rank is a number expressing the adequacy of a CE (the higher, the better).
A JDL can specify the resources that it needs using the "Requirements" attribute, which is a Boolean ClassAd expression. To have a job scheduled to run on a given CE, this requirements expression must evaluate to true on the given CE. The evaluation is performed by the Workload Management System (WMS) during the match-making phase.
This is an example of a requirements expression:
Requirements = other.GlueCEInfoLRMSType == "PBS" && other.GlueCEInfoTotalCPUs > 2 && Member("IDL1.7",other.GlueHostApplicationSoftwareRunTimeEnvironment);
The above expression requires a CE whose local resource manager is PBS, which has at least 2 CPUs, and where the IDL software version 1.7 is installed.