The wiki is deprecated and due to be decommissioned by the end of September 2022.
The content is being migrated to other platforms; new updates will be ignored and lost.
If needed, you can get in touch with the EGI SDIS team using operations @ egi.eu.

USG MPI Jobs

From EGIWiki
 

Latest revision as of 17:17, 10 January 2013


This guide shows how to compile and run a simple Message Passing Interface (MPI) job on the grid.

MPI and its implementations

The Message Passing Interface (MPI) is commonly used to handle the communications between tasks in parallel applications. There are two versions of MPI, MPI-1 and MPI-2. Two implementations of MPI-1 (LAM and MPICH) and two implementations of MPI-2 (OpenMPI and MPICH2) are supported. Individual sites may choose to support only a subset of these implementations, or none at all.

In the past, running MPI applications on the infrastructure required significant hand-tuning for each site. This was needed to compensate for differences between sites, such as whether a shared file system was available and where the default scratch space was located. The current configuration allows jobs to be more portable and allows the user more flexibility.

The increased portability and flexibility is achieved by working around hard-coded constraints from the Resource Broker (RB) and by off-loading much of the initialisation work to the mpi-start scripts. Using the mpi-start system requires the user to define a wrapper script and a set of hooks. The mpi-start system then handles most of the low-level details of running the MPI job on a particular site.

Wrapper script for mpi-start

Users typically use a script that sets up paths and other internal settings to initiate the mpi-start processing. The following script (named "mpi-start-wrapper.sh") is generic and should not need to have significant modifications made to it.

#!/bin/bash

# Pull in the arguments.
MY_EXECUTABLE=`pwd`/$1
MPI_FLAVOR=$2

# Convert flavor to lowercase for passing to mpi-start.
MPI_FLAVOR_LOWER=`echo $MPI_FLAVOR | tr '[:upper:]' '[:lower:]'`

# Pull out the correct paths for the requested flavor.
eval MPI_PATH=`printenv MPI_${MPI_FLAVOR}_PATH`

# Ensure the prefix is correctly set.  Don't rely on the defaults.
eval I2G_${MPI_FLAVOR}_PREFIX=$MPI_PATH
export I2G_${MPI_FLAVOR}_PREFIX

# Touch the executable.  It must exist for the shared file system
# check.  If it does not, then mpi-start may try to distribute the
# executable when it shouldn't.
touch $MY_EXECUTABLE

# Setup for mpi-start.
export I2G_MPI_APPLICATION=$MY_EXECUTABLE
export I2G_MPI_APPLICATION_ARGS=
export I2G_MPI_TYPE=$MPI_FLAVOR_LOWER
export I2G_MPI_PRE_RUN_HOOK=mpi-hooks.sh
export I2G_MPI_POST_RUN_HOOK=mpi-hooks.sh

# If these are set then you will get more debugging information.
export I2G_MPI_START_VERBOSE=1
#export I2G_MPI_START_DEBUG=1

# Invoke mpi-start.
$I2G_MPI_START

The script first sets up the environment for the chosen flavor of MPI using environment variables supplied by the system administrator. It then defines the executable, arguments, MPI flavor, and location of the hook scripts for mpi-start. The user may optionally ask for more logging information with the verbose and debug environment variables. Lastly, the wrapper invokes mpi-start itself.
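The wrapper receives the executable name and the MPI flavor as its two positional arguments. A minimal sketch of the flavor handling, using the values that the example JDL in this guide passes (mpi-test and OPENMPI):

```shell
#!/bin/bash
# Sketch of how the wrapper's second argument becomes the mpi-start
# flavor.  On the worker node the wrapper is effectively invoked as:
#   ./mpi-start-wrapper.sh mpi-test OPENMPI

MPI_FLAVOR=OPENMPI

# mpi-start expects the flavor in lowercase, so the wrapper converts it.
MPI_FLAVOR_LOWER=`echo $MPI_FLAVOR | tr '[:upper:]' '[:lower:]'`

# This lowercase value is what ends up in I2G_MPI_TYPE.
echo "I2G_MPI_TYPE=$MPI_FLAVOR_LOWER"
```

The uppercase form is kept for composing variable names such as I2G_OPENMPI_PREFIX, while the lowercase form selects the flavor inside mpi-start.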

Hooks for mpi-start

The user may write a script that is called before and after the MPI executable is run. The pre-hook can be used, for example, to compile the executable itself or download data. The post-hook can be used to analyze results or to save the results on the grid.

The following example (named "mpi-hooks.sh") compiles the executable before running it; the post-hook only writes a message to the standard output. A real-world job would likely save the results of the job somewhere on the grid for user retrieval.

#!/bin/sh

#
# This function will be called before the MPI executable is started.
# You can, for example, compile the executable itself.
#
pre_run_hook () {

  # Compile the program.
  echo "Compiling ${I2G_MPI_APPLICATION}"

  # Actually compile the program.
  cmd="mpicc ${MPI_MPICC_OPTS} -o ${I2G_MPI_APPLICATION} ${I2G_MPI_APPLICATION}.c"
  echo $cmd
  $cmd
  if [ ! $? -eq 0 ]; then
    echo "Error compiling program.  Exiting..."
    exit 1
  fi

  # Everything's OK.
  echo "Successfully compiled ${I2G_MPI_APPLICATION}"

  return 0
}

#
# This function will be called after the MPI executable has finished.
# A typical use for this is to upload the results to a storage
# element.
#
post_run_hook () {

  echo "Executing post hook."
  echo "Finished the post hook."

  return 0
}

The pre- and post-hooks may be defined in separate files, but the functions must be named exactly "pre_run_hook" and "post_run_hook".
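As an illustration of that rule, the hooks could live in two separate files (the file names pre-hook.sh and post-hook.sh here are made up; only the function names are fixed). After the files are sourced, mpi-start simply calls the functions by name:

```shell
#!/bin/sh
# Hypothetical split of the hooks into two files.  The file names are
# illustrative; the function names are the part that must not change.
cat > pre-hook.sh <<'EOF'
pre_run_hook () { echo "pre_run_hook called"; return 0; }
EOF
cat > post-hook.sh <<'EOF'
post_run_hook () { echo "post_run_hook called"; return 0; }
EOF

# Source both files, then invoke the hooks by their required names.
. ./pre-hook.sh
. ./post-hook.sh
pre_run_hook
post_run_hook
```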

Defining the job and executable

Running the MPI job itself is not significantly different from running a standard grid job. The user must define a JDL file describing the requirements for the job. An example is:

#
# mpi-test.jdl
#
JobType        = "Normal";
CpuNumber      = 16;
Executable     = "mpi-start-wrapper.sh";
Arguments      = "mpi-test OPENMPI";
StdOutput      = "mpi-test.out";
StdError       = "mpi-test.err";
InputSandbox   = {"mpi-start-wrapper.sh","mpi-hooks.sh","mpi-test.c"};
OutputSandbox  = {"mpi-test.err","mpi-test.out"};
Requirements =
  Member("MPI-START", other.GlueHostApplicationSoftwareRunTimeEnvironment)
  && Member("OPENMPI", other.GlueHostApplicationSoftwareRunTimeEnvironment)
  # && RegExp("grid.*.lal.in2p3.fr.*sdj$",other.GlueCEUniqueID)
  ;
#
# - the end
#

The JobType must be "Normal" and the attribute CpuNumber must be defined (16 in this example). Despite its name, this attribute defines the number of CPUs required by the job. It is not possible to request more complicated topologies based on nodes and CPUs.

This example uses the OpenMPI implementation of the MPI-2 standard. The other supported implementations can be selected by changing OPENMPI (in two places) to the name of the desired implementation. The other names are "LAM", "MPICH", and "MPICH2". The JobType attribute must be "Normal" in all cases; it selects for an MPI job in general and not the specific implementation.
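For example, to request MPICH2 instead, only the two occurrences of the flavor name change; the rest of the JDL stays as above (sketch):

```
Arguments      = "mpi-test MPICH2";
Requirements =
  Member("MPI-START", other.GlueHostApplicationSoftwareRunTimeEnvironment)
  && Member("MPICH2", other.GlueHostApplicationSoftwareRunTimeEnvironment)
  ;
```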

All of the files referenced in the example JDL have now been defined except for the actual MPI program. This is a simple "Hello World" example written in C. The code is:

/*  hello.c
 *
 *  Simple "Hello World" program in MPI.
 *
 */

#include "mpi.h"
#include <stdio.h>
int main(int argc, char *argv[]) {

  int numprocs;  /* Number of processors */
  int procnum;   /* Processor number */

  /* Initialize MPI */
  MPI_Init(&argc, &argv);

  /* Find this processor number */
  MPI_Comm_rank(MPI_COMM_WORLD, &procnum);

  /* Find the number of processors */
  MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
  printf ("Hello world! from processor %d out of %d\n", procnum, numprocs);

  /* Shut down MPI */
  MPI_Finalize();
  return 0;
}

It is highly recommended to compile the MPI program on the site where it will run, as the pre-run hook above does. Many compilation options are specific to the software or hardware installed on a site; sending a pre-compiled binary risks sub-optimal performance at best and crashes at worst.

Running the MPI job

Running the MPI job is no different from any other grid job. Use the commands glite-wms-job-submit, glite-wms-job-status, and glite-wms-job-output to submit, check the status, and recover the output of a job.
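A hypothetical submission session might look like the following sketch. The job-identifier file name and output directory are illustrative, and the exact options may differ between gLite UI versions; the commands require a gLite UI with a valid proxy, so the sketch probes for them first:

```shell
#!/bin/sh
# Illustrative gLite submission sequence (file names are made up).
if command -v glite-wms-job-submit >/dev/null 2>&1; then
  # Submit the job; -a delegates a proxy automatically and -o saves
  # the job identifier for the follow-up commands.
  glite-wms-job-submit -a -o jobid.txt mpi-test.jdl

  # Check the job status until it reaches "Done".
  glite-wms-job-status -i jobid.txt

  # Retrieve the output sandbox (mpi-test.out, mpi-test.err).
  glite-wms-job-output -i jobid.txt --dir ./mpi-test-output
  RESULT="submitted"
else
  RESULT="no gLite UI available; commands shown for reference only"
fi
echo "$RESULT"
```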

If the job ran correctly, then the standard output should contain something like the following:

Hello world! from processor 15 out of 16
Hello world! from processor 0 out of 16
Hello world! from processor 1 out of 16
Hello world! from processor 7 out of 16
Hello world! from processor 2 out of 16
Hello world! from processor 3 out of 16
Hello world! from processor 4 out of 16
Hello world! from processor 6 out of 16
Hello world! from processor 8 out of 16
Hello world! from processor 9 out of 16
Hello world! from processor 12 out of 16
Hello world! from processor 5 out of 16
Hello world! from processor 10 out of 16
Hello world! from processor 14 out of 16
Hello world! from processor 11 out of 16
Hello world! from processor 13 out of 16

If there are problems running the job and the standard output and error do not contain enough information, setting the mpi-start debug flag in the wrapper script may help.