
MAN03 MPI-Start Installation and Configuration



DISCLAIMER: This manual obsoletes the previous specify version maintained at specify link

Title: MPI-Start Installation and Configuration
Document link: https://wiki.egi.eu/wiki/MAN03_MPI-Start_Installation_and_Configuration
Last review: Tferrari 08:23, 7 March 2011 (UTC)
Policy Group Acronym: OMB
Policy Group Name: Operations Management Board
Contact Person: I. Campos
Document Status: APPROVED
Approved Date: specify
Procedure Statement: This manual provides information on MPI-Start Installation and Configuration.

UNDER CONSTRUCTION!

MPI-Start Installation and Configuration

This document is intended to help EGI site administrators properly support MPI deployments using MPI-Start.


Installation of an MPI implementation

In order to execute MPI jobs, the site must support one of the available MPI implementations. The most widely used are Open MPI and MPICH2. OS distributions provide ready-to-use packages that fit most use cases. SL5 provides the following packages:

  • openmpi and openmpi-devel for Open MPI.
  • mpich2 and mpich2-devel for MPICH2.

Installing the devel packages for the chosen MPI implementation is recommended, as this allows users to compile their applications at the site.
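For example, on an SL5 node the packages can be installed with yum (a minimal sketch; install only the implementation(s) your site intends to support):

$ yum install openmpi openmpi-devel
$ yum install mpich2 mpich2-devel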

Open MPI and Torque/PBS integration

Tight scheduler integration allows Open MPI to start processes on the worker nodes using the native batch system utilities, thus providing better process control and accounting. The SL5 openmpi and openmpi-devel RPMs already include support for SGE. After Open MPI is installed, you should see a component named gridengine in the ompi_info output:

$ ompi_info | grep gridengine
                MCA ras: gridengine (MCA v2.0, API v2.0, Component v1.4)

Check the Open MPI FAQ for more information.

In the case of Torque/PBS on SL5, you will need to compile the packages for your site. The Open MPI FAQ includes instructions for doing so. You can adapt the SL5 packages to support Torque/PBS with the following steps:

  • Download and install the Open MPI source RPM from the Scientific Linux repository:
$ rpm -Uvh http://ftp2.scientificlinux.org/linux/scientific/5x/SRPMS/vendor/openmpi-1.4-4.el5.src.rpm
Retrieving http://ftp2.scientificlinux.org/linux/scientific/5x/SRPMS/vendor/openmpi-1.4-4.el5.src.rpm
warning: /var/tmp/rpm-xfer.DAMscP: Header V3 DSA signature: NOKEY, key ID 192a7d7d
   1:openmpi                warning: user mockbuild does not exist - using root
warning: group mockbuild does not exist - using root
########################################### [100%]
(the mockbuild user/group warnings repeat several times and are harmless)
  • Modify the spec file to include Torque/PBS support:
--- openmpi.spec        2010-03-31 23:18:20.000000000 +0200
+++ openmpi.spec        2011-03-07 18:37:11.000000000 +0100
@@ -114,6 +114,7 @@
 ./configure --prefix=%{_libdir}/%{mpidir} --with-libnuma=/usr \
        --with-openib=/usr --enable-mpirun-prefix-by-default \
        --mandir=%{_libdir}/%{mpidir}/man %{?with_valgrind} \
+        --with-tm \
        --enable-openib-ibcm --with-sge \
        CC=%{opt_cc} CXX=%{opt_cxx} \
        LDFLAGS='-Wl,-z,noexecstack' \
  • Install Torque/PBS development libraries:
$ yum install libtorque-devel
  • Build the RPMs
$ rpmbuild -ba /usr/src/redhat/SPECS/openmpi.spec
  • Install the resulting RPMs:
$ yum localinstall --nogpgcheck /usr/src/redhat/RPMS/x86_64/openmpi-*
  • Check that the support for Torque/PBS is enabled:
$ /usr/lib64/openmpi/1.4-gcc/bin/ompi_info | grep tm
              MCA memory: ptmalloc2 (MCA v2.0, API v2.0, Component v1.4)
                 MCA ras: tm (MCA v2.0, API v2.0, Component v1.4)
                 MCA plm: tm (MCA v2.0, API v2.0, Component v1.4)
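
As a quick sanity check of the tight integration (a sketch, assuming the SL5 Open MPI path used above and at least two free worker nodes), you can submit a trivial job and verify that processes are spread across the allocated nodes:

$ cat test-tm.sh
#!/bin/bash
# with tight integration mpirun obtains the host list directly from
# Torque via the tm module, so no -np or machinefile is needed
/usr/lib64/openmpi/1.4-gcc/bin/mpirun hostname
$ qsub -l nodes=2 test-tm.sh

The job output should contain the host names of both allocated nodes.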

Configuration of the batch system

This section provides instructions to manually configure different batch systems to execute MPI jobs. There is also a yaim module that can perform this configuration automatically for Torque/PBS schedulers, as sketched below.
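For reference, the yaim-based configuration is driven by variables in site-info.def along these lines (a sketch only; the exact variable names and values depend on the version of the yaim MPI module deployed at your site):

# enable the MPI flavours supported by the site
MPI_OPENMPI_ENABLE="yes"
MPI_MPICH2_ENABLE="yes"
# advertise the installed version
MPI_OPENMPI_VERSION="1.4"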

Torque/PBS

Edit your Torque configuration file (/var/spool/pbs/torque.cfg), creating it if it does not exist, and add a line containing:

SUBMITFILTER /var/spool/pbs/submit_filter.pl

Then download the submit_filter.pl from here and put it in the above location.

This filter modifies the submitted job script, rewriting the -l nodes=XX option into specific requests based on the information given by the pbsnodes -a command.

The submit filter is crucial. Without it, the job is submitted to a single node and all the MPI processes are allocated there, instead of being distributed across several nodes.
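
For illustration, the effect of the filter is a transformation of the resource request along these lines (hypothetical host names; the actual rewrite is derived from the pbsnodes -a output of the site):

# request as written by the user:
#PBS -l nodes=4
# request after the filter, pinned to concrete hosts:
#PBS -l nodes=wn01+wn02+wn03+wn04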

Warning: gLite updates tend to rewrite torque.cfg. Check that the submit filter line is still there after performing an update.

Maui

Edit your Maui configuration file (usually /var/spool/maui/maui.cfg) and check that it contains the following lines:

ENABLEMULTINODEJOBS TRUE

ENABLEMULTIREQJOBS TRUE

These parameters allow a job to span more than one node and to specify multiple independent resource requests.

SGE

UNDER CONSTRUCTION

MPI-Start installation

MPI-Start is the recommended solution to hide the implementation details from submitted jobs. It was developed within the int.eu.grid project and its development now continues in the EMI project. Official packages for the latest versions will be available from the EMI repository in the coming months; previous versions can be downloaded from here. MPI-Start should be installed on every worker node involved with MPI. It may also be installed on user interface machines for testing purposes.
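
Once installed, a quick smoke test can be run on a worker node or UI (a sketch, assuming mpi-start is on the PATH and Open MPI is installed; MPI-Start is controlled through I2G_* environment variables):

$ export I2G_MPI_TYPE=openmpi
$ export I2G_MPI_APPLICATION=/bin/hostname
$ mpi-start

If everything is in place, the wrapper prints the host name once per allocated slot.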

MPI-Start configuration

Worker Node Environment

Information System