MAN03 MPI-Start Installation and Configuration
DISCLAIMER: This manual obsoletes the previous specify version maintained at specify link
Title | MPI-Start Installation and Configuration |
Document link | https://wiki.egi.eu/wiki/MAN03_MPI-Start_Installation_and_Configuration |
Last review | Tferrari 08:23, 7 March 2011 (UTC) |
Policy Group Acronym | OMB |
Policy Group Name | Operations Management Board |
Contact Person | I. Campos |
Document Status | APPROVED |
Approved Date | specify |
Procedure Statement | This manual provides information on MPI-Start Installation and Configuration. |
UNDER CONSTRUCTION!
MPI-Start Installation and Configuration
This document is intended to help EGI site administrators to properly support MPI deployments using MPI-Start.
Installation of MPI implementation
In order to execute MPI jobs, the site must support one of the available MPI implementations. The most widely used are Open MPI and MPICH2. OS distributions provide ready-to-use packages that fit most use cases. SL5 provides the following packages:
- openmpi and openmpi-devel for Open MPI.
- mpich2 and mpich2-devel for MPICH2.
Installation of the devel packages for the MPI implementation is recommended, as it allows users to compile their applications at the site.
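As an administrative sketch, the packages listed above can be installed with the standard SL5 tooling (run as root on each worker node; the exact wrapper paths may differ depending on how the site selects the default MPI flavour):

```shell
# Install Open MPI and MPICH2 with their devel packages (names as listed above).
yum install -y openmpi openmpi-devel
yum install -y mpich2 mpich2-devel

# The -devel packages provide the compiler wrappers users compile with;
# for Open MPI, --showme prints the underlying compiler command line.
mpicc --showme
```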
Open MPI and Torque/PBS integration
Tight scheduler integration allows Open MPI to start the processes in the worker nodes using the native batch system utilities, thus providing better process control and accounting. SL5 packages already include support for SGE in the openmpi and openmpi-devel RPMs. After Open MPI is installed, you should see a component named gridengine in the ompi_info output:
$ ompi_info | grep gridengine
                 MCA ras: gridengine (MCA v2.0, API v2.0, Component v1.4)
Check the Open MPI FAQ for more information.
In the case of Torque/PBS on SL5 you will need to compile the packages for your site. The Open MPI FAQ includes instructions for doing so. You can adapt the SL5 packages to support Torque/PBS with the following steps:
- Download and install the Open MPI source RPM from [1]:
$ rpm -Uvh http://ftp2.scientificlinux.org/linux/scientific/5x/SRPMS/vendor/openmpi-1.4-4.el5.src.rpm
Retrieving http://ftp2.scientificlinux.org/linux/scientific/5x/SRPMS/vendor/openmpi-1.4-4.el5.src.rpm
warning: /var/tmp/rpm-xfer.DAMscP: Header V3 DSA signature: NOKEY, key ID 192a7d7d
   1:openmpi                ########################################### [100%]
warning: user mockbuild does not exist - using root
warning: group mockbuild does not exist - using root
(the two warnings above are repeated several times)
- Modify the spec file to include Torque/PBS support:
--- openmpi.spec        2010-03-31 23:18:20.000000000 +0200
+++ openmpi.spec        2011-03-07 18:37:11.000000000 +0100
@@ -114,6 +114,7 @@
 ./configure --prefix=%{_libdir}/%{mpidir} --with-libnuma=/usr \
        --with-openib=/usr --enable-mpirun-prefix-by-default \
        --mandir=%{_libdir}/%{mpidir}/man %{?with_valgrind} \
+       --with-tm \
        --enable-openib-ibcm --with-sge \
        CC=%{opt_cc} CXX=%{opt_cxx} \
        LDFLAGS='-Wl,-z,noexecstack' \
- Install Torque/PBS development libraries:
$ yum install libtorque-devel
- Build the RPMs
$ rpmbuild -ba /usr/src/redhat/SPECS/openmpi.spec
- Install the resulting RPMs:
$ yum localinstall --nogpgcheck /usr/src/redhat/RPMS/x86_64/openmpi-*
- Check that the support for Torque/PBS is enabled:
$ /usr/lib64/openmpi/1.4-gcc/bin/ompi_info | grep tm
          MCA memory: ptmalloc2 (MCA v2.0, API v2.0, Component v1.4)
             MCA ras: tm (MCA v2.0, API v2.0, Component v1.4)
             MCA plm: tm (MCA v2.0, API v2.0, Component v1.4)
Configuration of batch system
Here you can find instructions to manually configure different batch systems to execute MPI jobs. There is also a yaim module that can perform this configuration automatically for Torque/PBS schedulers.
Torque/PBS
Edit your Torque configuration file (/var/spool/pbs/torque.cfg), creating it if it does not exist, and add a line containing:
SUBMITFILTER /var/spool/pbs/submit_filter.pl
Then download submit_filter.pl from here and place it at the above location.
This filter modifies the submitted job script, rewriting the -l nodes=XX option into specific node requests based on the information given by the pbsnodes -a command.
The submit filter is crucial: without it the job is submitted to a single node, where all the MPI processes are allocated, instead of being distributed across several nodes.
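To illustrate the kind of rewrite the filter performs, here is a minimal sketch. This is not the real submit_filter.pl, which is a Perl script and considerably more elaborate; the ppn=2 property is an assumed example value, not taken from this manual.

```shell
# Minimal sketch of a Torque submit filter: it reads the job script on
# stdin, rewrites the resource request, and prints the result on stdout,
# which Torque then submits. The ppn=2 value is an assumed example.
rewrite_nodes() {
    sed 's/nodes=\([0-9][0-9]*\)$/nodes=\1:ppn=2/'
}

echo '#PBS -l nodes=4' | rewrite_nodes
# prints "#PBS -l nodes=4:ppn=2"
```

The real filter derives the rewritten request from the node properties reported by pbsnodes -a rather than from a hard-coded value.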
Warning: gLite updates tend to rewrite torque.cfg. Check that the submit filter line is still there after performing an update.
Maui
Edit your configuration file (usually under /var/spool/maui/maui.cfg) and check that it contains the following lines:
ENABLEMULTINODEJOBS TRUE
ENABLEMULTIREQJOBS TRUE
These parameters allow a job to span more than one node and to specify multiple independent resource requests.
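Once both parameters are in place, a simple way to confirm that multi-node scheduling works is to submit a trivial job that prints its allocated hosts (a sketch; the two-node request is just an example):

```shell
# Submit a two-node job; $PBS_NODEFILE lists the hosts Torque allocated.
echo 'cat $PBS_NODEFILE' | qsub -l nodes=2
# When the job completes, its stdout file should contain two distinct
# hostnames; a single repeated hostname suggests the submit filter or
# the Maui settings are not taking effect.
```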
SGE
UNDER CONSTRUCTION
MPI-Start installation
MPI-Start is the recommended solution to hide the implementation details from submitted jobs. It was developed within the int.eu.grid project and its development now continues in the EMI project. Official packages for the latest versions will be available from the EMI repository in the coming months; previous versions can be downloaded from here. MPI-Start should be installed on every worker node involved in MPI execution. It may also be installed on user interface machines for testing purposes.
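As a quick sanity check after installation, MPI-Start can be driven directly through its I2G_* environment hooks (a sketch; hello_mpi is a placeholder binary and the variable values are examples):

```shell
# Sketch of invoking MPI-Start by hand on a worker node or UI.
export I2G_MPI_TYPE=openmpi           # MPI flavour to use
export I2G_MPI_APPLICATION=hello_mpi  # placeholder application name
export I2G_MPI_APPLICATION_ARGS=""    # application arguments, if any
mpi-start
```

In production these variables are normally set by the job wrapper rather than by hand.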