
MPI User Guide


Introduction

This document is intended to help the EGI user community execute MPI applications on the European Grid Infrastructure. It has been prepared by the EGI-InSPIRE project. Please provide feedback to Enol Fernández del Castillo (enolfc_AT_ifca_DOT_unican_DOT_es).

Many of the sites involved in the European Grid Infrastructure support various MPI implementations, such as Open MPI and MPICH 1/2, as well as OpenMP for shared-memory parallelism. Site administrators may deploy any version of any MPI implementation to serve the needs of their users.

Execution of MPI applications requires sites that properly support the submission and execution of parallel applications and that provide an MPI implementation. Site administrators should check the gLite MPI v1.0.0 release notes of the EMI 1 Kebnekaise release or the MPI-START v1.0.4 manual for the relevant information about site configuration. Since not all sites have this support enabled, special tags (e.g. MPI-START, OPENMPI, MPICH2) are published via the information system so that users can discover which sites can be used for their executions. Sites may also install different implementations (or flavours) of MPI. It is therefore important that users query the information system to locate sites with the software they require.

(Information about the MPI wrapper allowing MPI jobs to be executed on two or more nodes located at different grid sites.)

Current status of MPI supporting sites

The Service Availability Monitoring (SAM) infrastructure of EGI monitors the status and correct configuration of MPI sites and suspends erroneous sites if necessary. This monitoring system tests the MPI-START wrapper and its supported MPI flavours with Nagios probes; standalone MPI implementations are not tested if the site does not support the MPI-START wrapper. You can always check the latest monitoring data for the sites at the central MyEGI web page.

The execution of parallel applications does not only require middleware support for such jobs; it also needs a correct configuration of the sites where the jobs are actually run. In order to assure the correct execution of these applications, monitoring probes that check the proper support for such jobs are available.

The monitoring probes are executed at all the sites that publish the MPI-START tag in their information system and consist of the following steps:

  1. Assure that MPI-Start is actually available.
  2. Check the information published by the site. This step inspects the announced MPI flavour support and selects the probes to be run in the next step.
  3. For each of the supported MPI flavours, submit a job to the site requesting 2 processes; the job compiles the test application from source using the MPI-Start hooks. The probe then checks that the number of processes used by the application matches the requested number.

Although the probes request a low number of slots (2), they allow the early detection of basic problems. These probes are flagged as critical, thus any failure may cause the site to be suspended from the infrastructure.

Executing MPI applications with MPI-Start

MPI-Start is the recommended way of starting MPI jobs in the infrastructure. The MPI-START v1.0.4 User Guide contains a complete description of how to run MPI jobs both in general and on the Grid. The documentation focuses on gLite resources, although MPI-Start can also be used with ARC and UNICORE if installed and configured by the site administrator.

Examples can also be found in the tutorial materials prepared by the EGI-InSPIRE SA3 Support for parallel computing (MPI) task.

There are many ways to access and use MPI software on the Grid, and most of the configuration is done by site administrators to give users a flexible and fast way of running their MPI applications. One such piece of software is the MPI-START wrapper, which is part of the EMI middleware.

Sites supporting MPI-Start must publish the proper tags in the information system; e.g. the BDII attribute GlueHostApplicationSoftwareRunTimeEnvironment should contain the tag MPI-START.

  • Supported MPI Implementations:
    • OpenMP (basic)
    • OpenMPI
    • MPICH2
    • MPICH
    • LAM-MPI
    • PACX-MPI
  • Automatic file distribution (for non-shared file systems)
  • Automatic compiler discovery

For more about the MPI-START design, please read here.

Discovery of suitable sites

Discovery of resources is the first step that needs to be accomplished before the execution of applications. This can be done by using the GlueHostApplicationSoftwareRunTimeEnvironment attribute, which should include all the relevant MPI support information that allows users to locate sites with an adequate software environment. The following sections describe the tags that sites may publish.

For each MPI implementation supported, sites must publish a variable with the name of the MPI flavour that has been installed and tested. The supported flavours are: MPICH for MPICH, MPICH2 for MPICH2, LAM for LAM/MPI and OPENMPI for Open MPI. The most commonly supported flavours are Open MPI and MPICH2.

Example:

  • GlueHostApplicationSoftwareRunTimeEnvironment: MPICH2
  • GlueHostApplicationSoftwareRunTimeEnvironment: OPENMPI
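
In addition to the lcg-info command shown in the sample queries below, these tags can be queried directly from a top-level BDII with ldapsearch. The following is only a sketch: replace <top-bdii-host> with the host name of a top-level BDII reachable from your user interface; the filter matches Glue 1.3 GlueSubCluster entries that announce the OPENMPI tag.

$ ldapsearch -x -LLL -h <top-bdii-host> -p 2170 -b "o=grid" \
    '(&(objectClass=GlueSubCluster)(GlueHostApplicationSoftwareRunTimeEnvironment=OPENMPI))' \
    GlueChunkKey GlueHostApplicationSoftwareRunTimeEnvironment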

More specific version and compiler information can be also defined by the sites using variables with the form:

<MPI flavour>-<MPI version> or <MPI flavour>-<MPI version>-<Compiler>

These are not mandatory, although they should be published to allow users with special requirements to locate specific versions of MPI software. Users should assume the GCC compiler suite is used if no other value is specified.

Examples:

  • GlueHostApplicationSoftwareRunTimeEnvironment: OPENMPI-1.4.2
  • GlueHostApplicationSoftwareRunTimeEnvironment: MPICH-1.2.7
  • GlueHostApplicationSoftwareRunTimeEnvironment: OPENMPI-1.3.7-ICC

Network Interconnections

Sites may publish the network interconnect available for the execution of MPI applications using a variable of the form:

MPI-<interconnect>

Currently the valid interconnects are: Ethernet, Infiniband, SCI, and Myrinet.

Examples:

  • GlueHostApplicationSoftwareRunTimeEnvironment: MPI-Infiniband


Shared Homes

Sites supporting a shared file system for the execution of MPI applications publish the MPI_SHARED_HOME variable. If your application needs this feature, you should check the availability of that tag.
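
For example, the availability of the tag can be checked with the lcg-info command described in the next section (the VO name below is only illustrative):

$ lcg-info --vo biomed --list-ce --query 'Tag=MPI_SHARED_HOME,Tag=MPI-START'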

Sample Queries

If you are submitting your jobs through the gLite WMS, you should include in the Requirements expression the tags you want the site to support. The following example shows the requirements expression for a job that needs Open MPI and Infiniband and uses MPI-Start for execution:

Requirements  = member("MPI-START", other.GlueHostApplicationSoftwareRunTimeEnvironment)
                && member("OPENMPI", other.GlueHostApplicationSoftwareRunTimeEnvironment)
                && member("MPI-INFINIBAND", other.GlueHostApplicationSoftwareRunTimeEnvironment);

The lcg-info command can be used to perform similar queries.

  • Sites in ops.vo.ibergrid.eu VO that support MPICH2:
$ lcg-info --vo ops.vo.ibergrid.eu --list-ce --query 'Tag=MPICH2' 
- CE: ce02.ific.uv.es:8443/cream-pbs-infbandShort
  • Sites in biomed VO that support Open MPI v1.4.4:
$ lcg-info --vo biomed --list-ce --query 'Tag=OPENMPI-1.4.4' 
- CE: ce01.kallisto.hellasgrid.gr:2119/jobmanager-pbs-biomed
- CE: cream01.kallisto.hellasgrid.gr:8443/cream-pbs-biomed
- CE: egeece01.ifca.es:2119/jobmanager-sge-biomed
- CE: egeece02.ifca.es:2119/jobmanager-sge-biomed
- CE: egeece03.ifca.es:2119/jobmanager-sge-biomed
- CE: gridce01.ifca.es:8443/cream-sge-biomed
- CE: gridce02.ifca.es:8443/cream-sge-biomed
- CE: ngiescream.i3m.upv.es:8443/cream-pbs-biomed
  • Number of sites that support MPI in biomed VO:
$ lcg-info --vo biomed --list-ce --query 'Tag=*MPI*' --sed | wc -l
100
  • Sites with Infiniband interconnect in biomed VO:
$ lcg-info --vo biomed --list-ce --query 'Tag=MPI-INFINIBAND' 
- CE: ce.reef.man.poznan.pl:2119/jobmanager-pbs-biomed
- CE: ce002.ipp.acad.bg:2119/jobmanager-pbs-biomed
- CE: cr1.ipp.acad.bg:8443/cream-pbs-biomed
- CE: creamce.reef.man.poznan.pl:8443/cream-pbs-biomed
  • Sites with MPI-Start, Open MPI and Ethernet interconnect in biomed VO:
$ lcg-info --vo biomed --list-ce --query 'Tag=OPENMPI,Tag=MPI-START,Tag=MPI-ETHERNET' 
- CE: glite.univ.kiev.ua:2119/jobmanager-pbs-grid
- CE: glite.univ.kiev.ua:8443/cream-pbs-grid
- CE: grid-lab-ce.ii.edu.mk:2119/jobmanager-pbs-biomed

Job execution

Please read about MPI-START usage with Grid middleware here.

Please read about submitting a simple MPI job to the Grid here.
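
Since the links above may no longer resolve, the following is a minimal, illustrative sketch of a WMS submission using MPI-Start. The file names (mpi-start-wrapper.sh, mpi-hooks.sh, cpi.c), the flavour and the number of processes are assumptions to be adapted to your application.

A JDL file (e.g. mpi-start.jdl):

JobType       = "Normal";
CpuNumber     = 4;
Executable    = "mpi-start-wrapper.sh";
Arguments     = "cpi OPENMPI";
StdOutput     = "std.out";
StdError      = "std.err";
InputSandbox  = {"mpi-start-wrapper.sh", "mpi-hooks.sh", "cpi.c"};
OutputSandbox = {"std.out", "std.err"};
Requirements  = member("MPI-START", other.GlueHostApplicationSoftwareRunTimeEnvironment)
                && member("OPENMPI", other.GlueHostApplicationSoftwareRunTimeEnvironment);

The wrapper script only sets the MPI-Start environment variables and delegates to the mpi-start executable installed at the site:

#!/bin/bash
# mpi-start-wrapper.sh
# First argument: name of the application binary (built by the compile hook)
# Second argument: MPI flavour, e.g. OPENMPI or MPICH2
MY_EXECUTABLE=`pwd`/$1
MPI_FLAVOUR=`echo $2 | tr '[:upper:]' '[:lower:]'`

export I2G_MPI_APPLICATION=$MY_EXECUTABLE
export I2G_MPI_TYPE=$MPI_FLAVOUR
export I2G_MPI_PRE_RUN_HOOK=mpi-hooks.sh
export I2G_MPI_POST_RUN_HOOK=mpi-hooks.sh

# I2G_MPI_START is set by the site and points to the mpi-start executable
$I2G_MPI_START

The job can then be submitted with glite-wms-job-submit -a mpi-start.jdl.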

Advanced (Hooks Framework)

Please read the MPI-START User Guide, section "Hooks", here.

Also, please read about the Hooks Framework here.
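
As a rough illustration of the framework, the hooks file is a shell script that defines pre_run_hook and/or post_run_hook functions, which MPI-Start calls before and after the application run. The sketch below assumes the application source sits next to the binary as ${I2G_MPI_APPLICATION}.c:

#!/bin/sh
# mpi-hooks.sh -- compile the application before the run, report afterwards

pre_run_hook () {
  echo "Compiling ${I2G_MPI_APPLICATION}"
  # mpicc is provided by the MPI flavour that MPI-Start has set up
  mpicc -o ${I2G_MPI_APPLICATION} ${I2G_MPI_APPLICATION}.c
  if [ $? -ne 0 ]; then
    echo "Compilation failed"
    return 1
  fi
  return 0
}

post_run_hook () {
  echo "Application run finished"
  return 0
}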

Known issues

  • MPI-START wrapper version discovery: support for tagging the version of MPI-START in the information system is missing. Currently only the MPI-START tag is published, so the installed version can only be obtained by sending a grid job and performing the discovery manually. A user following the official manual may therefore hit command-line parameters that are not supported by the MPI-START wrapper version currently installed at a site. For reference, see the RT ticket here.

Future plans

The EMI 1 release includes MPI-START v1.0.4. Please read more about the release here.

It includes new MPI features such as:

JDL attributes: WholeNodes, HostNumber and SMPGranularity
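
As a rough sketch of how these attributes might be combined in a JDL (check the WMS documentation and the feature requests below for the exact semantics), a job asking for 8 processes with at least 4 placed on the same node could look like:

CpuNumber      = 8;    // total number of processes
SMPGranularity = 4;    // at least 4 processes on the same node

Alternatively, whole nodes can be reserved:

WholeNodes     = True;
HostNumber     = 2;    // reserve two complete nodes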

Please have a look at related feature requests:

  • https://savannah.cern.ch/bugs/?77096
  • https://savannah.cern.ch/bugs/?76971
  • https://savannah.cern.ch/bugs/?58878