MPI - Message Passing Interface

This document is intended to briefly describe what MPI is, and what implementations are available on the DAMTP Linux computers.

What is this MPI thing?

MPI provides an Application Programming Interface (API) to perform communication between co-operating processes. This is commonly used to write portable parallel programs which can run on a wide range of systems - from highly advanced shared memory super-computers (e.g. COSMOS), to high-performance clusters (e.g. Darwin), to loosely connected clusters of workstations - or even on a single desktop.

MPI implementations vary in how they actually perform the communication. Such details depend, for example, on the available hardware for interconnections and so can differ greatly in performance. Despite these differences, a program written to use the MPI API should move between implementations simply by re-compiling.

Please note that a particular system will often have a preferred MPI implementation, optimised for its particular hardware; it is generally not a good idea to compile your own and try to use it (please see the system's local documentation). Furthermore, one should always recompile a code natively when using a new system.

Note: MPI implementations provide wrappers round the standard compilers which add extra arguments, e.g. to pull in the MPI libraries, set special compiler options and generally hide tedious implementation-specific details. So for example when compiling FORTRAN-90 MPI code it is usual to use the command:

   mpif90

rather than using f90, gfortran, ifort or similar directly. Compiling and linking MPI programs should only be done with the MPI wrappers, or things will not work properly.
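For instance, to compile and link a hypothetical Fortran 90 source file myprog.f90 (the name is just an example) into an executable called myprog, something like the following is typical:

   mpif90 -o myprog myprog.f90

The wrapper passes the usual compiler options straight through to the underlying compiler, adding the MPI include and library paths itself.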

As long as code is written to use only the documented MPI API (taking care to avoid any implementation-specific problems/bugs), the same source code should be portable between a tiny desktop system and a high performance system with many thousands of CPUs. Obviously structuring your computation in such a way that it can make use of huge numbers of CPUs can be quite difficult!
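As a minimal sketch of what such portable source looks like (the file name hello.f90 and the message wording are just illustrative), each MPI process initialises the library, asks for its rank and the total process count, and shuts down cleanly:

   ! hello.f90 - minimal illustrative MPI program
   program hello
     implicit none
     include 'mpif.h'                                  ! MPI constants for Fortran
     integer :: ierr, rank, nprocs

     call MPI_Init(ierr)                               ! start up MPI
     call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)    ! which process am I?
     call MPI_Comm_size(MPI_COMM_WORLD, nprocs, ierr)  ! how many processes in total?
     write(*,*) 'Hello from rank ', rank, ' of ', nprocs
     call MPI_Finalize(ierr)                           ! shut down MPI
   end program hello

Compiled with mpif90 and started with, say, mpirun -np 4 ./hello, exactly the same source should build and run under OpenMPI or MPICH.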

For more information about MPI concepts see the MPI-forum documentation and the Wikipedia MPI article.

What MPI implementations do we currently have?

In DAMTP we used to only provide the LAM MPI implementation, but in recent installations of Scientific Linux we have OpenMPI instead.

OpenMPI is in many ways the direct descendant of LAM, but is a new implementation which includes support for the MPI-2 specification, offering a number of extra features over the classic MPI-1.2 API.

In sl55 we provide OpenMPI 1.4 (see DAMTP news item 1475 for some info about the date of that update). Code which was compiled against older versions of OpenMPI (e.g. from sl53 or earlier) may need to be recompiled to work with the newer version.

In the sl5 setup there is a tool, mpi-selector, which allows each user to pick a default MPI implementation to use - rather than there just being a system-wide default. The system defaults will be much the same as before, ie openmpi-1.4-gcc-i386 on 32-bit systems and openmpi-1.4-gcc-x86_64 on 64-bit machines, but you can override that so that the MPI commands you run come from any available MPI implementation. Run the command:

    mpi-selector --list

to see the available implementations.
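To make one of the listed implementations your personal default, something like the following should work (a hedged example: the name must be one of those reported by --list on your machine, and the change typically only takes effect in newly started login shells):

    mpi-selector --set openmpi-1.4-gcc-x86_64 --user
    mpi-selector --query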

What about MPICH?

MPICH was one of the first MPI implementations and became widely adopted by a number of high performance systems. It is still commonly used on systems which have a very high performance dedicated interconnect between nodes.

MPICH can be compiled with support for a large number of different types of interconnect - what MPICH refers to as the device layer.

MPICH has two major versions - the old MPICH-1 and the newer MPICH-2. MPICH-1 has not been updated for some time since the developers are mostly working on MPICH-2. The current version of MPICH-1 is 1.2.7p1, released on November 4th, 2005. As you may expect from the name, MPICH-2 supports MPI-2 but is (currently) less common on high performance systems.

Starting with sl51 we have been providing an implementation of MPICH-1, configured to be as close as possible to the one provided on the Cambridge HPC Service (formerly known as the HPCF, CC-HPCF).

In order to be as compatible as possible we are using MPICH version 1.2.7p1 compiled with version 9.1 of the Intel compilers. We don't have any of the high-performance interconnects, so we are using the ch_p4 device layer configured with the comm=shared option to give good performance when communication is between CPUs on the same host. We don't want to replace the standard OpenMPI, so this MPICH is installed in a non-standard location.

We have arranged for mpi-selector to be able to pick these MPI implementations as well as the OpenMPI-based ones.

Note that the recommended and supported implementation of MPI on the HPC Service is Qlogic (formerly Infinipath) MPI. We can't provide this; however, it is based on, and should be compatible with, MPICH-1.2.7, which we can provide. Qlogic MPI is therefore an implementation of MPI-1.2, but it also contains a small subset of MPI-2 features; if you are interested in a full MPI-2 implementation, you will probably want to try OpenMPI (currently available experimentally on Darwin).

If that all sounds like gibberish then the simple bullet points to remember are:

  • MPICH is not on the standard PATH
  • this uses MPICH 1.2.7p1 compiled with ifort 9.1
  • ch_p4 will use ssh for communication between hosts; this will be very slow over our network!
  • these can be picked using the mpi-selector tool
  • it will communicate using shared memory (ie fast) when the CPUs are on the same host
  • on other systems you will need to check local policies and procedures for compiling and running jobs

Because this MPICH is fairly similar to the one offered on the HPC Service it should be easier to test simple/small cases on local machines (e.g. for developing/debugging the code), before recompiling it on the HPC Service for handling larger calculations.

So how do I use it?

The HPC Service systems are all running the x86_64 version of Linux. Not all of our machines are currently set up as x86_64 - some of the older machines don't have new enough hardware, and those installed before summer 2009 will usually have been installed with the 32-bit system since that was our default.

Therefore we are providing two versions, one built for i386/32-bit hardware and the other built for x86_64/64-bit hardware. The 32-bit version will be installed on all the sl5 machines, while the 64-bit version will only be available on machines which are installed with x86_64 sl5.

Arch/size        Path to installation
i386, 32-bit     /usr/local/mpich-1.2.7p1-intel-i386
x86_64, 64-bit   /usr/local/mpich-1.2.7p1-intel64-x86_64
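If you are not sure which of the two a particular machine is running, the following should tell you (it prints x86_64 on the 64-bit installs and i686 or similar on the 32-bit ones):

    uname -m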

So for example on a 32-bit machine you may simply want to put /usr/local/mpich-1.2.7p1-intel-i386/bin early on your PATH, and then mpif90, mpirun (etc) will use our 32-bit MPICH-1.2.7p1 setup. Similarly, on an x86_64 64-bit machine you could add /usr/local/mpich-1.2.7p1-intel64-x86_64/bin/ instead and get the 64-bit version.
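For example, assuming a bash-style shell (csh/tcsh users would use setenv/set path instead), on a 32-bit machine that might look like:

    export PATH=/usr/local/mpich-1.2.7p1-intel-i386/bin:$PATH
    which mpif90     # should now report the MPICH wrapper rather than the OpenMPI one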

With mpi-selector, setting the PATH etc is somewhat easier.

Our build of the MPICH code installs (on each machine) a machines.LINUX file listing just that machine and specifying how many CPUs it has. This greatly simplifies running small codes entirely on one machine (with the faster communication). For jobs to be run over several machines (warning: ch_p4 over ssh will be very slow) you will need to construct a suitable machines.LINUX file. See section 4.14.1 of the MPICH documentation (/usr/local/mpich-1.2.7p1-intel.../doc/mpichman-chp4.pdf page 37 onwards) or the online HTML MPICH ch_p4 manual.
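As a rough sketch of a multi-host run (the hostnames and file name here are made up for illustration), you would list each host and its CPU count in your own machines file and point mpirun at it:

    $ cat ~/mymachines
    machine1.damtp.cam.ac.uk:2
    machine2.damtp.cam.ac.uk:2
    $ mpirun -np 4 -machinefile ~/mymachines ./myprog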

The standard machines.LINUX file can be found at /usr/local/mpich-1.2.7p1-intel.../share/machines.LINUX. For example, on a 32-bit system it contains something like:

$ cat /usr/local/mpich-1.2.7p1-intel-i386/share/machines.LINUX 
# Change this file to contain the machines that you want to use
# to run MPI jobs on.  The format is one host name per line, with either
#    hostname
# or 
#    hostname:n
# where n is the number of processors in an SMP.  The hostname should
# be the same as the result from the command "hostname"
##
## Generated fragment during package postinstall
deluxe.damtp.cam.ac.uk:2
## End of fragment

That shows that the machine (deluxe) has two CPUs - in fact it is a Dell desktop based on an Intel Core 2 Duo E6550.
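On a machine like that, a quick two-process test of the hello example from earlier can therefore be run entirely locally (and hence over fast shared memory), assuming the relevant MPICH bin directory is on your PATH as described above:

    mpif90 -o hello hello.f90
    mpirun -np 2 ./hello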

Tutorials/courses/examples

Tutorials covering MPI concepts, along with some examples:

How to get help

Please contact the help-desk for advice. [ We may need to refer you to some local MPI experts... ]