- About Platform LSF HPC and the Intel® MPI Library
- Configuring LSF HPC to Work with Intel MPI
- Working with the Multi-purpose Daemon (MPD)
- Submitting Intel MPI Jobs
About Platform LSF HPC and the Intel® MPI Library
The Intel® MPI Library ("Intel MPI") is a high-performance message-passing library for developing applications that can run on multiple cluster interconnects chosen by the user at runtime. It supports TCP, shared memory, and high-speed interconnects like InfiniBand and Myrinet.
Intel MPI supports all MPI-1 features and many MPI-2 features, including file I/O, generalized requests, and preliminary thread support. It is based on MPICH2.
The LSF HPC Intel® MPI integration is based on the LSF HPC generic PJL framework. It supports the LSF HPC task geometry feature.
Requirements
- Intel® MPI version 1.0.2 or later
You should upgrade all your hosts to the same version of Intel MPI.
Assumptions and limitations
- Intel MPI is installed and configured correctly
- When an Intel MPI job is killed, PAM reports the exit status as unknown
- When MPI tasks get killed, MPD automatically kills TaskStarter
- LSF host names must be the official host names recognized by the system
Glossary
MPD
(Multi-Purpose Daemon) The Intel MPI job startup mechanism.
MPI
(Message Passing Interface) A message-passing standard. It defines a message-passing API useful for parallel and distributed applications.
MPICH
A portable implementation of the MPI standard.
MPICH2
An MPI implementation for platforms such as clusters, SMPs, and massively parallel processors.
PAM
(Parallel Application Manager) The supervisor of any parallel job.
PJL
(Parallel Job Launcher) Any executable script or binary capable of starting parallel tasks on all hosts assigned for a parallel job.
RES
(Remote Execution Server) An LSF daemon residing on each host. It monitors and manages all LSF tasks on the host.
TS
(TaskStarter) An executable responsible for starting a task on the local host and reporting the process ID and host name to the PAM.
For more information
- See the Mathematics and Computer Science Division (MCS) of Argonne National Laboratory (ANL) MPICH Web pages for more information about MPICH and MPICH2:
  www-unix.mcs.anl.gov/mpi/mpich/
  www-unix.mcs.anl.gov/mpi/mpich2/
- See Intel Software Network > Software Products > Cluster Tools > Intel MPI Library at www.intel.com for more information about the Intel MPI Library.
- See Getting Started with the Intel® MPI Library (Getting_Started.pdf) in the Intel MPI installation documentation directory for more information about using the Intel MPI library and commands.
Files installed by lsfinstall
During installation, lsfinstall copies these files to the following directories:
These files...        Are installed to...
TaskStarter           LSF_BINDIR
pam                   LSF_BINDIR
esub.intelmpi         LSF_SERVERDIR
intelmpi_wrapper      LSF_BINDIR
mpirun.lsf            LSF_BINDIR
pjllib.sh             LSF_BINDIR
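To confirm the files are in place, you can list them from a shell where the LSF environment has been sourced (for example via profile.lsf, which sets LSF_BINDIR and LSF_SERVERDIR):

% ls $LSF_BINDIR/TaskStarter $LSF_BINDIR/pam $LSF_BINDIR/intelmpi_wrapper
% ls $LSF_BINDIR/mpirun.lsf $LSF_BINDIR/pjllib.sh $LSF_SERVERDIR/esub.intelmpi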
Resources and parameters configured by lsfinstall
- External resources in lsf.shared:

  Begin Resource
  RESOURCE_NAME   TYPE      INTERVAL   INCREASING   DESCRIPTION
  ...
  intelmpi        Boolean   ()         ()           (Intel MPI)
  ...
  End Resource

  The intelmpi Boolean resource is used for mapping hosts that have Intel MPI available. Add the intelmpi resource name under the RESOURCES column of the Host section of lsf.cluster.cluster_name (see the sketch after this list).
- Parameter to lsf.conf:

  LSB_SUB_COMMANDNAME=y
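A minimal sketch of the lsf.cluster.cluster_name edit referenced above, assuming a cluster named cluster1 with hosts hosta and hostb (all names hypothetical; keep your existing Host section columns and only add the resource):

# lsf.cluster.cluster1 -- add intelmpi under the RESOURCES column
Begin Host
HOSTNAME   model   type   server   r1m   mem   swp   RESOURCES
hosta      !       !      1        -     -     -     (intelmpi)
hostb      !       !      1        -     -     -     (intelmpi)
End Host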
Configuring LSF HPC to Work with Intel MPI
intelmpi_wrapper script
Modify the intelmpi_wrapper script in LSF_BINDIR to set MPI_TOPDIR. The default value is:

MPI_TOPDIR="/opt/intel/mpi/2.0"
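If Intel MPI is installed in a different location, point MPI_TOPDIR at your installation root; a sketch, with a hypothetical path:

MPI_TOPDIR="/opt/intel/mpi/3.0"    # hypothetical install root; use your cluster's actual Intel MPI location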
lsf.conf (optional)

To improve performance and scalability for large parallel jobs, tune the lsf.conf parameters described in Tuning PAM Scalability and Fault Tolerance. The user's environment can override these settings.
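A sketch of what such lsf.conf tuning entries look like; the parameter names below are assumptions drawn from the LSF HPC tuning documentation, so verify them against your release before use:

# lsf.conf -- hedged example; confirm names and defaults for your LSF version
LSF_HPC_PJL_LOADENV_TIMEOUT=300     # assumed: seconds to wait for PJL environment setup
LSF_PAM_RUSAGE_UPD_FACTOR=0.01      # assumed: scales how often PAM updates job resource usage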
Working with the Multi-purpose Daemon (MPD)
The Intel® MPI Library ("Intel MPI") uses a Multi-Purpose Daemon (MPD) job startup mechanism. MPD daemons must be up and running on the hosts where an MPI job is supposed to start before mpiexec is started.

How Platform LSF HPC manages MPD rings
LSF HPC manages MPD rings for users automatically, using the mpdboot and mpdtrace commands.

Each MPI job running under LSF uses a uniquely labeled MPD ring. The ring is started by the intelmpi_wrapper during job launch and terminated by the intelmpi_wrapper after the MPI application exits, either normally or abnormally. This allows multiple MPI jobs belonging to different users, as well as multiple jobs from the same user, to coexist on the same set of hosts.
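LSF does this for you, but the underlying MPD commands are useful for manual troubleshooting outside LSF; a minimal sketch, assuming a hypothetical mpd.hosts file listing one host per line:

% mpdboot -n 4 -f mpd.hosts    # start a ring spanning 4 hosts
% mpdtrace                     # list the hosts in the running ring
% mpdallexit                   # shut down the whole ring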
For more information

- See Getting Started with the Intel® MPI Library (Getting_Started.pdf) in the Intel MPI installation documentation directory for more information about using the Intel MPI library and commands
- See Administering Platform LSF for information about using job starters
Submitting Intel MPI Jobs
bsub command
Use bsub -a intelmpi to submit jobs. If the starting command is mpd, you must submit your Intel MPI jobs as exclusive jobs (bsub -x).

bsub -a intelmpi -n number_cpus mpirun.lsf [-pam "pam_options"] [mpi_options] job [job_options]
- -a intelmpi tells esub the job is an Intel MPI job and invokes esub.intelmpi.
- -n number_cpus specifies the number of processors required to run the job.
- mpirun.lsf reads the environment variable LSF_PJL_TYPE=intelmpi set by esub.intelmpi, and generates the appropriate pam command line to invoke Intel MPI as the PJL.

For example:
% bsub -a intelmpi -n 3 mpirun.lsf /examples/cpi
A job named cpi will be dispatched and run on 3 CPUs in parallel.

Task geometry with Intel MPI jobs
Intel MPI supports the LSF HPC task geometry feature, which lets you control which tasks of a parallel job run together on the same host (see the sketch below).
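A minimal sketch, assuming the LSB_PJL_TASK_GEOMETRY environment variable described in Administering Platform LSF and a hypothetical ./myapp binary (csh-style syntax):

% setenv LSB_PJL_TASK_GEOMETRY "{(0,2)(1,3)}"
% bsub -a intelmpi -n 4 mpirun.lsf ./myapp

Here tasks 0 and 2 run together on one host, and tasks 1 and 3 on another.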
Submitting a job with a job script
A wrapper script is often used to call Intel MPI. You can submit a job using a job script as an embedded script or directly as a job, for example:
% bsub -a intelmpi -n 4 < embedded_jobscript

% bsub -a intelmpi -n 4 jobscript

Your job script must use mpirun.lsf in place of the mpirun command; a minimal sketch follows.
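A sketch of such a script, reusing the /examples/cpi application from the earlier example:

#!/bin/sh
# jobscript -- submit directly (bsub -a intelmpi -n 4 jobscript)
# or embedded (bsub -a intelmpi -n 4 < jobscript)
mpirun.lsf /examples/cpi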
Using Intel MPI configuration files (-configfile)
All mpiexec -configfile options are supported. -configfile should be the only option after the mpiexec command.

The placement options in the configuration file (-gn, -gnp, -n, -np, -host) must agree with the values of the LSB_MCPU_HOSTS and LSB_HOSTS environment variables that LSF sets for the job; a sketch follows.
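A minimal sketch of a configuration file and its submission (the host names and the ./hmpi binary are hypothetical, and the hosts must match what LSF records in LSB_MCPU_HOSTS):

% cat my.configfile
-n 2 -host hosta ./hmpi
-n 2 -host hostb ./hmpi
% bsub -a intelmpi -n 4 mpirun.lsf mpiexec -configfile my.configfile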
mpiexec limitations

The -file option of mpiexec is not supported. You can use the -configfile option instead.

If you submit an Intel MPI job with -file, the intelmpi_wrapper exits and fails the job. If you specify a log file for the intelmpi_wrapper, an error message is appended to the log file.
mpiexec requires host names as they are returned by the hostname command or the gethostname() system call. For example:

% hostname
hosta
% mpiexec -l -n 2 -host hosta.domain.com ./hmpi
mpdrun: unable to start all procs; may have invalid machine names
  remaining specified hosts:
  hosta.domain.com
% mpiexec -l -n 2 -host hosta ./hmpi
0: myrank 0, n_processes 2
1: myrank 1, n_processes 2
0: From process 1: Slave process 1!

The -genvlist option does not work if the configuration file given to -configfile has more than one entry.

For more information
- See Running Parallel Jobs for information about generic PJL wrapper script components
- See the Platform LSF Command Reference for information about the bsub command
- See Administering Platform LSF for information about submitting jobs with job scripts