pam

Parallel Application Manager – job starter for MPI applications

HP-UX vendor MPI syntax

bsub pam -mpi mpirun [mpirun_options] mpi_app [argument ...]

SGI vendor MPI syntax

bsub pam [-n num_tasks] -mpi -auto_place mpi_app [argument ...]

Generic PJL framework syntax

bsub pam [-t] [-v] [-n num_tasks] -g [num_args] pjl_wrapper [pjl_options] mpi_app [argument ...]

pam [-h]

pam [-V]

Description

The Parallel Application Manager (PAM) is the point of control for parallel jobs in Platform LSF. PAM is fully integrated with Platform LSF and interfaces the user application with LSF, acting as the supervisor of a parallel LSF job.

MPI jobs started by pam can only be submitted through the LSF Batch system. PAM cannot be used interactively to start parallel jobs. sbatchd starts PAM on the first execution host.

For all parallel application processes (tasks), PAM:
  • Uses a vendor MPI library or an MPI Parallel Job Launcher (PJL), for example mpirun or poe, to start a parallel job on a specified set of hosts in an LSF cluster.

  • Contacts RES on each execution host allocated to the parallel job.

  • Queries RES periodically to collect resource usage for each parallel task, passes control signals through RES to all process groups and individual running tasks, and cleans up tasks as needed.

  • Passes job-level resource usage and process IDs (PIDs and PGIDs) to sbatchd for enforcement.

  • Collects resource usage information and exit status upon termination.

Task startup for vendor MPI jobs

The pam command starts a vendor MPI job on a specified set of hosts in an LSF cluster. Using pam to start an MPI job requires the underlying MPI system to be LSF-aware, that is, a vendor MPI implementation that supports LSF (SGI IRIX vendor MPI or HP-UX vendor MPI).

PAM uses the vendor MPI library to spawn the child processes needed for the parallel tasks that make up your MPI application. It starts these tasks on the systems allocated by LSF. The allocation includes the number of execution hosts needed, and the number of child processes needed on each host.
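
Inside a batch job, LSF exposes the allocation in the job environment. The sketch below is illustrative only. It assumes the standard LSF job variable LSB_MCPU_HOSTS (not described on this page), which lists alternating host names and slot counts such as "hostA 2 hostB 3", and prints the number of hosts and the slots on each host, roughly the per-host task counts described above. The function name show_allocation is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (not part of LSF): summarize an LSF allocation.
# Assumes LSB_MCPU_HOSTS holds alternating "host count" pairs,
# e.g. "hostA 2 hostB 3". A sample value is used when it is unset.
show_allocation() {
    # Intentionally unquoted: split the value into words.
    set -- ${1:-}
    nhosts=0
    total=0
    while [ "$#" -ge 2 ]; do
        printf '%s: %s task(s)\n' "$1" "$2"
        nhosts=$((nhosts + 1))
        total=$((total + $2))
        shift 2
    done
    printf 'hosts=%s tasks=%s\n' "$nhosts" "$total"
}

show_allocation "${LSB_MCPU_HOSTS:-hostA 2 hostB 3}"
```

With the sample value, the function reports two hosts and five tasks in total.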

Task startup for LSF HPC generic PJL jobs

For parallel jobs submitted with bsub:
  • PAM invokes the PJL, which in turn invokes the TaskStarter (TS).

  • TS starts the tasks on each execution host, reports the process ID to PAM, and waits for the task to finish.

Two environment variables allow you to run scripts or binaries before or after PAM is invoked. These are useful if you customize mpirun.lsf and have job scripts that call mpirun.lsf more than once.
  • $MPIRUN_LSF_PRE_EXEC: Runs before PAM is invoked.

  • $MPIRUN_LSF_POST_EXEC: Runs after PAM is invoked.
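
To make the ordering concrete, here is a hypothetical wrapper function, run_with_hooks, in the spirit of a customized mpirun.lsf. It is not the real mpirun.lsf code. It assumes each variable holds a command line that can be run with sh -c, runs the post-exec command even when the launch fails, and returns the launch command's status; the actual behavior of mpirun.lsf on failure may differ.

```shell
#!/bin/sh
# Hypothetical sketch (not the actual mpirun.lsf source).
# Assumption: each hook variable holds a command line for sh -c.
run_with_hooks() {
    # Pre-exec hook: runs before the launch command (PAM in mpirun.lsf).
    if [ -n "${MPIRUN_LSF_PRE_EXEC:-}" ]; then
        sh -c "$MPIRUN_LSF_PRE_EXEC"
    fi
    # Launch command; in a real wrapper this would be the pam invocation.
    "$@"
    status=$?
    # Post-exec hook: runs after the launch command returns.
    if [ -n "${MPIRUN_LSF_POST_EXEC:-}" ]; then
        sh -c "$MPIRUN_LSF_POST_EXEC"
    fi
    # Preserve the launch command's exit status.
    return "$status"
}

# Demonstration with echo standing in for pam.
export MPIRUN_LSF_PRE_EXEC='echo pre'
export MPIRUN_LSF_POST_EXEC='echo post'
run_with_hooks echo launch
```

The demonstration prints pre, launch, and post on separate lines.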

Options for vendor MPI jobs

-auto_place

The -auto_place option on the pam command line tells the SGI IRIX mpirun library to launch the MPI application according to the resources allocated by LSF.

-mpi

In the SGI environment, the -mpi option on the bsub and pam command line is equivalent to the mpirun command.

On HP-UX, you can have LSF manage the allocation of hosts to achieve better resource utilization by coordinating the start-up phase with mpirun. This is done by preceding the regular HP MPI mpirun command with:

bsub pam -mpi

For HP-UX vendor MPI jobs, the -mpi option must be the first option of the pam command.

For example, to run a single-host job and have LSF select the host, the command:

mpirun -np 14 a.out

is entered as:

bsub pam -mpi mpirun -np 14 a.out

-n num_tasks

The number of processors required to run the parallel application, typically the same as the number of parallel tasks in the job. If the host is a multiprocessor, one host can start several tasks.

You can use both bsub -n and pam -n in the same job submission. The number specified in the pam -n option should be less than or equal to the number specified by bsub -n. If the number of tasks specified with pam -n is greater than the number specified by bsub -n, the pam -n is ignored.

For example, on SGI IRIX or SGI Altix, you can specify:

bsub -n 5 pam -n 2 -mpi -auto_place a.out

Here, the job requests 5 processors, but PAM only starts 2 parallel tasks.
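
The interaction between bsub -n and pam -n can be written as a small decision rule. The function effective_tasks is hypothetical, and it assumes that when pam -n is ignored or omitted, PAM starts one task per processor requested with bsub -n.

```shell
#!/bin/sh
# Hypothetical helper (not part of LSF): how many tasks PAM starts.
#   pam -n <= bsub -n  -> pam -n tasks
#   pam -n >  bsub -n  -> pam -n ignored (assumed: bsub -n tasks)
#   pam -n omitted     -> assumed: bsub -n tasks
effective_tasks() {
    bsub_n=$1
    pam_n=${2:-}
    if [ -z "$pam_n" ] || [ "$pam_n" -gt "$bsub_n" ]; then
        echo "$bsub_n"
    else
        echo "$pam_n"
    fi
}

effective_tasks 5 2   # the SGI example: bsub -n 5, pam -n 2
effective_tasks 5 8   # pam -n larger than bsub -n, so it is ignored
effective_tasks 5     # no pam -n given
```

The three calls print 2, 5, and 5.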

mpi_app [argument ...]

The name of the MPI application to run on the allocated hosts, followed by any arguments to it. The application and its arguments must be the last items on the command line.

-h

Prints command usage to stderr and exits.

-V

Prints the LSF release version to stderr and exits.

Options for LSF HPC generic PJL jobs

-t

Suppresses the MPI job task summary report that pam otherwise prints to standard output. By default, the summary report shows the task ID, the host on which the task ran, the command that was executed, the exit status, and the termination time.

-v

Verbose mode. Displays the name of the execution host or hosts.

-g [num_args] pjl_wrapper [pjl_options]

The -g option is required to use the LSF generic PJL framework. You must specify all other pam options before -g.

num_args

Specifies how many space-separated words on the command line belong to the PJL, counting the PJL name (pjl_wrapper) itself and each of its options. The rest of the command line is treated as the application that runs as the parallel tasks.

pjl_wrapper

The name of the PJL.

pjl_options

Optional arguments to the PJL.

For example:
  • A PJL named no_arg_pjl takes no options, so num_args=1. The syntax is:
    pam [pam_options] -g 1 no_arg_pjl job [job_options]
  • A PJL named 3_arg_pjl takes the options -a, -b, and group_name, so num_args=4. The syntax is:
    pam [pam_options] -g 4 3_arg_pjl -a -b group_name job [job_options]
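
The num_args rule amounts to splitting the words after -g at a fixed position. The function split_pjl_args is hypothetical and not part of LSF; it only mimics that split for the two examples above, treating the first num_args words (the PJL name plus its options) as the PJL command and the remaining words as the application command.

```shell
#!/bin/sh
# Hypothetical helper (not part of LSF): split the words after -g.
# The first num_args words go to the PJL; the rest are the application.
split_pjl_args() {
    num_args=$1
    shift
    pjl=""
    i=0
    while [ "$i" -lt "$num_args" ] && [ "$#" -gt 0 ]; do
        # Append the word, adding a separating space after the first one.
        pjl="$pjl${pjl:+ }$1"
        shift
        i=$((i + 1))
    done
    printf 'PJL: %s\n' "$pjl"
    printf 'APP: %s\n' "$*"
}

split_pjl_args 1 no_arg_pjl job -x
split_pjl_args 4 3_arg_pjl -a -b group_name job -x
```

In both calls the application part is "job -x"; only the PJL part changes with num_args.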

-n num_tasks

The number of processors required to run the MPI application, typically the number of parallel tasks in the job. If the host is a multiprocessor, one host can start several tasks.

You can use both bsub -n and pam -n in the same job submission. The number specified in the pam -n option should be less than or equal to the number specified by bsub -n. If the number of tasks specified with pam -n is greater than the number specified by bsub -n, the pam -n is ignored.

mpi_app [argument ...]

The name of the MPI application to run on the allocated hosts, followed by any arguments to it. The application and its arguments must be the last items on the command line.

-h

Prints command usage to stderr and exits.

-V

Prints the LSF release version to stderr and exits.

Exit Status

pam exits with the exit status of mpirun or the PJL wrapper.
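
Because pam passes through the exit status of mpirun or the PJL wrapper, a job script can branch on $? after pam returns. In this sketch the hypothetical function check_launch runs any command, so true and sh -c 'exit 2' stand in for a real pam command line; in a job script you would pass the pam invocation instead.

```shell
#!/bin/sh
# Hypothetical helper: run a launch command and report its exit status.
# In a job script, for example: check_launch pam -g 1 no_arg_pjl job
check_launch() {
    "$@"
    rc=$?
    if [ "$rc" -eq 0 ]; then
        echo "parallel job succeeded"
    else
        echo "parallel job failed with status $rc"
    fi
    # Propagate the status so the job script itself can fail.
    return "$rc"
}

check_launch true
check_launch sh -c 'exit 2' || true
```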

See also

bsub(1)