LSF HPC makes use of HP-UX processor sets (psets) to create an efficient execution environment that allows a mix of users and jobs to coexist in the HP Superdome cell-based architecture.
About HP-UX Psets
HP-UX processor sets (psets) are available as an optional software product for HP-UX 11i Superdome multiprocessor systems. A pset is a set of active processors grouped together for the exclusive use of the applications assigned to the set. A pset manages processor resources among applications and users.
The operating system restricts applications to run only on the processors in their assigned psets. Processes bound to a pset can only run on the CPUs belonging to that pset, so applications assigned to different psets do not contend for processor resources.
A newly created pset initially has no processors assigned to it.
Dynamic application binding
Each running application in the system is bound to some pset, which defines the processors that the application can run on.
A pset defines a scheduling allocation domain that restricts applications to run only on the processors in its assigned pset.
At system startup, the HP-UX system is automatically configured with one system default pset to which all enabled processors are assigned. Processor 0 is always assigned to the default pset. All users in the system can access the default pset.
See the HP-UX 11i system administration documentation for information about defining and managing psets.
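Outside of LSF, psets can also be examined and managed directly with the HP-UX psrset(1M) utility. The following is a minimal sketch only; the processor IDs, pset ID, and PID are examples, and the exact options may vary by HP-UX release, so consult psrset(1M) on your system.
psrset -c 2 3      # create a new pset containing processors 2 and 3 (example IDs)
psrset -i          # display the psets on the system and their assigned processors
psrset -b 1 1234   # bind process 1234 (example PID) to pset 1
psrset -d 1        # destroy pset 1; its processors return to the default pset 0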
How LSF HPC uses psets
On HP-UX 11i Superdome multiprocessor systems, psets can be created and deallocated dynamically out of available machine resources. The pset provides processor isolation, so that a job requiring a specific number of CPUs runs only on those CPUs.
Processor distance is a value that measures how quickly a process running on one processor can access the local memory of another processor. The larger the value, the slower the memory access. For example, the processor distance between two processors within one cell is less than the processor distance between processors in different cells.
When creating a pset for the job, LSF uses a best-fit algorithm for pset allocation to choose processors as close as possible to each other. LSF attempts to choose the set of processors with the smallest processor distance.
When a job is submitted, LSF:
- Chooses the best CPUs based on job resource requirements (number of processors requested and pset topology)
- Creates a pset for the job. The operating system assigns a unique pset identifier (pset ID) to it.
LSF has no control over the pset ID assigned to a newly created pset.
- Places the job processes in the pset when the job starts running
After the job finishes, LSF destroys the pset. If no host meets the CPU requirements, the job remains pending until processors become available to allocate the pset.
CPU 0 in the default pset 0 is always considered last for a job and cannot be taken out of pset 0, because all system processes run on it. LSF cannot create a pset that includes CPU 0; it uses the default pset only when it cannot create a pset without CPU 0.
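For example, a submission like the following (myjob is only a placeholder command) asks LSF to choose 4 CPUs, create a pset for them, and run the job in that pset; the pset is destroyed when the job finishes:
bsub -n 4 -ext "PSET[]" myjob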
The LSF HPC topology adapter (RLA) runs on each HP-UX 11i host. It is started and monitored by sbatchd. RLA provides services that allow external clients, including the pset scheduler plugin and sbatchd, to:
- Allocate and deallocate job psets
- Get the job pset ID
- Suspend a pset when a job is suspended, and reassign all CPUs within the pset back to pset 0
- Resume a pset: before a job is resumed, reassign all original CPUs back to the job pset
- Get pset topology information: cells, CPUs, and processor distance between cells
- Get updated free CPU map
- Get job resource map
RLA maintains a status file in the directory defined by LSB_RLA_WORKDIR in lsf.conf, which keeps track of job pset allocation information. When RLA starts, it reads the status file and recovers the current status.
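For example, a minimal lsf.conf fragment for RLA might look like the following; the work directory path is purely illustrative, and 6883 is the default RLA port described later in this chapter:
LSB_RLA_PORT=6883
LSB_RLA_WORKDIR=/var/lsf/rla_workdir    # example path; RLA writes its status file here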
Assumptions and limitations
User-level account and system account mapping are not supported. If a user account does not exist on the remote host, LSF cannot create a pset for it.
Jobs running in a pset cannot be resized.
By default, job start time is not accurately predicted for pset jobs with topology options, so the forecast start time shown by bjobs -l is optimistic. LSF HPC may incorrectly indicate that the job can start at a certain time, when it actually cannot start until some time after the indicated time.
For a more accurate start-time estimate, you should configure time-based slot reservation. With time-based reservation, a set of pending jobs gets a future allocation and an estimated start time.
See Administering Platform LSF for more information about time-based slot reservation.
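As a sketch, time-based slot reservation is typically enabled with the LSB_TIME_RESERVE_NUMJOBS parameter in lsf.conf; treat the parameter name and value shown here as an assumption and verify the exact syntax in Administering Platform LSF:
LSB_TIME_RESERVE_NUMJOBS=1    # assumed parameter: compute estimated start times for pending jobs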
Jobs submitted to a chunk job queue are not chunked together; each job runs outside of a pset as a normal LSF job.
- When LSF HPC selects pset jobs to preempt, specialized preemption preferences, such as MINI_JOB and LEAST_RUN_TIME in the PREEMPT_FOR parameter in lsb.params, and others, are ignored when slot preemption is required.
- Preemptable queue preference is not supported.
When a job is suspended with bstop, all CPUs in the pset are released and reassigned back to the default pset (pset 0). Before resuming the job, LSF reallocates the pset and rebinds all job processes to the job pset.
Job pre-execution programs run within the job pset, since they are part of the job. Post-execution programs run outside of the job pset.
Configuring LSF HPC with HP-UX Psets
Automatic configuration at installation
During installation, lsfinstall adds the schmod_pset external scheduler plugin module name to the PluginModule section of lsb.modules:
Begin PluginModule
SCH_PLUGIN          RB_PLUGIN    SCH_DISABLE_PHASES
schmod_default      ()           ()
schmod_fcfs         ()           ()
schmod_fairshare    ()           ()
schmod_limit        ()           ()
schmod_preemption   ()           ()
...
schmod_pset         ()           ()
End PluginModule
The schmod_pset plugin name must be configured after the standard LSF plugin names in the PluginModule list.
See the Platform LSF Configuration Reference for more information about lsb.modules.
During installation, lsfinstall sets the following parameters in lsf.conf:
- On HP-UX hosts, sets the full path to the HP vendor MPI library libmpirm.sl:
LSF_VPLUGIN="/opt/mpi/lib/pa1.1/libmpirm.sl"
- On Linux hosts running HP MPI, sets the full path to the HP vendor MPI library libmpirm.so. For example, if HP MPI is installed in /opt/hpmpi:
LSF_VPLUGIN="/opt/hpmpi/lib/linux_ia32/libmpirm.so"
- LSF_ENABLE_EXTSCHEDULER=Y
LSF uses an external scheduler for pset allocation.
- LSB_RLA_PORT=port_number
Where port_number is the TCP port used for communication between the LSF HPC topology adapter (RLA) and sbatchd. The default port number is 6883.
- LSB_SHORT_HOSTLIST=1
Displays an abbreviated list of hosts in bjobs and bhist for a parallel job where multiple processes of a job are running on a host. Multiple processes are displayed in the following format:
processes*hostA
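Taken together, the installation-time lsf.conf settings described above might look like the following on an HP-UX host; the values shown are examples only:
LSF_ENABLE_EXTSCHEDULER=Y
LSF_VPLUGIN="/opt/mpi/lib/pa1.1/libmpirm.sl"
LSB_RLA_PORT=6883
LSB_SHORT_HOSTLIST=1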
During installation, the Boolean resource pset is defined in lsf.shared:
Begin Resource
RESOURCENAME   TYPE      INTERVAL   INCREASING   DESCRIPTION
...
pset           Boolean   ()         ()           (PSET)
...
End Resource
You should add the pset resource name under the RESOURCES column of the Host section of lsf.cluster.cluster_name. Hosts without the pset resource specified are not considered for scheduling pset jobs.
For each pset host, lsfinstall enables "!" in the MXJ column of the HOSTS section of lsb.hosts for the HPPA11 host type. For example:
Begin Host
HOST_NAME   MXJ   r1m       pg     ls      tmp   DISPATCH_WINDOW   # Keywords
#hostA      ()    3.5/4.5   15/    12/15   0     ()                # Example
default     !     ()        ()     ()      ()    ()
HPPA11      !     ()        ()     ()      ()    ()                # pset host
End Host
lsf.cluster.cluster_name
For each pset host, hostsetup adds the pset Boolean resource to the HOST section of lsf.cluster.cluster_name.
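As an illustration, a Host section in lsf.cluster.cluster_name with the pset resource assigned to hostA might look like the following sketch; the column values other than the pset resource are placeholders:
Begin Host
HOSTNAME   model   type   server   r1m   mem   swp   RESOURCES
hostA      !       !      1        ()    ()    ()    (pset)
End Host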
Configuring default and mandatory pset options
Use the DEFAULT_EXTSCHED and MANDATORY_EXTSCHED queue parameters in lsb.queues to configure default and mandatory pset options.
DEFAULT_EXTSCHED=PSET[topology]
where topology is:
[CELLS=num_cells | PTILE=cpus_per_cell] [;CELL_LIST=cell_list]
Specifies default pset topology scheduling options for the queue.
-extsched options on the bsub command override any conflicting queue-level options set by DEFAULT_EXTSCHED. For example, if the queue specifies:
DEFAULT_EXTSCHED=PSET[PTILE=2]
and a job is submitted with no topology requirements requesting 6 CPUs (bsub -n 6), a pset is allocated using 3 cells with 2 CPUs in each cell.
If the job is submitted:
bsub -n 6 -ext "PSET[PTILE=3]" myjob
The pset option in the command overrides the DEFAULT_EXTSCHED, so a pset is allocated using 2 cells with 3 CPUs in each cell.
MANDATORY_EXTSCHED=PSET[topology]
Specifies mandatory pset topology scheduling options for the queue.
MANDATORY_EXTSCHED options override any conflicting job-level options set by -extsched options on the bsub command. For example, if the queue specifies:
MANDATORY_EXTSCHED=PSET[CELLS=2]
and a job is submitted with no topology requirements requesting 6 CPUs (bsub -n 6), a pset is allocated using 2 cells with 3 CPUs in each cell.
If the job is submitted:
bsub -n 6 -ext "PSET[CELLS=3]" myjob
MANDATORY_EXTSCHED overrides the pset option in the command, so a pset is allocated using 2 cells with 3 CPUs in each cell.
Use the CELL_LIST option in MANDATORY_EXTSCHED to restrict the cells available for allocation to pset jobs. For example, if the queue specifies:
MANDATORY_EXTSCHED=PSET[CELL_LIST=1-7]
job psets can only use cells 1 to 7; cell 0 is not used for pset jobs.
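For example, a queue definition in lsb.queues that combines these parameters might look like the following sketch; the queue name and option values are illustrative only:
Begin Queue
QUEUE_NAME         = pset_queue
DEFAULT_EXTSCHED   = PSET[PTILE=2]
MANDATORY_EXTSCHED = PSET[CELL_LIST=1-7]
DESCRIPTION        = Example queue for HP-UX pset jobs
End Queue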
Using LSF HPC with HP-UX Psets
Specifying pset topology options
To specify processor topology scheduling policy options for pset jobs, use:
- The -extsched option of bsub.
You can abbreviate the -extsched option to -ext.
- DEFAULT_EXTSCHED or MANDATORY_EXTSCHED, or both, in the queue definition (lsb.queues).
If LSB_PSET_BIND_DEFAULT is set in lsf.conf and no pset options are specified for the job, Platform LSF HPC binds the job to the default pset 0. If LSB_PSET_BIND_DEFAULT is not set, Platform LSF HPC must still attach the job to a pset, so it binds the job to the same pset used by the Platform LSF HPC daemons.
For more information about job operations, see Administering Platform LSF.
For more information about bsub, see the Platform LSF Command Reference.
-ext[sched] "PSET[topology]"
where topology is:
[CELLS=num_cells | PTILE=cpus_per_cell] [;CELL_LIST=cell_list]
CELLS=num_cells
Defines the exact number of cells the LSF job requires. For example, if CELLS=4, and the job requests 6 processors (bsub -n 6) on a 4-CPU/cell HP Superdome system with no other jobs running, the pset uses 4 cells, and the allocation is 2, 2, 1, 1 on each cell. If LSF cannot satisfy the CELLS request, the job remains pending.
If CELLS is greater than 1 and you specify minimum and maximum processors (for example, bsub -n 2,8), only the minimum is used.
To force all job processes to run within one cell, use "PSET[CELLS=1]".
PTILE=cpus_per_cell
Defines the exact number of processors allocated on each cell, up to the maximum for the system. For example, if PTILE=2, and the job requests 6 processors (bsub -n 6) on a 4-CPU/cell HP Superdome system with no other jobs running, the pset spreads across 3 cells instead of 2 cells, and the allocation is 2, 2, 2 on each cell.
The number of processors requested with -n must be evenly divisible by the PTILE value. If LSF cannot satisfy the PTILE request, the job remains pending. For example:
bsub -n 5 -ext "PSET[PTILE=3]" ...
is incorrect.
To force jobs to run on cells where no other jobs are running, use "PSET[PTILE=4]" on a 4-CPU/cell system.
You can specify either a CELLS option or a PTILE option in a PSET[] specification, but not both.
CELL_LIST=min_cell_ID[-max_cell_ID][,min_cell_ID[-max_cell_ID] ...]
The LSF job uses only cells in the specified cell list to allocate the pset. For example, if CELL_LIST=1,2, and the job requests 8 processors (bsub -n 8) on a 4-CPU/cell HP Superdome system with no other jobs running, the pset uses cells 1 and 2, and the allocation is 4 CPUs on each cell. If LSF cannot satisfy the CELL_LIST request, the job remains pending.
If CELL_LIST is defined in DEFAULT_EXTSCHED in the queue, and you do not want to specify a cell list for your job, use the CELL_LIST keyword with no value. For example, if DEFAULT_EXTSCHED=PSET[CELL_LIST=1-8], and you do not want to specify a cell list, use -ext "PSET[CELL_LIST=]".
Priority of topology scheduling options
The options set by -extsched can be combined with the queue-level MANDATORY_EXTSCHED or DEFAULT_EXTSCHED parameters. If -extsched and MANDATORY_EXTSCHED set the same option, the MANDATORY_EXTSCHED setting is used. If -extsched and DEFAULT_EXTSCHED set the same option, the -extsched setting is used.
Topology scheduling options are applied in the following order of priority, from highest to lowest:
- Queue-level MANDATORY_EXTSCHED options override ...
- Job-level -ext options, which override ...
- Queue-level DEFAULT_EXTSCHED options
For example, if the queue specifies:
DEFAULT_EXTSCHED=PSET[CELLS=2]
and the job is submitted with:
bsub -n 4 -ext "PSET[PTILE=1]" myjob
The pset option in the job submission overrides the DEFAULT_EXTSCHED, so the job will run in a pset allocated using 4 cells, honoring the job-level PTILE option.
If the queue specifies:
MANDATORY_EXTSCHED=PSET[CELLS=2]
and the job is submitted with:
bsub -n 4 -ext "PSET[PTILE=1]" myjob
The job will run on 2 cells, honoring the CELLS option in MANDATORY_EXTSCHED.
Partitioning the system for specific jobs (CELL_LIST)
Use the bsub -ext "PSET[CELL_LIST=cell_list]" option to partition a large Superdome machine. Instead of allocating CPUs from the entire machine, LSF creates a pset containing only the cells specified in the cell list.
Non-existent cells are ignored during scheduling, but the job can be dispatched as long as enough cells are available to satisfy the job requirements. For example, in a cluster with both 32-CPU and 64-CPU machines and a cell list specification CELL_LIST=1-15, jobs can use cells 1-7 on the 32-CPU machine, and cells 1-15 on the 64-CPU machine.
You can use CELL_LIST with the PSET[CELLS=num_cells] option. The number of cells requested with the CELLS option must be less than or equal to the number of cells in the cell list; otherwise, the job remains pending.
You can use CELL_LIST with the PSET[PTILE=cpus_per_cell] option. The PTILE option allows the job pset to spread across several cells. The number of required cells equals the number of requested processors divided by the PTILE value. The resulting number of cells must be less than or equal to the number of cells in the cell list; otherwise, the job remains pending.
For example, the following is a correct specification:
bsub -n 8 -ext "PSET[PTILE=2;CELL_LIST=1-4]" myjob
The job requests 8 CPUs spread over 4 cells (8/2=4), which is equal to the 4 cells requested in the CELL_LIST option.
Viewing pset allocations for jobs
After a pset job starts to run, use bjobs -l to display the job pset ID. For example, if LSF creates pset 23 on hostA for job 329, bjobs shows:
bjobs -l 329
Job <329>, User <user1>, Project <default>, Status <RUN>, Queue <normal>, Extsched <PSET[]>, Command <sleep 60>
Thu Jan 22 12:04:31: Submitted from host <hostA>, CWD <$HOME>, 2 Processors Requested;
Thu Jan 22 12:04:38: Started on 2 Hosts/Processors <2*hostA>, Execution Home </home/user1>, Execution CWD </home/user1>;
Thu Jan 22 12:04:38: psetid=hostA:23;
Thu Jan 22 12:04:39: Resource usage collected.
                     MEM: 1 Mbytes; SWAP: 2 Mbytes; NTHREAD: 1
                     PGID: 18440; PIDs: 18440

SCHEDULING PARAMETERS:
          r15s   r1m  r15m   ut    pg    io   ls   it   tmp   swp   mem
loadSched   -     -     -     -     -     -    -    -     -     -     -
loadStop    -     -     -     -     -     -    -    -     -     -     -

EXTERNAL MESSAGES:
MSG_ID FROM      POST_TIME      MESSAGE      ATTACHMENT
0      -         -              -            -
1      user1     Jan 22 12:04   PSET[]
The pset ID string for bjobs does not change after the job is dispatched.
Use bhist to display historical information about pset jobs:
bhist -l 329
Job <329>, User <user1>, Project <default>, Extsched <PSET[]>, Command <sleep 60>
Thu Jan 22 12:04:31: Submitted from host <hostA>, to Queue <normal>, CWD <$HOME>, 2 Processors Requested;
Thu Jan 22 12:04:38: Dispatched to 2 Hosts/Processors <2*hostA>;
Thu Jan 22 12:04:38: psetid=hostA:23;
Thu Jan 22 12:04:39: Starting (Pid 18440);
Thu Jan 22 12:04:39: Running with execution home </home/user1>, Execution CWD </home/user1>, Execution Pid <18440>;
Thu Jan 22 12:05:39: Done successfully. The CPU time used is 0.1 seconds;
Thu Jan 22 12:05:40: Post job process done successfully;

Summary of time in seconds spent in various states by Thu Jan 22 12:05:40
  PEND     PSUSP    RUN      USUSP    SSUSP    UNKWN    TOTAL
  7        0        61       0        0        0        68
Use bacct to display accounting information about pset jobs:
bacct -l 329
Accounting information about jobs that are:
  - submitted by all users.
  - accounted on all projects.
  - completed normally or exited
  - executed on all hosts.
  - submitted to all queues.
  - accounted on all service classes.
------------------------------------------------------------------------------
Job <331>, User <user1>, Project <default>, Status <DONE>, Queue <normal>, Command <sleep 60>
Thu Jan 22 18:23:14: Submitted from host <hostA>, CWD <$HOME>;
Thu Jan 22 18:23:23: Dispatched to <hostA>;
Thu Jan 22 18:23:23: psetid=hostA:23;
Thu Jan 22 18:24:24: Completed <done>.

Accounting information about this job:
     CPU_T     WAIT     TURNAROUND   STATUS     HOG_FACTOR    MEM    SWAP
      0.12        9             70     done         0.0017     1M      2M
------------------------------------------------------------------------------
SUMMARY:      ( time unit: second )
 Total number of done jobs:      1
 Total number of exited jobs:    0
 Total CPU time consumed:      0.1
 Average CPU time consumed:    0.1
 Maximum CPU time of a job:    0.1
 Minimum CPU time of a job:    0.1
 Total wait time in queues:    9.0
 Average wait time in queue:   9.0
 Maximum wait time in queue:   9.0
 Minimum wait time in queue:   9.0
 Average turnaround time:       70 (seconds/job)
 Maximum turnaround time:       70
 Minimum turnaround time:       70
 Average hog factor of a job:  0.00 ( cpu time / turnaround time )
 Maximum hog factor of a job:  0.00
 Minimum hog factor of a job:  0.00
Examples
The following examples assume a 4-CPU/cell HP Superdome system with no other jobs running:
- Submit a pset job without topology requirement:
bsub -n 8 -ext "PSET[]" myjob
A pset containing 8 CPUs is created for the job. According to the default scheduling policy, these 8 CPUs come from 2 cells on a single host.
- Submit a pset job specifying 1 CPU per cell:
bsub -n 6 -ext "PSET[PTILE=1]" myjob
A pset containing 6 processors is created for the job. The allocation uses 6 cells with 1 processor per cell.
- Submit a pset job specifying 4 cells:
bsub -n 6 -ext "PSET[CELLS=4]" myjob
A pset containing 6 processors is created for the job. The allocation uses 4 cells: 2 cells with 2 processors and 2 cells with 1 processor.
- Submit a pset job with a range of CPUs and 3 CPUs per cell:
bsub -n 7,10 -ext "PSET[PTILE=3]" myjob
A pset containing 9 processors is created for the job. The allocation uses 3 cells, with 3 CPUs each.
- Submit a pset job with a range of CPUs and 4 cells:
bsub -n 7,10 -ext "PSET[CELLS=4]" myjob
A pset containing 7 processors is created for the job. The allocation uses 4 cells: 3 cells with 2 CPUs and 1 cell with 1 CPU.
- Submit a pset job with a range of CPUs and 1 cell:
bsub -n 2,4 -ext "PSET[CELLS=1]" myjob
A pset containing 4 processors is created for the job. The allocation uses 1 cell with 4 CPUs.
- Submit a pset job requiring cells 1 and 2 with 4 CPUs per cell:
bsub -n 8 -ext"PSET[PTILE=4;CELL_LIST=1,2]" myjobA pset containing 8 processors is created for the job. The allocation uses cells 1 and 2, each with 4 CPUs.
- Submit a pset job requiring a specific range of 6 cells:
bsub -n 16 -ext "PSET[CELL_LIST=4-9]" myjob
A pset containing 16 processors is created for the job. The allocation uses cells between 4 and 9.
- Submit a pset job requiring processors from two ranges of cells, separated by a comma:
bsub -n 16 -ext "PSET[CELL_LIST=1-5,8-15]" myjob
A pset containing 16 processors is created for the job. The allocation uses processors from cells 1 through 5 and cells 8 through 15.