Upgrading Platform LSF HPC
Upgrade Platform LSF HPC
You can install Platform LSF HPC on UNIX or Linux hosts. Refer to the installation guide for more information.
When you run lsfinstall for Platform LSF HPC, a number of changes are made for you automatically.
A number of shared resources required by LSF HPC are added to lsf.shared. When you upgrade Platform LSF HPC, you should add the appropriate resource names under the RESOURCES column of the Host section of lsf.cluster.cluster_name.

What lsfinstall does
- Installs Platform LSF HPC binary and configuration files
- Installs the LSF HPC license file
- Automatically configures the following files:
lsb.hosts

For the default host, lsfinstall enables "!" in the MXJ column of the HOSTS section of lsb.hosts. For example:

Begin Host
HOST_NAME  MXJ  r1m      pg   ls     tmp  DISPATCH_WINDOW  # Keywords
#hostA     ()   3.5/4.5  15/  12/15  0    ()               # Example
default    !    ()       ()   ()     ()   ()
HPPA11     !    ()       ()   ()     ()   ()               # pset host
End Host

lsb.modules
- Adds the external scheduler plugin module names to the PluginModule section of lsb.modules:

Begin PluginModule
SCH_PLUGIN         RB_PLUGIN  SCH_DISABLE_PHASES
schmod_default     ()         ()
schmod_fcfs        ()         ()
schmod_fairshare   ()         ()
schmod_limit       ()         ()
schmod_reserve     ()         ()
schmod_preemption  ()         ()
schmod_advrsv      ()         ()
...
schmod_cpuset      ()         ()
schmod_pset        ()         ()
schmod_rms         ()         ()
schmod_crayx1      ()         ()
schmod_crayxt3     ()         ()
End PluginModule
note:
The LSF HPC plugin names must be configured after the standard LSF plugin names in the PluginModule list.

lsb.resources
For IBM POE jobs, lsfinstall configures the ReservationUsage section in lsb.resources to reserve HPS resources on a per-slot basis.
Resource usage defined in the ReservationUsage section overrides the cluster-wide RESOURCE_RESERVE_PER_SLOT parameter defined in lsb.params if it also exists.
Begin ReservationUsage
RESOURCE         METHOD
adapter_windows  PER_SLOT
ntbl_windows     PER_SLOT
csss             PER_SLOT
css0             PER_SLOT
End ReservationUsage

lsb.queues
- Configures the hpc_ibm queue for IBM POE jobs and the hpc_ibm_tv queue for debugging IBM POE jobs through Etnus TotalView® (a submission example follows the queue definitions):
Begin Queue
QUEUE_NAME          = hpc_ibm
PRIORITY            = 30
NICE                = 20
# ...
RES_REQ             = select[ poe > 0 ]
EXCLUSIVE           = Y
REQUEUE_EXIT_VALUES = 133 134 135
DESCRIPTION         = Platform HPC 7 for IBM. This queue is to run POE jobs ONLY.
End Queue
Begin Queue
QUEUE_NAME          = hpc_ibm_tv
PRIORITY            = 30
NICE                = 20
# ...
RES_REQ             = select[ poe > 0 ]
REQUEUE_EXIT_VALUES = 133 134 135
TERMINATE_WHEN      = LOAD PREEMPT WINDOW
RERUNNABLE          = NO
INTERACTIVE         = NO
DESCRIPTION         = Platform HPC 7 for IBM TotalView debug queue. This queue is to run POE jobs ONLY.
End Queue
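For example, once the hpc_ibm queue is in place, an IBM POE job might be submitted as follows (the task count and program name are illustrative):

bsub -q hpc_ibm -n 8 poe ./my_app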
- Configures the hpc_linux queue for LAM/MPI and MPICH-GM jobs and the hpc_linux_tv queue for debugging LAM/MPI and MPICH-GM jobs through Etnus TotalView®:
Begin Queue
QUEUE_NAME  = hpc_linux
PRIORITY    = 30
NICE        = 20
# ...
DESCRIPTION = Platform HPC 7 for linux.
End Queue
Begin Queue
QUEUE_NAME     = hpc_linux_tv
PRIORITY       = 30
NICE           = 20
# ...
TERMINATE_WHEN = LOAD PREEMPT WINDOW
RERUNNABLE     = NO
INTERACTIVE    = NO
DESCRIPTION    = Platform HPC 7 for linux TotalView Debug queue.
End Queue

By default, LSF sends a SIGUSR2 signal to terminate a job that has reached its run limit or deadline. Because LAM/MPI does not respond to SIGUSR2, you should configure the hpc_linux queue with a custom job termination action specified by the JOB_CONTROLS parameter, as in the sketch below.
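A minimal sketch of such an override, assuming SIGTERM is an acceptable termination signal for your LAM/MPI applications:

Begin Queue
QUEUE_NAME   = hpc_linux
# ...
# LAM/MPI ignores the default SIGUSR2, so terminate with SIGTERM instead
JOB_CONTROLS = TERMINATE[SIGTERM]
End Queue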
- Configures the rms queue for RMS jobs running in LSF HPC for Linux/QsNet.
Begin Queue
QUEUE_NAME       = rms
PJOB_LIMIT       = 1
PRIORITY         = 30
NICE             = 20
STACKLIMIT       = 5256
DEFAULT_EXTSCHED = RMS[RMS_SNODE]  # LSF uses this scheduling policy if
                                   # -extsched is not defined.
# MANDATORY_EXTSCHED = RMS[RMS_SNODE]  # LSF enforces this scheduling policy
RES_REQ          = select[rms==1]
DESCRIPTION      = Run RMS jobs only on hosts that have resource 'rms' defined
End Queue
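For example, a job might be submitted to this queue with an explicit allocation policy (the program name is illustrative; without -extsched, the DEFAULT_EXTSCHED policy applies):

bsub -q rms -n 4 -extsched "RMS[RMS_SNODE]" ./my_app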
tip:
To make one of the LSF HPC queues the default queue, set DEFAULT_QUEUE in lsb.params.

Use the bqueues -l command to view the queue configuration details. Before using LSF HPC, see the Platform LSF Configuration Reference to understand queue configuration parameters in lsb.queues.
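For example, to make hpc_linux the default queue, lsb.params might contain (the queue choice is illustrative):

Begin Parameters
DEFAULT_QUEUE = hpc_linux
End Parameters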
lsf.cluster.cluster_name
- Removes lsf_data and lsf_parallel from the PRODUCTS line of lsf.cluster.cluster_name if they are already there.
- For IBM POE jobs, configures the ResourceMap section of lsf.cluster.cluster_name to map the following shared resources for POE jobs to all hosts in the cluster:

Begin ResourceMap
RESOURCENAME     LOCATION
adapter_windows  [default]
ntbl_windows     [default]
poe              [default]
dedicated_tasks  (0@[default])
ip_tasks         (0@[default])
us_tasks         (0@[default])
End ResourceMap

lsf.conf
LSB_SUB_COMMANDNAME=Y
- Added to lsf.conf to enable the LSF_SUB_COMMANDLINE environment variable required by esub.
LSF_ENABLE_EXTSCHEDULER=Y
- LSF uses an external scheduler for topology-aware external scheduling.
LSB_CPUSET_BESTCPUS=Y
- LSF schedules jobs based on the shortest CPU radius in the processor topology using a best-fit algorithm for SGI cpuset allocation.
tip:
LSF_IRIX_BESTCPUS is obsolete.
- On SGI IRIX and SGI Altix hosts, sets the full path to the SGI vendor MPI library libxmpi.so:
- On SGI IRIX:
LSF_VPLUGIN="/usr/lib32/libxmpi.so"
- On SGI Altix:
LSF_VPLUGIN="/usr/lib/libxmpi.so"
- You can specify multiple paths for LSF_VPLUGIN, separated by colons (:). For example, the following configures both /usr/lib32/libxmpi.so for SGI IRIX and /usr/lib/libxmpi.so for SGI Altix:
LSF_VPLUGIN="/usr/lib32/libxmpi.so:/usr/lib/libxmpi.so"
- On HP-UX hosts, sets the full path to the HP vendor MPI library libmpirm.sl:
LSF_VPLUGIN="/opt/mpi/lib/pa1.1/libmpirm.sl"
LSB_RLA_PORT=port_number
- Where port_number is the TCP port used for communication between the Platform LSF HPC topology adapter (RLA) and sbatchd. The default port number is 6883.
LSB_SHORT_HOSTLIST=1
- Displays an abbreviated list of hosts in bjobs and bhist for a parallel job where multiple processes of a job are running on a host. Multiple processes are displayed in the following format:
processes*hostA
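Taken together, the lsf.conf settings described above might look like the following excerpt (the LSF_VPLUGIN paths vary by platform and are illustrative):

LSB_SUB_COMMANDNAME=Y
LSF_ENABLE_EXTSCHEDULER=Y
LSB_CPUSET_BESTCPUS=Y
LSF_VPLUGIN="/usr/lib32/libxmpi.so:/usr/lib/libxmpi.so"
LSB_RLA_PORT=6883
LSB_SHORT_HOSTLIST=1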
lsf.shared

Defines the following shared resources required by LSF HPC in lsf.shared:

Begin Resource
RESOURCENAME     TYPE     INTERVAL  INCREASING  DESCRIPTION  # Keywords
rms              Boolean  ()        ()          (RMS)
pset             Boolean  ()        ()          (PSET)
slurm            Boolean  ()        ()          (SLURM)
cpuset           Boolean  ()        ()          (CPUSET)
mpich_gm         Boolean  ()        ()          (MPICH GM MPI)
lammpi           Boolean  ()        ()          (LAM MPI)
mpichp4          Boolean  ()        ()          (MPICH P4 MPI)
mvapich          Boolean  ()        ()          (Infiniband MPI)
sca_mpimon       Boolean  ()        ()          (SCALI MPI)
ibmmpi           Boolean  ()        ()          (IBM POE MPI)
hpmpi            Boolean  ()        ()          (HP MPI)
sgimpi           Boolean  ()        ()          (SGI MPI)
intelmpi         Boolean  ()        ()          (Intel MPI)
crayxt3          Boolean  ()        ()          (Cray XT3 MPI)
crayx1           Boolean  ()        ()          (Cray X1 MPI)
fluent           Boolean  ()        ()          (fluent availability)
ls_dyna          Boolean  ()        ()          (ls_dyna availability)
nastran          Boolean  ()        ()          (nastran availability)
pvm              Boolean  ()        ()          (pvm availability)
openmp           Boolean  ()        ()          (openmp availability)
ansys            Boolean  ()        ()          (ansys availability)
blast            Boolean  ()        ()          (blast availability)
gaussian         Boolean  ()        ()          (gaussian availability)
lion             Boolean  ()        ()          (lion availability)
scitegic         Boolean  ()        ()          (scitegic availability)
schroedinger     Boolean  ()        ()          (schroedinger availability)
hmmer            Boolean  ()        ()          (hmmer availability)
adapter_windows  Numeric  30        N           (free adapter windows on css0 on IBM SP)
ntbl_windows     Numeric  30        N           (free ntbl windows on IBM HPS)
poe              Numeric  30        N           (poe availability)
css0             Numeric  30        N           (free adapter windows on css0 on IBM SP)
csss             Numeric  30        N           (free adapter windows on csss on IBM SP)
dedicated_tasks  Numeric  ()        Y           (running dedicated tasks)
ip_tasks         Numeric  ()        Y           (running IP tasks)
us_tasks         Numeric  ()        Y           (running US tasks)
End Resource
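After the upgrade, you can verify that the numeric shared resources are being reported by querying them with the bhosts -s command, for example:

bhosts -s adapter_windows poe

The output lists each resource's current value and the hosts that share it; the values depend on your cluster.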
tip:
You should add the appropriate resource names under the RESOURCES column of the Host section of lsf.cluster.cluster_name, as in the sketch below.
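A minimal sketch of such a Host section, assuming hypothetical host names, an IBM POE host, and an SGI cpuset host:

Begin Host
HOSTNAME  model  type  server  RESOURCES
hostA     !      !     1       (ibmmpi poe)
hostB     !      !     1       (cpuset)
End Host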