lsf.conf

The lsf.conf file controls the operation of LSF.

About lsf.conf

lsf.conf is created during installation and records all the settings chosen when LSF was installed. The lsf.conf file dictates the location of the specific configuration files and operation of individual servers and applications.

The lsf.conf file is used by LSF and applications built on top of it. For example, information in lsf.conf is used by LSF daemons and commands to locate other configuration files, executables, and network services. lsf.conf is updated, if necessary, when you upgrade to a new version.

This file can also be expanded to include application-specific parameters.

Parameters in this file can also be set as environment variables, except for the parameters related to job packs.

Corresponding parameters in ego.conf

When Platform EGO is enabled in LSF Version 7, you can configure some LSF parameters in lsf.conf that have corresponding Platform EGO parameter names in EGO_CONFDIR/ego.conf (LSF_CONFDIR/lsf.conf is a separate file from EGO_CONFDIR/ego.conf). If both the LSF and the EGO parameters are set in their respective files, the definition in ego.conf is used. You must continue to set LSF parameters only in lsf.conf.

When EGO is enabled in the LSF cluster (LSF_ENABLE_EGO=Y), you can also set the following EGO parameters related to LIM, PIM, and ELIM in either lsf.conf or ego.conf:
  • EGO_DISABLE_UNRESOLVABLE_HOST (dynamically added hosts only)

  • EGO_ENABLE_AUTO_DAEMON_SHUTDOWN

  • EGO_DAEMONS_CPUS

  • EGO_DEFINE_NCPUS

  • EGO_SLAVE_CTRL_REMOTE_HOST

  • EGO_WORKDIR

  • EGO_PIM_SWAP_REPORT

  • EGO_ESLIM_TIMEOUT

If EGO is not enabled, you do not need to set these parameters.

See Administering Platform LSF for more information about configuring LSF for EGO. See the Platform EGO Reference for information about ego.conf parameters.

Change lsf.conf configuration

Depending on the parameters you change in lsf.conf, you may need to run the following commands:
  • lsadmin reconfig to reconfigure LIM

  • badmin mbdrestart to restart mbatchd

  • badmin hrestart to restart sbatchd

If you have installed LSF in a mixed cluster, you must make sure that lsf.conf parameters set on UNIX and Linux match any corresponding parameters in the local lsf.conf files on your Windows hosts.

Location

The default location of lsf.conf is $LSF_TOP/conf. This default can be overridden, when necessary, by either the environment variable LSF_ENVDIR or the command-line option -d, which is available in some of the applications.

Format

Each entry in lsf.conf has one of the following forms:
NAME=VALUE
NAME=
NAME="STRING1 STRING2 ..."

The equal sign (=) must follow each NAME, even if no value follows, and there must be no spaces on either side of the equal sign.

A value that contains multiple strings separated by spaces must be enclosed in quotation marks.

Lines starting with a pound sign (#) are comments and are ignored. Do not use #if as this is reserved syntax for time-based configuration.
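The format rules above can be sketched as a small parser. This is an illustrative sketch only, not LSF's own parser: the function name parse_lsf_conf is hypothetical, and the sketch skips every line that starts with a pound sign, so it does not interpret #if time-based configuration.

```python
def parse_lsf_conf(text):
    """Parse NAME=VALUE entries into a dict (illustrative sketch, not LSF code)."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Blank lines and lines starting with '#' are skipped as comments.
        if not line or line.startswith("#"):
            continue
        if "=" not in line:
            raise ValueError(f"missing '=' in entry: {raw!r}")
        name, value = line.split("=", 1)
        # No space is allowed on either side of the equal sign.
        if not name or name != name.rstrip() or value != value.lstrip():
            raise ValueError(f"space beside '=' in entry: {raw!r}")
        # Multiple space-separated strings must be enclosed in quotation marks.
        if len(value) >= 2 and value[0] == value[-1] == '"':
            value = value[1:-1]
        elif " " in value:
            raise ValueError(f"unquoted multi-string value: {raw!r}")
        settings[name] = value
    return settings
```

An entry of the form NAME= is returned with an empty string as its value, which keeps "defined but empty" distinguishable from "not defined".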

DAEMON_SHUTDOWN_DELAY

Syntax

DAEMON_SHUTDOWN_DELAY=time_in_seconds

Description

Applies when EGO_ENABLE_AUTO_DAEMON_SHUTDOWN=Y. Controls the amount of time the slave LIM waits to communicate with the other local daemons (RES and SBD) before exiting. Use it to shorten or lengthen the interval between a host attempting to join the cluster and, if the attempt is unsuccessful, all of the local daemons shutting down.

The value should not be less than the minimum interval of RES and SBD housekeeping. Most administrators should set this value to somewhere between 3 minutes and 60 minutes.

Default

1800 seconds (30 minutes)

EGO_DEFINE_NCPUS

Syntax

EGO_DEFINE_NCPUS=procs | cores | threads

Description

If defined, enables an administrator to define ncpus as a value other than the number of cores available. Each setting computes ncpus as follows:

  • EGO_DEFINE_NCPUS=procs: ncpus = number of processors

  • EGO_DEFINE_NCPUS=cores: ncpus = number of processors x number of cores

  • EGO_DEFINE_NCPUS=threads: ncpus = number of processors x number of cores x number of threads
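As a worked illustration of the three equations, the following sketch computes ncpus for each setting. It assumes that "number of cores" means cores per processor and "number of threads" means threads per core; the function and argument names are hypothetical.

```python
def define_ncpus(mode, processors, cores_per_processor, threads_per_core):
    """Return ncpus for an EGO_DEFINE_NCPUS setting (illustrative sketch)."""
    if mode == "procs":
        return processors
    if mode == "cores":
        return processors * cores_per_processor
    if mode == "threads":
        return processors * cores_per_processor * threads_per_core
    raise ValueError(f"unknown EGO_DEFINE_NCPUS value: {mode!r}")
```

For a host with 2 processors, 4 cores per processor, and 2 threads per core, the three settings give 2, 8, and 16 respectively.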

Note:

When PARALLEL_SCHED_BY_SLOT=Y in lsb.params, the resource requirement string keyword ncpus refers to the number of slots instead of the number of CPUs. However, lshosts output continues to show ncpus as defined by EGO_DEFINE_NCPUS in lsf.conf.

Default

EGO_DEFINE_NCPUS=cores

EGO_ENABLE_AUTO_DAEMON_SHUTDOWN

Syntax

EGO_ENABLE_AUTO_DAEMON_SHUTDOWN="Y" | "N"

Description

For hosts that attempted to join the cluster but failed to communicate within the LSF_DYNAMIC_HOST_WAIT_TIME period, automatically shuts down any running daemons.

This parameter can be useful if an administrator removes machines from the cluster regularly (by editing the lsf.cluster file) or when a host belonging to the cluster is imaged, but the new host should not be part of the cluster. An administrator no longer has to go to each host that is not a part of the cluster to shut down any running daemons.

Default

N (daemons continue to run on hosts that were not successfully added to the cluster)

EGO_PARAMETER

EGO_ENABLE_AUTO_DAEMON_SHUTDOWN

EGO_ESLIM_TIMEOUT

Syntax

EGO_ESLIM_TIMEOUT=time_seconds

Description

Controls how long the LIM waits for any external static LIM scripts to run. After the timeout period expires, the LIM stops the scripts.

Use the external static LIM to automatically detect the operating system type and version of hosts.

LSF automatically detects the operating system types and versions and displays them when running lshosts -l or lshosts -s. You can then specify those types in any -R resource requirement string. For example, bsub -R "select[ostype=RHEL4.6]".

Default

10 seconds

EGO_PARAMETER

EGO_ESLIM_TIMEOUT

JOB_STARTER_EXTEND

Syntax

JOB_STARTER_EXTEND="preservestarter" | "preservestarter userstarter"

Description

Applies to Windows execution hosts only.

Allows you to use a job starter that includes symbols (for example: &&, |, ||). The job starter configured in JOB_STARTER_EXTEND can handle these special characters. The file $LSF_TOP/8.0/misc/examples/preservestarter.c is the only extended job starter created by default. Users can also develop their own extended job starters based on preservestarter.c.

You must also set JOB_STARTER=preservestarter in lsb.queues.

Default

Not defined.

LSB_API_CONNTIMEOUT

Syntax

LSB_API_CONNTIMEOUT=time_seconds

Description

The timeout in seconds when connecting to LSF.

Valid values

Any positive integer or zero

Default

10

See also

LSB_API_RECVTIMEOUT

LSB_API_RECVTIMEOUT

Syntax

LSB_API_RECVTIMEOUT=time_seconds

Description

Timeout in seconds when waiting for a reply from LSF.

Valid values

Any positive integer or zero

Default

10

See also

LSB_API_CONNTIMEOUT

LSB_API_VERBOSE

Syntax

LSB_API_VERBOSE=Y | N

Description

When LSB_API_VERBOSE=Y, LSF batch commands display a retry error message to stderr when LIM is not available:
LSF daemon (LIM) not responding ... still trying

When LSB_API_VERBOSE=N, LSF batch commands will not display a retry error message when LIM is not available.

Default

Y. Retry message is displayed to stderr.

LSB_BJOBS_CONSISTENT_EXIT_CODE

Syntax

LSB_BJOBS_CONSISTENT_EXIT_CODE=Y | N

Description

When LSB_BJOBS_CONSISTENT_EXIT_CODE=Y, the bjobs command exits with 0 only when unfinished jobs are found, and 255 when no jobs are found, or a non-existent job ID is entered.

No jobs are running:
bjobs
No unfinished job found 
echo $?
255
Job 123 does not exist:
bjobs 123
Job <123> is not found
echo $?
255
Job 111 is running:
bjobs 111
JOBID   USER    STAT  QUEUE      FROM_HOST   EXEC_HOST   JOB_NAME   SUBMIT_TIME
111     user1   RUN   normal     hostA       hostB       myjob      Oct 22 09:22
echo $?
0
Job 111 is running, and job 123 does not exist:
bjobs 111 123
JOBID   USER    STAT  QUEUE      FROM_HOST   EXEC_HOST   JOB_NAME   SUBMIT_TIME
111     user1   RUN   normal     hostA       hostB       myjob      Oct 22 09:22
Job <123> is not found
echo $?
255
Job 111 is finished:
bjobs 111
No unfinished job found 
echo $?
255

When LSB_BJOBS_CONSISTENT_EXIT_CODE=N, the bjobs command exits with 255 only when a non-existent job ID is entered. bjobs returns 0 when no jobs are found, all jobs are finished, or if at least one job ID is valid.

No jobs are running:
bjobs
No unfinished job found 
echo $?
0
Job 123 does not exist:
bjobs 123
Job <123> is not found
echo $?
0
Job 111 is running:
bjobs 111
JOBID   USER    STAT  QUEUE      FROM_HOST   EXEC_HOST   JOB_NAME   SUBMIT_TIME
111     user1   RUN   normal     hostA       hostB       myjob      Oct 22 09:22
echo $?
0
Job 111 is running, and job 123 does not exist:
bjobs 111 123
JOBID   USER    STAT  QUEUE      FROM_HOST   EXEC_HOST   JOB_NAME   SUBMIT_TIME
111     user1   RUN   normal     hostA       hostB       myjob      Oct 22 09:22
Job <123> is not found
echo $?
255
Job 111 is finished:
bjobs 111
No unfinished job found 
echo $?
0

Default

N.

LSB_BLOCK_JOBINFO_TIMEOUT

Syntax

LSB_BLOCK_JOBINFO_TIMEOUT=time_minutes

Description

Timeout in minutes for job information query commands (e.g., bjobs).

Valid values

Any positive integer

Default

Not defined (no timeout)

See also

MAX_JOBINFO_QUERY_PERIOD in lsb.params

LSB_BPEEK_REMOTE_OUTPUT

Syntax

LSB_BPEEK_REMOTE_OUTPUT=y|Y|n|N

Description

If disabled (set to N), the bpeek command attempts to retrieve the job output from the local host first. If that fails, bpeek attempts to retrieve the job output from the remote host instead.

If enabled (set to Y), the order is reversed: the bpeek command attempts to retrieve the job output from the remote host first, then from the local host.

When attempting to retrieve the job output from the remote host, bpeek attempts to use RES first, then rsh. If neither is running on the remote host, the bpeek command cannot retrieve job output.
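The retrieval order described above can be summarized in a short sketch. The function name and the step labels are hypothetical; the sketch only lists the attempts in order and does not contact any host.

```python
def bpeek_retrieval_order(lsb_bpeek_remote_output=None):
    """List the bpeek retrieval attempts in order (illustrative sketch)."""
    # The parameter is treated as enabled only for y or Y; the default is N.
    remote_first = lsb_bpeek_remote_output in ("y", "Y")
    # On the remote host, RES is tried before rsh.
    remote = ["remote host via RES", "remote host via rsh"]
    local = ["local host"]
    return remote + local if remote_first else local + remote
```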

Best Practices

Three directories are related to the bpeek command:

  • the user’s home directory

  • JOB_SPOOL_DIR

  • the checkpoint directory

If these directories are on a shared file system, this parameter can be disabled.

If any of these directories are not on a shared file system, this parameter should be enabled, and either RES or rsh should be started on the remote job execution host.

Default

N

LSB_CHUNK_RUSAGE

Syntax

LSB_CHUNK_RUSAGE=y

Description

Applies only to chunk jobs. When set, sbatchd contacts PIM to retrieve resource usage information to enforce resource usage limits on chunk jobs.

By default, resource usage limits are not enforced for chunk jobs because chunk jobs are typically too short to allow LSF to collect resource usage.

If LSB_CHUNK_RUSAGE=Y is defined, limits may not be enforced for chunk jobs that take less than a minute to run.

Default

Not defined. No resource usage is collected for chunk jobs.

LSB_CMD_LOG_MASK

Syntax

LSB_CMD_LOG_MASK=log_level

Description

Specifies the logging level of error messages from LSF batch commands.

To specify the logging level of error messages for LSF commands, use LSF_CMD_LOG_MASK. To specify the logging level of error messages for LSF daemons, use LSF_LOG_MASK.

LSB_CMD_LOG_MASK sets the log level and is used in combination with LSB_DEBUG_CMD, which sets the log class for LSF batch commands. For example:
LSB_CMD_LOG_MASK=LOG_DEBUG LSB_DEBUG_CMD="LC_TRACE LC_EXEC" 

LSF commands log error messages in different levels so that you can choose to log all messages, or only log messages that are deemed critical. The level specified by LSB_CMD_LOG_MASK determines which messages are recorded and which are discarded. All messages logged at the specified level or higher are recorded, while lower level messages are discarded.

For debugging purposes, the level LOG_DEBUG contains the fewest number of debugging messages and is used for basic debugging. The level LOG_DEBUG3 records all debugging messages, and can cause log files to grow very large; it is not often used. Most debugging is done at the level LOG_DEBUG2.

The commands log to the syslog facility unless LSB_CMD_LOGDIR is set.

Valid values

The log levels from highest to lowest are:
  • LOG_EMERG

  • LOG_ALERT

  • LOG_CRIT

  • LOG_ERR

  • LOG_WARNING

  • LOG_NOTICE

  • LOG_INFO

  • LOG_DEBUG

  • LOG_DEBUG1

  • LOG_DEBUG2

  • LOG_DEBUG3
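The rule that messages at the specified level or higher are recorded can be sketched with the ordering listed above. This is an illustrative sketch, not LSF code; the names LOG_LEVELS and is_recorded are hypothetical.

```python
# Log levels from highest to lowest, as listed under Valid values.
LOG_LEVELS = [
    "LOG_EMERG", "LOG_ALERT", "LOG_CRIT", "LOG_ERR", "LOG_WARNING",
    "LOG_NOTICE", "LOG_INFO", "LOG_DEBUG", "LOG_DEBUG1", "LOG_DEBUG2",
    "LOG_DEBUG3",
]

def is_recorded(message_level, mask):
    """True if a message at message_level is kept under the given mask."""
    # A smaller index means a higher level; keep the mask level and above.
    return LOG_LEVELS.index(message_level) <= LOG_LEVELS.index(mask)
```

With the default mask LOG_WARNING, a LOG_ERR message is recorded and a LOG_INFO message is discarded.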

Default

LOG_WARNING

See also

LSB_CMD_LOGDIR, LSB_DEBUG, LSB_DEBUG_CMD, LSB_TIME_CMD, LSF_CMD_LOGDIR, LSF_CMD_LOG_MASK, LSF_LOG_MASK, LSF_LOGDIR, LSF_TIME_CMD

LSB_CMD_LOGDIR

Syntax

LSB_CMD_LOGDIR=path

Description

Specifies the path to the LSF command log files.

Default

/tmp

See also

LSB_CMD_LOG_MASK, LSB_DEBUG, LSB_DEBUG_CMD, LSB_TIME_CMD, LSF_CMD_LOGDIR, LSF_CMD_LOG_MASK, LSF_LOG_MASK, LSF_LOGDIR, LSF_TIME_CMD

LSB_CPUSET_BESTCPUS

Syntax

LSB_CPUSET_BESTCPUS=y | Y

Description

If set, enables the best-fit algorithm for SGI cpusets.

Default

Y (best-fit)

LSB_CONFDIR

Syntax

LSB_CONFDIR=path

Description

Specifies the path to the directory containing the LSF configuration files.

The configuration directories are installed under LSB_CONFDIR.

Configuration files for each cluster are stored in a subdirectory of LSB_CONFDIR. This subdirectory contains several files that define user and host lists, operation parameters, and queues.

All files and directories under LSB_CONFDIR must be readable from all hosts in the cluster. LSB_CONFDIR/cluster_name/configdir must be owned by the LSF administrator.

If live reconfiguration through the bconf command is enabled by the parameter LSF_LIVE_CONFDIR, configuration files are written to and read from the directory set by LSF_LIVE_CONFDIR.

Do not change this parameter after LSF has been installed.

Default

LSF_CONFDIR/lsbatch

See also

LSF_CONFDIR, LSF_LIVE_CONFDIR

LSB_CRDIR

Syntax

LSB_CRDIR=path

Description

Specifies the path and directory to the checkpointing executables on systems that support kernel-level checkpointing. LSB_CRDIR specifies the directory containing the chkpnt and restart utility programs that sbatchd uses to checkpoint or restart a job.

For example:
LSB_CRDIR=/usr/bin

If your platform supports kernel-level checkpointing, and if you want to use the utility programs provided for kernel-level checkpointing, set LSB_CRDIR to the location of the utility programs.

Default

Not defined. The system uses /bin.

LSB_DEBUG

Syntax

LSB_DEBUG=1 | 2

Description

Sets the LSF batch system to debug.

If defined, LSF runs in single user mode:
  • No security checking is performed

  • Daemons do not run as root

When LSB_DEBUG is defined, LSF does not look in the system services database for port numbers. Instead, it uses the port numbers defined by the parameters LSB_MBD_PORT/LSB_SBD_PORT in lsf.conf. If these parameters are not defined, it uses port number 40000 for mbatchd and port number 40001 for sbatchd.
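The port selection described above can be sketched as a lookup with fallback values. The conf argument is a hypothetical dict of settings already read from lsf.conf, and the function name is an assumption for illustration.

```python
# Fallback ports used in debug mode when the parameters are not defined.
DEBUG_FALLBACK_PORTS = {"LSB_MBD_PORT": 40000, "LSB_SBD_PORT": 40001}

def debug_port(conf, name):
    """Return the port for LSB_MBD_PORT or LSB_SBD_PORT under LSB_DEBUG."""
    value = conf.get(name)
    # Use the lsf.conf value when present, otherwise the documented fallback.
    return int(value) if value else DEBUG_FALLBACK_PORTS[name]
```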

You should always specify 1 for this parameter unless you are testing LSF.

Can also be defined from the command line.

Valid values

LSB_DEBUG=1

The LSF system runs in the background with no associated control terminal.

LSB_DEBUG=2

The LSF system runs in the foreground and prints error messages to tty.

Default

Not defined

See also

LSB_DEBUG, LSB_DEBUG_CMD, LSB_DEBUG_MBD, LSB_DEBUG_NQS, LSB_DEBUG_SBD, LSB_DEBUG_SCH, LSF_DEBUG_LIM, LSF_DEBUG_RES, LSF_LIM_PORT, LSF_RES_PORT, LSB_MBD_PORT, LSB_SBD_PORT, LSF_LOGDIR, LSF_LIM_DEBUG, LSF_RES_DEBUG

LSB_DEBUG_CMD

Syntax

LSB_DEBUG_CMD=log_class

Description

Sets the debugging log class for commands and APIs.

Specifies the log class filtering to be applied to LSF batch commands or the API. Only messages belonging to the specified log class are recorded.

LSB_DEBUG_CMD sets the log class and is used in combination with LSB_CMD_LOG_MASK, which sets the log level. For example:
LSB_CMD_LOG_MASK=LOG_DEBUG LSB_DEBUG_CMD="LC_TRACE LC_EXEC" 

Debugging is turned on when you define both parameters.

The commands log to the syslog facility unless LSB_CMD_LOGDIR is defined.

To specify multiple log classes, use a space-separated list enclosed by quotation marks. For example:
LSB_DEBUG_CMD="LC_TRACE LC_EXEC"

Can also be defined from the command line.

Valid values

Valid log classes are:
  • LC_ADVRSV and LC2_ADVRSV: Log advance reservation modifications

  • LC_AFS and LC2_AFS: Log AFS messages

  • LC_AUTH and LC2_AUTH: Log authentication messages

  • LC_CHKPNT and LC2_CHKPNT: Log checkpointing messages

  • LC_COMM and LC2_COMM: Log communication messages

  • LC_DCE and LC2_DCE: Log messages pertaining to DCE support

  • LC_EEVENTD and LC2_EEVENTD: Log eeventd messages

  • LC_ELIM and LC2_ELIM: Log ELIM messages

  • LC_EXEC and LC2_EXEC: Log significant steps for job execution

  • LC_FAIR and LC2_FAIR: Log fairshare policy messages

  • LC_FILE and LC2_FILE: Log file transfer messages

  • LC_FLEX and LC2_FLEX: Log messages related to FlexNet

  • LC2_GUARANTEE: Log messages related to guarantee SLAs

  • LC_HANG and LC2_HANG: Mark where a program might hang

  • LC_JARRAY and LC2_JARRAY: Log job array messages

  • LC_JLIMIT and LC2_JLIMIT: Log job slot limit messages

  • LC_LICENSE and LC2_LICENSE: Log license management messages (LC_LICENCE is also supported for backward compatibility)

  • LC2_LIVECONF: Log live reconfiguration messages

  • LC_LOADINDX and LC2_LOADINDX: Log load index messages

  • LC_M_LOG and LC2_M_LOG: Log multievent logging messages

  • LC_MEMORY and LC2_MEMORY: Log messages related to memory allocation

  • LC_MPI and LC2_MPI: Log MPI messages

  • LC_MULTI and LC2_MULTI: Log messages pertaining to MultiCluster

  • LC_PEND and LC2_PEND: Log messages related to job pending reasons

  • LC_PERFM and LC2_PERFM: Log performance messages

  • LC_PIM and LC2_PIM: Log PIM messages

  • LC_PREEMPT and LC2_PREEMPT: Log preemption policy messages

  • LC_RESOURCE and LC2_RESOURCE: Log messages related to resource broker

  • LC_RESREQ and LC2_RESREQ: Log resource requirement messages

  • LC_SCHED and LC2_SCHED: Log messages pertaining to the mbatchd scheduler

  • LC_SIGNAL and LC2_SIGNAL: Log messages pertaining to signals

  • LC_SYS and LC2_SYS: Log system call messages

  • LC_TRACE and LC2_TRACE: Log significant program walk steps

  • LC_XDR and LC2_XDR: Log everything transferred by XDR

  • LC_XDRVERSION and LC2_XDRVERSION: Log messages for XDR version

Default

Not defined

See also

LSB_CMD_LOG_MASK, LSB_CMD_LOGDIR, LSB_DEBUG, LSB_DEBUG_MBD, LSB_DEBUG_NQS, LSB_DEBUG_SBD, LSB_DEBUG_SCH, LSF_DEBUG_LIM, LSF_DEBUG_RES, LSF_LIM_PORT, LSF_RES_PORT, LSB_MBD_PORT, LSB_SBD_PORT, LSF_LOGDIR, LSF_LIM_DEBUG, LSF_RES_DEBUG

LSB_DEBUG_MBD

Syntax

LSB_DEBUG_MBD=log_class

Description

Sets the debugging log class for mbatchd.

Specifies the log class filtering to be applied to mbatchd. Only messages belonging to the specified log class are recorded.

LSB_DEBUG_MBD sets the log class and is used in combination with LSF_LOG_MASK, which sets the log level. For example:
LSF_LOG_MASK=LOG_DEBUG LSB_DEBUG_MBD="LC_TRACE LC_EXEC"

To specify multiple log classes, use a space-separated list enclosed in quotation marks. For example:
LSB_DEBUG_MBD="LC_TRACE LC_EXEC"

You need to restart the daemons after setting LSB_DEBUG_MBD for your changes to take effect.

If you use the command badmin mbddebug to temporarily change this parameter without changing lsf.conf, you do not need to restart the daemons.

Valid values

Valid log classes are the same as for LSB_DEBUG_CMD except for the log class LC_ELIM, which cannot be used with LSB_DEBUG_MBD. See LSB_DEBUG_CMD.

Default

Not defined

See also

LSB_CMD_LOG_MASK, LSB_CMD_LOGDIR, LSB_DEBUG, LSB_DEBUG_MBD, LSB_DEBUG_NQS, LSB_DEBUG_SBD, LSB_DEBUG_SCH, LSF_DEBUG_LIM, LSF_DEBUG_RES, LSF_LIM_PORT, LSF_RES_PORT, LSB_MBD_PORT, LSB_SBD_PORT, LSF_LOGDIR, LSF_LIM_DEBUG, LSF_RES_DEBUG

LSB_DEBUG_NQS

Syntax

LSB_DEBUG_NQS=log_class

Description

Sets the log class for debugging the NQS interface.

Specifies the log class filtering to be applied to NQS. Only messages belonging to the specified log class are recorded.

LSB_DEBUG_NQS sets the log class and is used in combination with LSF_LOG_MASK, which sets the log level. For example:
LSF_LOG_MASK=LOG_DEBUG LSB_DEBUG_NQS="LC_TRACE LC_EXEC" 

Debugging is turned on when you define both parameters.

To specify multiple log classes, use a space-separated list enclosed in quotation marks. For example:
LSB_DEBUG_NQS="LC_TRACE LC_EXEC"

This parameter can also be defined from the command line.

Valid values

For a list of valid log classes, see LSB_DEBUG_CMD.

Default

Not defined

See also

LSB_DEBUG_CMD, LSF_CMD_LOGDIR, LSF_CMD_LOG_MASK, LSF_LOG_MASK, LSF_LOGDIR

LSB_DEBUG_SBD

Syntax

LSB_DEBUG_SBD=log_class

Description

Sets the debugging log class for sbatchd.

Specifies the log class filtering to be applied to sbatchd. Only messages belonging to the specified log class are recorded.

LSB_DEBUG_SBD sets the log class and is used in combination with LSF_LOG_MASK, which sets the log level. For example:
LSF_LOG_MASK=LOG_DEBUG LSB_DEBUG_SBD="LC_TRACE LC_EXEC"

To specify multiple log classes, use a space-separated list enclosed in quotation marks. For example:
LSB_DEBUG_SBD="LC_TRACE LC_EXEC"

You need to restart the daemons after setting LSB_DEBUG_SBD for your changes to take effect.

If you use the command badmin sbddebug to temporarily change this parameter without changing lsf.conf, you do not need to restart the daemons.

Valid values

Valid log classes are the same as for LSB_DEBUG_CMD except for the log class LC_ELIM, which cannot be used with LSB_DEBUG_SBD. See LSB_DEBUG_CMD.

Default

Not defined

See also

LSB_DEBUG_MBD, LSF_CMD_LOGDIR, LSF_CMD_LOG_MASK, LSF_LOG_MASK, LSF_LOGDIR, badmin

LSB_DEBUG_SCH

Syntax

LSB_DEBUG_SCH=log_class

Description

Sets the debugging log class for mbschd.

Specifies the log class filtering to be applied to mbschd. Only messages belonging to the specified log class are recorded.

LSB_DEBUG_SCH sets the log class and is used in combination with LSF_LOG_MASK, which sets the log level. For example:
LSF_LOG_MASK=LOG_DEBUG LSB_DEBUG_SCH="LC_SCHED"

To specify multiple log classes, use a space-separated list enclosed in quotation marks. For example:
LSB_DEBUG_SCH="LC_SCHED LC_TRACE LC_EXEC"

You need to restart the daemons after setting LSB_DEBUG_SCH for your changes to take effect.

Valid values

Valid log classes are the same as for LSB_DEBUG_CMD except for the log class LC_ELIM, which cannot be used with LSB_DEBUG_SCH, and LC_HPC and LC_SCHED, which are only valid for LSB_DEBUG_SCH. See LSB_DEBUG_CMD.

Default

Not defined

See also

LSB_DEBUG_MBD, LSB_DEBUG_SBD, LSF_CMD_LOGDIR, LSF_CMD_LOG_MASK, LSF_LOG_MASK, LSF_LOGDIR, badmin

LSB_DISABLE_LIMLOCK_EXCL

Syntax

LSB_DISABLE_LIMLOCK_EXCL=y | n

Description

If preemptive scheduling is enabled, this parameter enables preemption of and preemption by exclusive jobs when PREEMPT_JOBTYPE=EXCLUSIVE in lsb.params. Changing this parameter requires a restart of all sbatchds in the cluster (badmin hrestart). Do not change this parameter while exclusive jobs are running.

When LSB_DISABLE_LIMLOCK_EXCL=y, for a host running an exclusive job:
  • LIM is not locked.

  • lsload displays the host status ok.

  • bhosts displays the host status closed.

  • Users can run tasks on the host using lsrun or lsgrun. To prevent users from running tasks during execution of an exclusive job, the parameter LSF_DISABLE_LSRUN=y must be defined in lsf.conf.

Default

n. LSF locks the LIM on a host running an exclusive job and unlocks the LIM when the exclusive job finishes.

LSB_DISABLE_RERUN_POST_EXEC

Syntax

LSB_DISABLE_RERUN_POST_EXEC=y | Y

Description

If set, and the job is rerunnable, the POST_EXEC configured at the job level or the queue level is not executed if the job is rerun.

Running of post-execution commands upon restart of a rerunnable job may not always be desirable. For example, if the post-exec removes certain files, or does other cleanup that should only happen if the job finishes successfully, use LSB_DISABLE_RERUN_POST_EXEC to prevent the post-exec from running and allow the successful continuation of the job when it reruns.

The POST_EXEC may still run for a job rerun when the execution host loses contact with the master host due to network problems. In this case mbatchd assumes the job has failed and restarts the job on another host. The original execution host, out of contact with the master host, completes the job and runs the POST_EXEC.

Default

Not defined

LSB_DISPLAY_YEAR

Syntax

LSB_DISPLAY_YEAR=y|Y|n|N

Description

Toggles on and off inclusion of the year in the time string displayed by the commands bjobs -l, bacct -l, and bhist -l|-b|-t.

Default

N

LSB_ECHKPNT_KEEP_OUTPUT

Syntax

LSB_ECHKPNT_KEEP_OUTPUT=y | Y

Description

Saves the standard output and standard error of custom echkpnt and erestart methods to:
  • checkpoint_dir/$LSB_JOBID/echkpnt.out

  • checkpoint_dir/$LSB_JOBID/echkpnt.err

  • checkpoint_dir/$LSB_JOBID/erestart.out

  • checkpoint_dir/$LSB_JOBID/erestart.err

Can also be defined as an environment variable.

Default

Not defined. Standard error and standard output messages from custom echkpnt and erestart programs are directed to /dev/null and discarded by LSF.

See also

LSB_ECHKPNT_METHOD, LSB_ECHKPNT_METHOD_DIR

LSB_ECHKPNT_METHOD

Syntax

LSB_ECHKPNT_METHOD="method_name [method_name] ..."

Description

Name of custom echkpnt and erestart methods.

Can also be defined as an environment variable, or specified through the bsub -k option.

The name you specify here is used for both your custom echkpnt and erestart programs. You must assign your custom echkpnt and erestart programs the names echkpnt.method_name and erestart.method_name. The programs echkpnt.method_name and erestart.method_name must be in LSF_SERVERDIR or in the directory specified by LSB_ECHKPNT_METHOD_DIR.
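The naming rule can be sketched as a helper that builds the two expected program paths. This is an illustrative sketch: the function name is hypothetical, and it assumes that LSB_ECHKPNT_METHOD_DIR, when set, is the directory that is searched and that LSF_SERVERDIR is used otherwise.

```python
import posixpath

def checkpoint_program_paths(method_name, lsf_serverdir, method_dir=None):
    """Return the expected (echkpnt, erestart) paths for a custom method."""
    # Assumption: LSB_ECHKPNT_METHOD_DIR takes the place of LSF_SERVERDIR when set.
    base = method_dir if method_dir else lsf_serverdir
    return (
        posixpath.join(base, "echkpnt." + method_name),
        posixpath.join(base, "erestart." + method_name),
    )
```

The directory names passed in are whatever LSF_SERVERDIR and LSB_ECHKPNT_METHOD_DIR are set to on your cluster; the sketch does not check that the files exist.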

Do not define LSB_ECHKPNT_METHOD=default, because default is a reserved keyword that tells LSF to use its default echkpnt and erestart methods. You can, however, specify bsub -k "my_dir method=default" my_job to indicate that you want to use the default checkpoint and restart methods.

When this parameter is not defined in lsf.conf or as an environment variable and no custom method is specified at job submission through bsub -k, LSF uses echkpnt.default and erestart.default to checkpoint and restart jobs.

When this parameter is defined, LSF uses the custom checkpoint and restart methods specified.

Limitations

The method name and directory (LSB_ECHKPNT_METHOD_DIR) combination must be unique in the cluster.

For example, you may have two echkpnt applications with the same name such as echkpnt.mymethod but what differentiates them is the different directories defined with LSB_ECHKPNT_METHOD_DIR. It is the cluster administrator’s responsibility to ensure that method name and method directory combinations are unique in the cluster.

Default

Not defined. LSF uses echkpnt.default and erestart.default to checkpoint and restart jobs.

See also

LSB_ECHKPNT_METHOD_DIR, LSB_ECHKPNT_KEEP_OUTPUT

LSB_ECHKPNT_METHOD_DIR

Syntax

LSB_ECHKPNT_METHOD_DIR=path

Description

Absolute path name of the directory in which custom echkpnt and erestart programs are located.

The checkpoint method directory should be accessible by all users who need to run the custom echkpnt and erestart programs.

Can also be defined as an environment variable.

Default

Not defined. LSF searches in LSF_SERVERDIR for custom echkpnt and erestart programs.

See also

LSB_ECHKPNT_METHOD, LSB_ECHKPNT_KEEP_OUTPUT

LSB_ESUB_METHOD

Syntax

LSB_ESUB_METHOD="esub_application [esub_application] ..."

Description

Specifies a mandatory esub that applies to all job submissions. LSB_ESUB_METHOD lists the names of the application-specific esub executables used in addition to any executables specified by the bsub -a option.

For example, LSB_ESUB_METHOD="dce fluent" runs LSF_SERVERDIR/esub.dce and LSF_SERVERDIR/esub.fluent for all jobs submitted to the cluster. These esubs define, respectively, DCE as the mandatory security system and FLUENT as the mandatory application for all jobs.

LSB_ESUB_METHOD can also be defined as an environment variable.

The value of LSB_ESUB_METHOD must correspond to an actual esub file. For example, to use LSB_ESUB_METHOD=fluent, the file esub.fluent must exist in LSF_SERVERDIR.

The name of the esub program must be a valid file name. Valid file names contain only alphanumeric characters, underscore (_) and hyphen (-).

Restriction:

The name esub.user is reserved. Do not use the name esub.user for an application-specific esub.

The master esub (mesub) uses the name you specify to invoke the appropriate esub program. The esub and esub.esub_application programs must be located in LSF_SERVERDIR.
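The naming rules for LSB_ESUB_METHOD can be sketched as a validator that also builds the expected executable paths. This is an illustrative sketch, not how mesub is implemented; the function name is hypothetical, and "alphanumeric" is assumed to mean ASCII letters and digits.

```python
import re

# Valid names contain only alphanumeric characters, underscore, and hyphen.
_VALID_ESUB_NAME = re.compile(r"[A-Za-z0-9_-]+")

def mandatory_esub_paths(lsb_esub_method, lsf_serverdir):
    """Return the esub.<name> paths implied by an LSB_ESUB_METHOD value."""
    paths = []
    for name in lsb_esub_method.split():
        if not _VALID_ESUB_NAME.fullmatch(name):
            raise ValueError(f"invalid esub name: {name!r}")
        # The name esub.user is reserved.
        if name == "user":
            raise ValueError("esub.user is reserved")
        paths.append(f"{lsf_serverdir}/esub.{name}")
    return paths
```

For the value "dce fluent", the sketch returns the esub.dce and esub.fluent paths under the given LSF_SERVERDIR, in that order.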

LSF does not detect conflicts based on esub names. For example, if LSB_ESUB_METHOD="openmpi" and bsub -a pvm is specified at job submission, the job could fail because these esubs define two different types of parallel job handling.

Default

Not defined. LSF does not apply a mandatory esub to jobs submitted to the cluster.

LSB_EVENTS_FILE_KEEP_OPEN

Syntax

LSB_EVENTS_FILE_KEEP_OPEN=Y|N

Description

Windows only.

Specify Y to open the events file once, and keep it open always.

Specify N to open and close the events file each time a record is written.

Default

Y

LSB_HJOB_PER_SESSION

Syntax

LSB_HJOB_PER_SESSION=max_num

Description

Specifies the maximum number of jobs that can be dispatched in each scheduling cycle to each host.

Valid values

Any positive integer

Default

Not defined

Notes

LSB_HJOB_PER_SESSION is activated only if the JOB_ACCEPT_INTERVAL parameter is set to 0.

See also

JOB_ACCEPT_INTERVAL parameter in lsb.params

LSB_INDEX_BY_JOB

Syntax

LSB_INDEX_BY_JOB="JOBNAME"

Description

When set to JOBNAME, creates a job index of job names. Define when using job dependency conditions (bsub -w) with job names to optimize job name searches.

Valid values

JOBNAME

Default

Not defined. Job index is not created.

LSB_INTERACT_MSG_ENH

Syntax

LSB_INTERACT_MSG_ENH=y | Y

Description

If set, enables enhanced messaging for interactive batch jobs. To disable interactive batch job messages, set LSB_INTERACT_MSG_ENH to any value other than y or Y; for example, LSB_INTERACT_MSG_ENH=N.

Default

Not defined

See also

LSB_INTERACT_MSG_INTVAL

LSB_INTERACT_MSG_INTVAL

Syntax

LSB_INTERACT_MSG_INTVAL=time_seconds

Description

Specifies the update interval in seconds for interactive batch job messages. LSB_INTERACT_MSG_INTVAL is ignored if LSB_INTERACT_MSG_ENH is not set.

Job information that LSF uses to get the pending or suspension reason is updated according to the value of PEND_REASON_UPDATE_INTERVAL in lsb.params.

Default

Not defined. If LSB_INTERACT_MSG_INTVAL is set to an incorrect value, the default update interval is 60 seconds.

See also

LSB_INTERACT_MSG_ENH

LSB_JOB_CPULIMIT

Syntax

LSB_JOB_CPULIMIT=y | n

Description

Determines whether the CPU limit is a per-process limit enforced by the OS or whether it is a per-job limit enforced by LSF:
  • The per-process limit is enforced by the OS when the CPU time of one process of the job exceeds the CPU limit.

  • The per-job limit is enforced by LSF when the total CPU time of all processes of the job exceeds the CPU limit.

This parameter applies to CPU limits set when a job is submitted with bsub -c, and to CPU limits set for queues by CPULIMIT in lsb.queues.
  • LSF-enforced per-job limit: When the sum of the CPU time of all processes of a job exceeds the CPU limit, LSF sends a SIGXCPU signal (where supported by the operating system) to all processes belonging to the job, then SIGINT, SIGTERM, and SIGKILL. The interval between signals is 10 seconds by default. The time interval between SIGXCPU, SIGINT, SIGTERM, and SIGKILL can be configured with the parameter JOB_TERMINATE_INTERVAL in lsb.params.

    Restriction:

    SIGXCPU is not supported by Windows.

  • OS-enforced per process limit: When one process in the job exceeds the CPU limit, the limit is enforced by the operating system. For more details, refer to your operating system documentation for setrlimit().
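The signal sequence for the LSF-enforced per-job limit can be laid out as a timeline. This is a sketch under stated assumptions: the function name is hypothetical, the first signal is placed at offset 0 (the moment the limit is exceeded), and the interval argument stands in for JOB_TERMINATE_INTERVAL with its default of 10 seconds.

```python
def termination_schedule(interval_seconds=10):
    """Return (offset_seconds, signal) pairs for the per-job CPU limit."""
    # Order described in the text; SIGXCPU applies only where supported.
    signals = ["SIGXCPU", "SIGINT", "SIGTERM", "SIGKILL"]
    return [(i * interval_seconds, name) for i, name in enumerate(signals)]
```

With the default interval, the last signal in the sequence is sent 30 seconds after the first.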

The setting of LSB_JOB_CPULIMIT has the following effect on how the limit is enforced:

LSB_JOB_CPULIMIT    LSF per-job limit    OS per-process limit
y                   Enabled              Disabled
n                   Disabled             Enabled
Not defined         Enabled              Enabled
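The table above can be read as a three-way lookup. The following Python sketch restates it; the function name cpu_limit_enforcement is hypothetical and is not part of LSF, and treating the value case-insensitively is an assumption.

```python
def cpu_limit_enforcement(value=None):
    """Map an LSB_JOB_CPULIMIT setting to (lsf_per_job, os_per_process).

    Pass None when the parameter is not defined in lsf.conf.
    """
    if value is None:
        return (True, True)      # not defined: both limits are enabled
    if value.lower() == "y":
        return (True, False)     # only the LSF per-job limit
    if value.lower() == "n":
        return (False, True)     # only the OS per-process limit
    raise ValueError(f"unexpected LSB_JOB_CPULIMIT value: {value!r}")


print(cpu_limit_enforcement("y"))
print(cpu_limit_enforcement(None))
```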

Default

Not defined

Notes

To make LSB_JOB_CPULIMIT take effect, use the command badmin hrestart all to restart all sbatchds in the cluster.

Changing the default Terminate job control action: You can define a different terminate action in lsb.queues with the parameter JOB_CONTROLS if you do not want the job to be killed. For more details on job controls, see Administering Platform LSF.

Limitations

If a job is running and the parameter is changed, LSF is not able to reset the type of limit enforcement for running jobs.
  • If the parameter is changed from per-process limit enforced by the OS to per-job limit enforced by LSF (LSB_JOB_CPULIMIT=n changed to LSB_JOB_CPULIMIT=y), both per-process limit and per-job limit affect the running job. This means that signals may be sent to the job either when an individual process exceeds the CPU limit or when the sum of the CPU time of all processes of the job exceeds the limit. A job that is running may be killed by the OS or by LSF.

  • If the parameter is changed from per-job limit enforced by LSF to per-process limit enforced by the OS (LSB_JOB_CPULIMIT=y changed to LSB_JOB_CPULIMIT=n), the job is allowed to run without limits because the per-process limit was previously disabled.

See also

lsb.queues, bsub, JOB_TERMINATE_INTERVAL in lsb.params, LSB_MOD_ALL_JOBS

LSB_JOB_MEMLIMIT

Syntax

LSB_JOB_MEMLIMIT=y | n

Description

Determines whether the memory limit is a per-process limit enforced by the OS or whether it is a per-job limit enforced by LSF.
  • The per-process limit is enforced by the OS when the memory allocated to one process of the job exceeds the memory limit.

  • The per-job limit is enforced by LSF when the sum of the memory allocated to all processes of the job exceeds the memory limit.

This parameter applies to memory limits set when a job is submitted with bsub -M mem_limit, and to memory limits set for queues with MEMLIMIT in lsb.queues.

The setting of LSB_JOB_MEMLIMIT has the following effect on how the limit is enforced:


When LSB_JOB_MEMLIMIT is    LSF-enforced per-job limit    OS-enforced per-process limit
y                           Enabled                       Disabled
n or not defined            Disabled                      Enabled


When LSB_JOB_MEMLIMIT is Y, the LSF-enforced per-job limit is enabled, and the OS-enforced per-process limit is disabled.

When LSB_JOB_MEMLIMIT is N or not defined, the LSF-enforced per-job limit is disabled, and the OS-enforced per-process limit is enabled.

LSF-enforced per-job limit: When the total memory allocated to all processes in the job exceeds the memory limit, LSF sends the following signals to kill the job: SIGINT, SIGTERM, then SIGKILL. The interval between signals is 10 seconds by default.

On UNIX, the time interval between SIGINT, SIGTERM, and SIGKILL can be configured with the parameter JOB_TERMINATE_INTERVAL in lsb.params.

OS-enforced per-process limit: When the memory allocated to one process of the job exceeds the memory limit, the operating system enforces the limit. LSF passes the memory limit to the operating system. Some operating systems apply the memory limit to each process, and some do not enforce the memory limit at all.

OS memory limit enforcement is only available on systems that support RLIMIT_RSS for setrlimit().

The following operating systems do not support the memory limit at the OS level and the job is allowed to run without a memory limit:
  • Windows

  • Sun Solaris 2.x

Default

Not defined. Per-process memory limit enforced by the OS; per-job memory limit enforced by LSF disabled

Notes

To make LSB_JOB_MEMLIMIT take effect, use the command badmin hrestart all to restart all sbatchds in the cluster.

If LSB_JOB_MEMLIMIT is set, it overrides the setting of the parameter LSB_MEMLIMIT_ENFORCE. The parameter LSB_MEMLIMIT_ENFORCE is ignored.

The difference between LSB_JOB_MEMLIMIT set to y and LSB_MEMLIMIT_ENFORCE set to y is that with LSB_JOB_MEMLIMIT, only the per-job memory limit enforced by LSF is enabled. The per-process memory limit enforced by the OS is disabled. With LSB_MEMLIMIT_ENFORCE set to y, both the per-job memory limit enforced by LSF and the per-process memory limit enforced by the OS are enabled.
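The interaction between the two parameters can be summarized in a few lines of Python. This is a sketch under stated assumptions, not LSF code: the function name memory_limit_enforcement is hypothetical, and any value other than y is treated as n.

```python
def memory_limit_enforcement(job_memlimit=None, memlimit_enforce=None):
    """Return (lsf_per_job, os_per_process) for the memory limit.

    job_memlimit     -- value of LSB_JOB_MEMLIMIT, or None if not defined
    memlimit_enforce -- value of LSB_MEMLIMIT_ENFORCE, or None if not defined
    Assumption: any value other than y/Y behaves like n.
    """
    if job_memlimit is not None:
        # When set, LSB_JOB_MEMLIMIT overrides LSB_MEMLIMIT_ENFORCE.
        if job_memlimit.lower() == "y":
            return (True, False)
        return (False, True)
    if memlimit_enforce is not None and memlimit_enforce.lower() == "y":
        return (True, True)      # both limits are enabled
    return (False, True)         # default: the OS enforces per process


print(memory_limit_enforcement(job_memlimit="y", memlimit_enforce="y"))
print(memory_limit_enforcement(memlimit_enforce="y"))
```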

Changing the default Terminate job control action: You can define a different Terminate action in lsb.queues with the parameter JOB_CONTROLS if you do not want the job to be killed. For more details on job controls, see Administering Platform LSF.

Limitations

If a job is running and the parameter is changed, LSF is not able to reset the type of limit enforcement for running jobs.
  • If the parameter is changed from per-process limit enforced by the OS to per-job limit enforced by LSF (LSB_JOB_MEMLIMIT=n or not defined changed to LSB_JOB_MEMLIMIT=y), both per-process limit and per-job limit affect the running job. This means that signals may be sent to the job either when the memory allocated to an individual process exceeds the memory limit or when the sum of memory allocated to all processes of the job exceeds the limit. A job that is running may be killed by LSF.

  • If the parameter is changed from per-job limit enforced by LSF to per-process limit enforced by the OS (LSB_JOB_MEMLIMIT=y changed to LSB_JOB_MEMLIMIT=n or not defined), the job is allowed to run without limits because the per-process limit was previously disabled.

See also

LSB_MEMLIMIT_ENFORCE, LSB_MOD_ALL_JOBS, lsb.queues, bsub, JOB_TERMINATE_INTERVAL in lsb.params

LSB_JOB_OUTPUT_LOGGING

Syntax

LSB_JOB_OUTPUT_LOGGING=Y | N

Description

Determines whether jobs write job notification messages to the logfile.

Default

Not defined (jobs do not write job notification messages to the logfile).

LSB_JOBID_DISP_LENGTH

Syntax

LSB_JOBID_DISP_LENGTH=integer

Description

By default, LSF commands bjobs and bhist display job IDs with a maximum length of 7 characters. Job IDs greater than 9999999 are truncated on the left.

When LSB_JOBID_DISP_LENGTH=10, the width of the JOBID column in bjobs and bhist increases to 10 characters.
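The effect of the display width on a long job ID can be sketched as follows. This is an illustration, assuming that "truncated on the left" means the leading digits are dropped; the function name display_job_id is hypothetical, and column padding is not modeled.

```python
def display_job_id(job_id, width=7):
    """Return the job ID as it would fit in a column of `width` characters.

    Assumption: IDs longer than the column lose their leading digits.
    """
    text = str(job_id)
    return text[-width:] if len(text) > width else text


print(display_job_id(12345678))        # default 7-character column
print(display_job_id(12345678, 10))    # with LSB_JOBID_DISP_LENGTH=10
```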

Valid values

Specify an integer between 7 and 10.

Default

Not defined. LSF uses the default 7-character length for job ID display.

LSB_KEEP_SYSDEF_RLIMIT

Syntax

LSB_KEEP_SYSDEF_RLIMIT=y | n

Description

If resource limits are configured for a user in the SGI IRIX User Limits Database (ULDB) domain specified in LSF_ULDB_DOMAIN, and there is no domain default, the system default is honored.

If LSB_KEEP_SYSDEF_RLIMIT=n, and no resource limits are configured in the domain for the user and there is no domain default, LSF overrides the system default and sets system limits to unlimited.

Default

Not defined. No resource limits are configured in the domain for the user and there is no domain default.

LSB_LOAD_TO_SERVER_HOSTS (OBSOLETE)

Syntax

LSB_LOAD_TO_SERVER_HOSTS=Y | y

Description

Note:

This parameter is obsolete in LSF 7 Update 2. By default, client sbatchd contacts the local LIM for host status and load information.

Highly recommended for large clusters to decrease the load on the master LIM. Forces the client sbatchd to contact the local LIM for host status and load information. The client sbatchd only contacts the master LIM or a LIM on one of the LSF_SERVER_HOSTS if sbatchd cannot find the information locally.

Default

Y. Client sbatchd contacts the local LIM for host status and load information.

See also

LSF_SERVER_HOSTS in slave.config

LSB_LOCALDIR

Syntax

LSB_LOCALDIR=path

Description

Enables duplicate logging.

Specify the path to a local directory that exists only on the first LSF master host. LSF puts the primary copies of the event and accounting log files in this directory. LSF puts the duplicates in LSB_SHAREDIR.

Important:

Always restart both the mbatchd and sbatchd when modifying LSB_LOCALDIR.

Example

LSB_LOCALDIR=/usr/share/lsbatch/loginfo

Default

Not defined

See also

LSB_SHAREDIR, EVENT_UPDATE_INTERVAL in lsb.params

LSB_MAILPROG

Syntax

LSB_MAILPROG=file_name

Description

Path and file name of the mail program used by LSF to send email. This is the electronic mail program that LSF uses to send system messages to the user. When LSF needs to send email to users it invokes the program defined by LSB_MAILPROG in lsf.conf. You can write your own custom mail program and set LSB_MAILPROG to the path where this program is stored.

LSF administrators can set the parameter as part of cluster reconfiguration. Provide the name of any mail program. For your convenience, LSF provides the sendmail mail program, which supports the sendmail protocol on UNIX.

In a mixed cluster, you can specify different programs for Windows and UNIX. You can set this parameter during installation on Windows. For your convenience, LSF provides the lsmail.exe mail program, which supports SMTP and Microsoft Exchange Server protocols on Windows. If lsmail is specified, the parameter LSB_MAILSERVER must also be specified.

If you change your mail program, the LSF administrator must restart sbatchd on all hosts to retrieve the new value.

UNIX

By default, LSF uses /usr/lib/sendmail to send email to users. LSF calls LSB_MAILPROG with two arguments; one argument gives the full name of the sender, and the other argument gives the return address for mail.

LSB_MAILPROG must read the body of the mail message from the standard input. The end of the message is marked by end-of-file. Any program or shell script that accepts the arguments and input, and delivers the mail correctly, can be used.

LSB_MAILPROG must be executable by any user.
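A custom mail program that follows the UNIX contract above might look like the following Python sketch. Several details are assumptions rather than documented behavior: the order of the two arguments (sender name first, then return address), the header layout, and the delivery step, which here only writes the formatted message to standard output as a placeholder. How the recipient is conveyed is not described in this section, so the sketch does not model it.

```python
#!/usr/bin/env python3
import sys


def format_mail(sender_name, return_address, body):
    """Build the text that a real delivery mechanism would send."""
    return f"From: {sender_name} <{return_address}>\n\n{body}"


def main(argv, stdin, stdout):
    if len(argv) != 3:
        return 1
    # Assumed argument order: sender name first, then return address.
    sender_name, return_address = argv[1], argv[2]
    body = stdin.read()  # the message body ends at end-of-file
    # Placeholder delivery: a real program would hand this to a mail system.
    stdout.write(format_mail(sender_name, return_address, body))
    return 0


if __name__ == "__main__" and len(sys.argv) == 3:
    sys.exit(main(sys.argv, sys.stdin, sys.stdout))
```

The script would need to be installed with execute permission for all users before LSB_MAILPROG is pointed at it.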

Windows

If LSB_MAILPROG is not defined, no email is sent.

Examples

LSB_MAILPROG=lsmail.exe
LSB_MAILPROG=/serverA/tools/lsf/bin/unixhost.exe 

Default

/usr/lib/sendmail (UNIX)

blank (Windows)

See also

LSB_MAILSERVER, LSB_MAILTO

LSB_MAILSERVER

Syntax

LSB_MAILSERVER=mail_protocol:mail_server

Description

Part of mail configuration on Windows.

This parameter only applies when lsmail is used as the mail program (LSB_MAILPROG=lsmail.exe). Otherwise, it is ignored.

Both mail_protocol and mail_server must be indicated.

Set this parameter to either SMTP or Microsoft Exchange protocol (SMTP or EXCHANGE) and specify the name of the host that is the mail server.

This parameter is set during installation of LSF on Windows or is set or modified by the LSF administrator.

If this parameter is modified, the LSF administrator must restart sbatchd on all hosts to retrieve the new value.

Examples

LSB_MAILSERVER=EXCHANGE:Host2@company.com
LSB_MAILSERVER=SMTP:MailHost
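The mail_protocol:mail_server form can be checked with a short Python sketch. The function name parse_mailserver is hypothetical, and the sketch assumes that the protocol is written exactly as SMTP or EXCHANGE; how LSF itself treats other spellings is not stated here.

```python
def parse_mailserver(value):
    """Split an LSB_MAILSERVER value into (mail_protocol, mail_server)."""
    protocol, sep, server = value.partition(":")
    if not sep or not protocol or not server:
        raise ValueError("both mail_protocol and mail_server must be given")
    if protocol not in ("SMTP", "EXCHANGE"):
        raise ValueError(f"unsupported mail protocol: {protocol!r}")
    return protocol, server


print(parse_mailserver("SMTP:MailHost"))
print(parse_mailserver("EXCHANGE:Host2@company.com"))
```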

Default

Not defined

See also

LSB_LOCALDIR

LSB_MAILSIZE_LIMIT

Syntax

LSB_MAILSIZE_LIMIT=email_size_KB

Description

Limits the size in KB of the email containing job output information.

The system sends job information such as CPU, process and memory usage, job output, and errors in email to the submitting user account. Some batch jobs can create large amounts of output. To prevent large job output files from interfering with your mail system, use LSB_MAILSIZE_LIMIT to set the maximum size in KB of the email containing the job information. Specify a positive integer.

If the size of the job output email exceeds LSB_MAILSIZE_LIMIT, the output is saved to a file under JOB_SPOOL_DIR or to the default job output directory if JOB_SPOOL_DIR is not defined. The email informs users of where the job output is located.

If the -o option of bsub is used, the size of the job output is not checked against LSB_MAILSIZE_LIMIT.

If you use a custom mail program specified by the LSB_MAILPROG parameter that can use the LSB_MAILSIZE environment variable, it is not necessary to configure LSB_MAILSIZE_LIMIT.

Default

By default, LSB_MAILSIZE_LIMIT is not enabled. No limit is set on size of batch job output email.

See also

LSB_MAILPROG, LSB_MAILTO

LSB_MAIL_FROM_DOMAIN

Syntax

LSB_MAIL_FROM_DOMAIN=domain_name

Description

Windows only.

LSF uses the username as the from address to send mail. In some environments the from address requires domain information. If LSB_MAIL_FROM_DOMAIN is set, the domain name specified in this parameter will be added to the from address.

For example, if LSB_MAIL_FROM_DOMAIN is not set, the from address is SYSTEM; if LSB_MAIL_FROM_DOMAIN=platform.com, the from address is SYSTEM@platform.com.

Default

Not defined.

LSB_MAILTO

Syntax

LSB_MAILTO=mail_account

Description

LSF sends electronic mail to users when their jobs complete or have errors, and to the LSF administrator in the case of critical errors in the LSF system. The default is to send mail to the user who submitted the job, on the host on which the daemon is running; this assumes that your electronic mail system forwards messages to a central mailbox.

The LSB_MAILTO parameter changes the mailing address used by LSF. LSB_MAILTO is a format string that is used to build the mailing address.

Common formats are:
  • !U :  Mail is sent to the submitting user's account name on the local host. The substring !U, if found, is replaced with the user’s account name.

  • !U@company_name.com :  Mail is sent to user@company_name.com on the mail server. The mail server is specified by LSB_MAILSERVER.

  • !U@!H :  Mail is sent to user@submission_hostname. The substring !H is replaced with the name of the submission host. This format is valid on UNIX only. It is not supported on Windows.

All other characters (including any other ‘!’) are copied exactly.
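The substitution rules above amount to two plain string replacements. The following Python sketch shows them; the function name expand_mailto and the sample user and host names are hypothetical.

```python
def expand_mailto(fmt, user, submission_host):
    """Expand an LSB_MAILTO format string.

    !U becomes the user's account name and !H the submission host name.
    Every other character, including any other '!', is copied as is.
    """
    return fmt.replace("!U", user).replace("!H", submission_host)


print(expand_mailto("!U", "fred", "hostA"))
print(expand_mailto("!U@company_name.com", "fred", "hostA"))
print(expand_mailto("!U@!H", "fred", "hostA"))
```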

If this parameter is modified, the LSF administrator must restart sbatchd on all hosts to retrieve the new value.

Windows only: When a job exception occurs (for example, a job is overrun or underrun), an email is sent to the primary administrator set in the lsf.cluster.cluster_name file, at the domain set in LSB_MAILTO. For example, if the primary administrator is lsfadmin and LSB_MAILTO=fred@company.com, an email is sent to lsfadmin@company.com. The email must be a valid Windows email account.

Default

!U

See also

LSB_MAILPROG, LSB_MAILSIZE_LIMIT

LSB_MAX_ASKED_HOSTS_NUMBER

Syntax

LSB_MAX_ASKED_HOSTS_NUMBER=integer

Description

Limits the number of hosts a user can specify with the -m (host preference) option of the following commands:

  • bsub

  • brun

  • bmod

  • brestart

  • brsvadd

  • brsvmod

  • brsvs

The job is rejected if more hosts are specified than the value of LSB_MAX_ASKED_HOSTS_NUMBER.

CAUTION:

If this value is set high, there will be a performance effect if users submit or modify jobs using the -m option and specify a large number of hosts. 512 hosts is the suggested upper limit.

Valid values

Any whole, positive integer.

Default

512

LSB_MAX_JOB_DISPATCH_PER_SESSION

Syntax

LSB_MAX_JOB_DISPATCH_PER_SESSION=integer

Description

Defines the maximum number of jobs that mbatchd can dispatch during one job scheduling session.

Both mbatchd and sbatchd must be restarted when you change the value of this parameter.

If set to a value greater than 300, the file descriptor limit is increased on operating systems that support a file descriptor limit greater than 1024.

Use together with MAX_SBD_CONNS in lsb.params. Set LSB_MAX_JOB_DISPATCH_PER_SESSION to a value no greater than one-half the value of MAX_SBD_CONNS. This setting configures mbatchd to dispatch jobs at a high rate while maintaining the processing speed of other mbatchd tasks.

Examples

LSB_MAX_JOB_DISPATCH_PER_SESSION=300

The file descriptor limit is 1024.

LSB_MAX_JOB_DISPATCH_PER_SESSION=1000

The file descriptor limit is greater than 1024 on operating systems that support a greater limit.
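The two rules of thumb for this parameter can be expressed as simple checks. The Python sketch below is illustrative only; both function names are hypothetical and neither is part of LSF.

```python
def dispatch_within_guideline(max_dispatch, max_sbd_conns):
    """True if LSB_MAX_JOB_DISPATCH_PER_SESSION is no greater than
    one-half the value of MAX_SBD_CONNS."""
    return max_dispatch * 2 <= max_sbd_conns


def raises_fd_limit(max_dispatch):
    """True if the setting is above 300, the point at which the file
    descriptor limit is increased on operating systems that allow it."""
    return max_dispatch > 300


print(dispatch_within_guideline(300, 600), raises_fd_limit(300))
print(dispatch_within_guideline(1000, 1500), raises_fd_limit(1000))
```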

Default

300

See also

MAX_SBD_CONNS in lsb.params

LSB_MAX_PACK_JOBS

Syntax

LSB_MAX_PACK_JOBS=integer

Description

Applies to job packs only. Enables the job packs feature and specifies the maximum number of job submission requests in one job pack.

If the value is 0, job packs are disabled.

If the value is 1, jobs from the file are submitted individually, as if submitted directly using the bsub command.

We recommend 100 as the initial pack size. Tune this parameter based on cluster performance. The larger the pack size, the faster the job submission rate is for all the job requests in the job submission file. However, while mbatchd is processing a pack, mbatchd is blocked from processing other requests, so increasing pack size can affect mbatchd response time for other job submissions.
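The relationship between the pack size and the number of packs can be sketched in Python. This is an illustration, assuming that requests are grouped in file order and that each pack is filled to the maximum before the next one starts; the function name split_into_packs is hypothetical.

```python
def split_into_packs(requests, max_pack_jobs):
    """Group job submission requests into packs of at most max_pack_jobs.

    A value of 0 means job packs are disabled, so there is nothing to split.
    A value of 1 yields one request per pack (individual submission).
    """
    if max_pack_jobs < 0:
        raise ValueError("LSB_MAX_PACK_JOBS must be 0 or a positive integer")
    if max_pack_jobs == 0:
        raise ValueError("job packs are disabled (LSB_MAX_PACK_JOBS=0)")
    return [requests[i:i + max_pack_jobs]
            for i in range(0, len(requests), max_pack_jobs)]


packs = split_into_packs([f"job{i}" for i in range(250)], 100)
print([len(p) for p in packs])
```

Fewer, larger packs mean fewer submissions for mbatchd to accept, at the cost of longer stretches in which it cannot serve other requests.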

If you change the configuration of this parameter, you must restart mbatchd.

Parameters related to job packs are not supported as environment variables.

Valid Values

Any positive integer or 0.

Default

0 (disabled)

LSB_MAX_PROBE_SBD

Syntax

LSB_MAX_PROBE_SBD=integer

Description

Specifies the maximum number of sbatchd instances that can be polled by mbatchd in the interval MBD_SLEEP_TIME/10 (6 seconds by default). Use this parameter in large clusters to reduce the time it takes for mbatchd to probe all sbatchds.

The value of LSB_MAX_PROBE_SBD cannot be greater than the number of hosts in the cluster. If it is, mbatchd adjusts the value of LSB_MAX_PROBE_SBD to be the same as the number of hosts.
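The adjustment and its effect on probing time can be estimated with a short Python sketch. The function names are hypothetical, and the time estimate rests on an assumption: that exactly the effective number of sbatchds is polled in every interval of MBD_SLEEP_TIME/10 seconds. Real timing depends on the cluster.

```python
import math


def effective_probe_limit(configured, num_hosts):
    """Cap LSB_MAX_PROBE_SBD at the number of hosts in the cluster."""
    return min(configured, num_hosts)


def estimated_probe_seconds(configured, num_hosts, interval_seconds=6):
    """Rough time for mbatchd to probe every sbatchd once.

    Assumption: each interval polls exactly the effective limit.
    """
    limit = effective_probe_limit(configured, num_hosts)
    if limit <= 0:
        raise ValueError("probe limit must be positive for this estimate")
    return math.ceil(num_hosts / limit) * interval_seconds


print(effective_probe_limit(64, 40))
print(estimated_probe_seconds(20, 100))
```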

After modifying LSB_MAX_PROBE_SBD, use badmin mbdrestart to restart mbatchd and let the modified value take effect.

If LSB_MAX_PROBE_SBD is defined, the value of MAX_SBD_FAIL in lsb.params can be less than 3.

Valid values

Any positive integer between 0 and 64

Default

20

See also

MAX_SBD_FAIL in lsb.params

LSB_MAX_NQS_QUEUES

Syntax

LSB_MAX_NQS_QUEUES=nqs_queues

Description

The maximum number of NQS queues allowed in the LSF cluster. Required for LSF to work with NQS. You must restart mbatchd if you change the value of LSB_MAX_NQS_QUEUES.

The total number of NQS queues configured by NQS_QUEUES in lsb.queues cannot exceed the value of LSB_MAX_NQS_QUEUES. NQS queues in excess of the maximum queues are ignored.

If you do not define LSB_MAX_NQS_QUEUES or define an incorrect value, LSF-NQS interoperation is disabled.

Valid values

Any positive integer

Default

None

LSB_MBD_BUSY_MSG

Syntax

LSB_MBD_BUSY_MSG="message_string"

Description

Specifies the message displayed when mbatchd is too busy to accept new connections or respond to client requests.

Define this parameter if you want to customize the message.

Valid values

String, either non-empty or empty.

Default

Not defined. By default, LSF displays the message "LSF is processing your request. Please wait..."

Batch commands retry the connection to mbatchd at the intervals specified by the parameters LSB_API_CONNTIMEOUT and LSB_API_RECVTIMEOUT.

LSB_MBD_CONNECT_FAIL_MSG

Syntax

LSB_MBD_CONNECT_FAIL_MSG="message_string"

Description

Specifies the message displayed when internal system connections to mbatchd fail.

Define this parameter if you want to customize the message.

Valid values

String, either non-empty or empty.

Default

Not defined. By default, LSF displays the message "Cannot connect to LSF. Please wait..."

Batch commands retry the connection to mbatchd at the intervals specified by the parameters LSB_API_CONNTIMEOUT and LSB_API_RECVTIMEOUT.

LSB_MBD_DOWN_MSG

Syntax

LSB_MBD_DOWN_MSG="message_string"

Description

Specifies the message displayed by the bhosts command when mbatchd is down or there is no process listening at either the LSB_MBD_PORT or the LSB_QUERY_PORT.

Define this parameter if you want to customize the message.

Valid values

String, either non-empty or empty.

Default

Not defined. By default, LSF displays the message "LSF is down. Please wait..."

Batch commands retry the connection to mbatchd at the intervals specified by the parameters LSB_API_CONNTIMEOUT and LSB_API_RECVTIMEOUT.

LSB_MBD_MAX_SIG_COUNT

Syntax

LSB_MBD_MAX_SIG_COUNT=integer

Description

When a host enters an unknown state, the mbatchd attempts to retry any pending jobs. This parameter specifies the maximum number of pending signals that the mbatchd deals with concurrently in order not to overload it. A high value for LSB_MBD_MAX_SIG_COUNT can negatively impact the performance of your cluster.

Valid values

Integers between 5 and 100, inclusive.

Default

5

LSB_MBD_PORT

See LSF_LIM_PORT, LSF_RES_PORT, LSB_MBD_PORT, LSB_SBD_PORT.

LSB_MC_CHKPNT_RERUN

Syntax

LSB_MC_CHKPNT_RERUN=y | n

Description

For checkpointable MultiCluster jobs, if a restart attempt fails, the job is rerun from the beginning (instead of from the last checkpoint) without administrator or user intervention.

The submission cluster does not need to forward the job again. The execution cluster reports the job’s new pending status back to the submission cluster, and the job is dispatched to the same host to restart from the beginning.

Default

n

LSB_MC_INITFAIL_MAIL

Syntax

LSB_MC_INITFAIL_MAIL=Y | All | Administrator

Description

MultiCluster job forwarding model only.

Specify Y to make LSF email the job owner when a job is suspended after reaching the retry threshold.

Specify Administrator to make LSF email the primary administrator when a job is suspended after reaching the retry threshold.

Specify All to make LSF email both the job owner and the primary administrator when a job is suspended after reaching the retry threshold.

Default

Not defined

LSB_MC_INITFAIL_RETRY

Syntax

LSB_MC_INITFAIL_RETRY=integer

Description

MultiCluster job forwarding model only. Defines the retry threshold and causes LSF to suspend a job that repeatedly fails to start. For example, specify 2 retry attempts to make LSF attempt to start a job 3 times before suspending it.

Default

5

LSB_MEMLIMIT_ENFORCE

Syntax

LSB_MEMLIMIT_ENFORCE=y | n

Description

Specify y to enable LSF memory limit enforcement.

If enabled, LSF sends a signal to kill all processes that exceed queue-level memory limits set by MEMLIMIT in lsb.queues or job-level memory limits specified by bsub -M mem_limit.

Otherwise, LSF passes memory limit enforcement to the OS. UNIX operating systems that support RLIMIT_RSS for setrlimit() can apply the memory limit to each process.

The following operating systems do not support memory limit at the OS level:
  • Windows

  • Sun Solaris 2.x

Default

Not defined. LSF passes memory limit enforcement to the OS.

See also

lsb.queues

LSB_MIG2PEND

Syntax

LSB_MIG2PEND=0 | 1

Description

Applies only to migrating checkpointable or rerunnable jobs.

When defined with a value of 1, requeues migrating jobs instead of restarting or rerunning them on the first available host. Requeues the jobs in the PEND state in order of the original submission time and with the original job priority.

If you want to place the migrated jobs at the bottom of the queue without considering submission time, define both LSB_MIG2PEND=1 and LSB_REQUEUE_TO_BOTTOM=1 in lsf.conf.

Ignored in a MultiCluster environment.

Default

Not defined. LSF restarts or reruns migrating jobs on the first available host.

See also

LSB_REQUEUE_TO_BOTTOM

LSB_MIXED_PATH_DELIMITER

Syntax

LSB_MIXED_PATH_DELIMITER="|"

Description

Defines the delimiter between UNIX and Windows paths if LSB_MIXED_PATH_ENABLE=y. For example, /home/tmp/J.out|c:\tmp\J.out.

Default

A pipe "|" is the default delimiter.

See also

LSB_MIXED_PATH_ENABLE

LSB_MIXED_PATH_ENABLE

Syntax

LSB_MIXED_PATH_ENABLE=y | n

Description

Allows you to specify both a UNIX and Windows path when submitting a job in a mixed cluster (both Windows and UNIX hosts).

The format is always unix_path_cmd|windows_path_cmd.

Applies to the following options of bsub:

  • -o, -oo

  • -e, -eo

  • -i, -is

  • -cwd

  • -E, -Ep

  • CMD

  • queue level PRE_EXEC, POST_EXEC

  • application level PRE_EXEC, POST_EXEC

For example:

bsub -o "/home/tmp/job%J.out|c:\tmp\job%J.out" -e "/home/tmp/err%J.out|c:\tmp\err%J.out" -E "sleep 9| sleep 8" -Ep "sleep 7| sleep 6" -cwd "/home/tmp|c:\tmp" "sleep 121|sleep 122"

The delimiter is configurable: LSB_MIXED_PATH_DELIMITER.
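The unix_path_cmd|windows_path_cmd format can be taken apart with a single split on the delimiter. The Python sketch below is illustrative; the function name split_mixed_path is hypothetical, and it assumes the delimiter appears once, between the UNIX part and the Windows part.

```python
def split_mixed_path(value, delimiter="|"):
    """Split a mixed-cluster value into (unix_part, windows_part).

    `delimiter` corresponds to LSB_MIXED_PATH_DELIMITER (default "|").
    """
    unix_part, sep, windows_part = value.partition(delimiter)
    if not sep:
        raise ValueError(f"delimiter {delimiter!r} not found in {value!r}")
    return unix_part, windows_part


unix_out, win_out = split_mixed_path(r"/home/tmp/job%J.out|c:\tmp\job%J.out")
print(unix_out)
print(win_out)
```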

Note:

LSB_MIXED_PATH_ENABLE doesn't support interactive mode (bsub -I).

Default

Not defined. LSF jobs submitted .

See also

LSB_MIXED_PATH_DELIMITER

LSB_MOD_ALL_JOBS

Syntax

LSB_MOD_ALL_JOBS=y | Y

Description

If set, enables bmod to modify resource limits and location of job output files for running jobs.

After a job has been dispatched, the following modifications can be made:
  • CPU limit (-c [hour:]minute[/host_name | /host_model] | -cn)

  • Memory limit (-M mem_limit | -Mn)

  • Rerunnable jobs (-r | -rn)

  • Resource requirements (-R "res_req" except -R "cu[cu_string]")

  • Run limit (-W run_limit[/host_name | /host_model] | -Wn)

  • Standard output file name (-o output_file | -on)

  • Standard error file name (-e error_file | -en)

  • Overwrite standard output (stdout) file name up to 4094 characters for UNIX or 255 characters for Windows (-oo output_file)

  • Overwrite standard error (stderr) file name up to 4094 characters for UNIX or 255 characters for Windows (-eo error_file)

To modify the CPU limit or the memory limit of running jobs, the parameters LSB_JOB_CPULIMIT=Y and LSB_JOB_MEMLIMIT=Y must be defined in lsf.conf.

Important:

Always run badmin mbdrestart after modifying LSB_MOD_ALL_JOBS.

Default

Not defined

See also

LSB_JOB_CPULIMIT, LSB_JOB_MEMLIMIT

LSB_NCPU_ENFORCE

Description

When set to 1, enables parallel fairshare and considers the number of CPUs when calculating dynamic priority for queue-level user-based fairshare. LSB_NCPU_ENFORCE does not apply to host-partition user-based fairshare. For host-partition user-based fairshare, the number of CPUs is automatically considered.

Default

Not defined

LSB_NQS_PORT

Syntax

LSB_NQS_PORT=port_number

Description

Required for LSF to work with NQS.

TCP service port to use for communication with NQS.

Where defined

This parameter can alternatively be set as an environment variable or in the services database such as /etc/services.

Example

LSB_NQS_PORT=607

Default

Not defined

LSB_NUM_NIOS_CALLBACK_THREADS

Syntax

LSB_NUM_NIOS_CALLBACK_THREADS=integer

Description

Specifies the number of callback threads to use for batch queries.

If your cluster runs a large number of blocking-mode (bsub -K) and interactive (bsub -I) jobs, response to batch queries can become very slow. If you run a large number of bsub -I or bsub -K jobs, you can set the number of threads to the number of processors on the master host.

Default

Not defined

LSB_PACK_MESUB

Syntax

LSB_PACK_MESUB=Y|y|N|n

Description

Applies to job packs only.

If LSB_PACK_MESUB=N, mesub will not be executed for any jobs in the job submission file, even if there are esubs configured at the application level (-a option of bsub), or using LSB_ESUB_METHOD in lsf.conf, or through a named esub executable under LSF_SERVERDIR.

If LSB_PACK_MESUB=Y, mesub is executed for every job in the job submission file.

Parameters related to job packs are not supported as environment variables.

Default

Y

LSB_PACK_SKIP_ERROR

Syntax

LSB_PACK_SKIP_ERROR=Y|y|N|n

Description

Applies to job packs only.

If LSB_PACK_SKIP_ERROR=Y, all requests in the job submission file are submitted, even if some of the job submissions fail. The job submission process always continues to the end of the file.

If LSB_PACK_SKIP_ERROR=N, job submission stops if one job submission fails. The remaining requests in the job submission file are not submitted.

If you change the configuration of this parameter, you must restart mbatchd.

Parameters related to job packs are not supported as environment variables.

Default

N

LSB_PSET_BIND_DEFAULT

Syntax

LSB_PSET_BIND_DEFAULT=y | Y

Description

If set, LSF binds a job that is not explicitly associated with an HP-UX pset to the default pset 0. If LSB_PSET_BIND_DEFAULT is not set, LSF must still attach the job to a pset, and so binds the job to the same pset used by the LSF HPC daemons.

Use LSB_PSET_BIND_DEFAULT to improve LSF daemon performance by automatically unbinding a job with no pset options from the pset used by the LSF daemons, and binding it to the default pset.

Default

Not defined

LSB_QUERY_PORT

Syntax

LSB_QUERY_PORT=port_number

Description

Optional. Applies only to UNIX platforms that support thread programming.

When using MultiCluster, LSB_QUERY_PORT must be defined on all clusters.

This parameter is recommended for busy clusters with many jobs and frequent query requests to increase mbatchd performance when you use the bjobs command.

This may indirectly increase overall mbatchd performance.

The port_number is the TCP/IP port number to be used by mbatchd to only service query requests from the LSF system. mbatchd checks the query port during initialization.

If LSB_QUERY_PORT is not defined:
  • mbatchd uses the port specified by LSB_MBD_PORT in lsf.conf, or, if LSB_MBD_PORT is not defined, looks into the system services database for port numbers to communicate with other hosts in the cluster.

  • For each query request it receives, mbatchd forks one child mbatchd to service the request. Each child mbatchd processes one request and then exits.

If LSB_QUERY_PORT is defined:
  • mbatchd prepares this port for connection. The default behavior of mbatchd changes: a child mbatchd is forked, and the child mbatchd creates threads to process requests.

  • mbatchd responds to requests by forking one child mbatchd. As soon as mbatchd has forked a child mbatchd, the child mbatchd takes over and listens on the port to process more query requests. For each request, the child mbatchd creates a thread to process it.

The interval used by mbatchd for forking new child mbatchds is specified by the parameter MBD_REFRESH_TIME in lsb.params.

The child mbatchd continues to listen to the port number specified by LSB_QUERY_PORT and creates threads to service requests until the job changes status, a new job is submitted, or the time specified in MBD_REFRESH_TIME in lsb.params has passed (see MBD_REFRESH_TIME in lsb.params for more details). When any of these happens, the parent mbatchd sends a message to the child mbatchd to exit.

LSB_QUERY_PORT must be defined when NEWJOB_REFRESH=Y in lsb.params to enable a child mbatchd to get up-to-date information about new jobs from the parent mbatchd.

Operating system support

Tip:

See the Online Support area of the Platform Computing Web site at www.platform.com for the latest information about operating systems that support multithreaded mbatchd.

Default

Not defined

See also

MBD_REFRESH_TIME and NEWJOB_REFRESH in lsb.params

LSB_REQUEUE_TO_BOTTOM

Syntax

LSB_REQUEUE_TO_BOTTOM=0 | 1

Description

Specify 1 to put automatically requeued jobs at the bottom of the queue instead of at the top. Also requeues migrating jobs to the bottom of the queue if LSB_MIG2PEND is also defined with a value of 1.

Specify 0 to requeue jobs to the top of the queue.

Ignored in a MultiCluster environment.

Default

0 (LSF requeues jobs to the top of the queue).

See also

LSB_MIG2PEND, REQUEUE_EXIT_VALUES in lsb.queues

LSB_RLA_PORT

Syntax

LSB_RLA_PORT=port_number

Description

TCP port used for communication between the LSF topology adapter (RLA) and the HPC scheduler plugin.

Default

6883

LSB_RLA_UPDATE

Syntax

LSB_RLA_UPDATE=time_seconds

Description

Specifies how often the HPC scheduler refreshes free node information from the LSF topology adapter (RLA).

Default

600 seconds

LSB_RLA_WORKDIR

Syntax

LSB_RLA_WORKDIR=directory

Description

Directory to store the LSF topology adapter (RLA) status file. Allows RLA to recover its original state when it restarts. When RLA first starts, it creates the directory defined by LSB_RLA_WORKDIR if it does not exist, then creates subdirectories for each host.

You should avoid using /tmp or any other directory that is automatically cleaned up by the system. Unless your installation has restrictions on the LSB_SHAREDIR directory, you should use the default for LSB_RLA_WORKDIR.

Default

LSB_SHAREDIR/cluster_name/rla_workdir

LSB_SACCT_ONE_UG

Syntax

LSB_SACCT_ONE_UG=y | Y | n | N

Description

When set to Y, minimizes overall memory usage of mbatchd during fairshare accounting at job submission by limiting the number of share account nodes created on mbatchd startup. Most useful when there are a lot of user groups with all members in the fairshare policy.

When a default user group is defined, inactive user share accounts are still defined for the default user group.

After you set this parameter, you must restart mbatchd.

Default

N

LSB_SBD_PORT

See LSF_LIM_PORT, LSF_RES_PORT, LSB_MBD_PORT, LSB_SBD_PORT.

LSB_SET_TMPDIR

Syntax

LSB_SET_TMPDIR=y | n

Description

If y, LSF sets the TMPDIR environment variable, overwriting the current value with /tmp/job_ID.tmpdir.

Default

n

LSB_SHAREDIR

Syntax

LSB_SHAREDIR=directory

Description

Directory in which the job history and accounting logs are kept for each cluster. These files are necessary for correct operation of the system. Like the organization under LSB_CONFDIR, there is one subdirectory for each cluster.

The LSB_SHAREDIR directory must be owned by the LSF administrator. It must be accessible from all hosts that can potentially become the master host, and must allow read and write access from the master host.

The LSB_SHAREDIR directory typically resides on a reliable file server.

Default

LSF_INDEP/work

See also

LSB_LOCALDIR

LSB_SHORT_HOSTLIST

Syntax

LSB_SHORT_HOSTLIST=1

Description

Displays an abbreviated list of hosts in bjobs and bhist for a parallel job where multiple processes of a job are running on a host. Multiple processes are displayed in the following format:
processes*hostA
For example, if a parallel job is running 5 processes on hostA, the information is displayed in the following manner:
5*hostA

Setting this parameter may improve mbatchd restart performance and accelerate event replay.
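The abbreviated format can be illustrated with a small shell sketch. The short_hostlist function below is a hypothetical helper, not an LSF command; it takes one host name per process and collapses repeats into the processes*host form. Showing a host with a single process as a bare host name is an assumption made for the illustration.

```sh
#!/bin/sh
# short_hostlist HOST...
# Hypothetical helper (not part of LSF): collapse repeated host names into
# the count*host form. Hosts are printed in order of first appearance.
short_hostlist() {
    printf '%s\n' "$@" | awk '
        !($0 in count) { order[++n] = $0 }   # remember first-seen order
        { count[$0]++ }
        END {
            for (i = 1; i <= n; i++) {
                h = order[i]
                # Assumption: one process is shown as the bare host name.
                out = (count[h] > 1) ? count[h] "*" h : h
                printf "%s%s", (i > 1 ? " " : ""), out
            }
            printf "\n"
        }'
}

short_hostlist hostA hostA hostA hostA hostA hostB   # prints: 5*hostA hostB
```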

Default

Not defined

LSB_SIGSTOP

Syntax

LSB_SIGSTOP=signal_name | signal_value

Description

Specifies the signal sent by the SUSPEND action in LSF. You can specify a signal name or a number.

If this parameter is not defined, by default the SUSPEND action in LSF sends the following signals to a job:
  • Parallel or interactive jobs: SIGTSTP is sent to allow user programs to catch the signal and clean up. The parallel job launcher also catches the signal and stops the entire job (task by task for parallel jobs). Once LSF sends SIGTSTP, LSF assumes the job is stopped.

  • Other jobs: SIGSTOP is sent. SIGSTOP cannot be caught by user programs.

The same set of signals is not supported on all UNIX systems. To display a list of the symbolic names of the signals (without the SIG prefix) supported on your system, use the kill -l command.
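The practical difference between the two default signals is that a program can install a handler for SIGTSTP but not for SIGSTOP. The following sketch shows a shell process catching SIGTSTP and cleaning up; the suspend_demo function is hypothetical, and the child shell sends the signal to itself only as a stand-in for the signal that LSF would deliver.

```sh
#!/bin/sh
# suspend_demo
# Hypothetical demonstration: a child shell installs a SIGTSTP handler and
# then signals itself, standing in for the SUSPEND action.
suspend_demo() {
    sh -c '
        trap "echo cleaning up before suspend; exit 0" TSTP
        kill -TSTP $$
        echo "handler did not run"
    '
}

suspend_demo
```

Replacing TSTP with STOP in the trap line would not work, because SIGSTOP cannot be caught.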

Example

LSB_SIGSTOP=SIGKILL

In this example, the SUSPEND action sends the three default signals sent by the TERMINATE action (SIGINT, SIGTERM, and SIGKILL) 10 seconds apart.

Default

Not defined. Default SUSPEND action in LSF is sent.

LSB_SSH_XFORWARD_CMD

Syntax

LSB_SSH_XFORWARD_CMD=[/path[/path]]ssh command [ssh options]

Description

Optional when submitting jobs with SSH X11 forwarding. Allows you to specify an SSH command and options when a job is submitted with -XF.

Replace the default value with an SSH command (full PATH and options allowed).

When running a job with the -XF option, runs the SSH command specified here.

Default

ssh -X -n

LSB_STDOUT_DIRECT

Syntax

LSB_STDOUT_DIRECT=y | Y

Description

When set, and used with the -o or -e options of bsub, redirects standard output or standard error from the job directly to a file as the job runs.

If LSB_STDOUT_DIRECT is not set and you use the bsub -o option, the standard output of a job is written to a temporary file and copied to the file you specify after the job finishes.

LSB_STDOUT_DIRECT is not supported on Windows.

Default

Not defined

LSB_STOP_IGNORE_IT

Syntax

LSB_STOP_IGNORE_IT=Y | y

Description

Allows a solitary job to be stopped regardless of the idle time (IT) of the host that the job is running on. By default, if only one job is running on a host, the host idle time must be zero in order to stop the job.

Default

Not defined

LSB_SUB_COMMANDNAME

Syntax

LSB_SUB_COMMANDNAME=y | Y

Description

If set, enables esub to use the variable LSB_SUB_COMMAND_LINE in the esub job parameter file specified by the $LSB_SUB_PARM_FILE environment variable.

The LSB_SUB_COMMAND_LINE variable carries the value of the bsub command argument, and is used when esub runs.

Example

esub contains:
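#!/bin/sh
# Load the job submission parameters, including LSB_SUB_COMMAND_LINE
. "$LSB_SUB_PARM_FILE"
# Send messages to standard error so that the user sees them
exec 1>&2
if [ "$LSB_SUB_COMMAND_LINE" = "netscape" ]; then
    echo "netscape is not allowed to run in batch mode"
    exit $LSB_SUB_ABORT_VALUE
fi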
LSB_SUB_COMMAND_LINE is defined in $LSB_SUB_PARM_FILE as:
LSB_SUB_COMMAND_LINE=netscape

A job submitted with:

bsub netscape ...
causes esub to echo the message:
netscape is not allowed to run in batch mode

Default

Not defined

See also

LSB_SUB_COMMAND_LINE and LSB_SUB_PARM_FILE environment variables

LSB_SUBK_SHOW_EXEC_HOST

Syntax

LSB_SUBK_SHOW_EXEC_HOST=Y | N

Description

When enabled, displays the execution host in the output of the command bsub -K. If the job runs on multiple hosts, only the first execution host is shown.

In a MultiCluster environment, this parameter must be set in both clusters.

Tip:

Restart sbatchd on the execution host to make changes take effect.

Default

N

LSB_TIME_CMD

Syntax

LSB_TIME_CMD=timing_level

Description

The timing level for checking how long batch commands run.

Time usage is logged in milliseconds; specify a positive integer.

Example: LSB_TIME_CMD=1

Default

Not defined

See also

LSB_TIME_MBD, LSB_TIME_SBD, LSF_TIME_LIM, LSF_TIME_RES

LSB_TIME_MBD

Syntax

LSB_TIME_MBD=timing_level

Description

The timing level for checking how long mbatchd routines run.

Time usage is logged in milliseconds; specify a positive integer.

Example: LSB_TIME_MBD=1

Default

Not defined

See also

LSB_TIME_CMD, LSB_TIME_SBD, LSF_TIME_LIM, LSF_TIME_RES

LSB_TIME_RESERVE_NUMJOBS

Syntax

LSB_TIME_RESERVE_NUMJOBS=maximum_reservation_jobs

Description

Enables time-based slot reservation. The value must be a positive integer.

LSB_TIME_RESERVE_NUMJOBS controls the maximum number of jobs that use time-based slot reservation. For example, if LSB_TIME_RESERVE_NUMJOBS=4, only the top 4 jobs get their future allocation information.

Use LSB_TIME_RESERVE_NUMJOBS=1 to allow only the highest priority job to get an accurate start time prediction.

Recommended value

3 or 4 is the recommended setting. Larger values are not as useful because after the first pending job starts, the estimated start time of remaining jobs may be changed.

Default

Not defined

LSB_TIME_SBD

Syntax

LSB_TIME_SBD=timing_level

Description

The timing level for checking how long sbatchd routines run.

Time usage is logged in milliseconds; specify a positive integer.

Example: LSB_TIME_SBD=1

Default

Not defined

See also

LSB_TIME_CMD, LSB_TIME_MBD, LSF_TIME_LIM, LSF_TIME_RES

LSB_TIME_SCH

Syntax

LSB_TIME_SCH=timing_level

Description

The timing level for checking how long mbschd routines run.

Time usage is logged in milliseconds; specify a positive integer.

Example: LSB_TIME_SCH=1

Default

Not defined

LSB_UTMP

Syntax

LSB_UTMP=y | Y

Description

If set, enables registration of user and account information for interactive batch jobs submitted with bsub -Ip or bsub -Is. To disable utmp file registration, set LSB_UTMP to any value other than y or Y; for example, LSB_UTMP=N.

LSF registers interactive batch jobs by adding entries to the utmp file on the execution host when the job starts. After the job finishes, LSF removes the entries for the job from the utmp file.

Limitations

Registration of utmp file entries is supported on the following platforms:
  • SGI IRIX (6.4 and later)

  • Solaris (all versions)

  • HP-UX (all versions)

  • Linux (all versions)

utmp file registration is not supported in a MultiCluster environment.

Because interactive batch jobs submitted with bsub -I are not associated with a pseudo-terminal, utmp file registration is not supported for these jobs.

Default

Not defined

LSF_AFS_CELLNAME

Syntax

LSF_AFS_CELLNAME=AFS_cell_name

Description

Must be set to the AFS cell name if the AFS file system is in use.

Example:
LSF_AFS_CELLNAME=xxx.ch

Default

Not defined

LSF_AM_OPTIONS

Syntax

LSF_AM_OPTIONS=AMFIRST | AMNEVER

Description

Determines the order of file path resolution when setting the user’s home directory.

This variable is rarely used but sometimes LSF does not properly change the directory to the user’s home directory when the user’s home directory is automounted. Setting LSF_AM_OPTIONS forces LSF to change directory to $HOME before attempting to automount the user’s home.

When this parameter is not defined or set to AMFIRST, LSF sets the user’s $HOME directory from the automount path. If it cannot do so, LSF sets the user’s $HOME directory from the passwd file.

When this parameter is set to AMNEVER, LSF never uses automount to set the path to the user’s home. LSF sets the user’s $HOME directory directly from the passwd file.
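The choice between the two values can be modelled with a short shell sketch. The resolve_home function and its arguments are hypothetical and only mirror the rule described above; they do not show how LSF itself resolves the directory.

```sh
#!/bin/sh
# resolve_home MODE AUTOMOUNT_PATH PASSWD_PATH
# Hypothetical helper: picks the home directory according to the
# LSF_AM_OPTIONS rule described above.
resolve_home() {
    mode=${1:-AMFIRST}          # an undefined parameter behaves like AMFIRST
    automount_path=$2
    passwd_path=$3
    case $mode in
        AMFIRST)
            # Prefer the automount path, fall back to the passwd entry.
            if [ -n "$automount_path" ] && [ -d "$automount_path" ]; then
                printf '%s\n' "$automount_path"
            else
                printf '%s\n' "$passwd_path"
            fi ;;
        AMNEVER)
            # Never consult the automounter.
            printf '%s\n' "$passwd_path" ;;
        *)
            echo "invalid LSF_AM_OPTIONS value: $mode" >&2
            return 1 ;;
    esac
}
```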

Valid values

The two values are AMFIRST and AMNEVER

Default

Same as AMFIRST

LSF_API_CONNTIMEOUT

Syntax

LSF_API_CONNTIMEOUT=time_seconds

Description

Timeout when connecting to LIM.

EGO parameter

EGO_LIM_CONNTIMEOUT

Default

5

See also

LSF_API_RECVTIMEOUT

LSF_API_RECVTIMEOUT

Syntax

LSF_API_RECVTIMEOUT=time_seconds

Description

Timeout when receiving a reply from LIM.

EGO parameter

EGO_LIM_RECVTIMEOUT

Default

20

See also

LSF_API_CONNTIMEOUT

LSF_AUTH

Syntax

LSF_AUTH=eauth | ident

Description

Enables either external authentication or authentication by means of identification daemons. This parameter is required for any cluster that contains Windows hosts, and is optional for UNIX-only clusters. After defining or changing the value of LSF_AUTH, you must shut down and restart the LSF daemons on all server hosts to apply the new authentication method.
eauth

For site-specific customized external authentication. Provides the highest level of security of all LSF authentication methods.

ident

For authentication using the RFC 931/1413/1414 protocol to verify the identity of the remote client. If you want to use ident authentication, you must download and install the ident protocol, available from the public domain, and register ident as required by your operating system.

For UNIX-only clusters, privileged ports authentication (setuid) can be configured by commenting out or deleting the LSF_AUTH parameter. If you choose privileged ports authentication, LSF commands must be installed as setuid programs owned by root. If the commands are installed in an NFS-mounted shared file system, the file system must be mounted with setuid execution allowed, that is, without the nosuid option.
Restriction:

To enable privileged ports authentication, LSF_AUTH must not be defined; setuid is not a valid value for LSF_AUTH.
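As a rough check of which method a given configuration selects, the following sketch reads an lsf.conf-style file and reports the result. The auth_method function is a hypothetical helper, not an LSF tool; it assumes one NAME=value setting per line and treats a missing or commented-out LSF_AUTH line as privileged ports authentication.

```sh
#!/bin/sh
# auth_method FILE
# Hypothetical helper: reports the authentication method selected by the
# last uncommented LSF_AUTH line in FILE.
auth_method() {
    value=$(sed -n 's/^[[:space:]]*LSF_AUTH=//p' "$1" | tail -n 1 | tr -d '"')
    case $value in
        eauth) echo "external authentication (eauth)" ;;
        ident) echo "identification daemon (ident)" ;;
        "")    echo "privileged ports (setuid)" ;;   # LSF_AUTH not defined
        *)     echo "invalid LSF_AUTH value: $value" >&2; return 1 ;;
    esac
}
```

Because the pattern requires the equals sign directly after LSF_AUTH, an LSF_AUTH_DAEMONS line is not mistaken for LSF_AUTH.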

Default

eauth

During LSF installation, a default eauth executable is installed in the directory specified by the parameter LSF_SERVERDIR in the lsf.conf file. The default executable provides an example of how the eauth protocol works. You should write your own eauth executable to meet the security requirements of your cluster.

LSF_ASPLUGIN

Syntax

LSF_ASPLUGIN=path

Description

Points to the SGI Array Services library libarray.so. The parameter only takes effect on 64-bit x86 Linux 2.6, glibc 2.3.

Default

/usr/lib64/libarray.so

LSF_AUTH_DAEMONS

Syntax

LSF_AUTH_DAEMONS=y | Y

Description

Enables LSF daemon authentication when external authentication is enabled (LSF_AUTH=eauth in the file lsf.conf). Daemons invoke eauth to authenticate each other as specified by the eauth executable.

Default

Not defined.

LSF_BINDIR

Syntax

LSF_BINDIR=directory

Description

Directory in which all LSF user commands are installed.

Default

LSF_MACHDEP/bin

LSF_BIND_JOB

Syntax

LSF_BIND_JOB=NONE | BALANCE | PACK | ANY | USER | USER_CPU_LIST

Description

Specifies the processor binding policy for sequential and parallel job processes that run on a single host.

On Linux execution hosts that support this feature, job processes are hard bound to selected processors.

If the processor binding feature is not configured with the BIND_JOB parameter in an application profile in lsb.applications, the lsf.conf configuration setting takes effect. The application profile configuration for processor binding overrides the lsf.conf configuration.

For backwards compatibility:
  • LSF_BIND_JOB=Y is interpreted as LSF_BIND_JOB=BALANCE

  • LSF_BIND_JOB=N is interpreted as LSF_BIND_JOB=NONE
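The backwards-compatibility mapping above can be written as a small lookup. The normalize_bind_job function is a hypothetical helper for illustration; it accepts only the values named in the syntax line plus the legacy Y and N.

```sh
#!/bin/sh
# normalize_bind_job VALUE
# Hypothetical helper: maps the legacy Y and N values to the policy names
# and passes the documented policy names through unchanged.
normalize_bind_job() {
    case $1 in
        Y) echo BALANCE ;;
        N) echo NONE ;;
        NONE|BALANCE|PACK|ANY|USER|USER_CPU_LIST) echo "$1" ;;
        *) echo "unsupported LSF_BIND_JOB value: $1" >&2; return 1 ;;
    esac
}
```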

Supported platforms

Linux with kernel version 2.6 or higher

Default

Not defined. Processor binding is disabled.

LSF_BMPLUGIN

Syntax

LSF_BMPLUGIN=path

Description

Points to the bitmask library libbitmask.so. The parameter only takes effect on 64-bit x86 Linux 2.6, glibc 2.3.

Default

/usr/lib64/libbitmask.so

LSF_CMD_LOGDIR

Syntax

LSF_CMD_LOGDIR=path

Description

The path to the log files used for debugging LSF commands.

This parameter can also be set from the command line.

Default

/tmp

See also

LSB_CMD_LOG_MASK, LSB_CMD_LOGDIR, LSB_DEBUG, LSB_DEBUG_CMD, LSB_TIME_CMD, LSF_CMD_LOG_MASK, LSF_LOG_MASK, LSF_LOGDIR, LSF_TIME_CMD

LSF_CMD_LOG_MASK

Syntax

LSF_CMD_LOG_MASK=log_level

Description

Specifies the logging level of error messages from LSF commands.

For example:
LSF_CMD_LOG_MASK=LOG_DEBUG

To specify the logging level of error messages for LSF batch commands, use LSB_CMD_LOG_MASK. To specify the logging level of error messages for LSF daemons, use LSF_LOG_MASK.

LSF commands log error messages in different levels so that you can choose to log all messages, or only log messages that are deemed critical. The level specified by LSF_CMD_LOG_MASK determines which messages are recorded and which are discarded. All messages logged at the specified level or higher are recorded, while lower level messages are discarded.

For debugging purposes, the level LOG_DEBUG contains the fewest debugging messages and is used for basic debugging. The level LOG_DEBUG3 records all debugging messages, and can cause log files to grow very large; it is not often used. Most debugging is done at the level LOG_DEBUG2.

The commands log to the syslog facility unless LSF_CMD_LOGDIR is set.

Valid values

The log levels from highest to lowest are:
  • LOG_EMERG

  • LOG_ALERT

  • LOG_CRIT

  • LOG_ERR

  • LOG_WARNING

  • LOG_NOTICE

  • LOG_INFO

  • LOG_DEBUG

  • LOG_DEBUG1

  • LOG_DEBUG2

  • LOG_DEBUG3
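The rule that messages at the specified level or higher are recorded can be sketched as a comparison of positions in the list above. The LOG_LEVELS variable and the level_rank and is_recorded functions are hypothetical names used only for this illustration.

```sh
#!/bin/sh
# Levels from highest to lowest, in the order listed above.
LOG_LEVELS="LOG_EMERG LOG_ALERT LOG_CRIT LOG_ERR LOG_WARNING LOG_NOTICE LOG_INFO LOG_DEBUG LOG_DEBUG1 LOG_DEBUG2 LOG_DEBUG3"

# level_rank NAME: print the 1-based position of NAME in LOG_LEVELS.
level_rank() {
    rank=0
    for name in $LOG_LEVELS; do
        rank=$((rank + 1))
        if [ "$name" = "$1" ]; then
            echo "$rank"
            return 0
        fi
    done
    return 1    # unknown level name
}

# is_recorded MASK LEVEL: succeed when a message at LEVEL passes MASK,
# that is, when LEVEL is at the mask level or higher in the list.
is_recorded() {
    mask_rank=$(level_rank "$1") || return 2
    msg_rank=$(level_rank "$2") || return 2
    [ "$msg_rank" -le "$mask_rank" ]
}
```

With the default mask, is_recorded LOG_WARNING LOG_ERR succeeds and is_recorded LOG_WARNING LOG_INFO fails.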

Default

LOG_WARNING

See also

LSB_CMD_LOG_MASK, LSB_CMD_LOGDIR, LSB_DEBUG, LSB_DEBUG_CMD, LSB_TIME_CMD, LSF_CMD_LOGDIR, LSF_LOG_MASK, LSF_LOGDIR, LSF_TIME_CMD

LSF_CONF_RETRY_INT

Syntax

LSF_CONF_RETRY_INT=time_seconds

Description

The number of seconds to wait between unsuccessful attempts at opening a configuration file (only valid for LIM). This allows LIM to tolerate temporary access failures.

EGO parameter

EGO_CONF_RETRY_INT

Default

30

See also

LSF_CONF_RETRY_MAX

LSF_CONF_RETRY_MAX

Syntax

LSF_CONF_RETRY_MAX=integer

Description

The maximum number of retry attempts by LIM to open a configuration file. This allows LIM to tolerate temporary access failures. For example, to allow one more attempt after the first attempt has failed, specify a value of 1.
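The interaction between LSF_CONF_RETRY_MAX and LSF_CONF_RETRY_INT can be sketched as a loop that makes one initial attempt plus the configured number of retries. The open_with_retry function is a hypothetical illustration of that policy in shell; it is not how LIM is implemented.

```sh
#!/bin/sh
# open_with_retry FILE RETRY_MAX RETRY_INT
# Hypothetical illustration: one initial attempt plus up to RETRY_MAX
# retries, sleeping RETRY_INT seconds between attempts. Prints the number
# of attempts made and fails if FILE never became readable.
open_with_retry() {
    file=$1
    retry_max=${2:-0}
    retry_int=${3:-30}
    attempts=0
    while :; do
        attempts=$((attempts + 1))
        if [ -r "$file" ]; then
            echo "$attempts"
            return 0
        fi
        # Stop once the initial attempt and all retries are used up.
        [ "$attempts" -gt "$retry_max" ] && break
        sleep "$retry_int"
    done
    echo "$attempts"
    return 1
}
```

With RETRY_MAX set to 1, an unreadable file is tried twice in total, which matches the example in the description.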

EGO parameter

EGO_CONF_RETRY_MAX

Default

0

See also

LSF_CONF_RETRY_INT

LSF_CONFDIR

Syntax

LSF_CONFDIR=directory

Description

Directory in which all LSF configuration files are installed. These files are shared throughout the system and should be readable from any host. This directory can contain configuration files for more than one cluster.

The files in the LSF_CONFDIR directory must be owned by the primary LSF administrator, and readable by all LSF server hosts.

If live reconfiguration through the bconf command is enabled by the parameter LSF_LIVE_CONFDIR, configuration files are written to and read from the directory set by LSF_LIVE_CONFDIR.

Default

LSF_INDEP/conf

See also

LSB_CONFDIR, LSF_LIVE_CONFDIR

LSF_CPUSETLIB

Syntax

LSF_CPUSETLIB=path

Description

Points to the SGI cpuset library libcpuset.so. The parameter only takes effect on 64-bit x86 Linux 2.6, glibc 2.3.

Default

/usr/lib64/libcpuset.so

LSF_CRASH_LOG

Syntax

LSF_CRASH_LOG=Y | N

Description

On Linux hosts only, enables logging when a daemon crashes. Relies on the Linux debugger (gdb). Two log files are created, one for the root daemons (res, lim, sbd, and mbatchd) in /tmp/lsf_root_daemons_crash.log and one for administrative daemons (mbschd) in /tmp/lsf_admin_daemons_crash.log.

File permissions for both files are 600.

If enabling, you must restart the daemons for the change to take effect.

Default

N (no log files are created for daemon crashes)

LSF_DAEMONS_CPUS

Syntax

LSF_DAEMONS_CPUS="mbatchd_cpu_list:mbschd_cpu_list"

mbatchd_cpu_list

Defines the list of master host CPUs where the mbatchd daemon processes can run (hard CPU affinity). Format the list as a whitespace-delimited list of CPU numbers.

mbschd_cpu_list

Defines the list of master host CPUs where the mbschd daemon processes can run. Format the list as a whitespace-delimited list of CPU numbers.

Description

By default, mbatchd and mbschd can run on any CPUs. If LSF_DAEMONS_CPUS is set, they only run on a specified list of CPUs. An empty list means LSF daemons can run on any CPUs. Use spaces to separate multiple CPUs.

However, the operating system can assign other processes to run on the same CPU if utilization of the bound CPU is lower than utilization of the unbound CPUs.

Related parameters

To improve scheduling and dispatch performance of all LSF daemons, you should use LSF_DAEMONS_CPUS together with EGO_DAEMONS_CPUS (in ego.conf or lsf.conf), which controls LIM CPU allocation, and MBD_QUERY_CPUS, which binds mbatchd query processes to specific CPUs so that higher priority daemon processes can run more efficiently. For best performance, each of the four daemons should be assigned its own CPUs. For example, on a 4 CPU SMP host, the following configuration gives the best performance:
EGO_DAEMONS_CPUS=0
LSF_DAEMONS_CPUS=1:2
MBD_QUERY_CPUS=3

Examples

If you specify
LSF_DAEMONS_CPUS="1:2"

the mbatchd processes run only on CPU number 1 on the master host, and mbschd runs only on CPU number 2.

If you specify
LSF_DAEMONS_CPUS="1 2:1 2" 

both mbatchd and mbschd run on CPU 1 and CPU 2.
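To see how a value divides between the two daemons, the following sketch splits it at the colon. The split_daemons_cpus function is a hypothetical helper; it reports an empty list as "any", following the rule that an empty list lets the daemon run on any CPU.

```sh
#!/bin/sh
# split_daemons_cpus VALUE
# Hypothetical helper: splits an LSF_DAEMONS_CPUS value at the colon and
# prints one line per daemon.
split_daemons_cpus() {
    value=$1
    case $value in
        *:*) ;;
        *) echo "expected mbatchd_cpu_list:mbschd_cpu_list" >&2; return 1 ;;
    esac
    mbatchd_cpus=${value%%:*}    # text before the first colon
    mbschd_cpus=${value#*:}      # text after the first colon
    echo "mbatchd: ${mbatchd_cpus:-any}"
    echo "mbschd: ${mbschd_cpus:-any}"
}

split_daemons_cpus "1 2:1 2"
```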

Important

You can specify CPU affinity only for master hosts that use one of the following operating systems:
  • Linux 2.6 or higher

  • Solaris 8 or higher

EGO parameter

EGO_DAEMONS_CPUS=lim_cpu_list: run the EGO LIM daemon on the specified CPUs.

Default

Not defined

See also

MBD_QUERY_CPUS in lsb.params

LSF_DAEMON_WRAP

Syntax

LSF_DAEMON_WRAP=y | Y

Description

Applies to Kerberos, DCE/DFS and AFS environments; if you are using LSF with DCE, AFS, or Kerberos, set this parameter to y or Y.

When this parameter is set to y or Y, mbatchd, sbatchd, and RES run the executable daemons.wrap located in LSF_SERVERDIR.

Default

Not defined. LSF does not run the daemons.wrap executable.

LSF_DEBUG_CMD

Syntax

LSF_DEBUG_CMD=log_class

Description

Sets the debugging log class for LSF commands and APIs.

Specifies the log class filtering to be applied to LSF commands or the API. Only messages belonging to the specified log class are recorded.

LSF_DEBUG_CMD sets the log class and is used in combination with LSF_CMD_LOG_MASK, which sets the log level. For example:
LSF_CMD_LOG_MASK=LOG_DEBUG
LSF_DEBUG_CMD="LC_TRACE LC_EXEC"

Debugging is turned on when you define both parameters.

The commands log to the syslog facility unless LSF_CMD_LOGDIR is defined.

To specify multiple log classes, use a space-separated list enclosed by quotation marks. For example:
LSF_DEBUG_CMD="LC_TRACE LC_EXEC"

Can also be defined from the command line.

Valid values

Valid log classes are:
  • LC_AFS and LC2_AFS: Log AFS messages

  • LC_AUTH and LC2_AUTH: Log authentication messages

  • LC_CHKPNT and LC2_CHKPNT: Log checkpointing messages

  • LC_COMM and LC2_COMM: Log communication messages

  • LC_DCE and LC2_DCE: Log messages pertaining to DCE support

  • LC_EEVENTD and LC2_EEVENTD: Log eeventd messages

  • LC_ELIM and LC2_ELIM: Log ELIM messages

  • LC_EXEC and LC2_EXEC: Log significant steps for job execution

  • LC_FAIR: Log fairshare policy messages

  • LC_FILE and LC2_FILE: Log file transfer messages

  • LC_HANG and LC2_HANG: Mark where a program might hang

  • LC_JARRAY and LC2_JARRAY: Log job array messages

  • LC_JLIMIT and LC2_JLIMIT: Log job slot limit messages

  • LC_LICENSE and LC2_LICENSE: Log license management messages (LC_LICENCE is also supported for backward compatibility)

  • LC_LOADINDX and LC2_LOADINDX: Log load index messages

  • LC_M_LOG and LC2_M_LOG: Log multievent logging messages

  • LC_MPI and LC2_MPI: Log MPI messages

  • LC_MULTI and LC2_MULTI: Log messages pertaining to MultiCluster

  • LC_PEND and LC2_PEND: Log messages related to job pending reasons

  • LC_PERFM and LC2_PERFM: Log performance messages

  • LC_PIM and LC2_PIM: Log PIM messages

  • LC_PREEMPT and LC2_PREEMPT: Log preemption policy messages

  • LC_RESREQ and LC2_RESREQ: Log resource requirement messages

  • LC_SIGNAL and LC2_SIGNAL: Log messages pertaining to signals

  • LC_SYS and LC2_SYS: Log system call messages

  • LC_TRACE and LC2_TRACE: Log significant program walk steps

  • LC_XDR and LC2_XDR: Log everything transferred by XDR

Default

Not defined

See also

LSF_CMD_LOG_MASK, LSF_CMD_LOGDIR, LSF_DEBUG_LIM, LSF_DEBUG_RES, LSF_LIM_PORT, LSF_RES_PORT, LSB_MBD_PORT, LSB_SBD_PORT, LSF_LOGDIR, LSF_LIM_DEBUG, LSF_RES_DEBUG

LSF_DEBUG_LIM

Syntax

LSF_DEBUG_LIM=log_class

Description

Sets the log class for debugging LIM.

Specifies the log class filtering to be applied to LIM. Only messages belonging to the specified log class are recorded.

LSF_DEBUG_LIM sets the log class and is used in combination with EGO_LOG_MASK in ego.conf, which sets the log level.

For example, in ego.conf:
EGO_LOG_MASK=LOG_DEBUG
and in lsf.conf:
LSF_DEBUG_LIM=LC_TRACE 
Important:

If EGO is enabled, LSF_LOG_MASK no longer specifies LIM logging level. Use EGO_LOG_MASK in ego.conf to control message logging for LIM. The default value for EGO_LOG_MASK is LOG_WARNING.

You need to restart the daemons after setting LSF_DEBUG_LIM for your changes to take effect.

If you use the command lsadmin limdebug to temporarily change this parameter without changing lsf.conf, you do not need to restart the daemons.

To specify multiple log classes, use a space-separated list enclosed in quotation marks. For example:
LSF_DEBUG_LIM="LC_TRACE LC_EXEC"

This parameter can also be defined from the command line.

Valid values

Valid log classes are:
  • LC_AFS and LC2_AFS: Log AFS messages

  • LC_AUTH and LC2_AUTH: Log authentication messages

  • LC_CHKPNT: Log checkpointing messages

  • LC_COMM and LC2_COMM: Log communication messages

  • LC_DCE and LC2_DCE: Log messages pertaining to DCE support

  • LC_EXEC and LC2_EXEC: Log significant steps for job execution

  • LC_FILE and LC2_FILE: Log file transfer messages

  • LC_HANG and LC2_HANG: Mark where a program might hang

  • LC_JGRP: Log job group messages

  • LC_LICENSE and LC2_LICENSE: Log license management messages (LC_LICENCE is also supported for backward compatibility)

  • LC_LICSCHED: Log License Scheduler messages

  • LC_MEMORY: Log memory limit messages

  • LC_MULTI and LC2_MULTI: Log messages pertaining to MultiCluster

  • LC_PIM and LC2_PIM: Log PIM messages

  • LC_RESOURCE: Log resource broker messages

  • LC_SIGNAL and LC2_SIGNAL: Log messages pertaining to signals

  • LC_TRACE and LC2_TRACE: Log significant program walk steps

  • LC_XDR and LC2_XDR: Log everything transferred by XDR

EGO parameter

EGO_DEBUG_LIM

Default

Not defined

See also

LSF_DEBUG_RES, LSF_CMD_LOGDIR, LSF_CMD_LOG_MASK, LSF_LOG_MASK, LSF_LOGDIR

LSF_DEBUG_RES

Syntax

LSF_DEBUG_RES=log_class

Description

Sets the log class for debugging RES.

Specifies the log class filtering to be applied to RES. Only messages belonging to the specified log class are recorded.

LSF_DEBUG_RES sets the log class and is used in combination with LSF_LOG_MASK, which sets the log level. For example:
LSF_LOG_MASK=LOG_DEBUG
LSF_DEBUG_RES=LC_TRACE
To specify multiple log classes, use a space-separated list enclosed in quotation marks. For example:
LSF_DEBUG_RES="LC_TRACE LC_EXEC"

You need to restart the daemons after setting LSF_DEBUG_RES for your changes to take effect.

If you use the command lsadmin resdebug to temporarily change this parameter without changing lsf.conf, you do not need to restart the daemons.

Valid values

For a list of valid log classes, see LSF_DEBUG_LIM.

Default

Not defined

See also

LSF_DEBUG_LIM, LSF_CMD_LOGDIR, LSF_CMD_LOG_MASK, LSF_LOG_MASK, LSF_LOGDIR

LSF_DHCP_ENV

Syntax

LSF_DHCP_ENV=y

Description

If defined, enables dynamic IP addressing for all LSF client hosts in the cluster.

Dynamic IP addressing is not supported across clusters in a MultiCluster environment.

If you set LSF_DHCP_ENV, you must also specify LSF_DYNAMIC_HOST_WAIT_TIME in order for hosts to rejoin a cluster after their IP address changes.
Tip:

After defining or changing this parameter, you must run lsadmin reconfig and badmin mbdrestart to restart all LSF daemons.

EGO parameter

EGO_DHCP_ENV

Default

Not defined

See also

LSF_DYNAMIC_HOST_WAIT_TIME

LSF_DISABLE_LSRUN

Syntax

LSF_DISABLE_LSRUN=y | Y

Description

When defined, RES refuses remote connections from lsrun and lsgrun unless the user is either an LSF administrator or root. For remote execution by root, LSF_ROOT_REX must be defined.

Other remote execution commands, such as ch and lsmake, are not affected.

Default

Not defined

LSF_DISPATCHER_LOGDIR

Syntax

LSF_DISPATCHER_LOGDIR=path

Description

Specifies the path to the log files for slot allocation decisions for queue-based fairshare.

If defined, LSF writes the results of its queue-based fairshare slot calculation to the specified directory. Each line in the file consists of a timestamp for the slot allocation and the number of slots allocated to each queue under its control. LSF logs in this file every minute. The format of this file is suitable for plotting with gnuplot.

Example

# clients managed by LSF 
# Roma # Verona # Genova # Pisa # Venezia # Bologna
15/3      19:4:50   0 0 0 0 0 0  
15/3      19:5:51   8 5 2 5 2 0  
15/3      19:6:51   8 5 2 5 5 1  
15/3      19:7:53   8 5 2 5 5 5  
15/3      19:8:54   8 5 2 5 5 0  
15/3      19:9:55   8 5 0 5 4 2 

The queue names are in the header line of the file. Each column after the date and time holds the allocation for the corresponding queue.
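Because the log is plain columnar text, it can be summarized with standard tools as well as plotted. The following sketch finds the largest allocation logged for one queue. The peak_slots function is a hypothetical helper, and it assumes the layout of the example above: a date, a time, then one slot count per queue.

```sh
#!/bin/sh
# peak_slots LOGFILE QUEUE_INDEX
# Hypothetical helper: prints the largest allocation logged for the queue
# in position QUEUE_INDEX (1 = first queue named in the header).
peak_slots() {
    awk -v q="$2" '
        /^#/ || NF == 0 { next }          # skip header and blank lines
        {
            v = $(q + 2) + 0              # fields 1 and 2 are date and time
            if (v > max) max = v
        }
        END { print max + 0 }
    ' "$1"
}
```

For the example log, queue index 1 (Roma) gives 8 and queue index 6 (Bologna) gives 5.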

Default

Not defined

LSF_DUALSTACK_PREFER_IPV6

Syntax

LSF_DUALSTACK_PREFER_IPV6=Y | y

Description

Define this parameter when you want to ensure that clients and servers on dual-stack hosts use IPv6 addresses only. Setting this parameter configures LSF to sort the dynamically created address lookup list in order of AF_INET6 (IPv6) elements first, followed by AF_INET (IPv4) elements, and then others.
Restriction:

IPv4-only and IPv6-only hosts cannot belong to the same cluster. In a MultiCluster environment, you cannot mix IPv4-only and IPv6-only clusters.

Follow these guidelines for using IPv6 addresses within your cluster:
  • Define this parameter only if your cluster
    • Includes only dual-stack hosts, or a mix of dual-stack and IPv6-only hosts, and

    • Does not include IPv4-only hosts or IPv4 servers running on dual-stack hosts (servers prior to LSF version 7)

    Important:

    Do not define this parameter for any other cluster configuration.

  • Within a MultiCluster environment, do not define this parameter if any cluster contains IPv4-only hosts or IPv4 servers (prior to LSF version 7) running on dual-stack hosts.

  • Applications must be engineered to work with the cluster IP configuration.

  • If you use IPv6 addresses within your cluster, ensure that you have configured the dual-stack hosts correctly. For more detailed information, see Administering Platform LSF.

  • Define the parameter LSF_ENABLE_SUPPORT_IPV6 in lsf.conf.

Default

Not defined. LSF sorts the dynamically created address lookup list in order of AF_INET (IPv4) elements first, followed by AF_INET6 (IPv6) elements, and then others. Clients and servers on dual-stack hosts use the first address lookup structure in the list (IPv4).
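The effect of the parameter on the lookup order can be modelled with a short sketch. The order_addresses function is hypothetical: it reads "family address" pairs on standard input and prints them in the order described above, while the real list is built inside LSF. The sample addresses come from the documentation ranges.

```sh
#!/bin/sh
# order_addresses PREFER_IPV6
# Hypothetical model of the sort: reads "family address" pairs on stdin and
# prints IPv6 entries first when PREFER_IPV6 is Y or y, IPv4 entries first
# otherwise. Entries of any other family always come last.
order_addresses() {
    awk -v prefer="$1" '
        $1 == "AF_INET6" { v6[++n6] = $0; next }
        $1 == "AF_INET"  { v4[++n4] = $0; next }
                         { other[++no] = $0 }
        END {
            if (prefer == "Y" || prefer == "y") {
                for (i = 1; i <= n6; i++) print v6[i]
                for (i = 1; i <= n4; i++) print v4[i]
            } else {
                for (i = 1; i <= n4; i++) print v4[i]
                for (i = 1; i <= n6; i++) print v6[i]
            }
            for (i = 1; i <= no; i++) print other[i]
        }'
}

printf 'AF_INET 192.0.2.10\nAF_INET6 2001:db8::10\n' | order_addresses Y
```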

See also

LSF_ENABLE_SUPPORT_IPV6

LSF_DYNAMIC_HOST_TIMEOUT

Syntax

LSF_DYNAMIC_HOST_TIMEOUT=time_hours

LSF_DYNAMIC_HOST_TIMEOUT=time_minutesm|M

Description

Enables automatic removal of dynamic hosts from the cluster and specifies the timeout value (minimum 10 minutes). To improve performance in very large clusters, you should disable this feature and remove unwanted hosts from the hostcache file manually.

Specifies the length of time a dynamic host is unavailable before the master host removes it from the cluster. Each time LSF removes a dynamic host, mbatchd automatically reconfigures itself.

Valid value

The timeout value must be greater than or equal to 10 minutes.

Values below 10 minutes are set to the minimum allowed value 10 minutes; values above 100 hours are set to the maximum allowed value 100 hours.

Example

LSF_DYNAMIC_HOST_TIMEOUT=60

A dynamic host is removed from the cluster when it is unavailable for 60 hours.

LSF_DYNAMIC_HOST_TIMEOUT=60m

A dynamic host is removed from the cluster when it is unavailable for 60 minutes.
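The unit and limit rules above can be sketched as a conversion to minutes. The timeout_minutes function is a hypothetical helper; it assumes the value is a positive integer with an optional m or M suffix and does not validate its input.

```sh
#!/bin/sh
# timeout_minutes VALUE
# Hypothetical helper: converts an LSF_DYNAMIC_HOST_TIMEOUT value to minutes
# and applies the documented limits (10 minutes to 100 hours).
timeout_minutes() {
    case $1 in
        *[mM]) minutes=${1%[mM]} ;;           # explicit minutes, such as 60m
        *)     minutes=$(( $1 * 60 )) ;;      # a plain number means hours
    esac
    [ "$minutes" -lt 10 ] && minutes=10        # raise to the minimum
    [ "$minutes" -gt 6000 ] && minutes=6000    # cap at 100 hours
    echo "$minutes"
}
```

For the two examples above, timeout_minutes 60 gives 3600 and timeout_minutes 60m gives 60.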

EGO parameter

EGO_DYNAMIC_HOST_TIMEOUT

Default

-1 (Not defined.) Unavailable hosts are never removed from the cluster.

LSF_DYNAMIC_HOST_WAIT_TIME

Syntax

LSF_DYNAMIC_HOST_WAIT_TIME=time_seconds

Description

Defines the length of time in seconds that a dynamic host waits before communicating with the master LIM to either add the host to the cluster or to shut down any running daemons if the host is not added successfully.
Note:
To enable dynamically added hosts, the following parameters must be defined:
  • LSF_MASTER_LIST in lsf.conf

  • LSF_DYNAMIC_HOST_WAIT_TIME in lsf.conf, or EGO_DYNAMIC_HOST_WAIT_TIME in ego.conf

  • LSF_HOST_ADDR_RANGE in lsf.cluster.cluster_name

Note:
To enable daemons to be shut down automatically for hosts that attempted to join the cluster but were rejected within the LSF_DYNAMIC_HOST_WAIT_TIME period, the following parameter must be defined:
  • EGO_ENABLE_AUTO_DAEMON_SHUTDOWN in lsf.conf or in ego.conf.

Recommended value

An integer greater than zero, up to 60 seconds for every 1000 hosts in the cluster, for a maximum of 15 minutes. Selecting a smaller value results in a quicker response time for hosts at the expense of an increased load on the master LIM.

Example

LSF_DYNAMIC_HOST_WAIT_TIME=60

A host waits 60 seconds from startup to send a request for the master LIM to add it to the cluster or to shut down any daemons if it is not added to the cluster.

EGO parameter

EGO_DYNAMIC_HOST_WAIT_TIME

Default

Not defined. Dynamic hosts cannot join the cluster.

LSF_EGO_DAEMON_CONTROL

Syntax

LSF_EGO_DAEMON_CONTROL="Y" | "N"

Description

Enables EGO Service Controller to control LSF res and sbatchd startup. Set the value to "Y" if you want EGO Service Controller to start res and sbatchd, and restart them if they fail.

To configure this parameter at installation, set EGO_DAEMON_CONTROL in install.config so that res and sbatchd start automatically as EGO services.

If LSF_ENABLE_EGO="N", this parameter is ignored and EGO Service Controller is not started.

If you manually set LSF_EGO_DAEMON_CONTROL=Y after installation, you must configure LSF res and sbatchd startup to AUTOMATIC in the EGO configuration files res.xml and sbatchd.xml under EGO_ESRVDIR/esc/conf/services.

To avoid conflicts with existing LSF startup scripts, do not set this parameter to "Y" if you use a script (for example in /etc/rc or /etc/inittab) to start LSF daemons. If this parameter is not defined in the install.config file, it takes the default value of "N".

Important:

After installation, LSF_EGO_DAEMON_CONTROL alone does not change the start type for the sbatchd and res EGO services to AUTOMATIC in res.xml and sbatchd.xml under EGO_ESRVDIR/esc/conf/services. You must edit these files and set the <sc:StartType> parameter to AUTOMATIC.

Example

LSF_EGO_DAEMON_CONTROL="N"

Default

N (res and sbatchd are started manually or through operating system rc facility)

LSF_EGO_ENVDIR

Syntax

LSF_EGO_ENVDIR=directory

Description

Directory where all Platform EGO configuration files are installed. These files are shared throughout the system and should be readable from any host.

If LSF_ENABLE_EGO="N", this parameter is ignored and ego.conf is not loaded.

Default

LSF_CONFDIR/ego/cluster_name/kernel. If not defined, or commented out, /etc is assumed.

LSF_ENABLE_CSA

Syntax

LSF_ENABLE_CSA=y | Y

Description

If set, enables LSF to write records for LSF jobs to the SGI IRIX Comprehensive System Accounting facility (CSA).

CSA writes an accounting record for each process in the pacct file, which is usually located in the /var/adm/acct/day directory. IRIX system administrators then use the csabuild command to organize and present the records on a job by job basis.

When LSF_ENABLE_CSA is set, for each job run on the IRIX system, LSF writes an LSF-specific accounting record to CSA when the job starts, and when the job finishes. LSF daemon accounting in CSA starts and stops with the LSF daemon.

To disable IRIX CSA accounting, remove LSF_ENABLE_CSA from lsf.conf.

See the IRIX resource administration documentation for information about CSA.

Set up IRIX CSA

  1. Define the LSF_ENABLE_CSA parameter in lsf.conf:
    ... LSF_ENABLE_CSA=Y ...
  2. Set the following parameters in /etc/csa.conf to on:
    • CSA_START

    • WKMG_START

  3. Run the csaswitch command to turn on the configuration changes in /etc/csa.conf.
    Note:

    See the IRIX resource administration documentation for information about the csaswitch command.

Information written to the pacct file

LSF writes the following records to the pacct file when a job starts and when it exits:
  • Job record type (job start or job exit)

  • Current system clock time

  • Service provider (LSF)

  • Submission time of the job (at job start only)

  • User ID of the job owner

  • Array Session Handle (ASH) of the job

  • IRIX job ID

  • IRIX project ID

  • LSF job name if it exists

  • Submission host name

  • LSF queue name

  • LSF external job ID

  • LSF job array index

  • LSF job exit code (at job exit only)

  • NCPUS: number of CPUs the LSF job has been using

Default

Not defined

LSF_ENABLE_DUALCORE

Syntax

LSF_ENABLE_DUALCORE=y | n

Description

Enables job scheduling based on dual-core information for a host. If yes (Y), LSF scheduling policies use the detected number of cores as the number of physical processors on the host instead of the number of dual-core chips for job scheduling. For a dual-core host, lshosts shows the number of cores under ncpus instead of the number of chips.

If LSF_ENABLE_DUALCORE=n, then lshosts shows the number of processor chips under ncpus.

EGO parameter

EGO_ENABLE_DUALCORE

Default

N

LSF_ENABLE_EGO

Syntax

LSF_ENABLE_EGO="Y" | "N"

Description

Enables Platform EGO functionality in the LSF cluster.

If you set LSF_ENABLE_EGO="Y", you must set or uncomment LSF_EGO_ENVDIR in lsf.conf.

If you set LSF_ENABLE_EGO="N", you must remove or comment out LSF_EGO_ENVDIR in lsf.conf.

Set the value to "N" if you do not want to take advantage of the following LSF features that depend on EGO:
  • LSF daemon control by EGO Service Controller

  • EGO-enabled SLA scheduling

Important:

After changing the value of LSF_ENABLE_EGO, you must shut down and restart the cluster.

Default

Y (EGO is enabled in the LSF cluster)

LSF_ENABLE_EXTSCHEDULER

Syntax

LSF_ENABLE_EXTSCHEDULER=y | Y

Description

If set, enables mbatchd external scheduling for LSF HPC features.

Default

Not defined

LSF_ENABLE_SUPPORT_IPV6

Syntax

LSF_ENABLE_SUPPORT_IPV6=y | Y

Description

If set, enables the use of IPv6 addresses in addition to IPv4.

Default

Not defined

See also

LSF_DUALSTACK_PREFER_IPV6

LSF_ENVDIR

Syntax

LSF_ENVDIR=directory

Description

Directory containing the lsf.conf file.

By default, lsf.conf is installed by creating a shared copy in LSF_CONFDIR and adding a symbolic link from /etc/lsf.conf to the shared copy. If LSF_ENVDIR is set, the symbolic link is installed in LSF_ENVDIR/lsf.conf.

The lsf.conf file is a global environment configuration file for all LSF services and applications. The LSF default installation places the file in LSF_CONFDIR.

Default

/etc

LSF_EVENT_PROGRAM

Syntax

LSF_EVENT_PROGRAM=event_program_name

Description

Specifies the name of the LSF event program to use.

If a full path name is not provided, the default location of this program is LSF_SERVERDIR.

If a program that does not exist is specified, event generation does not work.

If this parameter is not defined, the default name is genevent on UNIX, and genevent.exe on Windows.

Default

Not defined

LSF_EVENT_RECEIVER

Syntax

LSF_EVENT_RECEIVER=event_receiver_program_name

Description

Specifies the LSF event receiver and enables event generation.

Any string may be used as the LSF event receiver; this information is not used by LSF to enable the feature but is only passed as an argument to the event program.

If LSF_EVENT_PROGRAM specifies a program that does not exist, event generation does not work.

Default

Not defined. Event generation is disabled

LSF_GET_CONF

Syntax

LSF_GET_CONF=lim

Description

Synchronizes a local host's cluster configuration with the master host's cluster configuration. Specifies that a slave host must request cluster configuration details from the LIM of a host on the SERVER_HOST list. Use when a slave host does not share a filesystem with master hosts, and therefore cannot access cluster configuration.

Default

Not defined.

LSF_HOST_CACHE_NTTL

Syntax

LSF_HOST_CACHE_NTTL=time_seconds

Description

Negative-time-to-live value in seconds. Specifies the length of time the system caches a failed DNS lookup result. If you set this value to zero (0), LSF does not cache the result.
Note:

Setting this parameter does not affect the positive-time-to-live value set by the parameter LSF_HOST_CACHE_PTTL.

Valid values

Positive integer. Recommended value less than or equal to 60 seconds (1 minute).

Default

20 seconds

See also

LSF_HOST_CACHE_PTTL

LSF_HOST_CACHE_PTTL

Syntax

LSF_HOST_CACHE_PTTL=time_seconds

Description

Positive-time-to-live value in seconds. Specifies the length of time the system caches a successful DNS lookup result. If you set this value to zero (0), LSF does not cache the result.
Note:

Setting this parameter does not affect the negative-time-to-live value set by the parameter LSF_HOST_CACHE_NTTL.

Valid values

Positive integer. Recommended value equal to or greater than 3600 seconds (1 hour).

Default

86400 seconds (24 hours)

See also

LSF_HOST_CACHE_NTTL
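The interaction between the positive and negative time-to-live values can be illustrated with a small cache. This is a minimal sketch of the general technique, not LSF's implementation: the HostCache class, its resolver callback, and the injectable clock are all hypothetical names introduced for illustration.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class HostCache:
    """Cache lookup results with separate TTLs for successes and failures."""

    def __init__(
        self,
        resolver: Callable[[str], Optional[str]],
        pttl: float = 86400,  # default shown for LSF_HOST_CACHE_PTTL
        nttl: float = 20,     # default shown for LSF_HOST_CACHE_NTTL
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._resolver = resolver
        self._pttl = pttl
        self._nttl = nttl
        self._clock = clock
        # host name -> (address or None for a failed lookup, expiry time)
        self._entries: Dict[str, Tuple[Optional[str], float]] = {}

    def lookup(self, host: str) -> Optional[str]:
        now = self._clock()
        cached = self._entries.get(host)
        if cached is not None and now < cached[1]:
            return cached[0]  # still fresh, success or failure
        address = self._resolver(host)
        # A successful lookup uses the positive TTL, a failed one the negative TTL.
        ttl = self._pttl if address is not None else self._nttl
        if ttl > 0:
            self._entries[host] = (address, now + ttl)
        else:
            # A TTL of zero means the result is not cached.
            self._entries.pop(host, None)
        return address
```

Because the two TTLs are applied independently, changing one does not alter how long results of the other kind stay cached.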

LSF_HPC_EXTENSIONS

Syntax

LSF_HPC_EXTENSIONS="extension_name ..."

Description

Enables Platform LSF HPC extensions.

After adding or changing LSF_HPC_EXTENSIONS, use badmin mbdrestart and badmin hrestart to reconfigure your cluster.

Valid values

The following extension names are supported:

CUMULATIVE_RUSAGE : When a parallel job script runs multiple commands, resource usage is collected for jobs in the job script, rather than being overwritten when each command is executed.

DISP_RES_USAGE_LIMITS: bjobs displays resource usage limits configured in the queue as well as job-level limits.

HOST_RUSAGE: For parallel jobs, reports the correct rusage based on each host’s usage and the total rusage being charged to the execution host. This host rusage breakdown applies to the blaunch framework, the pam framework, and vendor MPI jobs (HP and SGI). For a running job, you will see run time, memory, swap, utime, stime, and pids and pgids on all hosts that a parallel job spans. For finished jobs, you will see memory, swap, utime, and stime on all hosts that a parallel job spans. The host-based rusage is reported in the JOB_FINISH record of lsb.acct and lsb.stream, and the JOB_STATUS record of lsb.events if the job status is done or exit. Also for finished jobs, bjobs -l shows CPU time, bhist -l shows CPU time, and bacct -l shows utime, stime, memory, and swap. In the MultiCluster lease model, the parallel job must run on hosts that are all in the same cluster. If you use the jobFinishLog API, all external tools must use jobFinishLog built with LSF 8.0, or host-based rusage will not work. If you add or remove this extension, you must restart mbatchd, sbatchd, and res on all hosts.

LSB_HCLOSE_BY_RES: If res is down, the host is closed with the message:

Host is closed because RES is not available.

The status of the closed host is closed_Adm. No new jobs are dispatched to this host, but currently running jobs are not suspended.

RESERVE_BY_STARTTIME: LSF selects the reservation that gives the job the earliest predicted start time.

By default, if multiple host groups are available for reservation, LSF chooses the largest possible reservation based on number of slots.

SHORT_EVENTFILE : Compresses long host name lists when event records are written to lsb.events and lsb.acct for large parallel jobs. The short host string has the format:
number_of_hosts*real_host_name
Tip:

When SHORT_EVENTFILE is enabled, older daemons and commands (pre-LSF Version 7) cannot recognize the lsb.acct and lsb.events file format.

For example, if the original host list record is
6 "hostA" "hostA" "hostA" "hostA" "hostB" "hostC"
redundant host names are removed and the short host list record becomes
3 "4*hostA" "hostB" "hostC"

When LSF_HPC_EXTENSIONS="SHORT_EVENTFILE" is set, and LSF reads the host list from lsb.events or lsb.acct, the compressed host list is expanded into a normal host list.
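The compression and expansion of host lists described above can be sketched as follows. This is an illustrative sketch only: the function names compress_hosts and expand_hosts are hypothetical, and it assumes that only consecutive repeats of a host name are merged, which matches the examples shown but is not stated explicitly.

```python
from itertools import groupby
from typing import List


def compress_hosts(hosts: List[str]) -> List[str]:
    """Collapse runs of a repeated host name into 'count*host' entries."""
    short = []
    for name, run in groupby(hosts):
        count = len(list(run))
        # A host that appears once keeps its plain name.
        short.append(f"{count}*{name}" if count > 1 else name)
    return short


def expand_hosts(short: List[str]) -> List[str]:
    """Expand 'count*host' entries back into a normal host list."""
    hosts = []
    for entry in short:
        count, sep, name = entry.partition("*")
        if sep and count.isdigit():
            hosts.extend([name] * int(count))
        else:
            hosts.append(entry)
    return hosts
```

In the record itself, the host count field that precedes the list would be the length of the compressed list (3 in the example above) rather than the original 6.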

SHORT_EVENTFILE affects the following events and fields:
  • JOB_START in lsb.events when a normal job is dispatched
    • numExHosts (%d)

    • execHosts (%s)

  • JOB_CHUNK in lsb.events when a job is inserted into a job chunk
    • numExHosts (%d)

    • execHosts (%s)

  • JOB_FORWARD in lsb.events when a job is forwarded to a MultiCluster leased host
    • numReserHosts (%d)

    • reserHosts (%s)

  • JOB_FINISH record in lsb.acct
    • numExHosts (%d)

    • execHosts (%s)

SHORT_PIDLIST : Shortens the output from bjobs to omit all but the first process ID (PID) for a job. bjobs displays only the first ID and a count of the process group IDs (PGIDs) and process IDs for the job.

Without SHORT_PIDLIST, bjobs -l displays all the PGIDs and PIDs for the job. With SHORT_PIDLIST set, bjobs -l displays a count of the PGIDs and PIDs.

TASK_MEMLIMIT : Enables enforcement of a memory limit (bsub -M, bmod -M, or MEMLIMIT in lsb.queues) for individual tasks in a parallel job. If any parallel task exceeds the memory limit, LSF terminates the entire job.

TASK_SWAPLIMIT:  Enables enforcement of a virtual memory (swap) limit (bsub -v, bmod -v, or SWAPLIMIT in lsb.queues) for individual tasks in a parallel job. If any parallel task exceeds the swap limit, LSF terminates the entire job.

Example JOB_START events in lsb.events:

For a job submitted with
bsub -n 64 -R "span[ptile=32]" sleep 100
Without SHORT_EVENTFILE, a JOB_START event like the following is logged in lsb.events:
"JOB_START" "8.0" 1058989891 710 4 0 0 10.3 64 "hostA" "hostA" "hostA" "hostA" "hostA" 
"hostA" "hostA" "hostA" "hostA" "hostA" "hostA" "hostA" "hostA" "hostA" "hostA" "hostA" 
"hostA" "hostA" "hostA" "hostA" "hostA" "hostA" "hostA" "hostA" "hostA" "hostA" "hostA" 
"hostA" "hostA" "hostA" "hostA" "hostA" "hostB" "hostB" "hostB" "hostB" "hostB" "hostB" 
"hostB" "hostB" "hostB" "hostB" "hostB" "hostB" "hostB" "hostB" "hostB" "hostB" "hostB" 
"hostB" "hostB" "hostB" "hostB" "hostB" "hostB" "hostB" "hostB" "hostB" "hostB" "hostB" 
"hostB" "hostB" "hostB" "hostB" "" "" 0 "" 0
With SHORT_EVENTFILE, a JOB_START event would be logged in lsb.events with the number of execution hosts (numExHosts field) changed from 64 to 2 and the execution host list (execHosts field) shortened to "32*hostA" and "32*hostB":
"JOB_START" "8.0" 1058998174 812 4 0 0 10.3 2 "32*hostA" "32*hostB" "" "" 0 "" 0 ""

Example JOB_FINISH records in lsb.acct:

For a job submitted with
bsub -n 64 -R "span[ptile=32]" sleep 100
Without SHORT_EVENTFILE, a JOB_FINISH event like the following is logged in lsb.acct:
"JOB_FINISH" "8.0" 1058990001 710 33054 33816578 64 1058989880 0 0 1058989891 "user1" 
"normal" "span[ptile=32]" "" "" "hostA" "/scratch/user1/work" "" "" "" "1058989880.710" 
0 64 "hostA" "hostA" "hostA" "hostA" "hostA" "hostA" "hostA" "hostA" "hostA" "hostA" 
"hostA" "hostA" "hostA" "hostA" "hostA" "hostA" "hostA" "hostA" "hostA" "hostA" "hostA" 
"hostA" "hostA" "hostA" "hostA" "hostA" "hostA" "hostA" "hostA" "hostA" "hostA" "hostA" 
"hostB" "hostB" "hostB" "hostB" "hostB" "hostB" "hostB" "hostB" "hostB" "hostB" "hostB" 
"hostB" "hostB" "hostB" "hostB" "hostB" "hostB" "hostB" "hostB" "hostB" "hostB" "hostB" 
"hostB" "hostB" "hostB" "hostB" "hostB" "hostB" "hostB" "hostB" "hostB" "hostB" 64 10.3 
"" "sleep 100" 0.079999 0.270000 0 0 -1 0 0 0 0 0 0 0 -1 0 0 0 0 0 -1 "" "default" 0 64 
"" "" 0 4304 6024 "" "" "" 0
With SHORT_EVENTFILE, a JOB_FINISH event like the following would be logged in lsb.acct with the number of execution hosts (numExHosts field) changed from 64 to 2 and the execution host list (execHosts field) shortened to "32*hostA" and "32*hostB":
"JOB_FINISH" "8.0" 1058998282 812 33054 33816578 64 1058998163 0 0 1058998174 "user1" 
"normal" "span[ptile=32]" "" "" "hostA" "/scratch/user1/work" "" "" "" "1058998163.812" 
0 2 "32*hostA" "32*hostB" 64 10.3 "" "sleep 100" 0.039999 0.259999 0 0 -1 0 0 0 0 0 0 0 
-1 0 0 0 0 0 -1 "" "default" 0 64 "" "" 0 4304 6024 "" "" "" "" 0 0

Example bjobs -l output without SHORT_PIDLIST:

bjobs -l displays all the PGIDs and PIDs for the job:
bjobs -l
Job <109>, User <user3>, Project <default>, Status <RUN>, Queue <normal>, Inte
                     ractive mode, Command <./myjob.sh>
Mon Jul 21 20:54:44 2009: Submitted from host <hostA>, CWD <$HOME/LSF/jobs>;
 RUNLIMIT 
 10.0 min of hostA
 STACKLIMIT CORELIMIT MEMLIMIT
   5256 K    10000 K    5000 K
Mon Jul 21 20:54:51 2009: Started on <hostA>;
Mon Jul 21 20:55:03 2009: Resource usage collected.
                     MEM: 2 Mbytes;  SWAP: 15 Mbytes
                     PGID: 256871;  PIDs: 256871 
                     PGID: 257325;  PIDs: 257325 257500 257482 257501 257523 
                     257525 257531
 SCHEDULING PARAMETERS:
           r15s   r1m  r15m   ut      pg    io   ls    it    tmp    swp    mem
 loadSched   -     -     -     -       -     -    -     -     -      -      -  
 loadStop    -     -     -     -       -     -    -     -     -      -      -  
                    cpuspeed    bandwidth
 loadSched          -            -
 loadStop           -            -
<< Job <109> is done successfully. >>

Example bjobs -l output with SHORT_PIDLIST:

bjobs -l displays a count of the PGIDs and PIDs:
bjobs -l
Job <109>, User <user3>, Project <default>, Status <RUN>, Queue <normal>, Inte
                     ractive mode, Command <./myjob.sh>
Mon Jul 21 20:54:44 2009: Submitted from host <hostA>, CWD <$HOME/LSF/jobs>;
 RUNLIMIT                
 10.0 min of hostA
 STACKLIMIT CORELIMIT MEMLIMIT
   5256 K    10000 K    5000 K
Mon Jul 21 20:54:51 2009: Started on <hostA>;
Mon Jul 21 20:55:03 2009: Resource usage collected.
                     MEM: 2 Mbytes;  SWAP: 15 Mbytes
                     PGID(s):  256871:1 PID, 257325:7 PIDs
 SCHEDULING PARAMETERS:
           r15s   r1m  r15m   ut      pg    io   ls    it    tmp    swp    mem
 loadSched   -     -     -     -       -     -    -     -     -      -      -  
 loadStop    -     -     -     -       -     -    -     -     -      -      - 
                    cpuspeed    bandwidth
 loadSched          -            -
 loadStop           -            -

Default

Not defined

LSF_HPC_PJL_LOADENV_TIMEOUT

Syntax

LSF_HPC_PJL_LOADENV_TIMEOUT=time_seconds

Description

Timeout value in seconds for PJL to load or unload the environment. For example, set LSF_HPC_PJL_LOADENV_TIMEOUT to the number of seconds needed for IBM POE to load or unload adapter windows.

At job startup, the PJL times out if the first task fails to register with PAM within the specified timeout value. At job shutdown, the PJL times out if it fails to exit after the last Taskstarter termination report within the specified timeout value.

Default

LSF_HPC_PJL_LOADENV_TIMEOUT=300

LSF_ID_PORT

Syntax

LSF_ID_PORT=port_number

Description

The network port number used to communicate with the authentication daemon when LSF_AUTH is set to ident.

Default

Not defined

LSF_INCLUDEDIR

Syntax

LSF_INCLUDEDIR=directory

Description

Directory under which the LSF API header files lsf.h and lsbatch.h are installed.

Default

LSF_INDEP/include

See also

LSF_INDEP

LSF_INDEP

Syntax

LSF_INDEP=directory

Description

Specifies the default top-level directory for all machine-independent LSF files.

This includes man pages, configuration files, working directories, and examples. For example, defining LSF_INDEP as /usr/share/lsf/mnt places man pages in /usr/share/lsf/mnt/man, configuration files in /usr/share/lsf/mnt/conf, and so on.

The files in LSF_INDEP can be shared by all machines in the cluster.

As shown in the following list, LSF_INDEP is incorporated into other LSF environment variables.
  • LSB_SHAREDIR=$LSF_INDEP/work

  • LSF_CONFDIR=$LSF_INDEP/conf

  • LSF_INCLUDEDIR=$LSF_INDEP/include

  • LSF_MANDIR=$LSF_INDEP/man

  • XLSF_APPDIR=$LSF_INDEP/misc
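The relationship between LSF_INDEP and the directories derived from it can be expressed as a simple mapping. This is a minimal sketch for illustration; the function name derived_dirs is hypothetical and the subdirectory names are taken from the list above.

```python
import posixpath
from typing import Dict

# Subdirectory of LSF_INDEP used by each dependent variable, per the list above.
SUBDIRS = {
    "LSB_SHAREDIR": "work",
    "LSF_CONFDIR": "conf",
    "LSF_INCLUDEDIR": "include",
    "LSF_MANDIR": "man",
    "XLSF_APPDIR": "misc",
}


def derived_dirs(lsf_indep: str = "/usr/share/lsf/mnt") -> Dict[str, str]:
    """Return the default value of each variable derived from LSF_INDEP."""
    return {name: posixpath.join(lsf_indep, sub) for name, sub in SUBDIRS.items()}
```

For example, derived_dirs("/usr/share/lsf/mnt")["LSF_MANDIR"] gives /usr/share/lsf/mnt/man, matching the man page location described above.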

Default

/usr/share/lsf/mnt

See also

LSF_MACHDEP, LSB_SHAREDIR, LSF_CONFDIR, LSF_INCLUDEDIR, LSF_MANDIR, XLSF_APPDIR

LSF_INTERACTIVE_STDERR

Syntax

LSF_INTERACTIVE_STDERR=y | n

Description

Separates stderr from stdout for interactive tasks and interactive batch jobs.

This is useful to redirect output to a file with regular operators instead of the bsub -e err_file and -o out_file options.

This parameter can also be enabled or disabled as an environment variable.
CAUTION:

If you enable this parameter globally in lsf.conf, check any custom scripts that manipulate stderr and stdout.

When this parameter is not defined or set to n, the following are written to stdout on the submission host for interactive tasks and interactive batch jobs:
  • Job standard output messages

  • Job standard error messages

The following are written to stderr on the submission host for interactive tasks and interactive batch jobs:
  • LSF messages

  • NIOS standard messages

  • NIOS debug messages (if LSF_NIOS_DEBUG=1 in lsf.conf)

When this parameter is set to y, the following are written to stdout on the submission host for interactive tasks and interactive batch jobs:
  • Job standard output messages

The following are written to stderr on the submission host:
  • Job standard error messages

  • LSF messages

  • NIOS standard messages

  • NIOS debug messages (if LSF_NIOS_DEBUG=1 in lsf.conf)

Default

Not defined

Notes

When this parameter is set, the change affects interactive tasks and interactive batch jobs run with the following commands:
  • bsub -I

  • bsub -Ip

  • bsub -Is

  • lsrun

  • lsgrun

  • lsmake (Platform Make)

  • bsub pam (HPC features must be enabled)

Limitations

  • Pseudo-terminal: Do not use this parameter if your application depends on stderr as a terminal. This is because LSF must use a non-pseudo-terminal connection to separate stderr from stdout.

  • Synchronization: Do not use this parameter if you depend on messages in stderr and stdout being synchronized and jobs in your environment are submitted continuously. A continuous stream of messages causes stderr and stdout to fall out of synchronization. The effect can be more pronounced with parallel jobs. This situation is similar to that of rsh.

  • NIOS standard and debug messages: NIOS standard messages, and debug messages (when LSF_NIOS_DEBUG=1 in lsf.conf or as an environment variable) are written to stderr. NIOS standard messages are in the format <<message>>, which makes it easier to remove them if you wish. To redirect NIOS debug messages to a file, define LSF_CMD_LOGDIR in lsf.conf or as an environment variable.

See also

LSF_NIOS_DEBUG, LSF_CMD_LOGDIR

LSF_LD_SECURITY

Syntax

LSF_LD_SECURITY=y | n

Description

When set, LSF removes the environment variables LD_PRELOAD and LD_LIBRARY_PATH from the job environment during job initialization for jobs submitted using bsub -Is or bsub -Ip. This provides enhanced security against users obtaining root privileges.

Two new environment variables are created (LSF_LD_LIBRARY_PATH and LSF_LD_PRELOAD) to allow LD_PRELOAD and LD_LIBRARY_PATH to be put back before the job runs.
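The removal and later restoration of the loader variables can be sketched as two steps. This is a hypothetical illustration rather than LSF's actual code: the function names are invented, and it assumes that LSF_LD_PRELOAD and LSF_LD_LIBRARY_PATH simply carry the removed values, which the description implies but does not spell out.

```python
from typing import Dict

LOADER_VARS = ("LD_PRELOAD", "LD_LIBRARY_PATH")


def sanitize_job_env(env: Dict[str, str]) -> Dict[str, str]:
    """Remove the loader variables, keeping their values under LSF_ names."""
    cleaned = dict(env)
    for name in LOADER_VARS:
        if name in cleaned:
            # Assumption: the removed value is preserved as LSF_<name>.
            cleaned["LSF_" + name] = cleaned.pop(name)
    return cleaned


def restore_loader_vars(env: Dict[str, str]) -> Dict[str, str]:
    """Put the loader variables back from their LSF_ copies before the job runs."""
    restored = dict(env)
    for name in LOADER_VARS:
        saved = restored.get("LSF_" + name)
        if saved is not None:
            restored[name] = saved
    return restored
```

The point of the two-step design is that privileged initialization code never runs with user-supplied loader settings, while the job itself can still see them.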

Default

N

LSF_LIBDIR

Syntax

LSF_LIBDIR=directory

Description

Specifies the directory in which the LSF libraries are installed. Library files are shared by all hosts of the same type.

Default

LSF_MACHDEP/lib

LSF_LIC_SCHED_HOSTS

Syntax

LSF_LIC_SCHED_HOSTS="candidate_host_list"

candidate_host_list is a space-separated list of hosts that are candidate License Scheduler hosts.

Description

The candidate License Scheduler host list is read by LIM on each host to check if the host is a candidate License Scheduler master host. If the host is on the list, LIM starts the License Scheduler daemon (bld) on the host.

LSF_LIC_SCHED_PREEMPT_REQUEUE

Syntax

LSF_LIC_SCHED_PREEMPT_REQUEUE=y | n

Description

Set this parameter to requeue a job whose license is preempted by Platform License Scheduler. The job is killed and requeued instead of suspended.

If you set LSF_LIC_SCHED_PREEMPT_REQUEUE, do not set LSF_LIC_SCHED_PREEMPT_SLOT_RELEASE. If both these parameters are set, LSF_LIC_SCHED_PREEMPT_SLOT_RELEASE is ignored.

Default

N

See also

LSF_LIC_SCHED_PREEMPT_SLOT_RELEASE, LSF_LIC_SCHED_PREEMPT_STOP

LSF_LIC_SCHED_PREEMPT_SLOT_RELEASE

Syntax

LSF_LIC_SCHED_PREEMPT_SLOT_RELEASE=y | n

Description

Set this parameter to release the slot of a job that is suspended when its license is preempted by Platform License Scheduler.

If you set LSF_LIC_SCHED_PREEMPT_SLOT_RELEASE, do not set LSF_LIC_SCHED_PREEMPT_REQUEUE. If both these parameters are set, LSF_LIC_SCHED_PREEMPT_SLOT_RELEASE is ignored.

Default

Y

See also

LSF_LIC_SCHED_PREEMPT_REQUEUE, LSF_LIC_SCHED_PREEMPT_STOP

LSF_LIC_SCHED_PREEMPT_STOP

Syntax

LSF_LIC_SCHED_PREEMPT_STOP=y | n

Description

Set this parameter to use job controls to stop a job that is preempted. When this parameter is set, a UNIX SIGSTOP signal is sent to suspend a job instead of a UNIX SIGTSTP.

To send a SIGSTOP signal instead of SIGTSTP, the following parameter in lsb.queues must also be set:
JOB_CONTROLS=SUSPEND[SIGSTOP]

Default

N

See also

LSF_LIC_SCHED_PREEMPT_SLOT_RELEASE, LSF_LIC_SCHED_PREEMPT_REQUEUE

LSF_LIC_SCHED_STRICT_PROJECT_NAME

Syntax

LSF_LIC_SCHED_STRICT_PROJECT_NAME=y | n

Description

Enforces strict checking of the License Scheduler project name upon job submission or job modification (bsub or bmod). If the project name is misspelled (case sensitivity applies), the job is rejected.

If this parameter is not set or it is set to n, and if there is an error in the project name, the default project is used.

Default

N

LSF_LICENSE_ACCT_PATH

Syntax

LSF_LICENSE_ACCT_PATH=directory

Description

Specifies the location for the license accounting files. These include the license accounting files for LSF Family products.

Use this parameter to define the location of all the license accounting files. By defining this parameter, you can store the license accounting files for the LSF Family of products in the same directory for convenience.

Default

Not defined. The license accounting files are stored in the default log directory for the particular product. For example, LSF stores its license audit file in the LSF system log file directory.

See also

  • LSF_LOGDIR

  • lsf.cluster_name.license.acct

  • bld.license.acct

LSF_LICENSE_FILE

Syntax

LSF_LICENSE_FILE="file_name ... | port_number@host_name[:port_number@host_name ...]"

Description

Specifies one or more demo or FlexNet permanent license files used by LSF.

The value for LSF_LICENSE_FILE can be either of the following:
  • The full path name to the license file.

    • UNIX example:
      LSF_LICENSE_FILE=/usr/share/lsf/cluster1/conf/license.dat
    • Windows examples:

      LSF_LICENSE_FILE=C:\licenses\license.dat

      LSF_LICENSE_FILE=\\HostA\licenses\license.dat

  • For a permanent license, the name of the license server host and TCP port number used by the lmgrd daemon, in the format port@host_name. For example:

    LSF_LICENSE_FILE="1700@hostD"
  • For a license with redundant servers, use a semi-colon or colon to separate the port@host_names. The port number must be the same as that specified in the SERVER line of the license file. For example:

    UNIX:
    LSF_LICENSE_FILE="port@hostA:port@hostB:port@hostC"
    Windows:
    LSF_LICENSE_FILE="port@hostA;port@hostB;port@hostC"
  • For a license with distributed servers, use a pipe to separate the port@host_names. The port number must be the same as that specified in the SERVER line of the license file. For example:

    LSF_LICENSE_FILE="port@hostA|port@hostB|port@hostC"

Multiple license files should be quoted and must be separated by a pipe character (|).

Windows example:
LSF_LICENSE_FILE="C:\licenses\lic1|C:\licenses\lic2|D:\mydir\lic3"

Multiple files may be kept in the same directory, but each one must reference a different license server. When checking out a license, LSF searches the servers in the order in which they are listed, so it checks the second server when there are no more licenses available from the first server.

If this parameter is not defined, LSF assumes the default location.

Default

The default license installation directory is the value of the parameter LSF_CONFDIR or LSF_ENVDIR in lsf.conf.

Demo license: The default demo license installation directory is /usr/local/flexlm/.

LSF_LICENSE_MAINTENANCE_INTERVAL

Syntax

LSF_LICENSE_MAINTENANCE_INTERVAL=time_seconds

Description

Specifies how often LSF checks the LSF licenses when starting or restarting the cluster. A small number could delay LSF. Valid values are from 5 to 300.

When this parameter is not set, the default value is used.

Recommended value

Set LSF_LICENSE_MAINTENANCE_INTERVAL depending on your cluster size, system buffer size, license server, and cluster communication speed:
  • If you have network delays or a small system buffer (less than 32 KB), set LSF_LICENSE_MAINTENANCE_INTERVAL to the high end of the valid values (300).

  • For a small cluster (fewer than 1000 hosts), set LSF_LICENSE_MAINTENANCE_INTERVAL to a value from 5 to 60 seconds.

  • For a large cluster (greater than 4000 hosts) with limited licenses, use the maximum value: 300 seconds.

  • If you have slow cluster communication (for example, if you use a Web-based intranet), use the maximum value: 300 seconds.

Default

5 seconds

LSF_LICENSE_NOTIFICATION_INTERVAL

Syntax

LSF_LICENSE_NOTIFICATION_INTERVAL=time_hours

Description

Specifies how often notification email is sent to the primary cluster administrator about overuse of LSF Family product licenses and Platform License Scheduler tokens.

Recommended value

To avoid getting the same audit information more than once, set LSF_LICENSE_NOTIFICATION_INTERVAL greater than 24 hours.

Example notification email

Subject: LSF license overuse
LSF Administrator:
Your cluster has experienced license overuse.
Platform Product License Name: LSF_MANAGER
CLASS E license usage: 0 in total; 8 in use (8 overused).
Overuse Hosts:
hostA
Use lim -t and lshosts -l or see /usr/opt/lsf7.0/log/lsf.cluster_8.0.license.acct file for details.
Please contact Platform Support at support@platform.com for information about getting additional licenses.

Default

24 hours

See also

  • LSF_LICENSE_ACCT_PATH

  • LSF_LOGDIR

  • lsf.cluster_name.license.acct

  • bld.license.acct

LSF_LIM_API_NTRIES

Syntax

LSF_LIM_API_NTRIES=integer

Description

Defines the number of times LSF commands try to communicate with the LIM API when LIM is not available. LSF_LIM_API_NTRIES is ignored by LSF and EGO daemons and EGO commands. The LSF_LIM_API_NTRIES environment variable overrides the value of LSF_LIM_API_NTRIES in lsf.conf.

Valid values

1 to 65535

Default

1. LIM API exits without retrying.
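The precedence between the environment variable and lsf.conf can be sketched as a single lookup. This is a minimal illustration, assuming both sources are available as plain dictionaries; the function name effective_lim_api_ntries is hypothetical.

```python
from typing import Mapping


def effective_lim_api_ntries(conf: Mapping[str, str], env: Mapping[str, str]) -> int:
    """Resolve LSF_LIM_API_NTRIES: environment first, then lsf.conf, then default 1."""
    raw = env.get("LSF_LIM_API_NTRIES", conf.get("LSF_LIM_API_NTRIES", "1"))
    value = int(raw)
    # Valid values are 1 to 65535.
    if not 1 <= value <= 65535:
        raise ValueError("LSF_LIM_API_NTRIES must be between 1 and 65535")
    return value
```

With neither source set, the result is 1, which corresponds to a single attempt with no retry.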

LSF_LIM_DEBUG

Syntax

LSF_LIM_DEBUG=1 | 2

Description

Sets LSF to debug mode.

If LSF_LIM_DEBUG is defined, LIM operates in single user mode. No security checking is performed, so LIM should not run as root.

LIM does not look in the services database for the LIM service port number. Instead, it uses port number 36000 unless LSF_LIM_PORT has been defined.

Specify 1 for this parameter unless you are testing LSF.

Valid values

LSF_LIM_DEBUG=1

LIM runs in the background with no associated control terminal.

LSF_LIM_DEBUG=2

LIM runs in the foreground and prints error messages to tty.

EGO parameter

EGO_LIM_DEBUG

Default

Not defined

See also

LSF_RES_DEBUG, LSF_CMD_LOGDIR, LSF_CMD_LOG_MASK, LSF_LOG_MASK, LSF_LOGDIR

LSF_LIM_IGNORE_CHECKSUM

Syntax

LSF_LIM_IGNORE_CHECKSUM=y | Y

Description

Configure LSF_LIM_IGNORE_CHECKSUM=Y to ignore warning messages logged to lim log files on non-master hosts.

When LSF_MASTER_LIST is set, lsadmin reconfig only restarts master candidate hosts (for example, after adding or removing hosts from the cluster). This can cause superfluous warning messages like the following to be logged in the lim log files for non-master hosts because lim on these hosts is not restarted after a configuration change:
Aug 26 13:47:35 2006 9746 4 8.0 xdr_loadvector: Sender <10.225.36.46:9999> has a different configuration

Default

Not defined.

See also

LSF_MASTER_LIST

LSF_LIM_PORT, LSF_RES_PORT, LSB_MBD_PORT, LSB_SBD_PORT

Syntax

LSF_LIM_PORT=port_number

Description

TCP service ports to use for communication with the LSF daemons.

If port parameters are not defined, LSF obtains the port numbers by looking up the LSF service names in the /etc/services file or the NIS (UNIX). If it is not possible to modify the services database, you can define these port parameters to set the port numbers.

EGO parameter

EGO_LIM_PORT

Default

On UNIX, the default is to get port numbers from the services database.

On Windows, these parameters are mandatory.

Default port number values are:
  • LSF_LIM_PORT=7869

  • LSF_RES_PORT=6878

  • LSB_MBD_PORT=6881

  • LSB_SBD_PORT=6882
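The lookup order described above (an explicit port parameter first, otherwise the services database) can be sketched as follows. This is an illustration under stated assumptions, not LSF's code: the function name resolve_port is hypothetical, and the service name passed in is whatever name your site registers in the services database, which this section does not list.

```python
import socket
from typing import Callable, Mapping


def resolve_port(
    conf: Mapping[str, str],
    parameter: str,
    service_name: str,
    lookup: Callable[[str, str], int] = socket.getservbyname,
) -> int:
    """Return the port for a daemon: lsf.conf parameter first, else services database."""
    value = conf.get(parameter)
    if value is not None:
        return int(value)
    # Falls back to the services database; socket.getservbyname raises OSError
    # if the service name is not registered.
    return lookup(service_name, "tcp")
```

The lookup argument exists only so that the fallback can be replaced in tests or on hosts where the services database cannot be modified.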

LSF_LIVE_CONFDIR

Syntax

LSF_LIVE_CONFDIR=directory

Description

Enables and disables live reconfiguration (bconf command) and sets the directory where configuration files changed by live reconfiguration are saved. bconf requests will be rejected if the directory does not exist and cannot be created, or is specified using a relative path.

When LSF_LIVE_CONFDIR is defined and contains configuration files, all LSF restart and reconfiguration operations read these configuration files instead of the files in LSF_CONFDIR.

After adding or changing LSF_LIVE_CONFDIR in lsf.conf, use badmin mbdrestart and lsadmin reconfig to reconfigure your cluster.

Important:

Remove LSF_LIVE_CONFDIR configuration files or merge files into LSF_CONFDIR before upgrading LSF or applying patches to LSF.

See bconf in the LSF Command Reference or bconf man page for bconf (live reconfiguration) details.

Default

Undefined. (bconf disabled.)

During installation, LSF_LIVE_CONFDIR is set to LSB_SHAREDIR/cluster_name/live_confdir where cluster_name is the name of the LSF cluster, as returned by lsid.

See also

LSF_CONFDIR, LSB_CONFDIR

LSF_LOAD_USER_PROFILE

Syntax

LSF_LOAD_USER_PROFILE=local | roaming

Description

When running jobs on Windows hosts, you can specify whether a user profile should be loaded. Use this parameter if you have jobs that need to access user-specific resources associated with a user profile.

Local and roaming user profiles are Windows features. For more information about them, check Microsoft documentation.

  • Local: LSF loads the Windows user profile from the local execution machine (the host on which the job runs).

    Note:

    If the user has logged onto the machine before, the profile of that user is used. If not, the profile for the default user is used.

  • Roaming: LSF loads a roaming user profile if it has been set up. If not, the local user profile is loaded instead.

Default

Not defined. No user profiles are loaded when jobs run on Windows hosts.

LSF_LOCAL_RESOURCES

Syntax

LSF_LOCAL_RESOURCES="resource ..."

Description

Defines instances of local resources residing on the slave host.
  • For numeric resources, define name-value pairs:
    "[resourcemap value*resource_name]"
  • For Boolean resources, the value is the resource name in the form:
    "[resource resource_name]"

When the slave host calls the master host to add itself, it also reports its local resources. The local resources to be added must be defined in lsf.shared.

If the same resource is already defined in lsf.shared as default or all, it cannot be added as a local resource. The shared resource overrides the local one.

Tip:

LSF_LOCAL_RESOURCES is usually set in the slave.config file during installation. If LSF_LOCAL_RESOURCES is already defined in a local lsf.conf on the slave host, lsfinstall does not add resources you define in LSF_LOCAL_RESOURCES in slave.config. You should not have duplicate LSF_LOCAL_RESOURCES entries in lsf.conf. If local resources are defined more than once, only the last definition is valid.

Important:

Resources must already be mapped to hosts in the ResourceMap section of lsf.cluster.cluster_name. If the ResourceMap section does not exist, local resources are not added.

Example

LSF_LOCAL_RESOURCES="[resourcemap 1*verilog] [resource linux]"
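As an illustration of the syntax above, the following Python sketch splits a value into its numeric and Boolean entries. The function name parse_local_resources and the regular expression are hypothetical and not part of LSF. The sketch assumes entries are written exactly in the two bracketed forms shown above.

```python
import re

# Hypothetical helper, not part of LSF. Matches "[resourcemap value*name]"
# and "[resource name]" entries inside an LSF_LOCAL_RESOURCES value.
_ENTRY = re.compile(r"\[\s*(resourcemap|resource)\s+([^\]]+?)\s*\]")


def parse_local_resources(value):
    """Return (numeric, boolean): a dict of name -> value and a list of names."""
    numeric = {}
    boolean = []
    for kind, body in _ENTRY.findall(value):
        if kind == "resourcemap":
            # Numeric resources are written as value*resource_name.
            amount, name = body.split("*", 1)
            numeric[name.strip()] = amount.strip()
        else:
            # Boolean resources carry only the resource name.
            boolean.append(body)
    return numeric, boolean
```

Applied to the example value, the sketch reports verilog as a numeric resource with the value 1 and linux as a Boolean resource.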

EGO parameter

EGO_LOCAL_RESOURCES

Default

Not defined

LSF_LOG_MASK

Syntax

LSF_LOG_MASK=message_log_level

Description

Specifies the logging level of error messages for LSF daemons, except LIM, which is controlled by Platform EGO.

For example:
LSF_LOG_MASK=LOG_DEBUG

If EGO is enabled in the LSF cluster, and EGO_LOG_MASK is not defined, LSF uses the value of LSF_LOG_MASK for LIM, PIM, and MELIM. EGO vemkd and pem components continue to use the EGO default values. If EGO_LOG_MASK is defined and EGO is enabled, the EGO_LOG_MASK value is used.

To specify the logging level of error messages for LSF commands, use LSF_CMD_LOG_MASK. To specify the logging level of error messages for LSF batch commands, use LSB_CMD_LOG_MASK.

On UNIX, this is similar to syslog. All messages logged at the specified level or higher are recorded; lower level messages are discarded. The LSF_LOG_MASK value can be any log priority symbol that is defined in syslog.h (see syslog).

The log levels in order from highest to lowest are:
  • LOG_EMERG

  • LOG_ALERT

  • LOG_CRIT

  • LOG_ERR

  • LOG_WARNING

  • LOG_NOTICE

  • LOG_INFO

  • LOG_DEBUG

  • LOG_DEBUG1

  • LOG_DEBUG2

  • LOG_DEBUG3

The most important LSF log messages are at the LOG_ERR or LOG_WARNING level. Messages at the LOG_INFO and LOG_DEBUG level are only useful for debugging.

Although message log level implements similar functionality to UNIX syslog, there is no dependency on UNIX syslog. It works even if messages are being logged to files instead of syslog.

LSF logs error messages in different levels so that you can choose to log all messages, or only log messages that are deemed critical. The level specified by LSF_LOG_MASK determines which messages are recorded and which are discarded. All messages logged at the specified level or higher are recorded, while lower level messages are discarded.
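The filtering rule can be sketched in Python as follows. This is an illustration only: the names LEVELS and is_recorded are hypothetical, and the list simply mirrors the order given above.

```python
# Log levels from highest to lowest, in the order listed in this section.
LEVELS = [
    "LOG_EMERG", "LOG_ALERT", "LOG_CRIT", "LOG_ERR", "LOG_WARNING",
    "LOG_NOTICE", "LOG_INFO", "LOG_DEBUG", "LOG_DEBUG1", "LOG_DEBUG2",
    "LOG_DEBUG3",
]


def is_recorded(message_level, mask="LOG_WARNING"):
    """Return True if a message at message_level is kept under the given mask.

    Hypothetical helper: a message is recorded when its level is the mask
    level or higher, and discarded when it is lower.
    """
    return LEVELS.index(message_level) <= LEVELS.index(mask)
```

With the default mask of LOG_WARNING, a LOG_ERR message is recorded and a LOG_INFO message is discarded.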

For debugging purposes, the level LOG_DEBUG contains the fewest number of debugging messages and is used for basic debugging. The level LOG_DEBUG3 records all debugging messages, and can cause log files to grow very large; it is not often used. Most debugging is done at the level LOG_DEBUG2.

In versions earlier than LSF 4.0, you needed to restart the daemons after setting LSF_LOG_MASK in order for your changes to take effect.

LSF 4.0 implements dynamic debugging, which means you do not need to restart the daemons after setting a debugging environment variable.

EGO parameter

EGO_LOG_MASK

Default

LOG_WARNING

See also

LSB_CMD_LOG_MASK, LSB_CMD_LOGDIR, LSB_DEBUG, LSB_DEBUG_CMD, LSB_DEBUG_NQS, LSB_TIME_CMD, LSF_CMD_LOGDIR, LSF_CMD_LOG_MASK, LSF_DEBUG_LIM, LSB_DEBUG_MBD, LSF_DEBUG_RES, LSB_DEBUG_SBD, LSB_DEBUG_SCH, LSF_LOG_MASK, LSF_LOGDIR, LSF_TIME_CMD

LSF_LOG_MASK_WIN

Syntax

LSF_LOG_MASK_WIN=message_log_level

Description

Allows you to reduce the information logged to the LSF Windows event log files. Messages of lower severity than the specified level are discarded.

For all LSF files, the types of messages saved depends on LSF_LOG_MASK, so the threshold for the Windows event logs is either LSF_LOG_MASK or LSF_LOG_MASK_WIN, whichever is higher. LSF_LOG_MASK_WIN is ignored if LSF_LOG_MASK is set to a higher level.

The LSF event log files for Windows are:
  • lim.log.host_name

  • res.log.host_name

  • sbatchd.log.host_name

  • mbatchd.log.host_name

  • pim.log.host_name

The log levels you can specify for this parameter, in order from highest to lowest, are:
  • LOG_ERR

  • LOG_WARNING

  • LOG_INFO

  • LOG_NONE (LSF does not log Windows events)

Default

LOG_ERR

See also

LSF_LOG_MASK

LSF_LOGDIR

Syntax

LSF_LOGDIR=directory

Description

Defines the LSF system log file directory. Error messages from all servers are logged into files in this directory. To effectively use debugging, set LSF_LOGDIR to a directory such as /tmp. This can be done in your own environment from the shell or in lsf.conf.

Windows

LSF_LOGDIR is required on Windows if you wish to enable logging.

You must also define LSF_LOGDIR_USE_WIN_REG=n.

If you define LSF_LOGDIR without defining LSF_LOGDIR_USE_WIN_REG=n, LSF logs error messages into files in the default local directory specified in one of the following Windows registry keys:
  • On Windows 2000, Windows XP, and Windows 2003:

    HKEY_LOCAL_MACHINE\SOFTWARE\Platform Computing Corporation\LSF\cluster_name\LSF_LOGDIR
  • On Windows XP x64 and Windows 2003 x64:

    HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Platform Computing Corporation\LSF\cluster_name\LSF_LOGDIR

If a server is unable to write in the LSF system log file directory, LSF attempts to write to the following directories in the following order:
  • LSF_TMPDIR if defined

  • %TMP% if defined

  • %TEMP% if defined

  • System directory, for example, c:\winnt
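The fallback order above can be sketched as follows. This is a hypothetical illustration, not LSF code: pick_log_dir, the settings dictionary (standing in for both lsf.conf values and environment variables), and the is_writable callback are assumptions made for the example, and the system directory is passed in rather than detected.

```python
def pick_log_dir(logdir, settings, is_writable, system_dir=r"c:\winnt"):
    """Return the first usable log directory, or None if none is writable.

    logdir      -- the configured LSF system log file directory (may be None)
    settings    -- dict that may contain LSF_TMPDIR, TMP, and TEMP
    is_writable -- callable taking a directory and returning True or False
    """
    candidates = [
        logdir,
        settings.get("LSF_TMPDIR"),
        settings.get("TMP"),
        settings.get("TEMP"),
        system_dir,
    ]
    for directory in candidates:
        # Skip locations that are not defined, then test the rest in order.
        if directory and is_writable(directory):
            return directory
    return None
```

Passing the writability check in as a callback keeps the sketch free of file system access, so the ordering can be exercised on any platform.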

UNIX

If a server is unable to write in this directory, the error logs are created in /tmp on UNIX.

If LSF_LOGDIR is not defined, syslog is used to log everything to the system log using the LOG_DAEMON facility. The syslog facility is available by default on most UNIX systems. The /etc/syslog.conf file controls the way messages are logged and the files they are logged to. See the man pages for the syslogd daemon and the syslog function for more information.

Default

Not defined. On UNIX, log messages go to syslog. On Windows, no logging is performed.

See also

LSB_CMD_LOG_MASK, LSB_CMD_LOGDIR, LSB_DEBUG, LSB_DEBUG_CMD, LSB_TIME_CMD, LSF_CMD_LOGDIR, LSF_CMD_LOG_MASK, LSF_LOG_MASK, LSF_LOGDIR_USE_WIN_REG, LSF_TIME_CMD

Files

  • lim.log.host_name

  • res.log.host_name

  • sbatchd.log.host_name

  • sbatchdc.log.host_name (when LSF_DAEMON_WRAP=Y)

  • mbatchd.log.host_name

  • eeventd.log.host_name

  • pim.log.host_name

LSF_LOGDIR_USE_WIN_REG

Syntax

LSF_LOGDIR_USE_WIN_REG=n | N

Description

Windows only.

If set, LSF logs error messages into files in the directory specified by LSF_LOGDIR in lsf.conf.

Use this parameter to enable LSF to save log files in a different location from the default local directory specified in the Windows registry.

If not set, or if set to any value other than N or n, LSF logs error messages into files in the default local directory specified in one of the following Windows registry keys:
  • On Windows 2000, Windows XP, and Windows 2003:

    HKEY_LOCAL_MACHINE\SOFTWARE\Platform Computing Corporation\LSF\cluster_name\LSF_LOGDIR
  • On Windows XP x64 and Windows 2003 x64:

    HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Platform Computing Corporation\LSF\cluster_name\LSF_LOGDIR 

Default

Not set.

LSF uses the default local directory specified in the Windows registry.

See also

LSF_LOGDIR

LSF_LOGFILE_OWNER

Syntax

LSF_LOGFILE_OWNER="user_name"

Description

Specifies an owner for the LSF log files other than the default, the owner of lsf.conf. To specify a Windows user account, include the domain name in uppercase letters (DOMAIN_NAME\user_name).

Default

Not set. The LSF Administrator with root privileges is the owner of LSF log files.

LSF_LSLOGIN_SSH

Syntax

LSF_LSLOGIN_SSH=YES | yes

Description

Enables SSH to secure communication between hosts and during job submission.

SSH is used when running any of the following:

  • Remote log on to a lightly loaded host (lslogin)

  • An interactive job (bsub -IS | -ISp | -ISs)

  • An X-window job (bsub -IX)

  • An externally submitted job that is interactive or X-window (esub)

Default

Not set. LSF uses rlogin to authenticate users.

LSF_MACHDEP

Syntax

LSF_MACHDEP=directory

Description

Specifies the directory in which machine-dependent files are installed. These files cannot be shared across different types of machines.

In clusters with a single host type, LSF_MACHDEP is usually the same as LSF_INDEP. The machine dependent files are the user commands, daemons, and libraries. You should not need to modify this parameter.

As shown in the following list, LSF_MACHDEP is incorporated into other LSF variables.
  • LSF_BINDIR=$LSF_MACHDEP/bin

  • LSF_LIBDIR=$LSF_MACHDEP/lib

  • LSF_SERVERDIR=$LSF_MACHDEP/etc

  • XLSF_UIDDIR=$LSF_MACHDEP/lib/uid
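To make the relationship concrete, the sketch below derives the four directories from a given LSF_MACHDEP value. The function name derived_dirs is hypothetical, and the sketch assumes UNIX-style paths.

```python
import posixpath


def derived_dirs(lsf_machdep):
    """Return the directories that are built from LSF_MACHDEP (illustration only)."""
    return {
        "LSF_BINDIR": posixpath.join(lsf_machdep, "bin"),
        "LSF_LIBDIR": posixpath.join(lsf_machdep, "lib"),
        "LSF_SERVERDIR": posixpath.join(lsf_machdep, "etc"),
        "XLSF_UIDDIR": posixpath.join(lsf_machdep, "lib", "uid"),
    }
```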

Default

/usr/share/lsf

See also

LSF_INDEP

LSF_MANDIR

Syntax

LSF_MANDIR=directory

Description

Directory under which all man pages are installed.

The man pages are placed in the man1, man3, man5, and man8 subdirectories of the LSF_MANDIR directory. This is created by the LSF installation process, and you should not need to modify this parameter.

Man pages are installed in a format suitable for BSD-style man commands.

For most versions of UNIX and Linux, you should add the directory LSF_MANDIR to your MANPATH environment variable. If your system has a man command that does not understand MANPATH, you should either install the man pages in the /usr/man directory or get one of the freely available man programs.

Default

LSF_INDEP/man

LSF_MASTER_LIST

Syntax

LSF_MASTER_LIST="host_name ..."

Description

Required. Defines a list of hosts that are candidates to become the master host for the cluster.

Listed hosts must be defined in lsf.cluster.cluster_name.

Host names are separated by spaces.

Tip:

On UNIX and Linux, master host candidates should share LSF configuration and binaries. On Windows, configuration files are shared, but not binaries.

Starting in LSF 7, LSF_MASTER_LIST must be defined in lsf.conf.

If EGO is enabled, LSF_MASTER_LIST can only be defined in lsf.conf. EGO_MASTER_LIST can only be defined in ego.conf. EGO_MASTER_LIST cannot be defined in lsf.conf. LSF_MASTER_LIST cannot be defined in ego.conf.

LIM reads EGO_MASTER_LIST wherever it is defined. If both LSF_MASTER_LIST and EGO_MASTER_LIST are defined, the value of EGO_MASTER_LIST in ego.conf is taken. To avoid errors, you should make sure that the value of LSF_MASTER_LIST matches the value of EGO_MASTER_LIST, or define only EGO_MASTER_LIST.

If EGO is disabled, ego.conf is not loaded and the value of LSF_MASTER_LIST defined in lsf.conf is taken.
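The precedence described above can be summarized in a short Python sketch. The function effective_master_list and the two dictionaries standing in for lsf.conf and ego.conf are hypothetical. The case where EGO is enabled but only LSF_MASTER_LIST is defined is an assumption: the sketch falls back to LSF_MASTER_LIST.

```python
def effective_master_list(lsf_conf, ego_conf, ego_enabled):
    """Return the master candidate hosts that would apply (illustration only).

    lsf_conf / ego_conf -- dicts of parameters from lsf.conf and ego.conf
    ego_enabled         -- True when EGO is enabled in the cluster
    """
    if ego_enabled and ego_conf.get("EGO_MASTER_LIST"):
        # With EGO enabled, the definition in ego.conf takes precedence.
        return ego_conf["EGO_MASTER_LIST"].split()
    # With EGO disabled, ego.conf is not loaded at all.
    return lsf_conf.get("LSF_MASTER_LIST", "").split()
```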

When you run lsadmin reconfig to reconfigure the cluster, only the master LIM candidates read lsf.shared and lsf.cluster.cluster_name to get updated information. The elected master LIM sends configuration information to slave LIMs.

If you have a large number of non-master hosts, you should configure LSF_LIM_IGNORE_CHECKSUM=Y to ignore warning messages like the following logged to lim log files on non-master hosts.
Aug 26 13:47:35 2006 9746 4 8.0 xdr_loadvector: Sender <10.225.36.46:9999> has a different configuration

Interaction with LSF_SERVER_HOSTS

You can use the same list of hosts, or a subset of the master host list defined in LSF_MASTER_LIST, in LSF_SERVER_HOSTS. If you include the primary master host in LSF_SERVER_HOSTS, you should define it as the last host of the list.

If LSF_ADD_CLIENTS is defined in install.config at installation, lsfinstall automatically appends the hosts in LSF_MASTER_LIST to the list of hosts in LSF_SERVER_HOSTS so that the primary master host is last. For example:
LSF_MASTER_LIST="lsfmaster hostE"
LSF_SERVER_HOSTS="hostB hostC hostD hostE lsfmaster"
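A quick consistency check for this recommendation might look like the following sketch. The function name primary_master_is_last is hypothetical, and the sketch assumes that the primary master host is the first host named in LSF_MASTER_LIST, which matches the example above but is not stated in this section.

```python
def primary_master_is_last(master_list, server_hosts):
    """Check that the primary master, if present, ends LSF_SERVER_HOSTS.

    Both arguments are space-separated host name strings. Assumption: the
    first entry of master_list is the primary master host.
    """
    masters = master_list.split()
    servers = server_hosts.split()
    if not masters:
        raise ValueError("LSF_MASTER_LIST must name at least one host")
    primary = masters[0]
    # Nothing to check when the primary master is not in the server list.
    return primary not in servers or servers[-1] == primary
```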

The value of LSF_SERVER_HOSTS is not changed during upgrade.

EGO parameter

EGO_MASTER_LIST

Default

Defined at installation

See also

LSF_LIM_IGNORE_CHECKSUM

LSF_MASTER_NSLOOKUP_TIMEOUT

Syntax

LSF_MASTER_NSLOOKUP_TIMEOUT=time_milliseconds

Description

Timeout in milliseconds that the master LIM waits for DNS host name lookup.

If LIM spends a lot of time calling DNS to look up a host name, LIM appears to hang.

This parameter is used by the master LIM only. Only the master LIM detects this parameter and enables the DNS lookup timeout.

Default

Not defined. No timeout for DNS lookup.

See also

LSF_LIM_IGNORE_CHECKSUM

LSF_MAX_TRY_ADD_HOST

Syntax

LSF_MAX_TRY_ADD_HOST=integer

Description

When a slave LIM on a dynamically added host sends an add host request to the master LIM, but the master LIM cannot add the host for some reason, the slave LIM tries again. LSF_MAX_TRY_ADD_HOST specifies how many times the slave LIM retries the add host request before giving up.

Default

20

LSF_MC_NON_PRIVILEGED_PORTS

Syntax

LSF_MC_NON_PRIVILEGED_PORTS=y | Y

Description

MultiCluster only. If this parameter is enabled in one cluster, it must be enabled in all clusters.

Specify Y to make LSF daemons use non-privileged ports for communication across clusters.

Compatibility

This disables privileged port daemon authentication, which is a security feature. If security is a concern, you should use eauth for LSF daemon authentication (see LSF_AUTH_DAEMONS in lsf.conf).

Default

Not defined. LSF daemons use privileged port authentication.

LSF_MONITOR_LICENSE_TOOL

Syntax

LSF_MONITOR_LICENSE_TOOL=y | Y

Description

Specify Y to enable data collection by lim for the command option lsadmin lsflic.

Default

Not defined. lim ignores requests from lsadmin, closing the channel.

LSF_MISC

Syntax

LSF_MISC=directory

Description

Directory in which miscellaneous machine independent files, such as example source programs and scripts, are installed.

Default

LSF_CONFDIR/misc

LSF_NIOS_DEBUG

Syntax

LSF_NIOS_DEBUG=1

Description

Enables NIOS debugging for interactive jobs (if LSF_NIOS_DEBUG=1).

NIOS debug messages are written to standard error.

This parameter can also be defined as an environment variable.

When LSF_NIOS_DEBUG and LSF_CMD_LOGDIR are defined, NIOS debug messages are logged in nios.log.host_name in the location specified by LSF_CMD_LOGDIR.

If LSF_NIOS_DEBUG is defined, and the directory defined by LSF_CMD_LOGDIR is inaccessible, NIOS debug messages are logged to /tmp/nios.log.host_name instead of stderr.

On Windows, NIOS debug messages are also logged to the temporary directory.

Default

Not defined

See also

LSF_CMD_LOGDIR, LSF_CMD_LOG_MASK, LSF_LOG_MASK, LSF_LOGDIR

LSF_NIOS_ERR_LOGDIR

Syntax

LSF_NIOS_ERR_LOGDIR=directory

Description

Applies to Windows only.

If LSF_NIOS_ERR_LOGDIR is specified, LSF logs NIOS errors to directory/nios.error.log.hostname.txt.

If the attempt fails, LSF tries to write to another directory instead. The order is:

  1. the specified log directory

  2. LSF_TMPDIR

  3. %TMP%

  4. %TEMP%

  5. the system directory, for example, C:\winnt

If LSF_NIOS_DEBUG is also specified, NIOS debugging overrides the LSF_NIOS_ERR_LOGDIR setting.

LSF_NIOS_ERR_LOGDIR is an alternative to using the NIOS debug functionality.

This parameter can also be defined as an environment variable.

Default

Not defined

See also

LSF_NIOS_DEBUG, LSF_CMD_LOGDIR

LSF_NIOS_JOBSTATUS_INTERVAL

Syntax

LSF_NIOS_JOBSTATUS_INTERVAL=time_minutes

Description

Applies only to interactive batch jobs.

Time interval at which NIOS polls mbatchd to check if a job is still running. Used to retrieve a job’s exit status in the case of an abnormal exit of NIOS, due to a network failure for example.

Use this parameter if you run interactive jobs and you have scripts that depend on an exit code being returned.

When this parameter is not defined and a network connection is lost, mbatchd cannot communicate with NIOS and the return code of a job is not retrieved.

When this parameter is defined, before exiting, NIOS polls mbatchd on the interval defined by LSF_NIOS_JOBSTATUS_INTERVAL to check if a job is still running. NIOS continues to poll mbatchd until it receives an exit code or mbatchd responds that the job does not exist (if the job has already been cleaned from memory for example).

If an exit code cannot be retrieved, NIOS generates an error message and the code -11.

Valid values

Any integer greater than zero

Default

Not defined

Notes

Set this parameter to large intervals such as 15 minutes or more so that performance is not negatively affected if interactive jobs are pending for too long. NIOS always calls mbatchd on the defined interval to confirm that a job is still pending and this may add load to mbatchd.

See also

Environment variable LSF_NIOS_PEND_TIMEOUT

LSF_NIOS_MAX_TASKS

Syntax

LSF_NIOS_MAX_TASKS=integer

Description

Specifies the maximum number of NIOS tasks.

Default

Not defined

LSF_NIOS_RES_HEARTBEAT

Syntax

LSF_NIOS_RES_HEARTBEAT=time_minutes

Description

Applies only to interactive non-parallel batch jobs.

Defines how long NIOS waits before sending a message to RES to determine if the connection is still open.

Use this parameter to ensure NIOS exits when a network failure occurs instead of waiting indefinitely for notification that a job has been completed. When a network connection is lost, RES cannot communicate with NIOS and as a result, NIOS does not exit.

When this parameter is defined, if there has been no communication between RES and NIOS for the defined period of time, NIOS sends a message to RES to see if the connection is still open. If the connection is no longer available, NIOS exits.

Valid values

Any integer greater than zero

Default

Not defined

Notes

The time you set this parameter to depends on how long you want to allow NIOS to wait before exiting. Typically, it can be a number of hours or days. Too low a number may add load to the system.

LSF_NON_PRIVILEGED_PORTS

Syntax

LSF_NON_PRIVILEGED_PORTS=y | Y

Description

Disables privileged ports usage.

By default, LSF daemons and clients running under the root account use privileged ports to communicate with each other. If neither LSF_NON_PRIVILEGED_PORTS nor LSF_AUTH is defined in lsf.conf, LSF daemons authenticate a request by checking that it was sent from a privileged port.

If LSF_NON_PRIVILEGED_PORTS=Y is defined, LSF clients (LSF commands and daemons) do not use privileged ports to communicate with daemons, and LSF daemons do not check the source port of incoming requests for authentication.

LSF_PAM_APPL_CHKPNT

Syntax

LSF_PAM_APPL_CHKPNT=Y | N

Description

When set to Y, allows PAM to function together with application checkpointing support.

Default

Y

LSF_PAM_CLEAN_JOB_DELAY

Syntax

LSF_PAM_CLEAN_JOB_DELAY=time_seconds

Description

The number of seconds LSF waits before killing a parallel job with failed tasks. Specifying LSF_PAM_CLEAN_JOB_DELAY implies that if any parallel tasks fail, the entire job should exit without running the other tasks in the job. The job is killed if any task exits with a non-zero exit code.

Specify a value greater than or equal to zero (0).

Applies only to PAM jobs.

Default

Undefined: LSF kills the job immediately

LSF_PAM_HOSTLIST_USE

Syntax

LSF_PAM_HOSTLIST_USE=unique

Description

Used to start applications that use both OpenMP and MPI.

Valid values

unique

Default

Not defined

Notes

At job submission, LSF reserves the correct number of processors and PAM starts only 1 process per host. For example, to reserve 32 processors and run on 4 processes per host, resulting in the use of 8 hosts:
bsub -n 32 -R "span[ptile=4]" pam yourOpenMPJob

Where defined

This parameter can alternatively be set as an environment variable. For example:
setenv LSF_PAM_HOSTLIST_USE unique

LSF_PAM_PLUGINDIR

Syntax

LSF_PAM_PLUGINDIR=path

Description

The path to libpamvcl.so. Used with Platform LSF HPC features.

Default

Path to LSF_LIBDIR

LSF_PAM_USE_ASH

Syntax

LSF_PAM_USE_ASH=y | Y

Description

Enables LSF to use the SGI IRIX Array Session Handles (ASH) to propagate signals to the parallel jobs.

See the IRIX system documentation and the array_session(5) man page for more information about array sessions.

Default

Not defined

LSF_PASSWD_DIR

Syntax

LSF_PASSWD_DIR=file_path

Description

Defines a location for LSF to load and update the passwd.lsfuser file.

Specify the full path to a shared directory accessible by all master candidate hosts. The LSF lim daemon must have read and write permissions on this directory.

By default, passwd.lsfuser is located in $LSF_CONFDIR. The default location is only used if LSF_PASSWD_DIR is undefined; if you define a new location and lim fails to access passwd.lsfuser in LSF_PASSWD_DIR, it will not check $LSF_CONFDIR.

You must restart lim to make changes take effect.

Default

Not defined (passwd.lsfuser is located in $LSF_CONFDIR)

LSF_PIM_INFODIR

Syntax

LSF_PIM_INFODIR=path

Description

The path to where PIM writes the pim.info.host_name file.

Specifies the path to where the process information is stored. The process information resides in the file pim.info.host_name. The PIM also reads this file when it starts so that it can accumulate the resource usage of dead processes for existing process groups.

EGO parameter

EGO_PIM_INFODIR

Default

Not defined. The system uses /tmp.

LSF_PIM_LINUX_ENHANCE

Syntax

LSF_PIM_LINUX_ENHANCE=Y | N

Description

When enabled, the PIM daemon reports proportional memory utilization for each process attached to a shared memory segment. The sum total of memory utilization of all processes on the host is now accurately reflected in the total memory used. (The Linux kernel must be version 2.6.14 or newer.)

When EGO_PIM_SWAP_REPORT is set, the swap amount is correctly reported. The swap amount is the virtual memory minus the rss value in the static Linux file.

Applies only to Linux operating systems and Red Hat Enterprise Linux 4.7.5.0.

Default

Not defined.

LSF_PIM_SLEEPTIME

Syntax

LSF_PIM_SLEEPTIME=time_seconds

Description

The reporting period for PIM.

PIM updates the process information every 15 minutes unless an application queries this information. If an application requests the information, PIM updates the process information every LSF_PIM_SLEEPTIME seconds. If the information is not queried by any application for more than 5 minutes, the PIM reverts back to the 15 minute update period.

EGO parameter

EGO_PIM_SLEEPTIME

Default

30 seconds

LSF_PIM_SLEEPTIME_UPDATE

Syntax

LSF_PIM_SLEEPTIME_UPDATE=y | n

Description

UNIX only.

Use this parameter to improve job throughput and reduce a job’s start time if there are many jobs running simultaneously on a host. This parameter reduces communication traffic between sbatchd and PIM on the same host.

When this parameter is not defined or set to n, sbatchd queries PIM as needed for job process information.

When this parameter is defined, sbatchd does not query PIM immediately as it needs information; sbatchd only queries PIM every LSF_PIM_SLEEPTIME seconds.

Limitations

When this parameter is defined:
  • sbatchd may be intermittently unable to retrieve process information for jobs whose run time is shorter than LSF_PIM_SLEEPTIME.

  • It may take longer to view resource usage with bjobs -l.

EGO parameter

EGO_PIM_SLEEPTIME_UPDATE

Default

Not defined

LSF_POE_TIMEOUT_BIND

Syntax

LSF_POE_TIMEOUT_BIND=time_seconds

Description

Specifies the time in seconds for the poe_w wrapper to keep trying to set up a server socket to listen on.

poe_w is the wrapper for the IBM poe driver program.

LSF_POE_TIMEOUT_BIND can also be set as an environment variable for poe_w to read.

Default

120 seconds

LSF_POE_TIMEOUT_SELECT

Syntax

LSF_POE_TIMEOUT_SELECT=time_seconds

Description

Specifies the time in seconds for the poe_w wrapper to wait for connections from the pmd_w wrapper. pmd_w is the wrapper for pmd (IBM PE Partition Manager Daemon).

LSF_POE_TIMEOUT_SELECT can also be set as an environment variable for poe_w to read.

Default

160 seconds

LSF_REMOTE_COPY_CMD

Syntax

LSF_REMOTE_COPY_CMD="copy_command"

Description

UNIX only. Specifies the shell command or script to use with the following LSF commands if RES fails to copy the file between hosts.

  • lsrcp

  • bsub -i, -f, -is, -Zs

  • bmod -Zs

By default, rcp is used for these commands.

There is no need to restart any daemons when this parameter changes.

For example, to use scp instead of rcp for remote file copying, specify:

LSF_REMOTE_COPY_CMD="scp -B -o 'StrictHostKeyChecking no'"

You can also configure ssh options such as BatchMode, StrictHostKeyChecking in the global SSH_ETC/ssh_config file or $HOME/.ssh/config.

When remote copy of a file via RES fails, the environment variable LSF_LSRCP_ERRNO is set to the system-defined errno. You can use this variable in a user-defined shell script executed by lsrcp. The script can do the appropriate cleanup, recopy, or retry, or it can just exit without invoking any other copy command.

LSF automatically appends two parameters before executing the command:

  • The first parameter is the source file path.

  • The second parameter is the destination file path.
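The way the two paths are appended can be pictured with the sketch below. It is an illustration under stated assumptions: build_copy_argv is a hypothetical name, and the sketch assumes the configured value is split with shell-like quoting rules, which this reference does not specify.

```python
import shlex


def build_copy_argv(copy_cmd, source, destination):
    """Return the argument list for a remote copy (illustration only).

    copy_cmd is the LSF_REMOTE_COPY_CMD value. The source path is appended
    first and the destination path second.
    """
    return shlex.split(copy_cmd) + [source, destination]
```

For the scp example above, the quoted option StrictHostKeyChecking no stays a single argument, followed by the source path and then the destination path.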

Valid values

Values are passed directly through. Any valid scp, rcp, or custom copy commands and options are supported except for compound multi-commands. For example, set LSF_REMOTE_COPY_CMD="scp -B -o 'StrictHostKeyChecking no'".

To avoid a recursive loop, the value of LSF_REMOTE_COPY_CMD must not be lsrcp or a shell script executing lsrcp.

Default

Not defined.

LSF_RES_ACCT

Syntax

LSF_RES_ACCT=time_milliseconds | 0

Description

If this parameter is defined, RES logs information for completed and failed tasks by default (see lsf.acct).

The value for LSF_RES_ACCT is specified in terms of consumed CPU time (milliseconds). Only tasks that have consumed more than the specified CPU time are logged.

If this parameter is defined as LSF_RES_ACCT=0, then all tasks are logged.

For those tasks that consume the specified amount of CPU time, RES generates a record and appends the record to the task log file lsf.acct.host_name. This file is located in the LSF_RES_ACCTDIR directory.

If this parameter is not defined, the LSF administrator must use the lsadmin command (see lsadmin) to turn task logging on after RES has started.
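The logging decision described above can be sketched as follows. The function should_log_task is hypothetical, and None is used here to represent an undefined LSF_RES_ACCT with task logging not yet turned on through lsadmin.

```python
def should_log_task(cpu_ms, res_acct_ms):
    """Return True if RES would log a task under the rules above (illustration).

    cpu_ms      -- CPU time consumed by the task, in milliseconds
    res_acct_ms -- LSF_RES_ACCT value in milliseconds, or None if not defined
    """
    if res_acct_ms is None:
        # Not defined: logging stays off until the administrator enables it.
        return False
    if res_acct_ms == 0:
        # LSF_RES_ACCT=0 logs all tasks.
        return True
    # Otherwise only tasks that consumed more than the threshold are logged.
    return cpu_ms > res_acct_ms
```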

Default

Not defined

See also

LSF_RES_ACCTDIR

LSF_RES_ACCTDIR

Syntax

LSF_RES_ACCTDIR=directory

Description

The directory in which the RES task log file lsf.acct.host_name is stored.

If LSF_RES_ACCTDIR is not defined, the log file is stored in the /tmp directory.

Default

(UNIX) /tmp

(Windows) C:\temp

See also

LSF_RES_ACCT

LSF_RES_ACTIVE_TIME

Syntax

LSF_RES_ACTIVE_TIME=time_seconds

Description

Time in seconds before LIM reports that RES is down.

Minimum value

10 seconds

Default

90 seconds

LSF_RES_CLIENT_TIMEOUT

Syntax

LSF_RES_CLIENT_TIMEOUT=time_minutes

Description

Specifies in minutes how long an application RES waits for a new task before exiting.

CAUTION:

If you use the LSF API to run remote tasks and you define this parameter with a timeout, the remote execution of the new task fails (for example, ls_rtask()).

Default

The parameter is not set; the application RES waits indefinitely for a new task until the client tells it to quit.

LSF_RES_CONNECT_RETRY

Syntax

LSF_RES_CONNECT_RETRY=integer | 0

Description

The number of attempts by RES to reconnect to NIOS.

If LSF_RES_CONNECT_RETRY is not defined, the default value is used.

Default

0

See also

LSF_NIOS_RES_HEARTBEAT

LSF_RES_DEBUG

Syntax

LSF_RES_DEBUG=1 | 2

Description

Sets RES to debug mode.

If LSF_RES_DEBUG is defined, the Remote Execution Server (RES) operates in single user mode. No security checking is performed, so RES should not run as root. RES does not look in the services database for the RES service port number. Instead, it uses port number 36002 unless LSF_RES_PORT has been defined.

Specify 1 for this parameter unless you are testing RES.

Valid values

LSF_RES_DEBUG=1

RES runs in the background with no associated control terminal.

LSF_RES_DEBUG=2

RES runs in the foreground and prints error messages to tty.

Default

Not defined

See also

LSF_LIM_DEBUG, LSF_CMD_LOGDIR, LSF_CMD_LOG_MASK, LSF_LOG_MASK, LSF_LOGDIR

LSF_RES_PORT

See LSF_LIM_PORT, LSF_RES_PORT, LSB_MBD_PORT, LSB_SBD_PORT.

LSF_RES_RLIMIT_UNLIM

Syntax

LSF_RES_RLIMIT_UNLIM=cpu | fsize | data | stack | core | vmem

Description

By default, RES sets the hard limits for a remote task to be the same as the hard limits of the local process. This parameter specifies those hard limits which are to be set to unlimited, instead of inheriting those of the local process.

Valid values are cpu, fsize, data, stack, core, and vmem, for CPU, file size, data size, stack, core size, and virtual memory limits, respectively.

Example

The following example sets the CPU, core size, and stack hard limits to be unlimited for all remote tasks:

LSF_RES_RLIMIT_UNLIM="cpu core stack"

Default

Not defined

LSF_RES_TIMEOUT

Syntax

LSF_RES_TIMEOUT=time_seconds

Description

Timeout when communicating with RES.

Default

15

LSF_ROOT_REX

Syntax

LSF_ROOT_REX=local

Description

UNIX only.

Allows root remote execution privileges (subject to identification checking) on remote hosts, for both interactive and batch jobs. Causes RES to accept requests from the superuser (root) on remote hosts, subject to identification checking.

If LSF_ROOT_REX is not defined, remote execution requests from user root are refused.

Theory

Sites that have separate root accounts on different hosts within the cluster should not define LSF_ROOT_REX. Otherwise, this setting should be based on local security policies.

The lsf.conf file is host-type specific and not shared across different platforms. You must make sure that lsf.conf is changed consistently for all your host types.

Default

Not defined. Root execution is not allowed.

See also

LSF_TIME_CMD, LSF_AUTH

LSF_RSH

Syntax

LSF_RSH=command [command_options]

Description

Specifies shell commands to use when the following LSF commands require remote execution:
  • badmin hstartup

  • bpeek

  • lsadmin limstartup

  • lsadmin resstartup

  • lsfrestart

  • lsfshutdown

  • lsfstartup

  • lsrcp

By default, rsh is used for these commands. Use LSF_RSH to enable support for ssh.

EGO parameter

EGO_RSH

Default

Not defined

Example

To use an ssh command before trying rsh for LSF commands, specify:
LSF_RSH="ssh -o 'PasswordAuthentication no' -o 'StrictHostKeyChecking no'"

ssh options such as PasswordAuthentication and StrictHostKeyChecking can also be configured in the global SSH_ETC/ssh_config file or $HOME/.ssh/config.

See also

ssh, ssh_config

LSF_SECUREDIR

Syntax

LSF_SECUREDIR=path

Description

Windows only; mandatory if using lsf.sudoers.

Path to the directory that contains the file lsf.sudoers (shared on an NTFS file system).

LSF_SERVER_HOSTS

Syntax

LSF_SERVER_HOSTS="host_name ..."

Description

Defines one or more server hosts that the client should contact to find a Load Information Manager (LIM). LSF server hosts are hosts that run LSF daemons and provide load-sharing services. Client hosts are hosts that only run LSF commands or applications but do not provide services to any hosts.

Important:

LSF_SERVER_HOSTS is required for non-shared slave hosts.

Use this parameter to ensure that commands execute successfully when no LIM is running on the local host, or when the local LIM has just started. The client contacts the LIM on one of the LSF_SERVER_HOSTS and executes the command, provided that at least one of the hosts defined in the list has a LIM that is up and running.

If LSF_SERVER_HOSTS is not defined, the client tries to contact the LIM on the local host.

The host names in LSF_SERVER_HOSTS must be enclosed in quotes and separated by white space. For example:
LSF_SERVER_HOSTS="hostA hostD hostB"

The parameter string can include up to 4094 characters for UNIX or 255 characters for Windows.
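The quoting, separator, and length rules above can be checked before a value is written to lsf.conf. The following is a minimal sketch, not an LSF tool: the function name parse_server_hosts and its platform argument are hypothetical, and the sketch assumes that the length limit counts the characters between the quotes.

```python
# Hypothetical helper for illustration; not part of LSF.
MAX_LEN = {"unix": 4094, "windows": 255}  # limits quoted in the text above


def parse_server_hosts(line, platform="unix"):
    """Return the host list from an LSF_SERVER_HOSTS line, or raise ValueError."""
    name, sep, value = line.strip().partition("=")
    if name != "LSF_SERVER_HOSTS" or not sep:
        raise ValueError("not an LSF_SERVER_HOSTS definition")
    if len(value) < 2 or value[0] != '"' or value[-1] != '"':
        raise ValueError("host list must be enclosed in quotes")
    body = value[1:-1]
    # Assumption: the limit applies to the text between the quotes.
    if len(body) > MAX_LEN[platform]:
        raise ValueError("host list exceeds %d characters" % MAX_LEN[platform])
    hosts = body.split()  # host names are separated by white space
    if not hosts:
        raise ValueError("at least one host name is required")
    return hosts
```

Called with the example line above, the helper returns the three host names in order; an unquoted list or an over-long Windows value raises ValueError.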

Interaction with LSF_MASTER_LIST

Starting in LSF 7, LSF_MASTER_LIST must be defined in lsf.conf. You can use the same list of hosts, or a subset of the master host list, in LSF_SERVER_HOSTS. If you include the primary master host in LSF_SERVER_HOSTS, you should define it as the last host of the list.

If LSF_ADD_CLIENTS is defined in install.config at installation, lsfinstall automatically appends the hosts in LSF_MASTER_LIST to the list of hosts in LSF_SERVER_HOSTS so that the primary master host is last. For example:
LSF_MASTER_LIST="lsfmaster hostE"
LSF_SERVER_HOSTS="hostB hostC hostD hostE lsfmaster"
LSF_ADD_CLIENTS="clientHostA"

The value of LSF_SERVER_HOSTS is not changed during upgrade.
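The ordering in the lsfinstall example can be sketched as follows. This is an illustration only, not the lsfinstall implementation: the function name append_masters is hypothetical, and moving a master host that is already listed to the end of the list is an assumption.

```python
def append_masters(server_hosts, master_list):
    """Append master hosts so that the primary master (first in master_list) is last."""
    masters = set(master_list)
    # Keep the non-master server hosts in their existing order.
    result = [h for h in server_hosts if h not in masters]
    # Master candidates go at the end in reverse order, so the primary master is last.
    result.extend(reversed(master_list))
    return result
```

With server hosts hostB, hostC, and hostD and the master list from the example, the result is the LSF_SERVER_HOSTS value shown above.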

Default

Not defined

See also

LSF_MASTER_LIST

LSF_SERVERDIR

Syntax

LSF_SERVERDIR=directory

Description

Directory in which all server binaries and shell scripts are installed.

These include lim, res, nios, sbatchd, mbatchd, and mbschd. If you use external executables such as elim, eauth, eexec, or esub, they are also installed in this directory.

Default

LSF_MACHDEP/etc

See also

LSB_ECHKPNT_METHOD_DIR

LSF_SHELL_AT_USERS

Syntax

LSF_SHELL_AT_USERS="user_name user_name ..."

Description

Applies to lstcsh only. Specifies users who are allowed to use @ for host redirection. Users not specified with this parameter cannot use host redirection in lstcsh. To specify a Windows user account, include the domain name in uppercase letters (DOMAIN_NAME\user_name).

If this parameter is not defined, all users are allowed to use @ for host redirection in lstcsh.

Default

Not defined

LSF_SHIFT_JIS_INPUT

Syntax

LSF_SHIFT_JIS_INPUT=y | n

Description

Enables LSF to accept Shift-JIS character encoding for job information (for example, user names, queue names, job names, job group names, project names, commands and arguments, esub parameters, and external messages).

Default

n

LSF_STRICT_CHECKING

Syntax

LSF_STRICT_CHECKING=Y

Description

If set, enables stricter checking of communications between LSF daemons and between LSF commands and daemons when LSF is used in an untrusted environment, such as a public network like the Internet.

If you enable this parameter, you must enable it in the entire cluster, as it affects all communications within LSF. If it is used in a MultiCluster environment, it must be enabled in all clusters, or none. Ensure that all binaries and libraries are upgraded to LSF Version 7, including LSF_BINDIR, LSF_SERVERDIR and LSF_LIBDIR directories, if you enable this parameter.

If your site uses any programs that use the LSF base and batch APIs, or LSF MPI (Message Passing Interface), they need to be recompiled using the LSF Version 7 APIs before they can work properly with this option enabled.
Important:

You must shut down the entire cluster before enabling or disabling this parameter.

If LSF_STRICT_CHECKING is defined, and your cluster has slave hosts that are dynamically added, LSF_STRICT_CHECKING must be configured in the local lsf.conf on all slave hosts.

Valid value

Set to Y to enable this feature.

Default

Not defined. LSF is secure in trusted environments.

LSF_STRICT_RESREQ

Syntax

LSF_STRICT_RESREQ=Y | N

Description

When LSF_STRICT_RESREQ=Y, the resource requirement selection string must conform to the stricter resource requirement syntax described in Administering Platform LSF. The strict resource requirement syntax only applies to the select section. It does not apply to the other resource requirement sections (order, rusage, same, span, or cu).

When LSF_STRICT_RESREQ=Y in lsf.conf, LSF rejects resource requirement strings where an rusage section contains a non-consumable resource.

When LSF_STRICT_RESREQ=N, the default resource requirement selection string evaluation is performed.

Default

N

LSF_STRIP_DOMAIN

Syntax

LSF_STRIP_DOMAIN=domain_suffix[:domain_suffix ...]

Description

(Optional) If all of the hosts in your cluster can be reached using short host names, you can configure LSF to use the short host names by specifying the portion of the domain name to remove. If your hosts are in more than one domain or have more than one domain name, you can specify more than one domain suffix to remove, separated by a colon (:).

For example, given this definition of LSF_STRIP_DOMAIN,
LSF_STRIP_DOMAIN=.foo.com:.bar.com

LSF accepts hostA, hostA.foo.com, and hostA.bar.com as names for host hostA, and uses the name hostA in all output. The leading period ‘.’ is required.

Example:
LSF_STRIP_DOMAIN=.platform.com:.generic.com

In the above example, LSF accepts hostA, hostA.platform.com, and hostA.generic.com as names for hostA, and uses the name hostA in all output.
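The suffix matching described above can be sketched in a few lines. This is an illustration, not LSF code: the function name strip_domain is hypothetical, and the sketch assumes a case-sensitive match in which the first matching suffix wins.

```python
def strip_domain(host_name, lsf_strip_domain):
    """Return host_name with the first matching configured suffix removed."""
    for suffix in lsf_strip_domain.split(":"):
        # Each suffix must start with a period, as required above.
        if suffix.startswith(".") and host_name.endswith(suffix):
            return host_name[: -len(suffix)]
    return host_name
```

For LSF_STRIP_DOMAIN=.foo.com:.bar.com, the names hostA, hostA.foo.com, and hostA.bar.com all map to hostA, while a name in any other domain is returned unchanged.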

Setting this parameter only affects host names displayed through LSF; it does not affect DNS host lookup.

After adding or changing LSF_STRIP_DOMAIN, use lsadmin reconfig and badmin mbdrestart to reconfigure your cluster.

EGO parameter

EGO_STRIP_DOMAIN

Default

Not defined

LSF_TIME_CMD

Syntax

LSF_TIME_CMD=timing_level

Description

The timing level for checking how long LSF commands run. Time usage is logged in milliseconds. Specify a positive integer.

Default

Not defined

See also

LSB_TIME_MBD, LSB_TIME_SBD, LSB_TIME_CMD, LSF_TIME_LIM, LSF_TIME_RES

LSF_TIME_LIM

Syntax

LSF_TIME_LIM=timing_level

Description

The timing level for checking how long LIM routines run.

Time usage is logged in milliseconds. Specify a positive integer.

EGO parameter

EGO_TIME_LIM

Default

Not defined

See also

LSB_TIME_CMD, LSB_TIME_MBD, LSB_TIME_SBD, LSF_TIME_RES

LSF_TIME_RES

Syntax

LSF_TIME_RES=timing_level

Description

The timing level for checking how long RES routines run.

Time usage is logged in milliseconds. Specify a positive integer.

LSF_TIME_RES is not supported on Windows.

Default

Not defined

See also

LSB_TIME_CMD, LSB_TIME_MBD, LSB_TIME_SBD, LSF_TIME_LIM

LSF_TMPDIR

Syntax

LSF_TMPDIR=directory

Description

Specifies the path and directory for temporary job output.

When LSF_TMPDIR is defined in lsf.conf, LSF creates a temporary directory under the directory specified by LSF_TMPDIR on the execution host when a job is started and sets the temporary directory environment variable (TMPDIR) for the job.

The name of the temporary directory has the following format:
$LSF_TMPDIR/job_ID.tmpdir

On UNIX, the directory has the permission 0700 and is owned by the execution user.

After adding LSF_TMPDIR to lsf.conf, use badmin hrestart all to reconfigure your cluster.

If LSB_SET_TMPDIR=Y, the environment variable TMPDIR is set equal to the path specified by LSF_TMPDIR.

If the path specified by LSF_TMPDIR does not exist, the value of TMPDIR is set to the default path /tmp/job_ID.tmpdir.
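The naming rule and the /tmp fallback above can be sketched as follows. This is a UNIX-only illustration, not LSF code: the function name job_tmpdir is hypothetical, and the sketch covers only the case where LSF_TMPDIR is defined in lsf.conf.

```python
import os


def job_tmpdir(job_id, lsf_tmpdir):
    """Return the per-job temporary directory path for a defined LSF_TMPDIR."""
    # Fall back to /tmp when the configured directory does not exist.
    if os.path.isdir(lsf_tmpdir):
        base = lsf_tmpdir
    else:
        base = "/tmp"
    return "%s/%s.tmpdir" % (base, job_id)
```

For job 1234 and an existing LSF_TMPDIR=/usr/share/lsf_tmp, this yields /usr/share/lsf_tmp/1234.tmpdir; if the directory is missing, it yields /tmp/1234.tmpdir.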

Valid values

Specify any valid path up to a maximum length of 256 characters. The 256-character maximum path length includes the temporary directories and files that the system creates as jobs run. Keep the path that you specify for LSF_TMPDIR as short as possible to avoid exceeding this limit.

UNIX

Specify an absolute path. For example:
LSF_TMPDIR=/usr/share/lsf_tmp

Windows

Specify a UNC path or a path with a drive letter. For example:
LSF_TMPDIR=\\HostA\temp\lsf_tmp
LSF_TMPDIR=D:\temp\lsf_tmp

Temporary directory for tasks launched by blaunch

By default, LSF creates a temporary directory for a job only on the first execution host. If LSF_TMPDIR is set in lsf.conf, the path of the job temporary directory on the first execution host is set to LSF_TMPDIR/job_ID.tmpdir.

If LSB_SET_TMPDIR=Y, the environment variable TMPDIR is set equal to the path specified by LSF_TMPDIR.

Tasks launched through the blaunch distributed application framework make use of the LSF temporary directory specified by LSF_TMPDIR:
  • When the environment variable TMPDIR is set on the first execution host, the blaunch framework propagates this environment variable to all execution hosts when launching remote tasks

  • The job RES or the task RES creates the directory specified by TMPDIR if it does not already exist before starting the job

  • The directory created by the job RES or task RES has permission 0700 and is owned by the execution user

  • If the TMPDIR directory was created by the task RES, LSF deletes the temporary directory and its contents when the task is complete

  • If the TMPDIR directory was created by the job RES, LSF deletes the temporary directory and its contents when the job is done

  • If the TMPDIR directory is on a shared file system, it is assumed to be shared by all the hosts allocated to the blaunch job, so LSF does not remove TMPDIR directories created by the job RES or task RES

Default

By default, LSF_TMPDIR is not enabled. If LSF_TMPDIR is not specified in lsf.conf, this parameter is defined as follows:
  • On UNIX: $TMPDIR/job_ID.tmpdir or /tmp/job_ID.tmpdir

  • On Windows: %TMP%, %TEMP%, or %SystemRoot%

LSF_ULDB_DOMAIN

Syntax

LSF_ULDB_DOMAIN="domain_name ..."

Description

LSF_ULDB_DOMAIN specifies the name of the LSF domain in the ULDB domain directive. A domain definition of name domain_name must be configured in the SGI IRIX jlimit.in input file.

Used with IRIX User Limits Database (ULDB). Configures LSF so that jobs submitted to a host with the IRIX job limits option installed are subject to the job limits configured in the IRIX User Limits Database (ULDB).

The ULDB contains job limit information that system administrators use to control access to a host on a per-user basis. The job limits in the ULDB override the system default values for both job limits and process limits. When a ULDB domain is configured, the limits are enforced as IRIX job limits.

If the ULDB domain specified in LSF_ULDB_DOMAIN is not valid or does not exist, LSF uses the limits defined in the domain named batch. If the batch domain does not exist, then the system default limits are set.
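The fallback order described above can be written as a small decision function. This is a sketch only: the function name effective_uldb_domain is hypothetical, it handles a single configured domain name, and the set of domains defined in jlimit.in is passed in rather than read from the file.

```python
def effective_uldb_domain(configured, defined_domains):
    """Return the ULDB domain whose limits apply, or None for system defaults."""
    if configured in defined_domains:
        return configured
    if "batch" in defined_domains:
        return "batch"  # fallback domain named in the text above
    return None  # neither domain exists: system default limits are set
```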

When an LSF job is submitted, an IRIX job is created, and the job limits in the ULDB are applied.

Next, LSF resource usage limits are enforced for the IRIX job under which the LSF job is running. LSF limits override the corresponding IRIX job limits. The ULDB limits are used for any LSF limits that are not defined. If the job reaches the IRIX job limits, the action defined in the IRIX system is used.

IRIX job limits in the ULDB apply only to batch jobs.

See the IRIX resource administration documentation for information about configuring ULDB domains in the jlimit.in file.

LSF resource usage limits controlled by ULDB

  • PROCESSLIMIT: Corresponds to IRIX JLIMIT_NUMPROC; fork() fails, but the existing processes continue to run

  • MEMLIMIT: Corresponds to JLIMIT_RSS; resident pages above the limit become prime swap candidates

  • DATALIMIT: Corresponds to JLIMIT_DATA; malloc() calls in the job fail with errno set to ENOMEM

  • CPULIMIT: Corresponds to JLIMIT_CPU; IRIX sends SIGXCPU signal to job, then after the grace period expires, sends SIGINT, SIGTERM, and SIGKILL

  • FILELIMIT: No corresponding IRIX limit; use process limit RLIMIT_FSIZE

  • STACKLIMIT: No corresponding IRIX limit; use process limit RLIMIT_STACK

  • CORELIMIT: No corresponding IRIX limit; use process limit RLIMIT_CORE

  • SWAPLIMIT: Corresponds to JLIMIT_VMEM; use process limit RLIMIT_VMEM

Increase the default MEMLIMIT for ULDB

In some pre-defined LSF queues, such as normal, the default MEMLIMIT is set to 5000 (5 MB). However, if ULDB is enabled (LSF_ULDB_DOMAIN is defined) the MEMLIMIT should be set greater than 8000 in lsb.queues.

Default

Not defined

LSF_UNIT_FOR_LIMITS

Syntax

LSF_UNIT_FOR_LIMITS=unit

Description

Enables scaling of large units in resource usage limits.

When set, LSF_UNIT_FOR_LIMITS applies cluster-wide to limits at the job level (bsub), queue level (lsb.queues), and application level (lsb.applications).

The limit unit specified by LSF_UNIT_FOR_LIMITS also applies to limits modified with bmod, and the display of resource usage limits in query commands (bacct, bapp, bhist, bhosts, bjobs, bqueues, lsload, and lshosts).

Important:

Before changing the units of your resource usage limits, you should completely drain the cluster of all workload. There should be no running, pending, or finished jobs in the system.

In a MultiCluster environment, you should configure the same unit for all clusters.

Example

A job is submitted with bsub -M 100 and LSF_UNIT_FOR_LIMITS=MB; the memory limit for the job is 100 MB rather than the default 100 KB.

Valid values

unit indicates the unit for the resource usage limit, one of:
  • KB (kilobytes)

  • MB (megabytes)

  • GB (gigabytes)

  • TB (terabytes)

  • PB (petabytes)

  • EB (exabytes)
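The scaling in the bsub -M 100 example can be illustrated with a short conversion. This is a sketch, not LSF code: the function name limit_in_kb is hypothetical, and the factor of 1024 between adjacent units is an assumption that this section does not state.

```python
UNITS = ["KB", "MB", "GB", "TB", "PB", "EB"]  # valid values listed above


def limit_in_kb(value, unit="KB"):
    """Convert a limit expressed in `unit` to kilobytes."""
    # Assumption: each unit is 1024 times the previous one.
    return value * 1024 ** UNITS.index(unit)
```

Under that assumption, a limit of 100 with LSF_UNIT_FOR_LIMITS=MB corresponds to 102400 KB, compared with 100 KB under the default unit. A unit outside the list raises ValueError.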

Default

KB

LSF_USE_HOSTEQUIV

Syntax

LSF_USE_HOSTEQUIV=y | Y

Description

(UNIX only; optional)

If LSF_USE_HOSTEQUIV is defined, RES and mbatchd call the ruserok() function to decide if a user is allowed to run remote jobs.

The ruserok() function checks in the /etc/hosts.equiv file and the user’s $HOME/.rhosts file to decide if the user has permission to execute remote jobs.

If LSF_USE_HOSTEQUIV is not defined, all normal users in the cluster can execute remote jobs on any host.

If LSF_ROOT_REX is set, root can also execute remote jobs with the same permission test as for normal users.

Default

Not defined

See also

LSF_ROOT_REX

LSF_USER_DOMAIN

Syntax

LSF_USER_DOMAIN=domain_name:domain_name:domain_name... .

Description

Enables the UNIX/Windows user account mapping feature, which allows cross-platform job submission and execution in a mixed UNIX/Windows environment. LSF_USER_DOMAIN specifies one or more Windows domains that LSF either strips from the user account name when a job runs on a UNIX host, or adds to the user account name when a job runs on a Windows host.

Important:

Configure LSF_USER_DOMAIN immediately after you install LSF; changing this parameter in an existing cluster requires that you verify and possibly reconfigure service accounts, user group memberships, and user passwords.

Specify one or more Windows domains, separated by a colon (:). You can enter an unlimited number of Windows domains. A period (.) specifies a local account, not a domain.

Examples

LSF_USER_DOMAIN=BUSINESS

LSF_USER_DOMAIN=BUSINESS:ENGINEERING:SUPPORT
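The stripping half of the mapping, where a job runs on a UNIX host, can be sketched as follows. This is an illustration, not LSF code: the function name unix_user_name is hypothetical, and the sketch assumes a case-sensitive match on the domain name. It does not cover adding a domain for Windows hosts.

```python
def unix_user_name(account, lsf_user_domain):
    """Strip a configured Windows domain prefix from a DOMAIN\\user account name."""
    domain, sep, user = account.partition("\\")
    # Strip the prefix only when the domain is listed in LSF_USER_DOMAIN.
    if sep and domain in lsf_user_domain.split(":"):
        return user
    return account
```

With LSF_USER_DOMAIN=BUSINESS:ENGINEERING:SUPPORT, the account BUSINESS\user1 maps to user1, and an account in an unlisted domain is returned unchanged.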

Default

The default depends on your LSF installation:
  • If you upgrade a cluster to LSF version 7, the default is the existing value of LSF_USER_DOMAIN, if defined

  • For a new cluster, this parameter is not defined, and UNIX/Windows user account mapping is not enabled

LSF_VPLUGIN

Syntax

LSF_VPLUGIN=path

Description

The full path to the vendor MPI library libxmpi.so. Used with Platform LSF HPC features.

For PAM to access the SGI MPI libxmpi.so library, the file permission mode must be 755 (-rwxr-xr-x).

Examples

  • Platform MPI: LSF_VPLUGIN=/opt/mpi/lib/pa1.1/libmpirm.sl

  • SGI MPI: LSF_VPLUGIN=/usr/lib32/libxmpi.so

  • SGI Linux (64-bit x-86 Linux 2.6, glibc 2.3.): LSF_VPLUGIN=/usr/lib32/libxmpi.so:/usr/lib/libxmpi.so:/usr/lib64/libxmpi.so

Default

Not defined

MC_PLUGIN_REMOTE_RESOURCE

Syntax

MC_PLUGIN_REMOTE_RESOURCE=y

Description

MultiCluster job forwarding model only. By default, the submission cluster does not consider remote resources. Define MC_PLUGIN_REMOTE_RESOURCE=y in the submission cluster to allow consideration of remote resources.

Note:

When MC_PLUGIN_REMOTE_RESOURCE is defined, only the following resource requirements (boolean only) are supported: -R "type==type_name", -R "same[type]", and -R "defined(resource_name)".

Note:

When MC_PLUGIN_SCHEDULE_ENHANCE in lsb.params is defined, remote resources are considered as if MC_PLUGIN_REMOTE_RESOURCE=Y regardless of the actual value. In addition, details of the remote cluster workload are considered by the submission cluster scheduler.

Default

Not defined. The submission cluster does not consider remote resources.

See also

MC_PLUGIN_SCHEDULE_ENHANCE in lsb.params

XLSF_APPDIR

Syntax

XLSF_APPDIR=directory

Description

(UNIX only; optional) Directory in which X application default files for LSF products are installed.

The LSF commands that use X look in this directory to find the application defaults. Users do not need to set environment variables to use the Platform LSF X applications. The application default files are platform-independent.

Default

LSF_INDEP/misc

XLSF_UIDDIR

Syntax

XLSF_UIDDIR=directory

Description

(UNIX only) Directory in which Motif User Interface Definition files are stored.

These files are platform-specific.

Default

LSF_LIBDIR/uid