Platform Computing Corp.

Interactive Jobs with bsub

About Interactive Jobs

It is sometimes desirable from a system management point of view to control all workload through a single centralized scheduler.

Running an interactive job through the LSF batch system allows you to take advantage of batch scheduling policies and host selection features for resource-intensive jobs. You can submit a job and the least loaded host is selected to run the job.

Since all interactive batch jobs are subject to LSF policies, you will have more control over your system. For example, you may dedicate two servers as interactive servers, and disable interactive access to all other servers by defining an interactive queue that only uses the two interactive servers.

Scheduling policies

An interactive batch job is scheduled using the same policy as all other jobs in a queue. This means an interactive job can wait for a long time before it gets dispatched. If fast response time is required, interactive jobs should be submitted to high-priority queues with loose scheduling constraints.

Interactive queues

You can configure a queue to be interactive-only, batch-only, or both interactive and batch with the parameter INTERACTIVE in lsb.queues.

See the Platform LSF Configuration Reference for information about configuring interactive queues in the lsb.queues file.
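As a rough sketch of the dedicated-server setup described earlier, an interactive-only queue restricted to two hosts might be defined in lsb.queues as follows. The queue name, priority value, and host names (hostA, hostB) are hypothetical, and the accepted values of INTERACTIVE should be checked against the configuration reference for your release.

```
Begin Queue
QUEUE_NAME  = interactive
PRIORITY    = 70
INTERACTIVE = ONLY
HOSTS       = hostA hostB
DESCRIPTION = Interactive jobs only, on two dedicated hosts
End Queue
```

A relatively high priority is used here because interactive users usually expect fast response time.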

Interactive jobs with non-batch utilities

Non-batch utilities such as lsrun, lsgrun, etc., use LIM simple placement advice for host selection when running interactive tasks. For more details on using non-batch utilities to run interactive tasks, see Running Interactive and Remote Tasks.

Submitting Interactive Jobs

Use the bsub -I option to submit batch interactive jobs, and the bsub -Is and -Ip options to submit batch interactive jobs in pseudo-terminals.

Pseudo-terminals are not supported for Windows.

For more details, see the bsub command.

Finding out which queues accept interactive jobs

Before you submit an interactive job, you need to find out which queues accept interactive jobs with the bqueues -l command.

If the output of this command contains the following, this is a batch-only queue. This queue does not accept interactive jobs:

SCHEDULING POLICIES:  NO_INTERACTIVE 

If the output contains the following, this is an interactive-only queue:

SCHEDULING POLICIES:  ONLY_INTERACTIVE 

If none of the above are defined or if SCHEDULING POLICIES is not in the output of bqueues -l, both interactive and batch jobs are accepted by the queue.
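The three cases above can be checked mechanically. The following is a minimal sketch, not an LSF utility: classify_queue is a hypothetical helper that reads bqueues -l output on standard input and applies exactly the rules stated in this section.

```shell
# Classify a queue from `bqueues -l` output read on standard input.
# Prints one of: batch-only, interactive-only, both.
classify_queue() {
    # Keep only the SCHEDULING POLICIES line, if there is one.
    policies=$(grep 'SCHEDULING POLICIES:' || true)
    case "$policies" in
        *NO_INTERACTIVE*)   echo "batch-only" ;;
        *ONLY_INTERACTIVE*) echo "interactive-only" ;;
        *)                  echo "both" ;;
    esac
}

# Sample input so the sketch runs without LSF installed.
printf 'SCHEDULING POLICIES:  NO_INTERACTIVE\n' | classify_queue
```

On a host with LSF installed, the equivalent call would be along the lines of bqueues -l queue_name | classify_queue, where queue_name is one of your own queues.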

You configure interactive queues in the lsb.queues file.

Submit an interactive job

  1. Use the bsub -I option to submit an interactive batch job.

    For example:

    bsub -I ls

    Submits a batch interactive job which displays the output of ls at the user's terminal.

    bsub -I -q interactive -n 4,10 lsmake
    <<Waiting for dispatch ...>>

    This example starts Platform Make on 4 to 10 processors and displays the output on the terminal.

    A new job cannot be submitted until the interactive job is completed or terminated.

    When an interactive job is submitted, a message is displayed while the job is awaiting scheduling. The bsub command stops display of output from the shell until the job completes, and no mail is sent to the user by default. The user can press CTRL-C at any time to terminate the job.

    Interactive jobs cannot be checkpointed.

    Interactive batch jobs cannot be made rerunnable with bsub -r. You can, however, submit interactive batch jobs to rerunnable queues (RERUNNABLE=y in lsb.queues) or rerunnable application profiles (RERUNNABLE=y in lsb.applications).

Submit an interactive job by using a pseudo-terminal

Submitting interactive jobs by using a pseudo-terminal is not supported on Windows for either the lsrun or the bsub command.

bsub -Ip
  1. To submit a batch interactive job by using a pseudo-terminal, use the bsub -Ip option.

    For example:

    % bsub -Ip vi myfile 
     

    Submits a batch interactive job to edit myfile.

    When you specify the -Ip option, bsub submits a batch interactive job and creates a pseudo-terminal when the job starts. Some applications, such as vi, require a pseudo-terminal to run correctly.

bsub -Is
  1. To submit a batch interactive job and create a pseudo-terminal with shell mode support, use the bsub -Is option.

    For example:

    % bsub -Is csh 
     

    Submits a batch interactive job that starts up csh as an interactive shell.

    When you specify the -Is option, bsub submits a batch interactive job and creates a pseudo-terminal with shell mode support when the job starts. This option should be specified for submitting interactive shells, or applications which redefine the CTRL-C and CTRL-Z keys (for example, jove).

Submit an interactive job and redirect streams to files

bsub -i, -o, -e

You can use the -I option together with the -i, -o, and -e options of bsub to selectively redirect streams to files. For more details, see the bsub(1) man page.

  1. To save the standard error stream in the job.err file, while standard input and standard output come from the terminal:

    bsub -I -q interactive -e job.err lsmake
    
Split stdout and stderr

If in your environment there is a wrapper around bsub and LSF commands so that end-users are unaware of LSF and LSF-specific options, you can redirect standard output and standard error of batch interactive jobs to a file with the > operator.

By default, both standard error messages and output messages for batch interactive jobs are written to stdout on the submission host.

  1. To write both stderr and stdout to mystdout:

    bsub -I myjob 2>mystderr 1>mystdout

    Because both streams arrive on stdout by default, the job's error messages also end up in mystdout.

  2. To redirect stdout and stderr to different files, set LSF_INTERACTIVE_STDERR=y in lsf.conf or as an environment variable.

    For example, with LSF_INTERACTIVE_STDERR set:

    bsub -I myjob 2>mystderr 1>mystdout 
     

    stderr is redirected to mystderr, and stdout to mystdout.

    See the Platform LSF Configuration Reference for more details on LSF_INTERACTIVE_STDERR.
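The effect of the parameter on the redirection operators can be tried without a cluster. The sketch below uses a hypothetical stand-in function, fake_interactive_job, in place of bsub -I myjob; it only models the behavior described above and does not call LSF.

```shell
# Stand-in for an interactive batch job as seen from the submission host.
# Hypothetical: it imitates the stream handling described in this section.
fake_interactive_job() {
    echo "job output"
    if [ "${LSF_INTERACTIVE_STDERR:-n}" = "y" ]; then
        echo "job error" >&2      # stderr is kept separate
    else
        echo "job error"          # default: stderr is merged into stdout
    fi
}

workdir=$(mktemp -d)

# Default behavior: both lines land in mystdout.
LSF_INTERACTIVE_STDERR=n
fake_interactive_job 2>"$workdir/mystderr" 1>"$workdir/mystdout"
cat "$workdir/mystdout"

# With the parameter set, each stream goes to its own file.
LSF_INTERACTIVE_STDERR=y
fake_interactive_job 2>"$workdir/mystderr" 1>"$workdir/mystdout"
cat "$workdir/mystderr"
```

The same 2> and 1> operators are used in both runs; only the parameter decides whether there is anything on stderr to capture.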

Submit an interactive job, redirect streams to files, and display streams

When using any of the interactive bsub options (for example: -I, -Is, -ISs) as well as the -o or -e options, you can also have your output displayed on the console by using the -tty option.

  1. To run an interactive job, redirect the error stream to a file, and display the stream on the console:

    bsub -I -q interactive -e job.err -tty lsmake

Performance Tuning for Interactive Batch Jobs

LSF is often used on systems that support both interactive and batch users. On one hand, users are often concerned that load sharing will overload their workstations and slow down their interactive tasks. On the other hand, some users want to dedicate some machines to critical batch jobs so that they have guaranteed resources. Even if all your workload is batch jobs, you still want to reduce resource contention and operating system overhead to maximize the use of your resources.

Numerous parameters can be used to control your resource allocation and to avoid undesirable contention.

Types of load conditions

Because interference between jobs usually shows up in the load indices, LSF responds to load changes to avoid or reduce contention. LSF can take actions on jobs to reduce interference before or after jobs are started. These actions are triggered by different load conditions. Most of the conditions can be configured at both the queue level and the host level. Conditions defined at the queue level apply to all hosts used by the queue, while conditions defined at the host level apply to all queues using the host.

Scheduling conditions

These conditions, if met, trigger the start of more jobs. The scheduling conditions are defined in terms of load thresholds or resource requirements.

At the queue level, scheduling conditions are configured as either resource requirements or scheduling load thresholds, as described in lsb.queues. At the host level, the scheduling conditions are defined as scheduling load thresholds, as described in lsb.hosts.

Suspending conditions

These conditions affect running jobs. When these conditions are met, a SUSPEND action is performed on a running job.

At the queue level, suspending conditions are defined as STOP_COND, as described in lsb.queues, or as a suspending load threshold. At the host level, suspending conditions are defined as a stop load threshold, as described in lsb.hosts.

Resuming conditions

These conditions determine when a suspended job can be resumed. When these conditions are met, a RESUME action is performed on a suspended job.

At the queue level, resuming conditions are defined by RESUME_COND in lsb.queues, or by the loadSched thresholds for the queue if RESUME_COND is not defined.

Types of load indices

To reduce interference between jobs effectively, choose load indices that reflect the contention you want to avoid. The following are a few frequently used indices.

Paging rate (pg)

The paging rate (pg) load index relates strongly to the perceived interactive performance. If a host is paging applications to disk, the user interface feels very slow.

The paging rate is also a reflection of a shortage of physical memory. When an application is being paged in and out frequently, the system is spending a lot of time performing overhead, resulting in reduced performance.

The paging rate load index can be used as a threshold to either stop sending more jobs to the host, or to suspend an already running batch job to give priority to interactive users.

This parameter can be used in different configuration files to achieve different purposes. If you define a paging rate threshold in lsf.cluster.cluster_name, LIM considers the host busy once the threshold is exceeded, and no longer advises running more jobs on this host.

If you include the paging rate in queue or host scheduling conditions, jobs are prevented from starting on machines with a heavy paging rate, and can be suspended or even killed if they interfere with the interactive user on the console.

A job suspended due to pg threshold will not be resumed even if the resume conditions are met unless the machine is interactively idle for more than PG_SUSP_IT seconds.

Interactive idle time (it)

Strict control can be achieved using the idle time (it) index. This index measures the number of minutes since any interactive terminal activity. Interactive terminals include hard wired ttys, rlogin and lslogin sessions, and X shell windows such as xterm. On some hosts, LIM also detects mouse and keyboard activity.

This index is typically used to prevent batch jobs from interfering with interactive activities. If the suspending condition in the queue is defined as it<1 && pg>50, a job from this queue is suspended when the machine is not interactively idle and the paging rate is higher than 50 pages per second. Furthermore, if the resuming condition in the queue is defined as it>5 && pg<10, a suspended job from the queue does not resume until the machine has been interactively idle for more than five minutes and the paging rate is less than ten pages per second.
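The gap between the two conditions means that a suspended job does not bounce back as soon as the load dips. As an illustration only, the following sketch replays that logic; next_state is a hypothetical helper, it takes whole-number it (minutes) and pg (pages per second) values, and it ignores everything else LSF considers, such as PG_SUSP_IT.

```shell
# next_state STATE IT PG
# STATE is RUN or SUSP. Applies the suspending condition it<1 && pg>50
# to running jobs and the resuming condition it>5 && pg<10 to suspended jobs.
next_state() {
    state=$1 it=$2 pg=$3
    if [ "$state" = "RUN" ] && [ "$it" -lt 1 ] && [ "$pg" -gt 50 ]; then
        echo "SUSP"
    elif [ "$state" = "SUSP" ] && [ "$it" -gt 5 ] && [ "$pg" -lt 10 ]; then
        echo "RUN"
    else
        echo "$state"     # neither condition met: state is unchanged
    fi
}

next_state RUN 0 80     # user active, heavy paging
next_state SUSP 3 5     # paging is low, but idle for only 3 minutes
next_state SUSP 6 5     # idle long enough and paging is low
```

The three calls print SUSP, SUSP, and RUN in turn: the job stays suspended in the middle case because only one half of the resuming condition holds.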

The it index is only non-zero if no interactive users are active. Setting the it threshold to five minutes allows a reasonable amount of think time for interactive users, while making the machine available for load sharing, if the users are logged in but absent.

For lower priority batch queues, it is appropriate to set an it suspending threshold of two minutes and scheduling threshold of ten minutes in the lsb.queues file. Jobs in these queues are suspended while the execution host is in use, and resume after the host has been idle for a longer period. For hosts where all batch jobs, no matter how important, should be suspended, set a per-host suspending threshold in the lsb.hosts file.

CPU run queue length (r15s, r1m, r15m)

Running more than one CPU-bound process on a machine (or more than one process per CPU for multiprocessors) can reduce the total throughput because of operating system overhead, as well as interfering with interactive users. Some tasks such as compiling can create more than one CPU-intensive task.

Queues should normally set CPU run queue scheduling thresholds below 1.0, so that hosts already running compute-bound jobs are left alone. LSF scales the run queue thresholds for multiprocessor hosts by using the effective run queue lengths, so multiprocessors automatically run one job per processor in this case.

For short to medium-length jobs, use the r1m index. For longer jobs, you might want to add an r15m threshold. High-priority queues are an exception, because turnaround time matters more than total throughput; for these queues, an r1m scheduling threshold of 2.0 is appropriate.

See Load Indices for the concept of effective run queue length.

CPU utilization (ut)

The ut parameter measures the amount of CPU time being used. When all the CPU time on a host is in use, there is little to gain from sending another job to that host unless the host is much more powerful than others on the network. A ut threshold of 90% prevents jobs from going to a host where the CPU does not have spare processing cycles.

If a host has very high pg but low ut, then it may be desirable to suspend some jobs to reduce the contention.

Some commands report ut as a percentage from 0 to 100; others report it as a decimal fraction from 0 to 1. The threshold parameters in lsf.cluster.cluster_name and the other configuration files, and the bsub -R resource requirement string, take a fraction in the range from 0 to 1.
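When a threshold is quoted as a percentage, it has to be converted before it is used in a configuration file or a resource requirement string. A small sketch of that conversion follows; ut_fraction is a hypothetical helper, and the select[] string is only printed, not submitted.

```shell
# Convert a ut percentage (0-100) into the 0-1 fraction that
# configuration files and bsub -R expect.
ut_fraction() {
    awk -v pct="$1" 'BEGIN { printf "%.2f\n", pct / 100 }'
}

threshold=$(ut_fraction 90)
# Build a resource requirement string from the converted value.
echo "select[ut < $threshold]"
```

With an input of 90, the string that is printed uses 0.90 as the threshold.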

The command bhist shows the execution history of batch jobs, including the time spent waiting in queues or suspended because of system load.

The command bjobs -p shows why a job is pending.

Scheduling conditions and resource thresholds

Three parameters, RES_REQ, STOP_COND, and RESUME_COND, can be specified in the definition of a queue. Scheduling conditions are a more general way of specifying job dispatching conditions at the queue level. These parameters take resource requirement strings as values, which lets you specify conditions more flexibly than with the loadSched or loadStop thresholds.
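As an illustration, the suspending and resuming conditions from the idle time discussion could be written in a queue definition roughly as follows. The queue name is hypothetical, and the select[] wrapper is an assumption about how the resource requirement string is written; confirm the exact syntax in the lsb.queues reference.

```
Begin Queue
QUEUE_NAME  = lowprio
STOP_COND   = select[it < 1 && pg > 50]
RESUME_COND = select[it > 5 && pg < 10]
End Queue
```

Expressing both conditions as strings lets one queue combine several load indices in a single test, which per-index thresholds cannot do.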

Interactive Batch Job Messaging

LSF can display messages to stderr or the Windows console when the following changes occur with interactive batch jobs:

Other job status changes, like switching the job's queue, are not displayed.

Limitations

Interactive batch job messaging is not supported in a MultiCluster environment.

Windows

Interactive batch job messaging is not fully supported on Windows. Only changes in the job state that occur before the job starts running are displayed. No messages are displayed after the job starts.

Configure interactive batch job messaging

Messaging for interactive batch jobs can be specified cluster-wide or in the user environment.

Cluster level
  1. To enable interactive batch job messaging for all users in the cluster, the LSF administrator configures the LSB_INTERACT_MSG_ENH and LSB_INTERACT_MSG_INTVAL parameters in lsf.conf.

User level
  1. To enable messaging for interactive batch jobs, LSF users can define LSB_INTERACT_MSG_ENH and LSB_INTERACT_MSG_INTVAL as environment variables.

The user-level definition of LSB_INTERACT_MSG_ENH overrides the definition in lsf.conf.
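A user-level setup might look like the following sketch for Bourne-style shells. The values shown are assumptions (Y to switch messaging on, and an update interval given in seconds); check the lsf.conf reference for the values your release accepts.

```shell
# Enable interactive batch job messaging for this login session only.
LSB_INTERACT_MSG_ENH=Y         # assumed value: turn messaging on
LSB_INTERACT_MSG_INTVAL=60     # assumed value: seconds between updates
export LSB_INTERACT_MSG_ENH LSB_INTERACT_MSG_INTVAL
```

csh users would use setenv instead of export. Jobs submitted from this session pick up the user-level setting in preference to lsf.conf.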

Example messages

Job in pending state

The following example shows messages displayed when a job is in pending state:

bsub -Is -R "ls < 2" csh
Job <2812> is submitted to default queue <normal>.
<<Waiting for dispatch ...>>

<<  Job's resource requirements not satisfied: 2 hosts; >>
<<  Load information unavailable: 1 host; >>

<<  Just started a job recently: 1 host; >>
<<  Load information unavailable: 1 host; >>
<<  Job's resource requirements not satisfied: 1 host; >>

Job terminated by user

The following example shows messages displayed when a job in pending state is terminated by the user:

bsub -m hostA -b 13:00 -Is sh
Job <2015> is submitted to default queue <normal>.
Job will be scheduled after Fri Nov 19 13:00:00 1999
<<Waiting for dispatch ...>>

<< New job is waiting for scheduling >>

<< The job has a specified start time >>

bkill 2015
<< Job <2015> has been terminated by user or administrator >>

<<Terminated while pending>>

Job suspended then resumed

The following example shows messages displayed when a job is dispatched, suspended, and then resumed:

bsub -m hostA -Is sh
Job <2020> is submitted to default queue <normal>.
<<Waiting for dispatch ...>>

<< New job is waiting for scheduling >>
<<Starting on hostA>>
bstop 2020
<< The job was suspended by user >>

bresume 2020
<< Waiting for re-scheduling after being resumed by user >> 

Running X Applications with bsub

You can start an X session on the least loaded host by submitting it as a batch job:

bsub xterm 

An xterm is started on the least loaded host in the cluster.

When you run X applications using lsrun or bsub, the environment variable DISPLAY is handled properly for you. It behaves as if you were running the X application on the local machine.

Configure SSH X11 forwarding for jobs

Prerequisites: X11 forwarding must already be working outside LSF.

  1. Install SSH and enable X11 forwarding for all hosts that will submit and run these jobs (UNIX hosts only).
  2. (Optional) In lsf.conf, specify an SSH command for LSB_SSH_XFORWARD_CMD.

    The command can include the full path and options.

Writing Job Scripts

You can build a job file one line at a time, or create it from another file, by running bsub without specifying a job to submit. When you do this, you start an interactive session in which bsub reads command lines from the standard input and submits them as a single batch job. You are prompted with bsub> for each line.

You can use the bsub -Zs command to spool a file.

For more details on bsub options, see the bsub(1) man page.

Writing a job file one line at a time

UNIX example
% bsub -q simulation
bsub> cd /work/data/myhomedir
bsub> myjob arg1 arg2 ......
bsub> rm myjob.log
bsub> ^D
Job <1234> submitted to queue <simulation>. 

In the above example, the 3 command lines run as a Bourne shell (/bin/sh) script. Only valid Bourne shell command lines are acceptable in this case.

Windows example
C:\> bsub -q simulation
bsub> cd \\server\data\myhomedir
bsub> myjob arg1 arg2 ......
bsub> del myjob.log
bsub> ^Z
Job <1234> submitted to queue <simulation>. 

In the above example, the 3 command lines run as a batch file (.BAT). Note that only valid Windows batch file command lines are acceptable in this case.

Specifying job options in a file

In this example, options to run the job are specified in the options_file.

% bsub -q simulation < options_file
Job <1234> submitted to queue <simulation>.

UNIX

On UNIX, the options_file must be a text file that contains Bourne shell command lines. It cannot be a binary executable file.

Windows

On Windows, the options_file must be a text file containing Windows batch file command lines.

Spooling a job command file

Use bsub -Zs to spool a job command file to the directory specified by the JOB_SPOOL_DIR parameter in lsb.params, and use the spooled file as the command file for the job.

Use the bmod -Zsn command to modify or remove the command file after the job has been submitted. Removing or modifying the original input file does not affect the submitted job.

Redirecting a script to bsub standard input

You can redirect a script to the standard input of the bsub command:

% bsub < myscript
Job <1234> submitted to queue <test>. 

In this example, the myscript file contains job submission options as well as command lines to execute. Because bsub reads the script from its standard input, the contents are spooled, so you can modify the myscript file right after bsub returns, ready for the next job submission.

When the script is specified on the bsub command line, the script is not spooled:

% bsub myscript
Job <1234> submitted to default queue <normal>. 

In this case the command line myscript is spooled, instead of the contents of the myscript file. Later modifications to the myscript file can affect job behavior.

Specifying embedded submission options

You can specify job submission options in scripts read from standard input by the bsub command using lines starting with #BSUB:

% bsub -q simulation
bsub> #BSUB -q test
bsub> #BSUB -o outfile -R "mem>10"
bsub> myjob arg1 arg2
bsub> #BSUB -J simjob
bsub> ^D
Job <1234> submitted to queue <simulation>. 

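Note that the job is submitted to the simulation queue rather than the test queue, so the -q option on the bsub command line overrides the embedded #BSUB -q line. The example also shows that one #BSUB line can carry several options (-o and -R), and that a #BSUB line can follow the command to run (-J simjob).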
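Before submitting a script, it can be useful to list the options embedded in it. The sketch below is one way to do that with sed; embedded_options is a hypothetical helper, not an LSF command, and it simply strips the #BSUB prefix from matching lines.

```shell
# Print the option text of every line that starts with #BSUB,
# reading the job script from standard input.
embedded_options() {
    sed -n 's/^#BSUB[[:space:]]\{1,\}//p'
}

# The script lines typed in the session above, fed in as sample input.
printf '%s\n' \
    '#BSUB -q test' \
    '#BSUB -o outfile -R "mem>10"' \
    'myjob arg1 arg2' \
    '#BSUB -J simjob' | embedded_options
```

For a script stored in a file, the call would be embedded_options < myscript. Command lines such as myjob arg1 arg2 are skipped because they do not start with #BSUB.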

Running a job under a particular shell

By default, LSF runs batch jobs using the Bourne (/bin/sh) shell. You can specify the shell under which a job is to run. This is done by specifying an interpreter in the first line of the script.

For example:

% bsub
bsub> #!/bin/csh -f
bsub> set coredump=`ls |grep core`
bsub> if ( "$coredump" != "") then
bsub> mv core core.`date | cut -d" " -f1`
bsub> endif
bsub> myjob
bsub> ^D
Job <1234> is submitted to default queue <normal>. 

The bsub command must read the job script from standard input to set the execution shell. If you do not specify a shell in the script, the script is run using /bin/sh. If the first line of the script starts with a # not immediately followed by an exclamation mark (!), then /bin/csh is used to run the job.

For example:

% bsub
bsub> # This is a comment line. This tells the system to use /bin/csh to
bsub> # interpret the script.
bsub>
bsub> setenv DAY `date | cut -d" " -f1`
bsub> myjob
bsub> ^D
Job <1234> is submitted to default queue <normal>. 
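The rule for choosing the execution shell depends only on the first line of the script. The following sketch restates it as a function; job_shell is a hypothetical helper that covers only the three cases described in this section and does not reproduce anything else bsub may do.

```shell
# Print the interpreter that would run a job script read on standard input,
# following the first-line rules described in this section.
job_shell() {
    IFS= read -r first || true
    case "$first" in
        '#!'*) printf '%s\n' "${first#??}" ;;   # explicit interpreter line
        '#'*)  echo "/bin/csh" ;;               # plain comment: csh
        *)     echo "/bin/sh" ;;                # anything else: Bourne shell
    esac
}

printf '#!/bin/csh -f\nmyjob\n'              | job_shell
printf '# This is a comment line.\nmyjob\n'  | job_shell
printf 'cd /work/data/myhomedir\nmyjob\n'    | job_shell
```

The three sample scripts correspond to the two examples in this section and to the earlier Bourne shell example, in that order.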

If running jobs under a particular shell is required frequently, you can specify an alternate shell using a command-level job starter and run your jobs interactively. See Controlling Execution Environment Using Job Starters for more details.

Registering utmp File Entries for Interactive Batch Jobs

LSF administrators can configure the cluster to track user and account information for interactive batch jobs submitted with bsub -Ip or bsub -Is. User and account information is registered as entries in the UNIX utmp file, which holds information for commands such as who. Registering user information for interactive batch jobs in utmp allows more accurate job accounting.

Configuration and operation

To enable utmp file registration, the LSF administrator sets the LSB_UTMP parameter in lsf.conf.

When LSB_UTMP is defined, LSF registers the job by adding an entry to the utmp file on the execution host when the job starts. After the job finishes, LSF removes the entry for the job from the utmp file.

Limitations


Platform Computing Inc.
www.platform.com