On successful completion of the job control action, the LSF job control commands cause the status of a job to change.
The environment variable LS_EXEC_T is set to the value JOB_CONTROLS for a job when a job control action is initiated.
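A command configured as a job control action can test this variable to find out whether LSF started it as a job control action. The following is a minimal sketch: the function name in_job_control is hypothetical, and only the variable name and its value are taken from the text above.

```shell
# Hypothetical helper: succeeds only when LS_EXEC_T is set to JOB_CONTROLS,
# which the text above says is the case when a job control action is initiated.
in_job_control() {
    [ "${LS_EXEC_T:-}" = "JOB_CONTROLS" ]
}
```

A script could call in_job_control at startup and skip its action-specific steps when the test fails, for example when the script is run by hand.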
Change a suspended job from SSUSP, USUSP, or PSUSP state to the RUN state. The default action is to send the signal SIGCONT.
Terminate a job. This usually causes the job to change to the EXIT state. The default action is to send SIGINT first, then send SIGTERM 10 seconds after SIGINT, then send SIGKILL 10 seconds after SIGTERM. The delay between signals allows user programs to catch the signals and clean up before the job terminates.
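To illustrate how a job can use the delay between signals, the following sketch shows a shell job body that catches SIGINT and SIGTERM and cleans up before SIGKILL arrives. The function name job_main and the scratch file are hypothetical and stand in for whatever state a real job needs to release.

```shell
# Hypothetical job body: creates a scratch file, then removes it and exits
# when SIGINT or SIGTERM is received.
job_main() {
    scratch=$1
    trap 'rm -f "$scratch"; exit 0' INT TERM
    : > "$scratch"
    # Stand-in for real work. The shell runs the trap after the current
    # foreground command returns, so short sleeps keep the reaction prompt.
    while [ -e "$scratch" ]; do
        sleep 1
    done
}
```

SIGKILL cannot be caught, so any cleanup has to finish within the interval that follows the catchable signals.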
To override the 10-second interval, use the parameter JOB_TERMINATE_INTERVAL in the lsb.params file. See the Platform LSF Configuration Reference for information about the lsb.params file.
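The timing of the default TERMINATE sequence can be worked out from the interval alone. The sketch below prints the offset, in seconds from the start of the action, at which each signal would be sent. The function name terminate_schedule is hypothetical, and the sketch assumes the same interval applies between both pairs of signals, as described above.

```shell
# Print "<offset> <signal>" for each default TERMINATE signal.
# $1 is the interval in seconds between signals (defaults to 10).
terminate_schedule() {
    interval=${1:-10}
    printf '%s SIGINT\n' 0
    printf '%s SIGTERM\n' "$interval"
    printf '%s SIGKILL\n' "$(( $interval * 2 ))"
}
```

For example, with an interval of 5 the last line printed is "10 SIGKILL", so a job has 10 seconds in total between the first signal and the forced kill.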
If the execution of an action is in progress, no further actions are initiated unless it is the TERMINATE action. A TERMINATE action is issued for all job states except PEND.
On Windows, actions equivalent to the UNIX signals have been implemented to do the default job control actions. Job control messages replace the SIGINT and SIGTERM signals, but only customized applications will be able to process them. Termination is implemented by the TerminateProcess() system call.
See the Platform LSF Programmer’s Guide for more information about LSF signal handling on Windows.
Notifying users when their jobs are suspended, resumed, or terminated
An application holds resources (for example, licenses) that are not freed by suspending the job. The administrator can set up an action to be performed that causes the license to be released before the job is suspended and re-acquired when the job is resumed.
A distributed parallel application must receive a catchable signal when the job is suspended, resumed or terminated to propagate the signal to remote processes.
To override the default actions for the SUSPEND, RESUME, and TERMINATE job controls, specify the JOB_CONTROLS parameter in the queue definition in lsb.queues.
Begin Queue
...
JOB_CONTROLS = SUSPEND[signal | CHKPNT | command] \
               RESUME[signal | command] \
               TERMINATE[signal | CHKPNT | command]
...
End Queue
When LSF needs to suspend, resume, or terminate a job, it invokes one of the following actions as specified by SUSPEND, RESUME, and TERMINATE.
A UNIX signal name (for example, SIGTSTP or SIGTERM). The specified signal is sent to the job.
The same set of signals is not supported on all UNIX systems. To display a list of the symbolic names of the signals (without the SIG prefix) supported on your system, use the kill -l command.
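Before putting a signal name into JOB_CONTROLS, it can help to confirm that the host supports it. The following sketch wraps kill -l for that purpose. The function name signal_supported is hypothetical, and because the output format of kill -l differs between shells (some print a numbered table with the SIG prefix), the sketch normalizes the output before matching.

```shell
# Succeed if the given signal name (with or without the SIG prefix)
# appears in the list printed by kill -l on this system.
signal_supported() {
    name=${1#SIG}
    kill -l | tr -s ' \t' '\n' | sed -e 's/^SIG//' | grep -Fx "$name" >/dev/null
}
```

For example, signal_supported SIGTSTP succeeds on systems where kill -l lists TSTP.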
The command line for the action is run with /bin/sh -c so you can use shell features in the command.
LSB_SUSP_REASONS: An integer representing a bitmap of suspending reasons as defined in lsbatch.h. The command can use the suspending reason to take different actions depending on why the job is being suspended.
LSB_SUSP_SUBREASONS: An integer representing the load index that caused the job to be suspended. When the suspending reason SUSP_LOAD_REASON (suspended by load) is set in LSB_SUSP_REASONS, LSB_SUSP_SUBREASONS is set to one of the load index values defined in lsf.h.
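Because LSB_SUSP_REASONS is a bitmap, a command tests for a particular reason with a bitwise AND. The sketch below shows the test itself. The function name has_reason is hypothetical, and the mask values are not reproduced here: they must be taken from lsbatch.h for your LSF version.

```shell
# Succeed if any bit of mask $2 is set in bitmap $1.
# Both arguments are integers; the mask for a given reason, such as
# SUSP_LOAD_REASON, must be copied from lsbatch.h (not shown here).
has_reason() {
    [ $(( $1 & $2 )) -ne 0 ]
}
```

A SUSPEND command could then call has_reason "$LSB_SUSP_REASONS" "$LOAD_MASK", where LOAD_MASK is a hypothetical variable holding the value of SUSP_LOAD_REASON, and consult LSB_SUSP_SUBREASONS only when that test succeeds.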
The standard input, output, and error of the command are redirected to the null device, so you cannot tell directly whether the command runs correctly. The default null device on UNIX is /dev/null.
You should make sure the command line is correct. If you want to see the output from the command line for testing purposes, redirect the output to a file inside the command line.
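The effect of the null-device redirection, and of redirecting inside the command line, can be reproduced outside LSF. The sketch below imitates the environment described above by running a command line with /bin/sh -c and all three standard streams attached to /dev/null. The function name run_action is hypothetical and is only an approximation of how LSF starts the command.

```shell
# Run a command line the way the text describes: through /bin/sh -c,
# with standard input, output, and error attached to the null device.
run_action() {
    /bin/sh -c "$1" </dev/null >/dev/null 2>&1
}
```

Calling run_action "echo suspending" shows nothing, whereas run_action "echo suspending >> /tmp/action.log 2>&1" leaves the text in the log file because the redirection inside the command line takes precedence. The path /tmp/action.log is only an example.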
Use caution when configuring TERMINATE job actions that do more than just kill a job. For example, resource usage limits that terminate jobs change the job state to SSUSP while LSF waits for the job to end. If the job is not killed by the TERMINATE action, it remains suspended indefinitely.
In certain situations you may want to terminate the job instead of calling the default SUSPEND action. For example, you may want to kill jobs if the run window of the queue is closed. Use the TERMINATE_WHEN parameter to configure the queue to invoke the TERMINATE action instead of SUSPEND.
See the Platform LSF Configuration Reference for information about the lsb.queues file and the TERMINATE_WHEN parameter.
Use LSB_SIGSTOP to configure the SIGSTOP signal sent by the default SUSPEND action.
If LSB_SIGSTOP is set to anything other than SIGSTOP, the SIGTSTP signal that is normally sent by the SUSPEND action is not sent. For example, if LSB_SIGSTOP=SIGKILL, the three default signals sent by the TERMINATE action (SIGINT, SIGTERM, and SIGKILL) are sent 10 seconds apart.
Do not configure a job control to contain the signal or command that is the same as the action associated with that job control. This will cause a deadlock between the signal and the action.
For example, the bkill command uses the TERMINATE action, so a deadlock results when the TERMINATE action itself contains the bkill command.
LSF supports signal conversion between UNIX and Windows for remote interactive execution through RES.
On Windows, the CTRL+C and CTRL+BREAK key combinations are treated as signals for console applications (these signals are also called console control actions).
LSF supports these two Windows console signals for remote interactive execution. LSF regenerates these signals for user tasks on the execution host.
For example, if you issue the lsrun or bsub -I commands from a Windows console but the task is running on a UNIX host, pressing CTRL+C generates a UNIX SIGINT signal for your task on the UNIX host. The opposite is also true.
Here, SIGXXXX and SIGYYYY are UNIX signal names such as SIGQUIT or SIGINT. The conversions are then CTRL+C=SIGXXXX and CTRL+BREAK=SIGYYYY.
If both LSF_NT2UNIX_CTRLC and LSF_NT2UNIX_CTRLB are set to the same value (LSF_NT2UNIX_CTRLC=SIGXXXX and LSF_NT2UNIX_CTRLB=SIGXXXX), CTRL+C will be generated on the Windows execution host.
For bsub -I, there is no conversion other than the default conversion.