Job submission and modification

Job submission and modification are the most common operations in LSF batch. A user can submit jobs to the system and then modify them as long as they have not yet started.

lsb_submit()

LSBLIB provides lsb_submit() for job submission and lsb_modify() for job modification.

LS_LONG_INT lsb_submit(jobSubReq, jobSubReply)
LS_LONG_INT lsb_modify(jobSubReq, jobSubReply, jobId)

On success, these calls return the job ID. On failure, they return -1 and set lsberrno to indicate the error. lsb_modify() takes the same request as lsb_submit(), except that it modifies the parameters of a job that has already been submitted.

Both functions take the following parameters:

struct submit      *jobSubReq;      Job specifications
struct submitReply *jobSubReply;    Results of job submission
LS_LONG_INT        jobId;           ID of the job to modify (lsb_modify() only)

submit structure

The submit structure is defined in lsbatch.h as:

struct submit {
    int     options;           /* Indicates which optional fields are present */
    int     options2;          /* Indicates which additional fields are present */
    char    *jobName;          /* Job name (optional) */
    char    *queue;            /* Submit the job to this queue (optional) */
    int     numAskedHosts;     /* Size of askedHosts (optional) */
    char    **askedHosts;      /* Array of names of candidate hosts (optional) */
    char    *resReq;           /* Resource requirements of the job (optional) */
    int     rLimits[LSF_RLIM_NLIMITS];
                               /* Limits on system resource use by all of the
                                  job's processes */
    char    *hostSpec;         /* Host model used for scaling rLimits (optional) */
    int     numProcessors;     /* Initial number of processors needed by the job */
    char    *dependCond;       /* Job dependency condition (optional) */
    char    *timeEvent;        /* Time event string for scheduled repetitive
                                  jobs (optional) */
    time_t  beginTime;         /* Dispatch the job on or after beginTime */
    time_t  termTime;          /* Job termination deadline */
    int     sigValue;          /* Obsolete */
    char    *inFile;           /* Path name of the job's standard input file
                                  (optional) */
    char    *outFile;          /* Path name of the job's standard output file
                                  (optional) */
    char    *errFile;          /* Path name of the job's standard error file
                                  (optional) */
    char    *command;          /* Command line of the job */
    char    *newCommand;       /* New command for bmod (optional) */
    time_t  chkpntPeriod;      /* Job is checkpointable with this period (optional) */
    char    *chkpntDir;        /* Directory for the job's checkpoint files (optional) */
    int     nxf;               /* Size of xf (optional) */
    struct xFile *xf;          /* Array of file transfer specifications (optional) */
    char    *preExecCmd;       /* Job's pre-execution command (optional) */
    char    *mailUser;         /* E-mail address to which the job's output is
                                  mailed (optional) */
    int     delOptions;        /* Bits to be removed from options
                                  (lsb_modify() only) */
    char    *projectName;      /* Name of the job's project (optional) */
    int     maxNumProcessors;  /* Requested maximum number of job slots for the job */
    char    *loginShell;       /* Login shell used to re-initialize the environment */
    char    *userGroup;        /* User group */
    char    *exceptList;       /* List of exception handlers */
    int     userPriority;      /* User priority */
    char    *rsvId;            /* Use hosts reserved in advance */
    char    *jobGroup;         /* Job group under which the job runs */
    char    *sla;              /* SLA under which the job runs */
    char    *extsched;         /* extsched options */
    int     warningTimePeriod; /* Warning time period (seconds), -1 if unspecified */
    char    *warningAction;    /* Warning action (SIGNAL | CHKPNT | command),
                                  NULL if unspecified */
    char    *licenseProject;   /* License Scheduler project */
    int     options3;          /* Further option bits */
    int     delOptions3;       /* Bits to be removed from options3 */
    char    *app;              /* Application profile */
    int     jsdlFlag;          /* -1: neither -jsdl nor -jsdl_strict
                                   0: -jsdl_strict
                                   1: -jsdl */
    char    *jsdlDoc;          /* JSDL file name */
    void    *correlator;       /* ARM correlator */
    char    *apsString;        /* APS string set by the administrator to denote
                                  the system value or admin factor value */
    char    *postExecCmd;      /* Post-execution commands specified by -Ep */
    char    *cwd;              /* Working directory specified by -cwd */
    int     runtimeEstimation; /* Runtime estimation specified by -We */
    char    *requeueEValues;   /* Job-level requeue exit values specified by -Q */
    int     initChkpntPeriod;  /* Initial checkpoint period */
    int     migThreshold;      /* Migration threshold */
    char    *notifyCmd;        /* Script or command invoked when a resize request
                                  is satisfied */
};

For a complete description of the fields in the submit structure, see the lsb_submit(3) man page.

submitReply structure

The submitReply structure is defined in lsbatch.h as:

struct submitReply {
    char        *queue;       /* Name of the queue the job was submitted to */
    LS_LONG_INT badJobId;     /* Job ID in dependCond for which there is
                                 no such job */
    char        *badJobName;  /* Job name in dependCond for which there is
                                 no such job */
    int         badReqIndx;   /* Index of the host or resource limit that
                                 caused an error */
};

The last three fields of submitReply are used only when lsb_submit() or lsb_modify() fails.
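The three diagnostic fields are easiest to use when the caller already knows which kind of failure occurred. The sketch below uses a hypothetical local mirror of submitReply (demo_submit_reply, with long long standing in for LS_LONG_INT) and a hypothetical demo_failure enum in place of the real lsberrno values, so that it compiles without LSF installed. It only formats the field that matches the failure.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical local mirror of struct submitReply from lsbatch.h;
 * long long stands in for LS_LONG_INT. */
struct demo_submit_reply {
    char      *queue;
    long long  badJobId;
    char      *badJobName;
    int        badReqIndx;
};

/* Hypothetical failure categories; real code would switch on lsberrno. */
enum demo_failure { DEMO_BAD_JOB_ID, DEMO_BAD_JOB_NAME, DEMO_BAD_REQ };

/* Write a short description of the offending input into buf. */
void demo_describe_failure(const struct demo_submit_reply *r,
                           enum demo_failure kind, char *buf, size_t len)
{
    assert(r != NULL && buf != NULL && len > 0);
    switch (kind) {
    case DEMO_BAD_JOB_ID:
        snprintf(buf, len, "no such job: %lld", r->badJobId);
        break;
    case DEMO_BAD_JOB_NAME:
        snprintf(buf, len, "no such job: %s", r->badJobName);
        break;
    default:
        snprintf(buf, len, "bad host or limit at index %d", r->badReqIndx);
        break;
    }
}
```

In a real program the switch would be on lsberrno, with the matching error numbers taken from lsbatch.h and lsb_submit(3).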

For a complete description of the fields in the submitReply structure, see the lsb_submit(3) man page.

To submit a new job, fill in a submit structure and then call lsb_submit(). LSF batch ignores the delOptions field for lsb_submit().

Example

The example job submission program below takes the job command line as an argument and submits the job to LSF batch. For simplicity, it is assumed that the job command does not have arguments.

/******************************************************
* LSBLIB -- Examples
*
* simple bsub
* This program submits a batch job to LSF.
* It is the equivalent of using the "bsub" command without
* any options.
******************************************************/
#include <stdio.h>
#include <stdlib.h>
#include <string.h>          /* memset() */
#include <lsf/lsbatch.h>
/* combine_arg() joins the command-line arguments into a single
   command string; it is declared in "combine_arg.h". */
#include "combine_arg.h"

int main(int argc, char **argv)
{
    struct submit      req;     /* job specifications */
    struct submitReply reply;   /* results of job submission */
    LS_LONG_INT        jobId;   /* job ID of submitted job */
    int                i;

    memset(&req, 0, sizeof(req));       /* unused fields start as 0 / NULL */
    memset(&reply, 0, sizeof(reply));

    /* initialize LSBLIB and get the configuration environment */
    if (lsb_init(argv[0]) < 0) {
        lsb_perror("simbsub: lsb_init() failed");
        exit(EXIT_FAILURE);
    }

    /* check that the input has the form "./simbsub COMMAND [ARGUMENTS]" */
    if (argc < 2) {
        fprintf(stderr, "Usage: simbsub command\n");
        exit(EXIT_FAILURE);
    }

    /* options and options2 are the bitwise inclusive OR of some of
       the SUB_* flags; no flags are used here */
    req.options  = 0;
    req.options2 = 0;

    /* initialize all resource limits to the default (no limit) */
    for (i = 0; i < LSF_RLIM_NLIMITS; i++)
        req.rLimits[i] = DEFAULT_RLIMIT;

    req.beginTime = 0;          /* dispatch the job as soon as possible */
    req.termTime  = 0;          /* no job termination deadline */
    req.numProcessors    = 1;   /* initial number of processors needed */
    req.maxNumProcessors = 1;   /* maximum number of processors needed */
    req.command = combine_arg(argc, argv);   /* command line of the job */

    jobId = lsb_submit(&req, &reply);        /* submit the job */
    if (jobId < 0) {
        /* on failure, lsb_submit() returns -1 and sets lsberrno */
        switch (lsberrno) {
        case LSBE_QUEUE_USE:
        case LSBE_QUEUE_CLOSED:
            lsb_perror(reply.queue);
            exit(EXIT_FAILURE);
        default:
            lsb_perror(NULL);
            exit(EXIT_FAILURE);
        }
    }

    printf("Job <%lld> is submitted to default queue <%s>.\n",
           (long long) jobId, reply.queue);
    exit(EXIT_SUCCESS);
} /* main */

The above program will produce output similar to the following:

Job <5602> is submitted to default queue <default>.

Sample program explanations

Options and options2

req.options = 0;

req.options2 = 0;

The options and options2 fields of the submit structure are the bitwise inclusive OR of some of the SUB_* flags defined in lsbatch.h. These flags serve two purposes.

Some flags indicate which of the optional fields of the submit structure are present. Those that are not present have default values.

Other flags indicate submission options. For a description of these flags, see lsb_submit(3).

Because the options fields indicate which optional fields are meaningful, you do not need to initialize the optional fields that options does not select. All fields that are not optional must be initialized properly.
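Setting an optional field and its SUB_* flag in the same step keeps the two from drifting apart. This is a minimal sketch assuming a cut-down structure: DEMO_SUB_JOB_NAME, DEMO_SUB_QUEUE and struct demo_submit are hypothetical stand-ins, not the lsbatch.h definitions, and the flag values are arbitrary.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; the real SUB_* values come from lsbatch.h. */
#define DEMO_SUB_JOB_NAME 0x01
#define DEMO_SUB_QUEUE    0x02

/* Hypothetical subset of struct submit. */
struct demo_submit {
    int   options;
    char *jobName;
    char *queue;
};

/* Store the value and mark the field as present. */
void demo_set_job_name(struct demo_submit *req, char *name)
{
    req->jobName = name;
    req->options |= DEMO_SUB_JOB_NAME;
}

void demo_set_queue(struct demo_submit *req, char *queue)
{
    req->queue = queue;
    req->options |= DEMO_SUB_QUEUE;
}

/* A field is meaningful only when its flag is set; otherwise the
 * default applies, represented here by NULL. */
const char *demo_effective_queue(const struct demo_submit *req)
{
    return (req->options & DEMO_SUB_QUEUE) ? req->queue : NULL;
}
```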

numProcessors and maxNumProcessors

req.numProcessors = 1;      /* initial number of processors needed by a (parallel) job */
req.maxNumProcessors = 1;   /* max number of processors required to run the (parallel) job */

numProcessors and maxNumProcessors are both set to 1 so that exactly one processor is requested. This makes the job specification passed to lsb_submit() match the default used by bsub.
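bsub's -n min_proc[,max_proc] option maps onto these two fields. The following sketch parses such a value; parse_proc_range() is a hypothetical helper, and treating an omitted maximum as equal to the minimum is an assumption of this sketch rather than documented LSBLIB behavior.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

/* Parse "min" or "min,max" into numProcessors / maxNumProcessors.
 * Returns 0 on success, -1 on malformed or out-of-range input. */
int parse_proc_range(const char *spec, int *numProcessors, int *maxNumProcessors)
{
    char *end;
    long minProc, maxProc;

    minProc = strtol(spec, &end, 10);
    if (end == spec || minProc < 1 || minProc > INT_MAX)
        return -1;
    if (*end == '\0') {
        maxProc = minProc;              /* assumption: "-n N" means min = max = N */
    } else if (*end == ',') {
        const char *rest = end + 1;
        maxProc = strtol(rest, &end, 10);
        if (end == rest || *end != '\0' || maxProc < minProc || maxProc > INT_MAX)
            return -1;
    } else {
        return -1;
    }
    *numProcessors = (int) minProc;
    *maxNumProcessors = (int) maxProc;
    return 0;
}
```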

If the resReq field of the submit structure is NULL, LSBLIB tries to obtain the resource requirements for the command from the remote task list. If the command does not appear in the remote task list, NULL is passed to LSF batch, and mbatchd uses the default resource requirements, with the DFT_FROMTYPE option bit set, when it makes an LSLIB call to LIM for host selection.
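The lookup order for resReq can be modeled as a short decision function. This is an illustrative sketch, not LSBLIB's implementation: struct demo_task stands in for a remote task list entry, choose_res_req() is hypothetical, and the command is matched by exact name only. A NULL result corresponds to passing NULL to LSF batch so that mbatchd applies its defaults.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical remote task list entry: command name -> resource string. */
struct demo_task {
    const char *name;
    const char *resReq;
};

/* An explicit resReq wins; otherwise use the task list entry for the
 * command, if any; otherwise return NULL (mbatchd defaults apply). */
const char *choose_res_req(const char *explicitResReq, const char *command,
                           const struct demo_task *tasks, size_t ntasks)
{
    size_t i;

    if (explicitResReq != NULL)
        return explicitResReq;
    for (i = 0; i < ntasks; i++)
        if (strcmp(tasks[i].name, command) == 0)
            return tasks[i].resReq;
    return NULL;
}
```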

rLimits[LSF_RLIM_NLIMITS] and hostSpec

for (i = 0; i < LSF_RLIM_NLIMITS; i++)
    /* resource limits are initialized to default */
    req.rLimits[i] = DEFAULT_RLIMIT;

The default resource limit, DEFAULT_RLIMIT, defined in lsf.h, means that no resource limit is applied.

The constants used to index the rLimits array of the submit structure are defined in lsf.h. The resource limits currently supported by LSF batch are listed below.

Resource limit                                   Index in rLimits array
CPU time limit (in seconds)                      LSF_RLIMIT_CPU
File size limit (in kilobytes)                   LSF_RLIMIT_FSIZE
Data size limit (in kilobytes)                   LSF_RLIMIT_DATA
Stack size limit                                 LSF_RLIMIT_STACK
Core file size limit (in kilobytes)              LSF_RLIMIT_CORE
Resident memory size limit (in kilobytes)        LSF_RLIMIT_RSS
Number of open files limit                       LSF_RLIMIT_NOFILE
Number of open files limit (for HP-UX)           LSF_RLIMIT_OPEN_MAX
Virtual memory limit (same as max swap memory)   LSF_RLIMIT_SWAP
Wall-clock time run limit                        LSF_RLIMIT_RUN
Maximum number of processes a job can fork       LSF_RLIMIT_PROCESS
Thread number limit                              LSF_RLIMIT_THREAD
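Because DEFAULT_RLIMIT means no limit, a request typically starts with every slot set to the default and then overrides only the limits the job needs. The sketch below shows that pattern with hypothetical stand-ins: the DEMO_RLIMIT_* indices and DEMO_RLIM_NLIMITS replace the lsf.h constants, whose real values may differ, and DEMO_DEFAULT_RLIMIT is -1, following the "-1 means undefined" note on the bsub option table.

```c
#include <assert.h>

/* Hypothetical stand-ins for the lsf.h constants. */
enum { DEMO_RLIMIT_CPU, DEMO_RLIMIT_FSIZE, DEMO_RLIMIT_RUN, DEMO_RLIM_NLIMITS };
#define DEMO_DEFAULT_RLIMIT (-1)   /* "no limit" */

/* Start with no limits at all. */
void demo_init_limits(int rLimits[DEMO_RLIM_NLIMITS])
{
    int i;
    for (i = 0; i < DEMO_RLIM_NLIMITS; i++)
        rLimits[i] = DEMO_DEFAULT_RLIMIT;
}

/* Returns 1 if the limit at index has been set, 0 if it is still unlimited. */
int demo_limit_is_set(const int rLimits[DEMO_RLIM_NLIMITS], int index)
{
    assert(index >= 0 && index < DEMO_RLIM_NLIMITS);
    return rLimits[index] != DEMO_DEFAULT_RLIMIT;
}
```

After demo_init_limits(), a one-hour CPU limit would be requested with rLimits[DEMO_RLIMIT_CPU] = 3600, since the CPU limit is given in seconds.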

The hostSpec field of the submit structure specifies the host model used to scale rLimits[LSF_RLIMIT_CPU] and rLimits[LSF_RLIMIT_RUN] (see lsb_queueinfo(3)). If hostSpec is NULL, the model of the local host is assumed.
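To see what scaling by hostSpec means, here is a sketch assuming LSF's usual CPU-factor normalization: a limit stated for the hostSpec model is multiplied by that model's CPU factor and divided by the execution host's CPU factor, so a faster execution host (larger CPU factor) gets proportionally less CPU time. scale_cpu_limit() is a hypothetical helper; the actual CPU factors come from the cluster configuration.

```c
#include <assert.h>

/* Assumed normalization: limit on the execution host =
 * limit * cpuFactor(hostSpec model) / cpuFactor(execution host). */
double scale_cpu_limit(double limitSeconds, double specCpuFactor,
                       double execCpuFactor)
{
    assert(specCpuFactor > 0.0 && execCpuFactor > 0.0);
    return limitSeconds * specCpuFactor / execCpuFactor;
}
```

For example, a 600-second limit specified for a model with CPU factor 1.0 becomes 300 seconds on an execution host with CPU factor 2.0.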

beginTime and termTime

req.beginTime = 0;   /* specific date and time to dispatch the job */
req.termTime  = 0;   /* specifies job termination deadline */

If the beginTime field of the submit structure is 0, the job is started as soon as possible.

If the job is still running at termTime, it is sent a USR2 signal. If the job does not terminate within 10 minutes of receiving this signal, it is killed. If the termTime field of the submit structure is 0, the job is allowed to run until it reaches a resource limit.
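The timing rules for beginTime and termTime can be expressed as two small functions. In this sketch, demo_can_dispatch() and demo_kill_time() are hypothetical helpers, and the 10-minute grace period is taken from the paragraph above.

```c
#include <assert.h>
#include <time.h>

/* Grace period between the USR2 signal at termTime and the kill. */
#define DEMO_TERM_GRACE_SECONDS (10 * 60)

/* beginTime == 0 means "as soon as possible"; otherwise wait for it. */
int demo_can_dispatch(time_t beginTime, time_t now)
{
    return beginTime == 0 || now >= beginTime;
}

/* Latest moment the job can still be running; 0 means no deadline
 * (the job runs until it reaches a resource limit). */
time_t demo_kill_time(time_t termTime)
{
    return termTime == 0 ? 0 : termTime + DEMO_TERM_GRACE_SECONDS;
}
```

With the real structure, a deadline two hours from now would be requested with req.termTime = time(NULL) + 2 * 60 * 60.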

lsberrno

The example below checks the value of lsberrno when lsb_submit() fails:

    if (jobId < 0) {
        /* if job submission fails, lsb_submit() returns -1
           and sets lsberrno to indicate the error */
        switch (lsberrno) {
        case LSBE_QUEUE_USE:
        case LSBE_QUEUE_CLOSED:
            lsb_perror(reply.queue);
            exit(EXIT_FAILURE);
        default:
            lsb_perror(NULL);
            exit(EXIT_FAILURE);
        }
    }

Different actions are taken depending on the type of error. All possible error numbers are defined in lsbatch.h. For example, LSBE_QUEUE_USE indicates that the user is not authorized to use the queue, and LSBE_QUEUE_CLOSED indicates that the queue is closed.

Since a queue name was not specified for the job, the job is submitted to the default queue. The queue field of the submitReply structure contains the name of the queue to which the job was submitted.

The above program will produce output similar to the following:

Job <5602> is submitted to default queue <default>.

The output from the job is mailed to the user because the program does not specify an output file in the outFile field of the submit structure.

The program assumes that user names and user ID spaces are uniform across all hosts in the cluster, that is, a job submitted by a given user runs under the same user's account on the execution host. Where user names and user ID spaces are not uniform, account mapping must be used to determine the account under which the job runs.

If you are familiar with the bsub command, the following table shows how the fields of the submit structure relate to the bsub command options.

bsub option                       submit field                          options flag
-J job_name_spec                  jobName                               SUB_JOB_NAME
-q queue_name                     queue                                 SUB_QUEUE
-m host_name[+[pref_level]]       askedHosts                            SUB_HOST
-n min_proc[,max_proc]            numProcessors, maxNumProcessors
-R res_req                        resReq                                SUB_RES_REQ
-c cpu_limit[/host_spec]          rLimits[LSF_RLIMIT_CPU] / hostSpec**  SUB_HOST_SPEC (if host_spec is specified)
-W run_limit[/host_spec]          rLimits[LSF_RLIMIT_RUN] / hostSpec**  SUB_HOST_SPEC (if host_spec is specified)
-F file_limit                     rLimits[LSF_RLIMIT_FSIZE]**
-M mem_limit                      rLimits[LSF_RLIMIT_RSS]**
-D data_limit                     rLimits[LSF_RLIMIT_DATA]**
-S stack_limit                    rLimits[LSF_RLIMIT_STACK]**
-C core_limit                     rLimits[LSF_RLIMIT_CORE]**
-k "chkpnt_dir [chkpnt_period]"   chkpntDir, chkpntPeriod               SUB_CHKPNT_DIR, SUB_CHKPNT_PERIOD (if chkpntPeriod is specified)
-w depend_cond                    dependCond                            SUB_DEPEND_COND
-b begin_time                     beginTime
-t term_time                      termTime
-i in_file                        inFile                                SUB_IN_FILE
-o out_file                       outFile                               SUB_OUT_FILE
-e err_file                       errFile                               SUB_ERR_FILE
-u mail_user                      mailUser                              SUB_MAIL_USER
-f "lfile op [rfile]"             xf
-E "pre_exec_cmd [arg]"           preExecCmd                            SUB_PRE_EXEC
-L login_shell                    loginShell                            SUB_LOGIN_SHELL
-P project_name                   projectName                           SUB_PROJECT_NAME
-G user_group                     userGroup                             SUB_USER_GROUP
-H                                                                      SUB2_HOLD*
-x                                                                      SUB_EXCLUSIVE
-r                                                                      SUB_RERUNNABLE
-N                                                                      SUB_NOTIFY_END
-B                                                                      SUB_NOTIFY_BEGIN
-I                                                                      SUB_INTERACTIVE
-Ip                                                                     SUB_PTY
-Is                                                                     SUB_PTY_SHELL
-K                                                                      SUB2_BSUB_BLOCK*
-X "except_cond::action"          exceptList                            SUB_EXCEPT
-T time_event                     timeEvent                             SUB_TIME_EVENT

* The flag is ORed into options2; all other flags are ORed into options.

** A value of -1 means undefined.

Even if not all options are used, all optional string fields must be initialized to the empty string. For a complete description of the fields in the submit structure, see the lsb_submit(3) man page.
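The table can also be read as a recipe for a tiny bsub front end. The sketch below handles only -J, -q and -o; struct demo_submit and the DEMO_SUB_* flags are hypothetical stand-ins for struct submit and the real SUB_* constants, and demo_parse_bsub() is not part of LSBLIB. Unset optional strings are set to the empty string, as recommended above.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in flags; the real SUB_JOB_NAME, SUB_QUEUE and
 * SUB_OUT_FILE values come from lsbatch.h and may differ. */
#define DEMO_SUB_JOB_NAME 0x01
#define DEMO_SUB_QUEUE    0x02
#define DEMO_SUB_OUT_FILE 0x04

/* Hypothetical subset of struct submit. */
struct demo_submit {
    int   options;
    char *jobName;
    char *queue;
    char *outFile;
    char *command;
};

/* Translate "-J name", "-q queue" and "-o file", followed by a command,
 * into fields plus option bits. Returns 0 on success, -1 on an
 * unsupported option, a missing value, or no command. */
int demo_parse_bsub(int argc, char **argv, struct demo_submit *req)
{
    int i;

    req->options = 0;
    req->jobName = "";            /* optional strings start out empty */
    req->queue   = "";
    req->outFile = "";
    req->command = NULL;

    for (i = 1; i < argc && argv[i][0] == '-'; i += 2) {
        if (i + 1 >= argc)
            return -1;                        /* option without a value */
        if (strcmp(argv[i], "-J") == 0) {
            req->jobName = argv[i + 1];
            req->options |= DEMO_SUB_JOB_NAME;
        } else if (strcmp(argv[i], "-q") == 0) {
            req->queue = argv[i + 1];
            req->options |= DEMO_SUB_QUEUE;
        } else if (strcmp(argv[i], "-o") == 0) {
            req->outFile = argv[i + 1];
            req->options |= DEMO_SUB_OUT_FILE;
        } else {
            return -1;                        /* not handled in this sketch */
        }
    }
    if (i >= argc)
        return -1;                            /* no command given */
    req->command = argv[i];                   /* command name only */
    return 0;
}
```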

To modify a job that has already been submitted, fill in a new submit structure with the parameters to override, set delOptions to clear any option bits that were previously specified for the job, and call lsb_modify() with the ID of the job. Because modifying a submitted job is much like resubmitting it, a program similar to the example above can be used with minor changes.
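One way to picture how options and delOptions interact in a modification request is as a set-then-clear on the job's current option bits. This is a model under stated assumptions, not a description of mbatchd internals: the DEMO_SUB_* values are hypothetical, demo_apply_modify() is a hypothetical helper, and letting delOptions win when a bit appears in both fields is an assumption of the sketch.

```c
#include <assert.h>

/* Hypothetical stand-ins, not the lsbatch.h values. */
#define DEMO_SUB_JOB_NAME 0x01
#define DEMO_SUB_OUT_FILE 0x04

/* Bits in newOptions are added, then bits in delOptions are cleared
 * (assumption: a bit present in both ends up cleared). */
int demo_apply_modify(int currentOptions, int newOptions, int delOptions)
{
    return (currentOptions | newOptions) & ~delOptions;
}
```

For instance, dropping a previously requested output file would correspond to passing DEMO_SUB_OUT_FILE in delOptions.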

All applications that call lsb_submit() and lsb_modify() are subject to authentication constraints described in .