Force a job to run

After a job is submitted to the LSF batch system, it remains pending until LSF batch runs it (for details on the factors that govern when and where a job starts to run, see Administering Platform LSF).

lsb_runjob()

A job can be forced to run on a specified list of hosts immediately using the following LSBLIB function:

int lsb_runjob (struct runJobRequest *runReq)

runJobRequest Structure

lsb_runjob() takes the runJobRequest structure, which is defined in lsbatch.h:

struct runJobRequest {
    LS_LONG_INT  jobId;             /* Job ID of the job to start */
    int          numHosts;          /* Number of hosts to run the job on */
    char         **hostname;        /* Host names where jobs run */
#define RUNJOB_OPT_NORMAL     0x01
#define RUNJOB_OPT_NOSTOP     0x02
#define RUNJOB_OPT_PENDONLY   0x04  /* Pending jobs only, no finished jobs */
#define RUNJOB_OPT_FROM_BEGIN 0x08  /* Checkpoint jobs only, from beginning */
#define RUNJOB_OPT_FREE       0x10  /* brun to use free CPUs only */
    int          options;           /* Run job request options */
    int          *slots;            /* Number of slots per host */
};
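
The numHosts and hostname fields together describe an array of host names. As a minimal sketch, the hypothetical helper below (split_host_list() is not part of LSBLIB) turns a comma-separated list such as "hostA,hostB" into an array of that shape. It assumes the caller owns the buffer and keeps it alive for as long as the array is in use.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not part of LSBLIB.
 * Splits a comma-separated host list in place and returns a malloc'ed
 * array of pointers into the caller's buffer. The number of hosts is
 * stored in *count. Returns NULL if the list is empty or if memory
 * cannot be allocated. */
char **split_host_list(char *list, int *count)
{
    char **hosts = NULL;
    char *tok;
    int n = 0;

    assert(list != NULL && count != NULL);
    *count = 0;

    for (tok = strtok(list, ","); tok != NULL; tok = strtok(NULL, ",")) {
        char **grown = realloc(hosts, (n + 1) * sizeof(char *));
        if (grown == NULL) {
            free(hosts);
            return NULL;
        }
        hosts = grown;
        hosts[n++] = tok;   /* points into list; no copy is made */
    }

    *count = n;
    return hosts;
}
```

The returned array and count could then be assigned to the hostname and numHosts fields of the request. Only the array itself needs to be released with free(); the strings belong to the original buffer.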

To force a job to run, the job must have been submitted and must be in either the PEND or the FINISHED state. Only the LSF administrator or the owner of the job can start the job. When applied to a job in the FINISHED state, lsb_runjob() restarts the job.

A forced job can run without regard to scheduling constraints such as job slot limits. However, if the job is started with the options field set to 0 or RUNJOB_OPT_NORMAL, the job is still subject to the following:

  • Run windows in the default queue

  • Queue threshold

  • Execution hosts for the job

To override these conditions for a started job, use RUNJOB_OPT_NOSTOP; the job is then not stopped because of the conditions listed above. However, all of LSBLIB's job manipulation APIs can still be applied to the job.
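
The choice between the two modes can be sketched as a small function that builds the value for the options field. The helper make_run_options() is hypothetical and not part of LSBLIB. The flag values are copied from the structure listing. The sketch assumes that RUNJOB_OPT_FROM_BEGIN may be OR-ed onto either mode, as the bit-flag values suggest.

```c
#include <assert.h>

/* Values copied from the runJobRequest listing; the guards let this
 * compile with or without <lsf/lsbatch.h>. */
#ifndef RUNJOB_OPT_NORMAL
#define RUNJOB_OPT_NORMAL     0x01
#endif
#ifndef RUNJOB_OPT_NOSTOP
#define RUNJOB_OPT_NOSTOP     0x02
#endif
#ifndef RUNJOB_OPT_FROM_BEGIN
#define RUNJOB_OPT_FROM_BEGIN 0x08
#endif

/* Hypothetical helper, not part of LSBLIB.
 * no_stop:    nonzero to request RUNJOB_OPT_NOSTOP instead of the
 *             normal mode
 * from_begin: nonzero to add RUNJOB_OPT_FROM_BEGIN (assumed combinable) */
int make_run_options(int no_stop, int from_begin)
{
    int options = no_stop ? RUNJOB_OPT_NOSTOP : RUNJOB_OPT_NORMAL;

    if (from_begin)
        options |= RUNJOB_OPT_FROM_BEGIN;

    /* exactly one of NORMAL / NOSTOP is selected */
    assert(!(options & RUNJOB_OPT_NORMAL) != !(options & RUNJOB_OPT_NOSTOP));
    return options;
}
```

The result would be stored in the options field of the request before calling lsb_runjob().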

Example

The following is an example program that runs a specified job on a host that has no batch job running.

/******************************************************
 * LSBLIB -- Examples
 *
 * simple brun
 * The program takes a job ID as the argument and runs that
 * job on a vacant host
 ******************************************************/

#include <stdio.h>
#include <lsf/lsbatch.h>
#include <stdlib.h>
int main(int argc, char **argv)
{
    struct hostInfoEnt  *hInfo;  /* host information */
    int numHosts = 0;            /* number of hosts */
    int i;
    struct runJobRequest runJobReq; /* specification for the job to be run */

    /* check that the input is in the right format: "./simbrun JOBID" */
    if (argc != 2) {
        printf("Usage: %s jobId\n", argv[0]);
        exit(-1);
    }
    /* initialize LSBLIB and get the configuration environment */
    if (lsb_init(argv[0]) < 0) {
        lsb_perror("lsb_init");
        exit(-1);
    }
    /* get host information */
    hInfo = lsb_hostinfo(NULL, &numHosts);
    if (hInfo == NULL) {
        lsb_perror("lsb_hostinfo");
        exit(-1);
    }
    /* find a vacant host */
    for (i = 0; i < numHosts; i++) {
        /* skip hosts that are not available to run jobs */
        if (hInfo[i].hStatus & (HOST_STAT_BUSY |
                                HOST_STAT_WIND |
                                HOST_STAT_DISABLED |
                                HOST_STAT_LOCKED |
                                HOST_STAT_FULL |
                                HOST_STAT_NO_LIM |
                                HOST_STAT_UNLICENSED |
                                HOST_STAT_UNAVAIL |
                                HOST_STAT_UNREACH))
            continue;
        /* found a vacant host */
        if (hInfo[i].numJobs == 0)
            break;
    }
    /* report an error when no vacant host is found */
    if (i == numHosts) {
        fprintf(stderr, "Cannot find vacant host to run job < %s >\n",
                argv[1]);
        exit(-1);
    }
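    /* define the specifications for the job to be run (the job
       can be stopped due to load conditions) */
    runJobReq.jobId = atoi(argv[1]);
    runJobReq.options = 0;
    runJobReq.numHosts = 1;
    runJobReq.slots = NULL;  /* not used here; avoids an uninitialized pointer */
    runJobReq.hostname = (char **)malloc(sizeof(char *));
    if (runJobReq.hostname == NULL) {
        perror("malloc");
        exit(-1);
    }
    runJobReq.hostname[0] = hInfo[i].host;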
    /* run the job and check for the success */
    if (lsb_runjob(&runJobReq) < 0) {
        lsb_perror("lsb_runjob");
        exit(-1);
    }
    exit (0);
}

On success, lsb_runjob() returns 0. On failure, it returns -1 and sets lsberrno to indicate the error.
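
The example converts argv[1] with atoi(), which returns 0 for non-numeric input and cannot report an error. A stricter conversion might look like the following sketch. The helper parse_job_id() is hypothetical and not part of LSBLIB. It assumes that a valid job ID is a positive decimal integer that fits in a long long.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper, not part of LSBLIB.
 * Converts text to a job ID. Returns 0 and stores the value in *out on
 * success. Returns -1, leaving *out untouched, if text is empty, has
 * trailing characters, is out of range, or is not greater than zero.
 * Leading whitespace and a leading '+' are accepted, as with strtoll(). */
int parse_job_id(const char *text, long long *out)
{
    char *end;
    long long value;

    assert(text != NULL && out != NULL);

    errno = 0;
    value = strtoll(text, &end, 10);
    if (errno != 0 || end == text || *end != '\0' || value <= 0)
        return -1;

    *out = value;
    return 0;
}
```

A caller could reject bad input before contacting the batch system by checking the return value and assigning the parsed number to the jobId field only on success.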