Platform LSF batch queues

LSF batch queues hold the jobs in LSF batch and schedule them according to scheduling policies and limits on resource usage.

lsb_queueinfo()

lsb_queueinfo() gets information about the queues in LSF batch. This includes:

  • Queue name

  • Parameters

  • Statistics

  • Status

  • Resource limits

  • Scheduling policies and parameters

  • Users and hosts associated with the queue.

The example program in this section uses lsb_queueinfo() to get the queue information:

struct queueInfoEnt *lsb_queueinfo(queues,numQueues,
                   hostname,username,options)

lsb_queueinfo() has the following parameters:

char  **queues;           Array containing the names of the queues of interest
int   *numQueues;         Number of queue names in the queues array
char  *hostname;          Return only queues that use this host
char  *username;          Return only queues enabled for this user
int   options;            Reserved for future use; supply 0

To get information on all queues, set *numQueues to 0. If *numQueues is 1 and queues is NULL, information on the default system queue is returned.

If hostname is not NULL, all queues that use host hostname as a batch server host are returned. If username is not NULL, all queues that allow user username to submit jobs are returned.

On success, lsb_queueinfo() returns an array containing a queueInfoEnt structure (see below) for each queue of interest and sets *numQueues to the size of the array. On failure, lsb_queueinfo() returns NULL and sets lsberrno to indicate the error.
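The returned array can be walked with an ordinary index loop bounded by the count written back to *numQueues. The following is a minimal sketch of that loop, not part of LSBLIB: struct queueSlots and reportQueues() are hypothetical stand-ins that mirror a few queueInfoEnt fields so the code compiles without the LSF headers. In a real program the entries would come from lsb_queueinfo().

```c
#include <stdio.h>

/* Hypothetical stand-in for a few queueInfoEnt fields (not an LSF type). */
struct queueSlots {
    const char *queue;   /* queue name */
    int numPEND;         /* job slots needed by pending jobs */
    int numRUN;          /* job slots used by running jobs */
    int numSSUSP;        /* job slots used by system suspended jobs */
    int numUSUSP;        /* job slots used by user suspended jobs */
};

/* Print one line per entry and return the total number of pending slots.
 * numQueues plays the role of the count stored in *numQueues on success. */
int reportQueues(const struct queueSlots *entries, int numQueues)
{
    int i;
    int totalPending = 0;

    for (i = 0; i < numQueues; i++) {
        printf("%-12s PEND(%d) RUN(%d) SUSP(%d)\n",
               entries[i].queue,
               entries[i].numPEND,
               entries[i].numRUN,
               entries[i].numSSUSP + entries[i].numUSUSP);
        totalPending += entries[i].numPEND;
    }
    return totalPending;
}
```

With real LSBLIB data, the same loop would read qInfo[i].queue, qInfo[i].numPEND, and so on, with i running from 0 to *numQueues - 1.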

The queueInfoEnt structure is defined in lsbatch.h as

struct queueInfoEnt {
    char  *queue;             Name of the queue
    char  *description;       Description of the queue
    int   priority;           Priority of the queue
    short nice;               Nice value at which jobs in the queue run
    char  *userList;          Users allowed to submit jobs to the queue
    char  *hostList;          Hosts that can run jobs in the queue
    int   nIdx;               Size of the loadSched and loadStop arrays
    float *loadSched;         Load thresholds that control scheduling of
                                  jobs from the queue
    float *loadStop;          Load thresholds that control suspension of
                                  jobs from the queue
    int   userJobLimit;       Number of unfinished jobs a user can dispatch
                                  from the queue
    int   procJobLimit;       Number of unfinished jobs the queue can
                                  dispatch to a processor
    char  *windows;           Queue run window
    int   rLimits[LSF_RLIM_NLIMITS];  Per-process resource limits for
                                           jobs
    char  *hostSpec;          Obsolete. Use defaultHostSpec instead
    int   qAttrib;            Attributes of the queue
    int   qStatus;            Status of the queue
    int   maxJobs;            Job slot limit of the queue
    int   numJobs;            Total number of job slots required by all jobs
    int   numPEND;            Number of job slots needed by pending jobs
    int   numRUN;             Number of job slots used by running jobs
    int   numSSUSP;           Number of job slots used by system
                                  suspended jobs
    int   numUSUSP;           Number of job slots used by user
                                  suspended jobs
    int   mig;                Queue migration threshold in minutes
    int   schedDelay;         Schedule delay for new jobs
    int   acceptIntvl;        Minimum interval between two jobs dispatched
                                  to the same host
    char  *windowsD;          Queue dispatch window
    char  *nqsQueues;         Blank-separated list of NQS queue specifiers
    char  *userShares;        Blank-separated list of user shares
    char  *defaultHostSpec;   Value of DEFAULT_HOST_SPEC for the
                                  queue in lsb.queues
    int   procLimit;          Maximum number of job slots a job can take
    char  *admins;            Queue level administrators
    char  *preCmd;            Queue level pre-exec command 
    char  *postCmd;           Queue’s post-exec command 
    char  *requeueEValues;    Queue’s requeue exit status 
    int   hostJobLimit;       Per host job slot limit 
    char  *resReq;            Queue level resource requirement 
    int   numRESERVE;         Reserved job slots for pending jobs 
    int   slotHoldTime;       Time period for reserving job slots
    char  *sndJobsTo;         Remote queues to forward jobs to 
    char  *rcvJobsFrom;       Remote queues which can forward to me 
    char  *resumeCond;        Conditions to resume jobs 
    char  *stopCond;          Conditions to suspend jobs 
    char  *jobStarter;        Queue level job starter 
    char  *suspendActCmd;     Action commands for SUSPEND
    char  *resumeActCmd;      Action commands for RESUME 
    char  *terminateActCmd;   Action commands for TERMINATE 
    int   sigMap[LSB_SIG_NUM];  Configurable signal mapping 
    char  *preemption;        Preemption policy
    int    maxRschedTime;     Time period for remote cluster to schedule job
    struct shareAcctInfoEnt *shareAccts;  Array of shareAcctInfoEnt
    char   *chkpntDir;        chkpnt directory
    int    chkpntPeriod;      chkpnt period
    int    imptJobBklg;       Number of important jobs kept in the queue
    int    defLimits[LSF_RLIM_NLIMITS];  LSF resource limits (soft)
    int    chunkJobSize;      Maximum number of jobs in one chunk
    int    minProcLimit;      Minimum processor limit
    int    defProcLimit;      Default processor limit
    char   *fairshareQueues;
    char   *defExtSched;      Default external scheduling
    char   *mandExtSched;     Mandatory external scheduling
    int    slotShare;         The share of cpus to use in the pool
    char   *slotPool;         The cpu pool name
    int    underRCond;
    int    overRCond;
    float  idleCond;
    int    underRJobs;
    int    overRJobs;
    int    idleJobs;
    int    warningTimePeriod;  Warning time period in seconds
    char   *warningAction;     Warning action: SIGNAL | CHKPNT | command
    char   *qCtrlMsg;          AdminAction - queue control message
    char   *acResReq;
    int    symJobLimit;        Limit of running service job/symphony job
    char   *cpuReq;            cpu_req for service partition of symphony
    int    proAttr;            Indicates willingness to donate/borrow
    int    lendLimit;          Grace period to lend/return idle hosts
    int    hostReallocInterval;  Grace period to lend/return idle hosts
    int    numCPURequired;     Number of cpus required by CPU provision
    int    numCPUAllocated;    Number of cpus actually allocated
    int    numCPUBorrowed;     Number of cpus borrowed
    int    numCPULent;         Number of cpus lent
    /* the number of reserved cpus (numCPUReserved) = numCPUAllocated - numCPUBorrowed + numCPULent */
    /* the following fields are for real-time apps (e.g. murex) of symphony */
    int    schGranularity;        Scheduling granularity in milliseconds
    int    symTaskGracePeriod;    Grace period for stopping symphony tasks
    int    minOfSsm;              Minimum number of ssm
    int    maxOfSsm;              Maximum number of ssm
    int    numOfAllocSlots;       Number of allocated slots
    char   *servicePreemption;    Service preemption policy
    int    provisionStatus;       Dynamic cpu provision status
    int    minTimeSlice;          Minimum time for preemption and backfill (sec)
    char   *queueGroup;           List of queues defined in QUEUE_GROUP
    int    numApsFactors;
    struct apsFactorInfo *apsFactorInfoList;
    struct apsFactorMap  *apsFactorMaps;  Mapping from factors to subfactors
    struct apsLongNameMap *apsLongNames;  Mapping from factors to their long names
    int    maxJobPreempt;     Maximum number of job preempted times
    int    maxPreExecRetry;   Maximum number of pre-exec retry times
    int    localMaxPreExecRetry;   Maximum number of pre-exec retry times for local cluster
    int    maxJobRequeue;     Maximum number of job re-queue times
    int    usePam;            Use Linux-PAM
    int    cu_type_exclusive; Compute unit type
    char  *cu_str_exclusive;  String specified in EXCLUSIVE=CU[<string>]
};

The variable nIdx is the number of load threshold values for job scheduling. This is the total number of load indices returned by LIM. The parameters sndJobsTo, rcvJobsFrom, and maxRschedTime are used with LSF MultiCluster. The variable chunkJobSize must be larger than 1.
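Because loadSched and loadStop both hold nIdx entries, the two arrays can be read side by side with a single loop. The sketch below only illustrates that pairing: printThresholds() is a hypothetical helper, not an LSBLIB function, and it takes the three values as plain arguments so that it compiles without the LSF headers.

```c
#include <stdio.h>

/* Hypothetical helper (not part of LSBLIB): print the scheduling and
 * suspension thresholds for each of the nIdx load indices.
 * Returns the number of index lines printed. */
int printThresholds(int nIdx, const float *loadSched, const float *loadStop)
{
    int i;

    /* Nothing to print if either array is missing or nIdx is not positive. */
    if (loadSched == NULL || loadStop == NULL || nIdx <= 0)
        return 0;

    for (i = 0; i < nIdx; i++)
        printf("load index %d: loadSched=%.2f loadStop=%.2f\n",
               i, loadSched[i], loadStop[i]);
    return nIdx;
}
```

For a queueInfoEnt entry, the call would look like printThresholds(qInfo[0].nIdx, qInfo[0].loadSched, qInfo[0].loadStop).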

For a complete description of the fields in the queueInfoEnt structure, see the lsb_queueinfo() man page.

Include lsbatch.h in every application that uses LSBLIB functions. lsf.h does not have to be explicitly included in your program because lsbatch.h includes lsf.h.

Like the data structures returned by LSLIB functions, the data structures returned by an LSBLIB function are dynamically allocated inside LSBLIB and are automatically freed next time the same function is called. Do not attempt to free the space allocated by LSBLIB. To keep this information across calls, make your own copy of the data structure.
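A copy that survives the next call has to duplicate every pointed-to string and array, not just the structure itself, because the pointers inside the structure refer to memory owned by LSBLIB. The following is a minimal sketch of such a deep copy. struct queueSnapshot, copySnapshot(), and freeSnapshot() are hypothetical names introduced for illustration; the structure mirrors only a handful of queueInfoEnt fields so that the code compiles without the LSF headers.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a few queueInfoEnt fields (not an LSF type). */
struct queueSnapshot {
    char  *queue;        /* name of the queue */
    char  *description;  /* description of the queue */
    int    priority;     /* priority of the queue */
    int    nIdx;         /* size of the loadSched array */
    float *loadSched;    /* scheduling thresholds */
};

/* Duplicate a string into caller-owned memory; NULL stays NULL. */
static char *copyString(const char *src)
{
    char *dst;

    if (src == NULL)
        return NULL;
    dst = malloc(strlen(src) + 1);
    if (dst != NULL)
        strcpy(dst, src);
    return dst;
}

/* Release a snapshot created by copySnapshot(). */
void freeSnapshot(struct queueSnapshot *snap)
{
    if (snap == NULL)
        return;
    free(snap->queue);
    free(snap->description);
    free(snap->loadSched);
    free(snap);
}

/* Deep-copy src into newly allocated memory owned by the caller.
 * Returns NULL if src is NULL or any allocation fails. */
struct queueSnapshot *copySnapshot(const struct queueSnapshot *src)
{
    struct queueSnapshot *dst;

    if (src == NULL)
        return NULL;
    dst = calloc(1, sizeof(*dst));
    if (dst == NULL)
        return NULL;

    dst->priority = src->priority;
    dst->queue = copyString(src->queue);
    dst->description = copyString(src->description);
    if (src->nIdx > 0 && src->loadSched != NULL) {
        size_t bytes = (size_t)src->nIdx * sizeof(float);
        dst->loadSched = malloc(bytes);
        if (dst->loadSched != NULL) {
            memcpy(dst->loadSched, src->loadSched, bytes);
            dst->nIdx = src->nIdx;
        }
    }

    /* Treat any failed allocation as a failure of the whole copy. */
    if ((src->queue != NULL && dst->queue == NULL) ||
        (src->description != NULL && dst->description == NULL) ||
        (src->nIdx > 0 && src->loadSched != NULL && dst->loadSched == NULL)) {
        freeSnapshot(dst);
        return NULL;
    }
    return dst;
}
```

The copy is owned by the application, so it is the application's job to release it with freeSnapshot(). The memory returned by LSBLIB itself must still never be freed.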

Example

The program below takes a queue name as the first argument and displays information about the named queue.

/******************************************************
* LSBLIB -- Examples
*
* simbqueues
* Display information about a specific queue in the 
* cluster.
* (Queue name is given on the command line argument)
* It is similar to the command "bqueues QUEUE_NAME".
******************************************************/
#include <stdio.h>
#include <stdlib.h>
#include <lsf/lsbatch.h>
int main (int argc, char *argv[])
{
    struct queueInfoEnt *qInfo;
    char *queues;        /* queue name, taken from the command line */
    int numQueues = 1;   /* only 1 queue name in the array queues */
    char *host = NULL;   /* do not restrict the queues by host */
    char *user = NULL;   /* do not restrict the queues by user */
    int options = 0;     /* reserved for future use */
    /* check if input is in the right format:
       "./simbqueues QUEUENAME" */
    if (argc != 2) {
        printf("Usage: %s queue_name\n", argv[0]);
        exit(-1);
    }
    queues = argv[1];
    /* initialize LSBLIB and get the configuration environment */
    if (lsb_init(argv[0]) < 0) {
        lsb_perror("simbqueues: lsb_init() failed");
        exit(-1);
    }
    /* get queue information about the specified queue */
    qInfo = lsb_queueinfo(&queues, &numQueues, host, user,
                          options);
    if (qInfo == NULL) {
        lsb_perror("simbqueues: lsb_queueinfo() failed");
        exit(-1);
    }
    /* display the queue information (name, descriptions,
    priority, nice value, max num of jobs, num of PEND, RUN,
    SUSP and TOTAL jobs) */
    printf("Information about %s queue:\n", queues);
    printf("Description: %s\n", qInfo[0].description);
    printf("Priority: %d     Nice: %d\n",
           qInfo[0].priority, qInfo[0].nice);
    printf("Maximum number of job slots:");
    if (qInfo[0].maxJobs < INFINIT_INT)
        printf("%5d\n", qInfo[0].maxJobs);
    else
        printf("%5s\n", "unlimited");
    printf("Job slot statistics: PEND(%d) RUN(%d) SUSP(%d) "
           "TOTAL(%d).\n", qInfo[0].numPEND, qInfo[0].numRUN,
           qInfo[0].numSSUSP + qInfo[0].numUSUSP,
           qInfo[0].numJobs);
    exit(0);
} /* main */

In the above program, INFINIT_INT is defined in lsf.h and indicates that no limit is set for maxJobs. The same convention applies to all Platform LSF API function calls: Platform LSF supplies INFINIT_INT whenever the value of a variable is invalid (not available) or infinite. Check for this value on every optional variable. For example, if you display the loadSched/loadStop values, an INFINIT_INT indicates that the threshold is not configured and is ignored.
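When several optional limits are displayed, the INFINIT_INT test can be factored into one helper. The sketch below is one way to do that: formatLimit() is a hypothetical helper, not an LSBLIB function. The fallback definition of INFINIT_INT is an assumption made only so the sketch compiles without lsf.h; a real program should take the value from lsf.h.

```c
#include <stddef.h>
#include <stdio.h>

/* Assumption: stand-in value used only when lsf.h has not been included.
 * A real program gets INFINIT_INT from lsf.h (via lsbatch.h). */
#ifndef INFINIT_INT
#define INFINIT_INT 0x7fffffff
#endif

/* Hypothetical helper (not part of LSBLIB): write "unlimited" for an
 * unset limit, or the decimal value otherwise. Returns buf. */
const char *formatLimit(int value, char *buf, size_t bufSize)
{
    if (value < INFINIT_INT)
        snprintf(buf, bufSize, "%d", value);
    else
        snprintf(buf, bufSize, "unlimited");
    return buf;
}
```

A call such as formatLimit(qInfo[0].maxJobs, buf, sizeof(buf)) would then replace the if/else pair in the example program, and the same helper could be reused for other optional integer limits such as userJobLimit or hostJobLimit.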

When a function call fails, lsb_perror() prints an error message describing the failure. To take different actions for different errors, check lsberrno.

The above program will produce output similar to the following:

Information about normal queue:
Description: For normal low priority jobs
Priority: 25            Nice: 20
Maximum number of job slots : 40
Job slot statistics: PEND( 5) RUN(12) SUSP(1) TOTAL(18)