Platform Computing Corp.

Job Arrays

LSF provides a structure called a job array that allows a sequence of jobs that share the same executable and resource requirements, but have different input files, to be submitted, controlled, and monitored as a single unit. Using the standard LSF commands, you can also control and monitor individual jobs and groups of jobs submitted from a job array.

After the job array is submitted, LSF independently schedules and dispatches the individual jobs. Each job submitted from a job array shares the same job ID as the job array and is uniquely referenced using an array index. The dimension and structure of a job array are defined when the job array is created.

Create a Job Array

A job array is created at job submission time using the -J option of bsub.

  1. For example, the following command creates a job array named myArray made up of 1000 jobs:

    bsub -J "myArray[1-1000]" myJob
    Job <123> is submitted to default queue <normal>.
    

Syntax

The bsub syntax used to create a job array follows:

bsub -J "arrayName[indexList, ...]" myJob 

Where:

-J "arrayName[indexList, ...]"

Names and creates the job array. The square brackets, [ ], around indexList must be entered exactly as shown and the job array name specification must be enclosed in quotes. Commas (,) are used to separate multiple indexList entries. The maximum length of this specification is 255 characters.

arrayName

User specified string used to identify the job array. Valid values are any combination of the following characters:

a-z | A-Z | 0-9 | . | - | _

indexList = start[-end[:step]]

Specifies the size and dimension of the job array, where:

start

Specifies the start of a range of indices. Can also be used to specify an individual index. Valid values are unique positive integers. For example, [1-5] and [1, 2, 3, 4, 5] specify 5 jobs with indices 1 through 5.

end

Specifies the end of a range of indices. Valid values are unique positive integers.

step

Specifies the value to increment the indices in a range. Indices begin at start, increment by the value of step, and do not increment past the value of end. The default value is 1. Valid values are positive integers. For example, [1-10:2] specifies a range of 1-10 with step value 2 creating indices 1, 3, 5, 7, and 9.
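
The indexList rules above can be sketched in a few lines of Python. This is only an illustration of the start[-end[:step]] syntax as described here, not LSF's own parser. The function name expand_index_list is hypothetical, and rejecting an end value smaller than start is an assumption of this sketch.

```python
def expand_index_list(index_list: str) -> list[int]:
    """Expand an indexList string such as "1-10:2, 15" into a list of indices."""
    indices: list[int] = []
    for entry in index_list.split(","):
        entry = entry.strip()
        # Split "start-end:step" into its range part and optional step part.
        range_part, _, step_part = entry.partition(":")
        start_text, _, end_text = range_part.partition("-")
        start = int(start_text)
        end = int(end_text) if end_text else start  # a bare start is a single index
        step = int(step_part) if step_part else 1   # the default step is 1
        # Assumption of this sketch: end must not be smaller than start.
        if start < 1 or end < start or step < 1:
            raise ValueError(f"invalid index entry: {entry!r}")
        # Indices begin at start and never increment past end.
        indices.extend(range(start, end + 1, step))
    return indices


print(expand_index_list("1-10:2"))  # [1, 3, 5, 7, 9]
```

With this sketch, "1-5" and "1, 2, 3, 4, 5" expand to the same five indices, matching the equivalence stated for start.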

After the job array is created (submitted), individual jobs are referenced using the job array name or job ID and an index value. For example, both of the following series of job array statements refer to jobs submitted from a job array named myArray which is made up of 1000 jobs and has a job ID of 123:

myArray[1], myArray[2], myArray[3], ..., myArray[1000]
123[1], 123[2], 123[3], ..., 123[1000] 

Change the maximum size of a job array

A large job array allows a user to submit a large number of jobs to the system with a single job submission.

By default, the maximum number of jobs in a job array is 1000. Unless this limit is changed, a job array cannot contain more than 1000 jobs.

  1. To make a change to the maximum job array value, set MAX_JOB_ARRAY_SIZE in lsb.params to any positive integer between 1 and 2147483646. The maximum number of jobs in a job array cannot exceed the value set by MAX_JOB_ARRAY_SIZE.

Handling Input and Output Files

LSF provides methods for coordinating individual input and output files for the multiple jobs created when submitting a job array. These methods require your input files to be prepared uniformly. To accommodate an executable that uses standard input and standard output, LSF provides runtime variables (%I and %J) that are expanded at runtime. To accommodate an executable that reads command line arguments, LSF provides an environment variable (LSB_JOBINDEX) that is set in the execution environment.

Methods

Prepare input files

LSF needs all the input files for the jobs in your job array to be located in the same directory. By default, LSF assumes the current working directory (CWD), which is the directory from which bsub was issued.

  1. To override CWD, specify an absolute path when submitting the job array.
  2. Each file name consists of two parts, a consistent name string and a variable integer that corresponds directly to an array index. For example, the following file names are valid input file names for a job array. They are made up of the consistent name input and integers that correspond to job array indices from 1 to 1000:

    input.1, input.2, input.3, ..., input.1000 
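
Before submitting a large array, it can help to confirm that every expected input file is present. The following Python sketch checks a directory for files named in the prefix.index form described above. The helper name missing_input_files and the scratch data in the demonstration are hypothetical and are not part of LSF.

```python
import tempfile
from pathlib import Path


def missing_input_files(directory, indices, prefix="input"):
    """Return the expected input file names that are not present in directory."""
    base = Path(directory)
    expected = [f"{prefix}.{index}" for index in indices]
    return [name for name in expected if not (base / name).is_file()]


# Demonstration in a scratch directory with made-up files.
with tempfile.TemporaryDirectory() as scratch:
    for index in (1, 2, 4):
        Path(scratch, f"input.{index}").write_text("sample\n")
    print(missing_input_files(scratch, range(1, 5)))  # ['input.3']
```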
    

Redirecting Standard Input and Output

The variables %I and %J are used as substitution strings to support file redirection for jobs submitted from a job array. At execution time, %I is expanded to provide the job array index value of the current job, and %J is expanded to provide the job ID of the job array.

Redirect standard input

  1. Use the -i option of bsub and the %I variable when your executable reads from standard input.
  2. To use %I, all the input files must be named consistently with a variable part that corresponds to the indices of the job array. For example:

    input.1, input.2, input.3, ..., input.N 
     

    For example, the following command submits a job array of 1000 jobs whose input files are named input.1, input.2, input.3, ..., input.1000 and located in the current working directory:

    bsub -J "myArray[1-1000]" -i "input.%I" myJob

Redirect standard output and error

  1. Use the -o option of bsub and the %I and %J variables when your executable writes to standard output and error.
    1. To create an output file that corresponds to each job submitted from a job array, specify %I as part of the output file name.
    2. For example, the following command submits a job array of 1000 jobs whose output files are put in CWD and named output.1, output.2, output.3, ..., output.1000:

      bsub -J "myArray[1-1000]" -o "output.%I" myJob 
      
    3. To create output files that include the job array job ID as part of the file name specify %J.
    4. For example, the following command submits a job array of 1000 jobs whose output files are put in CWD and named output.123.1, output.123.2, output.123.3, ..., output.123.1000. The job ID of the job array is 123.

      bsub -J "myArray[1-1000]" -o "output.%J.%I" myJob 
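
To see which file names a given -i or -o pattern produces, the substitution can be imitated in Python. This is a sketch of the naming rule described above, not of how LSF performs the expansion internally. The function name expand_file_pattern is hypothetical.

```python
def expand_file_pattern(pattern: str, job_id: int, index: int) -> str:
    """Replace %J with the job array job ID and %I with the array index."""
    return pattern.replace("%J", str(job_id)).replace("%I", str(index))


# File names for the first three jobs of an array with job ID 123.
for index in (1, 2, 3):
    print(expand_file_pattern("output.%J.%I", 123, index))
```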
      

Passing Arguments on the Command Line

The environment variable LSB_JOBINDEX is used as a substitution string to support passing job array indices on the command line. When the job is dispatched, LSF sets LSB_JOBINDEX in the execution environment to the job array index of the current job. LSB_JOBINDEX is set for all jobs. For non-array jobs, LSB_JOBINDEX is set to zero (0).

To use LSB_JOBINDEX, all the input files must be named consistently and with a variable part that corresponds to the indices of the job array. For example:

input.1, input.2, input.3, ..., input.N 

You must escape LSB_JOBINDEX with a backslash, \, to prevent the shell interpreting bsub from expanding the variable. For example, the following command submits a job array of 1000 jobs whose input files are named input.1, input.2, input.3, ..., input.1000 and located in the current working directory. The executable is being passed an argument that specifies the name of the input files:

bsub -J "myArray[1-1000]" myJob -f input.\$LSB_JOBINDEX 
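
Because LSB_JOBINDEX is set in the execution environment, a job written in Python could also read it directly instead of receiving the file name as an argument. The sketch below assumes the input.N naming shown above. The function name is hypothetical, and the fallback to 0 when the variable is unset is an assumption made so that the code also runs outside LSF.

```python
import os


def input_file_for_current_job(prefix: str = "input") -> str:
    """Build the input file name for this job from LSB_JOBINDEX."""
    # LSF sets LSB_JOBINDEX to the array index, or to 0 for non-array jobs.
    # Falling back to "0" when the variable is unset is an assumption of this sketch.
    index = int(os.environ.get("LSB_JOBINDEX", "0"))
    if index == 0:
        raise RuntimeError("LSB_JOBINDEX is 0: not running as part of a job array")
    return f"{prefix}.{index}"
```

With this approach the submission command needs no escaped variable, because the index is resolved inside the job rather than by the shell.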

Job Array Dependencies

Like all jobs in LSF, a job array can be dependent on the completion or partial completion of a job or another job array. A number of job-array-specific dependency conditions are provided by LSF.

Set a whole array dependency

  1. To make a job array dependent on the completion of a job or another job array use the -w "dependency_condition" option of bsub.
  2. For example, to have an array dependent on the completion of a job or job array with job ID 123, use the following command:

    bsub -w "done(123)" -J "myArray2[1-1000]" myJob 
    

Set a partial array dependency

  1. To make a job or job array dependent on an existing job array, use one of the following dependency conditions.

    Condition                          Description
    numrun(jobArrayJobId, op num)      Evaluate the number of jobs in RUN state
    numpend(jobArrayJobId, op num)     Evaluate the number of jobs in PEND state
    numdone(jobArrayJobId, op num)     Evaluate the number of jobs in DONE state
    numexit(jobArrayJobId, op num)     Evaluate the number of jobs in EXIT state
    numended(jobArrayJobId, op num)    Evaluate the number of jobs in DONE and EXIT state
    numhold(jobArrayJobId, op num)     Evaluate the number of jobs in PSUSP state
    numstart(jobArrayJobId, op num)    Evaluate the number of jobs in RUN and SSUSP and USUSP state

  2. Use one of the following operators (op) combined with a positive integer (num) to build a condition:

    == | > | < | >= | <= | !=
     

    Optionally, an asterisk (*) can be used in place of num to mean all jobs submitted from the job array.

    For example, to start a job named myJob when 100 or more elements in a job array with job ID 123 have completed successfully:

    bsub -w "numdone(123, >= 100)" myJob
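
The way a condition such as numdone(123, >= 100) is evaluated can be illustrated with a short Python sketch. It only models the comparison of a job count against num (or against all jobs when * is used). It is not LSF's scheduler logic, and the names OPERATORS and condition_holds are hypothetical.

```python
import operator

# The comparison operators accepted in a dependency condition.
OPERATORS = {
    "==": operator.eq,
    ">": operator.gt,
    "<": operator.lt,
    ">=": operator.ge,
    "<=": operator.le,
    "!=": operator.ne,
}


def condition_holds(count: int, op: str, num, total_jobs: int) -> bool:
    """Evaluate 'count op num'; num may be "*" to mean all jobs in the array."""
    threshold = total_jobs if num == "*" else int(num)
    return OPERATORS[op](count, threshold)


# numdone(123, >= 100) with 100 jobs done out of 1000.
print(condition_holds(100, ">=", 100, total_jobs=1000))  # True
```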

Monitoring Job Arrays

Use bjobs and bhist to monitor the current and past status of job arrays.

Display job array status

  1. To display summary information about the currently running jobs submitted from a job array, use the -A option of bjobs.
  2. For example, a job array of 10 jobs with job ID 123:

    bjobs -A 123
    JOBID    ARRAY_SPEC    OWNER  NJOBS PEND DONE  RUN EXIT SSUSP USUSP PSUSP
    123      myArra[1-10]  user1     10    3    3    4    0     0     0     0
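
For scripted monitoring, the summary line can be turned into a dictionary. The Python sketch below assumes the output has exactly the layout of the sample above (one header line, one data line, no spaces inside any field). Real output may differ, so treat the parsing as an assumption. The name parse_array_summary is hypothetical.

```python
SAMPLE = """\
JOBID    ARRAY_SPEC    OWNER  NJOBS PEND DONE  RUN EXIT SSUSP USUSP PSUSP
123      myArra[1-10]  user1     10    3    3    4    0     0     0     0
"""


def parse_array_summary(output: str) -> dict:
    """Map each column name of a 'bjobs -A' summary to its value."""
    lines = [line for line in output.splitlines() if line.strip()]
    header = lines[0].split()
    values = lines[1].split()
    row = dict(zip(header, values))
    # Every column after OWNER is a job counter; convert those to integers.
    for name in header[3:]:
        row[name] = int(row[name])
    return row


summary = parse_array_summary(SAMPLE)
print(summary["RUN"], "of", summary["NJOBS"], "jobs are running")
```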
    

Display job array dependencies

  1. To display job dependency information for an array, use the bjdepinfo command.
  2. For example, a job array (with job ID 456) where you want to view the dependencies on the third element of the array:

    bjdepinfo -c "456[3]" 
    JOBID  CHILD CHILD_STATUS CHILD_NAME LEVEL 
    456[3] 300   PEND         job300     1 
    

Individual job status

Display current job status

  1. To display the status of the individual jobs submitted from a job array, specify the job array job ID with bjobs. For jobs submitted from a job array, JOBID displays the job array job ID, and JOBNAME displays the job array name and the index value of each job.
  2. For example, to view a job array with job ID 123:

    bjobs 123
    JOBID  USER   STAT   QUEUE     FROM_HOST  EXEC_HOST   JOB_NAME    SUBMIT_TIME
    123    user1  DONE   default   hostA      hostC       myArray[1]  Feb 29 12:34
    123    user1  DONE   default   hostA      hostQ       myArray[2]  Feb 29 12:34
    123    user1  DONE   default   hostA      hostB       myArray[3]  Feb 29 12:34
    123    user1  RUN    default   hostA      hostC       myArray[4]  Feb 29 12:34
    123    user1  RUN    default   hostA      hostL       myArray[5]  Feb 29 12:34
    123    user1  RUN    default   hostA      hostB       myArray[6]  Feb 29 12:34
    123    user1  RUN    default   hostA      hostQ       myArray[7]  Feb 29 12:34
    123    user1  PEND   default   hostA                  myArray[8]  Feb 29 12:34
    123    user1  PEND   default   hostA                  myArray[9]  Feb 29 12:34
    123    user1  PEND   default   hostA                  myArray[10] Feb 29 12:34 
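
A per-job listing like the one above can be tallied by state in Python. The sketch relies on STAT being the third whitespace-separated column, as in the sample. That holds even when EXEC_HOST is blank for pending jobs, but it is still an assumption about the output layout. The name count_by_status and the shortened sample are hypothetical.

```python
from collections import Counter

SAMPLE_ROWS = """\
JOBID  USER   STAT   QUEUE     FROM_HOST  EXEC_HOST   JOB_NAME    SUBMIT_TIME
123    user1  DONE   default   hostA      hostC       myArray[1]  Feb 29 12:34
123    user1  RUN    default   hostA      hostL       myArray[5]  Feb 29 12:34
123    user1  PEND   default   hostA                  myArray[8]  Feb 29 12:34
123    user1  PEND   default   hostA                  myArray[9]  Feb 29 12:34
"""


def count_by_status(output: str) -> Counter:
    """Count the jobs in each state from a per-job bjobs listing."""
    rows = [line.split() for line in output.splitlines() if line.strip()]
    # Skip the header row; STAT is the third column in the sample layout.
    return Counter(fields[2] for fields in rows[1:])


print(count_by_status(SAMPLE_ROWS))
```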
    

Display past job status

  1. To display the past status of the individual jobs submitted from a job array, specify the job array job ID with bhist.
  2. For example, to view the history of a job array with job ID 456:

    bhist 456
    Summary of time in seconds spent in various states:
    JOBID  USER    JOB_NAME   PEND    PSUSP   RUN     USUSP   SSUSP   UNKWN   TOTAL
    456[1] user1   *rray[1]   14      0       65      0       0       0       79
    456[2] user1   *rray[2]   74      0       25      0       0       0       99
    456[3] user1   *rray[3]   121     0       26      0       0       0       147
    456[4] user1   *rray[4]   167     0       30      0       0       0       197
    456[5] user1   *rray[5]   214     0       29      0       0       0       243
    456[6] user1   *rray[6]   250     0       35      0       0       0       285
    456[7] user1   *rray[7]   295     0       33      0       0       0       328
    456[8] user1   *rray[8]   339     0       29      0       0       0       368
    456[9] user1   *rray[9]   356     0       26      0       0       0       382
    456[10]user1   *ray[10]   375     0       24      0       0       0       399 
    

Specific job status

Display the current status of a specific job

  1. To display the current status of a specific job submitted from a job array, specify, in quotes, the job array job ID and an index value with bjobs.
  2. For example, the status of the 5th job in a job array with job ID 123:

    bjobs "123[5]"
    JOBID  USER   STAT   QUEUE     FROM_HOST  EXEC_HOST   JOB_NAME    SUBMIT_TIME
    123    user1  RUN    default   hostA      hostL       myArray[5]  Feb 29 12:34 
    

Display the past status of a specific job

  1. To display the past status of a specific job submitted from a job array, specify, in quotes, the job array job ID and an index value with bhist.
  2. For example, the status of the 5th job in a job array with job ID 456:

    bhist "456[5]"
    Summary of time in seconds spent in various states:
    JOBID  USER    JOB_NAME   PEND    PSUSP   RUN     USUSP   SSUSP   UNKWN   TOTAL
    456[5] user1   *rray[5]   214     0       29      0       0       0       243 
    

Controlling Job Arrays

You can control the whole array, all the jobs submitted from the job array, with a single command. LSF also provides the ability to control individual jobs and groups of jobs submitted from a job array. When issuing commands against a job array, use the job array job ID instead of the job array name. Job names are not unique in LSF, and issuing a command using a job array name may result in unpredictable behavior.

Most LSF commands allow operation on the whole job array, individual jobs, and groups of jobs. These commands include bkill, bstop, bresume, and bmod.

Some commands only allow operation on individual jobs submitted from a job array. These commands include btop, bbot, and bswitch.

Control a whole array

  1. To control the whole job array, specify the command as you would for a single job using only the job ID.
  2. For example, to kill a job array with job ID 123:

    bkill 123 
    

Control individual jobs

  1. To control an individual job submitted from a job array, specify the command using the job ID of the job array and the index value of the corresponding job. The job ID and index value must be enclosed in quotes.
  2. For example, to kill the 5th job in a job array with job ID 123:

    bkill "123[5]" 
    

Control groups of jobs

  1. To control a group of jobs submitted from a job array, specify the command as you would for an individual job and use indexList syntax to indicate the jobs.
  2. For example, to kill jobs 1-5, 239, and 487 in a job array with job ID 123:

    bkill "123[1-5, 239, 487]" 
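
When the jobs to control come from a script, the quoted jobID[indexList] argument can be built from a list of indices. The following Python sketch collapses consecutive indices into ranges. It only builds the string and does not invoke any LSF command. Both function names are hypothetical.

```python
def compress_indices(indices) -> str:
    """Collapse indices into indexList form, for example "1-5, 239, 487"."""
    ordered = sorted(set(indices))
    parts = []
    i = 0
    while i < len(ordered):
        j = i
        # Extend the run while the next index is consecutive.
        while j + 1 < len(ordered) and ordered[j + 1] == ordered[j] + 1:
            j += 1
        parts.append(str(ordered[i]) if i == j else f"{ordered[i]}-{ordered[j]}")
        i = j + 1
    return ", ".join(parts)


def array_target(job_id: int, indices) -> str:
    """Build the jobID[indexList] argument used to address a group of jobs."""
    return f"{job_id}[{compress_indices(indices)}]"


print(array_target(123, [1, 2, 3, 4, 5, 239, 487]))  # 123[1-5, 239, 487]
```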
    

Job Array Chunking

Job arrays in most queues can be chunked across an array boundary (not all jobs must belong to the same array). However, if the queue is preemptable or preemptive, jobs are chunked only when they belong to the same array.

For example:

job1[1], job1[2], job2[1], job2[2] in a preemption queue with CHUNK_JOB_SIZE=3

Then job1[1] and job1[2] are chunked together, and job2[1] and job2[2] are chunked together.

Requeuing a Job Array

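Use brequeue to requeue a job array. When the job is requeued, it is assigned the PEND status and the job's new position in the queue is after other jobs of the same priority. You can requeue jobs in DONE state, jobs in EXIT state, jobs in RUN state, or all jobs in an array regardless of job state, and you can requeue RUN jobs to PSUSP state.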

brequeue is not supported across clusters.

Requeue jobs in DONE state

  1. To requeue DONE jobs use the -d option of brequeue.
  2. For example, the command brequeue -J "myarray[1-10]" -d 123 requeues jobs with job ID 123 and DONE status.

Requeue Jobs in EXIT state

  1. To requeue EXIT jobs use the -e option of brequeue.
  2. For example, the command brequeue -J "myarray[1-10]" -e 123 requeues jobs with job ID 123 and EXIT status.

Requeue all jobs in an array regardless of job state

  1. A submitted job array can have jobs that have different job states. To requeue all the jobs in an array regardless of any job's state, use the -a option of brequeue.
  2. For example, the command brequeue -J "myarray[1-10]" -a 123 requeues all jobs in a job array with job ID 123 regardless of their job state.

Requeue RUN jobs to PSUSP state

  1. To requeue RUN jobs to PSUSP state, use the -H option of brequeue.
  2. For example, the command brequeue -J "myarray[1-10]" -H 123 requeues RUN status jobs with job ID 123 to PSUSP state.

Requeue jobs in RUN state

  1. To requeue RUN jobs use the -r option of brequeue.
  2. For example, the command brequeue -J "myarray[1-10]" -r 123 requeues jobs with job ID 123 and RUN status.
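
The brequeue options described in this section can be collected in a small lookup table. The Python sketch below only assembles an argument list in the form used by the examples above and does not run brequeue. The selection labels (such as "HOLD") and the function name are hypothetical.

```python
# Option of brequeue for each kind of requeue described in this section.
REQUEUE_FLAGS = {
    "DONE": "-d",  # requeue jobs in DONE state
    "EXIT": "-e",  # requeue jobs in EXIT state
    "RUN": "-r",   # requeue jobs in RUN state
    "ALL": "-a",   # requeue all jobs regardless of job state
    "HOLD": "-H",  # requeue RUN jobs to PSUSP state
}


def brequeue_command(job_id: int, selection: str, array_spec: str = "") -> list:
    """Assemble a brequeue argument list; nothing is executed."""
    args = ["brequeue"]
    if array_spec:
        args += ["-J", array_spec]
    args += [REQUEUE_FLAGS[selection], str(job_id)]
    return args


print(brequeue_command(123, "DONE", "myarray[1-10]"))
```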

Job Array Job Slot Limit

The job array job slot limit is used to specify the maximum number of jobs submitted from a job array that are allowed to run at any one time. A job array allows a large number of jobs to be submitted with one command, potentially flooding a system, and job slot limits provide a way to limit the impact a job array may have on a system. Job array job slot limits are specified using the following syntax:

bsub -J "job_array_name[index_list]%job_slot_limit" myJob 

where:

%job_slot_limit

Specifies the maximum number of jobs allowed to run at any one time. The percent sign (%) must be entered exactly as shown. Valid values are positive integers less than the maximum index value of the job array.
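
The -J specification with a job slot limit can be composed and checked in Python before submission. The sketch enforces the rule stated above that the limit is a positive integer less than the maximum index value. The function name array_spec_with_limit is hypothetical, and a simple start-end range is assumed for the index list.

```python
def array_spec_with_limit(name: str, start: int, end: int, slot_limit: int) -> str:
    """Build "name[start-end]%limit" after validating the job slot limit."""
    if not 1 <= start <= end:
        raise ValueError("indices must be positive and start must not exceed end")
    # The limit must be a positive integer less than the maximum index value.
    if not 1 <= slot_limit < end:
        raise ValueError("job slot limit must be between 1 and the maximum index minus 1")
    return f"{name}[{start}-{end}]%{slot_limit}"


print(array_spec_with_limit("myArray", 1, 1000, 100))  # myArray[1-1000]%100
```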

Setting a job array job slot limit

Set a job array slot limit at submission

  1. Use the bsub command to set a job slot limit at the time of submission.
  2. For example, to set a job array job slot limit of 100 jobs for a job array of 1000 jobs:

    bsub -J "job_array_name[1-1000]%100" myJob

Set a job array slot limit after submission

  1. Use the bmod command to set a job slot limit after submission.
  2. For example, to set a job array job slot limit of 100 jobs for an array with job ID 123:

    bmod -J "%100" 123 
    

Change a job array job slot limit

Changing a job array job slot limit is the same as setting it after submission.

  1. Use the bmod command to change a job slot limit after submission.
  2. For example, to change a job array job slot limit to 250 for a job array with job ID 123:

    bmod -J "%250" 123 
    

View a job array job slot limit

  1. To view job array job slot limits use the -A and -l options of bjobs. The job array job slot limit is displayed in the Job Name field in the same format in which it was set.
  2. For example, the following output displays the job array job slot limit of 100 for a job array with job ID 123:

    bjobs -A -l 123
    Job <123>, Job Name <myArray[1-1000]%100>, User <user1>, Project <default>, Sta
                         tus <PEND>, Queue <normal>, Job Priority <20>, Command <my
                         Job>
    Wed Feb 29 12:34:56: Submitted from host <hostA>, CWD <$HOME>;
     
     COUNTERS:
     NJOBS PEND DONE RUN EXIT SSUSP USUSP PSUSP
        10    9    0   1    0     0     0     0
    

Platform Computing Inc.
www.platform.com