



Using LSF Desktop Support to Run Jobs


You run jobs in LSF desktop support in much the same way as you run jobs in standard LSF, using the bsub command. However, you must make sure that the queue or host to which you submit your job is configured for LSF desktop support. In addition, unless your cluster is configured as an LSF desktop support-only cluster and the LSF desktop support queue is the default queue, you must specify this queue when you submit a job.



Submitting a Job

You can submit one or more jobs to LSF desktop support using the bsub command. You can submit a job that runs the following types of PC commands: .cmd, .bat, and .exe.

Submit a job using the bsub command:

On the command line, type a bsub command to submit the job to LSF desktop support. Make sure that you include the applications and files required to run the job, and specify a queue name if you are not using the default LSF desktop support queue AC_QUEUE. For example:

bsub -q LongQueue \
-f "BinDir/JobScript.cmd > JobScript.cmd" \
-f "InputDir/InputFile > InputFile" \
-f "OutputDir/OutputFile < OutputFile" \
JobScript.cmd

By default, all files are cached, but the LSF administrator can globally disable or selectively enable file caching, as described in the Platform LSF Desktop Support Administrator's Guide. If selective file caching is enabled by the administrator, you can append + to a file name when submitting a job. For example:

bsub -f "local_file+ > remote_file" caches the file on the desktop client.

For the complete syntax of the bsub command for LSF desktop support, see bsub.
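
When submissions are scripted, it can help to assemble the bsub argument list in one place. The following is a minimal sketch, not part of LSF: the helper name build_bsub_command and its parameters are hypothetical, and it assumes the usual meaning of the -f operators (> copies a file to the desktop client before the job runs, < copies a file back after the job finishes). It only builds and prints the command; it does not run bsub.

```python
import shlex


def build_bsub_command(job, transfers, queue=None):
    """Return the bsub argument list for a desktop support job.

    transfers is a list of (local, operator, remote) tuples, where the
    operator is ">" (copy to the desktop client) or "<" (copy back).
    build_bsub_command is a hypothetical helper, not an LSF API.
    """
    args = ["bsub"]
    if queue is not None:
        # Omit -q only when the desktop support queue is the default queue.
        args += ["-q", queue]
    for local, operator, remote in transfers:
        if operator not in (">", "<"):
            raise ValueError(f"unsupported transfer operator: {operator!r}")
        args += ["-f", f"{local} {operator} {remote}"]
    args.append(job)
    return args


command = build_bsub_command(
    "JobScript.cmd",
    [
        ("BinDir/JobScript.cmd", ">", "JobScript.cmd"),
        ("InputDir/InputFile", ">", "InputFile"),
        ("OutputDir/OutputFile", "<", "OutputFile"),
    ],
    queue="LongQueue",
)
# shlex.join quotes each argument so the line can be pasted into a POSIX shell.
print(shlex.join(command))
```

Passing the list to a process launcher as separate arguments, rather than as one string, avoids shell quoting problems with the spaces inside each -f specification.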



Submitting a Job with Many File Transfers

LSF desktop support allows a maximum of 32 file transfer requests with the -f option in the LSF desktop support version of the bsub command. To specify more than 32 file transfers, use the zip and unzip commands to reduce the number of file transfer requests.

In the following example, myjob.exe requires a total of 66 file transfers: 33 to copy files to the desktop client, and 33 to copy the results from the desktop client.

To transfer many files:

  1. Zip the data files together into one file. For example:
    zip data.zip data1 data2 data3 ... data33
    
  2. Create a job wrapper that unzips the data files, runs the executable and zips the results. For example, the wrapper myjob.bat might look like this:
    unzip data.zip
    myjob.exe
    zip result.zip result1 result2 result3 ... result33
    
  3. Submit the job, transferring the data files, the wrapper and the executable to the desktop client, and transferring the zipped results file back from the desktop client. For example:
    bsub -f "data.zip > data.zip" \
    -f "myjob.bat > myjob.bat" \
    -f "myjob.exe > myjob.exe" \
    -f "result.zip < result.zip" \
    myjob.bat
    
  4. When the job is completed, unzip the result file:
    unzip result.zip
    

If you do not have zip and unzip on your system, you can get them from the Internet, and install them on each desktop client as required.
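
Where the zip command is not available on the submission host, the bundling and unbundling in steps 1 and 4 could also be scripted. The following is a minimal sketch using Python's standard zipfile module. The helper names needs_bundling, bundle, and unbundle are hypothetical, and the three stand-in data files exist only for the demonstration.

```python
import tempfile
import zipfile
from pathlib import Path

MAX_TRANSFERS = 32  # maximum number of -f requests per bsub command


def needs_bundling(transfer_count):
    """Return True when a job asks for more -f transfers than allowed."""
    return transfer_count > MAX_TRANSFERS


def bundle(paths, archive):
    """Write the given files into one zip archive, stored by base name."""
    with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in paths:
            zf.write(path, arcname=Path(path).name)
    return archive


def unbundle(archive, dest):
    """Extract every member of the archive into dest and return the names."""
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(dest)
        return zf.namelist()


# Demonstration in a scratch directory with three small stand-in data files.
with tempfile.TemporaryDirectory() as scratch:
    work = Path(scratch)
    data_files = []
    for i in range(1, 4):
        path = work / f"data{i}"
        path.write_text(f"sample {i}\n")
        data_files.append(path)
    archive = bundle(data_files, work / "data.zip")
    restored = unbundle(archive, work / "unpacked")
    print(restored)
```

With 66 transfers, as in the example above, needs_bundling(66) returns True, and bundling reduces the job to four -f requests.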



Using Job Arrays

A job array is a sequence of jobs that share the same executable but have different input files. Creating a job array allows you to submit, control and monitor these jobs as a single unit. Using standard LSF commands, you can also control and monitor individual jobs that were submitted from an array.

After the jobs are submitted, LSF independently schedules and dispatches the individual jobs. Each job submitted from a job array shares the same job ID as the job array, and is uniquely referenced using an array index.

You create a job array at job submission time using the bsub command.

If a job needs to identify itself within an array, it can read its array index from the LSB_JOBINDEX environment variable and the job ID of the array from LSB_JOBID. See To retrieve the ID of a job in an array: for instructions.

To submit a job array using the bsub command:

  1. Specify the bsub command and include the -J option:

    bsub -J "arrayName[indexList, ...]"

    where arrayName is a string used to identify the job array. You can use alphabetic characters, numerals 0 to 9, period (.), dash (-) and underscore (_).

    where indexList can be a range of unique positive integers, such as

    [1-5]

    or indexList can be in the format:

    [start-end[:step]]

    where start and end specify the first and last indices of the range, and step specifies the increment between consecutive indices. For example:

    [1-10:2]

    specifies a range of 1 to 10 with a step value of 2, creating indices 1, 3, 5, 7, and 9.

  2. Make sure that your input files are all stored in the current working directory (or specify the full path name for the directory where they are stored), and are all named consistently to correspond with the indices of the array. For example, if the array indices are from 1 to 1000:
    input.1, input.2, input.3, ..., input.1000
    
  3. If there is more than one LSF desktop support queue, and you are not using the default queue, specify the queue name.
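
To check which indices an index list produces before submitting, the range syntax from step 1 can be expanded in a few lines. This is a minimal sketch, not LSF's own parser: the function name expand_index_list is hypothetical, and it assumes that comma-separated parts may be either a single index or a start-end[:step] range.

```python
import re

# One part of an index list: "7", "1-5", or "1-10:2".
_RANGE = re.compile(r"^(\d+)(?:-(\d+)(?::(\d+))?)?$")


def expand_index_list(index_list):
    """Expand an index list such as "1-10:2" into the indices it names."""
    indices = []
    for part in index_list.split(","):
        match = _RANGE.match(part.strip())
        if match is None:
            raise ValueError(f"bad index specification: {part!r}")
        start = int(match.group(1))
        end = int(match.group(2)) if match.group(2) else start
        step = int(match.group(3)) if match.group(3) else 1
        if start < 1 or end < start or step < 1:
            raise ValueError(f"bad index range: {part!r}")
        indices.extend(range(start, end + 1, step))
    if len(set(indices)) != len(indices):
        # Indices must be unique positive integers.
        raise ValueError("index list contains duplicate indices")
    return indices


print(expand_index_list("1-10:2"))  # prints [1, 3, 5, 7, 9]
```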

Example: Submitting a job array

The following example submits a job array with 1000 entries. The special character %I is replaced by the index of the job in the array.

bsub -J "array[1-1000]" \
-f "input.%I > input" \
-f "output.%I < output" \
-f "job.cmd > job.cmd" \
job.cmd

The above example submits 1000 jobs that correspond to 1000 input files named input.1 to input.1000. The -f option is used to copy the input files to the desktop client and the resulting output files to the desktop server. It is also used to copy the executable, which is job.cmd. The job array is submitted to the default queue AC_QUEUE.
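
Because each array element expects its own input file, it can be worth confirming that every input.N file exists before submitting. The sketch below mimics the %I substitution described above. The names expand_transfer and missing_inputs are hypothetical helpers, and the substitution is a plain string replacement written for illustration, not LSF's implementation.

```python
from pathlib import Path


def expand_transfer(spec, index):
    """Replace every %I in a -f specification with the array index."""
    return spec.replace("%I", str(index))


def missing_inputs(pattern, indices, directory="."):
    """Return the expanded input file names that do not exist in directory."""
    names = (expand_transfer(pattern, i) for i in indices)
    return [name for name in names if not (Path(directory) / name).is_file()]


# The transfer specification that array element 3 would use.
print(expand_transfer("input.%I > input", 3))  # prints input.3 > input
```

For the 1000-element array above, missing_inputs("input.%I", range(1, 1001)) would list any input files absent from the current working directory.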

To retrieve the ID of a job in an array:

  1. Write a Windows script to access the job environment when the job is run. For example, create a file called job_index.bat, as follows:
    cmd /c set LSB_JOBID>>test
    cmd /c set LSB_JOBINDEX>>test
    cmd /c set LSB_RUNID>>test
  2. Submit the job array to the LSF desktop support queue. For example:

    bsub -q AC_QUEUE -f "test > test" \
    -f "job_index.bat > job_index.bat" \
    -f "test_back%I < test" -J "a[1-10]" job_index.bat

    Job <775> is submitted to default queue <AC_QUEUE>.

  3. After the job is finished, each test_back file (for example, test_back1) should look something like the following:

    cat test_back1
    LSB_JOBID=105
    LSB_JOBINDEX=1
    LSB_RUNID=1

    where LSB_JOBID is the job ID of the job array, LSB_JOBINDEX is the job array index, and LSB_RUNID is the number of times the job has been redispatched.
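
Once the files are copied back, the NAME=value lines can be read into a dictionary for further processing. This is a minimal sketch: the function name parse_env_dump is hypothetical, and the sample text is the test_back1 content shown above.

```python
def parse_env_dump(text):
    """Parse NAME=value lines, as written by `set NAME`, into a dict."""
    values = {}
    for line in text.splitlines():
        name, sep, value = line.strip().partition("=")
        if sep and name:
            values[name] = value
    return values


sample = """LSB_JOBID=105
LSB_JOBINDEX=1
LSB_RUNID=1
"""
info = parse_env_dump(sample)
# Build the "jobID[index]" form that identifies one array element.
print(f"{info['LSB_JOBID']}[{info['LSB_JOBINDEX']}]")  # prints 105[1]
```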



Displaying Active Jobs

You can display the jobs submitted by a specific user or all users using the bjobs command. By default, the bjobs command displays jobs submitted by the user who invoked the command.

To display active jobs using bjobs:

  1. On the command line, type the following: bjobs -u user_name. For example, bjobs -u all displays the jobs submitted by all users.

    Jobs are displayed in the following order:

    • Running jobs
    • Pending jobs, listed in the order in which they are scheduled
    • Jobs in high-priority queues, which are listed before those in lower-priority queues
  2. Use the information provided to determine the status of a job. For more information about job status, see Job Status Codes and What They Mean.

For additional information about the bjobs command, refer to the Platform LSF Reference.



Displaying Job History Information

You can use the bhist command to track what happened to your job after you submitted it. The bhist command displays a summary of the pending, suspended, and running times of jobs for the user who invoked the command. Use bhist -u all to display a summary for all users in the cluster.

To display detailed job history information:

  1. On the command line, type the bhist command as follows:

    bhist -l job_ID

    The -l option displays the time information and a complete history of scheduling events for each job. For example:

    bhist -l "jobarray[5]"

    displays the job history information for the fifth element in jobarray.

For additional information about the bhist command, refer to the Platform LSF Reference.

Example

Job submission

bsub -q AC_QUEUE -f "myjob2.exe+ > myjob.exe-" -f "myjoboutput.log+ <" myjob.exe
Job <751> is submitted to queue <AC_QUEUE>.
12:53pm Fri, Aug-20-2004 my_host:~/tmp
[417]- bsub -q AC_QUEUE -f "myjob2.exe- > myjob.exe-" -f "myjoboutput.log+ <" myjob.exe
Job <752> is submitted to queue <AC_QUEUE>.

Job history

12:53pm Fri, Aug-20-2004 my_host:~/tmp
[418]- bhist -l 751 752
Job <751>, User <LSF_user>, Project <default>, Command <myjob.exe>
Fri Aug 20 12:53:47: Submitted from host <my_host.lsf.platform.com>, to Queue
	 <AC_QUEUE>, CWD <$HOME/tmp>, Copy Files "myjob2.exe+ > myjob.
	 exe-" "myjoboutput.log+ < myjoboutput.log+";
Fri Aug 20 12:53:57: Dispatched to <my_host.lsf.platform.com>;
Fri Aug 20 12:53:57: Starting (Pid 9327);
Fri Aug 20 12:54:16: Running with execution home 
</home/LSF_user/milkyway/fyactest/top/ach_top/work/.jobs>, Execution CWD 
</home/LSF_user/tmp>, Execution Pid <-1>;
Fri Aug 20 12:54:16: Waiting;
Fri Aug 20 12:58:39: External Message "Running on <LSF_user>" was posted from 
"root"to message box 0;
Fri Aug 20 12:58:40: Running;
Fri Aug 20 13:03:44: Done successfully. The CPU time used is 0.0 seconds;
Fri Aug 20 13:04:00: Post job process done successfully;

Summary of time in seconds spent in various states by Fri Aug 20 13:04:00
PEND     PSUSP    RUN      USUSP    SSUSP    UNKWN    TOTAL
274      0        323      0        0        0        597
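
The TOTAL column in the summary above is the sum of the time spent in the individual states (274 + 323 = 597 seconds in this example). The following minimal sketch parses the two summary lines and checks that relationship. The function name parse_summary is hypothetical, and it assumes the header and values are whitespace-separated as shown.

```python
def parse_summary(header_line, value_line):
    """Pair each state name in the header with its time in seconds."""
    names = header_line.split()
    values = [int(v) for v in value_line.split()]
    if len(names) != len(values):
        raise ValueError("header and value columns do not match")
    return dict(zip(names, values))


summary = parse_summary(
    "PEND PSUSP RUN USUSP SSUSP UNKWN TOTAL",
    "274 0 323 0 0 0 597",
)
# Sum every state except the TOTAL column itself.
time_in_states = sum(v for name, v in summary.items() if name != "TOTAL")
print(time_in_states == summary["TOTAL"])  # prints True
```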



Job Status Codes and What They Mean

Some job status codes in LSF desktop support have a slightly different meaning than in standard LSF. The following table gives the LSF desktop support-specific meanings of those codes.


Note: You cannot suspend a running LSF desktop support job.

Codes and LSF desktop support-specific descriptions

Code      Description
PEND      The job is still queued in the LSF Scheduler.
WAIT      The job is still queued in the desktop server, waiting for a desktop client to run it.
RUN       The job is currently running on a desktop client.
UNKNOWN   The desktop client service has not reported for a long time (by default, for more than 600 seconds); for example, while the Tomcat application server is restarting.

All other job status codes retain their standard LSF meanings.



Killing a Job

You can stop a job from processing and remove it from the system using the bkill command.

To kill a job:

On the command line, type the bkill command as follows:

bkill job_ID

For example:

bkill 1234

kills job 1234.

To kill an entire job array:

On the command line, type the bkill command as follows:

bkill jobarray_ID

where jobarray_ID is the job ID of the array. For example:

bkill 12345

kills the entire array.

To kill an element in a job array:

On the command line, type the bkill command as follows:

bkill "jobarray_ID[index]"

where jobarray_ID is the job ID of the array, and index is the element number you want to kill. For example:

bkill "12345[5]"

kills the fifth element in the array.
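
In scripts, the three forms of the bkill target differ only in whether an index is appended in square brackets. The following is a minimal sketch: the helper name bkill_args is hypothetical, and it only builds the argument list without running bkill.

```python
def bkill_args(job_id, index=None):
    """Return the bkill argument list for a job, an array, or one element.

    With index=None the target is a job or an entire job array; otherwise
    the target is the single array element "job_id[index]".
    """
    target = str(job_id) if index is None else f"{job_id}[{index}]"
    return ["bkill", target]


print(bkill_args(1234))       # prints ['bkill', '1234']
print(bkill_args(12345, 5))   # prints ['bkill', '12345[5]']
```

Passing the target as a single list element to a process launcher means the square brackets need no shell quoting.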





      Date Modified: January 29, 2009
Platform Computing: www.platform.com

Platform Support: support@platform.com
Platform Information Development: doc@platform.com

Copyright © 1994-2009 Platform Computing Corporation. All rights reserved.