Instead of specifying an explicit runtime limit for jobs, you can specify an estimated run time. LSF uses the estimated value for job scheduling purposes only, and does not kill jobs that exceed this value unless the jobs also exceed a defined runtime limit. The format of the runtime estimate is the same as the run limit set by the bsub -W option or the RUNLIMIT parameter in lsb.queues and lsb.applications.
Use JOB_RUNLIMIT_RATIO in lsb.params to limit the runtime estimate that users can set. If JOB_RUNLIMIT_RATIO is set to 0, no restriction is applied to the runtime estimate. The ratio does not apply to the RUNTIME parameter in lsb.applications.
The job-level runtime estimate setting overrides the RUNTIME setting in an application profile in lsb.applications.
The following LSF features use the estimated runtime value to schedule jobs:
Job chunking
Advance reservation
SLA
Slot reservation
Backfill
Define a runtime estimate
Define the RUNTIME parameter at the application level. Use the bsub -We option at the job level.
You can specify the runtime estimate as hours and minutes, or minutes only. The following examples show an application-level runtime estimate of three hours and 30 minutes:
RUNTIME=3:30
RUNTIME=210
Configure normalized run time
LSF uses normalized run time for scheduling in order to account for different processing speeds of the execution hosts.
Tip: If you want the scheduler to use wall-clock (absolute) run time instead of normalized run time, define ABS_RUNLIMIT=Y in the file lsb.params or in the file lsb.applications for the application associated with your job.
LSF calculates the normalized run time using the following formula:
NORMALIZED_RUN_TIME = RUNTIME * CPU_Factor_Normalization_Host / CPU_Factor_Execute_Host
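As a worked illustration of the formula, the following Python sketch computes a normalized run time. The function name and the CPU factors (2.0 and 4.0) are hypothetical values chosen for the arithmetic; they are not taken from any LSF configuration or API.

```python
def normalized_run_time(runtime_minutes, cpu_factor_normalization_host,
                        cpu_factor_execution_host):
    """Apply NORMALIZED_RUN_TIME = RUNTIME * CPU_Factor_Normalization_Host
    / CPU_Factor_Execute_Host. All names here are illustrative only."""
    if cpu_factor_execution_host <= 0:
        raise ValueError("CPU factor of the execution host must be positive")
    return (runtime_minutes * cpu_factor_normalization_host
            / cpu_factor_execution_host)


# Hypothetical case: a 210-minute estimate given relative to a normalization
# host with CPU factor 2.0, run on an execution host with CPU factor 4.0.
print(normalized_run_time(210, 2.0, 4.0))  # 105.0
```

Under these assumed factors, the execution host is twice as fast as the normalization host, so the estimate is halved.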
You can specify a host name or host model with the runtime estimate so that LSF uses a specific host name or model as the normalization host. If you do not specify a host name or host model, LSF uses the CPU factor for the default normalization host as described in the following table.
| If you define… | In the file… | Then… |
|---|---|---|
| DEFAULT_HOST_SPEC | lsb.queues | LSF selects the default normalization host for the queue. |
| DEFAULT_HOST_SPEC | lsb.params | LSF selects the default normalization host for the cluster. |
| No default host at either the queue or cluster level | | LSF selects the submission host as the normalization host. |
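The selection of the normalization host can be sketched as a simple fallback chain. This is only an illustration: the function and argument names are hypothetical, and the ordering (host specified with the estimate, then queue-level default, then cluster-level default, then submission host) is an assumption about precedence that the table itself does not spell out.

```python
def pick_normalization_host(estimate_host_spec=None, queue_default=None,
                            cluster_default=None, submission_host=None):
    """Return the first host spec that is defined, in the assumed order:
    host given with the runtime estimate, DEFAULT_HOST_SPEC in lsb.queues,
    DEFAULT_HOST_SPEC in lsb.params, and finally the submission host."""
    for candidate in (estimate_host_spec, queue_default, cluster_default):
        if candidate:
            return candidate
    return submission_host


# No host given with the estimate and no queue-level default:
# the cluster-level default is used.
print(pick_normalization_host(cluster_default="hostB",
                              submission_host="hostC"))  # hostB

# Nothing defined at any level: fall back to the submission host.
print(pick_normalization_host(submission_host="hostC"))  # hostC
```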
To specify a host name (defined in lsf.cluster.clustername) or host model (defined in lsf.shared) as the normalization host, insert the "/" character between the minutes and the host name or model, as shown in the following examples:
RUNTIME=3:30/hostA
bsub -We 3:30/hostA
LSF calculates the normalized run time using the CPU factor defined for hostA.
RUNTIME=210/Ultra5S
bsub -We 210/Ultra5S
LSF calculates the normalized run time using the CPU factor defined for host model Ultra5S.
Tip: Use lsinfo to see host name and host model information.
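To make the value format concrete, the following Python sketch splits a runtime estimate string into a total number of minutes and an optional host name or model. The parser is a hypothetical helper written for this illustration, not part of LSF, and it assumes that only the forms shown in the examples above (hours:minutes or minutes, optionally followed by "/" and a host name or model) need to be handled.

```python
def parse_runtime_estimate(value):
    """Parse '[hours:]minutes[/host_spec]' into (total_minutes, host_spec).

    host_spec is None when no host name or host model is given.
    """
    time_part, _, host_spec = value.partition("/")
    fields = time_part.split(":")
    if len(fields) == 1:
        hours, minutes = 0, int(fields[0])
    elif len(fields) == 2:
        hours, minutes = int(fields[0]), int(fields[1])
    else:
        raise ValueError("expected [hours:]minutes, got %r" % time_part)
    if hours < 0 or minutes < 0:
        raise ValueError("hours and minutes must not be negative")
    return hours * 60 + minutes, (host_spec or None)


print(parse_runtime_estimate("3:30/hostA"))   # (210, 'hostA')
print(parse_runtime_estimate("210/Ultra5S"))  # (210, 'Ultra5S')
print(parse_runtime_estimate("210"))          # (210, None)
```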
How estimated run time interacts with run limits
The following table lists the expected behavior for each combination of job-level runtime estimate (-We), job-level run limit (-W), application-level runtime estimate (RUNTIME), application-level run limit (RUNLIMIT), and queue-level run limit (RUNLIMIT, both default and hard limit). Ratio is the value of JOB_RUNLIMIT_RATIO defined in lsb.params. The dash (—) indicates no value is defined for the job.
| Job-runtime estimate | Job-run limit | Application runtime estimate | Application run limit | Queue default run limit | Queue hard run limit | Result |
|---|---|---|---|---|---|---|
| T1 | — | — | — | — | — | Job is accepted. Jobs running longer than T1*ratio are killed. |
| T1 | T2>T1*ratio | — | — | — | — | Job is rejected. |
| T1 | T2<=T1*ratio | — | — | — | — | Job is accepted. Jobs running longer than T2 are killed. |
| T1 | T2<=T1*ratio | T3 | T4 | — | — | Job is accepted. Jobs running longer than T2 are killed. T2 overrides T4 or T1*ratio overrides T4. T1 overrides T3. |
| T1 | T2<=T1*ratio | — | — | T5 | T6 | Job is accepted. Jobs running longer than T2 are killed. If T2>T6, the job is rejected. |
| T1 | — | T3 | T4 | — | — | Job is accepted. Jobs running longer than T1*ratio are killed. T2 overrides T4 or T1*ratio overrides T4. T1 overrides T3. |
| T1 | — | — | — | T5 | T6 | Job is accepted. Jobs running longer than T1*ratio are killed. If T1*ratio>T6, the job is rejected. |
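The job-level and queue-level rows of the table can be read as a small decision procedure. The following Python sketch is one interpretation of those rows, not LSF code: the function name and return shape are hypothetical, it covers only a positive JOB_RUNLIMIT_RATIO, and it leaves out the application-level values (T3, T4) and the queue default run limit (T5).

```python
def evaluate_job(t1, ratio, t2=None, queue_hard_limit=None):
    """Return ("accepted", kill_after_minutes) or ("rejected", None).

    t1: job-level runtime estimate (-We), in minutes
    ratio: JOB_RUNLIMIT_RATIO (this sketch assumes it is positive)
    t2: job-level run limit (-W), or None if not set
    queue_hard_limit: queue hard run limit (T6), or None if not set
    """
    if ratio <= 0:
        raise ValueError("this sketch only covers a positive ratio")
    ceiling = t1 * ratio
    # A job-level run limit above T1*ratio is rejected.
    if t2 is not None and t2 > ceiling:
        return ("rejected", None)
    # Otherwise the job is killed at T2 if set, or at T1*ratio if not.
    effective_limit = t2 if t2 is not None else ceiling
    # The effective limit must not exceed the queue hard run limit.
    if queue_hard_limit is not None and effective_limit > queue_hard_limit:
        return ("rejected", None)
    return ("accepted", effective_limit)


print(evaluate_job(60, 2))                               # ('accepted', 120)
print(evaluate_job(60, 2, t2=150))                       # ('rejected', None)
print(evaluate_job(60, 2, t2=90))                        # ('accepted', 90)
print(evaluate_job(60, 2, t2=90, queue_hard_limit=80))   # ('rejected', None)
print(evaluate_job(60, 2, queue_hard_limit=100))         # ('rejected', None)
```

The example values (a 60-minute estimate and a ratio of 2) are arbitrary and chosen only to exercise each branch.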