Job Priorities
User-Assigned Job Priority
User-assigned job priority provides controls that allow users to order their jobs in a queue. Job order is the first consideration to determine job eligibility for dispatch. Jobs are still subject to all scheduling policies regardless of job priority. Jobs with the same priority are ordered first come first served.
The job owner can change the priority of their own jobs. LSF and queue administrators can change the priority of all jobs in a queue.
User-assigned job priority is enabled for all queues in your cluster, and can be configured with automatic job priority escalation to automatically increase the priority of jobs that have been pending for a specified period of time.
Considerations
The btop and bbot commands move jobs relative to other jobs of the same priority. These commands do not change job priority.
Configure job priority
- To configure user-assigned job priority, edit lsb.params and define MAX_USER_PRIORITY. This configuration applies to all queues in your cluster.
- Use bparams -l to display the value of MAX_USER_PRIORITY.
Syntax
MAX_USER_PRIORITY=max_priority
Where:
max_priority
Specifies the maximum priority a user can assign to a job. Valid values are positive integers. Larger values represent higher priority; 1 is the lowest.
LSF and queue administrators can assign priority beyond max_priority for jobs they own.
Example
MAX_USER_PRIORITY=100
Specifies that 100 is the maximum job priority that can be specified by a user.
Specify job priority
- Job priority is specified at submission using bsub and modified after submission using bmod. Jobs submitted without a priority are assigned the default priority of MAX_USER_PRIORITY/2.
Syntax
bsub -sp priority
bmod [-sp priority | -spn] job_ID
Where:
-sp priority
Specifies the job priority. Valid values for priority are any integers between 1 and MAX_USER_PRIORITY (displayed by bparams -l). Incorrect job priorities are rejected.
LSF and queue administrators can specify priorities beyond MAX_USER_PRIORITY for jobs they own.
-spn
Sets the job priority to the default priority of MAX_USER_PRIORITY/2 (displayed by bparams -l).
View job priority information
- Use the following commands to view job history, the current status and system configurations:
bhist -l job_ID
Displays the history of a job including changes in job priority.
bjobs -l [job_ID]
Displays the current job priority and the job priority at submission time. Job priorities are changed by the job owner, LSF and queue administrators, and automatically when automatic job priority escalation is enabled.
bparams -l
Displays values for:
- The maximum user priority, MAX_USER_PRIORITY
- The default submission priority, MAX_USER_PRIORITY/2
- The value and frequency used for automatic job priority escalation, JOB_PRIORITY_OVER_TIME
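The priority rules in this section can be sketched in a few lines of Python. This is only an illustration of the documented rules, not LSF code: the function names and the `admin_owns_job` flag are hypothetical, and integer division for MAX_USER_PRIORITY/2 is an assumption (the text does not say how odd values are rounded).

```python
def default_priority(max_user_priority: int) -> int:
    """Default priority for jobs submitted without -sp.

    Assumption: MAX_USER_PRIORITY/2 is evaluated with integer division.
    """
    return max_user_priority // 2


def is_valid_priority(priority: int, max_user_priority: int,
                      admin_owns_job: bool = False) -> bool:
    """Check a requested priority against the documented range.

    Valid values are integers from 1 to MAX_USER_PRIORITY. LSF and queue
    administrators may exceed the maximum for jobs they own, which is
    modeled here by the hypothetical admin_owns_job flag.
    """
    if priority < 1:
        return False
    if admin_owns_job:
        return True
    return priority <= max_user_priority


# With MAX_USER_PRIORITY=100, the default priority is 50.
print(default_priority(100))
```

For example, with MAX_USER_PRIORITY=100, a user request for priority 101 would be rejected, while the same request from an administrator who owns the job would be accepted.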
Automatic Job Priority Escalation
Automatic job priority escalation automatically increases job priority of jobs that have been pending for a specified period of time. User-assigned job priority (see User-Assigned Job Priority) must also be configured.
As long as a job remains pending, LSF automatically increases the job priority beyond the maximum priority specified by MAX_USER_PRIORITY. Job priority is not increased beyond the value of max_int on your system.
Pending job resize allocation requests for resizable jobs inherit the job priority from the original job. When the priority of the allocation request gets adjusted, the priority of the original job is adjusted as well. The job priority of a running job is adjusted when there is an associated resize request for allocation growth. bjobs displays the updated job priority.
If necessary, a new pending resize request is regenerated after the job gets dispatched. The new job priority is used.
For requeued and rerun jobs, the dynamic priority value is reset. For migrated jobs, the existing dynamic priority value is carried forward. The priority is recalculated based on the original value.
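As a rough illustration of escalation, the sketch below models a JOB_PRIORITY_OVER_TIME setting of increment/interval. It assumes the priority rises by the increment once per completed interval of pending time, and it uses a hypothetical `MAX_INT` constant in place of your system's max_int. Neither assumption is confirmed by this document.

```python
# Hypothetical stand-in for max_int; the real value depends on your system.
MAX_INT = 2**31 - 1


def escalated_priority(base_priority: int, pending_minutes: int,
                       increment: int, interval: int,
                       max_int: int = MAX_INT) -> int:
    """Estimate the priority of a job that has been pending for a while.

    Assumption: the priority grows by `increment` for every completed
    `interval` minutes of pending time, and never exceeds max_int.
    """
    completed_intervals = pending_minutes // interval
    return min(base_priority + completed_intervals * increment, max_int)


# A job with priority 50 that has been pending for 60 minutes under
# JOB_PRIORITY_OVER_TIME=3/20 has completed three 20-minute intervals.
print(escalated_priority(50, 60, increment=3, interval=20))
```

Note that the result can exceed MAX_USER_PRIORITY, which matches the statement that escalation raises priority beyond the user maximum.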
Configure job priority escalation
- To configure job priority escalation, edit lsb.params and define JOB_PRIORITY_OVER_TIME. User-assigned job priority must also be configured.
- Use bparams -l to display the values of JOB_PRIORITY_OVER_TIME.
Syntax
JOB_PRIORITY_OVER_TIME=increment/interval
Where:
increment
Specifies the value used to increase job priority every interval minutes. Valid values are positive integers.
interval
Specifies the frequency, in minutes, to increment job priority. Valid values are positive integers.
Example
JOB_PRIORITY_OVER_TIME=3/20
Specifies that the priority of pending jobs is increased by 3 every 20 minutes.
Absolute Job Priority Scheduling
Absolute job priority scheduling (APS) provides a mechanism to control the job dispatch order to prevent job starvation.
When configured in a queue, APS sorts pending jobs for dispatch according to a job priority value calculated based on several configurable job-related factors. Each job priority weighting factor can contain subfactors. Factors and subfactors can be independently assigned a weight.
APS provides administrators with detailed yet straightforward control of the job selection process.
- APS only sorts the jobs; job scheduling is still based on configured LSF scheduling policies. LSF attempts to schedule and dispatch jobs based on their order in the APS queue, but the dispatch order is not guaranteed.
- The job priority is calculated for pending jobs across multiple queues based on the sum of configurable factor values. Jobs are then ordered based on the calculated APS value.
- You can adjust the following for APS factors:
- A weight for scaling each job-related factor and subfactor
- Limits for each job-related factor and subfactor
- A grace period for each factor and subfactor
- To configure absolute priority scheduling (APS) across multiple queues, define APS queue groups. When you submit a job to any queue in a group, the job's dispatch priority is calculated using the formula defined in the group's master queue.
- Administrators can also set a static system APS value for a job. A job with a system APS priority is guaranteed to have a higher priority than any calculated value. Jobs with higher system APS settings have priority over jobs with lower system APS settings.
- Administrators can use the ADMIN factor to manually adjust the calculated APS value for individual jobs.
Scheduling priority factors
To calculate the job priority, APS divides job-related information into several categories. Each category becomes a factor in the calculation of the scheduling priority. You can configure the weight, limit, and grace period of each factor to get the desired job dispatch order.
LSF sums the value of each factor based on the weight of each factor.
Factor weight
The weight of a factor expresses the importance of the factor in the absolute scheduling priority. The factor weight is multiplied by the value of the factor to change the factor value. A positive weight increases the importance of the factor, and a negative weight decreases the importance of a factor. Undefined factors have a weight of 0, which causes the factor to be ignored in the APS calculation.
Factor limit
The limit of a factor sets the minimum and maximum absolute value of each weighted factor. Factor limits must be positive values.
Factor grace period
Each factor can be configured with a grace period. The factor is only counted as part of the APS value after the job has been pending for longer than the grace period.
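The interaction of weight, limit, and grace period for a single factor can be sketched as follows. This is an interpretation, not LSF's implementation: the function name and parameters are hypothetical, and the limit is assumed to clamp the weighted value to the range from -limit to +limit.

```python
from typing import Optional


def factor_contribution(value: float, weight: float = 0.0,
                        limit: Optional[float] = None,
                        grace_seconds: Optional[float] = None,
                        pending_seconds: float = 0.0) -> float:
    """Contribution of one factor to the APS value (illustrative only)."""
    # Undefined factors have a weight of 0 and are ignored.
    if weight == 0:
        return 0.0
    # A factor with a grace period is not counted until the job has been
    # pending for longer than that period.
    if grace_seconds is not None and pending_seconds <= grace_seconds:
        return 0.0
    weighted = weight * value
    # Assumption: the limit bounds the absolute value of the weighted factor.
    if limit is not None:
        weighted = max(-limit, min(limit, weighted))
    return weighted


# Weight 2 on a factor value of 50, capped by a limit of 80.
print(factor_contribution(50, weight=2, limit=80))
```

Under these assumptions, the APS value for a job would be the sum of such contributions over all configured factors.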
Factors and subfactors
Where LSF gets the job information for each factor
Enable absolute priority scheduling
Configure APS_PRIORITY in an absolute priority queue in lsb.queues.
APS_PRIORITY=WEIGHT[[factor, value] [subfactor, value]...]...] LIMIT[[factor, value] [subfactor, value]...]...] GRACE_PERIOD[[factor, value] [subfactor, value]...]...]
Pending jobs in the queue are ordered according to the calculated APS value.
If the weight of a subfactor is defined but the weight of its parent factor is not, the parent factor weight is set to 1.
The WEIGHT and LIMIT factors are floating-point values. Specify a value for GRACE_PERIOD in seconds (values), minutes (valuem), or hours (valueh).
The default unit for grace period is hours.
For example, the following sets a grace period of 10 hours for the MEM factor, 10 minutes for the JPRIORITY factor, 10 seconds for the QPRIORITY factor, and 10 hours (default) for the RSRC factor:
GRACE_PERIOD[[MEM,10h] [JPRIORITY, 10m] [QPRIORITY,10s] [RSRC, 10]]
You cannot specify zero (0) for the WEIGHT, LIMIT, and GRACE_PERIOD of any factor or subfactor.
APS queues cannot configure cross-queue fairshare (FAIRSHARE_QUEUES) or host-partition fairshare.
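To make the GRACE_PERIOD units concrete, the following sketch converts a GRACE_PERIOD string into seconds per factor. The parser is hypothetical and handles only the simple form shown in the example above. It assumes whole-number values and applies the documented default unit of hours when no suffix is given.

```python
import re

# Seconds per unit suffix; a missing suffix means hours (the documented default).
UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}

# Matches entries such as [MEM,10h] or [RSRC, 10].
ENTRY = re.compile(r"\[\s*(\w+)\s*,\s*(\d+)\s*([smh]?)\s*\]")


def parse_grace_period(spec: str) -> dict:
    """Return {factor: grace period in seconds} for a GRACE_PERIOD string."""
    periods = {}
    for factor, number, unit in ENTRY.findall(spec):
        if int(number) == 0:
            # Zero is not allowed for any factor or subfactor.
            raise ValueError(f"grace period for {factor} cannot be zero")
        periods[factor] = int(number) * UNIT_SECONDS[unit or "h"]
    return periods


print(parse_grace_period(
    "GRACE_PERIOD[[MEM,10h] [JPRIORITY, 10m] [QPRIORITY,10s] [RSRC, 10]]"))
```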
Modify the system APS value (bmod)
The absolute scheduling priority for a newly submitted job is dynamic. Job priority is calculated and updated based on the formula specified by APS_PRIORITY in the absolute priority queue. Administrators can use bmod to manually override the calculated APS value.
Run bmod -apsn job_ID to undo the previous bmod -aps setting.
Assign a static system priority and ADMIN factor value
Administrators can use bmod -aps "system=value" to assign a static job priority for a pending job. The value cannot be zero (0).
In this case, the job's absolute priority is not calculated. The system APS priority is guaranteed to be higher than any calculated APS priority value. Jobs with higher system APS settings have priority over jobs with lower system APS settings.
The system APS value set by bmod -aps is preserved after mbatchd reconfiguration or mbatchd restart.
Use the ADMIN factor to adjust the APS value
Administrators can use bmod -aps "admin=value" to change the calculated APS value for a pending job. The ADMIN factor is added to the calculated APS value, and the absolute priority of the job is recalculated. The value cannot be zero (0).
A bmod -aps command always overrides the last bmod -aps command.
The ADMIN APS value set by bmod -aps is preserved after mbatchd reconfiguration or mbatchd restart.
Example bmod output
The following commands change the APS values for jobs 313 and 314:
bmod -aps "system=10" 313
Parameters of job <313> are being changed
bmod -aps "admin=10.00" 314
Parameters of job <314> are being changed
View modified APS values
Use bjobs -aps to see the effect of the changes:
bjobs -aps
JOBID USER  STAT QUEUE  FROM_HOST EXEC_HOST JOB_NAME SUBMIT_TIME  APS
313   user1 PEND owners hostA               myjob    Feb 12 01:09 (10)
321   user1 PEND owners hostA               myjob    Feb 12 01:09 -
314   user1 PEND normal hostA               myjob    Feb 12 01:08 109.00
312   user1 PEND normal hostA               myjob    Feb 12 01:08 99.00
315   user1 PEND normal hostA               myjob    Feb 12 01:08 99.00
316   user1 PEND normal hostA               myjob    Feb 12 01:08 99.00
Use bjobs -l to show APS values modified by the administrator:
bjobs -l
Job <313>, User <user1>, Project <default>, Service Class <SLASamples>, Status <RUN>, Queue <normal>, Command <myjob>, System Absolute Priority <10>
Job <314>, User <user1>, Project <default>, Status <PEND>, Queue <normal>, Command <myjob>, Admin factor value <10>
Use bhist -l to see historical information about administrator changes to APS values. For example, after running these commands:
bmod -aps "system=10" 108
bmod -aps "admin=20" 108
bmod -apsn 108
bhist -l shows the sequence of changes to job 108:
bhist -l
Job <108>, User <user1>, Project <default>, Command <sleep 10000>
Tue Feb 13 15:15:26: Submitted from host <HostB>, to Queue <normal>, CWD </scratch/user1>;
Tue Feb 13 15:15:40: Parameters of Job are changed: Absolute Priority Scheduling factor string changed to : system=10;
Tue Feb 13 15:15:48: Parameters of Job are changed: Absolute Priority Scheduling factor string changed to : admin=20;
Tue Feb 13 15:15:58: Parameters of Job are changed: Absolute Priority Scheduling factor string deleted;
Summary of time in seconds spent in various states by Tue Feb 13 15:16:02
PEND PSUSP RUN USUSP SSUSP UNKWN TOTAL
36   0     0   0     0     0     36
Configure APS across multiple queues
Use QUEUE_GROUP in an absolute priority queue in lsb.queues to configure APS across multiple queues.
When APS is enabled in the queue with APS_PRIORITY, the FAIRSHARE_QUEUES parameter is ignored. The QUEUE_GROUP parameter replaces FAIRSHARE_QUEUES, which is obsolete in LSF 7.0.
Example 1
You want to schedule jobs from the normal queue and the short queue, factoring the job priority (weight 1) and queue priority (weight 10) in the APS value:
Begin Queue
QUEUE_NAME = normal
PRIORITY = 30
NICE = 20
APS_PRIORITY = WEIGHT [[JPRIORITY, 1] [QPRIORITY, 10]]
QUEUE_GROUP = short
DESCRIPTION = For normal low priority jobs, running only if hosts are lightly loaded.
End Queue
...
Begin Queue
QUEUE_NAME = short
PRIORITY = 20
NICE = 20
End Queue
The APS value for jobs from the normal queue and the short queue is calculated as:
APS_PRIORITY = 1 * (1 * job_priority + 10 * queue_priority)
The first 1 is the weight of the WORK factor; the second 1 is the weight of the job priority subfactor; the 10 is the weight of the queue priority subfactor.
If you want the job priority to increase based on the pending time, you must configure the JOB_PRIORITY_OVER_TIME parameter in lsb.params.
Example 2
Extending example 1, you want to add user-based fairshare with a weight of 100 to the APS value in the normal queue:
Begin Queue
QUEUE_NAME = normal
PRIORITY = 30
NICE = 20
FAIRSHARE = USER_SHARES [[user1, 5000] [user2, 5000] [others, 1]]
APS_PRIORITY = WEIGHT [[JPRIORITY, 1] [QPRIORITY, 10] [FS, 100]]
QUEUE_GROUP = short
DESCRIPTION = For normal low priority jobs, running only if hosts are lightly loaded.
End Queue
The APS value is now calculated as:
APS_PRIORITY = 1 * (1 * job_priority + 10 * queue_priority) + 100 * user_priority
Example 3
Extending example 2, you now want to add swap space to the APS value calculation. The APS configuration changes to:
APS_PRIORITY = WEIGHT [[JPRIORITY, 1] [QPRIORITY, 10] [FS, 100] [SWAP, -10]]
And the APS value is now calculated as:
APS_PRIORITY = 1 * (1 * job_priority + 10 * queue_priority) + 100 * user_priority + 1 * (-10 * SWAP)
View pending job order by the APS value
Run bjobs -aps to see APS information for pending jobs in the order of absolute scheduling priority. The order that the pending jobs are displayed is the order in which the jobs are considered for dispatch.
The APS value is calculated based on the current scheduling cycle, so jobs are not guaranteed to be dispatched in this order.
Pending jobs are ordered by APS value. Jobs with system APS values are listed first, from highest to lowest APS value. Jobs with calculated APS values are listed next ordered from high to low value. Finally, jobs not in an APS queue are listed. Jobs with equal APS values are listed in order of submission time.
If queues are configured with the same priority, bjobs -aps may not show jobs in the expected dispatch order. Jobs may be dispatched in the order the queues are configured in lsb.queues. You should avoid configuring queues with the same priority.
Example bjobs -aps output
The following example uses this configuration:
- The APS only considers the job priority and queue priority for jobs from normal queue (priority 30) and short queue (priority 20)
- APS_PRIORITY = WEIGHT [[QPRIORITY, 10] [JPRIORITY, 1]]
- QUEUE_GROUP = short
- Priority queue (40) and idle queue (15) do not use APS to order jobs
- JOB_PRIORITY_OVER_TIME=5/10 in lsb.params
- MAX_USER_PRIORITY=100 in lsb.params
bjobs -aps was run at 14:41:
bjobs -aps
JOBID USER  STAT QUEUE    FROM_HOST JOB_NAME SUBMIT_TIME  APS
15    User2 PEND priority HostB     myjob    Dec 21 14:30 -
22    User1 PEND Short    HostA     myjob    Dec 21 14:30 (60)
2     User1 PEND Short    HostA     myjob    Dec 21 11:00 360
12    User2 PEND normal   HostB     myjob    Dec 21 14:30 355
4     User1 PEND Short    HostA     myjob    Dec 21 14:00 270
5     User1 PEND Idle     HostA     myjob    Dec 21 14:01 -
For job 2, APS = 10 * 20 + 1 * (50 + 220 * 5 / 10) = 360
For job 12, APS = 10 *30 + 1 * (50 + 10 * 5/10) = 355
For job 4, APS = 10 * 20 + 1 * (50 + 40 * 5 /10) = 270
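The three calculations above can be reproduced with a short script. This is a sketch under stated assumptions rather than LSF's algorithm: it assumes each job was submitted with the default priority of MAX_USER_PRIORITY/2 = 50, that escalation adds the increment once per completed 10-minute interval, and that the WORK factor has the default weight of 1. The helper names are hypothetical.

```python
INCREMENT, INTERVAL = 5, 10            # JOB_PRIORITY_OVER_TIME=5/10
DEFAULT_PRIORITY = 100 // 2            # MAX_USER_PRIORITY=100, no -sp given
WORK_WEIGHT = 1                        # parent factor weight defaults to 1
WEIGHTS = {"QPRIORITY": 10, "JPRIORITY": 1}
QUEUE_PRIORITY = {"normal": 30, "short": 20}
RUN_TIME = "14:41"                     # when bjobs -aps was run


def minutes_since_midnight(hhmm: str) -> int:
    hours, minutes = map(int, hhmm.split(":"))
    return hours * 60 + minutes


def aps(queue: str, submit_time: str) -> int:
    """APS value for a job submitted on the same day at submit_time."""
    pending = minutes_since_midnight(RUN_TIME) - minutes_since_midnight(submit_time)
    # Assumption: priority escalates once per completed interval.
    job_priority = DEFAULT_PRIORITY + (pending // INTERVAL) * INCREMENT
    return WORK_WEIGHT * (WEIGHTS["QPRIORITY"] * QUEUE_PRIORITY[queue]
                          + WEIGHTS["JPRIORITY"] * job_priority)


for job_id, queue, submitted in [(2, "short", "11:00"),
                                 (12, "normal", "14:30"),
                                 (4, "short", "14:00")]:
    print(job_id, aps(queue, submitted))
```

Under these assumptions the script yields 360, 355, and 270 for jobs 2, 12, and 4, matching the APS column in the sample output.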
View APS configuration for a queue
Run bqueues -l to see the current APS information for a queue:
bqueues -l normal
QUEUE: normal
  -- No description provided. This is the default queue.

PARAMETERS/STATISTICS
PRIO NICE STATUS      MAX JL/U JL/P JL/H NJOBS PEND RUN SSUSP USUSP RSV
500  20   Open:Active -   -    -    -    0     0    0   0     0     0

SCHEDULING PARAMETERS
          r15s r1m r15m ut pg io ls it tmp swp mem
loadSched -    -   -    -  -  -  -  -  -   -   -
loadStop  -    -   -    -  -  -  -  -  -   -   -

SCHEDULING POLICIES: FAIRSHARE APS_PRIORITY
APS_PRIORITY:
               WEIGHT FACTORS LIMIT FACTORS GRACE PERIOD
FAIRSHARE      10000.00       -             -
RESOURCE       101010.00      -             1010h
PROCESSORS     -10.01         -             -
MEMORY         1000.00        20010.00      3h
SWAP           10111.00       -             -
WORK           1.00           -             -
JOB PRIORITY   -999999.00     10000.00      4131s
QUEUE PRIORITY 10000.00       10.00         -

USER_SHARES: [user1, 10]
SHARE_INFO_FOR: normal/
USER/GROUP SHARES PRIORITY STARTED RESERVED CPU_TIME RUN_TIME
user1      10     3.333    0       0        0.0      0

USERS: all
HOSTS: all
REQUEUE_EXIT_VALUES: 10
Feature interactions
Fairshare
The default user-based fairshare can be a factor in APS calculation by adding the FS factor to APS_PRIORITY in the queue.
- APS cannot be used together with DISPATCH_ORDER=QUEUE.
- APS cannot be used together with cross-queue fairshare (FAIRSHARE_QUEUES). The QUEUE_GROUP parameter replaces FAIRSHARE_QUEUES, which is obsolete in LSF 7.0.
- APS cannot be used together with queue-level fairshare or host-partition fairshare.
FCFS
APS overrides the job sort result of FCFS.
SLA scheduling
APS cannot be used together with SLA scheduling.
Job requeue
All requeued jobs are treated as newly submitted jobs for APS calculation. The job priority, system, and ADMIN APS factors are reset on requeue.
Rerun jobs
Rerun jobs are not treated the same as requeued jobs. A job typically reruns because the host failed, not through some user action (like job requeue), so the job priority is not reset for rerun jobs.
Job migration
Suspended (bstop) jobs and migrated jobs (bmig) are always scheduled before pending jobs. For migrated jobs, LSF keeps the existing job priority information.
If LSB_REQUEUE_TO_BOTTOM and LSB_MIG2PEND are configured in lsf.conf, the migrated jobs keep their APS information. When LSB_REQUEUE_TO_BOTTOM and LSB_MIG2PEND are configured, the migrated jobs need to compete with other pending jobs based on the APS value. If you want to reset the APS value, you should use brequeue, not bmig.
Resource reservation
The resource reservation is based on queue policies. The APS value does not affect current resource reservation policy.
Preemption
The preemption is based on queue policies. The APS value does not affect the current preemption policy.
Chunk jobs
The first chunk job to be dispatched is picked based on the APS priority. Other jobs in the chunk are picked based on the APS priority and the default chunk job scheduling policies.
The following job properties must be the same for all chunk jobs:
- Submitting user
- Resource requirements
- Host requirements
- Queue or application profile
- Job priority
Backfill scheduling
Not affected.
Advance reservation
Not affected.
Resizable jobs
For new resizable job allocation requests, the resizable job inherits the APS value from the original job. The subsequent calculations use factors as follows:
Factor or sub-factor: Behavior
FAIRSHARE
Resizable jobs submitting into fairshare queues or host partitions are subject to fairshare scheduling policies. The dynamic priority of the user who submitted the job is the most important criterion. LSF treats pending resize allocation requests as a regular job and enforces the fairshare user priority policy to schedule them.
The dynamic priority of users depends on:
- Their share assignment
- The slots their jobs are currently consuming
- The resources their jobs consumed in the past
- The adjustment made by the fairshare plugin (libfairshareadjust.*)
Resizable job allocation changes affect the user priority calculation if RUN_JOB_FACTOR is greater than zero (0). Resize add requests increase the number of slots in use and decrease user priority. Resize release requests decrease the number of slots in use and increase user priority. The faster a resizable job grows, the lower the user priority becomes, and the less likely a pending allocation request is to get more slots.
MEM
Use the value inherited from the original job.
PROC
Use the MAX value of the resize request.
SWAP
Use the value inherited from the original job.
JPRIORITY
Use the value inherited from the original job. If automatic job priority escalation is configured, the dynamic value is calculated as described in Automatic Job Priority Escalation.
For requeued and rerun resizable jobs, the JPRIORITY is reset, and the new APS value is calculated with the new JPRIORITY.
For a migrated resizable job, the JPRIORITY is carried forward, and the new APS value is calculated with the JPRIORITY continued from the original value.
QPRIORITY
Use the value inherited from the original job.
ADMIN
Use the value inherited from the original job.
Platform Computing Inc.
www.platform.com |