Fairshare Scheduling
To configure any kind of fairshare scheduling, you should understand the following concepts:
- User share assignments
- Dynamic share priority
- Job dispatch order
You can configure fairshare at either host level or queue level. If you require more control, you can implement hierarchical fairshare. You can also set some additional restrictions when you submit a job.
To get ideas about how to use fairshare scheduling to do different things, see Ways to Configure Fairshare.
Contents
- Basic Concepts
- Understanding Fairshare Scheduling
- User Share Assignments
- Dynamic User Priority
- How Fairshare Affects Job Dispatch Order
- User-based Fairshare
- Host Partition User-based Fairshare
- Queue-level User-based Fairshare
- Cross-queue User-based Fairshare
- Hierarchical User-based Fairshare
- Queue-based Fairshare
- Advanced Topics
Understanding Fairshare Scheduling
By default, LSF considers jobs for dispatch in the same order as they appear in the queue (which is not necessarily the order in which they are submitted to the queue). This is called first-come, first-served (FCFS) scheduling.
Fairshare scheduling divides the processing power of the LSF cluster among users and queues to provide fair access to resources, so that no user or queue can monopolize the resources of the cluster and no queue will be starved.
If your cluster has many users competing for limited resources, the FCFS policy might not be enough. For example, one user could submit many long jobs at once and monopolize the cluster's resources for a long time, while other users submit urgent jobs that must wait in queues until all the first user's jobs are done. To prevent this, use fairshare scheduling to control how resources should be shared by competing users.
Fairshare is not necessarily equal share: you can assign a higher priority to the most important users. If there are two users competing for resources, you can:
- Give all the resources to the most important user
- Share the resources so the most important user gets the most resources
- Share the resources so that all users have equal importance
Queue-level vs. host partition fairshare
You can configure fairshare at either the queue level or the host level. However, these types of fairshare scheduling are mutually exclusive. You cannot configure queue-level fairshare and host partition fairshare in the same cluster.
If you want a user's priority in one queue to depend on their activity in another queue, you must use cross-queue fairshare or host-level fairshare.
Fairshare policies
A fairshare policy defines the order in which LSF attempts to place jobs that are in a queue or a host partition. You can have multiple fairshare policies in a cluster, one for every different queue or host partition. You can also configure some queues or host partitions with fairshare scheduling, and leave the rest using FCFS scheduling.
How fairshare scheduling works
Each fairshare policy assigns a fixed number of shares to each user or group. These shares represent a fraction of the resources that are available in the cluster. The most important users or groups are the ones with the most shares. Users who have no shares cannot run jobs in the queue or host partition.
A user's dynamic priority depends on their share assignment, the dynamic priority formula, and the resources their jobs have already consumed.
The order of jobs in the queue is secondary. The most important thing is the dynamic priority of the user who submitted the job. When fairshare scheduling is used, LSF tries to place the first job in the queue that belongs to the user with the highest dynamic priority.
User Share Assignments
Both queue-level and host partition fairshare use the following syntax to define how shares are assigned to users or user groups.
Syntax
[user, number_shares]
Enclose each user share assignment in square brackets, as shown. Separate multiple share assignments with a space between each set of square brackets.
user
Specify users of the queue or host partition. You can assign the shares:
- to a single user (specify user_name)
- to users in a group, individually (specify group_name@) or collectively (specify group_name)
- to users not included in any other share assignment, individually (specify the keyword default) or collectively (specify the keyword others)
By default, when resources are assigned collectively to a group, the group members compete for the resources according to FCFS scheduling. You can use hierarchical fairshare to further divide the shares among the group members.
When resources are assigned to members of a group individually, the share assignment is recursive. Members of the group and of all subgroups always compete for the resources according to FCFS scheduling, regardless of hierarchical fairshare policies.
number_shares
Specify a positive integer representing the number of shares of cluster resources assigned to the user.
The number of shares assigned to each user is only meaningful when you compare it to the shares assigned to other users, or to the total number of shares. The total number of shares is just the sum of all the shares assigned in each share assignment.
Examples
[User1, 1] [GroupB, 1]
Assigns 2 shares: 1 to User1, and 1 to be shared by the users in GroupB. Each user in GroupB has equal importance. User1 is as important as all the users in GroupB put together. In this example, it does not matter if the number of shares is 1, 6 or 600. As long as User1 and GroupB are both assigned the same number of shares, the relationship stays the same.
[User1, 10] [GroupB@, 1]
If GroupB contains 10 users, assigns 20 shares in total: 10 to User1, and 1 to each user in GroupB. Each user in GroupB has equal importance. User1 is ten times as important as any user in GroupB.
[User1, 10] [User2, 9] [others, 8]
Assigns 27 shares: 10 to User1, 9 to User2, and 8 to the remaining users, as a group. User1 is slightly more important than User2. Each of the remaining users has equal importance.
- If there are 3 users in total, the single remaining user has all 8 shares, and is almost as important as User1 and User2.
- If there are 12 users in total, then 10 users compete for those 8 shares, and each of them is significantly less important than User1 and User2.
[User1, 10] [User2, 6] [default, 4]
The relative percentage of shares held by a user will change, depending on the number of users who are granted shares by default.
- If there are 3 users in total, assigns 20 shares: 10 to User1, 6 to User2, and 4 to the remaining user. User1 has half of the available resources (10 shares out of 20).
- If there are 12 users in total, assigns 56 shares: 10 to User1, 6 to User2, and 4 to each of the remaining 10 users. User1 has about a fifth of the available resources (10 shares out of 56).
Dynamic User Priority
LSF calculates a dynamic user priority for individual users or for a group, depending on how the shares are assigned. The priority is dynamic because it changes as soon as any variable in the formula changes. By default, a user's dynamic priority gradually decreases after a job starts, and immediately increases when the job finishes.
How LSF calculates dynamic priority
By default, LSF calculates the dynamic priority for each user based on:
- The number of shares assigned to the user
- The resources used by jobs belonging to the user:
- Number of job slots reserved and in use
- Run time of running jobs
- Cumulative actual CPU time (not normalized), adjusted so that recently used CPU time is weighted more heavily than CPU time used in the distant past
If you enable additional functionality, the formula can also involve additional resources used by jobs belonging to the user:
- Historical run time of finished jobs
- Committed run time, specified at job submission with the -W option of bsub, or in the queue with the RUNLIMIT parameter in lsb.queues
- Memory usage adjustment made by the fairshare plugin (libfairshareadjust.*)
How LSF measures fairshare resource usage
LSF measures resource usage differently, depending on the type of fairshare:
- For user-based fairshare:
- For queue-level fairshare, LSF measures the resource consumption of all the user's jobs in the queue. This means a user's dynamic priority can be different in every queue.
- For host partition fairshare, LSF measures resource consumption for all the user's jobs that run on hosts in the host partition. This means a user's dynamic priority is the same in every queue that uses hosts in the same partition.
- For queue-based fairshare, LSF measures the resource consumption of all jobs in each queue.
Default dynamic priority formula
By default, LSF calculates dynamic priority according to the following formula:
dynamic priority = number_shares / (cpu_time * CPU_TIME_FACTOR + run_time * RUN_TIME_FACTOR + (1 + job_slots) * RUN_JOB_FACTOR + fairshare_adjustment * FAIRSHARE_ADJUSTMENT_FACTOR)
note: The maximum value of dynamic user priority is 100 times the number of user shares (if the denominator in the calculation is less than 0.01, LSF rounds it up to 0.01).
For cpu_time, run_time, and job_slots, LSF uses the total resource consumption of all the jobs in the queue or host partition that belong to the user or group.
number_shares
The number of shares assigned to the user.
cpu_time
The cumulative CPU time used by the user (measured in hours). LSF calculates the cumulative CPU time using the actual (not normalized) CPU time and a decay factor such that 1 hour of recently used CPU time decays to 0.1 hours after an interval of time specified by HIST_HOURS in lsb.params (5 hours by default).
run_time
The total run time of running jobs (measured in hours).
job_slots
The number of job slots reserved and in use.
fairshare_adjustment
The adjustment calculated by the fairshare adjustment plugin (libfairshareadjust.*).
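For example, using the default factors and purely illustrative numbers: a user who holds 100 shares, whose jobs have consumed 1 hour of CPU time, have accumulated 2 hours of run time, and currently hold 2 slots (with the fairshare adjustment left at its default of 0) would have:
dynamic priority = 100 / (1 * 0.7 + 2 * 0.7 + (1 + 2) * 3 + 0 * 0) = 100 / 11.1 = approximately 9.0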
Configuring the default dynamic priority
You can give additional weight to the various factors in the priority calculation by setting the following parameters in lsb.params:
- CPU_TIME_FACTOR
- RUN_TIME_FACTOR
- RUN_JOB_FACTOR
- FAIRSHARE_ADJUSTMENT_FACTOR
- HIST_HOURS
If you modify the parameters used in the dynamic priority formula, it affects every fairshare policy in the cluster.
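For example, the following lsb.params settings (the values here are illustrative, not recommendations) give run time more weight than CPU time in every fairshare policy:
CPU_TIME_FACTOR = 0.3
RUN_TIME_FACTOR = 1.0
RUN_JOB_FACTOR = 3.0
FAIRSHARE_ADJUSTMENT_FACTOR = 0
HIST_HOURS = 5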
CPU_TIME_FACTOR
The CPU time weighting factor. Default: 0.7
RUN_TIME_FACTOR
The run time weighting factor. Default: 0.7
RUN_JOB_FACTOR
The job slots weighting factor. Default: 3
FAIRSHARE_ADJUSTMENT_FACTOR
The fairshare plugin (libfairshareadjust.*) weighting factor. Default: 0
HIST_HOURS
Interval for collecting resource consumption history. Default: 5
Customizing the dynamic priority
In some cases, the dynamic priority equation may require adjustments beyond the run time, CPU time, and job slot dependencies provided by default. The fairshare adjustment plugin is open source and can be customized once you identify specific requirements for dynamic priority.
All information used by the default priority equation (except the user shares) is passed to the fairshare plugin. In addition, the fairshare plugin is provided with current memory use over the entire cluster and the average memory allocated to a slot in the cluster.
note: If you modify the parameters used in the dynamic priority formula, it affects every fairshare policy in the cluster. The fairshare adjustment plugin (libfairshareadjust.*) is not queue-specific.
Example
Jobs assigned to a single slot on a host can consume host memory to the point that other slots on the host are left unusable. The default dynamic priority calculation considers job slots used, but does not account for unused job slots that are effectively blocked by another job.
The fairshare adjustment plugin example code provided by Platform LSF is found in the examples directory of your installation, and implements a memory-based dynamic priority adjustment as follows:
fairshare adjustment = (1 + slots) * ((total_memory / slots) / (slot_memory * THRESHOLD))
slots
The number of job slots in use by started jobs.
total_memory
The total memory in use by started jobs.
slot_memory
The average memory allocated per slot.
THRESHOLD
The memory threshold set in the fairshare adjustment plugin.
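For example (hypothetical values), if a user's started jobs hold 4 slots and 8 GB of memory in total, the average memory allocated per slot is 1 GB, and THRESHOLD is set to 2, the adjustment would be:
fairshare adjustment = (1 + 4) * ((8 / 4) / (1 * 2)) = 5 * 1 = 5
Because the adjustment appears in the denominator of the dynamic priority formula, a larger value lowers the user's dynamic priority when FAIRSHARE_ADJUSTMENT_FACTOR is greater than 0.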
How Fairshare Affects Job Dispatch Order
Within a queue, jobs are dispatched according to the queue's scheduling policy.
- For FCFS queues, the dispatch order depends on the order of jobs in the queue (which depends on job priority and submission time, and can also be modified by the job owner).
- For fairshare queues, the dispatch order depends on dynamic share priority, then order of jobs in the queue (which is not necessarily the order in which they are submitted to the queue).
A user's priority gets higher when they use less than their fair share of the cluster's resources. When a user has the highest priority, LSF considers one of their jobs first, even if other users are ahead of them in the queue.
If only one user has jobs pending, and you do not use hierarchical fairshare, then there is no resource contention between users, so the fairshare policies have no effect and jobs are dispatched as usual.
Job dispatch order among queues of equivalent priority
The order of dispatch depends on the order of the queues in the queue configuration file. The first queue in the list is the first to be scheduled.
Jobs in a fairshare queue are always considered as a group, so the scheduler attempts to place all jobs in the queue before beginning to schedule the next queue.
Jobs in an FCFS queue are always scheduled along with jobs from other FCFS queues of the same priority (as if all the jobs belonged to the same queue).
Example
In a cluster, queues A, B, and C are configured in that order and have equal queue priority.
Jobs with equal job priority are submitted to each queue in this order: C B A B A.
- If all queues are FCFS queues, order of dispatch is C B A B A (queue A is first; queues B and C are the same priority as A; all jobs are scheduled in FCFS order).
- If all queues are fairshare queues, order of dispatch is AA BB C (queue A is first; all jobs in the queue are scheduled; then queue B, then C).
- If A and C are fairshare, and B is FCFS, order of dispatch is AA B B C (queue A jobs are scheduled according to user priority; then queue B jobs are scheduled in FCFS order; then queue C jobs are scheduled according to user priority)
- If A and C are FCFS, and B is fairshare, order of dispatch is C A A BB (queue A is first; queue A and C jobs are scheduled in FCFS order, then queue B jobs are scheduled according to user priority)
- If any of these queues uses cross-queue fairshare, the other queues must also use cross-queue fairshare and belong to the same set, or they cannot have the same queue priority. For more information, see Cross-queue User-based Fairshare.
Host Partition User-based Fairshare
User-based fairshare policies configured at the host level handle resource contention across multiple queues.
You can define a different fairshare policy for every host partition. If multiple queues use the host partition, a user has the same priority across multiple queues.
To run a job on a host that has fairshare, users must have a share assignment (USER_SHARES in the HostPartition section of lsb.hosts). Even cluster administrators cannot submit jobs to a fairshare host if they do not have a share assignment.
View host partition information
- Use bhpart to view the following information:
- Host partitions configured in your cluster
- Number of shares (for each user or group in a host partition)
- Dynamic share priority (for each user or group in a host partition)
- Number of started jobs
- Number of reserved jobs
- CPU time, in seconds (cumulative CPU time for all members of the group, recursively)
- Run time, in seconds (historical and actual run time for all members of the group, recursively)
% bhpart Partition1
HOST_PARTITION_NAME: Partition1
HOSTS: hostA hostB hostC
SHARE_INFO_FOR: Partition1/
USER/GROUP  SHARES  PRIORITY  STARTED  RESERVED  CPU_TIME  RUN_TIME
group1      100     5.440     5        0         200.0     1324
Configure host partition fairshare scheduling
- To configure host partition fairshare, define a host partition in lsb.hosts. Use the following format:
Begin HostPartition
HPART_NAME = Partition1
HOSTS = hostA hostB ~hostC
USER_SHARES = [groupA@, 3] [groupB, 7] [default, 1]
End HostPartition
- A host cannot belong to multiple partitions.
- Optional: Use the reserved host name all to configure a single partition that applies to all hosts in a cluster.
- Optional: Use the not operator (~) to exclude hosts or host groups from the list of hosts in the host partition.
- Hosts in a host partition cannot participate in queue-based fairshare.
- Hosts that are not included in any host partition are controlled by FCFS scheduling policy instead of fairshare scheduling policy.
Queue-level User-based Fairshare
User-based fairshare policies configured at the queue level handle resource contention among users in the same queue. You can define a different fairshare policy for every queue, even if they share the same hosts. A user's priority is calculated separately for each queue.
To submit jobs to a fairshare queue, users must be allowed to use the queue (USERS in lsb.queues) and must have a share assignment (FAIRSHARE in lsb.queues). Even cluster and queue administrators cannot submit jobs to a fairshare queue if they do not have a share assignment.
View queue-level fairshare information
- To find out if a queue is a fairshare queue, run bqueues -l. If you see "USER_SHARES" in the output, then a fairshare policy is configured for the queue.
Configure queue-level fairshare
- To configure a fairshare queue, define FAIRSHARE in lsb.queues and specify a share assignment for all users of the queue:
FAIRSHARE = USER_SHARES[[user, number_shares] ...]
- You must specify at least one user share assignment.
- Enclose the list in square brackets, as shown.
- Enclose each user share assignment in square brackets, as shown.
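For example, a minimal fairshare queue definition in lsb.queues might look like the following sketch (the queue name and share values are illustrative):
Begin Queue
QUEUE_NAME = fairshare_queue
PRIORITY = 30
FAIRSHARE = USER_SHARES[[groupA, 7] [groupB, 3] [default, 1]]
End Queue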
Cross-queue User-based Fairshare
User-based fairshare policies configured at the queue level handle resource contention across multiple queues.
Applying the same fairshare policy to several queues
With cross-queue fairshare, the same user-based fairshare policy can apply to several queues at the same time. You define the fairshare policy in a master queue and list the slave queues to which the same fairshare policy applies; slave queues inherit the fairshare policy of the master queue. For job scheduling purposes, this is equivalent to having one queue with one fairshare tree.
In this way, if a user submits jobs to different queues, user priority is calculated by taking into account all the jobs the user has submitted across the defined queues.
To submit jobs to a fairshare queue, users must be allowed to use the queue (USERS in lsb.queues) and must have a share assignment (FAIRSHARE in lsb.queues). Even cluster and queue administrators cannot submit jobs to a fairshare queue if they do not have a share assignment.
User and queue priority
By default, a user has the same priority across the master and slave queues. If the same user submits several jobs to these queues, user priority is calculated by taking into account all the jobs the user has submitted across the master-slave set.
If DISPATCH_ORDER=QUEUE is set in the master queue, jobs are dispatched according to queue priorities first, then user priority. This avoids having users with higher fairshare priority getting jobs dispatched from low-priority queues.
Jobs from users with lower fairshare priorities who have pending jobs in higher priority queues are dispatched before jobs in lower priority queues. Jobs in queues having the same priority are dispatched according to user priority.
Queues that are not part of the ordered cross-queue fairshare can have any priority. Their priority can fall within the priority range of cross-queue fairshare queues and they can be inserted between two queues using the same fairshare tree.
View cross-queue fairshare information
- Run bqueues -l to find out if a queue is part of cross-queue fairshare.
The FAIRSHARE_QUEUES parameter indicates cross-queue fairshare. The first queue listed in the FAIRSHARE_QUEUES parameter is the master queue (the queue in which fairshare is configured); all other queues listed inherit the fairshare policy from the master queue.
All queues that participate in the same cross-queue fairshare display the same fairshare information (SCHEDULING POLICIES, FAIRSHARE_QUEUES, USER_SHARES, SHARE_INFO_FOR) when bqueues -l is used. Fairshare information applies to all the jobs running in all the queues in the master-slave set.
bqueues -l also displays DISPATCH_ORDER in the master queue if it is defined.
bqueues
QUEUE_NAME  PRIO  STATUS       MAX  JL/U  JL/P  JL/H  NJOBS  PEND  RUN  SUSP
normal      30    Open:Active  -    -     -     -     1      1     0    0
short       40    Open:Active  -    4     2     -     1      0     1    0
license     50    Open:Active  10   1     1     -     1      0     1    0
bqueues -l normal
QUEUE: normal
-- For normal low priority jobs, running only if hosts are lightly loaded. This is the default queue.
PARAMETERS/STATISTICS
PRIO  NICE  STATUS          MAX  JL/U  JL/P  JL/H  NJOBS  PEND  RUN  SSUSP  USUSP  RSV
30    20    Open:Inact_Win  -    -     -     -     1      1     0    0      0      0
SCHEDULING PARAMETERS
            r15s  r1m  r15m  ut  pg  io  ls  it  tmp  swp  mem
 loadSched  -     -    -     -   -   -   -   -   -    -    -
 loadStop   -     -    -     -   -   -   -   -   -    -    -
            cpuspeed  bandwidth
 loadSched  -         -
 loadStop   -         -
SCHEDULING POLICIES: FAIRSHARE
FAIRSHARE_QUEUES: normal short license
USER_SHARES: [user1, 100] [default, 1]
SHARE_INFO_FOR: normal/
USER/GROUP  SHARES  PRIORITY  STARTED  RESERVED  CPU_TIME  RUN_TIME  ADJUST
user1       100     9.645     2        0         0.2       7034      0.000
USERS: all users
HOSTS: all
...
bqueues -l short
QUEUE: short
PARAMETERS/STATISTICS
PRIO  NICE  STATUS          MAX  JL/U  JL/P  JL/H  NJOBS  PEND  RUN  SSUSP  USUSP  RSV
40    20    Open:Inact_Win  -    4     2     -     1      0     1    0      0      0
SCHEDULING PARAMETERS
            r15s  r1m  r15m  ut  pg  io  ls  it  tmp  swp  mem
 loadSched  -     -    -     -   -   -   -   -   -    -    -
 loadStop   -     -    -     -   -   -   -   -   -    -    -
            cpuspeed  bandwidth
 loadSched  -         -
 loadStop   -         -
SCHEDULING POLICIES: FAIRSHARE
FAIRSHARE_QUEUES: normal short license
USER_SHARES: [user1, 100] [default, 1]
SHARE_INFO_FOR: short/
USER/GROUP  SHARES  PRIORITY  STARTED  RESERVED  CPU_TIME  RUN_TIME
user1       100     9.645     2        0         0.2       7034
USERS: all users
HOSTS: all
...
Configuring cross-queue fairshare
Considerations
- FAIRSHARE must be defined in the master queue. If it is also defined in the queues listed in FAIRSHARE_QUEUES, it will be ignored.
- Cross-queue fairshare can be defined more than once within lsb.queues. You can define several sets of master-slave queues. However, a queue cannot belong to more than one master-slave set. For example, you can define:
- In master queue normal: FAIRSHARE_QUEUES=short license
- In master queue priority: FAIRSHARE_QUEUES=night owners
- You cannot, however, define night, owners, or priority as slaves in the normal queue; or normal, short, and license as slaves in the priority queue; or short, license, night, and owners as master queues of their own.
- Cross-queue fairshare cannot be used with host partition fairshare. It is part of queue-level fairshare.
Configure cross-queue fairshare
- Decide to which queues in your cluster cross-queue fairshare will apply.
For example, in your cluster you may have the queues normal, priority, short, and license, and you want cross-queue fairshare to apply only to normal, license, and short.
- Define fairshare policies in your master queue.
In the queue you want to be the master, for example normal, define the following in lsb.queues:
- FAIRSHARE and specify a share assignment for all users of the queue.
- FAIRSHARE_QUEUES and list slave queues to which the defined fairshare policy will also apply.
- PRIORITY to indicate the priority of the queue.
Begin Queue
QUEUE_NAME = queue1
PRIORITY = 30
NICE = 20
FAIRSHARE = USER_SHARES[[user1,100] [default,1]]
FAIRSHARE_QUEUES = queue2 queue3
DESCRIPTION = For normal low priority jobs, running only if hosts are lightly loaded.
End Queue
- In all the slave queues listed in FAIRSHARE_QUEUES, define all queue values as desired.
For example:
Begin Queue
QUEUE_NAME = queue2
PRIORITY = 40
NICE = 20
UJOB_LIMIT = 4
PJOB_LIMIT = 2
End Queue

Begin Queue
QUEUE_NAME = queue3
PRIORITY = 50
NICE = 10
PREEMPTION = PREEMPTIVE
QJOB_LIMIT = 10
UJOB_LIMIT = 1
PJOB_LIMIT = 1
End Queue
Controlling job dispatch order in cross-queue fairshare
DISPATCH_ORDER parameter (lsb.queues)
Use DISPATCH_ORDER=QUEUE in the master queue to define an ordered cross-queue fairshare set. DISPATCH_ORDER indicates that jobs are dispatched according to the order of queue priorities, not user fairshare priority.
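For example, to make the master queue from the earlier configuration example an ordered set, add the parameter to its definition (a sketch based on that example):
Begin Queue
QUEUE_NAME = queue1
PRIORITY = 30
NICE = 20
FAIRSHARE = USER_SHARES[[user1,100] [default,1]]
FAIRSHARE_QUEUES = queue2 queue3
DISPATCH_ORDER = QUEUE
End Queue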
Priority range in cross-queue fairshare
By default, the range of priority defined for queues in cross-queue fairshare cannot be used with any other queues. The priority of queues that are not part of the cross-queue fairshare cannot fall between the priority range of cross-queue fairshare queues.
For example, you have 4 queues: queue1, queue2, queue3, and queue4. You configure cross-queue fairshare for queue1, queue2, and queue3, and assign priorities of 30, 40, and 50, respectively. The priority of queue4 (which is not part of the cross-queue fairshare) cannot fall between 30 and 50, but it can be any number up to 29 or higher than 50. It does not matter if queue4 is a fairshare queue or an FCFS queue.
If DISPATCH_ORDER=QUEUE is set in the master queue, queues that are not part of the ordered cross-queue fairshare can have any priority. Their priority can fall within the priority range of cross-queue fairshare queues and they can be inserted between two queues using the same fairshare tree. In the example above, queue4 can have any priority, including a priority falling within the priority range of the cross-queue fairshare queues (30-50).
Jobs from equal priority queues
- If two or more non-fairshare queues have the same priority, their jobs are dispatched first-come, first-served based on submission time or job ID, as if they come from the same queue.
- If two or more fairshare queues have the same priority, jobs are dispatched in the order the queues are listed in lsb.queues.
Hierarchical User-based Fairshare
For both queue and host partitions, hierarchical user-based fairshare lets you allocate resources to users in a hierarchical manner.
By default, when shares are assigned to a group, group members compete for resources according to FCFS policy. If you use hierarchical fairshare, you control the way shares that are assigned collectively are divided among group members.
If groups have subgroups, you can configure additional levels of share assignments, resulting in a multi-level share tree that becomes part of the fairshare policy.
How hierarchical fairshare affects dynamic share priority
When you use hierarchical fairshare, the dynamic share priority formula does not change, but LSF measures the resource consumption for all levels of the share tree. To calculate the dynamic priority of a group, LSF uses the resource consumption of all the jobs in the queue or host partition that belong to users in the group and all its subgroups, recursively.
How hierarchical fairshare affects job dispatch order
LSF uses the dynamic share priority of a user or group to find out which user's job to run next. If you use hierarchical fairshare, LSF works through the share tree from the top level down, and compares the dynamic priority of users and groups at each level, until the user with the highest dynamic priority is a single user, or a group that has no subgroups.
View hierarchical share information for a group
- Use bugroup -l to find out if you belong to a group, and what the share distribution is.
bugroup -l
GROUP_NAME: group1
USERS: group2/ group3/
SHARES: [group2,20] [group3,10]

GROUP_NAME: group2
USERS: user1 user2 user3
SHARES: [others,10] [user3,4]

GROUP_NAME: group3
USERS: all
SHARES: [user2,10] [default,5]
This command displays all the share trees that are configured, even if they are not used in any fairshare policy.
View hierarchical share information for a host partition
By default, bhpart displays only the top-level share accounts associated with the partition.
- Use bhpart -r to display the group information recursively.
The output lists all the groups in the share tree, starting from the top level, and displays the following information:
- Number of shares
- Dynamic share priority (LSF compares dynamic priorities of users who belong to same group, at the same level)
- Number of started jobs
- Number of reserved jobs
- CPU time, in seconds (cumulative CPU time for all members of the group, recursively)
- Run time, in seconds (historical and actual run time for all members of the group, recursively)
bhpart -r Partition1
HOST_PARTITION_NAME: Partition1
HOSTS: HostA
SHARE_INFO_FOR: Partition1/
USER/GROUP  SHARES  PRIORITY  STARTED  RESERVED  CPU_TIME  RUN_TIME
group1      40      1.867     5        0         48.4      17618
group2      20      0.775     6        0         607.7     24664
SHARE_INFO_FOR: Partition1/group2/
USER/GROUP  SHARES  PRIORITY  STARTED  RESERVED  CPU_TIME  RUN_TIME
user1       8       1.144     1        0         9.6       5108
user2       2       0.667     0        0         0.0       0
others      1       0.046     5        0         598.1     19556
Configuring hierarchical fairshare
To define a hierarchical fairshare policy, configure the top-level share assignment in lsb.queues or lsb.hosts, as usual. Then, for any group of users affected by the fairshare policy, configure a share tree in the UserGroup section of lsb.users. This specifies how shares assigned to the group, collectively, are distributed among the individual users or subgroups.
If shares are assigned to members of any group individually, using @, there can be no further hierarchical fairshare within that group. The shares are assigned recursively to all members of all subgroups, regardless of further share distributions defined in lsb.users. The group members and members of all subgroups compete for resources according to FCFS policy.
You can choose to define a hierarchical share tree for some groups but not others. If you do not define a share tree for any group or subgroup, members compete for resources according to FCFS policy.
Configure a share tree
- Group membership is already defined in the UserGroup section of lsb.users. To configure a share tree, use the USER_SHARES column to describe how the shares are distributed in a hierarchical manner. Use the following format:
Begin UserGroup
GROUP_NAME   GROUP_MEMBER             USER_SHARES
GroupB       (User1 User2)            ()
GroupC       (User3 User4)            ([User3, 3] [User4, 4])
GroupA       (GroupB GroupC User5)    ([User5, 1] [default, 10])
End UserGroup
- User groups must be defined before they can be used (in the GROUP_MEMBER column) to define other groups.
- Enclose the share assignment list in parentheses, as shown, even if you do not specify any user share assignments.
An Engineering queue or host partition organizes users hierarchically, and divides the shares as shown. It does not matter what the actual number of shares assigned at each level is.
The Development group gets the largest share (50%) of the resources in the event of contention. Shares assigned to the Development group can be further divided among the Systems, Application, and Test groups, which receive 15%, 35%, and 50%, respectively. At the lowest level, individual users compete for these shares as usual.
One way to measure a user's importance is to multiply their percentage of the resources at every level of the share tree. For example, User1 is entitled to 10% of the available resources (.50 x .80 x .25 = .10) and User3 is entitled to 4% (.80 x .20 x .25 = .04). However, if Research has the highest dynamic share priority among the 3 groups at the top level, and ChipY has a higher dynamic priority than ChipX, the next comparison is between User3 and User4, so the importance of User1 is not relevant. The dynamic priority of User1 is not even calculated at this point.
Queue-based Fairshare
When a priority is set in a queue configuration, a high priority queue tries to dispatch as many jobs as it can before allowing lower priority queues to dispatch any job. Lower priority queues are blocked until the higher priority queue cannot dispatch any more jobs. However, it may be desirable to give some preference to lower priority queues and regulate the flow of jobs from the queue.
Queue-based fairshare allows flexible slot allocation per queue as an alternative to absolute queue priorities by enforcing a soft job slot limit on a queue. This allows you to organize the priorities of your work and tune the number of jobs dispatched from a queue so that no single queue monopolizes cluster resources, leaving other queues waiting to dispatch jobs.
You can balance the distribution of job slots among queues by configuring a ratio of jobs waiting to be dispatched from each queue. LSF then attempts to dispatch a certain percentage of jobs from each queue, and does not attempt to drain the highest priority queue entirely first.
When queues compete, the allocated slots per queue are kept within the limits of the configured share. If only one queue in the pool has jobs, that queue can use all the available resources and can span its usage across all hosts it could potentially run jobs on.
Managing pools of queues
You can configure your queues into a pool, which is a named group of queues using the same set of hosts. A pool is entitled to a slice of the available job slots. You can configure as many pools as you need, but each pool must use the same set of hosts. There can be queues in the cluster that do not belong to any pool yet share some hosts used by a pool.
How LSF allocates slots for a pool of queues
During job scheduling, LSF orders the queues within each pool based on the shares the queues are entitled to. The number of running jobs (or job slots in use) is maintained at the percentage level specified for the queue. When a queue has no pending jobs, leftover slots are redistributed to other queues in the pool with jobs pending.
The total number of slots in each pool is constant; it is equal to the number of slots in use plus the number of free slots, up to the maximum job slot limit configured either in lsb.hosts (MXJ) or in lsb.resources for a host or host group. The accumulation of slots in use by the queue is used in ordering the queues for dispatch.
Job limits and host limits are enforced by the scheduler. For example, if LSF determines that a queue is eligible to run 50 jobs, but the queue has a job limit of 40 jobs, no more than 40 jobs will run. The remaining 10 job slots are redistributed among other queues belonging to the same pool, or made available to other queues that are configured to use them.
Accumulated slots in use
As queues run the jobs allocated to them, LSF accumulates the slots each queue has used and decays this value over time, so that each queue is not allocated more slots than it deserves, and other queues in the pool have a chance to run their share of jobs.
Interaction with other scheduling policies
- Queues participating in a queue-based fairshare pool cannot be preemptive or preemptable.
- You should not configure slot reservation (SLOT_RESERVE) in queues that use queue-based fairshare.
- Cross-queue user-based fairshare (FAIRSHARE_QUEUES) can undo the dispatching decisions of queue-based fairshare. Cross-queue user-based fairshare queues should not be part of a queue-based fairshare pool.
Examples
Three queues using two hosts, each with a maximum job slot limit of 6, for a total of 12 slots to be allocated:
- queue1 shares 50% of slots to be allocated = 2 * 6 * 0.5 = 6 slots
- queue2 shares 30% of slots to be allocated = 2 * 6 * 0.3 = 3.6 -> 4 slots
- queue3 shares 20% of slots to be allocated = 2 * 6 * 0.2 = 2.4 -> 3 slots; however, since the total cannot be more than 12, queue3 is actually allocated only 2 slots.
Four queues using two hosts, each with a maximum job slot limit of 6, for a total of 12 slots; queue4 does not belong to any pool.
- queue1 shares 50% of slots to be allocated = 2 * 6 * 0.5 = 6
- queue2 shares 30% of slots to be allocated = 2 * 6 * 0.3 = 3.6 -> 4
- queue3 shares 20% of slots to be allocated = 2 * 6 * 0.2 = 2.4 -> 2
- queue4 shares no slots with other queues
queue4 causes the total number of slots to be less than the total free and in use by queue1, queue2, and queue3, which do belong to the pool. It is possible that the pool may get all its shares used up by queue4, and jobs from the pool will remain pending.
queue1, queue2, and queue3 belong to one pool; queue6, queue7, and queue8 belong to another pool; and queue4 and queue5 do not belong to any pool.
LSF orders the queues in the two pools from higher-priority queue to lower-priority queue (queue1 is highest and queue8 is lowest):
queue1 -> queue2 -> queue3 -> queue6 -> queue7 -> queue8
If the queue belongs to a pool, jobs are dispatched from the highest priority queue first. Queues that do not belong to any pool (queue4 and queue5) are merged into this ordered list according to their priority, but LSF dispatches as many jobs from the non-pool queues as it can:
queue1 -> queue2 -> queue3 -> queue4 -> queue5 -> queue6 -> queue7 -> queue8
Configuring Slot Allocation per Queue
Configure as many pools as you need in lsb.queues.
SLOT_SHARE parameter
The SLOT_SHARE parameter represents the percentage of running jobs (job slots) in use from the queue. SLOT_SHARE must be greater than zero (0) and less than or equal to 100.
The sum of SLOT_SHARE for all queues in the pool does not need to be 100%. It can be more or less, depending on your needs.
SLOT_POOL parameter
The SLOT_POOL parameter is the name of the pool of job slots the queue belongs to. A queue can only belong to one pool. All queues in the pool must share the same set of hosts.
Host job slot limit
The hosts used by the pool must have a maximum job slot limit, configured either in lsb.hosts (MXJ) or lsb.resources (HOSTS and SLOTS).
Configure slot allocation per queue
- For each queue that uses queue-based fairshare, define the following in lsb.queues:
- SLOT_SHARE
- SLOT_POOL
- Optional: Define the following in lsb.queues for each queue that uses queue-based fairshare:
- HOSTS to list the hosts that can receive jobs from the queue. If no hosts are defined for the queue, the default is all hosts.
tip: Hosts for queue-based fairshare cannot be in a host partition.
- PRIORITY to indicate the priority of the queue.
- For each host used by the pool, define a maximum job slot limit, either in lsb.hosts (MXJ) or lsb.resources (HOSTS and SLOTS).
Configure two pools
The following example configures pool A with three queues, with different shares, using the hosts in host group groupA:
Begin Queue
QUEUE_NAME = queue1
PRIORITY = 50
SLOT_POOL = poolA
SLOT_SHARE = 50
HOSTS = groupA
...
End Queue

Begin Queue
QUEUE_NAME = queue2
PRIORITY = 48
SLOT_POOL = poolA
SLOT_SHARE = 30
HOSTS = groupA
...
End Queue

Begin Queue
QUEUE_NAME = queue3
PRIORITY = 46
SLOT_POOL = poolA
SLOT_SHARE = 20
HOSTS = groupA
...
End Queue
The following example configures a pool named poolB, with three queues with equal shares, using the hosts in host group groupB:
Begin Queue
QUEUE_NAME = queue4
PRIORITY = 44
SLOT_POOL = poolB
SLOT_SHARE = 30
HOSTS = groupB
...
End Queue

Begin Queue
QUEUE_NAME = queue5
PRIORITY = 43
SLOT_POOL = poolB
SLOT_SHARE = 30
HOSTS = groupB
...
End Queue

Begin Queue
QUEUE_NAME = queue6
PRIORITY = 42
SLOT_POOL = poolB
SLOT_SHARE = 30
HOSTS = groupB
...
End Queue
View Queue-based Fairshare Allocations
View configured job slot share
- Use bqueues -l to show the job slot share (SLOT_SHARE) and the hosts participating in the share pool (SLOT_POOL):
QUEUE: queue1
PARAMETERS/STATISTICS
PRIO  NICE  STATUS       MAX  JL/U  JL/P  JL/H  NJOBS  PEND  RUN  SSUSP  USUSP  RSV
50    20    Open:Active  -    -     -     -     0      0     0    0      0      0
Interval for a host to accept two jobs is 0 seconds
STACKLIMIT  MEMLIMIT
2048 K      5000 K
SCHEDULING PARAMETERS
            r15s  r1m  r15m  ut  pg  io  ls  it  tmp  swp  mem
 loadSched  -     -    -     -   -   -   -   -   -    -    -
 loadStop   -     -    -     -   -   -   -   -   -    -    -
            cpuspeed  bandwidth
 loadSched  -         -
 loadStop   -         -
USERS: all users
HOSTS: groupA/
SLOT_SHARE: 50%
SLOT_POOL: poolA
View slot allocation of running jobs
- Use bhosts, bmgroup, and bqueues to verify how LSF maintains the configured percentage of running jobs in each queue.
The queue configurations above use the following host groups:
bmgroup -r
GROUP_NAME  HOSTS
groupA      hosta hostb hostc
groupB      hostd hoste hostf
Each host has a maximum job slot limit of 5, for a total of 15 slots available to be allocated in each group:
bhosts
HOST_NAME  STATUS  JL/U  MAX  NJOBS  RUN  SSUSP  USUSP  RSV
hosta      ok      -     5    5      5    0      0      0
hostb      ok      -     5    5      5    0      0      0
hostc      ok      -     5    5      5    0      0      0
hostd      ok      -     5    5      5    0      0      0
hoste      ok      -     5    5      5    0      0      0
hostf      ok      -     5    5      5    0      0      0
The pool named poolA contains queue1, queue2, and queue3. poolB contains queue4, queue5, and queue6. The bqueues command shows the number of running jobs in each queue:
bqueues
QUEUE_NAME  PRIO  STATUS       MAX  JL/U  JL/P  JL/H  NJOBS  PEND  RUN  SUSP
queue1      50    Open:Active  -    -     -     -     492    484   8    0
queue2      48    Open:Active  -    -     -     -     500    495   5    0
queue3      46    Open:Active  -    -     -     -     498    496   2    0
queue4      44    Open:Active  -    -     -     -     985    980   5    0
queue5      43    Open:Active  -    -     -     -     985    980   5    0
queue6      42    Open:Active  -    -     -     -     985    980   5    0
As a result:
queue1 has a 50% share and can run 8 jobs; queue2 has a 30% share and can run 5 jobs; queue3 has a 20% share and is entitled to 3 slots, but since the total number of slots available must be 15, it can run 2 jobs; queue4, queue5, and queue6 all share 30%, so 5 jobs are running in each queue.
Typical Slot Allocation Scenarios
3 queues with SLOT_SHARE 50%, 30%, 20%, with 15 job slots
This scenario has three phases:
- All three queues have jobs running, and LSF assigns the number of slots to queues as expected: 8, 5, 2. Though queue Genova deserves 3 slots, the total slot assignment must be 15, so Genova is allocated only 2 slots:
bqueues
QUEUE_NAME  PRIO  STATUS       MAX  JL/U  JL/P  JL/H  NJOBS  PEND  RUN  SUSP
Roma        50    Open:Active  -    -     -     -     1000   992   8    0
Verona      48    Open:Active  -    -     -     -     995    990   5    0
Genova      48    Open:Active  -    -     -     -     996    994   2    0
- When queue Verona has done its work, queues Roma and Genova get their respective shares of 8 and 3. This leaves 4 slots to be redistributed to queues according to their shares: 50% (2 slots) to Roma, 20% (1 slot) to Genova. The one remaining slot is assigned to queue Roma again:
bqueues
QUEUE_NAME  PRIO  STATUS       MAX  JL/U  JL/P  JL/H  NJOBS  PEND  RUN  SUSP
Roma        50    Open:Active  -    -     -     -     231    221   11   0
Verona      48    Open:Active  -    -     -     -     0      0     0    0
Genova      48    Open:Active  -    -     -     -     496    491   4    0
- When queues Roma and Verona have no more work to do, Genova can use all the available slots in the cluster:
bqueues
QUEUE_NAME  PRIO  STATUS       MAX  JL/U  JL/P  JL/H  NJOBS  PEND  RUN  SUSP
Roma        50    Open:Active  -    -     -     -     0      0     0    0
Verona      48    Open:Active  -    -     -     -     0      0     0    0
Genova      48    Open:Active  -    -     -     -     475    460   15   0
The following figure illustrates phases 1, 2, and 3:
2 pools, 30 job slots, and 2 queues out of any pool
- poolA uses 15 slots and contains queues Roma (50% share, 8 slots), Verona (30% share, 5 slots), and Genova (20% share, 2 remaining slots to total 15).
- poolB with 15 slots contains queues Pisa (30% share, 5 slots), Venezia (30% share, 5 slots), and Bologna (30% share, 5 slots).
- Two other queues, Milano and Parma, do not belong to any pool, but they can use the hosts of poolB. The queues from Milano to Bologna all have the same priority.
The queues Milano and Parma run very short jobs that get submitted periodically in bursts. When no jobs are running in them, the distribution of jobs looks like this:
QUEUE_NAME  PRIO  STATUS       MAX  JL/U  JL/P  JL/H  NJOBS  PEND  RUN  SUSP
Roma        50    Open:Active  -    -     -     -     1000   992   8    0
Verona      48    Open:Active  -    -     -     -     1000   995   5    0
Genova      48    Open:Active  -    -     -     -     1000   998   2    0
Pisa        44    Open:Active  -    -     -     -     1000   995   5    0
Milano      43    Open:Active  -    -     -     -     2      2     0    0
Parma       43    Open:Active  -    -     -     -     2      2     0    0
Venezia     43    Open:Active  -    -     -     -     1000   995   5    0
Bologna     43    Open:Active  -    -     -     -     1000   995   5    0
When Milano and Parma have jobs, their higher priority reduces the share of slots free and in use by Venezia and Bologna:
QUEUE_NAME  PRIO  STATUS       MAX  JL/U  JL/P  JL/H  NJOBS  PEND  RUN  SUSP
Roma        50    Open:Active  -    -     -     -     992    984   8    0
Verona      48    Open:Active  -    -     -     -     993    990   3    0
Genova      48    Open:Active  -    -     -     -     996    994   2    0
Pisa        44    Open:Active  -    -     -     -     995    990   5    0
Milano      43    Open:Active  -    -     -     -     10     7     3    0
Parma       43    Open:Active  -    -     -     -     11     8     3    0
Venezia     43    Open:Active  -    -     -     -     995    995   2    0
Bologna     43    Open:Active  -    -     -     -     995    995   2    0
Round-robin slot distribution - 13 queues and 2 pools
- Pool poolA has 3 hosts, each with 7 slots, for a total of 21 slots to be shared. The first 3 queues are part of the pool poolA, sharing the CPUs with proportions 50% (11 slots), 30% (7 slots), and 20% (3 remaining slots to total 21 slots).
- The other 10 queues belong to pool poolB, which has 3 hosts, each with 7 slots, for a total of 21 slots to be shared. Each queue has 10% of the pool (3 slots).
The initial slot distribution looks like this:
bqueues
QUEUE_NAME  PRIO  STATUS       MAX  JL/U  JL/P  JL/H  NJOBS  PEND  RUN  SUSP
Roma        50    Open:Active  -    -     -     -     15     6     11   0
Verona      48    Open:Active  -    -     -     -     25     18    7    0
Genova      47    Open:Active  -    -     -     -     460    455   3    0
Pisa        44    Open:Active  -    -     -     -     264    261   3    0
Milano      43    Open:Active  -    -     -     -     262    259   3    0
Parma       42    Open:Active  -    -     -     -     260    257   3    0
Bologna     40    Open:Active  -    -     -     -     260    257   3    0
Sora        40    Open:Active  -    -     -     -     261    258   3    0
Ferrara     40    Open:Active  -    -     -     -     258    255   3    0
Napoli      40    Open:Active  -    -     -     -     259    256   3    0
Livorno     40    Open:Active  -    -     -     -     258    258   0    0
Palermo     40    Open:Active  -    -     -     -     256    256   0    0
Venezia     4     Open:Active  -    -     -     -     255    255   0    0
Initially, queues Livorno, Palermo, and Venezia in poolB are not assigned any slots because the first 7 higher priority queues have used all 21 slots available for allocation.
As jobs run and each queue accumulates used slots, LSF favors queues that have not run jobs yet. As jobs finish in the first 7 queues of poolB, slots are redistributed to the other queues that originally had no jobs (queues Livorno, Palermo, and Venezia). The total slot count remains 21 in all queues in poolB.
bqueues
QUEUE_NAME  PRIO  STATUS       MAX  JL/U  JL/P  JL/H  NJOBS  PEND  RUN  SUSP
Roma        50    Open:Active  -    -     -     -     15     6     9    0
Verona      48    Open:Active  -    -     -     -     25     18    7    0
Genova      47    Open:Active  -    -     -     -     460    455   5    0
Pisa        44    Open:Active  -    -     -     -     263    261   2    0
Milano      43    Open:Active  -    -     -     -     261    259   2    0
Parma       42    Open:Active  -    -     -     -     259    257   2    0
Bologna     40    Open:Active  -    -     -     -     259    257   2    0
Sora        40    Open:Active  -    -     -     -     260    258   2    0
Ferrara     40    Open:Active  -    -     -     -     257    255   2    0
Napoli      40    Open:Active  -    -     -     -     258    256   2    0
Livorno     40    Open:Active  -    -     -     -     258    256   2    0
Palermo     40    Open:Active  -    -     -     -     256    253   3    0
Venezia     4     Open:Active  -    -     -     -     255    253   2    0
The following figure illustrates the round-robin distribution of slot allocations between queues Livorno and Palermo:
How LSF rebalances slot usage
In the following examples, job runtime is not equal, but varies randomly over time.
3 queues in one pool with 50%, 30%, 20% shares
A pool configures 3 queues:
- queue1 50% with short-running jobs
- queue2 20% with short-running jobs
- queue3 30% with longer-running jobs
As queue1 and queue2 finish their jobs, the number of jobs in queue3 expands, and as queue1 and queue2 get more work, LSF rebalances the usage:
10 queues sharing 10% each of 50 slots
In this example, queue1 (the curve with the highest peaks) has the longer-running jobs and so has fewer accumulated slots in use over time. LSF accordingly rebalances the load when all queues compete for jobs to maintain a configured 10% usage share.
Using Historical and Committed Run Time
By default, as a job is running, the dynamic priority decreases gradually until the job has finished running, then increases immediately when the job finishes.
In some cases this can interfere with fairshare scheduling if two users who have the same priority and the same number of shares submit jobs at the same time.
To avoid these problems, you can modify the dynamic priority calculation by using either or both of the following weighting factors:
- Historical run time decay
- Committed run time
Historical run time decay
By default, historical run time does not affect the dynamic priority. You can configure LSF so that the user's dynamic priority increases gradually after a job finishes. After a job is finished, its run time is saved as the historical run time of the job, and the value can be used in calculating the dynamic priority, the same way LSF considers historical CPU time in calculating priority. LSF applies a decaying algorithm to the historical run time to gradually increase the dynamic priority over time after a job finishes.
- Specify ENABLE_HIST_RUN_TIME=Y in lsb.params.
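For example, in lsb.params (HIST_HOURS is optional and shown with its default value of 5 only to make the decay interval explicit):
ENABLE_HIST_RUN_TIME = Y
HIST_HOURS = 5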
Historical run time is added to the calculation of the dynamic priority so that the formula becomes the following:
dynamic priority = number_shares / (cpu_time * CPU_TIME_FACTOR + (historical_run_time + run_time) * RUN_TIME_FACTOR + (1 + job_slots) * RUN_JOB_FACTOR + fairshare_adjustment(struct* shareAdjustPair) * FAIRSHARE_ADJUSTMENT_FACTOR)
historical_run_time
The historical run time (measured in hours) of finished jobs accumulated in the user's share account file. LSF calculates the historical run time using the actual run time of finished jobs and a decay factor such that 1 hour of recently used run time decays to 0.1 hours after an interval of time specified by HIST_HOURS in lsb.params (5 hours by default).
How mbatchd reconfiguration and restart affects historical run time
After restarting or reconfiguring mbatchd, the historical run time of finished jobs might be different, since it includes jobs that may have been cleaned from mbatchd before the restart. mbatchd restart only reads recently finished jobs from lsb.events, according to the value of CLEAN_PERIOD in lsb.params. Any jobs cleaned before restart are lost and are not included in the new calculation of the dynamic priority.
Example
The following fairshare parameters are configured in lsb.params:
CPU_TIME_FACTOR = 0
RUN_JOB_FACTOR = 0
RUN_TIME_FACTOR = 1
FAIRSHARE_ADJUSTMENT_FACTOR = 0
Note that in this configuration, only run time is considered in the calculation of dynamic priority. This simplifies the formula to the following:
dynamic priority = number_shares / (run_time * RUN_TIME_FACTOR)
Without the historical run time, the dynamic priority increases suddenly as soon as the job finishes running because the run time becomes zero, which gives no chance for jobs pending for other users to start.
When historical run time is included in the priority calculation, the formula becomes:
dynamic priority = number_shares / ((historical_run_time + run_time) * RUN_TIME_FACTOR)
Committed run time weighting factor
Committed run time is the run time requested at job submission with the -W option of bsub, or in the queue configuration with the RUNLIMIT parameter. By default, committed run time does not affect the dynamic priority.
While the job is running, the actual run time is subtracted from the committed run time. The user's dynamic priority decreases immediately to its lowest expected value, and is maintained at that value until the job finishes. Job run time is accumulated as usual, and historical run time, if any, is decayed.
When the job finishes, the committed run time is set to zero and the actual run time is added to the historical run time for future use. The dynamic priority increases gradually until it reaches its maximum value.
Providing a weighting factor in the run time portion of the dynamic priority calculation prevents a "job dispatching burst" where one user monopolizes job slots because of the latency in computing run time.
Configure committed run time
- Set a value for the COMMITTED_RUN_TIME_FACTOR parameter in lsb.params. You should also specify a RUN_TIME_FACTOR, to prevent the user's dynamic priority from increasing as the run time increases.
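For example, illustrative lsb.params settings (the values are placeholders, not recommendations) that enable the committed run time factor alongside the run time factor:
COMMITTED_RUN_TIME_FACTOR = 0.5
RUN_TIME_FACTOR = 0.7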
If you have also enabled the use of historical run time, the dynamic priority is calculated according to the following formula:
dynamic priority = number_shares / (cpu_time * CPU_TIME_FACTOR + (historical_run_time + run_time) * RUN_TIME_FACTOR + (committed_run_time - run_time) * COMMITTED_RUN_TIME_FACTOR + (1 + job_slots) * RUN_JOB_FACTOR + fairshare_adjustment(struct* shareAdjustPair) * FAIRSHARE_ADJUSTMENT_FACTOR)
committed_run_time
The run time requested at job submission with the -W option of bsub, or in the queue configuration with the RUNLIMIT parameter. This calculation measures the committed run time in hours.
In the calculation of a user's dynamic priority, COMMITTED_RUN_TIME_FACTOR determines the relative importance of the committed run time in the calculation. If the -W option of bsub is not specified at job submission and a RUNLIMIT has not been set for the queue, the committed run time is not considered.
COMMITTED_RUN_TIME_FACTOR can be any positive value between 0.0 and 1.0. The default value is 0.0. As the value of COMMITTED_RUN_TIME_FACTOR approaches 1.0, more weight is given to the committed run time in the calculation of the dynamic priority.
Limitation
If you use queue-level fairshare, and a running job has a committed run time, you should not switch that job to or from a fairshare queue (using bswitch). The fairshare calculations will not be correct.
Run time displayed by bqueues and bhpart
The run time displayed by bqueues and bhpart is the sum of the actual, accumulated run time and the historical run time, but does not include the committed run time.
Example
The following fairshare parameters are configured in lsb.params:
CPU_TIME_FACTOR = 0
RUN_JOB_FACTOR = 0
RUN_TIME_FACTOR = 1
FAIRSHARE_ADJUSTMENT_FACTOR = 0
COMMITTED_RUN_TIME_FACTOR = 1
Without a committed run time factor, dynamic priority for the job owner drops gradually while a job is running:
When a committed run time factor is included in the priority calculation, the dynamic priority drops as soon as the job is dispatched, rather than gradually dropping as the job runs:
Users Affected by Multiple Fairshare Policies
If you belong to multiple user groups, which are controlled by different fairshare policies, each group probably has a different dynamic share priority at any given time. By default, if any one of these groups becomes the highest priority user, you could be the highest priority user in that group, and LSF would attempt to place your job.
To restrict the number of fairshare policies that will affect your job, submit your job and specify a single user group that your job will belong to, for the purposes of fairshare scheduling. LSF will not attempt to dispatch this job unless the group you specified is the highest priority user. If you become the highest priority user because of some other share assignment, another one of your jobs might be dispatched, but not this one.
Submit a job and specify a user group
- To associate a job with a user group for the purposes of fairshare scheduling, use bsub -G and specify a group that you belong to. If you use hierarchical fairshare, you must specify a group that does not contain any subgroups.
Example
User1 shares resources with groupA and groupB. User1 is also a member of groupA, but not any other groups.
User1 submits a job:
bsub sleep 100
By default, the job could be considered for dispatch if either User1 or GroupA has the highest dynamic share priority.
User1 submits a job and associates the job with GroupA:
bsub -G groupA sleep 100
If User1 is the highest priority user, this job will not be considered.
User1 can only associate the job with a group that he is a member of. User1 cannot associate the job with his individual user account, because bsub -G only accepts group names.
Example with hierarchical fairshare
In the share tree, User1 shares resources with GroupA at the top level. GroupA has 2 subgroups, B and C. GroupC has 1 subgroup, GroupD. User1 also belongs to GroupB and GroupC.
User1 submits a job:
bsub sleep 100
By default, the job could be considered for dispatch if either User1, GroupB, or GroupC has the highest dynamic share priority.
User1 submits a job and associates the job with GroupB:
bsub -G groupB sleep 100
If User1 or GroupC is the highest priority user, this job will not be considered.
User1 cannot associate the job with GroupC, because GroupC includes a subgroup. User1 cannot associate the job with his individual user account, because bsub -G only accepts group names.
Ways to Configure Fairshare
Global fairshare
Global fairshare balances resource usage across the entire cluster according to one single fairshare policy. Resources used in one queue affect job dispatch order in another queue.
If two users compete for resources, their dynamic share priority is the same in every queue.
Configure global fairshare
- To configure global fairshare, you must use host partition fairshare. Use the keyword all to configure a single partition that includes all the hosts in the cluster.
Begin HostPartition
HPART_NAME = GlobalPartition
HOSTS = all
USER_SHARES = [groupA@, 3] [groupB, 7] [default, 1]
End HostPartition
Chargeback fairshare
Chargeback fairshare lets competing users share the same hardware resources according to a fixed ratio. Each user is entitled to a specified portion of the available resources.
If two users compete for resources, the most important user is entitled to more resources.
Configure chargeback fairshare
- To configure chargeback fairshare, put competing users in separate user groups and assign a fair number of shares to each group.
Example
Suppose two departments contributed to the purchase of a large system. The engineering department contributed 70 percent of the cost, and the accounting department 30 percent. Each department wants to get their money's worth from the system.
- Define 2 user groups in lsb.users, one listing all the engineers, and one listing all the accountants.
Begin UserGroup
Group_Name   Group_Member
eng_users    (user6 user4)
acct_users   (user2 user5)
End UserGroup
- Configure a host partition for the host, and assign the shares appropriately.
Begin HostPartition
HPART_NAME = big_servers
HOSTS = hostH
USER_SHARES = [eng_users, 7] [acct_users, 3]
End HostPartition
Equal Share
Equal share balances resource usage equally between users.
Configure equal share
- To configure equal share, use the keyword default to define an equal share for every user.
Begin HostPartition
HPART_NAME = equal_share_partition
HOSTS = all
USER_SHARES = [default, 1]
End HostPartition
Priority user and static priority fairshare
There are two ways to configure fairshare so that a more important user's job always overrides the job of a less important user, regardless of resource use.
- Priority User Fairshare: Dynamic priority is calculated as usual, but more important and less important users are assigned a drastically different number of shares, so that resource use has virtually no effect on the dynamic priority: the user with the overwhelming majority of shares always goes first. However, if two users have a similar or equal number of shares, their resource use still determines which of them goes first. This is useful for isolating a group of high-priority or low-priority users, while allowing other fairshare policies to operate as usual most of the time.
- Static Priority Fairshare: Dynamic priority is no longer dynamic, because resource use is ignored. The user with the most shares always goes first. This is useful to configure multiple users in a descending order of priority.
Configure priority user fairshare
A queue is shared by key users and other users.
Priority user fairshare gives priority to important users, so their jobs override the jobs of other users. You can still use fairshare policies to balance resources among each group of users.
If two users compete for resources, and one of them is a priority user, the priority user's job always runs first.
- Define a user group for priority users in lsb.users, naming it accordingly. For example, key_users.
- Configure fairshare and assign the overwhelming majority of shares to the key users:
Begin Queue
QUEUE_NAME = production
FAIRSHARE = USER_SHARES[[key_users@, 2000] [others, 1]]
...
End Queue
In the above example, key users have 2000 shares each, while other users together have only 1 share. This makes it virtually impossible for other users' jobs to get dispatched unless none of the users in the key_users group has jobs waiting to run.
If you want the same fairshare policy to apply to jobs from all queues, configure host partition fairshare in a similar way.
Configure static priority fairshare
Static priority fairshare assigns resources to the user with the most shares. Resource usage is ignored.
- To implement static priority fairshare, edit lsb.params and set all the weighting factors used in the dynamic priority formula to 0 (zero).
- Set CPU_TIME_FACTOR to 0
- Set RUN_TIME_FACTOR to 0
- Set RUN_JOB_FACTOR to 0
- Set COMMITTED_RUN_TIME_FACTOR to 0
- Set FAIRSHARE_ADJUSTMENT_FACTOR to 0
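In lsb.params, these settings look like the following (the same parameters listed above, written as in the earlier examples):
CPU_TIME_FACTOR = 0
RUN_TIME_FACTOR = 0
RUN_JOB_FACTOR = 0
COMMITTED_RUN_TIME_FACTOR = 0
FAIRSHARE_ADJUSTMENT_FACTOR = 0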
The result is: dynamic priority = number_shares / 0.01 (if the denominator in the dynamic priority calculation is less than 0.01, LSF rounds it up to 0.01).
If two users compete for resources, the most important user's job always runs first.
Resizable jobs and fairshare
Resizable jobs submitted into fairshare queues or host partitions are subject to fairshare scheduling policies. The dynamic priority of the user who submitted the job is the most important criterion. LSF treats pending resize allocation requests as regular jobs and enforces the fairshare user priority policy to schedule them.
The dynamic priority of users depends on:
- Their share assignment
- The slots their jobs are currently consuming
- The resources their jobs consumed in the past
- The adjustment made by the fairshare plugin (libfairshareadjust.*)
Resizable job allocation changes affect the user priority calculation if RUN_JOB_FACTOR or FAIRSHARE_ADJUSTMENT_FACTOR is greater than zero (0). Resize add requests increase the number of slots in use and decrease user priority. Resize release requests decrease the number of slots in use and increase user priority. The faster a resizable job grows, the lower the user priority is, and the less likely a pending allocation request can get more slots.
note: The effect of resizable job allocation changes when FAIRSHARE_ADJUSTMENT_FACTOR is greater than 0 depends on the user-defined fairshare adjustment plugin (libfairshareadjust.*).
After job allocation changes, bqueues and bhpart display the updated user priority.