The lsb.hosts file contains host-related configuration information for the server hosts in the cluster. It is also used to define host groups, host partitions, and compute units.
This file is optional. All sections are optional.
By default, this file is installed in LSB_CONFDIR/cluster_name/configdir.
Optional. Defines the hosts, host types, and host models used as server hosts, and contains per-host configuration information. If this section is not configured, LSF uses all hosts in the cluster (the hosts listed in lsf.cluster.cluster_name) as server hosts.
The entries in a line for a host override the entries in a line for its model or type.
Adding or removing hosts from the cluster does not change lsb.hosts. The default configuration is not affected, but if this file specifies individual hosts, host models, or host types, check it whenever you change the cluster and update it manually if necessary.
The first line consists of keywords identifying the load indices that you wish to configure on a per-host basis. The keyword HOST_NAME must be used; the others are optional. Load indices not listed on the keyword line do not affect scheduling decisions.
Each subsequent line describes the configuration information for one host, host model or host type. Each line must contain one entry for each keyword. Use empty parentheses ( ) or a dash (-) to specify the default value for an entry.
Specifies a threshold for exited jobs. Specify a number of jobs. If the number of jobs that exit over a period of time specified by JOB_EXIT_RATE_DURATION in lsb.params (5 minutes by default) exceeds the number of jobs you specify as the threshold in this parameter, LSF invokes LSF_SERVERDIR/eadmin to trigger a host exception.
EXIT_RATE for a specific host overrides a default GLOBAL_EXIT_RATE specified in lsb.params.
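For example, a minimal sketch of a Host section (the host names and threshold values are placeholders) that raises a host exception after more than 4 jobs exit from hostA within the JOB_EXIT_RATE_DURATION interval might look like the following:

Begin Host
HOST_NAME   EXIT_RATE
hostA       4             # exception if more than 4 jobs exit within the interval
hostB       10            # a higher tolerance for hostB
End Host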
Enables automatic job migration and specifies the migration threshold for checkpointable or rerunnable jobs, in minutes.
LSF automatically migrates jobs that have been in the SSUSP state for more than the specified number of minutes. Specify a value of 0 to migrate jobs immediately upon suspension. The migration threshold applies to all jobs running on the host.
A job-level migration threshold specified on the command line overrides the thresholds configured in the application profile and the queue, and the application profile configuration overrides the queue-level configuration. However, when a host migration threshold is specified and is lower than the value for the job, the queue, or the application, the host value is used.
Does not affect MultiCluster jobs that are forwarded to a remote cluster.
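For example, a sketch of a Host section (the host names and values are placeholders) that migrates suspended jobs from hostA after 30 minutes and migrates them immediately on hostB might look like the following:

Begin Host
HOST_NAME   MIG
hostA       30       # migrate jobs that stay in SSUSP for more than 30 minutes
hostB       0        # migrate jobs as soon as they are suspended
End Host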
The number of job slots on the host.
With the MultiCluster resource leasing model, this is the number of job slots on the host that are available to the local cluster.
Use “!” to make the number of job slots equal to the number of CPUs on a host.
For the reserved host name default, “!” makes the number of job slots equal to the number of CPUs on all hosts in the cluster not otherwise referenced in the section.
By default, the number of running and suspended jobs on a host cannot exceed the number of job slots. If preemptive scheduling is used, the suspended jobs are not counted as using a job slot.
On multiprocessor hosts, to fully use the CPU resource, make the number of job slots equal to or greater than the number of processors.
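For example, the following sketch (the host names are placeholders) sets a fixed slot limit on one host and one job slot per CPU on every other host:

Begin Host
HOST_NAME   MXJ
hostA       8        # at most 8 job slots on hostA
default     !        # one job slot per CPU on all other hosts
End Host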
Specify io, it, ls, mem, pg, r15s, r1m, r15m, swp, tmp, ut, or a non-shared custom external load index as a column. Specify multiple columns to configure thresholds for multiple load indices.
Scheduling and suspending thresholds for dynamic load indices supported by LIM, including external load indices.
Each load index column must contain either the default entry or two numbers separated by a slash ‘/’, with no white space. The first number is the scheduling threshold for the load index; the second number is the suspending threshold.
Queue-level scheduling and suspending thresholds are defined in lsb.queues. If both files specify thresholds for an index, the most restrictive thresholds apply. For example, if a queue defines a scheduling threshold of r1m=0.8 and this file defines r1m=0.6/1.6 for a host, jobs from that queue are dispatched to that host only while its r1m load is below 0.6.
Begin Host
HOST_NAME   MXJ   JL/U   r1m       pg      DISPATCH_WINDOW
hostA       1     -      0.6/1.6   10/20   (5:19:00-1:8:30 20:00-8:30)
SUNSOL      1     -      0.5/2.5   -       23:00-8:00
default     2     1      0.6/1.6   20/40   ()
End Host
SUNSOL is a host type defined in lsf.shared. This example Host section configures one host and one host type explicitly and configures default values for all other load-sharing hosts.
HostA runs one batch job at a time. A job will only be started on hostA if the r1m index is below 0.6 and the pg index is below 10; the running job is stopped if the r1m index goes above 1.6 or the pg index goes above 20. HostA only accepts batch jobs from 19:00 on Friday evening until 8:30 Monday morning and overnight from 20:00 to 8:30 on all other days.
For hosts of type SUNSOL, the pg index does not have host-specific thresholds and such hosts are only available overnight from 23:00 to 8:00.
The entry with host name default applies to each of the other hosts in the cluster. Each host can run up to two jobs at the same time, with at most one job from each user. These hosts are available to run jobs at all times. Jobs may be started if the r1m index is below 0.6 and the pg index is below 20, and a job from the lowest priority queue is suspended if r1m goes above 1.6 or pg goes above 40.
Optional. Defines host groups.
The name of the host group can then be used in other host group, host partition, and queue definitions, as well as on the command line. Specifying the name of a host group has exactly the same effect as listing the names of all the hosts in the group.
Host groups are specified in the same format as user groups in lsb.users.
The first line consists of two mandatory keywords, GROUP_NAME and GROUP_MEMBER, as well as optional keywords, CONDENSE and GROUP_ADMIN. Subsequent lines name a group and list its membership.
The sum of all host groups, compute groups, and host partitions cannot be more than 1024.
A space-delimited list of host names or previously defined host group names, enclosed in one pair of parentheses.
You cannot use more than one pair of parentheses to define the list.
The names of hosts and host groups can appear on multiple lines because hosts can belong to multiple groups. The reserved name all specifies all hosts in the cluster. An exclamation mark (!) indicates an externally-defined host group, which the egroup executable retrieves.
You can use string literals and special characters when defining host group members. Each entry cannot contain any spaces, as the list itself is space delimited.
When a leased-in host joins the cluster, the host name is in the form of host@cluster. For these hosts, only the host part of the host name is subject to pattern definitions.
Use a tilde (~) to exclude specified hosts or host groups from the list.
Use an asterisk (*) as a wildcard character to represent any number of characters.
Use square brackets with a hyphen ([integer1 - integer2]) to define a range of non-negative integers at the end of a host name. The first integer must be less than the second integer.
Use square brackets with commas ([integer1, integer2 ...]) to define individual non-negative integers at the end of a host name.
Use square brackets with commas and hyphens (for example, [integer1 - integer2, integer3, integer4 - integer5]) to define different ranges of non-negative integers at the end of a host name.
Host group administrators have the ability to open or close the member hosts for the group they are administering.
The GROUP_ADMIN field is a space-delimited list of user names or previously defined user group names, enclosed in one pair of parentheses.
You cannot use more than one pair of parentheses to define the list.
The names of users and user groups can appear on multiple lines because users can belong to and administer multiple groups.
Host group administrator rights are inherited. For example, if the user admin2 is an administrator for host group hg1 and host group hg2 is a member of hg1, admin2 is also an administrator for host group hg2.
When host group administrators (who are not also cluster administrators) open or close a host, they must specify a comment with the -C option.
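For example, assuming hostA belongs to a host group that user1 administers, user1 might close the host for maintenance and reopen it later, supplying a comment each time (the host name and comment text are placeholders):

badmin hclose -C "disk maintenance" hostA
badmin hopen -C "maintenance complete" hostA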
Begin HostGroup
GROUP_NAME   GROUP_MEMBER           GROUP_ADMIN
groupA       (hostA hostD)          (user1 user10)
groupB       (hostF groupA hostK)   ()
groupC       (!)                    ()
End HostGroup
groupA includes hostA and hostD and can be administered by user1 and user10.
groupB includes hostF and hostK, along with all hosts in groupA. It has no administrators (only the cluster administrator can control the member hosts).
The group membership of groupC is defined externally and retrieved by the egroup executable.
Begin HostGroup
GROUP_NAME   GROUP_MEMBER             GROUP_ADMIN
groupA       (all)                    ()
groupB       (groupA ~hostA ~hostB)   (user11 user14)
groupC       (hostX hostY hostZ)      ()
groupD       (groupC ~hostX)          usergroupB
groupE       (all ~groupC ~hostB)     ()
groupF       (hostF groupC hostK)     ()
End HostGroup
groupA contains all hosts in the cluster and is administered by the cluster administrator.
groupB contains all the hosts in the cluster except for hostA and hostB and is administered by user11 and user14.
groupC contains only hostX, hostY, and hostZ and is administered by the cluster administrator.
groupD contains the hosts in groupC except for hostX. Note that hostX must be a member of host group groupC to be excluded from groupD. usergroupB is the administrator for groupD.
groupE contains all hosts in the cluster excluding the hosts in groupC and hostB and is administered by the cluster administrator.
groupF contains hostF, hostK, and the 3 hosts in groupC and is administered by the cluster administrator.
Begin HostGroup
GROUP_NAME   CONDENSE   GROUP_MEMBER    GROUP_ADMIN
groupA       N          (all)           ()
groupB       N          (hostA hostB)   (usergroupC user1)
groupC       Y          (all)           ()
End HostGroup
groupA shows uncondensed output and contains all hosts in the cluster and is administered by the cluster administrator.
groupB shows uncondensed output, and contains hostA and hostB. It is administered by all members of usergroupC and user1.
groupC shows condensed output and contains all hosts in the cluster and is administered by the cluster administrator.
Begin HostGroup
GROUP_NAME   CONDENSE   GROUP_MEMBER                             GROUP_ADMIN
groupA       Y          (host*)                                  (user7)
groupB       N          (*A)                                     ()
groupC       N          (hostB* ~hostB[1-50])                    ()
groupD       Y          (hostC[1-50] hostC[101-150])             (usergroupJ)
groupE       N          (hostC[51-100] hostC[151-200])           ()
groupF       Y          (hostD[1,3] hostD[5-10])                 ()
groupG       N          (hostD[11-50] ~hostD[15,20,25] hostD2)   ()
End HostGroup
groupA shows condensed output, and contains all hosts starting with the string host. It is administered by user7.
groupB shows uncondensed output, and contains all hosts ending with the string A, such as hostA. It is administered by the cluster administrator.
groupC shows uncondensed output, and contains all hosts starting with the string hostB except for the hosts from hostB1 to hostB50 and is administered by the cluster administrator.
groupD shows condensed output, and contains all hosts from hostC1 to hostC50 and all hosts from hostC101 to hostC150. It is administered by the members of usergroupJ.
groupE shows uncondensed output, and contains all hosts from hostC51 to hostC100 and all hosts from hostC151 to hostC200 and is administered by the cluster administrator.
groupF shows condensed output, and contains hostD1, hostD3, and all hosts from hostD5 to hostD10 and is administered by the cluster administrator.
groupG shows uncondensed output, and contains all hosts from hostD11 to hostD50 except for hostD15, hostD20, and hostD25. groupG also includes hostD2. It is administered by the cluster administrator.
Optional. Used with host partition user-based fairshare scheduling. Defines a host partition, which applies a user-based fairshare policy at the host level.
Configure multiple sections to define multiple partitions.
The members of a host partition form a host group with the same name as the host partition.
If you configure a host partition, you cannot configure fairshare at the queue level.
Jobs in the queue may sometimes be dispatched to the host partition even though hosts that do not belong to any host partition have a lighter load.
When a parallel job is dispatched to hosts from more than one host partition, only the priorities of one of those host partitions are used.
If a resource is shared among hosts included in host partitions and hosts that are not included in any host partition, jobs in queues that use the host partitions will always get the shared resource first, regardless of queue priority.
If a resource is shared among host partitions, jobs in queues that use the host partitions listed first in the HostPartition section of lsb.hosts will always have priority to get the shared resource first. To allocate shared resources among host partitions, LSF considers host partitions in the order they are listed in lsb.hosts.
Specifies the hosts in the partition, in a space-separated list.
A host cannot belong to multiple partitions.
Hosts that are not included in any host partition are controlled by the FCFS scheduling policy instead of the fairshare scheduling policy.
Optionally, use the reserved host name all to configure a single partition that applies to all hosts in a cluster.
Optionally, use the not operator (~) to exclude hosts or host groups from the list of hosts in the host partition.
Enclose each user share assignment in square brackets, in the form [user, number].
Separate a list of multiple share assignments with a space between the square brackets.
To a single user (specify user_name). To specify a Windows user account, include the domain name in uppercase letters (DOMAIN_NAME\user_name).
To users in a group, individually (specify group_name@) or collectively (specify group_name). To specify a Windows user group, include the domain name in uppercase letters (DOMAIN_NAME\group_name).
To users not included in any other share assignment, individually (specify the keyword default) or collectively (specify the keyword others).
By default, when resources are assigned collectively to a group, the group members compete for the resources according to FCFS scheduling. You can use hierarchical fairshare to further divide the shares among the group members.
Specify a positive integer representing the number of shares of the cluster resources assigned to the user.
The number of shares assigned to each user is only meaningful when you compare it to the shares assigned to other users or to the total number of shares. The total number of shares is just the sum of all the shares assigned in each share assignment.
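For example, a sketch of a HostPartition section (the partition, host, and group names are placeholders) that gives groupA 7 shares, groupB 3 shares, and each remaining user 1 share on hostA and hostB might look like the following:

Begin HostPartition
HPART_NAME = Partition1
HOSTS = hostA hostB
USER_SHARES = [groupA, 7] [groupB, 3] [default, 1]
End HostPartition

The reserved host name all and the not operator can also be used in the HOSTS line; for example, HOSTS = all ~hostC includes every host in the cluster except hostC.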
Optional. Defines compute units.
Once defined, the compute unit can be used in other compute unit and queue definitions, as well as on the command line. Specifying the name of a compute unit has the same effect as listing the names of all the hosts in the compute unit.
Compute units are similar to host groups, with the added feature of granularity allowing the construction of structures that mimic the network architecture. Job scheduling using compute unit resource requirements effectively spreads jobs over the cluster based on the configured compute units.
To enforce consistency, compute unit configuration has the following requirements:
Compute units are specified in the same format as host groups in lsb.hosts.
The first line consists of three mandatory keywords, NAME, MEMBER, and TYPE, as well as optional keywords CONDENSE and ADMIN. Subsequent lines name a compute unit and list its membership.
The sum of all host groups, compute groups, and host partitions cannot be more than 1024.
A space-delimited list of host names or previously defined compute unit names, enclosed in one pair of parentheses.
You cannot use more than one pair of parentheses to define the list.
The names of hosts and host groups can appear only once, and only in a compute unit type of the finest granularity.
An exclamation mark (!) indicates an externally-defined host group, which the egroup executable retrieves.
You can use string literals and special characters when defining compute unit members. Each entry cannot contain any spaces, as the list itself is space delimited.
Use a tilde (~) to exclude specified hosts or host groups from the list.
Use an asterisk (*) as a wildcard character to represent any number of characters.
Use square brackets with a hyphen ([integer1 - integer2]) to define a range of non-negative integers at the end of a host name. The first integer must be less than the second integer.
Use square brackets with commas ([integer1, integer2...]) to define individual non-negative integers at the end of a host name.
Use square brackets with commas and hyphens (for example, [integer1 - integer2, integer3, integer4 - integer5]) to define different ranges of non-negative integers at the end of a host name.
Compute unit names cannot be used in compute units of the finest granularity.
You cannot include host or host group names except in compute units of the finest granularity.
You must not skip levels of granularity. For example:
If lsb.params contains COMPUTE_UNIT_TYPES=enclosure rack cabinet then a compute unit of type cabinet can contain compute units of type rack, but not of type enclosure.
The keywords all, allremote, all@cluster, other and default cannot be used when defining compute units.
Compute unit administrators have the ability to open or close the member hosts for the compute unit they are administering.
The ADMIN field is a space-delimited list of user names or previously defined user group names, enclosed in one pair of parentheses.
You cannot use more than one pair of parentheses to define the list.
The names of users and user groups can appear on multiple lines because users can belong to and administer multiple compute units.
Compute unit administrator rights are inherited. For example, if the user admin2 is an administrator for compute unit cu1 and compute unit cu2 is a member of cu1, admin2 is also an administrator for compute unit cu2.
When compute unit administrators (who are not also cluster administrators) open or close a host, they must specify a comment with the -C option.
(This example assumes that lsb.params contains COMPUTE_UNIT_TYPES=enclosure rack cabinet.)
Begin ComputeUnit
NAME    MEMBER          TYPE
encl1   (host1 host2)   enclosure
encl2   (host3 host4)   enclosure
encl3   (host5 host6)   enclosure
encl4   (host7 host8)   enclosure
rack1   (encl1 encl2)   rack
rack2   (encl3 encl4)   rack
cbnt1   (rack1 rack2)   cabinet
End ComputeUnit
encl1, encl2, encl3, and encl4 are of the finest granularity, and each contains two hosts.
rack1 is of coarser granularity and contains two levels. At the enclosure level rack1 contains encl1 and encl2. At the lowest level rack1 contains host1, host2, host3, and host4.
rack2 has the same structure as rack1, and contains encl3 and encl4.
cbnt1 contains two racks (rack1 and rack2), four enclosures (encl1, encl2, encl3, and encl4) and all eight hosts. Compute unit cbnt1 is the coarsest granularity in this example.
Begin ComputeUnit
NAME    CONDENSE   MEMBER                     TYPE        ADMIN
encl1   Y          (hg123 ~hostA ~hostB)      enclosure   (user11 user14)
encl2   Y          (hg456)                    enclosure   ()
encl3   N          (hostA hostB)              enclosure   usergroupB
encl4   N          (hgroupX ~hostB)           enclosure   ()
encl5   Y          (hostC* ~hostC[101-150])   enclosure   usergroupJ
encl6   N          (hostC[101-150])           enclosure   ()
rack1   Y          (encl1 encl2 encl3)        rack        ()
rack2   N          (encl4 encl5)              rack        usergroupJ
rack3   N          (encl6)                    rack        ()
cbnt1   Y          (rack1 rack2)              cabinet     ()
cbnt2   N          (rack3)                    cabinet     user14
End ComputeUnit
All six enclosures (finest granularity) contain only hosts and host groups. All three racks contain only enclosures. Both cabinets (coarsest granularity) contain only racks.
encl1 contains all the hosts in host group hg123 except for hostA and hostB and is administered by user11 and user14. Note that hostA and hostB must be members of host group hg123 to be excluded from encl1. encl1 shows condensed output.
encl2 contains host group hg456 and is administered by the cluster administrator. encl2 shows condensed output.
encl3 contains hostA and hostB. usergroupB is the administrator for encl3. encl3 shows uncondensed output.
encl4 contains host group hgroupX except for hostB. Since each host can appear in only one enclosure and hostB is already in encl3, it cannot be in encl4. encl4 is administered by the cluster administrator. encl4 shows uncondensed output.
encl5 contains all hosts starting with the string hostC except for hosts hostC101 to hostC150, and is administered by usergroupJ. encl5 shows condensed output.
rack1 contains encl1, encl2, and encl3. rack1 shows condensed output.
rack2 contains encl4 and encl5. rack2 shows uncondensed output.
cbnt1 contains rack1 and rack2. cbnt1 shows condensed output.
cbnt2 contains rack3. Even though rack3 only contains encl6, cbnt2 cannot contain encl6 directly because that would mean skipping the level associated with compute unit type rack. cbnt2 shows uncondensed output.
Variable configuration is used to automatically change LSF configuration based on time windows. You define automatic configuration changes in lsb.hosts by using if-else constructs and time expressions. After you change the files, reconfigure the cluster with the badmin reconfig command.
The expressions are evaluated by LSF every 10 minutes based on mbatchd start time. When an expression evaluates true, LSF dynamically changes the configuration based on the associated configuration statements. Reconfiguration is done in real time without restarting mbatchd, providing continuous system availability.
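For example, a sketch of an if-else construct in the Host section (the host name, time window, and values are placeholders) that raises the job slot limit and loosens the load thresholds on hostA during nights and weekends might look like the following:

Begin Host
HOST_NAME   MXJ   r1m       pg
#if time(5:18:30-1:8:30 20:00-8:30)
hostA       4     0.9/1.9   20/40
#else
hostA       2     0.6/1.6   10/20
#endif
End Host

While the time expression evaluates to true, the first hostA line is in effect; otherwise the line after #else is used. Run badmin reconfig after editing the file so that the construct takes effect.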