
MultiCluster Resource Leasing Model


The resource leasing model was developed to be transparent to the user.

In this section

Overview of Lease Model

Using the Lease Model

Resource Exporting

Creating an Export Policy

Exporting Workstations

Exporting Special Hosts

Exporting Other Resources

Exporting Shared Resources

Shared Lease

Borrowing Resources

Running Parallel Jobs with the Lease Model



Overview of Lease Model

Under the lease model, two clusters agree that one cluster will borrow resources from the other, taking control of those resources. Both clusters must change their configuration to make this possible. The arrangement, called a "lease", does not expire, although it might change when the cluster configuration changes.

With this model, scheduling of jobs is always done by a single cluster. When a queue is configured to run jobs on borrowed hosts, LSF schedules jobs as if the borrowed hosts actually belonged to the cluster.

How the lease model works

  1. Setup:
    • A resource provider cluster "exports" hosts, and specifies the clusters that will use the resources on these hosts.
    • A resource consumer cluster configures a queue with a host list that includes the borrowed hosts.
  2. Establishing a lease:
    • To establish a lease,
      1. Configure both clusters (the provider cluster must export the resources, and the consumer cluster must have a queue that requests remote resources).
      2. Start up the clusters.
      3. In the consumer cluster, submit jobs to the queue that requests remote resources.

      At this point, a lease is established that gives the consumer cluster control of the remote resources.

    • If the provider did not export the resources requested by the consumer, there is no lease. The provider continues to use its own resources as usual, and the consumer cannot use any resources from the provider.
    • If the consumer did not request the resources exported to it, there is no lease. However, when entire hosts are exported the provider cannot use resources that it has exported, so neither cluster can use the resources; they will be wasted.
  3. Changes to the lease:
    • The lease does not expire. To modify or cancel the lease, you should change the export policy in the provider cluster.
    • If you export a group of workstations allowing LSF to automatically select the hosts for you, these hosts do not change until the lease is modified. However, if the original lease could not include the requested number of hosts, LSF can automatically update the lease to add hosts that become available later on.
    • If the configuration changes and some resources are no longer exported, jobs from the consumer cluster that have already started to run using those resources will be killed and requeued automatically.

      If LSF selects the hosts to export, and the new export policy allows some of the same hosts to be exported again, then LSF tries to re-export the hosts that already have jobs from the consumer cluster running on them (in this case, the jobs continue running without interruption). If LSF has to kill some jobs from the consumer cluster to remove some hosts from the lease, it selects the hosts according to job run time, so it kills the most recently started jobs.



Using the Lease Model

Submit jobs

LSF automatically schedules jobs on the available resources, so jobs submitted to a queue that uses borrowed hosts can run on the borrowed resources with no special syntax.

bsub

To submit a job and request a particular host borrowed from another cluster, use the format host_name@cluster_name to specify the host. For example, to run a job on hostA in cluster4:

bsub -q myqueue -m hostA@cluster4 myjob

This will not work when you first start up the MultiCluster grid; the remote host names are not recognized until the lease has been established.

bmod

The bmod syntax also allows you to specify borrowed hosts in the same format host_name@cluster_name.
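
For example, to modify a pending job so that it requests a borrowed host (the job ID 1234 is illustrative):

bmod -m hostA@cluster4 1234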

Administration

badmin

The administrator of the consumer cluster can open and close borrowed hosts using badmin. Use the format host_name@cluster_name to specify the borrowed host. This action only affects scheduling on the job slots that belong to that consumer cluster. For example, if slots on a host are shared among multiple consumers, one consumer can close the host, but the others will not be affected or be aware of any change.
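
For example, to close and later reopen a borrowed host (the host and cluster names are illustrative):

badmin hclose hostA@cluster4
badmin hopen hostA@cluster4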

You must be the administrator of the provider cluster to shut down or start up a host. This action will affect the consumer cluster as well.

Host groups or host partitions

When you define a host group in lsb.hosts, or a host partition, you can use the keyword allremote to indicate all borrowed hosts available to the cluster. You cannot define a host group that includes borrowed hosts specified by host name or cluster name.
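
For example, a host group in lsb.hosts could use the keyword to cover every borrowed host (the group name is illustrative):

Begin HostGroup
GROUP_NAME     GROUP_MEMBER
borrowed       (allremote)
End HostGroup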

Compute units

Compute units defined in lsb.hosts can use wild cards to include the names of borrowed hosts available to the cluster. You cannot define a compute unit that includes borrowed hosts specified by host name or cluster name directly.

Hosts running LSF 7 Update 4 or earlier cannot satisfy compute unit resource requirements, and thus cannot be included in compute units.
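
As a sketch, a compute unit in lsb.hosts might use a wild card that also matches borrowed host names (the unit name, member pattern, and type are illustrative; the type must appear in COMPUTE_UNIT_TYPES in lsb.params):

Begin ComputeUnit
NAME    MEMBER      TYPE
cu1     (hostA*)    enclosure
End ComputeUnit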

Automatic retry limits

The pre-execution command retry limit (MAX_PREEXEC_RETRY and REMOTE_MAX_PREEXEC_RETRY), job requeue limit (MAX_JOB_REQUEUE), and job preemption retry limit (MAX_JOB_PREEMPT) configured in lsb.params, lsb.queues, and lsb.applications apply to jobs running on remote leased hosts as if they were running on local hosts.

Tracking

bhosts

By default, bhosts only shows information about hosts and resources that are available to the local cluster and information about jobs that are scheduled by the local cluster. Therefore, borrowed resources are included in the summary, but exported resources are not normally included (the exception is reclaimed resources, which are shown during the times that they are available to the local cluster).

For borrowed resources, the host name is displayed in the format host_name@cluster_name. The number of job slots shown is the number available to the consumer cluster, and the JL/U and host status are determined by, and relative to, the consumer cluster. For example, the consumer might see closed or closed_Full status while the provider sees ok status.

bhosts -e

This option displays information about the exported resources. The provider cluster does not display JL/U or host status; this status information is determined by the consumer cluster and does not affect the provider.

bhosts -e -s

This option displays information about exported shared resources.

bjobs

The bjobs command shows all jobs associated with hosts in the cluster, including MultiCluster jobs. Jobs from remote clusters can be identified by the FROM_HOST column, which shows the remote cluster name and the submission or consumer cluster job ID in the format host_name@remote_cluster_name:remote_job_ID.

If the MultiCluster job is running under the job forwarding model, the QUEUE column shows a local queue, but if the MultiCluster job is running under the resource leasing model, the name of the remote queue is shown in the format queue_name@remote_cluster_name.

Use -w or -l to prevent the MultiCluster information from being truncated.

bclusters

For the resource leasing model, bclusters shows information about each lease.



Resource Exporting

lsb.resources file

The lsb.resources file contains MultiCluster configuration information for the lease model, including the export policies which describe the hosts and resources that are exported, and the clusters that can use them.

You must reconfigure the cluster to make the configuration take effect.

Resources that can be exported

Job slots

To export resources, you must always export job slots on hosts, so that the consumer cluster can start jobs on the borrowed hosts.

Additional host-based resources

By default, all the jobs on a host compete for its resources. To help share resources fairly when a host's job slots are divided among multiple clusters, you can also export quantities of memory and swap space for the use of the consumer cluster.

Shared resources

By default, shared resources such as software licenses are not exported. You can create a separate policy to export these resources.

Who can use exported resources

The export policy defines the consumers of exported resources. By default, resources that are exported cannot be used by the provider; this applies to job slots on a host and also to resources such as memory.

With resource reclaim, exported job slots can be reclaimed by the provider if the consumer is not using them to run jobs. In this way, the provider can share in the use of the exported job slots. For more information, see Shared Lease.



Creating an Export Policy

An export policy defined in lsb.resources is enclosed by the lines:

Begin HostExport
...
End HostExport

In each policy, you must specify which hosts to export, how many job slots to export, and the distribution of the resources. Optionally, you can specify quantities of memory and swap space.


To export hosts of HostExport Type==DLINUX, specifying swap space is mandatory. See Exporting Other Resources.

Configure as many different export policies as you need.

Each export policy corresponds to a separate lease agreement.

Export policy examples

This simple export policy exports a single job slot on a single host to a single consumer cluster:

Begin HostExport
PER_HOST=HostA
SLOTS=1
DISTRIBUTION=([Cluster5, 1])
End HostExport

This simple policy exports all the resources on a single Linux host to a single consumer cluster:

Begin HostExport
RES_SELECT=type==LINUX
NHOSTS=1
DISTRIBUTION=([Cluster5, 1])
End HostExport

Exporting hosts

To export resources such as job slots or other resources, you must specify which hosts the resources are located on. There are two ways to specify which hosts you want to export: you can list host names, or you can specify resource requirements and let LSF find hosts that match those resource requirements. The method you use to specify the exported hosts determines the method that LSF uses to share the hosts among competing consumer clusters.

Exporting a large number of hosts

If you have a group of similar hosts, you can share a portion of these hosts with other clusters. To choose this method, let LSF automatically select the hosts to export. The group of hosts can be shared among multiple consumer clusters, but each host is leased to only one consumer cluster, and all the job slots on the host are exported to the consumer.

See Exporting Workstations.

Sharing a large computer

You can share a powerful multiprocessor host among multiple clusters. To choose this method, export one or more hosts by name and specify the number of job slots to export. The exported job slots on each host are divided among multiple consumer clusters.

See Exporting Special Hosts.

Distributing exported resources

An export policy exports specific resources. The distribution statement in lsb.resources partitions these resources, assigning a certain amount exclusively to each consumer cluster. Clusters that are not named in the distribution list do not get to use any of the resources exported by the policy.

The simplest distribution policy assigns all of the exported resources to a single consumer cluster:

DISTRIBUTION=([Cluster5, 1])

Distribution list syntax

The syntax for the distribution list is a series of share assignments. Enclose each share assignment in square brackets, as shown, and use a space to separate multiple share assignments. Enclose the full list in parentheses:

DISTRIBUTION=([share_assignment]...)

Share assignment syntax

The share assignment determines what fraction of the total resources is assigned to each cluster.

The syntax of each share assignment is the cluster name, a comma, and the number of shares.

[cluster_name, number_shares]

Examples
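
For example, this illustrative distribution list divides the exported resources between two clusters in a 1:2 ratio, so C1 is entitled to 1/3 of the resources and C2 to 2/3:

DISTRIBUTION=([C1, 1] [C2, 2])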



Exporting Workstations

These steps describe how to share part of a large farm of identical hosts. This is most useful for reallocating resources among departments to meet a temporary need for more processing power.

  1. Create the new policy.
  2. Specify the hosts that are affected by the policy. Each host is entirely exported; the provider cluster does not save any job slots on the exported hosts for its own use. See Allowing LSF to select the hosts you want to export.
  3. Specify the distribution policy. This determines which clusters share in the use of the exported job slots. See Distribution policy for automatically selected hosts.
  4. Optional. Share additional resources (any combination of memory, swap space, or shared resources). See Exporting Other Resources and Exporting Shared Resources.

Allowing LSF to select the hosts you want to export

To export a set of hosts that meet certain resource requirements, specify both RES_SELECT and NHOSTS in lsb.resources.

For RES_SELECT, specify the selection criteria using the same syntax as the "select" part of the resource requirement string (normally used in the LSF bsub command). For details about resource selection syntax, see Administering Platform LSF. For this parameter, if you do not specify the required host type, the default is "type==any".

For NHOSTS, specify a maximum number of hosts to export.

Begin HostExport
RES_SELECT=type==LINUX
NHOSTS=4
...
End HostExport

In this example, we want to export 4 Linux hosts. If the cluster has 5 Linux hosts available, 4 are exported, and the last one is not exported. If the cluster has only 3 Linux hosts available at this time, then only 3 hosts are exported, but LSF can update the lease automatically if another host becomes available to export later on.

Use lshosts to view the host types that are available in your cluster.

Distribution policy for automatically selected hosts

For syntax of the distribution policy, see Distributing exported resources.

When you export hosts by specifying the resource selection statement, multiple hosts are divided among multiple consumer clusters, but each host is entirely exported to a single consumer cluster. All the job slots on a host are exported to the consumer cluster, along with all its other host-based resources including swap space and memory.

Example

Begin HostExport
RES_SELECT=type==LINUX
NHOSTS=2
DISTRIBUTION=([C1, 1] [C2, 1])
End HostExport

In this example, 2 hosts that match the resource requirements are selected; suppose they are HostA and HostB, and each has 2 job slots. All job slots on each host are exported, and the resources are shared evenly between the 2 clusters, so each cluster gets 1/2 of the resources.

Because the hosts are automatically selected, each host is distributed to only one consumer cluster: the first host, HostA, goes to C1, and the second host, HostB, goes to C2. Therefore, C1 gets the 2 job slots on HostA, and C2 gets the 2 job slots on HostB.

In this example there is an even distribution policy, but it is still possible for one consumer cluster to get more resources than the other, if the exported hosts are not all identical.



Exporting Special Hosts

These steps describe the way to share a large multiprocessor host among multiple clusters. This is most useful for allowing separate departments to share the cost and use of a very powerful host. It might also be used to allow multiple clusters occasional access to a host that has some unique feature.

  1. Create the new policy.
  2. Specify the hosts that are affected by the policy. See Naming the hosts you want to export.
  3. Specify how many job slots you want to export from each host. Optionally, reduce the number of job slots available to the local cluster by the same amount. See Controlling job slots.
  4. Specify the distribution policy. This determines which clusters share in the use of the exported job slots. See Distribution policy for named hosts.
  5. Optional. Share additional resources (any combination of memory, swap space, or shared resources). See Exporting Other Resources and Exporting Shared Resources.

Naming the hosts you want to export

Specify the name of a host in the PER_HOST parameter in lsb.resources:

Begin HostExport
PER_HOST=HostA
...
End HostExport

If you specify multiple hosts, this policy will apply to all the hosts you specify:

Begin HostExport
PER_HOST=HostA HostB HostC
...
End HostExport

Controlling job slots

Use the SLOTS parameter to specify the number of job slots to export from each host.

By default, the provider can still run the usual number of jobs at all times. The additional jobs that the consumer clusters are allowed to start might overload the host. If you are concerned with keeping the host's performance consistent, reduce the job slot configuration in the local cluster to compensate for the number of slots exported to remote clusters.

For example, this policy exports 4 job slots on each host:

Begin HostExport
PER_HOST=HostA HostB
SLOTS=4
...
End HostExport

Distribution policy for named hosts

For syntax of the distribution policy, see Distributing exported resources.

When you export hosts by specifying host names, the job slots on each host are divided among multiple consumer clusters, so each cluster gets a part of each host.

Example

Begin HostExport
PER_HOST=HostA HostB
SLOTS=2
DISTRIBUTION=([C1, 1] [C2, 1])
End HostExport

In this example, 2 job slots are exported from each of HostA and HostB. The resources are shared evenly between 2 clusters, so each cluster is entitled to 1/2 of the resources.

Because the hosts are specified by name, the distribution policy is applied at the job slot level. The first job slot on HostA goes to C1, and the second job slot on HostA goes to C2. Similarly, one job slot on HostB goes to C1, and the other job slot on HostB goes to C2. Each consumer cluster can start 2 jobs, one on HostA and one on HostB.

The provider cluster can always use the number of job slots configured locally, no matter how many slots are exported. You might want to reduce the number of job slots (MXJ in lsb.hosts) after exporting hosts; otherwise, you might notice a difference in performance because of the extra jobs that the consumer clusters can start.
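
For example, if HostA is configured with 6 job slots locally and 4 of them are exported, you might lower MXJ in lsb.hosts to compensate (the host name and values are illustrative):

Begin Host
HOST_NAME    MXJ
HostA        2
End Host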



Exporting Other Resources

Once you have exported a host, you can export memory and swap space in addition to job slots.

By default, the consumer cluster borrows a job slot but is not guaranteed that there will be free memory or swap space, because all jobs on the host compete for the host's resources. If these resources are exported, each consumer cluster schedules work as if only the exported amount is available (the exported amount acts as a limit for the consumer cluster), and the provider cluster can no longer use the amount that has been exported.


To export hosts of HostExport Type==DLINUX, exporting swap space is mandatory. If you do not specify swap space, the hosts of this host type are filtered out because the resource is considered unavailable.

Exporting memory

To export memory, set MEM in the lsb.resources host export policy and specify the number of MB per host (see the combined sketch below).

Exporting swap space

To export swap space, set SWP in the lsb.resources host export policy and specify the number of MB per host:
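
A minimal sketch combining both parameters, assuming illustrative values of 256 MB of memory and 512 MB of swap space per host:

Begin HostExport
PER_HOST=HostA
SLOTS=4
MEM=256
SWP=512
DISTRIBUTION=([C1, 1])
End HostExport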



Exporting Shared Resources

In addition to job slots and some other built-in resources, it is possible to export numeric shared resources (for example, representing software application licenses). The resource definitions in lsf.shared must be the same in both clusters.

Export policies for shared resources are defined in lsb.resources, after the export policies for hosts. The configuration is different: shared resources are not exported per host.

When you export a shared resource to a consumer cluster, you must already have a host export policy that exports hosts to the same consumer cluster, and the shared resource must be available on one or more of those exported hosts. Otherwise, the export policy does not have any effect.

Configure shared resource export

In lsb.resources, configure a resource export policy for each resource as shown:

Begin SharedResourceExport
NAME         = AppX
NINSTANCES   = 10 
DISTRIBUTION = ([C1, 30] [C2, 70])
End SharedResourceExport

In each policy, you specify one shared numeric resource (here AppX, representing an application license), the maximum number of instances to export, and the distribution, using the same syntax as a host export policy. See Distributing exported resources.

If some quantity of the resource is available, but not the full amount you configured, LSF exports as many instances of the resource as are available to the exported hosts.



Shared Lease

Optional.

You can export resources from a cluster and enable shared lease, which allows the provider cluster to share in the use of the exported resources. This type of lease dynamically balances the job slots according to the load in each cluster.

Only job slots are shared. If you export memory, swap space, or shared resources, they become available exclusively to the consumer cluster.

About shared lease

By default, exported resources are for the exclusive use of the consumer; they cannot be used by the provider. If the consumer does not use them, they are wasted.

Shared leasing is a way to lease job slots to a cluster part-time: both the provider and consumer clusters have the opportunity to take any idle job slots. The benefit of a shared lease is that the provider cluster has a chance to share in the use of its exported resources, so average resource usage is increased.

Shared lease is not compatible with advance reservation.

If you enable shared leasing, each host can be exported to only one consumer cluster. Therefore, when shared leasing is enabled, you can still export a group of workstations to multiple consumers using RES_SELECT syntax (each host goes to a single consumer), but you cannot share a powerful multiprocessor host among multiple consumer clusters using PER_HOST syntax unless the distribution policy specifies just one cluster.

How it works

By default, a lease is exclusive, which means a fixed amount of exported resources is always dedicated exclusively to a consumer cluster. However, if you configure leases to be shared, the job slots exported by each export policy can also become available to the provider cluster.

Reclaimable resources are job slots that are exported with shared leasing enabled. The reclaim process is managed separately for each lease, so the set of job slots exported by one resource export policy to one consumer cluster is managed as a group.

When the provider cluster is started, the job slots are allocated to the provider cluster, except for one slot that is reserved for the consumer cluster so that a lease can be made. Therefore, all but one of the slots are initially available to the provider, and one slot is available to the consumer. The lease is made when the consumer schedules a job to run on that single slot.

To make job slots available to a different cluster, LSF automatically modifies the lease contract. The lease will go through a temporary "inactive" phase each time. When a lease is updated, the slots controlled by the corresponding export policy are distributed as follows: the slots that are being used to run jobs remain under the control of the cluster that is using them, but the slots that are idle are all made available to just one cluster.

To determine which cluster will reclaim the idle slots each time, LSF considers the number of idle job slots in each cluster:

idle_slots_provider = available_slots_provider - used_slots_provider
idle_slots_consumer = available_slots_consumer - used_slots_consumer
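
For example, with illustrative numbers: if the provider has 8 available slots and 5 in use, idle_slots_provider = 3; if the consumer has 4 available slots with all 4 in use, idle_slots_consumer = 0.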

The action depends on the relative quantity of idle slots in each cluster.

LSF evaluates the status at regular intervals, specified by MC_RECLAIM_DELAY in lsb.params.

The calculations are performed separately for each set of reclaimable resources, so if a provider cluster has multiple resource export policies, some leases could be reconfigured in favor of the provider while others get reconfigured in favor of the consumer.

Configure shared leasing

Enable shared leasing

To make a shared lease, set TYPE=shared in the resource export policy (lsb.resources HostExport section). Remember that each resource export policy using PER_HOST syntax must specify just one cluster in the distribution policy, if the lease is shared.

Begin HostExport
PER_HOST=HostA
SLOTS=4
TYPE=shared
DISTRIBUTION=([C1, 1])
End HostExport

In this example, HostA is exported with shared leasing enabled, so the lease can be reconfigured at regular intervals, allowing LSF to give any idle job slots to the cluster that needs them the most.

Configure reclaim interval

Optional. To set the reclaim interval, set MC_RECLAIM_DELAY in lsb.params and specify how often to reconfigure a shared lease, in minutes. The interval is the same for every lease in the cluster.

The default interval is 10 minutes.
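
For example, to reconfigure shared leases every 30 minutes (an illustrative value), set the parameter in lsb.params:

Begin Parameters
MC_RECLAIM_DELAY = 30
End Parameters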



Borrowing Resources

Default queues

When you add new hosts to a single LSF cluster, you might need to update your queues to start sending work to the new hosts. This is often not necessary, because queues with the default configuration can use all hosts in the local cluster.

However, when a MultiCluster provider cluster exports resources to a consumer cluster, the default queue configuration does not allow the consumer cluster to use those resources. You must update your queue configuration to start using the borrowed resources.

Queues that use borrowed hosts

By default, LSF queues only use hosts that belong to the submission cluster. Queues can use borrowed resources when they are configured to use borrowed hosts (and the provider cluster's export policy must be compatible).

Queues for parallel jobs

If your clusters do not have a shared file system, parallel jobs that require a common file space could fail if they span multiple clusters. One way to prevent this is to submit these jobs to a queue whose hosts all belong to one cluster (for example, configure the queue to use local hosts or borrowed hosts, but not both).

Configure a queue to use borrowed resources

To configure a queue to use borrowed resources, edit the HOSTS parameter in lsb.queues and specify the hosts you want to borrow from one or more other clusters, as in the sketch below.
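
For example, this sketch of a queue in lsb.queues uses two local hosts plus all borrowed hosts (the queue name and host names are illustrative):

Begin Queue
QUEUE_NAME = borrow
PRIORITY   = 30
HOSTS      = hostD hostE allremote
End Queue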

all and allremote

In the HOSTS parameter, the keyword all specifies all hosts in the local cluster, and the keyword allremote specifies all hosts borrowed from all remote clusters.

Preference

You can specify preference levels for borrowed resources, as well as for local resources. If your clusters do not have a common file system, the extra overhead of file transfer between clusters can affect performance, if a job involves large files. In this case, you should give preference to local hosts.

HOSTS = all+1 allremote
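
Here, all local hosts are given preference level 1, so LSF prefers them over the borrowed hosts (allremote), which keep the default preference level of 0.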



Running Parallel Jobs with the Lease Model

To run parallel jobs (specifying multiple processors with bsub -n) across clusters, you must configure the RemoteClusters list in each cluster. By default, this list is not configured. For more information on running parallel jobs, see Administering Platform LSF.

  1. If you do not already have a RemoteClusters list, create the RemoteClusters list and include the names of all remote clusters (the same list as lsf.shared). This enables proper communication among all clusters, and enables cross-cluster parallel jobs for all clusters.
  2. If you have a RemoteClusters list, and you do not want to run parallel jobs on resources from all provider clusters, configure the RECV_FROM column in lsf.cluster.cluster_name (see the sketch after this list).
    • Specify "N" to exclude a remote cluster (LSF will not start parallel jobs on resources that belong to the remote cluster).
    • Specify "Y" to enable resource-sharing for parallel jobs. This is the default.
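
A minimal sketch of the RemoteClusters section in lsf.cluster.cluster_name, assuming illustrative cluster names, where parallel jobs may use resources borrowed from cluster2 but not from cluster3:

Begin RemoteClusters
CLUSTERNAME    RECV_FROM
cluster2       Y
cluster3       N
End RemoteClusters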
