Using the lease model

Submit jobs

LSF automatically schedules jobs on the available resources, so jobs submitted to a queue that uses borrowed hosts can use the borrowed resources with no extra steps.

bsub

To submit a job and request a particular host borrowed from another cluster, use the format host_name@cluster_name to specify the host. For example, to run a job on hostA in cluster4:

bsub -q myqueue -m hostA@cluster4 myjob

This will not work when you first start up the MultiCluster grid; the remote host names are not recognized until the lease has been established.
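
Because borrowed host names are only recognized after the lease is established, a submission script can poll bhosts before calling bsub. The following is a minimal sketch, not an LSF facility: the function name wait_for_borrowed_host, the retry count, and the delay are all hypothetical, and it assumes the borrowed host appears in the first column of bhosts output as host_name@cluster_name once the lease is in place.

```shell
# wait_for_borrowed_host HOST@CLUSTER [TRIES] [DELAY_SECONDS]
# Hypothetical helper: returns 0 once HOST@CLUSTER appears in the first
# column of bhosts output, or 1 if it never appears within TRIES attempts.
wait_for_borrowed_host() {
    host_spec=$1
    tries=${2:-10}     # assumed default, tune for your site
    delay=${3:-30}     # assumed default, in seconds
    attempt=0
    while [ "$attempt" -lt "$tries" ]; do
        # -w requests wide output so long host names are not truncated.
        if bhosts -w 2>/dev/null |
            awk -v h="$host_spec" '$1 == h { found = 1 } END { exit !found }'; then
            return 0
        fi
        attempt=$((attempt + 1))
        sleep "$delay"
    done
    return 1
}

# Example use (commented out so the sketch has no side effects):
# wait_for_borrowed_host hostA@cluster4 && bsub -q myqueue -m hostA@cluster4 myjob
```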

bmod

The bmod syntax also allows you to specify borrowed hosts in the same format host_name@cluster_name.

Administration

badmin

The administrator of the consumer cluster can open and close borrowed hosts using badmin. Use the format host_name@cluster_name to specify the borrowed host. This action only affects scheduling on the job slots that belong to that consumer cluster. For example, if slots on a host are shared among multiple consumers, one consumer can close the host, but the others will not be affected or be aware of any change.

You must be the administrator of the provider cluster to shut down or start up a host. This action will affect the consumer cluster as well.
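
As an illustration of the consumer-side commands, the sketch below wraps badmin hclose and badmin hopen so that the borrowed host is always passed in host_name@cluster_name form. The wrapper name set_borrowed_host_state is hypothetical; only the badmin subcommands come from LSF.

```shell
# set_borrowed_host_state close|open HOST CLUSTER
# Hypothetical wrapper: closes or opens a borrowed host from the consumer
# cluster. Only the consumer's own job slots on that host are affected.
set_borrowed_host_state() {
    case $1 in
        close) subcommand=hclose ;;
        open)  subcommand=hopen ;;
        *)
            echo "usage: set_borrowed_host_state close|open HOST CLUSTER" >&2
            return 2
            ;;
    esac
    badmin "$subcommand" "$2@$3"
}

# Example use (commented out so the sketch has no side effects):
# set_borrowed_host_state close hostA cluster4
# set_borrowed_host_state open  hostA cluster4
```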

Host groups or host partitions

When you define a host group in lsb.hosts, or a host partition, you can use the keyword allremote to indicate all borrowed hosts available to the cluster. You cannot define a host group that includes borrowed hosts specified by host name or cluster name.
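
A host group containing every borrowed host might look like the following lsb.hosts fragment. This is a sketch only: the group name leased_hosts is a placeholder, and the rest of your HostGroup section is assumed to stay as it is.

```
Begin HostGroup
GROUP_NAME     GROUP_MEMBER
leased_hosts   (allremote)
End HostGroup
```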

Compute units

Compute units defined in lsb.hosts can use wild cards to include the names of borrowed hosts available to the cluster. You cannot define a compute unit that includes borrowed hosts specified directly by host name or cluster name.

Hosts running LSF 7 Update 4 or earlier cannot satisfy compute unit resource requirements, and thus cannot be included in compute units.

Automatic retry limits

The pre-execution command retry limits (MAX_PREEXEC_RETRY and REMOTE_MAX_PREEXEC_RETRY), the job requeue limit (MAX_JOB_REQUEUE), and the job preemption retry limit (MAX_JOB_PREEMPT) configured in lsb.params, lsb.queues, and lsb.applications apply to jobs running on remote leased hosts as if they were running on local hosts.

Tracking

bhosts

By default, bhosts only shows information about hosts and resources that are available to the local cluster and information about jobs that are scheduled by the local cluster. Therefore, borrowed resources are included in the summary, but exported resources are not normally included (the exception is reclaimed resources, which are shown during the times that they are available to the local cluster).

For borrowed resources, the host name is displayed in the format host_name@cluster_name. The number of job slots shown is the number available to the consumer cluster, and the JL/U limit and host status shown are determined by, and relative to, the consumer cluster. For example, the consumer might see closed or closed_Full status while the provider sees ok status.

  • Cluster1 has borrowed one job slot on hostA. It shows the borrowed host is closed because that job slot is in use by a running job.

    bhosts
    HOST_NAME       STATUS  JL/U  MAX  NJOBS  RUN  SSUSP  USUSP  RSV
    hostA@cluster2  closed    -    1    1      1     0     0     0

  • Cluster2 has kept three job slots on hostA for its own use. It shows the host status as ok because all of its available slots are free.

    bhosts
    HOST_NAME       STATUS  JL/U  MAX  NJOBS  RUN  SSUSP  USUSP  RSV
    hostA            ok      -     3    0      0     0     0     0
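
To pick out only the borrowed hosts from a listing like the ones above, one option is to filter on the @ in the host name. The sketch below assumes the default column order shown in these examples (HOST_NAME, STATUS, JL/U, MAX, ...); the function name list_borrowed_hosts is hypothetical.

```shell
# list_borrowed_hosts: reads bhosts output on stdin and prints the host name,
# status, and MAX slots for each borrowed host (any name containing "@").
# Hypothetical helper; assumes the default bhosts column order.
list_borrowed_hosts() {
    awk 'NR > 1 && index($1, "@") > 0 { print $1, $2, $4 }'
}

# Example use (commented out so the sketch has no side effects):
# bhosts -w | list_borrowed_hosts
```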

bhosts -e

This option displays information about the exported resources. The provider cluster does not display JL/U or host status; this status information is determined by the consumer cluster and does not affect the provider.

bhosts -e -s

This option displays information about exported shared resources.

bjobs

The bjobs command shows all jobs associated with hosts in the cluster, including MultiCluster jobs. Jobs from remote clusters can be identified by the FROM_HOST column, which shows the remote cluster name and the submission or consumer cluster job ID in the format host_name@remote_cluster_name:remote_job_ID.

If the MultiCluster job is running under the job forwarding model, the QUEUE column shows a local queue, but if the MultiCluster job is running under the resource leasing model, the name of the remote queue is shown in the format queue_name@remote_cluster_name.

Use -w or -l to prevent the MultiCluster information from being truncated.
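
When post-processing bjobs output, the FROM_HOST value can be split into its three parts with plain shell parameter expansion. This is a minimal sketch based only on the format described above; the function name parse_from_host and the sample value are hypothetical.

```shell
# parse_from_host HOST@CLUSTER:JOB_ID
# Hypothetical helper: prints "host cluster job_id" for a FROM_HOST value in
# the format host_name@remote_cluster_name:remote_job_ID.
parse_from_host() {
    from_host=$1
    host=${from_host%%@*}       # text before the first "@"
    rest=${from_host#*@}        # text after the first "@"
    cluster=${rest%%:*}         # text before the first ":"
    job_id=${rest##*:}          # text after the last ":"
    echo "$host $cluster $job_id"
}

# Example with a made-up value:
# parse_from_host hostB@cluster2:1234
```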

bclusters

For the resource leasing model, bclusters shows information about each lease.

  • Status

    • ok means that the resources are leased and the resources that belong to the provider are being used by the consumer.

    • conn indicates that a connection has been established but the lease has not yet started, probably because the consumer has not yet attempted to use the shared resources. The conn status remains until jobs are submitted, at which point the status changes to ok. If this status persists in a production environment, it could mean that the consumer cluster is not properly configured.

    • disc indicates that there is no connection between the two clusters.

  • Resource flow

    • For resources exported to another cluster, the resource flow direction is EXPORT, and the remote cluster specified is the consumer of the resources.

    • For resources borrowed from another cluster, the resource flow direction is IMPORT, and the remote cluster specified is the resource provider.
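
A monitoring script could translate the lease status values listed above into short messages. The sketch below takes the status as an argument instead of parsing bclusters output, because it makes no assumption about the output layout; the function name describe_lease_status and the message wording are hypothetical.

```shell
# describe_lease_status STATUS
# Hypothetical helper: maps a lease status reported by bclusters to a short
# explanation. Returns 1 for any status not covered in this section.
describe_lease_status() {
    case $1 in
        ok)   echo "lease active: consumer is using the provider's resources" ;;
        conn) echo "connected, lease not started: no jobs have used the resources yet" ;;
        disc) echo "no connection between the two clusters" ;;
        *)    echo "unrecognized status: $1" >&2; return 1 ;;
    esac
}

# Example use:
# describe_lease_status conn
```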