Remote queue workload job-forwarding scheduler

Enhanced scheduler decisions can be customized to consider characteristics of remote queues before forwarding a job. Remote queue attributes such as queue priority, number of preemptable jobs, and queue workload are sent to the submission scheduler. The decisions made by the scheduler, based on this information, depend on the setting of MC_PLUGIN_SCHEDULE_ENHANCE in lsb.params.

Queue workload and configuration are considered in conjunction with remote resource availability (MC_PLUGIN_REMOTE_RESOURCE=Y is automatically set in lsf.conf).

Tip:

When MC_PLUGIN_SCHEDULE_ENHANCE is set to a valid value, the submission scheduler supports the same remote resources as MC_PLUGIN_REMOTE_RESOURCE: -R "type==type_name" and -R "same[type]".

Remote queue counter collection

The submission cluster receives up-to-date information about each queue in remote clusters. This information is considered during job forwarding decisions.

The submission cluster collects queue information when MC_PLUGIN_SCHEDULE_ENHANCE is set to a valid value on the submission cluster. Each execution cluster sends its queue information when MC_PLUGIN_UPDATE_INTERVAL is defined on that execution cluster and the submission cluster is collecting queue information.

Some jobs may be forwarded between counter update intervals. The submission scheduler increments locally stored counter information as jobs are forwarded, and reconciles incoming counter updates to account for all jobs.
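One way to picture this bookkeeping is the minimal Python sketch below. It is not LSF code. The class QueueCounterCache and its methods are hypothetical. The sketch also assumes that each counter update carries a snapshot timestamp; the text above says only that counters are incremented locally and reconciled, not how.

```python
from dataclasses import dataclass, field


@dataclass
class QueueCounterCache:
    """Hypothetical submission-side copy of one remote queue's counters."""

    available_slots: int = 0
    pending_slots: int = 0
    # (forward_time, slots) for jobs forwarded but not yet seen in an update.
    unconfirmed: list[tuple[float, int]] = field(default_factory=list)

    def on_forward(self, now: float, slots: int) -> None:
        # Count the job locally as pending on the remote queue right away,
        # so the next scheduling decision already sees the extra workload.
        self.pending_slots += slots
        self.unconfirmed.append((now, slots))

    def on_update(self, snapshot_time: float, available: int, pending: int) -> None:
        # Assumption: the update reflects every job forwarded up to
        # snapshot_time. Jobs forwarded later are re-applied on top of it.
        self.unconfirmed = [(t, s) for t, s in self.unconfirmed if t > snapshot_time]
        self.available_slots = available
        self.pending_slots = pending + sum(s for _, s in self.unconfirmed)


cache = QueueCounterCache(available_slots=10, pending_slots=0)
cache.on_forward(now=1.0, slots=4)  # forwarded before the remote snapshot
cache.on_forward(now=5.0, slots=2)  # forwarded after the remote snapshot
cache.on_update(snapshot_time=3.0, available=6, pending=4)
print(cache.pending_slots)  # 6: 4 reported remotely + 2 not yet reported
```

Keeping the unconfirmed jobs in a separate list lets an incoming update overwrite the reported values without losing jobs that the remote cluster has not counted yet.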

The following counter information is collected for each queue:

  • queue ID

  • queue priority

  • total slots: The total number of slots, on all hosts, to which jobs from this queue are dispatched. This includes slots on hosts with the status ok and on hosts with the status closed due to running jobs.

  • available slots: The number of free slots, that is, slots (out of the total slots) that do not currently have a job running.

  • running slots: The number of slots currently running jobs from the queue.

  • pending slots: The number of slots required by jobs pending on the queue.

  • preemptable available slots: The number of slots the queue can access through preemption.

  • preemptable slots

  • preemptable queue counters (1...n), one set for each queue whose jobs this queue can preempt:

    • preemptable queue ID

    • preemptable queue priority

    • preemptable available slots
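The counters can be pictured as one record per remote queue. The Python sketch below is a hypothetical model, not the format LSF uses internally. Field names follow the list above, but the types (for example, queue IDs as strings) and the helper methods are assumptions. The example values are loosely based on HighPriority@Cluster2 in Example 4 later in this section; the preemptable_slots value is illustrative only.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class PreemptableQueueCounter:
    queue_id: str
    priority: int
    preemptable_available_slots: int


@dataclass
class RemoteQueueCounters:
    queue_id: str
    priority: int
    total_slots: int
    available_slots: int
    running_slots: int
    pending_slots: int
    preemptable_available_slots: int
    preemptable_slots: int
    preemptable_queues: list[PreemptableQueueCounter] = field(default_factory=list)

    def free_score(self) -> int:
        # (available slots)-(pending slots)
        return self.available_slots - self.pending_slots

    def preemption_score(self) -> int:
        # (preemptable available slots)-(pending slots)
        return self.preemptable_available_slots - self.pending_slots

    def lowest_preemptable_priority(self) -> Optional[int]:
        # Lowest priority among the queues whose jobs this queue can preempt.
        if not self.preemptable_queues:
            return None
        return min(q.priority for q in self.preemptable_queues)


hp2 = RemoteQueueCounters(
    queue_id="HighPriority@Cluster2", priority=60, total_slots=100,
    available_slots=0, running_slots=20, pending_slots=20,
    preemptable_available_slots=80, preemptable_slots=80,
    preemptable_queues=[PreemptableQueueCounter("LowPriority@Cluster2", 20, 80)],
)
print(hp2.free_score(), hp2.preemption_score(), hp2.lowest_preemptable_priority())  # -20 60 20
```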

Note:

After a MultiCluster connection is established, the counters are not updated until the interval set by MC_PLUGIN_UPDATE_INTERVAL has passed. Scheduling decisions made before this first interval has passed do not accurately account for remote queue workload.

The parameter MC_PLUGIN_SCHEDULE_ENHANCE was introduced in LSF Version 7 Update 6. All clusters within a MultiCluster configuration must be running a version of LSF containing this parameter to enable the enhanced scheduler.

Remote queue selection

The information considered by the job-forwarding scheduler when accounting for workload and remote resources depends on the setting of MC_PLUGIN_SCHEDULE_ENHANCE in lsb.params. Valid settings for this parameter are:

  • RESOURCE_ONLY

    Jobs are forwarded to the remote queue that has the requested resources and the largest value of (available slots)-(pending slots).

  • COUNT_PREEMPTABLE

    Jobs are forwarded as with RESOURCE_ONLY, but if no appropriate queues have free slots, the best queue is selected based on the largest (preemptable available slots)-(pending slots).

  • COUNT_PREEMPTABLE with HIGH_QUEUE_PRIORITY

    Jobs are forwarded as with COUNT_PREEMPTABLE, but jobs are forwarded to the highest priority remote queue.

  • COUNT_PREEMPTABLE with PREEMPTABLE_QUEUE_PRIORITY

    Jobs are forwarded as with COUNT_PREEMPTABLE, but queue selection is based on which queues can preempt jobs from the lowest priority queues.

  • COUNT_PREEMPTABLE with PENDING_WHEN_NOSLOTS

    Jobs are forwarded as with COUNT_PREEMPTABLE, but if no queues have free slots even after preemption, submitted jobs pend.

  • COUNT_PREEMPTABLE with HIGH_QUEUE_PRIORITY and PREEMPTABLE_QUEUE_PRIORITY

    If no appropriate queues have free slots, the best queue is selected based on:
    • queues that can preempt lowest priority queue jobs

    • the number of preemptable jobs

    • the pending job workload

  • COUNT_PREEMPTABLE with HIGH_QUEUE_PRIORITY and PENDING_WHEN_NOSLOTS

    If no appropriate queues have free slots, queues with free slots after jobs are preempted are considered.

    If no queues have free slots even after preemption, submitted jobs pend.

  • COUNT_PREEMPTABLE with PREEMPTABLE_QUEUE_PRIORITY and PENDING_WHEN_NOSLOTS

    If no appropriate queues have free slots, queues are considered based on:
    • the most free slots after preempting lowest priority queue jobs and preemptable jobs

    If no queues have free slots even after preemption, submitted jobs pend.

  • COUNT_PREEMPTABLE with HIGH_QUEUE_PRIORITY and PREEMPTABLE_QUEUE_PRIORITY and PENDING_WHEN_NOSLOTS

    If no appropriate queues have free slots, queues are considered based on:
    • the most free slots after preempting lowest priority queue jobs and preemptable jobs

    If no queues have free slots even after preemption, submitted jobs pend.
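The settings above can be approximated as a ranking function. The Python sketch below is a simplified model, not LSF's implementation, and the names RemoteQueue and select_queue are hypothetical. It makes several assumptions:

- The candidate list has already been filtered to queues with the requested resources.
- A queue "has free slots" when (available slots)-(pending slots) is greater than zero.
- When keywords are combined, queue priority is compared first, then the priority of the lowest preemptable queue, then the slot count. The text above does not fully specify this ordering.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RemoteQueue:
    name: str
    priority: int
    available_slots: int
    pending_slots: int
    preemptable_available_slots: int
    lowest_preemptable_priority: int  # lowest priority among queues it can preempt


def free_score(q: RemoteQueue) -> int:
    return q.available_slots - q.pending_slots


def preemption_score(q: RemoteQueue) -> int:
    return q.preemptable_available_slots - q.pending_slots


def select_queue(queues: list[RemoteQueue], options: set[str]) -> Optional[RemoteQueue]:
    """Pick a remote queue, or return None if the job should pend locally."""
    if not queues:
        return None
    by_priority = "HIGH_QUEUE_PRIORITY" in options

    if "RESOURCE_ONLY" in options:
        return max(queues, key=free_score)

    # Assumption: a queue "has free slots" when free_score(q) > 0.
    with_free = [q for q in queues if free_score(q) > 0]
    if with_free:
        return max(with_free, key=lambda q: (q.priority if by_priority else 0, free_score(q)))

    # No free slots anywhere: rank queues by what preemption could free up.
    candidates = queues
    if "PENDING_WHEN_NOSLOTS" in options:
        candidates = [q for q in queues if preemption_score(q) > 0]
        if not candidates:
            return None  # no free slots even after preemption: the job pends

    def preemption_key(q: RemoteQueue) -> tuple[int, int, int]:
        return (
            q.priority if by_priority else 0,
            # Prefer queues that can preempt the lowest priority queue jobs.
            -q.lowest_preemptable_priority if "PREEMPTABLE_QUEUE_PRIORITY" in options else 0,
            preemption_score(q),
        )

    return max(candidates, key=preemption_key)


cluster2 = RemoteQueue("HighPriority@Cluster2", 50, 0, 20, 80, 30)
cluster3 = RemoteQueue("HighPriority@Cluster3", 50, 0, 15, 70, 20)
choice = select_queue([cluster2, cluster3], {"COUNT_PREEMPTABLE", "PREEMPTABLE_QUEUE_PRIORITY"})
print(choice.name)  # HighPriority@Cluster3
```

The usage lines reuse the numbers from Example 2 later in this section. With plain COUNT_PREEMPTABLE, the same two queues would rank HighPriority@Cluster2 first, because its (preemptable available slots)-(pending slots) is larger (60 against 55).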

Figure 1 illustrates the scheduler decision-making process for valid settings of MC_PLUGIN_SCHEDULE_ENHANCE.

Note:

When the scheduler looks for maximum values, such as the largest (available slots)-(pending slots), these values can be negative as long as the receive-jobs queue is within the pending job limit set by IMPT_JOBBKLG in lsb.queues.

Figure 1. Scheduler decisions with MC_PLUGIN_SCHEDULE_ENHANCE set in lsb.params.
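The note above about negative values can be sketched as an eligibility check. This Python sketch makes two assumptions. First, "within the pending job limit" means the receive-jobs queue has strictly fewer pending MultiCluster jobs than IMPT_JOBBKLG. Second, the queue and cluster names are made up. QueueState, is_eligible, and best_queue are hypothetical helpers, not LSF functions.

```python
from typing import NamedTuple, Optional


class QueueState(NamedTuple):
    name: str
    available_slots: int
    pending_slots: int
    pending_mc_jobs: int  # MultiCluster jobs already pending on the receive-jobs queue
    impt_jobbklg: int     # pending job limit from IMPT_JOBBKLG in lsb.queues


def is_eligible(q: QueueState) -> bool:
    if q.available_slots - q.pending_slots > 0:
        return True
    # A zero or negative value is still acceptable while the queue is under its
    # IMPT_JOBBKLG limit (assumed: strictly fewer pending jobs than the limit).
    return q.pending_mc_jobs < q.impt_jobbklg


def best_queue(queues: list[QueueState]) -> Optional[QueueState]:
    eligible = [q for q in queues if is_eligible(q)]
    if not eligible:
        return None
    # The maximum may well be negative; that is fine for eligible queues.
    return max(eligible, key=lambda q: q.available_slots - q.pending_slots)


full = QueueState("Recv@ClusterA", 0, 40, 10, 10)  # at its pending job limit
busy = QueueState("Recv@ClusterB", 0, 25, 3, 10)   # negative value, but room left
print(best_queue([full, busy]).name)  # Recv@ClusterB
```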

Limitations

Advance reservation

When an advance reservation is active on a remote cluster, slots within the advance reservation are excluded from the number of available slots. Inactive advance reservations do not affect the number of available slots since the slots may still be available for backfill jobs.
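The effect on the available slot counter can be sketched as follows. This is a simplified Python model, not LSF code. It assumes slots can be identified individually, using made-up "host/slot" strings. The names AdvanceReservation and available_slot_count are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class AdvanceReservation:
    slots: frozenset[str]  # slot identifiers covered by the reservation
    active: bool


def available_slot_count(free_slots: set[str], reservations: list[AdvanceReservation]) -> int:
    """Count free slots, excluding slots held by active advance reservations."""
    reserved: set[str] = set()
    for ar in reservations:
        if ar.active:
            reserved |= ar.slots
        # Inactive reservations are ignored: their slots may still run backfill jobs.
    return len(free_slots - reserved)


free = {"hostA/1", "hostA/2", "hostB/1", "hostB/2"}
ars = [
    AdvanceReservation(frozenset({"hostA/1", "hostA/2"}), active=True),
    AdvanceReservation(frozenset({"hostB/1"}), active=False),
]
print(available_slot_count(free, ars))  # 2
```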

Same boolean resource within hostgroups

If the hosts in a hostgroup do not all have the same required boolean resources, the scheduler can make ineffective job-forwarding decisions.

For example, a job may be forwarded to a queue accessing a hostgroup with many slots available, only some of which have the required boolean resource. If there are not enough of those slots to run the job, the job returns to the submission cluster, which may continue forwarding the same job back to the same queue.

Same host type within hostgroups

A remote queue hostgroup satisfies a host type requirement when any one of its available hosts is of the host type requested by a job. As with boolean resources, the submission cluster assumes that all slots within a hostgroup are of the same host type. Other hostgroup configurations can result in unexpected job-forwarding decisions.
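The gap between the hostgroup-level check and the slots that actually match can be shown with a small Python sketch. The host names, host types, and slot counts are illustrative, and the helper functions are hypothetical. In this example, a job needing 8 LINUXPPC64 slots looks like a good fit, because the whole group counts as matching with 36 slots. Only 4 slots actually match, so the job would come back to the submission cluster.

```python
# Each host: (host type, number of slots). Names and types are illustrative.
Hosts = dict[str, tuple[str, int]]

hostgroup: Hosts = {
    "hostA": ("X86_64", 16),
    "hostB": ("X86_64", 16),
    "hostC": ("LINUXPPC64", 4),
}


def queue_satisfies_type(hosts: Hosts, wanted: str) -> bool:
    # Hostgroup-level check: one matching host is enough.
    return any(htype == wanted for htype, _ in hosts.values())


def assumed_slots(hosts: Hosts, wanted: str) -> int:
    # What the submission cluster assumes: every slot in the group matches.
    return sum(n for _, n in hosts.values()) if queue_satisfies_type(hosts, wanted) else 0


def actual_slots(hosts: Hosts, wanted: str) -> int:
    # Slots on hosts that really are of the requested type.
    return sum(n for htype, n in hosts.values() if htype == wanted)


print(assumed_slots(hostgroup, "LINUXPPC64"), actual_slots(hostgroup, "LINUXPPC64"))  # 36 4
```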

Configure remote resource and preemptable job scheduling

The submission cluster scheduler considers whether remote resources exist, and only forwards jobs to a queue with free slots or space in the MultiCluster pending job threshold (IMPT_JOBBKLG).

If no appropriate queues with free slots or space for new pending jobs are found, the best queue is selected based on the number of preemptable jobs and the pending job workload.

  1. In the submission cluster, define MC_PLUGIN_SCHEDULE_ENHANCE=COUNT_PREEMPTABLE in lsb.params.
  2. In the execution cluster, set MC_PLUGIN_UPDATE_INTERVAL in lsb.params to a non-zero value.
  3. To make the changes take effect in both the submission and execution clusters, run the following command:
    badmin reconfig

Configure remote resource and free slot scheduling

The submission cluster scheduler considers whether remote resources exist, and only forwards jobs to a queue with free slots or space in the MultiCluster pending job threshold (IMPT_JOBBKLG).

If no appropriate queues with free slots or space for new pending jobs are found, the best queue is selected based on which queues can preempt lower priority jobs.

If no queues have free slots even after preemption, jobs pend on the submission cluster.

  1. In the submission cluster, define MC_PLUGIN_SCHEDULE_ENHANCE=COUNT_PREEMPTABLE PENDING_WHEN_NOSLOTS in lsb.params.
  2. In the execution cluster, set MC_PLUGIN_UPDATE_INTERVAL in lsb.params to a non-zero value.
  3. To make the changes take effect in both the submission and execution clusters, run the following command:
    badmin reconfig

Configure remote resource, preemptable job, and queue priority free slot scheduling

All scheduler options are configured.

The submission cluster scheduler considers whether remote resources exist, and only forwards jobs to a queue with free slots or space in the MultiCluster pending job threshold (IMPT_JOBBKLG).

If no appropriate queues with free slots or space for new pending jobs are found, the best queue is selected based on the number of free slots after preempting low priority jobs and preemptable jobs.

If no queues have free slots even after preemption, jobs pend on the submission cluster.

  1. In the submission cluster, define MC_PLUGIN_SCHEDULE_ENHANCE=COUNT_PREEMPTABLE HIGH_QUEUE_PRIORITY PREEMPTABLE_QUEUE_PRIORITY PENDING_WHEN_NOSLOTS in lsb.params.
  2. In the execution cluster, set MC_PLUGIN_UPDATE_INTERVAL in lsb.params to a non-zero value.
  3. To make the changes take effect in both the submission and execution clusters, run the following command:
    badmin reconfig
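Before you run badmin reconfig, you can check a MC_PLUGIN_SCHEDULE_ENHANCE value against the valid settings listed under "Remote queue selection". The valid settings are RESOURCE_ONLY alone, or COUNT_PREEMPTABLE combined with any of HIGH_QUEUE_PRIORITY, PREEMPTABLE_QUEUE_PRIORITY, and PENDING_WHEN_NOSLOTS. The Python sketch below encodes that rule. The function parse_schedule_enhance is hypothetical, not part of LSF. It assumes space-separated, case-sensitive keywords, and LSF's own parsing may differ.

```python
MODIFIERS = frozenset({"HIGH_QUEUE_PRIORITY", "PREEMPTABLE_QUEUE_PRIORITY", "PENDING_WHEN_NOSLOTS"})


def parse_schedule_enhance(value: str) -> frozenset[str]:
    """Split a MC_PLUGIN_SCHEDULE_ENHANCE value and check it against the valid settings."""
    options = frozenset(value.split())
    if options == {"RESOURCE_ONLY"}:
        return options
    if "COUNT_PREEMPTABLE" in options and options - {"COUNT_PREEMPTABLE"} <= MODIFIERS:
        return options
    raise ValueError(f"not a valid MC_PLUGIN_SCHEDULE_ENHANCE setting: {value!r}")


full = parse_schedule_enhance(
    "COUNT_PREEMPTABLE HIGH_QUEUE_PRIORITY PREEMPTABLE_QUEUE_PRIORITY PENDING_WHEN_NOSLOTS"
)
print(sorted(full))

try:
    parse_schedule_enhance("COUNT_PREEMPTABLE PREEMPABLE_QUEUE_PRIORITY")  # misspelled keyword
except ValueError as err:
    print(err)
```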

Examples

MultiCluster job forwarding is enabled from a send-queue on Cluster1 to the receive-queues HighPriority@Cluster2 and HighPriority@Cluster3. Both clusters have lower priority queues for running local jobs, and the high priority queues can preempt jobs from the lower priority queues. The scheduler on Cluster1 has the following information about the remote clusters:

Example 1: MC_PLUGIN_SCHEDULE_ENHANCE=COUNT_PREEMPTABLE:

Cluster2 (100 total slots)

  • queue=HighPriority, priority=60, running slots=20, pending slots=20

  • queue=LowPriority, priority=20, running slots=50, pending slots=0

Cluster3 (100 total slots)

  • queue=HighPriority, priority=70, running slots=30, pending slots=5

  • queue=LowPriority, priority=20, running slots=60, pending slots=0

Cluster2 has a total of 70 running slots out of 100 total slots, with 20 pending slots. The number of (available slots)-(pending slots) for Cluster2 is (100-70)-20=10. Cluster3 has a total of 90 running slots out of 100 total slots, with 5 pending slots. The number of (available slots)-(pending slots) for Cluster3 is (100-90)-5=5. Thus, a job forwarded from Cluster1 is sent to HighPriority@Cluster2.
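The arithmetic in Example 1 can be checked with a few lines of Python. Following the text, this sketch takes a cluster-wide view: it subtracts all running slots from the 100 total slots and sums pending slots over both queues. The helper free_minus_pending is hypothetical.

```python
TOTAL_SLOTS = 100

# (running slots, pending slots) for each queue, from Example 1.
example1 = {
    "Cluster2": {"HighPriority": (20, 20), "LowPriority": (50, 0)},
    "Cluster3": {"HighPriority": (30, 5), "LowPriority": (60, 0)},
}


def free_minus_pending(queues: dict[str, tuple[int, int]]) -> int:
    running = sum(r for r, _ in queues.values())
    pending = sum(p for _, p in queues.values())
    return (TOTAL_SLOTS - running) - pending


scores = {cluster: free_minus_pending(qs) for cluster, qs in example1.items()}
print(scores)                       # {'Cluster2': 10, 'Cluster3': 5}
print(max(scores, key=scores.get))  # Cluster2
```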

Example 2: MC_PLUGIN_SCHEDULE_ENHANCE=COUNT_PREEMPTABLE PREEMPTABLE_QUEUE_PRIORITY:

Cluster2 (100 total slots)

  • queue=HighPriority, priority=50, running slots=20, pending slots=20

  • queue=LowPriority, priority=30, running slots=80, pending slots=0

Cluster3 (100 total slots)

  • queue=HighPriority, priority=50, running slots=30, pending slots=15

  • queue=LowPriority, priority=20, running slots=70, pending slots=0

In both Cluster2 and Cluster3, running jobs occupy all 100 slots. LowPriority@Cluster2 has a queue priority of 30, while LowPriority@Cluster3 has a queue priority of 20. Thus, a job forwarded from Cluster1 is sent to HighPriority@Cluster3, where slots can be preempted from the lowest priority queue.
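A short Python check of Example 2 follows. It first confirms that neither cluster has available slots. It then picks the cluster whose LowPriority queue, the queue that can be preempted, has the lowest priority. Treating LowPriority as the only preemptable queue is an assumption taken from the example setup.

```python
TOTAL_SLOTS = 100

# (priority, running slots, pending slots) for each queue, from Example 2.
example2 = {
    "Cluster2": {"HighPriority": (50, 20, 20), "LowPriority": (30, 80, 0)},
    "Cluster3": {"HighPriority": (50, 30, 15), "LowPriority": (20, 70, 0)},
}

available = {
    cluster: TOTAL_SLOTS - sum(r for _, r, _ in qs.values())
    for cluster, qs in example2.items()
}
print(available)  # {'Cluster2': 0, 'Cluster3': 0}

# No free slots, so prefer the cluster whose preemptable queue has the lowest priority.
target = min(example2, key=lambda c: example2[c]["LowPriority"][0])
print(f"HighPriority@{target}")  # HighPriority@Cluster3
```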

Example 3: MC_PLUGIN_SCHEDULE_ENHANCE=COUNT_PREEMPTABLE HIGH_QUEUE_PRIORITY PREEMPTABLE_QUEUE_PRIORITY:

Cluster2 (100 total slots)

  • queue=HighPriority, priority=60, running slots=20, pending slots=20

  • queue=LowPriority, priority=20, running slots=50, pending slots=0

Cluster3 (100 total slots)

  • queue=HighPriority, priority=70, running slots=30, pending slots=5

  • queue=LowPriority, priority=20, running slots=60, pending slots=0

Cluster2 has a total of 70 running slots out of 100 total slots, with 20 pending slots. The number of (available slots)-(pending slots) for Cluster2 is 10. Cluster3 has a total of 90 running slots out of 100 total slots, with 5 pending slots. The number of (available slots)-(pending slots) for Cluster3 is 5.

Although (available slots)-(pending slots) is higher for Cluster2, Cluster3 contains a higher priority queue. Thus a job forwarded from Cluster1 is sent to HighPriority@Cluster3.
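With HIGH_QUEUE_PRIORITY, queue priority takes precedence over the free slot count. A minimal Python sketch of Example 3 makes that ordering explicit with a tuple key: priority first, then (available slots)-(pending slots). The tuple ordering is an illustration of the rule stated above, not LSF's internal comparison.

```python
TOTAL_SLOTS = 100

# (priority, running slots, pending slots) for each queue, from Example 3.
example3 = {
    "Cluster2": {"HighPriority": (60, 20, 20), "LowPriority": (20, 50, 0)},
    "Cluster3": {"HighPriority": (70, 30, 5), "LowPriority": (20, 60, 0)},
}


def free_minus_pending(queues: dict[str, tuple[int, int, int]]) -> int:
    running = sum(r for _, r, _ in queues.values())
    pending = sum(p for _, _, p in queues.values())
    return (TOTAL_SLOTS - running) - pending


# HIGH_QUEUE_PRIORITY: queue priority is compared first, free slots second.
ranking = {
    cluster: (qs["HighPriority"][0], free_minus_pending(qs))
    for cluster, qs in example3.items()
}
print(ranking)                        # {'Cluster2': (60, 10), 'Cluster3': (70, 5)}
print(max(ranking, key=ranking.get))  # Cluster3
```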

Example 4: MC_PLUGIN_SCHEDULE_ENHANCE=COUNT_PREEMPTABLE HIGH_QUEUE_PRIORITY PREEMPTABLE_QUEUE_PRIORITY:

Cluster2 (100 total slots)

  • queue=HighPriority, priority=60, running slots=20, pending slots=20

  • queue=LowPriority, priority=20, running slots=80, pending slots=0

Cluster3 (100 total slots)

  • queue=HighPriority, priority=60, running slots=30, pending slots=5

  • queue=LowPriority, priority=20, running slots=70, pending slots=0

In both Cluster2 and Cluster3, running jobs occupy all 100 slots. In this case, (preemptable available slots)-(pending slots) is considered. For HighPriority@Cluster2 this number is (80-20)=60; for HighPriority@Cluster3 this number is (70-5)=65. Both queues have the same priority, so a job forwarded from Cluster1 is sent to HighPriority@Cluster3.
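Example 4 can be reproduced with a three-part ranking key, sketched in Python below. The parts are, in order: queue priority, then the priority of the preemptable queue (negated, because lower is better), then (preemptable available slots)-(pending slots). Two points are assumptions. The sketch treats every slot used by LowPriority as preemptable, which matches the (80-20) and (70-5) figures above. It also fixes the order of the first two key parts, which the text does not state explicitly.

```python
# (priority, running slots, pending slots) for each queue, from Example 4.
example4 = {
    "Cluster2": {"HighPriority": (60, 20, 20), "LowPriority": (20, 80, 0)},
    "Cluster3": {"HighPriority": (60, 30, 5), "LowPriority": (20, 70, 0)},
}

ranking = {}
for cluster, qs in example4.items():
    high_prio, _, pending = qs["HighPriority"]
    low_prio, preemptable, _ = qs["LowPriority"]  # LowPriority slots can be preempted
    # Key: queue priority, then lowest preemptable queue priority (negated),
    # then (preemptable available slots)-(pending slots).
    ranking[cluster] = (high_prio, -low_prio, preemptable - pending)

print(ranking)                        # {'Cluster2': (60, -20, 60), 'Cluster3': (60, -20, 65)}
print(max(ranking, key=ranking.get))  # Cluster3
```

Because the first two parts tie, the decision falls through to the slot count, as described above.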