Resource reclaim

Resource reclaim—a feature of Platform Symphony’s borrowing, lending, and sharing functionality—ensures that consumers can take back their deserved shared or lent resources as needed to meet workload demand.

This is not applicable to Symphony DE.

Scope


Applicability

Details

Operating system

  • All host types supported by the Symphony system

Exclusions

  • Does not apply to Symphony DE, which does not have resource lending, borrowing, and reclamation


About resource reclaim

Purpose of resource reclaim

Resource reclaim provides a way for the system to reallocate borrowed or shared resources to a consumer when the consumer has workload demand under any of the following conditions:

  • A lending consumer has workload demand that requires slots owned by the lending consumer

  • Share ratios are configured, and an under-allocated consumer (a consumer that is not currently using its deserved number of shared slots) has workload demand that requires the use of more slots

  • A time-based resource plan has time intervals that change the number of owned resources, share ratios and limits, borrowing and lending policies, and borrowing and lending limits for one or more consumers

The system does not always return the same resource that the consumer originally lent. If workload is running on a borrowed resource, the system could reclaim a different physical resource (that meets the resource requirements) from the borrower and allocate that resource to the lending consumer in place of the original resource.
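
The substitution described above can be pictured with a short Python sketch. It is illustrative only and is not Symphony code: the `Slot` fields, the `pick_slot_to_reclaim` function, and the host names are hypothetical, and the real matching is done by the system against the consumer's resource requirements.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Slot:
    host: str          # hypothetical host name
    os: str            # for example "windows" or "linux"
    free_mem_mb: int   # memory available on the host


def pick_slot_to_reclaim(borrowed: Iterable[Slot], os: str, min_mem_mb: int) -> Optional[Slot]:
    """Return the first borrowed slot that meets the requirement, or None."""
    for slot in borrowed:
        if slot.os == os and slot.free_mem_mb >= min_mem_mb:
            return slot
    return None


borrowed_slots = [
    Slot("hostA", "linux", 8192),
    Slot("hostB", "windows", 2048),
    Slot("hostC", "windows", 4096),
]
# The lender needs a Windows slot with at least 4096 MB. hostC qualifies,
# whether or not it is the slot that was originally lent.
chosen = pick_slot_to_reclaim(borrowed_slots, "windows", 4096)
```

In this sketch any slot that satisfies the requirement is acceptable, which mirrors the point that the lender gets back an equivalent resource rather than a specific one.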

Benefits of resource reclaim

The following illustrations show how the resource reclaim feature works when borrowing, lending, or sharing are enabled.
Important:

Resource reclaim is enabled by default whenever borrowing and lending are enabled. You cannot disable resource reclaim for borrowed or lent resources.

How resource reclaim works for borrowing and lending

You can choose to enable borrowing and lending for owned resources. When you enable borrowing and lending, resource reclaim is always enabled.

Without resource reclaim for sharing (feature not enabled)

In this example, the share ratio is 3:1. Consumer A deserves 3 times as many slots as Consumer B.

With resource reclaim for sharing (feature enabled)

In this example, the share ratio is 3:1. Consumer A deserves 3 times as many slots as Consumer B.
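
The arithmetic behind "deserved" slots for a 3:1 share ratio can be sketched in Python. This is a minimal illustration under stated assumptions: the share pool size of 8 slots, the current usage figures, and the function names are all hypothetical, and the real scheduler also accounts for limits, demand, and rounding.

```python
def deserved_slots(pool_size: int, ratios: dict[str, int]) -> dict[str, int]:
    """Split a share pool in proportion to share ratios (integer division).

    Assumes at least one ratio is greater than 0.
    """
    total = sum(ratios.values())
    return {name: pool_size * ratio // total for name, ratio in ratios.items()}


def allocation_gap(in_use: dict[str, int], deserved: dict[str, int]) -> dict[str, int]:
    """Positive values are over-allocated, negative values are under-allocated."""
    return {name: in_use[name] - deserved[name] for name in deserved}


# Hypothetical pool of 8 shared slots with a 3:1 ratio.
deserved = deserved_slots(8, {"A": 3, "B": 1})
# Hypothetical current usage: B has taken slots that A was not using.
gap = allocation_gap({"A": 2, "B": 6}, deserved)
```

Here Consumer A deserves 6 slots and Consumer B deserves 2, so B is over-allocated by 4 and A is under-allocated by 4. With reclaim for sharing enabled, the system can take those slots back from B when A has demand. Without it, A waits until B releases them.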

Service instance interrupt handling

Resource reclaim is enabled whenever you enable lending or borrowing for leaf consumers that own resources. By default, the system:

  • Immediately sends an interrupt event to the service to notify it of the pending reclaim.

  • Allows the service the number of seconds specified in the reclaim grace period to complete processing before terminating the service instance. Tasks that were running on the service instance before it was killed are requeued to their respective sessions. The default grace period is 0 seconds.

  • After the reclaim grace period expires, EGO allows 120 seconds of leeway for the return of any reclaimed resources, to account for network overhead and other considerations.

The onServiceInterrupt service handler method provides the most effective way to manage an interruption caused by resource reclaim. Use of this method ensures that the service instance receives immediate notification of a pending interruption.

During a reclaim, the service interrupt indicates how much time the service instance has to complete the currently running service method and to clean up. If the service method and cleanup do not complete within that time, Symphony terminates the instance. If the timeout has not expired, Symphony initiates cleanup after the currently running service method completes.

If a task is running and the Invoke method completes during the applied reclaim grace period, the result of that method is treated as it would be treated under normal circumstances.

If a task is running and the Invoke method does not complete before the applied reclaim grace period expires, the service instance on which the task is running is terminated and the task is requeued.

Another, less effective way to manage an interruption is for the service instance to call the getLastInterruptEvent method periodically to check for interrupt events. Because the service instance polls, it does not detect the interrupt immediately. The reclaim grace period keeps running between polls, so the service instance has less time to return a result or shut down gracefully.
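
The difference between immediate notification and polling comes down to simple arithmetic, sketched below in Python. This is not the Symphony API: the function names are hypothetical, the polling schedule (polls at fixed multiples of the interval) is an assumption, and the grace period values are made up for illustration.

```python
import math


def time_left_with_callback(grace_period_s: int) -> int:
    """Immediate notification: the whole grace period is available."""
    return grace_period_s


def time_left_with_polling(grace_period_s: int, interrupt_at_s: int, poll_interval_s: int) -> int:
    """Polling at t = 0, p, 2p, ...: the interrupt is seen at the next poll."""
    next_poll = math.ceil(interrupt_at_s / poll_interval_s) * poll_interval_s
    detection_delay = next_poll - interrupt_at_s
    return max(0, grace_period_s - detection_delay)


def task_outcome(remaining_work_s: int, time_left_s: int) -> str:
    """The result is kept if the work finishes in time; otherwise the task is requeued."""
    return "completed" if remaining_work_s <= time_left_s else "requeued"


# Hypothetical values: 30 s grace period, interrupt raised at t = 12 s,
# service polling every 10 s, task needs 25 more seconds.
with_callback = time_left_with_callback(30)
with_polling = time_left_with_polling(30, 12, 10)
```

With these numbers the callback leaves the full 30 seconds, while polling detects the interrupt 8 seconds late and leaves 22. A task that needs 25 more seconds would finish in the first case and be requeued in the second.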

Configuration to enable resource reclaim

Borrowing and lending with respect to reclaim

Resource reclaim of borrowed resources is always enabled if you configure borrowing and lending at the consumer level. Borrowing and lending can only be configured for leaf consumers.

Configuration source

Setting

Behavior

Platform Management Console: Consumers > Consumers & Plans > Resource Plan > Show Advanced Settings > Expand All

Owned Slots=integer

  • Specifies a number of slots owned by a leaf consumer. The leaf consumer is guaranteed to receive this number of slots, provided that the consumer has enough demand. If a consumer’s owned slots are lent to a borrowing consumer, and the lending consumer has workload demand, the system initiates a reclaim of the owned slots.

For the lending consumer:
  • Lend checkbox selected

  • Details:
    • Lend checkbox selected for the consumer to lend to

  • Enables the consumer to lend resources to the specified consumer(s)

  • The specified consumer(s) must have borrowing enabled and specify the lending consumer.

For the borrowing consumer:
  • Borrow checkbox selected

  • Details:
    • Borrow selected for the consumer to borrow from

  • Enables the consumer to borrow resources from the specified consumer(s)

  • The specified consumer(s) must have lending enabled and specify the borrowing consumer.


Share pool and share ratio

Resource reclaim for shared resources is enabled by default once you configure a share pool and share ratios for at least one consumer branch.


Configuration source

Setting

Behavior

Platform Management Console: Consumers > Consumers & Plans > Resource Plan > Show Advanced Settings > Expand All

Owned Slots=integer

  • Specifies a number of slots owned by a leaf consumer. The leaf consumer is guaranteed to receive this number of slots, provided that the consumer has enough demand. If a consumer’s owned slots are lent to a borrowing consumer, and the lending consumer has workload demand, the system initiates a reclaim of the owned slots.

  • Any unowned slots constitute a “share pool” for allocation to leaf consumers with unsatisfied demand.

Share Ratio selected and integer specified as a value

  • Sets the relative share ratios within a share pool.

  • If you specify 0 for a consumer, that consumer gives up its share of the share pool when a sibling has demand. A consumer with a share ratio of 0 does not receive any resources from the share pool.

Platform Management Console: Cluster > Summary > Cluster Properties > Specify resource allocation behavior

Reclaim shared resources selected

  • When selected (the default setting), the share pool reclaims resources from a consumer that is using more slots than it deserves based on its share ratio to meet the demands of a competing consumer with a higher share ratio.


Resource reclaim behavior

Order of resource reclaim (consumer level)

Consumers reclaim resources in the following order, regardless of a consumer’s history of resource usage:

When the system reclaims …

Then reclaim occurs in the order of…

Example

Borrowed resources

Resource requirements, determined by the resource group associated with the consumer.

If the lending consumer needs a Windows slot with a certain amount of available memory, the system looks first for an analogous resource to reclaim.

Shared resources

Relative consumer rank, configured in the Resource Plan. Consumer rank is an optional setting. A rank of 0 is the highest rank and larger numbers indicate a lower rank. The system reclaims resources from the lowest ranking consumer first.

The system first reclaims resources from a consumer with rank 50, and then reclaims resources from a consumer with rank 25.

  • By default, the system enforces share ratios at the level of the leaf (child) consumers. If your system is configured to enforce share ratios at the parent level, the system reclaims resources from the parent consumer.

Consumer A is a child consumer of Parent A. Parent A and Parent B are siblings. With share ratio enforced at the parent level, Parent A shares 10 slots with Parent B. Parent B is running workload on 5 slots obtained from Parent A’s share. If Consumer A has unsatisfied demand for 2 slots and all of Parent A’s slots are allocated, the system reclaims 2 slots from Parent B to allocate to Parent A.
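
The rank ordering for shared resources can be sketched as a sort in Python. The consumer names and the `shared_reclaim_order` function are hypothetical. The sketch covers only the rank rule (0 is the highest rank, and the lowest-ranking consumer is reclaimed from first) and ignores parent-level enforcement.

```python
def shared_reclaim_order(ranks: dict[str, int]) -> list[str]:
    """Order consumers for reclaim: lowest-ranking (largest number) first."""
    return sorted(ranks, key=lambda name: ranks[name], reverse=True)


order = shared_reclaim_order({"ConsumerX": 25, "ConsumerY": 50, "ConsumerZ": 0})
```

With these ranks the consumer with rank 50 is reclaimed from first, then rank 25, and the rank 0 consumer last.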


Order of resource reclaim (resource level)

When the system must reclaim a resource from a consumer, and there are multiple possibilities for which resource could be reclaimed, these steps describe how your configuration choices help to determine exactly which task will be interrupted and which resource will be reclaimed.

Session importance (preemption rank or session priority) and preemption criteria are always potential influences, but the selective reclaim configuration is the most important parameter because it determines whether the other parameters can influence host selection or not. Note that selective reclaim can only be enabled if "Optimized for application specified conditions" (default setting) is configured through the PMC. If selective reclaim is disabled, the system will still select the "best" slot on a host, but it may appear that resource selection happens at random because there is no effort to select the "best" host among multiple candidates.

The system chooses the resource using the following logic.

  1. Consider selective reclaim configuration.

    1. If selective reclaim is disabled, reclaim resources as quickly as possible, with minimum overhead. This is the default.

      EXAMPLE: if multiple hosts in the consumer could meet the resource requirement, the system selects any one at random.

    2. If selective reclaim is enabled, reclaim resources from the less important sessions first. This option has greater overhead.

      EXAMPLE: if multiple hosts in the consumer could meet the resource requirement, the system selects all candidate hosts.

  2. For proportional or minimum services scheduling, consider preemption rank. For priority scheduling, consider session priority instead of preemption rank.

    1. With proportional or minimum services scheduling:

      From the host or hosts selected, select the least important session, according to preemption rank.

      If multiple sessions have equal low rank, select all candidate sessions.

      If the resource requirement is for an exclusive host, treat all sessions on a host as if they had the same rank as the most important session using the host.

    2. With priority scheduling:

      From the host or hosts selected, select the least important session, according to session priority.

      If multiple sessions have equal low priority, select the most recently started session.

      If the resource requirement is for an exclusive host, treat all sessions on a host as if they had the same priority as the most important session using the host.

  3. Consider preemption criteria.

    1. If the criteria is MostRecentTask, reclaim resources from the most recently submitted tasks first.

      EXAMPLE: from one or more sessions, the system selects the most recently started task and reclaims the resource it is using.

      If multiple tasks have the same run time, the system selects any one at random.

      If multiple tasks run on a slot, consider the cumulative run time of all tasks using the slot.

    2. If the criteria is PolicyDefault, the behavior changes depending on the scheduling policy. This is the default setting for the parameter.

      • With proportional or minimum services scheduling:

        The default is to reclaim resources from the most over-allocated sessions first. This is the option with minimum overhead.

        EXAMPLE: from multiple sessions, the system selects the most over-allocated session, and reclaims a resource it is using (task selection is random).

        If multiple sessions are equally over-allocated, the system selects any one at random.

        If no session is over-allocated, select the least under-allocated instead.

      • With priority scheduling:

        The default is to select a task from a session with the lowest priority, followed by tasks from the last started session. This is the option with minimum overhead.
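
Steps 2 and 3 above can be sketched in Python for proportional or minimum services scheduling with selective reclaim enabled. This is a simplified illustration, not the system's implementation: the `RunningTask` fields and the `pick_task_to_interrupt` function are hypothetical, the exclusive-host rule is not modeled, and ties are resolved by list order here, whereas the system picks at random.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunningTask:
    task_id: str
    session_id: str
    preemption_rank: int          # 1 is the lowest (least important) rank
    session_over_allocation: int  # slots beyond the deserved share; negative if under-allocated
    run_time_s: int               # how long the task has been running


def pick_task_to_interrupt(tasks: list[RunningTask], criteria: str = "PolicyDefault") -> RunningTask:
    # Step 2: keep only tasks from the least important session(s).
    lowest_rank = min(t.preemption_rank for t in tasks)
    candidates = [t for t in tasks if t.preemption_rank == lowest_rank]
    # Step 3: apply the preemption criteria.
    if criteria == "MostRecentTask":
        # The task with the least run time is the most recently started one.
        return min(candidates, key=lambda t: t.run_time_s)
    # PolicyDefault: most over-allocated session first. If no session is
    # over-allocated, max() yields the least under-allocated one.
    return max(candidates, key=lambda t: t.session_over_allocation)


tasks = [
    RunningTask("t1", "s1", 1, 2, 300),
    RunningTask("t2", "s2", 1, 0, 5),
    RunningTask("t3", "s3", 5, 4, 1),
]
recent = pick_task_to_interrupt(tasks, "MostRecentTask")
default = pick_task_to_interrupt(tasks)
```

In this example the rank 5 session is protected in both cases. `MostRecentTask` selects `t2`, which has run for only 5 seconds, while the default selects `t1`, whose session is the most over-allocated among the rank 1 sessions.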

Selective reclaim considerations

An application may be a candidate for selective reclaim when it may need to borrow slots from other consumers and has critical or long running tasks that you do not want interrupted.

Important:

Selective reclaim will not take effect if Reclaim optimization is configured as Optimized for standby service in the PMC.

Here are some considerations when using selective reclaim.

  • Are there any critical tasks in the application? If the answer is yes, configure a high preemption rank for critical sessions to protect critical tasks from being interrupted. Otherwise, leave all preemption ranks at the same level. (This only applies to proportional or minimum service policies. For the priority scheduling policy, the session priority is used.)

  • If there are long running tasks (not critical ones), set preemption criteria to MostRecentTask so that when reclaim happens, the CPU time of long running tasks is not lost.

  • If all the tasks are short running, set preemption criteria to default for better SSM performance.

Consumer demand

Consumers with workload demand can have lent resources reclaimed for them. When the system reclaims a resource, the system interrupts the borrower’s tasks running on the reclaimed resource. The reclaim grace period allows time for a task running on a borrowed slot to complete before the resource returns to its owner. To avoid being requeued, tasks must exit within the reclaim grace period.

By default, the system reclaims owned resources only after attempting to satisfy demand by borrowing resources from other lending consumers or from the share pool. You can change this behavior so that the system reclaims owned resources before allocating borrowed or shared resources.
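
The two orderings can be compared with a small Python sketch. The `satisfy_demand` function and its parameters are hypothetical, and `borrowable` lumps together slots available from lending consumers and from the share pool. The sketch only splits a demand figure between the two sources and does not model grace periods or resource requirements.

```python
def satisfy_demand(demand: int, borrowable: int, lent_out: int, reclaim_first: bool = False) -> dict[str, int]:
    """Split demand between borrowing/sharing and reclaiming lent slots."""
    if reclaim_first:
        # Take back the consumer's own lent slots first, then borrow the rest.
        reclaimed = min(demand, lent_out)
        borrowed = min(demand - reclaimed, borrowable)
    else:
        # Default: borrow or share first, reclaim only what is still missing.
        borrowed = min(demand, borrowable)
        reclaimed = min(demand - borrowed, lent_out)
    return {"borrowed": borrowed, "reclaimed": reclaimed}


# Hypothetical consumer: demand for 6 slots, 4 slots borrowable, 5 slots lent out.
default_split = satisfy_demand(6, borrowable=4, lent_out=5)
reclaim_first_split = satisfy_demand(6, borrowable=4, lent_out=5, reclaim_first=True)
```

With the default order, only 2 slots are reclaimed and 2 borrower tasks are interrupted. With reclaim first, 5 slots are reclaimed and only 1 is borrowed.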

Time interval transitions

With a time-based resource plan that specifies different values for ownership, lend and borrow limits, share ratios and limits, or total slots in the share pool, a transition from one time interval to the next can trigger resource reclaim. By default, the system enforces ownership and limits when the new time interval takes effect. The following examples illustrate how time interval changes trigger resource reclaim:


When…

The behavior is…

Example

A consumer’s ownership increases for the new time interval, lending and borrowing are not configured, and another consumer is using more than its deserved resources

The system reclaims slots whether or not consumers have unsatisfied demand.

  1. Consumer A owns 10 slots between 8:00 a.m. and 5:00 p.m. and 25 slots between 5:01 and 11:49 p.m.

  2. At 5:01 p.m., Consumer B is using more than its deserved slots.

  3. At 5:01 p.m., the system reclaims 15 slots to allocate to Consumer A.

A consumer’s ownership decreases for the new time interval, and lending and borrowing are not configured

The system reclaims the number of slots required to conform to the ownership values configured for the new time interval, whether or not other consumers have unsatisfied demand.

  1. Consumer A owns 10 slots between 8:00 a.m. and 5:00 p.m. and 5 slots between 5:01 and 11:49 p.m.

  2. Consumer B owns 5 slots between 8:00 a.m. and 5:00 p.m. and 10 slots between 5:01 and 11:49 p.m.

  3. At 5:01 p.m., the system reclaims 5 slots from Consumer A, even if Consumer A has unsatisfied demand, and allocates 5 slots to Consumer B.

A consumer’s ownership decreases for the new time interval, borrowing and lending for the consumer are configured, and a lending consumer has slots available

The system reclaims the number of slots required to conform to the ownership values configured for the new time interval, and then the consumer borrows available resources; the resource status changes from owned to borrowed.

  1. Consumer A owns 10 slots between 8:00 a.m. and 5:00 p.m. and 5 slots between 5:01 and 11:49 p.m.

  2. Consumer B owns 5 slots between 8:00 a.m. and 5:00 p.m. and 10 slots between 5:01 and 11:49 p.m.

  3. At 5:00 p.m., Consumer A has workload running on 10 slots and Consumer B has workload running on 5 slots.

  4. At 5:01 p.m., the system reclaims 5 slots from Consumer A, even if Consumer A has unsatisfied demand, and allocates 5 slots to Consumer B.

  5. Consumer A is configured to borrow from Consumer B, and Consumer B is configured to lend to Consumer A.

  6. Consumer B has no demand for the 5 reclaimed slots. Consumer A borrows 5 slots from Consumer B.

A consumer’s lend limit decreases for the new time interval

The system reclaims the number of slots required to conform to the new lend limit whether or not the consumer has unsatisfied demand.

  1. Consumer A has a lend limit of 10 slots between 8:00 a.m. and 5:00 p.m. and 5 slots between 5:01 and 11:49 p.m.

  2. Consumer B borrows 10 slots from Consumer A.

  3. At 5:01 p.m., the system reclaims 5 slots from Consumer B and allocates them to Consumer A.

A consumer’s borrow limit decreases for the new time interval

The system reclaims the number of slots required to conform to the new borrow limit, whether or not the lending consumer has unsatisfied demand.

  1. Consumer A has a borrow limit of 10 slots between 8:00 a.m. and 5:00 p.m. and 5 slots between 5:01 and 11:49 p.m.

  2. Consumer A borrows 10 slots from Consumer B.

  3. At 5:01 p.m., the system reclaims 5 slots from Consumer A to return to Consumer B.

A consumer’s share limit decreases

The system reclaims the number of slots required to conform to the new share limit, whether or not a competing consumer has unsatisfied demand.

  1. Consumer A has a share limit of 10 slots between 8:00 a.m. and 5:00 p.m. and 5 slots between 5:01 and 11:49 p.m.

  2. A share pool is configured for the consumer branch (the parent consumer and its children).

  3. At 5:01 p.m., the system reclaims 5 slots from Consumer A to return to the share pool.

The total number of slots in the share pool decreases

The system reclaims the number of slots needed to maintain share ratios whether or not a competing consumer has unsatisfied demand.

  1. Consumers A and B each have a share ratio of 1.

  2. The consumer branch owns 10 slots between 8:00 a.m. and 5:00 p.m. and 4 slots between 5:01 and 11:49 p.m.

  3. At 5:00 p.m., Consumer A runs workload on 5 slots, and Consumer B runs workload on 5 slots.

  4. At 5:01 p.m., Consumers A and B each return 3 slots to the share pool.

  5. During the new time interval, Consumer A runs workload on 2 slots and Consumer B runs workload on 2 slots.
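
The slot counts in the examples above follow from two small calculations, sketched here in Python. The function names are hypothetical and the sketch assumes default rebalancing, where limits are enforced as soon as the new time interval takes effect.

```python
def slots_over_limit(in_use: int, new_limit: int) -> int:
    """Slots reclaimed when an ownership, lend, borrow, or share limit drops."""
    return max(0, in_use - new_limit)


def slots_returned_to_pool(in_use: dict[str, int], new_pool: int, ratios: dict[str, int]) -> dict[str, int]:
    """Slots each consumer returns when the share pool shrinks."""
    total = sum(ratios.values())
    return {
        name: max(0, in_use[name] - new_pool * ratios[name] // total)
        for name in in_use
    }


# Lend limit drops from 10 to 5 while 10 slots are lent out.
lend_reclaim = slots_over_limit(10, 5)
# Share pool drops from 10 to 4 slots with a 1:1 ratio and 5 slots in use each.
pool_return = slots_returned_to_pool({"A": 5, "B": 5}, 4, {"A": 1, "B": 1})
```

The first call gives the 5 slots reclaimed in the lend limit example. The second gives 3 slots returned by each consumer, which leaves 2 slots each for the new interval.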


Configuration to modify resource reclaim behavior

Configuration to modify the reclaim grace period

You can configure a different reclaim grace period behavior for each consumer.
Important:

The borrowing consumer determines the reclaim grace period. When you configure borrowing and lending, ensure that the lending consumer can wait for the maximum reclaim grace period configured for all of its borrowing consumers.


Configuration source

Setting

Behavior

Platform Management Console: Consumers > Consumers & Plans > consumer_name > Consumer Properties > Reclaim behavior

Reclaim grace period=integer

Seconds | Minutes | Hours

  • Specifies the wait time before the system interrupts workload running on a borrowed or shared host to reclaim the resource.

  • To reclaim resources almost immediately, specify 0 seconds.

  • If you leave the reclaim grace period blank or specify 0, the system uses a default grace period of 0 seconds.

  • As a best practice, you should specify a realistic value that allows tasks from all of your applications enough execution time and time to clean up to avoid unnecessary interruption.

    Consider both the typical length of a workload unit run by a borrowing consumer and the urgency of workload demand from the lending consumer.
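
When sizing grace periods, it can help to estimate how long a lending consumer might wait. The Python sketch below is a rough upper-bound estimate under stated assumptions: it takes the longest grace period among the borrowing consumers and adds the 120 seconds of leeway that EGO allows by default. The function names and the consumer values are hypothetical.

```python
UNIT_SECONDS = {"Seconds": 1, "Minutes": 60, "Hours": 3600}
EGO_LEEWAY_S = 120  # default leeway after the grace period expires


def grace_seconds(value: int, unit: str) -> int:
    """Convert a grace period setting to seconds."""
    return value * UNIT_SECONDS[unit]


def longest_lender_wait(borrower_grace: dict[str, tuple[int, str]]) -> int:
    """Longest borrower grace period plus leeway, in seconds."""
    longest = max(grace_seconds(value, unit) for value, unit in borrower_grace.values())
    return longest + EGO_LEEWAY_S


# Hypothetical borrowers of one lending consumer.
wait_s = longest_lender_wait({"ConsumerB": (30, "Seconds"), "ConsumerC": (2, "Minutes")})
```

Here the lender may wait up to 240 seconds, because Consumer C's 2-minute grace period dominates.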


Configuration to modify system rebalancing behavior

You can configure system rebalancing behavior for each consumer.
Note:

Child consumers do not inherit the value set for the parent consumer.


Configuration source

Setting

Behavior

Platform Management Console: Consumers > Consumers & Plans > consumer_name > Consumer Properties > Reclaim behavior

Rebalance when time intervals change selected

  • (Default setting) Enforces ownership, share ratios, and borrowing, lending, and share limits for this consumer when the new time interval takes effect, regardless of consumer demand.

  • If corresponding lending and borrowing consumers have different rebalancing settings (one is selected and the other is deselected), the consumer with an over-allocation in the new time interval determines which setting the system uses, which determines whether rebalancing occurs.

Rebalance when time intervals change deselected

  • When deselected, the system waits until borrowed resources are returned before enforcing new ownership, share ratios, and borrowing, lending, and share limits for this consumer.


Configuration to modify reclaim behavior for shared resources

You can configure whether the system reclaims shared resources or waits until consumers release shared resources after completing workload tasks, and whether to enforce share ratios at the parent level.

Configuration source

Setting

Behavior

Platform Management Console: Cluster > Summary > Cluster Properties > Specify resource allocation behavior

Reclaim shared resources selected

  • (Default setting) Enables the system to reclaim resources from an over-allocated consumer when a consumer with a higher share ratio has unsatisfied demand.

Reclaim shared resources deselected

  • When deselected, the system does not reclaim shared resources.

ego.conf

EGO_PARENT_QUOTA=Y

  • Enforces share ratios at the parent level, which allows a leaf (child) consumer to have resources reclaimed from another consumer branch, based on the parent consumers’ share ratios.

    By default EGO_PARENT_QUOTA is set to N.

  • You must restart EGO on the master and all master candidates after modifying ego.conf.


Configuration to modify reclaim behavior for owned resources

By default, consumers borrow resources before their owned resources are reclaimed. You can modify this behavior so that lent resources are reclaimed before borrowing resources from another consumer. This is useful when a consumer’s owned resources have specific characteristics required to run the consumer’s workload, or when borrowing from a different consumer branch incurs costs based on charge-back policies at your site.

Configuration source

Setting

Behavior

Platform Management Console: Cluster > Summary > Cluster Properties > Specify resource allocation behavior

Reclaim lent resources before borrowing selected

  • Enables reclaim of owned resources before borrowing resources from other consumers.

Reclaim lent resources before borrowing deselected

  • (Default setting) When deselected, consumers with unsatisfied demand borrow resources from other consumers before having their owned resources reclaimed.


Configuration to enable selective reclaim

By default, the system will reclaim resources as quickly as possible, with minimum overhead. You can modify this behavior so that the system considers the relative importance of running work and reclaims resources from less important sessions first.

Configuration source

Setting

Behavior

Platform Management Console: Symphony Workload > Configure Applications

Open the application profile and edit the General Settings section.

Enable Selective Reclaim = true

  • From all the suitable hosts in the consumer, consider session importance (preemption rank or session priority) and preemption criteria to determine which resource to reclaim.

Enable Selective Reclaim = false

  • (Default setting) From all the suitable hosts in the consumer, pick a host at random and reclaim a resource from that host. Session importance (preemption rank or session priority) and preemption criteria determine which slot on the host is chosen.


Configuration to modify preemption rank

This parameter is ignored if priority scheduling is used.

By default, all sessions are considered to be of equal importance when the system is reclaiming resources. You can modify this behavior by ranking sessions in order of importance when you create a new session. When sessions have different ranks, the system may reclaim resources from the low-ranking sessions first.

Preemption rank is similar to session priority but it cannot be changed after the session has started, and it is not used when the system has to allocate resources, only when it has to reclaim them.

Configuration source

Setting

Behavior

Platform Management Console: Symphony Workload > Configure Applications

Open the application profile and edit the Session Type Definition section.

Preemption Rank = n

  • Specifies the preemption rank, a numerical value from 1 to 10000.

  • The default preemption rank is the lowest possible value, 1.

  • To help protect an important session from losing a resource, specify a higher rank for the session. When there are multiple resources that could be reclaimed, the system may reclaim resources used by lower-ranking sessions first.

  • If you do not enable selective reclaim, setting the preemption rank may not have a significant effect.


Configuration to modify preemption criteria

By default, the preemption criteria depends on the scheduling policy, and minimizes system overhead.

If you do not have selective reclaim enabled, changing the preemption criteria may not have a significant effect on reclaim behavior.

You can modify this behavior so that the system reclaims resources from recently started tasks first.

Configuration source

Setting

Behavior

Platform Management Console: Symphony Workload > Configure Applications

Open the application profile and edit the General Settings section.

Preemption Criteria = MostRecentTask

  • From all the candidate sessions, find the task with the least run time, and reclaim the resource it is using.

  • If multiple tasks have the same run time, choose one at random.

  • If multiple tasks run on a slot, consider the cumulative run time of all tasks on that slot.

Preemption Criteria = Default

and

Scheduling Policy = Proportional Scheduling or Minimum Services

  • (Default setting) Reclaim resources from the most over-allocated session first.

  • If multiple sessions are equally over-allocated, pick one at random.

  • If no sessions are over-allocated, pick the least under-allocated.

Preemption Criteria = Default

and

Scheduling Policy = Priority Scheduling

  • (Default setting) If multiple resources are available, pick one at random.


Resource reclaim interface

Actions to monitor

You can monitor resource reclaim through the Platform Management Console.


Platform Management Console option

Description

Resources > Monitor Resource Allocation

  • Displays a list of consumers along with each consumer’s current allocation of owned, shared, and borrowed slots and the consumer’s current demand


Actions to control

Once you have configured borrowing, lending, and sharing for your cluster, you cannot directly control or release reclaimed resources. When you modify the resource plan and click Apply, changes take effect immediately and could trigger resource reclaim.


User

Interface

Behavior

  • Cluster administrator (EGO)

From the command line:

egosh resource close -reclaim resource_name

  • Closes a resource, preventing further allocation. The system reclaims the host before it closes; running workload units terminate as per the configured grace period.

  • Application developer

Using the API:

onServiceInterrupt

  • Notifies the service that the service instance manager has sent an interrupt signal.


Actions to display configuration


User

Command

Behavior

  • Cluster administrator

  • Consumer administrator

From the Platform Management Console:

  • Consumers > Consumers & Plans > consumer_name > Consumer Properties > Reclaim behavior

  • Displays the settings for Reclaim grace period and Rebalance when time intervals change

  • Cluster administrator

  • Consumer administrator

From the Platform Management Console:

  • Consumers > Consumers & Plans > Resource Plan > Show Advanced Settings > Expand All

  • Displays the ownership, rank, lend, borrow, and share settings for all consumers

  • Cluster administrator

From the Platform Management Console:

  • Cluster > Summary > Cluster Properties > Specify resource allocation behavior

  • Displays the settings for Reclaim shared resources and Reclaim lent resources before borrowing

  • Cluster administrator

  • Consumer administrator

From the Platform Management Console Dashboard:

  • Symphony Workload > Monitor Workload > Application Properties

From the command line:

  • soamview app app_name -l

  • Displays the setting for Selective Reclaim

  • Cluster administrator

  • Consumer administrator

From the Platform Management Console Dashboard:

  • Symphony Workload > Monitor Workload > Application Properties

From the command line:

  • soamview app app_name -l

  • Displays the setting for Preemption Criteria

  • Cluster administrator

  • Consumer administrator

From the Platform Management Console Dashboard:

  • Symphony Workload > Monitor Workload > application_name > Session ID > Session Properties

From the command line:

  • soamview session application_name:session_ID -l

  • Displays the setting for Preemption Rank