Resource reclaim provides a way for the system to reallocate borrowed or shared resources to a consumer when the consumer has workload demand under any of the following conditions:
A lending consumer has workload demand that requires slots owned by the lending consumer
Share ratios are configured, and an under-allocated consumer (a consumer that is not currently using its deserved number of shared slots) has workload demand that requires the use of more slots
A time-based resource plan has time intervals that change the number of owned resources, share ratios and limits, borrowing and lending policies, and borrowing and lending limits for one or more consumers
The system does not always return the same resource that the consumer originally lent. If workload is running on a borrowed resource, the system could reclaim a different physical resource (that meets the resource requirements) from the borrower and allocate that resource to the lending consumer in place of the original resource.
Resource reclaim is enabled whenever you enable lending or borrowing for leaf consumers that own resources. By default, the system:
Immediately sends an interrupt event to the service to notify it of the pending reclaim.
Allows the service the number of seconds specified in the reclaim grace period to complete processing before terminating the service instance. Tasks that were running on the service instance before it was killed are requeued to their respective sessions. The default grace period is 0 seconds.
After the reclaim grace period expires, EGO allows an additional 120 seconds of leeway for the return of any reclaimed resources. This accounts for network overhead and other delays.
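The default sequence above can be read as a simple timeline. The following is a minimal sketch of that arithmetic only; the names `ReclaimTimeline` and `reclaim_timeline` are hypothetical and are not part of any Symphony or EGO API.

```python
from dataclasses import dataclass

# Leeway that EGO allows after the grace period expires (value taken from the text).
LEEWAY_SECONDS = 120.0


@dataclass
class ReclaimTimeline:
    interrupt_at: float      # when the interrupt event is sent to the service
    grace_expires_at: float  # when the service instance may be terminated
    return_deadline: float   # latest time the resource is expected to be returned


def reclaim_timeline(reclaim_start: float, grace_period: float = 0.0) -> ReclaimTimeline:
    """Build the timeline for one reclaim; all values are in seconds."""
    if grace_period < 0:
        raise ValueError("grace period cannot be negative")
    grace_expires_at = reclaim_start + grace_period
    return ReclaimTimeline(
        interrupt_at=reclaim_start,  # the interrupt is sent immediately
        grace_expires_at=grace_expires_at,
        return_deadline=grace_expires_at + LEEWAY_SECONDS,
    )


print(reclaim_timeline(0.0).return_deadline)                     # 120.0
print(reclaim_timeline(0.0, grace_period=30.0).return_deadline)  # 150.0
```

With the default grace period of 0 seconds, the service instance can be terminated as soon as the interrupt is sent, so only the leeway remains.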
The onServiceInterrupt service handler method provides the most effective way to manage an interruption caused by resource reclaim. Use of this method ensures that the service instance receives immediate notification of a pending interruption.
During a reclaim, the service interrupt indicates how much time the service instance has to complete the currently running service method and to clean up. If the service method and cleanup do not complete within that time, Symphony terminates the instance. If the timeout has not expired, Symphony initiates cleanup after the currently running service method completes.
If a task is running and the Invoke method completes during the applied reclaim grace period, the result of that method is treated as it would be treated under normal circumstances.
If a task is running and the Invoke method does not complete before the applied reclaim grace period expires, the service instance on which the task is running is terminated and the task is requeued.
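The two outcomes described above reduce to one comparison. This is an illustrative sketch, not Symphony code: `Outcome` and `task_outcome` are hypothetical names, and treating a method that finishes exactly when the grace period expires as completed is an assumption.

```python
from enum import Enum


class Outcome(Enum):
    RESULT_RETURNED = "result handled as under normal circumstances"
    REQUEUED = "service instance terminated, task requeued"


def task_outcome(remaining_invoke_seconds: float, grace_period_seconds: float) -> Outcome:
    """Decide what happens to a running task when its resource is reclaimed.

    remaining_invoke_seconds: time the Invoke method still needs when the
    interrupt arrives.
    """
    # Assumption: finishing exactly at expiry counts as finishing in time.
    if remaining_invoke_seconds <= grace_period_seconds:
        return Outcome.RESULT_RETURNED
    return Outcome.REQUEUED


print(task_outcome(5.0, 10.0).name)  # RESULT_RETURNED
print(task_outcome(5.0, 0.0).name)   # REQUEUED
```

The second call shows the effect of the default grace period of 0 seconds: any task that still needs time is requeued.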
Another, less effective way to manage an interruption is for the service instance to periodically call the getLastInterruptEvent method to check for interrupt events. Because the service instance polls, it does not detect the interrupt immediately. The reclaim grace period keeps running between polls, so the service instance has less time to return a result or shut down gracefully.
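To make the cost of polling concrete, the sketch below estimates how much of the grace period is left when a polling service instance first notices the interrupt. It models timing only and does not call getLastInterruptEvent; the function names and the fixed polling interval are assumptions.

```python
def usable_grace_with_handler(grace_period: float) -> float:
    """With an interrupt handler, notification is immediate, so the whole grace period is usable."""
    return grace_period


def usable_grace_with_polling(grace_period: float, poll_interval: float,
                              seconds_since_last_poll: float) -> float:
    """Grace period left when the next poll detects the interrupt.

    seconds_since_last_poll: how long ago the instance last polled when the
    interrupt arrived (0 <= value < poll_interval).
    """
    if not 0 <= seconds_since_last_poll < poll_interval:
        raise ValueError("seconds_since_last_poll must be within one poll interval")
    detection_delay = poll_interval - seconds_since_last_poll  # wait for the next poll
    return max(0.0, grace_period - detection_delay)


print(usable_grace_with_handler(30.0))             # 30.0
print(usable_grace_with_polling(30.0, 10.0, 4.0))  # 24.0
print(usable_grace_with_polling(5.0, 10.0, 0.0))   # 0.0
```

The last call shows the worst case: when the polling interval is longer than the grace period, the instance may not see the interrupt before the grace period has expired.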
Resource reclaim for shared resources is enabled by default once you configure a share pool and share ratios for at least one consumer branch.
When the system must reclaim a resource from a consumer, and there are multiple possibilities for which resource could be reclaimed, these steps describe how your configuration choices help to determine exactly which task will be interrupted and which resource will be reclaimed.
Session importance (preemption rank or session priority) and preemption criteria are always potential influences, but the selective reclaim configuration is the most important parameter because it determines whether the other parameters can influence host selection or not. Note that selective reclaim can only be enabled if "Optimized for application specified conditions" (default setting) is configured through the PMC. If selective reclaim is disabled, the system will still select the "best" slot on a host, but it may appear that resource selection happens at random because there is no effort to select the "best" host among multiple candidates.
The system chooses the resource using the following logic.
Consider selective reclaim configuration.
If selective reclaim is disabled, reclaim resources as quickly as possible, with minimum overhead. This is the default.
EXAMPLE: if multiple hosts in the consumer could meet the resource requirement, the system selects any one at random.
If selective reclaim is enabled, reclaim resources from the less important sessions first. This option has greater overhead.
EXAMPLE: if multiple hosts in the consumer could meet the resource requirement, the system selects all candidate hosts.
For proportional or minimum services scheduling, consider preemption rank. For priority scheduling, consider session priority instead of preemption rank.
With proportional or minimum services scheduling:
From the host or hosts selected, select the least important session, according to preemption rank.
If multiple sessions have equal low rank, select all candidate sessions.
If the resource requirement is for an exclusive host, treat all sessions on a host as if they had the same rank as the most important session using the host.
With priority scheduling:
From the host or hosts selected, select the least important session, according to session priority.
If multiple sessions have equal low priority, select the most recently started session.
If the resource requirement is for an exclusive host, treat all sessions on a host as if they had the same priority as the most important session using the host.
Consider preemption criteria.
If the criteria is MostRecentTask, reclaim resources from the most recently submitted tasks first.
EXAMPLE: from one or more sessions, the system selects the most recently started task and reclaims the resource it is using.
If multiple tasks have the same run time, the system selects any one at random.
If multiple tasks run on a slot, consider the cumulative run time of all tasks using the slot.
If the criteria is PolicyDefault, the behavior changes depending on the scheduling policy. This is the default setting for the parameter.
With proportional or minimum services scheduling:
The default is to reclaim resources from the most over-allocated sessions first. This is the option with minimum overhead.
EXAMPLE: from multiple sessions, the system selects the most over-allocated session, and reclaims a resource it is using (task selection is random).
If multiple sessions are equally over-allocated, the system selects any one at random.
If no session is over-allocated, select the least under-allocated instead.
With priority scheduling:
The default is to select a task from the session with the lowest priority, followed by tasks from the most recently started session. This is the option with minimum overhead.
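The selection logic above can be sketched as a three-stage filter. This is a simplified model for proportional or minimum services scheduling only, not the scheduler's actual implementation. The `Task` and `Session` types and the `pick_task` function are hypothetical, a higher preemption rank is taken to mean a more important session, and the exclusive-host rule and slots that run multiple tasks are left out.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Session:
    name: str
    rank: int             # preemption rank; higher = more important (assumption)
    over_allocation: int  # slots beyond the deserved number; negative = under-allocated


@dataclass(frozen=True)
class Task:
    name: str
    host: str
    session: str
    run_time: float       # seconds the task has been running


def pick_task(tasks, sessions, selective_reclaim, criteria, rng):
    """Pick the task to interrupt, following the three considerations in order."""
    # 1. Selective reclaim: keep every candidate host, or pick one at random.
    hosts = sorted({t.host for t in tasks})
    if not selective_reclaim:
        hosts = [rng.choice(hosts)]
    candidates = [t for t in tasks if t.host in hosts]

    # 2. Preemption rank: keep only tasks of the least important sessions.
    lowest = min(sessions[t.session].rank for t in candidates)
    candidates = [t for t in candidates if sessions[t.session].rank == lowest]

    # 3. Preemption criteria.
    if criteria == "MostRecentTask":
        shortest = min(t.run_time for t in candidates)
        return rng.choice([t for t in candidates if t.run_time == shortest])
    # PolicyDefault: most over-allocated session (or the least under-allocated
    # one if none is over-allocated); the task within it is chosen at random.
    worst = max(sessions[t.session].over_allocation for t in candidates)
    return rng.choice([t for t in candidates
                       if sessions[t.session].over_allocation == worst])


sessions = {
    "critical": Session("critical", rank=5, over_allocation=0),
    "batch": Session("batch", rank=1, over_allocation=2),
}
tasks = [
    Task("t1", "hostA", "critical", run_time=600.0),
    Task("t2", "hostB", "batch", run_time=300.0),
    Task("t3", "hostB", "batch", run_time=20.0),
]
chosen = pick_task(tasks, sessions, True, "MostRecentTask", random.Random(0))
print(chosen.name)  # t3
```

In this model, disabling selective reclaim can land on hostA at stage 1, in which case the task from the critical session is interrupted even though less important work is running on hostB.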
An application may be a candidate for selective reclaim when it may need to borrow slots from other consumers and has critical or long running tasks that you do not want interrupted.
Here are some considerations when using selective reclaim.
Are there any critical tasks in the application? If the answer is yes, configure a high preemption rank for critical sessions to protect critical tasks from being interrupted. Otherwise, leave all preemption ranks at the same level. (This applies only to the proportional and minimum services scheduling policies. For the priority scheduling policy, the session priority is used.)
If there are long running tasks (not critical ones), set preemption criteria to MostRecentTask so that when reclaim happens, the CPU time of long running tasks is not lost.
If all the tasks are short running, set preemption criteria to PolicyDefault (the default setting) for better SSM performance.
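The considerations above amount to a small decision table. The helper below is one reading of that guidance, not a product tool; `recommend_settings` and its dictionary keys are hypothetical, and letting the presence of long-running tasks alone decide the preemption criteria is an assumption.

```python
def recommend_settings(has_critical_tasks: bool, has_long_running_tasks: bool) -> dict:
    """Suggest preemption settings for an application that uses selective reclaim."""
    return {
        # Protect critical sessions with a high rank; otherwise keep ranks equal.
        "preemption_rank": ("high for critical sessions" if has_critical_tasks
                            else "same for all sessions"),
        # MostRecentTask spares tasks that have already used a lot of CPU time.
        "preemption_criteria": ("MostRecentTask" if has_long_running_tasks
                                else "PolicyDefault"),
    }


print(recommend_settings(has_critical_tasks=True, has_long_running_tasks=False))
# {'preemption_rank': 'high for critical sessions', 'preemption_criteria': 'PolicyDefault'}
```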
Consumers with workload demand can have lent resources reclaimed for them. When the system reclaims a resource, the system interrupts the borrower’s tasks running on the reclaimed resource. The reclaim grace period allows time for a task running on a borrowed slot to complete before the resource returns to its owner. To avoid being requeued, tasks must exit within the reclaim grace period.
By default, the system reclaims owned resources only after attempting to satisfy demand by borrowing resources from other lending consumers or from the share pool. You can change this behavior so that the system reclaims owned resources before allocating borrowed or shared resources.
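The difference between the two orderings can be shown with simple arithmetic. The sketch below is an illustration under stated assumptions, not the allocation algorithm itself: `allocation_plan` and its parameters are hypothetical, and `borrowable` stands for slots currently available from lending consumers or the share pool.

```python
def allocation_plan(demand: int, borrowable: int, lent_out: int,
                    reclaim_first: bool = False) -> tuple:
    """Return (slots_to_borrow, slots_to_reclaim) for a consumer needing `demand` more slots.

    lent_out: owned slots currently lent to other consumers.
    """
    if reclaim_first:
        # Changed behavior: take back owned slots before borrowing or sharing.
        reclaim = min(demand, lent_out)
        borrow = min(demand - reclaim, borrowable)
    else:
        # Default behavior: borrow or share first, reclaim only the shortfall.
        borrow = min(demand, borrowable)
        reclaim = min(demand - borrow, lent_out)
    return borrow, reclaim


print(allocation_plan(demand=5, borrowable=3, lent_out=4))                      # (3, 2)
print(allocation_plan(demand=5, borrowable=3, lent_out=4, reclaim_first=True))  # (1, 4)
```

In this example the default ordering interrupts two borrowed slots, while reclaiming first interrupts four.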
With a time-based resource plan that specifies different values for ownership, lend and borrow limits, share ratios and limits, or total slots in the share pool, a transition from one time interval to the next can trigger resource reclaim. By default, the system enforces ownership and limits when the new time interval takes effect. The following examples illustrate how time interval changes trigger resource reclaim:
The borrowing consumer determines the reclaim grace period. When you configure borrowing and lending, ensure that the lending consumer can wait for the maximum reclaim grace period configured for all of its borrowing consumers.
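As a planning aid, the longest wait a lending consumer should expect is the maximum grace period among its borrowers. The sketch below is illustrative; the consumer names and both function names are made up, and the additional 120 seconds of leeway is not included in the result.

```python
def lender_max_wait(borrower_grace_periods: dict) -> float:
    """Longest reclaim grace period (seconds) among a lender's borrowing consumers."""
    return max(borrower_grace_periods.values(), default=0.0)


def lender_can_tolerate(borrower_grace_periods: dict, tolerable_wait: float) -> bool:
    """True if the lender can wait for every borrower's grace period to run out."""
    return lender_max_wait(borrower_grace_periods) <= tolerable_wait


# Hypothetical borrowers and their configured grace periods, in seconds.
borrowers = {"riskApp": 30.0, "reportApp": 300.0}
print(lender_max_wait(borrowers))             # 300.0
print(lender_can_tolerate(borrowers, 120.0))  # False
```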
Platform Management Console: Cluster > Summary > Cluster Properties > Specify resource allocation behavior
Platform Management Console: Symphony Workload > Configure Applications. Open the application profile and edit the General Settings section.
This parameter is ignored if priority scheduling is used.
By default, all sessions are considered to be of equal importance when the system is reclaiming resources. You can modify this behavior by ranking sessions in order of importance when you create a new session. When sessions have different ranks, the system may reclaim resources from the low-ranking sessions first.
By default, the preemption criteria depends on the scheduling policy, and minimizes system overhead.
If you do not have selective reclaim enabled, changing the preemption criteria may not have a significant effect on reclaim behavior.
Once you have configured borrowing, lending, and sharing for your cluster, you cannot directly control or release reclaimed resources. When you modify the resource plan and click Apply, changes take effect immediately and could trigger resource reclaim.