Resource reclaim provides a way for the system to reallocate borrowed or shared resources to a consumer when the consumer has workload demand under any of the following conditions:
A lending consumer has workload demand that requires the use of all slots owned by the lending consumer
Share ratios are configured, and an under-allocated consumer (a consumer that is not currently using its deserved number of shared slots) has workload demand that requires the use of more slots
A time-based resource plan has time intervals that change the number of owned resources, share ratios and limits, borrowing and lending policies, or borrowing and lending limits for one or more consumers
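The share-ratio condition above can be sketched in a few lines. This is a minimal illustration, not Symphony or EGO code: the function names, the consumer names, and the floor-rounding rule are assumptions made for the example.

```python
def deserved_slots(share_pool_size, ratios):
    """Split a share pool in proportion to share ratios.

    Rounding down is an assumption for this sketch; the text does not
    state how the system rounds fractional shares.
    """
    total = sum(ratios.values())
    return {name: share_pool_size * ratio // total for name, ratio in ratios.items()}


def under_allocated_with_demand(deserved, in_use, demand):
    """Return consumers that use fewer shared slots than they deserve
    and want more slots than they currently use."""
    return sorted(
        name
        for name in deserved
        if in_use.get(name, 0) < deserved[name]
        and demand.get(name, 0) > in_use.get(name, 0)
    )


# Hypothetical consumers A and B sharing 8 slots at a 1:3 ratio.
deserved = deserved_slots(8, {"A": 1, "B": 3})
in_use = {"A": 5, "B": 3}   # A currently holds more than its deserved share
demand = {"A": 5, "B": 6}   # B now needs its full share
candidates = under_allocated_with_demand(deserved, in_use, demand)
```

In this example consumer B is under-allocated and has demand, so slots held by A beyond its deserved share are candidates for reclaim.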
The system does not always return the same resource that the consumer originally lent. If workload is running on a borrowed resource, the system could reclaim a different physical resource (that meets the resource requirements) from the borrower and allocate that resource to the lending consumer in place of the original resource.
When the system reclaims a resource, it does the following:
Immediately sends an interrupt event to the service to notify it of the pending reclaim.
Allows the service the number of seconds specified in the reclaim grace period to complete processing before terminating the service instance. Tasks that were running on the service instance when it was terminated are requeued to their respective sessions. The default grace period is 0 seconds.
After the reclaim grace period expires, EGO allows a leeway of 120 seconds for the return of any reclaimed resources. This leeway accounts for network overhead and other considerations.
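The timing described above reduces to simple arithmetic. The following sketch assumes times are expressed in seconds on a single clock; the function and constant names are hypothetical and are not part of any Symphony or EGO API.

```python
LEEWAY_SECONDS = 120  # leeway EGO allows after the grace period, per the text


def reclaim_deadlines(interrupt_sent_at, grace_period_seconds=0):
    """Return (grace_expiry, latest_return) for one reclaim.

    grace_expiry  -- when the service instance may be terminated
    latest_return -- when the reclaimed resource is expected back at the latest
    The default grace period of 0 seconds matches the documented default.
    """
    if grace_period_seconds < 0:
        raise ValueError("grace period must not be negative")
    grace_expiry = interrupt_sent_at + grace_period_seconds
    return grace_expiry, grace_expiry + LEEWAY_SECONDS


default_case = reclaim_deadlines(0)        # default grace period of 0 seconds
with_grace = reclaim_deadlines(10, 30)     # interrupt at t=10 s, 30 s grace period
```

With the default grace period, the instance can be terminated as soon as the interrupt is sent, and the resource is expected back within the leeway.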
By default, the system does the following:
Rebalances resource allocation throughout the cluster when a new time interval begins.
Reclaims resources under the share model to meet unsatisfied demand within a consumer branch.
Borrows resources from other consumers before reclaiming lent resources.
The onServiceInterrupt service handler method provides the most effective way to manage an interruption caused by resource reclaim. Use of this method ensures that the service instance receives immediate notification of a pending interruption.
During a reclaim, the service interrupt event indicates how much time the service instance has to complete the currently running service method and to clean up. If the service method and cleanup do not complete within that time, Symphony terminates the instance. If the timeout has not expired, Symphony initiates cleanup after the currently running service method completes.
If a task is running and the Invoke method completes during the applied reclaim grace period, the result of that method is treated as it would be treated under normal circumstances.
If a task is running and the Invoke method does not complete before the applied reclaim grace period expires, the service instance on which the task is running is terminated and the task is requeued.
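The two outcomes above can be expressed as a small decision function. This is an illustrative model only: the names are hypothetical, and treating a method that finishes exactly at expiry as not completed is an assumption, since the text does not define the boundary case.

```python
from enum import Enum


class Outcome(Enum):
    RESULT_HANDLED_NORMALLY = "result treated as under normal circumstances"
    REQUEUED = "service instance terminated, task requeued"


def task_outcome(remaining_invoke_seconds, grace_period_seconds):
    """Decide what happens to a running task when its resource is reclaimed.

    remaining_invoke_seconds -- time the Invoke method still needs at interrupt time
    grace_period_seconds     -- applied reclaim grace period
    """
    # Assumption: Invoke must finish strictly before the grace period expires.
    if remaining_invoke_seconds < grace_period_seconds:
        return Outcome.RESULT_HANDLED_NORMALLY
    return Outcome.REQUEUED


finishes_in_time = task_outcome(5, 30)
runs_too_long = task_outcome(45, 30)
default_grace = task_outcome(5, 0)   # default grace period of 0 seconds
```

Under the default grace period of 0 seconds, any task that is still running is requeued, which is why a non-zero grace period matters for long-running Invoke calls.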
Another, less effective way to manage an interruption is for the service instance to periodically call the getLastInterruptEvent method to check for interrupt events. Because the service instance polls, it does not detect the interrupt immediately. While the service instance waits for its next poll, the reclaim grace period continues to elapse, so the service instance has less time to return a result or shut down gracefully.
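The cost of polling can be estimated with a short calculation. The sketch below does not call the Symphony API; the function names and the notion of a fixed polling interval are assumptions used to compare immediate notification with polling.

```python
def time_left_after_detection(grace_period_seconds, detection_delay_seconds):
    """Seconds of grace period remaining once the interrupt is noticed."""
    return max(0.0, grace_period_seconds - detection_delay_seconds)


def worst_case_time_left_when_polling(grace_period_seconds, poll_interval_seconds):
    """Worst case for polling: the interrupt arrives just after a poll,
    so it is noticed a full polling interval later."""
    return time_left_after_detection(grace_period_seconds, poll_interval_seconds)


# Immediate notification (as with a handler): no detection delay.
with_handler = time_left_after_detection(30, 0)
# Polling every 10 seconds against a 30-second grace period.
with_polling = worst_case_time_left_when_polling(30, 10)
# Polling less often than the grace period can leave no time at all.
slow_polling = worst_case_time_left_when_polling(30, 60)
```

If polling is the only option, keeping the polling interval well below the reclaim grace period limits how much of the grace period is lost before the service instance reacts.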
Resource reclaim for shared resources is enabled by default once you configure a share pool and share ratios for at least one consumer branch.
Consumers with workload demand can have lent resources reclaimed for them. When the system reclaims a resource, the system interrupts the borrower’s tasks running on the reclaimed resource. The reclaim grace period allows time for a task running on a borrowed slot to complete before the resource returns to its owner. To avoid being requeued, tasks must exit within the reclaim grace period.
By default, the system reclaims owned resources only after attempting to satisfy demand by borrowing resources from other lending consumers or from the share pool. You can change this behavior so that the system reclaims owned resources before allocating borrowed or shared resources.
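The difference between the two orderings can be illustrated with a simplified model. This sketch is not the scheduler's algorithm: it reduces each source to a slot count, and the function name and the reclaim_first flag are hypothetical.

```python
def satisfy_demand(demand, borrowable, reclaimable, reclaim_first=False):
    """Return how many slots come from borrowing/sharing and from reclaim.

    demand        -- slots the consumer still needs
    borrowable    -- slots available from lending consumers or the share pool
    reclaimable   -- owned slots currently lent out that could be reclaimed
    reclaim_first -- False models the default order (borrow, then reclaim)
    """
    sources = [("borrowed", borrowable), ("reclaimed", reclaimable)]
    if reclaim_first:
        sources.reverse()

    taken = {"borrowed": 0, "reclaimed": 0}
    remaining = demand
    for name, available in sources:
        used = min(remaining, available)
        taken[name] = used
        remaining -= used
    return taken


default_order = satisfy_demand(5, borrowable=3, reclaimable=4)
reclaim_order = satisfy_demand(5, borrowable=3, reclaimable=4, reclaim_first=True)
```

In the default order fewer borrower tasks are interrupted, because reclaim covers only the demand that borrowing cannot meet.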
With a time-based resource plan that specifies different values for ownership, lend and borrow limits, share ratios and limits, or total slots in the share pool, a transition from one time interval to the next can trigger resource reclaim. By default, the system enforces ownership and limits when the new time interval takes effect. The following examples illustrate how time interval changes trigger resource reclaim:
The borrowing consumer determines the reclaim grace period. When you configure borrowing and lending, ensure that the lending consumer can wait for the maximum reclaim grace period configured for all of its borrowing consumers.
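As a quick check when planning borrowing and lending, the longest wait a lending consumer should expect can be computed from its borrowers' grace periods. The consumer names and the function below are hypothetical and serve only to illustrate the rule stated above.

```python
def max_wait_for_lender(borrower_grace_periods):
    """Longest reclaim grace period (in seconds) among a lender's borrowers.

    borrower_grace_periods maps each borrowing consumer to its configured
    reclaim grace period. An empty mapping yields 0, the documented default.
    """
    return max(borrower_grace_periods.values(), default=0)


# Hypothetical borrowers of one lending consumer.
longest_wait = max_wait_for_lender({"/Batch": 120, "/Interactive": 30})
no_borrowers = max_wait_for_lender({})
```

If the lending consumer cannot tolerate this wait, the grace period of the slowest borrower is the value to revisit.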
Platform Management Console: Cluster > Summary > Cluster Properties > Specify resource allocation behavior
Once you have configured borrowing, lending, and sharing for your cluster, you cannot directly control or release reclaimed resources. When you modify the resource plan and click Apply, changes take effect immediately and could trigger resource reclaim.