Workflows Stuck on the AHP Server

While AnthillPro is a great product for build & deploy automations, there are problems that can occur/present themselves which cause workflows to be stuck in various statuses on the AnthillPro server. Not only is this seen with workflow cases, but it may happen with job traces, and command traces as well. These errors can occur if a persistant did not update within the database correctly, or if the step threw an error in a manner that caused processing to cease, and an "update" event could not be fired. Usually, the only way to resolve this issue is through database manipulation. Some examples of SQL statements that work are listed below with explanations.

Components

One thing to keep in mind is that while the following SQL will update the database with the respective value, there may still be instances of the values hanging around within anthill's memory. The only way to fully ensure that these values are cleared out would be to restart the server process.

Build Requests

Build requests may not update properly if the BuildRequestEndEvent does not fire. In that case, the SQL below will update the BuildRequest.

Type SQL
set all running build requests to a status UPDATE BUILD_REQUEST SET STATUS = 'XX' WHERE STATUS = 'YY'
set build requests with certain BuildRequest ID UPDATE BUILD_REQUEST SET STATUS = 'XX' WHERE ID = ZZ
SELECTOR (replace XX/YY/ZZ above with one of below)
Created New Build Life Delayed Build Determining Quiet Period
Failed Merged New Build Life Not Needed
Restarting Started Workflow Waiting on Dependency

Workflow Cases

Workflow cases may not update properly if the WorkflowEndEvent does not fire. In that case, the SQL below will update the WorkflowCase.

Type SQL
set all running workflows to a status UPDATE WORKFLOW_CASE SET STATUS = 'XX' WHERE STATUS = 'YY'
set workflows with a certain BuildLife ID UPDATE WORKFLOW_CASE SET STATUS = 'XX' WHERE BUILD_LIFE_ID = ZZ
SELECTOR (replace XX/YY/ZZ above with one of below)
Aborted Aborting Complete
Complete (Warning) Error Failed
Queued Restarting Running
Suspended Suspending Waiting on Agents

Job Traces, Job Step Traces, and Command Traces

Much like Workflow Cases, Job Traces can fail to update if an event to end the job does not fire. Also, while more rare than the previous, the step itself can still have a leftover status. Additionally, command traces, while usually hidden behind the scenes, these might also need updated if any of the prior failed.

Type SQL
set all running jobs to a status UPDATE JOB_TRACE SET STATUS = 'XX' WHERE STATUS = 'YY'
set all running step traces to a status UPDATE JOB_STEP_TRACE SET STATUS = 'XX' WHERE STATUS = 'YY'
set all running command traces to a status UPDATE JOB_STEP_CMD_TRACE SET STATUS = 'XX' WHERE STATUS = 'YY'
SELECTOR (replace XX/YY/ZZ above with one of below)
Aborted Aborting Failed
Not Needed Queued Resuming
Running Success Success (Warning)
Suspended Timed Out Waiting on Agents

Lockable Resources & Lock Holders

Lockable resources, whether defined or scripted, as well as any resource acquired & holding a lock, should be present in the locking tables of anthillpro. If the locks need cleared, and cannot be revoked manually (or there are simply too many to revoke) it is possible to clean these up via SQL.

Type SQL
clear all locks DELETE FROM LOCK_HOLDER_LOCK_RES;
DELETE FROM LOCK_HOLDER;

Appendix A: Kill All (Requires Server Restart)

Type SQL
kill everything

DELETE FROM LOCK_HOLDER_LOCK_RES;
DELETE FROM LOCK_HOLDER;
UPDATE JOB_TRACE SET STATUS = 'Aborted' WHERE STATUS IN ('Running','Restarting','Aborting','Waiting on Agents','Suspended');
UPDATE WORKFLOW_CASE SET STATUS = 'Aborted' WHERE STATUS IN ('Running','Restarting','Aborting','Waiting on Agents','Suspended');
UPDATE JOB_STEP_TRACE SET STATUS = 'Aborted' WHERE STATUS IN ('Running','Restarting','Aborting','Waiting on Agents','Suspended');
UPDATE JOB_STEP_CMD_TRACE SET STATUS = 'Aborted' WHERE STATUS IN ('Running','Restarting','Aborting','Waiting on Agents','Suspended');
UPDATE BUILD_REQUEST SET STATUS = 'Aborted' WHERE STATUS IN ('Running','Determining Quiet Period','NA','Restarting','Suspended');

If you would rather be persnickety about it to err on the side of caution, you can replace the IN clause with all the statuses listed above.

Appendix B: Component Identification

It can be hard to tell which SQL statement to run if you are unsure of which component is to blame. Therefore, listed below is a color-coded screenshot of what each status is.

Screenshot of a running build with sections highlighted showing the WorkflowCase, JobTrace, JobStepTrace, and JobStepCommandTrace