Timeout Expired Message

JMS timeouts can occur when communication between the server and its agent(s) stops. A heartbeat signal (roughly every 15 to 30 seconds) is used to determine whether an agent is still online and responsive. If no response is received, the agent is marked offline.
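
The heartbeat check described above can be pictured with a minimal sketch. This is not AnthillPro's actual implementation: the class name `HeartbeatMonitor`, its methods, and the 30-second default are hypothetical, chosen only to illustrate how an agent ends up marked offline once its last reply is older than the allowed window.

```python
import time


class HeartbeatMonitor:
    """Illustrative only: tracks the time of the last heartbeat reply per agent."""

    def __init__(self, timeout_seconds=30.0, clock=time.monotonic):
        # timeout_seconds is an assumed value, not a documented AnthillPro setting.
        self.timeout_seconds = timeout_seconds
        self.clock = clock
        self.last_reply = {}

    def record_reply(self, agent_name):
        """Remember when this agent last answered a heartbeat."""
        self.last_reply[agent_name] = self.clock()

    def is_online(self, agent_name):
        """An agent is online if it replied within the timeout window."""
        last = self.last_reply.get(agent_name)
        if last is None:
            return False
        return (self.clock() - last) <= self.timeout_seconds

    def offline_agents(self):
        """Names of known agents whose last reply is too old."""
        return sorted(name for name in self.last_reply if not self.is_online(name))
```

An overloaded agent that is still doing work but cannot answer in time looks exactly like a dead agent to a check of this kind, which is why several of the scenarios below end in the same message.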

ERROR - com.urbancode.devilfish.services.jms.ReplyTimeoutException: Timeout expired: 10000ms
ERROR - Timeout expired: 10000ms
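
To see how often the message occurs, a log file can be scanned for it. The following is a rough sketch that assumes the message format shown above; the function name `find_timeouts` is hypothetical, and the third sample line is made up for illustration.

```python
import re

# Matches the "Timeout expired: <N>ms" text shown in the log lines above.
TIMEOUT_PATTERN = re.compile(r"Timeout expired: (\d+)ms")


def find_timeouts(lines):
    """Return (line_number, timeout_ms) for each line reporting an expired timeout."""
    hits = []
    for number, line in enumerate(lines, start=1):
        match = TIMEOUT_PATTERN.search(line)
        if match:
            hits.append((number, int(match.group(1))))
    return hits


sample = [
    "ERROR - com.urbancode.devilfish.services.jms.ReplyTimeoutException: Timeout expired: 10000ms",
    "ERROR - Timeout expired: 10000ms",
    "INFO - unrelated line",  # invented line, not from a real log
]
```

Passing an open log file instead of `sample` works the same way, since a file object yields its lines one at a time.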

Some scenarios where the "Timeout Expired" message can appear:

Agent Work Overload

If an agent becomes overloaded, it may temporarily be unable to communicate with the server. Typically the agent is asked to run a step while it is already busy with other work. It proceeds to run the step, but it is so heavily loaded that it does not have enough resources to report the status of its command back to the server. The AnthillPro server then treats the agent as unresponsive and times out and fails the step, regardless of whether the agent completes the work later and returns the logs and other information to the server (usually indicated by a 403 or 401 error in the agent logs).

Server Work Overload

TBA

Network Blips

On an unstable network, the following error message can appear frequently when connections are flaky or being dropped:

Connection reset by peer: socket write error
  • NOTE: This error is seen mainly when SSL is in use.

Network Saturation

TBA

Firewall Issues

You may see errors like the following if a firewall is blocking the connection:

java.net.ConnectException: Connection refused
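
One way to narrow this down is to test, from the agent machine, whether a plain TCP connection to the server can be opened at all. The sketch below is a generic check and not an AnthillPro tool; the function name `probe` is hypothetical, and the host and port you pass in must be the ones your own installation uses.

```python
import socket


def probe(host, port, timeout_seconds=5.0):
    """Try to open a TCP connection and describe what happened."""
    try:
        with socket.create_connection((host, port), timeout=timeout_seconds):
            return "open"
    except ConnectionRefusedError:
        # The connection attempt was actively rejected.
        return "refused"
    except socket.timeout:
        # No answer at all within the timeout.
        return "timed out"
    except OSError as error:
        # Anything else, such as an unresolvable host name.
        return f"error: {error}"


# Example call (placeholder host and port, replace with your own):
# print(probe("anthill-server.example.com", 12345))
```

A result of "refused" matches the exception shown above, while "timed out" suggests that packets are being silently discarded somewhere along the path.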

Deadlocks

TBA