Early Thor k8s job failure (before the wuid has started) can deadlock the agent (workflow) while it waits for a response

Description

If the thormanager hits an exception whilst starting, it fails to notify the agent and exits cleanly.
As a result, the agent waits indefinitely for a workunit state change that never arrives.

Ensure the manager exception is relayed to the workunit, and avoid a clean exit code.
NB: if the thormanager job died for other reasons (e.g. k8s deleted it abruptly), the agent would notice and catch the k8s exit status.
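The intended fix can be illustrated with a minimal sketch. It assumes a simplified, hypothetical workunit (`FakeWorkUnit`, `WUState`) and a hypothetical entry point (`runManager`); none of these are the real HPCC Systems APIs. The point is the shape of the handling: any exception thrown during startup is recorded on the workunit and the state is moved to a failed value, so a watcher waiting on a state change is released, and a non-zero code is returned so the k8s job does not look like a clean exit.

```cpp
#include <cassert>
#include <exception>
#include <functional>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical workunit states; illustrative only, not the real workunit API.
enum class WUState { Waiting, Running, Failed };

// Hypothetical stand-in for the workunit the agent is watching.
struct FakeWorkUnit {
    WUState state = WUState::Waiting;
    std::vector<std::string> exceptions;  // messages the agent would read back
};

// Hypothetical manager entry point.
// On success the workunit moves to Running and 0 is returned.
// On any startup exception, the message is relayed to the workunit, the
// state is set to Failed (releasing anything waiting on a state change),
// and a non-zero exit code is returned instead of a clean exit.
int runManager(FakeWorkUnit &wu, const std::function<void()> &startup) {
    try {
        startup();
        wu.state = WUState::Running;
        return 0;
    } catch (const std::exception &e) {
        wu.exceptions.push_back(e.what());
    } catch (...) {
        wu.exceptions.push_back("unknown exception during manager startup");
    }
    wu.state = WUState::Failed;
    return 1;
}
```

In a real process the return value would be propagated out of `main` (e.g. `return runManager(wu, startup);`), so that k8s records a failed job rather than a successful one.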

Conclusion

None

Fixed

Details

Created November 22, 2023 at 1:29 PM
Updated November 23, 2023 at 5:33 PM
Resolved November 23, 2023 at 5:33 PM