Add an indication that jobs are blocked calling an external function

Description

Currently, if a workunit is blocked in an external call (e.g., spraying a file), its status is shown as running, but there is no indication of why it is not progressing.

It would be much better if the status were updated to indicate that the workunit is blocked, together with some indication of why.

An outline of a possible implementation:

  • For functions that are timed, (optionally) save some details in the call that obtains the initial start time.

  • Add a thread that periodically checks whether an external call has been waiting for more than <n> seconds and, if so, updates the workunit.

  • Clear the information on failure.
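
The outline above could look roughly like the following standalone C++17 sketch. It assumes nothing about the real workunit interfaces. `ExternalCallRegistry`, `ExternalCallGuard` and the `report` callback are hypothetical names; the callback stands in for updating the workunit state. A call registers a name and start time on entry. A watchdog thread periodically reports calls that have waited longer than the threshold. An RAII guard unregisters the call in its destructor, and that destructor also runs while an exception propagates.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <functional>
#include <map>
#include <mutex>
#include <string>
#include <thread>

using Clock = std::chrono::steady_clock;

// Hypothetical registry of in-flight external calls. report("") means "no longer blocked".
class ExternalCallRegistry {
public:
    using Reporter = std::function<void(const std::string &)>;
    ExternalCallRegistry(Clock::duration threshold, Reporter report)
        : threshold_(threshold), report_(report) {}
    ~ExternalCallRegistry() { stopWatchdog(); }

    // Per-call cost: one clock read and one map insert. `name` must be a literal.
    unsigned begin(const char *name) {
        std::lock_guard<std::mutex> lock(mutex_);
        unsigned id = nextId_++;
        active_.emplace(id, Entry{name, Clock::now(), false});
        return id;
    }

    // Unregister a call; clear the state if no other reported call remains.
    void end(unsigned id) {
        bool clear = false;
        {
            std::lock_guard<std::mutex> lock(mutex_);
            auto it = active_.find(id);
            assert(it != active_.end() && "end() called with an unknown id");
            clear = it->second.reported;
            active_.erase(it);
            for (const auto &entry : active_)
                clear = clear && !entry.second.reported;
        }
        if (clear) report_("");   // called outside the lock
    }

    // One watchdog pass; if several calls cross the threshold, the last is named.
    unsigned checkOnce(Clock::time_point now) {
        std::string reason;
        unsigned newlyReported = 0;
        {
            std::lock_guard<std::mutex> lock(mutex_);
            for (auto &[key, e] : active_) {
                if (!e.reported && now - e.start >= threshold_) {
                    e.reported = true;
                    reason = e.name;
                    ++newlyReported;
                }
            }
        }
        if (newlyReported > 0) report_(reason);   // called outside the lock
        return newlyReported;
    }

    // Background thread calling checkOnce() every `interval` until stopped.
    void startWatchdog(Clock::duration interval) {
        watchdog_ = std::thread([this, interval] {
            std::unique_lock<std::mutex> lock(stopMutex_);
            while (!stopCv_.wait_for(lock, interval, [this] { return stopping_; }))
                checkOnce(Clock::now());
        });
    }

    void stopWatchdog() {
        { std::lock_guard<std::mutex> lock(stopMutex_); stopping_ = true; }
        stopCv_.notify_all();
        if (watchdog_.joinable()) watchdog_.join();
    }

private:
    struct Entry { const char *name; Clock::time_point start; bool reported; };
    Clock::duration threshold_;
    Reporter report_;
    std::mutex mutex_;
    std::map<unsigned, Entry> active_;
    unsigned nextId_ = 0;
    std::thread watchdog_;
    std::mutex stopMutex_;
    std::condition_variable stopCv_;
    bool stopping_ = false;
};

// RAII guard: the destructor also runs during stack unwinding, so failures clear too.
class ExternalCallGuard {
public:
    ExternalCallGuard(ExternalCallRegistry &reg, const char *name) : reg_(reg), id_(reg.begin(name)) {}
    ~ExternalCallGuard() { reg_.end(id_); }
    ExternalCallGuard(const ExternalCallGuard &) = delete;
private:
    ExternalCallRegistry &reg_;
    unsigned id_;
};
```

To wrap an external call, you would declare `ExternalCallGuard guard(registry, "spray");` at the top of the function. Because the guard's destructor also runs during stack unwinding, this is one possible answer to the question of how to ensure the state is cleared on failure.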

Some complications:

  • How do we ensure that the state is cleared on failure?

  • What level of description is added to the state? It is important to minimize the overhead, so dynamically evaluating a string to put in the status on every call may be too expensive. That cost would determine whether details such as the spray filename could be included.

  • Should some calls (e.g., spray) unconditionally update the state as soon as they start?

  • If a call exceeds a threshold, a timestamp could be added to the workunit indicating when the spray started.

  • Thor may also have some long-running external calls, so a class that could be shared would be useful.
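
One way to keep the per-call cost low while still allowing a detailed reason, such as the spray filename, is to capture the description lazily. The string is then built only once a call has actually exceeded the threshold. The following minimal C++ sketch uses hypothetical names throughout (`LazyDescription`, `PendingCall`, `blockedReason`, `describeSpray`), and `SprayArgs` is a stand-in for whatever arguments the real call receives. Each call stores only a function pointer, a context pointer and a start time. The start time can double as the timestamp written to the workunit.

```cpp
#include <cassert>
#include <chrono>
#include <string>

using WallClock = std::chrono::system_clock;

// Hypothetical lazily evaluated description: storing it costs two pointers;
// formatting only happens if the call turns out to be slow.
struct LazyDescription {
    using Formatter = std::string (*)(const void *context);
    Formatter format = nullptr;
    const void *context = nullptr;   // must stay valid for the duration of the call

    std::string evaluate() const {
        return format ? format(context) : std::string("external call");
    }
};

struct PendingCall {
    LazyDescription description;
    WallClock::time_point started;   // wall-clock, so it can double as the workunit timestamp
};

// Returns an empty string until `threshold` has passed; only then is the
// description formatted.
std::string blockedReason(const PendingCall &call, WallClock::time_point now,
                          std::chrono::seconds threshold) {
    if (now - call.started < threshold)
        return {};
    auto waited = std::chrono::duration_cast<std::chrono::seconds>(now - call.started);
    return call.description.evaluate() + " (waiting " + std::to_string(waited.count()) + "s)";
}

// Example formatter for a spray call; SprayArgs is a stand-in for the real arguments.
struct SprayArgs {
    const char *logicalName;
};

std::string describeSpray(const void *context) {
    assert(context != nullptr);
    return std::string("spraying ") + static_cast<const SprayArgs *>(context)->logicalName;
}
```

Because the formatter reads through `context`, the arguments it points to must outlive the call. That is the main constraint this approach introduces.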

Conclusion

None

Activity


Richard Chapman April 26, 2017 at 9:11 AM

Static information about which external calls (and other things: SOAPCALLs, embedded C++, etc.) are present in a subgraph might help.

These things actually happen at the workflow item level...

Richard Chapman January 17, 2017 at 2:55 PM

There is already a "Blocked" state (together with a reason string) that is used when we are blocked waiting for a persist or waiting for Thor.

Adding a 'blocks' attribute to some external functions (analogous to 'timed') would allow us to support, fairly easily, the cases where a function is KNOWN to take a while.
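
A 'blocks' attribute would mainly change when the state is published. A call known to be slow could be reported on entry, while a merely timed call would only be reported once it has exceeded the threshold. The small C++ sketch below shows that decision. `CallAttr`, `CallState` and `shouldShowBlocked` are hypothetical names, not existing attributes or APIs.

```cpp
#include <chrono>

// Hypothetical attributes on an external function declaration.
enum class CallAttr { None, Timed, Blocks };

struct CallState {
    CallAttr attr;
    std::chrono::steady_clock::duration elapsed;   // time spent in the call so far
};

// Should the workunit currently show this call as the reason it is blocked?
bool shouldShowBlocked(const CallState &call, std::chrono::steady_clock::duration threshold) {
    switch (call.attr) {
    case CallAttr::Blocks:
        return true;                        // KNOWN to take a while: report on entry
    case CallAttr::Timed:
        return call.elapsed >= threshold;   // report only if it turns out to be slow
    default:
        return false;                       // not tracked, so never reported
    }
}
```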

The cases where a function unexpectedly takes a long time are harder. I'm not sure it is vital to minimize the overhead completely, though: if you have decided that the function is expensive enough to time, then the per-call overhead is probably not critical. For something like spray we certainly would not care.

What happens when one thread is blocked but others are not? Or two threads are blocked but we can only give information about one?
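
These questions have no single right answer. One possible policy, sketched here purely as an assumption, is to name the call that has been blocked longest and add a count of the others. The single reason string then still conveys that more than one thread is waiting. `BlockedCall` and `summarizeBlocked` are hypothetical names. The sketch does not decide whether the workunit should be shown as blocked at all while other threads are still making progress.

```cpp
#include <algorithm>
#include <chrono>
#include <string>
#include <vector>

// Hypothetical record of one blocked call (one per blocked thread).
struct BlockedCall {
    std::string description;
    std::chrono::steady_clock::time_point start;
};

// Build one reason string from any number of blocked calls: name the call that
// has waited longest, then count the rest. An empty result means "not blocked".
std::string summarizeBlocked(const std::vector<BlockedCall> &calls) {
    if (calls.empty())
        return {};
    auto oldest = std::min_element(calls.begin(), calls.end(),
        [](const BlockedCall &a, const BlockedCall &b) { return a.start < b.start; });
    std::string summary = oldest->description;
    auto others = calls.size() - 1;
    if (others == 1)
        summary += " (and 1 other blocked call)";
    else if (others > 1)
        summary += " (and " + std::to_string(others) + " other blocked calls)";
    return summary;
}
```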


Details

Compatibility: Minor

Created September 9, 2016 at 9:36 AM
Updated November 22, 2017 at 11:01 AM
