Thor slave hang in countproject

Description

Very intermittent hang in the countproject activity when running with multiple slaves.
The regression test mergededup.ecl can reproduce it if run many times.

Conclusion

None

Activity

Jacob Cobbett-Smith August 23, 2016 at 4:32 PM

For this to happen, countproject on a slave must receive stop() before nextRow() has ever been called.
That can happen if the downstream activities stop quickly, e.g. via a CHOOSEN:
if the first slaves hit the CHOOSEN limit, the others will quickly be told to stop().

In that case, the semaphore in COUNTPROJECT::onInputFinished will not have been signalled (it is normally signalled by the first call to nextRow()).

This means countproject::stop() will deadlock when called: nextRow() will now never signal the semaphore, and the call to signalNext() in countproject::stop() comes after PARENT::stop(), which is stuck in the lookahead waiting on the semaphore in onInputFinished.

Mark Kelly August 22, 2016 at 2:19 PM

Closing to reissue a similar fix for 6.0.6.

Mark Kelly July 15, 2016 at 6:28 PM

Basically, the thread is stuck in join because stop() was called before the semaphore was posted.

Mark Kelly July 15, 2016 at 6:28 PM

Should this go into 6.0.4?

Fixed

Details

Created July 15, 2016 at 6:22 PM
Updated August 24, 2016 at 12:57 PM
Resolved August 24, 2016 at 12:57 PM