Thor crash in FileIO->close() since 7.12.32

Description

Since 7.12.32 Thor crashes often in:
#0 Release<IFileIO> (ptr=0x0) at /home/centos/mkelly/HPCC-Platform/thorlcr/activities/./../../system/jlib/jscm.hpp:46
#1 ~Shared (this=<optimized out>, __in_chrg=<optimized out>) at /home/centos/mkelly/HPCC-Platform/thorlcr/activities/./../../system/jlib/jscm.hpp:62
#2 CDiskWriteSlaveActivityBase::close (this=this@entry=0x1080790) at /home/centos/mkelly/HPCC-Platform/thorlcr/activities/thdiskbaseslave.cpp:442
#3 0x00007f68654d23d4 in CDiskWriteSlaveActivityBase::process (this=0x1080790) at /home/centos/mkelly/HPCC-Platform/thorlcr/activities/thdiskbaseslave.cpp:573
#4 0x00007f68653fa705 in ProcessSlaveActivity::threadmain (this=0x1080790) at /home/centos/mkelly/HPCC-Platform/thorlcr/slave/slave.cpp:87
#5 0x00007f685f7ecdf4 in CThreadedPersistent::threadmain (this=0x10809b0) at /home/centos/mkelly/HPCC-Platform/system/jlib/jthread.cpp:591
#6 0x00007f685f7f3c10 in non-virtual thunk to CThreadedPersistent::CAThread::run() () at /home/centos/mkelly/HPCC-Platform/system/jlib/jthread.hpp:181
#7 0x00007f685f7ee118 in Thread::begin (this=0x10809b0) at /home/centos/mkelly/HPCC-Platform/system/jlib/jthread.cpp:292
#8 0x00007f685f7ed79d in Thread::_threadmain (v=0x10809b0) at /home/centos/mkelly/HPCC-Platform/system/jlib/jthread.cpp:138

(gdb) p tmpFileIO
$1 = {<Shared<IFileIO>> = {ptr = 0x0}, <No data fields>}

There was a change in 7.12.32 related to this area:
HPCC-25345 Avoid losing write close exceptions

Conclusion

None

Activity

Mark Kelly April 12, 2021 at 12:08 PM

ok, thx. I will issue PR for this today.

Jacob Cobbett-Smith April 12, 2021 at 9:23 AM

I'm guessing it's because it failed (exception thrown) in open(), and never set outputIO. close() should guard against outputIO being unset.
Do you have the workunit and/or logs? - combined with the stack above, that should clarify the circumstances.

But anyway, from the stack:

that does mean it was handling an exception - it could have been in open() and could have been before outputIO was set - could test by taking an exception after the createMultipleWrite call.

Should add a check around the code that's dealing with outputIO in close(), e.g.:
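
A minimal sketch of the kind of guard being described, not the actual fix. FileIO and DiskWriter are hypothetical stand-ins for IFileIO and CDiskWriteSlaveActivityBase, std::shared_ptr stands in for the jlib smart pointers, and failInOpen is an invented test hook that simulates open() throwing before outputIO is set:

```cpp
#include <memory>
#include <stdexcept>
#include <utility>

// Hypothetical stand-in for the real IFileIO interface.
struct FileIO {
    bool closed = false;
    void close() { closed = true; }
};

// Hypothetical stand-in for CDiskWriteSlaveActivityBase.
struct DiskWriter {
    std::shared_ptr<FileIO> outputIO; // stays null if open() throws early
    bool failInOpen = false;          // invented hook to simulate the failure

    void open() {
        if (failInOpen)
            throw std::runtime_error("open failed before outputIO was set");
        outputIO = std::make_shared<FileIO>();
    }

    // Returns true only if a file was actually closed.
    bool close() {
        if (!outputIO)
            return false; // open() never got as far as setting outputIO
        // Take ownership locally so outputIO is already cleared if close() throws.
        std::shared_ptr<FileIO> tmpFileIO = std::move(outputIO);
        tmpFileIO->close();
        return true;
    }

    void process() {
        try {
            open();
        } catch (...) {
            close(); // cleanup path: must tolerate an unset outputIO
            throw;
        }
        close();
    }
};
```

With failInOpen set, process() rethrows the original exception and close() returns false rather than touching a null pointer.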

Mark Kelly April 12, 2021 at 2:10 AM
Edited

Are you thinking of a simple protection, as in -

Or bypassing a lot of code in CDiskWriteSlaveActivityBase::open(), as in -

Jacob Cobbett-Smith April 6, 2021 at 10:48 AM

I think you said you were planning to look at this/add a check, assigning to you for now.

Fixed

Created April 5, 2021 at 9:04 PM
Updated April 15, 2021 at 3:42 PM
Resolved April 15, 2021 at 3:42 PM