Fixed
Details
Components
Assignee: Jacob Cobbett-Smith
Reporter: Russell Wagner III
Priority: Major
Compatibility: Minor
Fix versions
Pull Request URL
Affects versions
Created November 29, 2018 at 11:21 PM
Updated December 14, 2018 at 12:10 PM
Resolved December 14, 2018 at 12:10 PM
The job is segfaulting the thorslave:
The thorslave segfaulted on this job. Looks like it generated core files as well.
TS logs
10.173.71.44:/var/lib/HPCCSystems/thor100_71_5/thorslave.94.2018_11_29.log
0032F51B 2018-11-29 16:17:00.714 189606 1471742 "recvLoop - received bcast_stop, from : node=3, slave=3 - activity(ch=0, smartjoin, 1317)"
0032F51C 2018-11-29 16:17:00.785 189606 1471729 "clearNonLocalRows[slave=2], numCommitted=87621, totalRows(inc uncommitted)=87646, flushMarker=0 - activity(ch=0, smartjoin, 1317)"
0032F51D 2018-11-29 16:17:00.789 189606 1471729 "clearAllNonLocalRows(100): CThorSpillableRowArray::save (skipNulls=true, emptyRowSemantics=0) max rows = 87621 - activity(ch=0, smartjoin, 1317)"
0032F51E 2018-11-29 16:17:00.802 189606 1471729 "clearAllNonLocalRows(100): CThorSpillableRowArray::save done, rows written = 87621, bytes = 1051452 - activity(ch=0, smartjoin, 1317)"
0032F51F 2018-11-29 16:17:00.944 189606 1471729 "clearNonLocalRows[slave=2], numCommitted=67, totalRows(inc uncommitted)=87711, flushMarker=0 - activity(ch=0, smartjoin, 1317)"
0032F520 2018-11-29 16:17:00.944 189606 1471729 "================================================"
0032F521 2018-11-29 16:17:00.944 189606 1471729 "Program: 10.173.71.44:/mnt/disk1/HPCCSystems/bin/thorslave_lcr"
0032F522 2018-11-29 16:17:00.944 189606 1471729 "Signal: 11 Segmentation fault"
0032F523 2018-11-29 16:17:00.944 189606 1471729 "Fault IP: 00007FF65D81EF22"
0032F524 2018-11-29 16:17:00.944 189606 1471729 "Accessing: 0000000000000000"
0032F525 2018-11-29 16:17:00.944 189606 1471729 "Backtrace:"
0032F526 2018-11-29 16:17:00.961 189606 1471729 " /var/lib/HPCCSystems/queries/thor100_71_5_24200/V4167451050_libW20181129-154014.so(+0x6d3f22) [0x7ff65d81ef22]"
0032F527 2018-11-29 16:17:00.961 189606 1471729 " /opt/HPCCSystems/lib/libactivityslaves_lcr.so(+0xe44ab) [0x7ff669eca4ab]"
0032F528 2018-11-29 16:17:00.961 189606 1471729 " /opt/HPCCSystems/lib/libactivityslaves_lcr.so(+0xe45e3) [0x7ff669eca5e3]"
0032F529 2018-11-29 16:17:00.961 189606 1471729 " /opt/HPCCSystems/lib/libroxiemem.so(+0x16432) [0x7ff66488a432]"
0032F52A 2018-11-29 16:17:00.961 189606 1471729 " /opt/HPCCSystems/lib/libroxiemem.so(+0x16660) [0x7ff66488a660]"
0032F52B 2018-11-29 16:17:00.962 189606 1471729 " /opt/HPCCSystems/lib/libjlib.so(_ZN6Thread5beginEv+0x2c) [0x7ff6645c5cbc]"
0032F52C 2018-11-29 16:17:00.962 189606 1471729 " /opt/HPCCSystems/lib/libjlib.so(_ZN6Thread11_threadmainEPv+0x1e) [0x7ff6645c768e]"
0032F52D 2018-11-29 16:17:00.962 189606 1471729 " /lib64/libpthread.so.0(+0x7e25) [0x7ff6631dee25]"
0032F52E 2018-11-29 16:17:00.962 189606 1471729 " /lib64/libc.so.6(clone+0x6d) [0x7ff662f08bad]"
0032F52F 2018-11-29 16:17:00.962 189606 1471729 "Registers:"
0032F530 2018-11-29 16:17:00.962 189606 1471729 "EAX:0000000000000000 EBX:0000000000000000 ECX:0000000000000000 EDX:00007FF64D9C0080 ESI:0000000000000000 EDI:00000000015F4168"
0032F531 2018-11-29 16:17:00.962 189606 1471729 "R8 :0000000000000001 R9 :00007FF662E5716D R10:61202D20303D7265 R11:0000000000000000"
0032F532 2018-11-29 16:17:00.962 189606 1471729 "R12:00007FF410008440 R13:0000000000000000 R14:0000000000000043 R15:00007FF4100084B0"
0032F533 2018-11-29 16:17:00.962 189606 1471729 "CS:EIP:0033:00007FF65D81EF22"
0032F534 2018-11-29 16:17:00.962 189606 1471729 " ESP:00007FF40DFFA320 EBP:00007FF40DFFA350"
0032F535 2018-11-29 16:17:00.962 189606 1471729 "Stack[00007FF40DFFA320]: 0000000000000000 015F416800000000 00000000015F4168 0DFFA3C000000000 00007FF40DFFA3C0 645402DC00007FF4 00007FF6645402DC 0000000000007FF6"
0032F536 2018-11-29 16:17:00.962 189606 1471729 "Stack[00007FF40DFFA340]: 00007FF400000000 0000000000007FF4 0000000000000000 0193A07000000000 000000000193A070 69ECA4AB00000000 00007FF669ECA4AB 0DFFA3E000007FF6"
0032F537 2018-11-29 16:17:00.962 189606 1471729 "Stack[00007FF40DFFA360]: 00007FF40DFFA3E0 1000844000007FF4 00007FF410008440 0193A07000007FF4 000000000193A070 0000000200000000 0000000000000002 0000006400000000"
0032F538 2018-11-29 16:17:00.962 189606 1471729 "Stack[00007FF40DFFA380]: 0000000000000064 0000000200000000 0000000000000002 0193A4C000000000 000000000193A4C0 69ECA5E300000000 00007FF669ECA5E3 100360F000007FF6"
0032F539 2018-11-29 16:17:00.962 189606 1471729 "Stack[00007FF40DFFA3A0]: 00007FF4100360F0 6487FE8100007FF4 00007FF66487FE81 0193A53000007FF6 000000000193A530 8870000000000000 0000000188700000 0000000100000001"
0032F53A 2018-11-29 16:17:00.962 189606 1471729 "Stack[00007FF40DFFA3C0]: 0000000000000001 100015D000000000 00007FF4100015D0 540022A000007FF4 00007FF6540022A0 0000000A00007FF6 000008000000000A 0000000000000800"
0032F53B 2018-11-29 16:17:00.962 189606 1471729 "Stack[00007FF40DFFA3E0]: 0000000000000000 648899E700000000 00007FF6648899E7 0000000000007FF6 00007FF600000000 0000004E00007FF6 000000800000004E 8884003800000080"
0032F53C 2018-11-29 16:17:00.962 189606 1471729 "Stack[00007FF40DFFA400]: 00007FF488840038 1000163000007FF4 00007FF410001630 5400185000007FF4 00007FF654001850 64889D1400007FF6 00007FF664889D14 0000021000007FF6"
0032F53D 2018-11-29 16:17:00.962 189606 1471729 "ThreadList:
7FF6617E6700 140696174356224 189613: CMPNotifyClosedThread
7FF660FE5700 140696165963520 189614: CSocketBaseThread
7FF6607E4700 140696157570816 189615: MP Connection Thread
7FF65FFE3700 140696149178112 189617: CMemoryUsageReporter
7FF65F7E2700 140696140785408 189619: CBackupHandler
7FF65EFE1700 140696132392704 189621: CGraphProgressHandler
7FF40DFFB700 140686183610112 1471729: BackgroundReleaseBufferThread
7FF42CBF2700 140686699472640 1471733: ProcessSlaveActivity
7FF4529B6700 140687334663936 1471734: CGraphExecutor pool
7FF41D7FA700 140686443652864 1471742: CBroadcaster::CRecv
7FF4539B8700 140687351449344 1471743: CBroadcaster::CSend
7FF4531B7700 140687343056640 1471744: CRowProcessor
7FF4521B5700 140687326271232 1471745: CDistributorBase::cRecvThread
7FF41FFFF700 140686485616384 1471746: CDistributorBase::cSendThread
7FF41F7FE700 140686477223680 1471747: CDistributorBase::cRecvThread
7FF41EFFD700 140686468830976 1471748: CDistributorBase::cSendThread
7FF41DFFB700 140686452045568 1471752: CRowStreamLookAhead
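As an aid to reading the backtrace above, the following is a minimal sketch of how the frames could be pulled out of a thorslave log so that each module and offset can be handed to a symbolizer. The function name `parse_frames`, the regular expression, and the `SAMPLE` excerpt are illustrative assumptions and not part of any HPCC tooling; the pattern is based only on the frame lines shown in this ticket.

```python
import re

# Assumed frame shape, taken from the log lines in this ticket, e.g.:
#   " /opt/HPCCSystems/lib/libjlib.so(_ZN6Thread5beginEv+0x2c) [0x7ff6645c5cbc]"
FRAME_RE = re.compile(
    r'"\s*(?P<module>/\S+?)\((?P<symbol>[^()+]*)\+0x(?P<offset>[0-9a-fA-F]+)\)'
    r'\s+\[0x(?P<address>[0-9a-fA-F]+)\]"'
)

# Two frames copied from the log excerpt above, plus one non-frame line.
SAMPLE = '''
0032F525 2018-11-29 16:17:00.944 189606 1471729 "Backtrace:"
0032F526 2018-11-29 16:17:00.961 189606 1471729 " /var/lib/HPCCSystems/queries/thor100_71_5_24200/V4167451050_libW20181129-154014.so(+0x6d3f22) [0x7ff65d81ef22]"
0032F52B 2018-11-29 16:17:00.962 189606 1471729 " /opt/HPCCSystems/lib/libjlib.so(_ZN6Thread5beginEv+0x2c) [0x7ff6645c5cbc]"
'''


def parse_frames(log_text):
    """Return one dict per backtrace frame found in log_text, in log order."""
    frames = []
    for line in log_text.splitlines():
        match = FRAME_RE.search(line)
        if match is None:
            continue  # not a backtrace frame line
        frames.append({
            "module": match.group("module"),
            # Empty when the frame has only a module-relative offset.
            "symbol": match.group("symbol") or None,
            "offset": int(match.group("offset"), 16),
            "address": int(match.group("address"), 16),
        })
    return frames


if __name__ == "__main__":
    for frame in parse_frames(SAMPLE):
        print(frame["module"], frame["symbol"], hex(frame["offset"]))
```

For frames with no symbol, the offset is relative to the module, so a tool such as `addr2line -e <module> <offset>` may resolve it if the shared object still exists and carries debug information. For frames that do name a symbol, the offset is relative to that symbol rather than to the module.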
From Jake:
Not sure what the root cause is, but it's crashing during a Smart Join, whilst spilling.
As the RHS of this join is pretty big and it's spilling quite a bit, it may be better to use a standard join rather than a smart join.
If the bug is in Smart Join, as it appears to be, using a standard join will work around the problem.