Unexplained Process Termination, bare metal, 8.12.0, SORT, consistent?

Environment

Bare metal 8.12.0-1

Description

FIDO, 8.12.0-1: consistent failures that started back in February when 8.12.0-1 was deployed (8.10.10 previously). The job does include a global SORT for an index build.

I suspect it could be data related, because from day to day the same IP address keeps reporting the error (lately, for many days in a row, 10.194.83.18).

0012B226 USR 2023-04-13 00:15:25.845 1637012 198236 "================================================"
0012B227 USR 2023-04-13 00:15:25.845 1637012 198236 "Program: 10.194.83.18:/mnt/disk1/HPCCSystems/bin/thorslave_lcr"
0012B228 USR 2023-04-13 00:15:25.845 1637012 198236 "Signal: 6 Aborted"
0012B229 USR 2023-04-13 00:15:25.845 1637012 198236 "Fault IP: 00007FBED0437387"
0012B22A USR 2023-04-13 00:15:25.845 1637012 198236 "Accessing: 000001F40018FA94"
0012B22B PRG 2023-04-13 00:15:25.845 1637012 198236 "Backtrace:"
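
To check whether the same node keeps reporting the crash, the "Program:" lines can be tallied per IP address. The following is only a sketch: the helper name `count_crashing_nodes` and the regular expression are illustrative assumptions based on the log excerpt above, not part of any HPCC tooling.

```python
import re
from collections import Counter

# Matches the quoted "Program: <ip>:<path>" payload seen in the excerpt above.
PROGRAM_RE = re.compile(r'"Program: (\d{1,3}(?:\.\d{1,3}){3}):(\S+)"')

def count_crashing_nodes(lines):
    """Count how many crash reports each IP address produced."""
    counts = Counter()
    for line in lines:
        match = PROGRAM_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts

sample = [
    '0012B226 USR 2023-04-13 00:15:25.845 1637012 198236 "================================================"',
    '0012B227 USR 2023-04-13 00:15:25.845 1637012 198236 "Program: 10.194.83.18:/mnt/disk1/HPCCSystems/bin/thorslave_lcr"',
    '0012B228 USR 2023-04-13 00:15:25.845 1637012 198236 "Signal: 6 Aborted"',
]

for ip, hits in count_crashing_nodes(sample).items():
    print(ip, hits)
```

Run over the logs of several failed WUs, a tally dominated by one address would support the data-related theory, since a given node tends to read the same file parts each time.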

 

W20230214-230300 (internal_8.10.10-1) was successful, as was a spot-checked group of seven WUs from between Jan 25 and Feb 14 that I restored.

W20230215-230318 (internal_8.12.0-1) failed, and apparently so has every similar WU after that. That includes dozens of WUs, as recent as W20230413-000415.

 

Conclusion

None

Activity

Gavin Halliday April 18, 2023 at 1:53 PM

Fix released. It is data related, and the change in behaviour is likely to do with the way very unusual Unicode characters are converted to strings.

Gavin Halliday April 18, 2023 at 10:36 AM

Some digging later...

The record that it crashes on has an incoming utf8 field with the value:

0x02 0x00 0x00 0x00 0xe3 0x85 0xa4 0xe3 0x85 0xa4

I will see if I can write a test case using that value.
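
For reference, the bytes can be decoded with a short script. This is a sketch only: it assumes the first four bytes are a little-endian length prefix that counts characters, and that the remaining bytes are the UTF-8 payload. The name `decode_field` is hypothetical.

```python
import struct
import unicodedata

# The field value quoted above, byte for byte.
RAW = bytes([0x02, 0x00, 0x00, 0x00, 0xE3, 0x85, 0xA4, 0xE3, 0x85, 0xA4])

def decode_field(raw):
    """Split a field into (length prefix, decoded text).

    Assumption: a 4-byte little-endian prefix followed by UTF-8 data.
    """
    (count,) = struct.unpack_from("<I", raw, 0)
    text = raw[4:].decode("utf-8")
    return count, text

count, text = decode_field(RAW)
print("prefix:", count, "decoded characters:", len(text))
for ch in text:
    print(f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
```

Under that assumption the prefix is 2 and the payload is two copies of U+3164 (HANGUL FILLER), a character that renders as blank, which fits the "very unusual" description.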

Gavin Halliday April 18, 2023 at 9:32 AM

Thanks. I edited the disassembly to remove the exception code. It suggests the failure is in the destruction of vTR7 (since 3 more rtlDataAttr are destroyed afterwards). There is nothing particularly different about that variable, or the call that initialises it.

Mark Kelly April 17, 2023 at 7:00 PM

rax            0x0      0
rbx            0x7f71d0003b30   140126797708080
rcx            0xffffffffffffffff       -1
rdx            0x6      6
rsi            0x1b761c 1799708
rdi            0x6e861  452705
rbp            0x7f71e16ec510   0x7f71e16ec510 <CStreamFileOwner::prefetchRow()>
rsp            0x7f70aa8a8410   0x7f70aa8a8410
r8             0x0      0
r9             0x27     39
r10            0x8      8
r11            0x202    514
r12            0x7f71c4057c22   140126596725794
r13            0x7f71d0003bf8   140126797708280
r14            0x1829ef0        25337584
r15            0x7f71c1be977a   140126558525306
rip            0x7f71e16efa82   0x7f71e16efa82 <CDiskReadSlaveActivity::CDiskPartHandler::nextRow()+146>
eflags         0x202    [ IF ]
cs             0x33     51
ss             0x2b     43
ds             0x0      0
es             0x0      0
fs             0x0      0
gs             0x0      0

Mark Kelly April 17, 2023 at 7:00 PM

I see 35 calls to rtlDataAttr::~rtlDataAttr() starting at the address:

(gdb) x/200i 0x00007f71c1be94c8
   0x7f71c1be94c8:      lea    -0x60(%rbp),%rax
   0x7f71c1be94cc:      mov    %rax,%rdi
   0x7f71c1be94cf:      callq  0x7f71c1ac8660 <_ZN11rtlDataAttrD1Ev@plt>
   0x7f71c1be94d4:      lea    -0x50(%rbp),%rax
   0x7f71c1be94d8:      mov    %rax,%rdi
   0x7f71c1be94db:      callq  0x7f71c1ac8660 <_ZN11rtlDataAttrD1Ev@plt>
   0x7f71c1be94e0:      lea    -0x40(%rbp),%rax
   0x7f71c1be94e4:      mov    %rax,%rdi
   0x7f71c1be94e7:      callq  0x7f71c1ac8660 <_ZN11rtlDataAttrD1Ev@plt>
   0x7f71c1be94ec:      mov    %ebx,%eax
   0x7f71c1be94ee:      jmpq   0x7f71c1be9768
.......
   0x7f71c1be9768:      add    $0x878,%rsp
   0x7f71c1be976f:      pop    %rbx
   0x7f71c1be9770:      pop    %r12
   0x7f71c1be9772:      pop    %r13
   0x7f71c1be9774:      pop    %r14
   0x7f71c1be9776:      pop    %r15
   0x7f71c1be9778:      pop    %rbp
   0x7f71c1be9779:      retq
   

Fixed

Details

Created April 13, 2023 at 3:40 PM
Updated April 19, 2023 at 11:55 AM
Resolved April 19, 2023 at 11:55 AM