fileservices plugin is not threadsafe

Description

The despray.ecl regression test caused a core dump in Roxie that could not be reproduced immediately.

Here is the relevant excerpt from the Roxie log:

00000610 2016-06-01 17:08:48.363 6942 10242 "Loading dll (libW20160601-170839.so) from location /var/lib/HPCCSystems/queries/myroxie/libW20160601-170839.so"
00000611 2016-06-01 17:08:48.404 6942 10242 "Create aggregate strand processor 0"
00000612 2016-06-01 17:08:48.404 6942 10242 "Create aggregate strand processor 0"
00000613 2016-06-01 17:08:48.404 6942 10242 "Create aggregate strand processor 0"
00000614 2016-06-01 17:08:48.404 6942 10242 "Create aggregate strand processor 0"
00000615 2016-06-01 17:08:48.404 6942 10242 "Create aggregate strand processor 0"
00000616 2016-06-01 17:08:48.404 6942 10242 "Create aggregate strand processor 0"
00000617 2016-06-01 17:08:48.404 6942 10242 "Create aggregate strand processor 0"
00000618 2016-06-01 17:08:48.404 6942 10242 "Create aggregate strand processor 0"
00000619 2016-06-01 17:08:48.404 6942 10242 "Create aggregate strand processor 0"
0000061A 2016-06-01 17:08:48.404 6942 10242 "Create aggregate strand processor 0"
0000061B 2016-06-01 17:08:48.404 6942 10242 "Create aggregate strand processor 0"
0000061C 2016-06-01 17:08:48.404 6942 10242 "Create aggregate strand processor 0"
0000061D 2016-06-01 17:08:49.104 6942 7016 "[W20160601-170839] Unload received for channel 1"
0000061E 2016-06-01 17:08:55.356 6942 7105 "roxie: Dequeued workunit request 'W20160601-170851'"
0000061F 2016-06-01 17:08:55.411 6942 10242 "Loading dll (libW20160601-170851.so) from location /var/lib/HPCCSystems/queries/myroxie/libW20160601-170851.so"
00000620 2016-06-01 17:08:55.616 6942 7091 "[W20160601-170851] Unload received for channel 1"
00000621 2016-06-01 17:09:03.354 6942 7105 "roxie: Dequeued workunit request 'W20160601-170857'"
00000622 2016-06-01 17:09:03.408 6942 10242 "Loading dll (libW20160601-170857.so) from location /var/lib/HPCCSystems/queries/myroxie/libW20160601-170857.so"
00000623 2016-06-01 17:09:03.453 6942 31855 "Despray: ~::persons"
00000624 2016-06-01 17:09:03.453 6942 31856 "Despray: ~::persons"
00000625 2016-06-01 17:09:03.454 6942 31859 "Despray: ~::persons"
00000627 2016-06-01 17:09:03.454 6942 31857 "Despray: ~::persons"
00000626 2016-06-01 17:09:03.454 6942 31858 "Despray: ~::persons"
00000628 2016-06-01 17:09:03.456 6942 31860 "Despray: ~::persons"
00000629 2016-06-01 17:09:03.458 6942 31862 "Despray: ~::persons"
0000062A 2016-06-01 17:09:03.460 6942 31861 "Despray: ~::persons"
0000062B 2016-06-01 17:09:03.461 6942 31863 "Despray: ~::persons"
0000062C 2016-06-01 17:09:03.465 6942 31855 "fileservices using esp URL: http://.:8010/FileSpray"
0000062D 2016-06-01 17:09:03.481 6942 31860 "================================================"
0000062E 2016-06-01 17:09:03.481 6942 31860 "Signal: 11 Segmentation fault"
0000062F 2016-06-01 17:09:03.481 6942 31860 "Fault IP: 00007F057ED2F6FE"
00000630 2016-06-01 17:09:03.481 6942 31860 "Accessing: 0000000000000018"
00000631 2016-06-01 17:09:03.481 6942 31860 "Registers:"
00000002 2016-06-01 17:10:09.197 31946 31946 "Roxie restarting: restarts = 1 build = internal_6.0.0-1"
00000003 2016-06-01 17:10:09.197 31946 31946 "RoxieMemMgr: Setting memory limit to 1073741824 bytes (4096 pages)"
00000004 2016-06-01 17:10:09.198 31946 31946 "Transparent huge pages used for roxiemem heap"
00000005 2016-06-01 17:10:09.198 31946 31946 "Memory released to OS in 8192k blocks"
00000006 2016-06-01 17:10:09.198 31946 31946 "RoxieMemMgr: 4096 Pages successfully allocated for the pool - memsize=1073741824 base=0x7fc518600000 alignment=262144 bitmapSize=128"
00000007 2016-06-01 17:10:09.199 31946 31946 "Current Hardware Info: CPUs=4, speed=2041 MHz, Mem=5842 MB , primDisk=0 GB, primFree=0 GB, secDisk=0 GB, secFree=0 GB, NIC=0"
00000008 2016-06-01 17:10:09.200 31946 31946 "Process affinity is set to use core(s) 0,1,2,3"
00000009 2016-06-01 17:10:09.203 31946 31951 "Background copy thread 0x7fc5598bf900 starting"
0000000A 2016-06-01 17:10:09.203 31946 31952 "HandleCloser thread 0x7fc5598bf900 starting"
0000000B 2016-06-01 17:10:09.205 31946 31946 "Loaded DLL /opt/HPCCSystems/plugins/libredis.so"
0000000C 2016-06-01 17:10:09.205 31946 31946 "Current reported version is redis plugin 1.0.0"
0000000D 2016-06-01 17:10:09.209 31946 31946 "Loaded DLL /opt/HPCCSystems/plugins/libv8embed.so"
0000000E 2016-06-01 17:10:09.210 31946 31946 "Current reported version is V8 JavaScript Embed Helper 1.0.0"
0000000F 2016-06-01 17:10:09.210 31946 31946 "Compatible version V8 JavaScript Embed Helper 1.0.0"
00000010 2016-06-01 17:10:09.211 31946 31946 "Loaded DLL /opt/HPCCSystems/plugins/libbridgerscorelib.so"
00000011 2016-06-01 17:10:09.211 31946 31946 "Current reported version is BRIDGERSCORELIB 1.0.0"
00000012 2016-06-01 17:10:09.211 31946 31946 "Compatible version BRIDGERSCORELIB 1.0.0"
00000013 2016-06-01 17:10:09.211 31946 31946 "Loaded DLL /opt/HPCCSystems/plugins/libcmslib.so"
00000014 2016-06-01 17:10:09.212 31946 31946 "Current reported version is CMSLIB 1.0.04"
00000015 2016-06-01 17:10:09.212 31946 31946 "Compatible version CMSLIB 1.0.04"
00000016 2016-06-01 17:10:09.254 31946 31946 "Loaded DLL /opt/HPCCSystems/plugins/libpyembed.so"

The core file that was generated was also zero bytes in size.

Conclusion

None

Activity

Richard Chapman June 2, 2016 at 3:32 PM

I was able to provoke a crash in the debugger, though not reliably.

I think the problem is that fileservices uses ws_fs, which is not threadsafe.
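To illustrate why a non-threadsafe client library matters here: the log shows nine threads entering Despray within a few milliseconds, so any shared mutable state inside the client can be read and written concurrently. The sketch below is a minimal, generic illustration. `UnsafeFileSprayClient`, `SerializedFileSprayClient` and `concurrentDesprays` are hypothetical stand-ins, not ws_fs classes. It shows one blunt mitigation, serializing every call behind a single `std::mutex`. This is an assumption for illustration, not the fix that was applied for this issue.

```cpp
#include <cassert>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for a non-threadsafe client such as ws_fs:
// it stores per-request state in a shared member with no locking.
class UnsafeFileSprayClient {
public:
    std::string despray(const std::string &logicalName) {
        lastRequest_ = "Despray: " + logicalName;  // shared, unsynchronized write
        return lastRequest_;                       // shared, unsynchronized read
    }
private:
    std::string lastRequest_;
};

// Hypothetical wrapper: every call into the shared client is serialized
// behind one mutex, so only one thread touches its state at a time.
class SerializedFileSprayClient {
public:
    std::string despray(const std::string &logicalName) {
        std::lock_guard<std::mutex> guard(lock_);
        return inner_.despray(logicalName);  // result is copied while the lock is held
    }
private:
    std::mutex lock_;
    UnsafeFileSprayClient inner_;
};

// Issue the same request from several threads at once, as the concurrent
// "Despray: ~::persons" log lines do, and count how many got the expected result.
int concurrentDesprays(int threadCount) {
    SerializedFileSprayClient client;
    std::vector<std::string> results(threadCount);
    std::vector<std::thread> threads;
    for (int i = 0; i < threadCount; ++i)
        threads.emplace_back([&client, &results, i] { results[i] = client.despray("~::persons"); });
    for (auto &t : threads)
        t.join();
    int ok = 0;
    for (const auto &r : results)
        if (r == "Despray: ~::persons")
            ++ok;
    return ok;
}
```

A single global lock removes the race but also removes all parallelism in the client. That is why fixing individual shared entry points may be preferable when the shared state is small and well understood.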

getHttpClientContext could be made threadsafe easily enough, but I am not sure how much of the rest of ws_fs is threadsafe.
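A common way to make a lazily created shared object threadsafe in C++11 and later is to use a function-local static. The language guarantees that its initialization runs exactly once, even under concurrent first calls. An unguarded "check for null, then create" getter can instead let threads construct twice, or read a pointer whose object is not yet fully visible. A small fault address such as the `Accessing: 0000000000000018` in the log is often a sign of a member read through a null pointer, which is one symptom such a race can produce. The type `HttpClientContext` and the getter `getHttpClientContextSketch` below are hypothetical. This is a sketch of the technique, not the HPCC implementation.

```cpp
#include <cassert>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for the shared HTTP client context; the real
// class and its constructor arguments are not shown in this issue.
struct HttpClientContext {
    std::string name;
    explicit HttpClientContext(std::string n) : name(std::move(n)) {}
};

// Racy pattern, shown for contrast only and never called:
//   static HttpClientContext *ctx = nullptr;
//   if (!ctx) ctx = new HttpClientContext("ws_fs");  // two threads may both get here
//   return *ctx;

// Threadsafe lazy initialization: a function-local static is constructed
// exactly once; concurrent first callers wait until construction completes.
HttpClientContext &getHttpClientContextSketch() {
    static HttpClientContext ctx("ws_fs");
    return ctx;
}

// Call the getter from several threads at once and report whether every
// thread received the same instance.
bool allThreadsShareOneContext(int threadCount) {
    std::vector<const HttpClientContext *> seen(threadCount, nullptr);
    std::vector<std::thread> threads;
    for (int i = 0; i < threadCount; ++i)
        threads.emplace_back([&seen, i] { seen[i] = &getHttpClientContextSketch(); });
    for (auto &t : threads)
        t.join();
    for (const HttpClientContext *p : seen)
        if (p != seen[0])
            return false;
    return true;
}
```

`std::call_once` with a `std::once_flag` gives the same guarantee when the object must be created somewhere other than inside a single getter function.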

Fixed

Details

Compatibility: Point

Created June 2, 2016 at 2:54 PM
Updated June 9, 2016 at 10:59 AM
Resolved June 9, 2016 at 10:59 AM