Thor may core if Python is called from multiple jobs
Activity
Jacob Cobbett-Smith February 11, 2015 at 7:03 PM
@Richard Chapman corrected target to match PR = 5.0.6.
PR is merged, issue should now be closed.
Richard Chapman February 11, 2015 at 5:02 PM
I don't think the problem is that the call happens on the master; rather, it seems that some Python modules don't like it if the Python system is unloaded and then reloaded. In light of that, I have implemented a fix along the same lines as the one we did for Java (which similarly didn't like being unloaded).
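The "once loaded, never released" approach described here can be sketched roughly as follows. This is only an illustration, not the actual HPCC code: `FakePlugin`, `PluginCache`, `run_queries`, and the `pinned` set are hypothetical names, and the plugin is simulated rather than being a real shared library.

```python
class FakePlugin:
    """Stand-in for a shared library such as the Python embed plugin (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.init_count = 0  # how many times the plugin has been (re)initialized

    def initialize(self):
        self.init_count += 1


class PluginCache:
    """Reference-counted plugin loader that never unloads pinned plugins."""

    def __init__(self, pinned=()):
        self.pinned = set(pinned)  # names that must stay loaded once loaded
        self.loaded = {}           # name -> [plugin, refcount]
        self.plugins = {}          # every plugin object ever created, for inspection

    def load(self, name):
        entry = self.loaded.get(name)
        if entry is None:
            plugin = self.plugins.setdefault(name, FakePlugin(name))
            plugin.initialize()    # runs on every genuine (re)load
            entry = self.loaded[name] = [plugin, 0]
        entry[1] += 1
        return entry[0]

    def release(self, name):
        entry = self.loaded[name]
        entry[1] -= 1
        if entry[1] == 0 and name not in self.pinned:
            del self.loaded[name]  # simulated unload; pinned plugins skip this


def run_queries(cache, name, count):
    """Simulate `count` queries that each load and then release the same plugin."""
    for _ in range(count):
        cache.load(name)
        cache.release(name)
    return cache.plugins[name].init_count
```

With an unpinned cache, each simulated query reinitializes the plugin. With the plugin name in `pinned`, it is initialized once and stays resident, which is the behaviour the fix aims for.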
Jacob Cobbett-Smith February 11, 2015 at 2:55 PM
This is the same as the runtime Java issue. Wasn't that 'fixed' by ensuring that, once loaded, it was never released?
@Richard Chapman - Could the same be done here?
The master and slaves load (and unload) a list of plugins per query at the moment; they get that list from the workunit.
It looks like Roxie loads all plugins in the configured plugin directory at startup
and doesn't use the plugins associated per query at all (as far as I can see).
@Richard Chapman - can you confirm that's the way Roxie works and whether Thor should do the same?
(It looks like hthor loads the list of plugins per query (from the workunit) like Thor does, but the issue is moot there, since the process is new each time.)
Richard Chapman February 10, 2015 at 3:09 PM
It appears that the thormaster is NOT preloading any plugins, meaning that the pyembed plugin is not loaded until the workunit dll that references it is loaded. The Python embed plugin doesn't work properly in such cases (the Python interpreter has to be initialized on the main thread), and I suspect that is the cause of the problem here.
@Jacob Cobbett-Smith will have to look at what might change in thormaster to address this - I suspect the workaround is to make sure that the python code is only called on hthor or on thor slaves (in this instance adding a nothor around the call to GetNumberOfProminentFeatures would probably achieve the former).
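The difference between lazy loading and preloading can be illustrated with a small simulation. This is a sketch under stated assumptions, not thormaster code: `FakeInterpreter` and `run_job` are hypothetical names, and the "interpreter" only records which thread initialized it.

```python
import threading


class FakeInterpreter:
    """Stand-in for an embedded interpreter that records where it was initialized."""

    def __init__(self):
        self.init_thread = None

    def ensure_initialized(self):
        # Initialize only once, remembering which thread performed the initialization.
        if self.init_thread is None:
            self.init_thread = threading.current_thread().name


def run_job(interp, preload):
    """Run one simulated job and return the name of the thread that initialized `interp`."""
    if preload:
        interp.ensure_initialized()  # startup path, on the calling thread
    # The query itself runs on a worker thread, as a workunit dll load would.
    worker = threading.Thread(target=interp.ensure_initialized, name="query-worker")
    worker.start()
    worker.join()
    return interp.init_thread
```

Without preloading, the first use on the worker thread performs the initialization. With preloading, initialization has already happened on the thread that started the process before any query runs.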
Richard Chapman February 10, 2015 at 2:57 PM (edited)
Not sure that it will help, but here's a snippet of the stack trace from the master:
#84 0x00007fecb0341c61 in pyembed::Python27EmbedScriptContext::callFunction (this=0x7fea58248170) at /home/rchapman/hpcc/plugins/pyembed/pyembed.cpp:1379
#85 0x00007fecb055024c in user1(ICodeContext*, unsigned long long) () from /var/lib/HPCCSystems/queries/mythor/V1728870610_libW20150210-145959.so
#86 0x00007fecb0550837 in cAc3::sendResult(void const*) () from /var/lib/HPCCSystems/queries/mythor/V1728870610_libW20150210-145959.so
#87 0x00007fecb8d30864 in CResultActivityMaster::process (this=0x1b55080) at /home/rchapman/hpcc/thorlcr/activities/result/thresult.cpp:73
#88 0x00007fecb8f8e54e in CMasterActivity::main (this=0x1b55080) at /home/rchapman/hpcc/thorlcr/graph/thgraphmaster.cpp:404
@Jacob Cobbett-Smith Would you expect to be calling plugin code from the master? Does the master load/unload/preload plugins in the same way that the slaves do?
Hi,
I'm facing the following issue which was already raised by a colleague of mine in the forums:
http://hpccsystems.com/bb/viewtopic.php?f=10&t=1553&sid=4f1e8d4a8b04eacecd5198f7692c48f3
I'm re-raising the issue here because there has been no update in the forum and the issue is urgent for the task that I'm required to do. Please let me know if I can provide any more details or logs that would help in resolving this.
Thanks,
Anil