Error receiving actinit data for graph, and watchdog sometimes loses connectivity with slaves

Environment

CentOS on VMware

Description

Our HPCC cluster often loses connectivity with its slave nodes, and sometimes we get this error: System error: 0: Graph graph19[788], SLAVE #4 [10.52.96.48:20100]: Error receiving actinit data for graph: 788

Do you have any idea what could be causing this?

Thank you for your help,

Zahir

Conclusion

None

Activity

Jacob Cobbett-Smith February 8, 2017 at 2:49 PM

There was an email chain about this; my reply was:

The 1st error reported in the master log in the Zap report is:
> 00001CD3 2017-02-07 09:44:40.546 9599 9599 " - graph(graph19, 788) : Cannot write temp::oid::saltdev::it1, file already exists (missing OVERWRITE attribute?)"

The workunit then starts aborting because of that error.
Subsequently, the:
> 00001D6D 2017-02-07 09:44:53.864 9599 6394 "ERROR: 0: /mnt/disk1/jenkins/workspace/LN-Candidate-with-Plugins-6.2.0-1/LN/centos-7.0-x86_64/HPCC-Platform/thorlcr/graph/thgraphmaster.cpp(2142) : Graph graph19[788], SLAVE #3 [10.52.96.190:20100]: Error receiving actinit data for graph: 788"

error is seen.

The failure is due to the 1st error; the 2nd error is a red herring.
The 1st error ["Cannot write temp::oid::saltdev::it1, file already exists (missing OVERWRITE attribute?)"] should have been reported to the workunit; that it was not is a bug.
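The point here is that when a job fails with several errors, the earliest one in the master log is usually the cause and the later ones (such as "Error receiving actinit data") are knock-on effects. As a rough illustration only, the Python sketch below scans log lines and returns the earliest one matching a set of error patterns. The line layout (hex sequence, date, time, pid, tid, quoted message) is an assumption inferred from the two excerpts above, and `first_error`, `ERROR_PATTERNS` and `SAMPLE_LOG` are hypothetical names, not part of any HPCC tool.

```python
import re
from datetime import datetime

# Assumed master log layout, inferred from the excerpts in this issue:
# <hex sequence> <date> <time> <pid> <tid> "<message>"
LOG_LINE = re.compile(
    r'^(?P<seq>[0-9A-Fa-f]+)\s+'
    r'(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3})\s+'
    r'(?P<pid>\d+)\s+(?P<tid>\d+)\s+'
    r'"(?P<msg>.*)"\s*$'
)

# Hypothetical patterns that mark a message as an error; extend as needed.
# Note the primary error here does not contain the word "ERROR" at all.
ERROR_PATTERNS = [
    re.compile(r'\bERROR\b'),
    re.compile(r'Cannot write .*file already exists'),
]


def first_error(lines):
    """Return (timestamp, message) of the earliest matching error line, or None."""
    errors = []
    for line in lines:
        m = LOG_LINE.match(line)
        if not m:
            continue  # skip lines that do not follow the assumed layout
        msg = m.group('msg')
        if any(p.search(msg) for p in ERROR_PATTERNS):
            ts = datetime.strptime(m.group('ts'), '%Y-%m-%d %H:%M:%S.%f')
            errors.append((ts, msg))
    if not errors:
        return None
    # Earliest timestamp wins; on a tie, min() keeps the first one in file order.
    return min(errors, key=lambda e: e[0])


# The two log lines quoted in this issue, deliberately listed out of order.
SAMPLE_LOG = [
    '00001D6D 2017-02-07 09:44:53.864 9599 6394 "ERROR: 0: /mnt/disk1/jenkins/workspace/LN-Candidate-with-Plugins-6.2.0-1/LN/centos-7.0-x86_64/HPCC-Platform/thorlcr/graph/thgraphmaster.cpp(2142) : Graph graph19[788], SLAVE #3 [10.52.96.190:20100]: Error receiving actinit data for graph: 788"',
    '00001CD3 2017-02-07 09:44:40.546 9599 9599 " - graph(graph19, 788) : Cannot write temp::oid::saltdev::it1, file already exists (missing OVERWRITE attribute?)"',
]

if __name__ == '__main__':
    ts, msg = first_error(SAMPLE_LOG)
    print(ts, msg)
```

Run against the two quoted lines, the sketch picks the "Cannot write ... file already exists" message (09:44:40) over the later actinit error (09:44:53), matching the diagnosis above. Matching on the word "ERROR" alone would have missed the real cause, which is why the pattern list has a second entry.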

I can investigate why that is, but given the primary error (missing OVERWRITE?), you should be able to resolve the problem.

So the actual problem is that the correct (more comprehensible) error wasn't reported up to the workunit.
This is similar to .

Duplicate

Details

Created February 7, 2017 at 1:45 PM
Updated February 8, 2017 at 3:00 PM
Resolved February 8, 2017 at 3:00 PM