PIPE "Failed to create process ..." error provides insufficient debug information

Environment

6.4.2-rc3, 16-way, 4-instances, 1 slave/instance, 4 channels/slave, AWS m4.2xlarge for slave instances.

Description

One of our Thor clusters intermittently fails with the following error.

<Error><source>eclagent</source><code>10003</code><message>System error: 10003: Graph graph13[142], pipethrough[148]: SLAVE #2 [10.53.56.43:20100]: Failed to create process in /var/lib/HPCCSystems/thor1/ for : /bin/bash -c "IFS=, read file REST;echo ${file} | MY_HDRROWCNT=1 HPCC_WUID=W20190507-033846 HPCC_NODE=1 HPCC_NODES=16 /bin/bash /opt/HPCCSystems/scripts/bin/read_ext_file_py.sh"</message></Error>

In the log file the message appears as follows:

005F98A1 2019-05-07 03:44:11.100 13359 9979 "ERROR: PipeWriterThread.3 - activity(ch=0, pipethrough, 148) : Graph graph13[142], pipethrough[148]: Failed to create process in /var/lib/HPCCSystems/thor1/ for : /bin/bash -c "IFS=, read file REST;echo ${file} | MY_HDRROWCNT=1 HPCC_WUID=W20190507-033846 HPCC_NODE=1 HPCC_NODES=16 /bin/bash /opt/HPCCSystems/scripts/bin/read_ext_file_py.sh""

When rerun during working hours, the workunit completes without issues.

Neither the HPCC logs nor the system message logs give any clue to the cause of the error. It would help greatly if the failing system call and its errno value were written to one of the logs. The failure occurs only on a production system, and only at certain times when the system is busy and support staff are generally not available. The underlying cause is likely some kind of resource exhaustion, but we have no way to narrow down the possibilities. Because this is a production system, our ability to "just try" a change is limited.
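
To illustrate the kind of diagnostic detail being requested, here is a minimal Python sketch of a process launcher that includes the errno number, its symbolic name, and the OS description in the failure message. It is not HPCC code and does not reflect how Thor launches PIPE commands; the function names `describe_spawn_failure` and `run_pipe_command` are hypothetical and exist only for this example.

```python
import errno
import os
import subprocess


def describe_spawn_failure(exc: OSError, command: str, workdir: str) -> str:
    """Build a failure message that carries the errno details of exc."""
    code = exc.errno if exc.errno is not None else 0
    name = errno.errorcode.get(code, "UNKNOWN")
    return (
        f"Failed to create process in {workdir} for: {command} "
        f"(errno={code} {name}: {os.strerror(code)})"
    )


def run_pipe_command(command: str, workdir: str) -> subprocess.Popen:
    """Start command under /bin/bash in workdir (hypothetical helper).

    If the process cannot be created, re-raise with the errno details
    included so they end up in whatever log records the exception.
    """
    try:
        return subprocess.Popen(
            ["/bin/bash", "-c", command],
            cwd=workdir,
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
        )
    except OSError as exc:
        raise RuntimeError(describe_spawn_failure(exc, command, workdir)) from exc
```

With a message of this shape, an errno such as EAGAIN or ENOMEM would point toward process or memory limits, while EMFILE or ENFILE would point toward file descriptor exhaustion. Python's `subprocess.Popen` does not report which underlying system call failed, so this sketch records only the errno; naming the failing call as well would require instrumenting each call separately.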

Conclusion

None

Activity

Gavin Halliday June 12, 2019 at 1:13 PM

Fixed

Details

Priority

Compatibility

Minor

Created May 7, 2019 at 11:44 AM
Updated June 12, 2019 at 1:13 PM
Resolved June 11, 2019 at 8:44 AM