One of our Thors sometimes experiences the following error.
<Error><source>eclagent</source><code>10003</code><message>System error: 10003: Graph graph13[142], pipethrough[148]: SLAVE #2 [10.53.56.43:20100]: Failed to create process in /var/lib/HPCCSystems/thor1/ for : /bin/bash -c "IFS=, read file REST;echo ${file} | MY_HDRROWCNT=1 HPCC_WUID=W20190507-033846 HPCC_NODE=1 HPCC_NODES=16 /bin/bash /opt/HPCCSystems/scripts/bin/read_ext_file_py.sh"</message></Error>
In the log file the message appears as follows:
005F98A1 2019-05-07 03:44:11.100 13359 9979 "ERROR: PipeWriterThread.3 - activity(ch=0, pipethrough, 148) : Graph graph13[142], pipethrough[148]: Failed to create process in /var/lib/HPCCSystems/thor1/ for : /bin/bash -c "IFS=, read file REST;echo ${file} | MY_HDRROWCNT=1 HPCC_WUID=W20190507-033846 HPCC_NODE=1 HPCC_NODES=16 /bin/bash /opt/HPCCSystems/scripts/bin/read_ext_file_py.sh""
When rerun during working hours, the workunit completes without issues.
Neither the HPCC logs nor the system message logs provide any clue to the cause of the error. It would help if the failing system call and its "errno" were written to one of the logs. The issue occurs only on a production system, and only at certain times when the system is busy and support staff are generally unavailable. The underlying cause is likely resource exhaustion of some kind, but we have no way to narrow down the possibilities. Because this is a production system, our ability to "just try" a change is limited.
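
To illustrate the kind of diagnostic being requested, here is a minimal sketch in Python. It is not HPCC code, and it says nothing about how Thor actually launches the pipe process. The function name `describe_spawn_failure` is hypothetical. The sketch only shows that when process creation fails, the operating system reports an errno that can be logged alongside the command. For resource problems one would typically expect values such as EAGAIN (process limit reached), ENOMEM (out of memory), or EMFILE (too many open file descriptors).

```python
import errno
import os
import subprocess


def describe_spawn_failure(argv, cwd=None):
    """Try to start argv; return None on success, or a diagnostic string
    containing the errno if the process could not be created.

    Hypothetical helper for illustration only, not part of HPCC.
    """
    try:
        proc = subprocess.Popen(
            argv,
            cwd=cwd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except OSError as exc:
        # exc.errno carries the value set by the failing system call.
        code = exc.errno
        if code is None:
            return f"spawn failed: {exc}"
        name = errno.errorcode.get(code, "UNKNOWN")
        return f"spawn failed: errno={code} ({name}): {os.strerror(code)}"
    proc.wait()
    return None


if __name__ == "__main__":
    # A path that should not exist, to force a process-creation failure.
    print(describe_spawn_failure(["/nonexistent/hpcc-demo-binary"]))
```

A log line of this form, emitted next to the existing "Failed to create process" message, would be enough to tell a missing file from a process, memory, or file descriptor limit.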