Problems with some LOOP and child query activity initializations

Description

LOOP and child query activities that need to be initialized with meta information do so by serializing that information from the master.
It was assumed that this information was not dynamic, except in the case of some disk-based input activities.
Even there, the initialization process was not well defined.

In the case seen here, a keyed join had a dynamic filename and depended on onStart having been called. It had not been called, and this caused the Thor master to crash.

In other cases found, the activity could be initialized once but not reinitialized.

Original description of problem:

We have a complex LOOP script that runs successfully on Roxie and HThor, but is failing on Thor with the following error:
"Failed to receive reply from thor"
The error message gives the IP address of the Thor master.

We've been unable to generate a concise, simple script to reproduce this error for troubleshooting.
However, we've heard that LOOP had similar problems on Thor in early HPCC versions, even when the same logic worked fine on Roxie and HThor.

Conclusion

None

Activity


Jacob Cobbett-Smith September 5, 2016 at 10:08 AM

Attaching additional PR
Prev (merged) PR = https://github.com/hpcc-systems/HPCC-Platform/pull/9027

Gavin Halliday September 1, 2016 at 1:30 PM

There is a #IF in ECL, but it isn't often required, since any constants in the query tend to be folded anyway.
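A minimal sketch of the two options, assuming a hypothetical compile-time flag useProdKey that is not part of the original query. #IF selects which definition gets compiled, so its condition must be a constant the compiler can evaluate. A value such as Std.System.thorlib.Daliserver() is only known at runtime and therefore cannot drive a #IF. For a genuine constant, a plain IF gives the same result because the constant is folded.

```ecl
// Hypothetical compile-time switch (assumption); it must be a constant
// the compiler can evaluate, not a runtime value such as Daliserver().
useProdKey := TRUE;

// Template-language conditional: only one definition of keyname is compiled.
#IF(useProdKey)
  keyname := '~somekey';
#ELSE
  keyname := '~someotherkey';
#END

// Plain ECL alternative: with a constant condition, this IF is folded
// to a single string at compile time.
keyname2 := IF(useProdKey, '~somekey', '~someotherkey');

OUTPUT(keyname, NAMED('keyname'));
OUTPUT(keyname2, NAMED('keyname2'));
```

Either form yields a fixed index name. That is what avoids the dynamic-filename path in the keyed join.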

James Wiltshire August 15, 2016 at 9:03 PM

Thanks so much, Jake.
We're planning to move to a 6.x platform, but that won't be soon.
No problem in any case: once we identified the cause, we took a different, broader approach that gets around the issue.

Side question:
What's the best analog to #ifdef in ECL? Is there one?

Jacob Cobbett-Smith August 15, 2016 at 10:50 AM

Thanks for the example, with that I can reproduce it and confirm that it is related to the dynamic filename generated for getIndexFileName in the keyedjoin.
I'm not sure what the correct fix is yet, but it will require a platform change and the fix is only likely to be pushed to the current build major.minor version, which is 6.0 at the moment.

You can work around the dynamic-name issue in the keyedjoin by defining both alternative keyed joins, each pointing to one of the conditional indexes, and then using an IF to pick which keyed join to use.
I've adjusted your example to demonstrate:

IMPORT Std;

filename := '~someflatfile';
keyname := IF(Std.System.thorlib.Daliserver() = '192.168.1.108', '~somekey', '~someotherkey');

lVal := RECORD
  INTEGER id := 0;
  STRING val := '';
END;

lValKey := RECORD
  lVal;
  UNSIGNED8 fpos {virtual(fileposition)};
END;

dsVal := DATASET(10000, TRANSFORM(lVal,
                   SELF.id := COUNTER;
                   SELF.val := 'val=' + (STRING)COUNTER;),
                 DISTRIBUTED);

saveit := OUTPUT(SORT(DISTRIBUTE(dsVal, id), id, LOCAL), , filename, OVERWRITE);
dsKey := DATASET(filename, lValKey, THOR);

//valKey := INDEX(DATASET([], lValKey), {id}, {val, fpos}, keyname);
// Workaround: one INDEX per fixed key name instead of one INDEX with a dynamic name
valKey1 := INDEX(DATASET([], lValKey), {id}, {val, fpos}, '~somekey');
valKey2 := INDEX(DATASET([], lValKey), {id}, {val, fpos}, '~someotherkey');

buildit := BUILDINDEX(INDEX(dsKey, {id}, {val, fpos}, keyname), SORTED, OVERWRITE);

dsQry := DATASET([{100, ''}], lVal);

DATASET(lVal) loopBody(DATASET(lVal) inVal, UNSIGNED4 c) := FUNCTION
  // Each keyed join reads a statically named index
  j1 := JOIN(inVal, valKey1, LEFT.id = RIGHT.id, TRANSFORM(lVal, SELF := RIGHT));
  j2 := JOIN(inVal, valKey2, LEFT.id = RIGHT.id, TRANSFORM(lVal, SELF := RIGHT));
  // The runtime choice is made between the two joins, not inside an index filename
  RETURN IF(Std.System.thorlib.Daliserver() = '192.168.1.108', j1, j2);
END;

dsLoop := LOOP(dsQry, 1, loopBody(ROWS(LEFT), COUNTER));

SEQUENTIAL(
  saveit,
  buildit,
  OUTPUT(dsLoop, NAMED('dsLoop'))
);

So here, the condition in the loop selects between two keyed join expressions, j1 and j2. Each uses a different index (valKey1 and valKey2) that points to its own fixed index name, so neither keyed join depends on a dynamic filename.

James Wiltshire August 11, 2016 at 11:42 PM

I've isolated this further.
The offending piece, when LOOP is involved, is the later resolution of Std.System.thorlib.Daliserver() at runtime.

Our code needs to switch between scenarios, based on which Dali is running.
This has worked fine for months, generally.
But with LOOP, this generates the odd "Failed to receive reply from thor" error.
Which kinda makes sense.

To reproduce this:
---------------------------------------------
IMPORT Std;

filename := '<flatfilename>';
keyname :=
  IF(Std.System.thorlib.Daliserver() = 'XX.XX.XX.XX',
     '<keyname>',
     '<otherkeyname>');

lVal := RECORD
  INTEGER id := 0;
  STRING val := '';
END;
lValKey := RECORD
  lVal;
  UNSIGNED8 fpos {virtual(fileposition)};
END;

dsVal := DATASET(10000, TRANSFORM(lVal,
                   SELF.id := COUNTER;
                   SELF.val := 'val=' + (STRING)COUNTER;
                 ),
                 DISTRIBUTED);

saveit := OUTPUT(SORT(DISTRIBUTE(dsVal, id), id, LOCAL), , filename, OVERWRITE);
dsKey := DATASET(filename, lValKey, THOR);
// The index name depends on Daliserver(), so it is only known at runtime
valKey := INDEX(DATASET([], lValKey), {id}, {val, fpos}, keyname);
buildit := BUILDINDEX(INDEX(dsKey, {id}, {val, fpos}, keyname), SORTED, OVERWRITE);

dsQry := DATASET([{100, ''}], lVal);

DATASET(lVal) loopBody(DATASET(lVal) inVal, UNSIGNED4 c) := FUNCTION
  // Keyed join against the dynamically named index (the case that fails on Thor)
  iter := JOIN(inVal, valKey,
               LEFT.id = RIGHT.id,
               TRANSFORM(lVal, SELF := RIGHT)
              );
  RETURN iter;
END;
dsLoop := LOOP(dsQry, 1, loopBody(ROWS(LEFT), COUNTER));
SEQUENTIAL(
  saveit, buildit,
  OUTPUT(dsLoop, NAMED('dsLoop'))
);
------------------------------------------------
Run as above on Thor, the script gives that "Failed to receive reply from thor" error.
If the keyname is assigned without the conditional IF, the script runs fine.

Open to other suggestions for conditional STRING assignment like this, based on "current environment".
(We're thinking something like C's #ifdef - is there a good analogue in ECL?)

Also, as in my previous comments...
The above script, as is, runs fine on HThor.
And the above script, minus the LOOP, runs fine on Thor or HThor.

Fixed

Details

Created August 5, 2016 at 9:56 PM
Updated September 9, 2016 at 9:14 AM
Resolved September 9, 2016 at 9:14 AM
