Problems with the initialization of some LOOP and child query activities
Description
Conclusion
Activity
Jacob Cobbett-Smith September 5, 2016 at 10:08 AM
Attaching additional PR
Previous (merged) PR: https://github.com/hpcc-systems/HPCC-Platform/pull/9027
Gavin Halliday September 1, 2016 at 1:30 PM
There is a #IF in ECL, but it isn't often required, since any constants in the query tend to be folded anyway.
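Below is a minimal sketch of the compile-time alternative. It assumes the environment can be chosen by editing one constant before compiling, and that #IF can test that constant because it folds at compile time. The name onDevDali is hypothetical and is not part of the original scripts. A runtime value such as Std.System.thorlib.Daliserver() cannot drive #IF, because #IF is evaluated when the query is compiled.

```ecl
// Hypothetical switch (illustrative name): set once per environment before compiling.
onDevDali := TRUE;

// Compile-time selection: only the chosen branch becomes part of the query.
#IF(onDevDali)
  keyname := '~somekey';
#ELSE
  keyname := '~someotherkey';
#END

// Ordinary IF with a constant condition; the compiler is expected to
// fold this to a single literal, so no dynamic filename is left at runtime.
keyname2 := IF(onDevDali, '~somekey', '~someotherkey');

OUTPUT(keyname, NAMED('keyname'));
OUTPUT(keyname2, NAMED('keyname2'));
```

Both forms leave the index name fixed at compile time. That is the property the keyed join inside the LOOP needs.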
James Wiltshire August 15, 2016 at 9:03 PM
Thanks so much, Jake.
We're planning to move to a 6.x platform, but it won't be soon.
No problem in any case: once we identified the cause, we took a different, broader approach that gets around the issue.
Side question:
What's the best analog to #ifdef in ECL? Is there one?
Jacob Cobbett-Smith August 15, 2016 at 10:50 AM (edited)
Thanks for the example. With it I can reproduce the problem and confirm that it is related to the dynamic filename generated for getIndexFileName in the keyed join.
I'm not sure what the correct fix is yet, but it will require a platform change. The fix is likely to go only into the current major.minor build, which is 6.0 at the moment.
You can work around the dynamic-name issue in the keyed join by defining both alternative keyed joins, each pointing to one of the conditional indexes, and then using an IF to pick which keyed join to use.
I've adjusted your example to demonstrate:
import Std;
filename := '~someflatfile';
keyname := IF(Std.System.thorlib.Daliserver() = '192.168.1.108', '~somekey' , '~someotherkey');
lVal := RECORD
INTEGER id := 0;
STRING val := '';
END;
lValKey := RECORD
lVal;
Unsigned8 fpos {virtual(fileposition)} ;
END;
dsVal := DATASET(10000, TRANSFORM(lVal, SELF.id := COUNTER; SELF.val := 'val=' + COUNTER;), DISTRIBUTED);
saveit := OUTPUT(SORT(DISTRIBUTE(dsVal, id), id, LOCAL), , filename, OVERWRITE);
dsKey := DATASET(filename, lValKey, THOR);
//valKey := INDEX(DATASET([], lValKey), {id}, {val, fpos}, keyname);
valKey1 := INDEX(DATASET([], lValKey), {id}, {val, fpos}, '~somekey');
valKey2 := INDEX(DATASET([], lValKey), {id}, {val, fpos}, '~someotherkey');
buildit := BUILDINDEX(INDEX(dsKey,{id}, {val, fpos}, keyname), SORTED, OVERWRITE);
dsQry := DATASET([{100, ''}], lVal);
DATASET(lVal) loopBody(dataset(lVal) inVal, unsigned4 c) := FUNCTION // returns a dataset, not a single row
j1 := JOIN(inVal, valKey1, LEFT.id = RIGHT.id, TRANSFORM(lVal, SELF := RIGHT));
j2 := JOIN(inVal, valKey2, LEFT.id = RIGHT.id, TRANSFORM(lVal, SELF := RIGHT));
RETURN IF(Std.System.thorlib.Daliserver() = '192.168.1.108', j1, j2);
END;
dsLoop := LOOP(dsQry, 1, loopBody(rows(left), counter));
SEQUENTIAL(
saveit, buildit,
OUTPUT(dsLoop, named('dsLoop'))
);
So here, the condition in the loop selects between two keyed-join expressions, j1 and j2. Each uses a different index (valKey1 and valKey2) that points to its own fixed index name.
James Wiltshire August 11, 2016 at 11:42 PM (edited)
I've isolated this further.
The offending piece, when LOOP is involved, is the late resolution of Std.System.thorlib.Daliserver().
Our code needs to switch between scenarios based on which Dali is running.
This has generally worked fine for months.
But with LOOP, it generates the odd "Failed to receive reply from thor" error, which kind of makes sense.
To reproduce this:
---------------------------------------------
import Std;
filename := '<flatfilename>';
keyname :=
IF(Std.System.thorlib.Daliserver() = 'XX.XX.XX.XX',
'<keyname>'
, '<otherkeyname>');
lVal := RECORD
INTEGER id := 0;
STRING val := '';
END;
lValKey := RECORD
lVal;
Unsigned8 fpos{virtual(fileposition)};
END;
dsVal := DATASET(10000, TRANSFORM(lVal,
SELF.id := COUNTER;
SELF.val := 'val=' + COUNTER;
),
DISTRIBUTED);
saveit := OUTPUT(SORT(DISTRIBUTE(dsVal, id), id, LOCAL), , filename, OVERWRITE);
dsKey := DATASET(filename, lValKey, THOR);
valKey := INDEX(DATASET([], lValKey), {id}, {val, fpos}, keyname);
buildit := BUILDINDEX(INDEX(dsKey,{id},{val, fpos}, keyname), SORTED, OVERWRITE);
dsQry := DATASET([{100, ''}], lVal);
DATASET(lVal) loopBody(dataset(lVal) inVal, unsigned4 c) := function
iter := JOIN(inVal, valKey,
LEFT.id = RIGHT.id,
TRANSFORM(lVal, SELF := RIGHT)
);
RETURN iter;
END;
dsLoop := LOOP(dsQry, 1, loopBody(rows(left), counter));
SEQUENTIAL(
saveit, buildit,
OUTPUT(dsLoop, named('dsLoop'))
);
------------------------------------------------
When run as above on Thor, the script fails with the "Failed to receive reply from thor" error.
If the keyname is assigned without the conditional IF, the script runs fine.
We're open to other suggestions for conditional STRING assignment like this, based on the "current environment".
(We're thinking of something like C's #ifdef; is there a good analogue in ECL?)
Also, as noted in my previous comments:
The above script, as is, runs fine on Hthor.
And the above script, minus the LOOP, runs fine on Thor or Hthor.
Loop and child query activities that need to be initialized with meta information are initialized by serializing that information from the master.
There was an assumption that this information was not dynamic, except in the case of some disk-based input activities.
Even then, the initialization process was not well defined.
In the case seen here, a keyed join had a dynamic filename and depended on onStart being called. It hadn't been, and this caused the Thor master to crash.
In other cases found, the activity could be initialized once but not reinitialized.
Original description of the problem: