Introduce a common syntax for parallel operations

Description

I propose adding the following syntax extensions.

ds' := UNORDERED(ds) - implies that the order of ds is not required

The following attributes are available on all activities:

  • PARALLEL[(n)] - execute this activity in parallel. The exact meaning may depend on the activity. The optional n gives the number of strands.

  • ORDERED(bool) - indicates whether the order of any output is relied on.

  • UNORDERED - the order of output rows for all outputs cannot be relied on. (Synonym for ORDERED(FALSE))
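The following Python sketch models what PARALLEL(n) combined with ORDERED or UNORDERED could mean for a simple row-by-row activity. It is not HPCC's implementation. The function `run_parallel` and its parameters are hypothetical, and each strand is assumed to be a worker thread.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_parallel(rows, transform, strands=2, ordered=True):
    """Hypothetical model of an activity marked PARALLEL(strands).

    Rows are handed to a pool of `strands` worker threads (an assumption made
    for illustration). With ordered=True (ORDERED(TRUE)) results are returned
    in input order; with ordered=False (UNORDERED) they are returned in
    completion order, which may differ from run to run.
    """
    with ThreadPoolExecutor(max_workers=strands) as pool:
        futures = [pool.submit(transform, row) for row in rows]
        if ordered:
            # Wait for each result in the order the rows were submitted.
            return [f.result() for f in futures]
        # Emit each result as soon as whichever strand finishes it.
        return [f.result() for f in as_completed(futures)]


rows = [3, 1, 2]
print(run_parallel(rows, lambda x: x * 10, strands=2, ordered=True))           # [30, 10, 20]
print(sorted(run_parallel(rows, lambda x: x * 10, strands=2, ordered=False)))  # [10, 20, 30]
```

Only the ordered result is deterministic. The unordered result is guaranteed to contain the same rows, but not in any particular sequence. That freedom is what lets a parallel activity emit rows as soon as they are ready.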

The following are relevant on a subset of activities:

  • STABLE(bool) - indicates whether the order of the input rows is significant.

  • UNSTABLE - equivalent to STABLE(FALSE)

  • ALGORITHM(x) - allows the algorithm to be set independently of the STABLE flag
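A minimal Python sketch can show what STABLE(bool) changes for a sort. The function `sort_rows` is hypothetical. Python's built-in `sorted` is guaranteed stable. To model STABLE(FALSE), the input is shuffled first, so the relative order of rows with equal keys becomes unspecified. ALGORITHM(x) is not modelled.

```python
import random


def sort_rows(rows, key, stable=True, seed=None):
    """Hypothetical model of a sort with STABLE(TRUE) or STABLE(FALSE).

    STABLE(TRUE): rows with equal keys keep their input order (sorted() is stable).
    STABLE(FALSE): the input order is treated as insignificant, modelled here by
    shuffling before sorting, so equal-key rows may come out in any order.
    """
    if stable:
        return sorted(rows, key=key)
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    return sorted(shuffled, key=key)


rows = [("b", 1), ("a", 1), ("c", 0)]
print(sort_rows(rows, key=lambda r: r[1]))  # [('c', 0), ('b', 1), ('a', 1)]
print(sort_rows(rows, key=lambda r: r[1], stable=False))  # ('c', 0) first, then ('b', 1) and ('a', 1) in either order
```

Shuffling is only a way of making "input order is not significant" visible. An unstable sort in this model never depends on the incoming order. This is the same reasoning behind treating the input of an UNSTABLE SORT as unordered.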

Note: It is more common to know that the order of an input dataset is not required at the point where it is used, and this also fits better with reusing definitions. That is why UNORDERED(ds) is useful.

Derived ordering (e.g., from HPCC-10144) is annotated by adding an UNORDERED attribute to an activity.

Other related changes:

  • SORT with UNSTABLE implies that the input dataset is unordered.

  • Add (+)(a,b[,options]) as a functional form of the append operator.

  • Add filter(ds, c1, c2, options) as a functional form of a filter.
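The proposed functional forms can be sketched in Python as ordinary functions. `append` and `filter_rows` are hypothetical names; `filter_rows` avoids shadowing Python's built-in `filter`. Two further assumptions apply. First, the `ordered` flag stands in for an UNORDERED option. Second, multiple filter conditions are combined with AND. Other filter options are not modelled.

```python
def append(a, b, ordered=True):
    """Hypothetical functional form of the append operator, (+)(a, b[, options]).

    ordered=True keeps every row of a before every row of b. ordered=False
    models an UNORDERED option: any arrangement of the combined rows is
    acceptable, so returning b's rows first is one legal result.
    """
    if ordered:
        return list(a) + list(b)
    return list(b) + list(a)


def filter_rows(ds, *conditions):
    """Hypothetical functional form filter(ds, c1, c2, ...).

    A row is kept only if it satisfies every condition (assumed AND semantics).
    """
    return [row for row in ds if all(cond(row) for cond in conditions)]


print(append([1, 2], [3]))                                              # [1, 2, 3]
print(filter_rows(range(1, 7), lambda x: x % 2 == 0, lambda x: x > 2))  # [4, 6]
```

A functional form gives the options an explicit place to live on the call. This matters when an optimizer rewrites or merges appends and filters and must carry those options along.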

Great care will be needed to ensure that these options are not lost from activities when they are optimized, especially IF and filter.

Conclusion

None

Fixed

Details

Compatibility: Major

Created February 5, 2016 at 9:50 AM
Updated March 1, 2016 at 11:57 AM
Resolved March 1, 2016 at 11:57 AM