Document LIKELY()/UNLIKELY()

Description

Add documentation for the new LIKELY syntax, and include a justification why it can help.

Conclusion

None

Activity

Jim DeFabia December 14, 2016 at 2:48 PM

@lorraine The PR has merged, so this should be resolved

Richard Chapman November 21, 2016 at 12:29 PM

Looks reasonable to me

Stuart Ort November 15, 2016 at 7:12 PM

can you comment on 's post so can work on this

Shamser Ahmed August 11, 2016 at 9:18 AM

Please can you comment on this explanation of the LIKELY/UNLIKELY hint.

Shamser Ahmed August 11, 2016 at 9:17 AM
Edited

LIKELY (<filter-condition>, [<likelihood-probability>])
UNLIKELY (<filter-condition>)

A LIKELY/UNLIKELY hint may be wrapped around a filter-condition to tell the code generator how likely it is that the filter-condition will match a record. LIKELY specifies that the filter-condition is expected to match most records. UNLIKELY specifies that very few records are expected to match.

A specific probability value may be provided for LIKELY. The probability value is a decimal value between 0 and 1. The closer the value is to 1.0, the more likely the filter-condition is to match a record. The closer the value is to 0.0, the less likely the filter-condition is to match a record.

The HPCC code generator uses the likelihood information to produce better code.

At present, the code generator combines the LIKELY/UNLIKELY hint with the number of times the filtered dataset is used to estimate two costs: the cost of spilling the filtered result, and the cost of re-filtering the dataset every time it is used. A spill is generated only when the cost of spilling is lower than the cost of re-filtering.

For example, say there is a large dataset of people (millions of records). A filter is created to retain all records where the age is less than 100. The filter is expected to retain 99.9% of the records, and its result is used by 3 different activities. The cost of spilling the result of the filter is likely to be significantly higher than the cost of simply re-filtering the input dataset every time it is used. LIKELY may be used to share this likelihood information with the code generator so that it can make sensible decisions about when to spill:

PeopleYoungerThan100 := AllPeople( LIKELY(age < 100, 0.999) );
// It’s really not worth spilling PeopleYoungerThan100

PeopleOlderThan100 := AllPeople( UNLIKELY(age>100) );
// Probably worth spilling even if PeopleOlderThan100 is used by just a couple of activities
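
To try the two definitions above in isolation, something like the following self-contained sketch should work. The record layout (PersonRec), the inline sample rows, and the NAMED labels are assumptions made up for illustration, not part of the original example; a real AllPeople would be a large dataset on disk.

```ecl
// Hypothetical record layout and sample data, for illustration only.
PersonRec := RECORD
  STRING20  name;
  UNSIGNED1 age;
END;

AllPeople := DATASET([{'Alice', 34}, {'Bob', 101}, {'Carol', 67}], PersonRec);

// Hint: nearly every record is expected to pass this filter.
PeopleYoungerThan100 := AllPeople(LIKELY(age < 100, 0.999));

// Hint: very few records are expected to pass this filter.
PeopleOlderThan100 := AllPeople(UNLIKELY(age > 100));

// Use the filtered results so that the filters are actually evaluated.
OUTPUT(PeopleYoungerThan100, NAMED('YoungerThan100'));
OUTPUT(COUNT(PeopleOlderThan100), NAMED('OlderThan100Count'));
```

The hints are there to inform the code generator's spill decision and should not change which records each filter returns. With a dataset this small the spill decision is unlikely to matter, so this only shows the syntax in context.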

Fixed

Details

Created June 17, 2016 at 3:25 PM
Updated December 14, 2016 at 3:16 PM
Resolved December 14, 2016 at 3:16 PM