Unexpected JOIN LIMIT behavior

Description

We're seeing unexpected behavior in the way default LIMITs are applied to keyed JOINs. Our specific case was a JOIN with KEEP(10000), similar to j4 below, to which we added a non-keyed condition; it then failed, as j2 does. This code behaves the same on 7.2 and 6.4, so this is nothing new, just the first time I've run into an error because of it.

layout := { unsigned proxid, unsigned rcid, };

ds := dataset([{3,1},{3,2}], layout);
ds1 := dataset([{3,1}], layout);
ds10k := normalize(ds1, 10001, transform(layout, self.rcid := counter, self.proxid := left.proxid));

keyname := '~temp::bipv2::dustin::limit_keep';
k := index(ds10k, {proxid}, {ds10k}, keyname);

inDs := nofold(dataset([{3,0}], layout));

j := join(inDs, k,
  left.proxid = right.proxid and (left.rcid = right.rcid),
  transform(layout, self := right),
  keep(10000), limit(10000));

j2 := join(inDs, k,
  left.proxid = right.proxid and (left.rcid = right.rcid),
  transform(layout, self := right),
  keep(10000));

j3 := join(inDs, k,
  left.proxid = right.proxid,
  transform(layout, self := right),
  keep(10000), limit(10000));

j4 := join(inDs, k,
  left.proxid = right.proxid,
  transform(layout, self := right),
  keep(10000));

j5 := join(inDs, k,
  left.proxid = right.proxid,
  transform(layout, self := right));

fnOut := '~temp::bipv2::dustin::limit_keep_out';

sequential(
  // buildindex(k, overwrite);
  // output(j);  // error "JOIN limit exceeded(10000)"
  // output(j2); // error "more than 10000 match candidates in keyed join"
  // output(j3); // error "JOIN limit exceeded(10000)"
  // output(j4); // works
  output(j5);    // error "more than 10000 match candidates in keyed join"
);

Conclusion

None

Activity

Gavin Halliday June 11, 2019 at 10:40 AM

Please see the discussion above. Can we update the documentation to make clear when the implicit limit is added?

Richard Chapman June 10, 2019 at 1:56 PM

I agree, this sounds like a documentation issue.

Gavin Halliday June 10, 2019 at 12:36 PM

The rules seem to be that the implicit limit is added if:

there is no limit, and no atmost, and it is not a left only join, and
(there is no keep limit, or the join has a postfilter).
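
Read as a predicate, the rule above might be sketched as follows. This is only an illustration: implicitLimitAdded and its parameter names are hypothetical and are not part of the ECL library or the compiler. The sketch also assumes that the non-keyed condition on rcid in j and j2 counts as a postfilter, since rcid is in the payload of the index rather than a keyed field.

```ecl
// Hypothetical helper that models the rule described in this comment.
// It is not a compiler or library function; all names are illustrative.
implicitLimitAdded(BOOLEAN hasLimit,
                   BOOLEAN hasAtmost,
                   BOOLEAN isLeftOnly,
                   BOOLEAN hasKeep,
                   BOOLEAN hasPostfilter) :=
    NOT hasLimit AND NOT hasAtmost AND NOT isLeftOnly
    AND (NOT hasKeep OR hasPostfilter);

// Argument order: hasLimit, hasAtmost, isLeftOnly, hasKeep, hasPostfilter
OUTPUT(implicitLimitAdded(TRUE,  FALSE, FALSE, TRUE,  TRUE),  NAMED('j'));
OUTPUT(implicitLimitAdded(FALSE, FALSE, FALSE, TRUE,  TRUE),  NAMED('j2'));
OUTPUT(implicitLimitAdded(TRUE,  FALSE, FALSE, TRUE,  FALSE), NAMED('j3'));
OUTPUT(implicitLimitAdded(FALSE, FALSE, FALSE, TRUE,  FALSE), NAMED('j4'));
OUTPUT(implicitLimitAdded(FALSE, FALSE, FALSE, FALSE, FALSE), NAMED('j5'));
```

Under these assumptions the predicate is true only for j2 and j5, the two joins that fail with "more than 10000 match candidates in keyed join". j and j3 carry an explicit limit(10000) and fail on that instead, and j4 has keep with no postfilter, so it runs.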

I think the idea was that a keep with a post filter could still end up reading all the matches - which could be painful.

I am inclined to leave it as it is, but make it clear in the documentation. Alternatively, it could be changed so that keep with a postfilter no longer adds an implicit limit. Any votes?

Fixed

Details

Priority

Compatibility

Minor

Created May 15, 2019 at 6:43 PM
Updated June 18, 2019 at 8:17 AM
Resolved June 18, 2019 at 8:17 AM
