Range Predicates Unable to Avoid Filter Computation Due to Null Column Values #14694
Comments
I don't fully follow. To solve |
FWIW the data structure underlying the range index allows passing in a RoaringBitmap representing an existing filter, so the range predicate is implicitly intersected with the filter, skipping over any part of the index not in the filter. This could only help here. |
I lie, it turns out that intersection pushdown was never implemented for |
I think what we actually need to do here is to invest some time to find a proper solution that treats null values natively. They should be understood by dictionaries and min/max values |
Say we have a column `event_timestamp`, which may have null values. These values are expected to be in "mostly" non-decreasing order over time.

If we have queries with predicates such as `event_timestamp IS NOT NULL AND event_timestamp BETWEEN x AND y`, then because the default value can practically only be an extremal value (0 or MAX_VALUE), we end up unnecessarily computing the range predicate for segments with non-null values in the range `[x, y]`.

At high QPS, this starts becoming a bottleneck even with a range index. For queries where the other filters are selective, I am planning to try disabling the range index altogether for one of our tables so that we can rely on the scan-based filter, which only runs for the docs that pass the other filters.
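To make the min/max problem concrete, here is a minimal sketch of why a null placeholder defeats the "range covers the whole segment" shortcut. It is not Pinot code: the class `NullAwareMinMaxSketch`, its methods, and the choice of 0 as the placeholder are all assumptions made for illustration.

```java
final class NullAwareMinMaxSketch {
  // Hypothetical placeholder stored in place of null (the issue mentions 0 or MAX_VALUE).
  static final long NULL_PLACEHOLDER = 0L;

  /** Returns {min, max}; when nullAware is false, nulls are counted as NULL_PLACEHOLDER. */
  static long[] minMax(Long[] column, boolean nullAware) {
    long min = Long.MAX_VALUE;
    long max = Long.MIN_VALUE;
    for (Long v : column) {
      if (v == null && nullAware) {
        continue; // null-aware statistics ignore nulls entirely
      }
      long value = (v == null) ? NULL_PLACEHOLDER : v;
      min = Math.min(min, value);
      max = Math.max(max, value);
    }
    return new long[] {min, max};
  }

  /** True when [min, max] lies inside [lower, upper], i.e. the range predicate matches every counted value. */
  static boolean rangeCoversSegment(long min, long max, long lower, long upper) {
    return lower <= min && max <= upper;
  }

  public static void main(String[] args) {
    Long[] eventTimestamp = {1000L, null, 1010L, 1020L};
    long[] naive = minMax(eventTimestamp, false);
    long[] aware = minMax(eventTimestamp, true);
    // With the placeholder included, the shortcut is lost and the range predicate must be evaluated.
    System.out.println(rangeCoversSegment(naive[0], naive[1], 1000L, 1020L));
    // With null-aware statistics, BETWEEN is redundant and only IS NOT NULL remains to be checked.
    System.out.println(rangeCoversSegment(aware[0], aware[1], 1000L, 1020L));
  }
}
```

In this sketch, the segment's non-null values all fall inside `[1000, 1020]`, yet the placeholder drags the minimum down to 0, so the segment looks like it only partially overlaps the range.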
Another related optimization is to early terminate the `AndFilterOperator` if any of the other filters has turned out to be empty. To do this, `BlockDocIdSet` can add a new method which returns the cardinality of the underlying docs. This may not always be possible, so the method could also return -1, indicating that the cardinality is unknown at the moment.

Edit: I may be missing something since I didn't get a chance to take a deeper look into this.
pinot/pinot-core/src/main/java/org/apache/pinot/core/operator/filter/AndFilterOperator.java
Lines 52 to 59 in 1ed25c0
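The early-termination idea could look roughly like the sketch below. It does not use Pinot's actual classes: `DocIdSetSketch` is a hypothetical stand-in for `BlockDocIdSet`, `cardinality()` is the proposed (not existing) method, and `java.util.BitSet` plus `Supplier` are used only to model doc ID sets and lazily evaluated child filters.

```java
import java.util.BitSet;
import java.util.List;
import java.util.function.Supplier;

final class AndEarlyTerminationSketch {
  static final int UNKNOWN = -1;

  /** Hypothetical stand-in for BlockDocIdSet with the proposed cardinality method. */
  interface DocIdSetSketch {
    BitSet docIds();

    /** Number of matching docs, or UNKNOWN when it cannot be determined cheaply. */
    default int cardinality() {
      return UNKNOWN;
    }
  }

  /** A doc ID set whose cardinality is known up front. */
  static DocIdSetSketch known(BitSet bits) {
    return new DocIdSetSketch() {
      @Override public BitSet docIds() { return bits; }
      @Override public int cardinality() { return bits.cardinality(); }
    };
  }

  /** A doc ID set that keeps the default UNKNOWN cardinality. */
  static DocIdSetSketch unknown(BitSet bits) {
    return () -> bits;
  }

  /** Intersects the children in order and stops as soon as one reports zero matching docs. */
  static BitSet and(List<Supplier<DocIdSetSketch>> children, int numDocs) {
    BitSet result = new BitSet(numDocs);
    result.set(0, numDocs);
    for (Supplier<DocIdSetSketch> child : children) {
      DocIdSetSketch set = child.get();
      if (set.cardinality() == 0) {
        return new BitSet(); // known-empty child: the remaining children are never evaluated
      }
      result.and(set.docIds());
    }
    return result;
  }
}
```

Returning -1 for the unknown case keeps the check cheap: the operator only short-circuits when a child can prove it is empty, and otherwise falls back to the ordinary intersection.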