
Range Predicates Unable to Avoid Filter Computation Due to Null Column Values #14694

ankitsultana opened this issue Dec 21, 2024 · 5 comments

@ankitsultana
Contributor

ankitsultana commented Dec 21, 2024

Say we have this column: event_timestamp, which may have null values. These values are expected to be in "mostly" non-decreasing order over time.

If we have queries with predicates such as event_timestamp IS NOT NULL AND event_timestamp BETWEEN x AND y, then because the default null value can practically only be an extremal value (0 or MAX_VALUE), we end up unnecessarily evaluating the range predicate for segments that have no non-null values in the range [x, y].

At high QPS, this starts becoming a bottleneck even with a range index. For queries where the other filters are selective, I am planning to try disabling the range index altogether for one of our tables so we can rely on the scan-based filter, which only evaluates the docs that remain after the other filters.

Another related optimization is to early-terminate the AndFilterOperator if any of the other filters has already turned out to be empty. To support this, BlockDocIdSet could add a new method that returns the cardinality of the underlying docs. Computing this may not always be possible, so the method could also return -1 to indicate that the cardinality is unknown at that point.

Edit: I may be missing something since I didn't get a chance to take a deeper look into this.

// Current AndFilterOperator#getTrues(): getTrues() is called on every child
// filter unconditionally; there is no short circuit when one of them is
// already known to be empty.
protected BlockDocIdSet getTrues() {
  Tracing.activeRecording().setNumChildren(_filterOperators.size());
  List<BlockDocIdSet> blockDocIdSets = new ArrayList<>(_filterOperators.size());
  for (BaseFilterOperator filterOperator : _filterOperators) {
    blockDocIdSets.add(filterOperator.getTrues());
  }
  return new AndDocIdSet(blockDocIdSets, _queryOptions);
}
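
For illustration, a minimal sketch of what that short circuit might look like, assuming a hypothetical getCardinality() method on BlockDocIdSet that returns -1 when the cardinality is not yet known (the method and the empty-result handling below are illustrative, not existing Pinot APIs):

protected BlockDocIdSet getTrues() {
  Tracing.activeRecording().setNumChildren(_filterOperators.size());
  List<BlockDocIdSet> blockDocIdSets = new ArrayList<>(_filterOperators.size());
  for (BaseFilterOperator filterOperator : _filterOperators) {
    BlockDocIdSet docIdSet = filterOperator.getTrues();
    // Hypothetical method: 0 means provably empty, -1 means unknown
    // (e.g. for lazy, scan-based doc id sets).
    if (docIdSet.getCardinality() == 0) {
      // One empty side makes the whole conjunction empty, so skip
      // evaluating the remaining (potentially expensive) filters.
      return EmptyDocIdSet.getInstance();
    }
    blockDocIdSets.add(docIdSet);
  }
  return new AndDocIdSet(blockDocIdSets, _queryOptions);
}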

@Jackie-Jiang
Contributor

I don't fully follow. To solve event_timestamp IS NOT NULL AND event_timestamp BETWEEN x AND y, we do need to evaluate both sides, right? Or are all values NULL here?

@ankitsultana
Contributor Author

Yeah I realized this needs more context.

Problem

Consider a realtime table with a single Kafka partition, where event_timestamp values arrive in increasing order but can occasionally be null. Visually, the segments will look as follows:

Figure-1:

range of event_timestamp values per segment for only the non-null values
+------------------+------------------+------------------+------------------+
| S0: [0, 1000]    | S1: [1000, 2000] | S2: [2000, 3000] | S3: [3000, 4000] | 
+------------------+------------------+------------------+------------------+

Figure-2:

range of event_timestamp values per segment when you also consider the default null value (0 in this case)
+------------------+------------------+------------------+------------------+
| S0: [0, 1000]    | S1: [0, 2000]    | S2: [0, 3000]    | S3: [0, 4000]    | 
+------------------+------------------+------------------+------------------+

From a customer perspective, if they have range filters on event_timestamp (say event_timestamp BETWEEN 2500 AND 2800), they usually expect to evaluate only the segments whose min/max values overlap with the range predicate in the query. Technically this is what happens today, but what is easily overlooked is that when the column can have nulls, the default null value ends up becoming the min or the max of almost every segment, as can be seen in Figure-2.

This means that segment pruning no longer works and the range predicate has to be evaluated for nearly every segment. Even with a range index, the cost of evaluating these predicates ends up dominating the CPU in quite a few workloads I have seen internally. The sketch below makes this concrete.
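
Segment pruning for a range predicate is essentially an interval-overlap check against the segment's min/max metadata. A simplified sketch (not Pinot's actual pruner code) of how the null default value defeats the lower-bound half of that check:

// Simplified illustration of min/max based segment pruning for
// "event_timestamp BETWEEN x AND y". A segment can be skipped only if
// its [min, max] range does not overlap [x, y].
public class MinMaxPruningExample {
  static boolean mayMatch(long segmentMin, long segmentMax, long x, long y) {
    return segmentMin <= y && segmentMax >= x;
  }

  public static void main(String[] args) {
    // Figure-1 metadata (nulls excluded): only S2 overlaps [2500, 2800].
    System.out.println(mayMatch(2000, 3000, 2500, 2800)); // true  (S2 kept)
    System.out.println(mayMatch(3000, 4000, 2500, 2800)); // false (S3 pruned)
    // Figure-2 metadata: the default null value drags every segment's min
    // down to 0, so "segmentMin <= y" is always true and pruning degrades
    // to the max check alone; S3 is no longer pruned.
    System.out.println(mayMatch(0, 4000, 2500, 2800));    // true  (S3 kept)
  }
}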


Factors Affecting Prominence of this Issue

  1. When you have a large number of segments.
  2. When segment pruning via bloom filters, etc. is not sufficient.

Optimizing When the Other Predicates Are Selective and/or Efficient

While segment pruning is not easy to do (see section below), we can still try to limit the number of times the range predicate is evaluated per query when the other filters are quite selective. Example: uuid_col1 = uuid1 AND uuid_col2 = uuid2 ... AND event_timestamp BETWEEN x AND y.

To achieve this, we should short-circuit the AndFilter computation when the filters evaluated so far have already led to 0 matching rows (along the lines of the sketch above). This is something we should do regardless of this issue since it can also help many other cases (e.g. multiple text_match predicates in a query).

Possible Long Term Fix?

If we could set the expectation for users that they should add event_timestamp IS NOT NULL to their queries alongside their range predicates to achieve better segment pruning, then we might have some possible solutions to implement. But most of the options I can think of would look awkward.

@richardstartin
Member

FWIW the data structure underlying the range index allows passing in a RoaringBitmap representing an existing filter, so the range predicate is implicitly intersected with the filter, skipping over any part of the index not in the filter. This could only help here.

@richardstartin
Member

I lie: it turns out that intersection pushdown was never implemented for between, but I implemented it here just now; it should be available soon with an upgrade.
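
For reference, a rough sketch of that pushdown using the RangeBitmap structure from the RoaringBitmap library (which underlies Pinot's range index). The exact overloads depend on the library version; in particular, the context-accepting between is the one just added above, so it may not exist in older releases:

import org.roaringbitmap.RangeBitmap;
import org.roaringbitmap.RoaringBitmap;

public class RangeBitmapPushdownExample {
  public static void main(String[] args) {
    // Index 10,000 docs where doc i has value i.
    RangeBitmap.Appender appender = RangeBitmap.appender(9999);
    for (int i = 0; i < 10000; i++) {
      appender.add(i);
    }
    RangeBitmap rangeBitmap = appender.build();

    // Result of the other, more selective filters (e.g. uuid equality
    // predicates): only docs 42 and 4242 survive.
    RoaringBitmap context = RoaringBitmap.bitmapOf(42, 4242);

    // Without pushdown: evaluate the range over all docs, then intersect.
    RoaringBitmap withoutPushdown = rangeBitmap.between(2500, 2800);
    withoutPushdown.and(context);

    // With pushdown: the filter is passed as a context bitmap, so parts of
    // the index containing no candidate docs are skipped entirely.
    RoaringBitmap withPushdown = rangeBitmap.between(2500, 2800, context);

    System.out.println(withoutPushdown.equals(withPushdown)); // true
  }
}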

@gortiz
Contributor

gortiz commented Jan 7, 2025

I think what we actually need to do here is to invest some time in finding a proper solution that treats null values natively. They should be understood by dictionaries and min/max values.
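
For illustration only, one shape that native handling could take. The class below is hypothetical, not an existing Pinot class (though Pinot does already persist which docs are null via the null value vector when null handling is enabled): compute min/max over the non-null docs and track nullness explicitly, so pruning sees Figure-1 style metadata instead of Figure-2:

import org.roaringbitmap.RoaringBitmap;

// Hypothetical null-aware segment metadata, for illustration only.
public class NullAwareColumnMetadata {
  final long min;        // min over non-null values only
  final long max;        // max over non-null values only
  final boolean hasNull; // whether any doc in the segment is null

  NullAwareColumnMetadata(long[] values, RoaringBitmap nullDocs) {
    long mn = Long.MAX_VALUE;
    long mx = Long.MIN_VALUE;
    for (int docId = 0; docId < values.length; docId++) {
      if (!nullDocs.contains(docId)) {
        mn = Math.min(mn, values[docId]);
        mx = Math.max(mx, values[docId]);
      }
    }
    min = mn;
    max = mx;
    hasNull = !nullDocs.isEmpty();
  }

  // Null docs never match a range predicate, so pruning on the non-null
  // min/max is safe whether or not the query adds IS NOT NULL.
  boolean mayMatchRange(long x, long y) {
    return min <= y && max >= x;
  }
}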
