-
Notifications
You must be signed in to change notification settings - Fork 1.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Enhancement] Filter row group using runtime filter before prepare next row group in FileReader #54868
Conversation
Signed-off-by: trueeyu <lxhhust350@qq.com>
RETURN_IF_ERROR(cur_row_group->prepare()); | ||
} | ||
break; | ||
} while (true); | ||
|
||
return Status::OK(); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The most risky bug in this code is:
An infinite loop may occur when _row_group_size
is 0 or _cur_row_group_idx
exceeds _row_group_size
.
You can modify the code like this:
Status FileReader::get_next(ChunkPtr* chunk) {
while (_cur_row_group_idx < _row_group_size) { // Add condition check here
const auto& cur_row_group = _row_group_readers[_cur_row_group_idx];
auto ret = _update_rf_and_filter_group(cur_row_group);
if (ret.ok() && ret.value()) {
_group_reader_param.stats->parquet_filtered_row_groups += 1;
_row_group_readers[_cur_row_group_idx] = nullptr; // Ensure the current reader is nullified after filtering
_cur_row_group_idx++;
continue;
} else if (ret.status().is_end_of_file()) {
_group_reader_param.stats->parquet_filtered_row_groups +=
(_row_group_size - _cur_row_group_idx);
_row_group_readers.assign(_row_group_readers.size(), nullptr);
_cur_row_group_idx = _row_group_size;
break;
} else {
// do nothing, ignore the error code
}
RETURN_IF_ERROR(cur_row_group->prepare());
break;
}
return Status::OK();
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not one good suggestion
@@ -321,6 +321,8 @@ struct HdfsScannerContext { | |||
|
|||
int64_t connector_max_split_size = 0; | |||
|
|||
OlapRuntimeScanRangePruner* rf_scan_range_pruner = nullptr; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we definitely have to remove "Olap"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good suggestion, i will remove it next pr.
Signed-off-by: trueeyu <lxhhust350@qq.com>
[Java-Extensions Incremental Coverage Report]✅ pass : 0 / 0 (0%) |
[FE Incremental Coverage Report]✅ pass : 0 / 0 (0%) |
[BE Incremental Coverage Report]✅ pass : 29 / 34 (85.29%) file detail
|
Why I'm doing:
If one
FileReader
scan multi row groups, if runtime filter is updated, theFilterReader
will use the out of date runtime filter to filter next row group, so the filtering effect is not good.What I'm doing:
Try to filter row group using the newest runtime filter before prepare next row group.
What type of PR is this:
Does this PR entail a change in behavior?
If yes, please specify the type of change:
Checklist:
Bugfix cherry-pick branch check: