Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: reproducer for the empty snapshot file issue #25835

Merged
merged 2 commits into from
Jan 15, 2025

Conversation

praveen-influx
Copy link
Contributor

No issue yet, this is a reproducer test for empty snapshot files being generated under some very strict settings,

  • flush interval: 1s
  • snapshot size: 1
  • force snapshotting mem threshold: 2000 bytes

The good news is, after investigating this code there is no data loss or any possibility of inconsistent query results. However you see these snapshot files with 0 rows in them. Loading them (during restart) does not cause any panics. It loads the file and ignores it as it does not have any pointers to parquet files. It would be good to fix this nevertheless.

@praveen-influx praveen-influx force-pushed the praveen/reproduce-empty-snapshot-issue branch from 3904be7 to 3f53df1 Compare January 14, 2025 19:48
@praveen-influx praveen-influx force-pushed the praveen/reproduce-empty-snapshot-issue branch 2 times, most recently from 07cdb30 to 3e904dc Compare January 15, 2025 11:19
@praveen-influx praveen-influx marked this pull request as ready for review January 15, 2025 11:20
@praveen-influx praveen-influx force-pushed the praveen/reproduce-empty-snapshot-issue branch from 3e904dc to 3f52610 Compare January 15, 2025 12:04
@@ -356,6 +356,13 @@ impl Catalog {
pub fn inner(&self) -> &RwLock<InnerCatalog> {
&self.inner
}

pub fn table_id(&self, db_id: &DbId, table_name: Arc<str>) -> Option<TableId> {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not really used in the test or the code change, but it was useful to get the parquet files (as get_files method needed db_id and table_id) when debugging. I can remove it, just left it there as it might be handy.

@praveen-influx praveen-influx force-pushed the praveen/reproduce-empty-snapshot-issue branch from 3f52610 to b93cb5d Compare January 15, 2025 12:12
// force_snapshot) snapshot runs snapshot_tracker will check
// wal_periods are empty so it won't trigger a snapshot in the first
// place.
if !persist_jobs_empty {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This check is only needed here to fix the issue but I've also made clearing the snapshots further down conditional (whether persist jobs were empty or not). I did wonder if I can work around the problem further up the call stack to avoid snapshotting (calling this function) in the first place but knowing whether query buffer is empty isn't straight forward. If there is a better way to achieve this without scattering checks on a flag that would be desirable.

@praveen-influx praveen-influx force-pushed the praveen/reproduce-empty-snapshot-issue branch from b93cb5d to 2d0ac24 Compare January 15, 2025 12:17
@praveen-influx praveen-influx force-pushed the praveen/reproduce-empty-snapshot-issue branch from 2d0ac24 to c5e750c Compare January 15, 2025 12:22
Copy link
Contributor

@hiltontj hiltontj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This LGTM, and the comment with explanation is very helpful so thanks for adding that.

I was trying to reason if the same logic would apply to the catalog. That is, if there is nothing in the buffer, then should we ignore persisting the catalog. But I think the answer is no: there could have been catalog only operations in the WAL, e.g., from CreateTable, that did not result in anything in the buffer...

@praveen-influx praveen-influx force-pushed the praveen/reproduce-empty-snapshot-issue branch from c5e750c to 0014ff1 Compare January 15, 2025 19:19
@praveen-influx
Copy link
Contributor Author

I was trying to reason if the same logic would apply to the catalog. That is, if there is nothing in the buffer, then should we ignore persisting the catalog. But I think the answer is no

There is already a guard to write new catalog file only when catalog has been updated (see here). I think the condition this PR addresses should be mutually exclusive i.e, cannot have both a no-op (written only when there's nothing in wal buffer) and also a catalog op at the same time.

@praveen-influx praveen-influx merged commit 4eccc38 into main Jan 15, 2025
16 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants