Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: introduce num wal files to keep #25801

Merged
merged 3 commits into from
Jan 12, 2025

Conversation

praveen-influx
Copy link
Contributor

This commit allows a configurable number of wal files to be left behind in OS. This is necessary as enterprise replicas rely on these files.

closes: #25788

Copy link
Member

@pauldix pauldix left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

overall looks good, left some nitpicky name comments and a couple other things.

/// Number of wal files to keep behind, wal flush does not clear the wal files immediately
/// instead they are only deleted when num wal files count exceed this keep behind size
#[clap(
long = "num-wal-files-to-keep",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should be renamed to snapshotted-wal-files-to-keep. It's not the actual total number of WAL files, it's the number that have been snapshotted. So if you have 200 files that haven't been snapshotted and this setting is 50, you'd have 250 total files at that point.

use tokio::sync::Mutex;
use tokio::sync::{oneshot, OwnedSemaphorePermit, Semaphore};

// struct WalObjectStoreArgs {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

leftover?

@@ -25,6 +36,11 @@ pub struct WalObjectStore {
added_file_notifiers: parking_lot::Mutex<Vec<Arc<dyn WalFileNotifier>>>,
/// Buffered wal ops go in here along with the state to track when to snapshot
flush_buffer: Mutex<FlushBuffer>,
/// oldest, last and latest wal file nums are used to keep track of what wal files
/// to remove from the OS
num_wal_files_to_keep: u64,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same, rename this so it indicates that it's the number of snapshotted wal files to keep.

all_wal_file_paths
.iter()
.filter(|path| {
if let Some(last_wal_sequence_number) = last_wal_sequence_number {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thing I'm not sure of looking through this, does the persisted snapshot have the last known wal file number saved, or is it the wal file number it's snapshotting to? It should have both, but the one we care about here is the one it's snapshotting to. Can you verify?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The persisted snapshot file carries the wal file number that came in the current wal contents, that's why @hiltontj had to use >= instead of >. SnapshotDetails holds the last_wal_file_sequence_number (which is based on wal period we're snapshotting). If we used the one coming in SnapshotDetails we can then say load files only > this wal file number

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, so we need to update so that we're also keeping a record of what WAL file we're snapshotting to. That way we initialize with the correct one (i.e. not the latest WAL file written). In the case of a forced snapshot, those numbers will be the same, but in the case of normal snapshots, the snapshot number will be < latest wal number.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

addressed it in commit - good spot, thanks.

}

if path_count == paths.len() {
break;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we sort before this so that paths always returns sorted?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is from existing code, I thought it's doing the right already - yes I can sort it.

#[derive(Debug)]
struct WalFileRemover {
oldest_wal_file: Option<WalFileSequenceNumber>,
last_wal_sequence_number: Option<WalFileSequenceNumber>,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rename to last_snapshotted_wal_sequence_number for clarity. The latest wal file number will always be >= this one.

This commit allows a configurable number of wal files to be left behind
in OS. This is necessary as enterprise replicas rely on these files.

closes: #25788
@praveen-influx praveen-influx force-pushed the praveen/delete-wal-background branch 2 times, most recently from 80a61d5 to 9096eee Compare January 11, 2025 23:49
@praveen-influx praveen-influx force-pushed the praveen/delete-wal-background branch from 9096eee to 5e76230 Compare January 11, 2025 23:57
@praveen-influx praveen-influx marked this pull request as ready for review January 11, 2025 23:58
@praveen-influx praveen-influx force-pushed the praveen/delete-wal-background branch from 2f6f29a to a03f3fb Compare January 12, 2025 00:24
Copy link
Member

@pauldix pauldix left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok I think it looks good. Let's get it tested downstream!

@praveen-influx praveen-influx merged commit 5096344 into main Jan 12, 2025
13 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

WAL snapshotting should leave behind some WAL files
2 participants