[nexus] Support Bundle background task #7063
Conversation
entries.sort_by(|a, b| a.file_name().cmp(&b.file_name()));

for entry in &entries {
    // Remove the "/tmp/..." prefix from the path when we're storing it in the
I guess this answers my question about where the tmpdir is being created. What I still don't understand is whether that's the best location for it.
Just dug this up from my system which I based on some discussion in chat:
2024-11-04.21:41:38 zfs create -o atime=off -o sync=disabled rpool/faketmpfs
2024-11-04.21:41:56 zfs set quota=20G rpool/faketmpfs
2024-11-04.21:42:39 zfs set mountpoint=/faketmpfs rpool/faketmpfs
2024-11-04.21:43:24 zfs snapshot rpool/faketmpfs@empty
Wondering if we should create a dataset similar to this on the active M.2 device if it's missing at sled-agent startup, or, if it's found, rewind the snapshot to the empty state.
Maybe @jclulow or @davepacheco could weigh in on this (which is where I think I got the above from) vs. storing the temporary bundle collection in /tmp, where we may have limited memory due to the VMM reservoir and control plane processes. Or perhaps I am being overly cautious...
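For concreteness, a minimal sketch of the startup behavior suggested above, assuming a dataset named like the one in the shell history (none of this exists in sled-agent today; the dataset name, mountpoint, and quota are placeholders):

```rust
use std::process::Command;

// Hypothetical startup check: create a quota-limited scratch dataset on the
// boot M.2 if it's missing, or roll it back to a known-empty snapshot if it
// already exists. All names and property values are illustrative.
fn ensure_scratch_dataset() -> anyhow::Result<()> {
    let dataset = "rpool/faketmpfs";
    let empty_snapshot = format!("{dataset}@empty");

    let exists = Command::new("zfs")
        .args(["list", dataset])
        .output()?
        .status
        .success();

    if exists {
        // Discard any leftover scratch data from a previous boot.
        Command::new("zfs")
            .args(["rollback", "-r", empty_snapshot.as_str()])
            .status()?;
    } else {
        // First boot (or dataset destroyed): create it with the properties
        // from the shell history above, then snapshot the empty state.
        Command::new("zfs")
            .args([
                "create", "-o", "atime=off", "-o", "sync=disabled",
                "-o", "quota=20G", "-o", "mountpoint=/faketmpfs", dataset,
            ])
            .status()?;
        Command::new("zfs")
            .args(["snapshot", empty_snapshot.as_str()])
            .status()?;
    }
    Ok(())
}
```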
Sorry, I'm not up to speed on any of this so I'm not sure what to suggest, but I'm happy to talk through it if it's helpful. Generally I avoid using /tmp / tmpfs for just about anything. When figuring out where to store stuff I think about what data's being stored, how big it might be, and what its lifecycle is (i.e., how do we make sure we don't leak it, does it need to survive reboot, can it become incorrect or inconsistent with some other copies, etc.) to figure out where it should go. I'm not sure how helpful that is, but I'm happy to talk through it if that's useful!
This was covered a bit in https://rfd.shared.oxide.computer/rfd/0496#_storage
> this RFD proposes storing support bundles within the Nexus zone filesystem during collection, but transferring them into a dataset within a dedicated U.2 while the bundle is active, and accessible for download by the Nexus API.
That was my intent, at least - that this data would go through the following steps:
- Collected into the Nexus transient zone filesystem (limited to one bundle at a time, thanks to the structure of this background task).
- Aggregated into a zipfile by the Nexus collection task.
- Transferred to a U.2, which may be running on a separate sled from Nexus entirely. It's only at this point that the bundle is made "active" and is durable.
I do think it's a mistake for this to be using /tmp -- this code is running in a Nexus zone, so I think I need to be using /var/tmp to actually be using that transient zone storage, which was my goal, rather than using a RAM-backed filesystem.
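For reference, a minimal sketch of what pointing the scratch space at /var/tmp could look like, assuming the `tempfile` crate (the prefix and function name are illustrative, not what the background task actually does):

```rust
use std::io;
use tempfile::{Builder, TempDir};

// Sketch only: create the per-bundle scratch directory under /var/tmp, which
// is backed by the zone's transient filesystem rather than a RAM-backed tmpfs.
fn bundle_scratch_dir() -> io::Result<TempDir> {
    Builder::new()
        .prefix("support-bundle-")
        .tempdir_in("/var/tmp")
    // The directory (and everything collected into it) is removed when the
    // returned TempDir is dropped.
}
```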
That all makes sense I think. The only catch with the root filesystem is that it will disappear if the system reboots and then there might be some database-level thing to clean up (e.g., mark the bundle failed? I don't know).
> That all makes sense I think. The only catch with the root filesystem is that it will disappear if the system reboots and then there might be some database-level thing to clean up (e.g., mark the bundle failed? I don't know).
The current implementation of support bundles assigns them to a single Nexus, and it's the responsibility of that individual Nexus to perform collection.
- If the Nexus hasn't been expunged: when it reboots, it'll see the bundle in state `SupportBundleState::Collecting`, and will restart collection.
omicron/nexus/src/app/background/tasks/support_bundle_collector.rs
Lines 336 to 344 in d23826e
let result = self
    .datastore
    .support_bundle_list_assigned_to_nexus(
        opctx,
        &pagparams,
        self.nexus_id,
        vec![SupportBundleState::Collecting],
    )
    .await;
- If either the Nexus was expunged or the support bundle storage was expunged, then the reconfigurator has an execution step which invokes `support_bundle_fail_expunged`:
omicron/nexus/db-queries/src/db/datastore/support_bundle.rs
Lines 219 to 221 in 0603f0b
/// Marks support bundles as failed if their assigned Nexus or backing
/// dataset has been destroyed.
pub async fn support_bundle_fail_expunged(
This will update database records, and possibly re-assign the Nexus which "owns" the bundle so that storage on the final U.2 can be freed:
omicron/nexus/src/app/background/tasks/support_bundle_collector.rs
Lines 229 to 247 in d23826e
// Monitors all bundles that are "destroying" or "failing" and assigned to
// this Nexus, and attempts to clear their storage from Sled Agents.
async fn cleanup_destroyed_bundles(
    &self,
    opctx: &OpContext,
) -> anyhow::Result<CleanupReport> {
    let pagparams = DataPageParams::max_page();
    let result = self
        .datastore
        .support_bundle_list_assigned_to_nexus(
            opctx,
            &pagparams,
            self.nexus_id,
            vec![
                SupportBundleState::Destroying,
                SupportBundleState::Failing,
            ],
        )
        .await;
(This shares some logic between the "bundle was deleted manually by an operator" and "bundle was failed because something was expunged" pathways.)
Sorry for the drive-by comment - I don't have much context here, but I skimmed over `support_bundle_fail_expunged` and had a question. It looks like that's checking the current blueprint for expunged zones/datasets specifically. How will that interact with dropping expunged entities from the blueprint altogether?
> Sorry for the drive-by comment - I don't have much context here, but I skimmed over `support_bundle_fail_expunged` and had a question. It looks like that's checking the current blueprint for expunged zones/datasets specifically. How will that interact with dropping expunged entities from the blueprint altogether?
Filed #7319 - I think this could result in us "missing" marking a bundle as failed if we quickly transition a zone from "expunged" to "pruned" without execution ever completing successfully in-between.
For the tempdir side of this: fixed in 53dcbe6.
One optional suggestion, but otherwise nothing further from me. I defer to those who know more about Nexus on the tmp questions.
let mut reader = BufReader::new(std::fs::File::open(&src)?);

loop {
    let buf = reader.fill_buf()?;
    let len = buf.len();
    if len == 0 {
        break;
    }
    zip.write_all(&buf)?;
    reader.consume(len);
}
I think my first suggestion led us astray. We can simplify this further and still avoid reading the whole file into memory with an `io::copy` call. Sorry about that.
let mut file = std::fs::File::open(&src)?;
std::io::copy(&mut file, zip)?;
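For context, a sketch of how that could look inside the per-file loop, assuming `zip` here is a `ZipWriter` from a recent version of the `zip` crate (the function and parameter names are made up for illustration):

```rust
use std::io::{Seek, Write};
use std::path::Path;
use zip::{write::SimpleFileOptions, ZipWriter};

// Illustrative helper: stream one collected file into the bundle zip without
// buffering the whole file in memory.
fn add_file_to_zip<W: Write + Seek>(
    zip: &mut ZipWriter<W>,
    entry_name: &str,
    src: &Path,
) -> anyhow::Result<()> {
    zip.start_file(entry_name, SimpleFileOptions::default())?;
    let mut file = std::fs::File::open(src)?;
    // io::copy streams through an internal buffer rather than reading the
    // whole file into memory at once.
    std::io::copy(&mut file, zip)?;
    Ok(())
}
```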
Sounds good. Looks like this is using buffered writers under the hood too: https://doc.rust-lang.org/beta/src/std/io/copy.rs.html#70-90
Patched up in a54097c
PR 3 / ??? This PR aims to re-use the support bundle management logic in sled-agent/src/support_bundle/storage.rs for both the real and simulated sled agent. It accomplishes this goal with the following:
1. It creates a trait, `LocalStorage`, that abstracts access to storage. The "real" sled agent accesses real storage; the simulated sled agent can access the simulated storage APIs.
2. It reduces the usage of unnecessary async mutexes to make lifetimes slightly more manageable. This happens to align with our guidance in RFD 400 (https://rfd.shared.oxide.computer/rfd/400#no_mutex), but has a fall-out impact in replacing `.await` calls throughout Omicron.

As an end result of this PR, tests in subsequent PRs (e.g. #7063) can rely on the simulated sled agent to respond realistically to support bundle requests, rather than using a stub implementation.
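To make the abstraction in point 1 concrete, here is a rough sketch of what such a trait could look like; the method name, parameters, and return type are invented for illustration and are not the actual contents of storage.rs:

```rust
use camino::Utf8PathBuf;

// Hypothetical shape of the storage abstraction: support bundle code asks
// where a dataset lives through the trait, the real sled-agent answers with
// real ZFS datasets, and the simulated sled-agent answers from its in-memory
// model of storage.
#[async_trait::async_trait]
trait LocalStorage: Send + Sync {
    /// Resolve the on-disk root of the dataset a bundle is stored in.
    async fn dataset_mountpoint(
        &self,
        dataset_id: uuid::Uuid,
    ) -> anyhow::Result<Utf8PathBuf>;
}
```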
PR 5 / ??? Implements support bundle APIs for accessing storage bundles. Range request support is only partially implemented as-is -- follow-up support is described in #7356
Builds atop the API skeleton in:
- #7008
Uses the support bundle datastore interfaces in:
- #7021
Relies on the background task in:
- #7063
PR 4 / ???
Adds a background task to manage support bundle lifecycle and perform collection.
In this PR: