-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
On-demand log-spacemap flush; zpool condense
command
#16747
base: master
Are you sure you want to change the base?
Conversation
Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Normally, log spacemaps are flushed out to the metaslabs when the pool is exported. For large logs, this can lead to export taking an inordinate amount of time. This commit adds a "mode" parameter for the log spacemap "flushall" operation, and functions for starting and stopping it in a particular mode. The existing behaviour of flushing everything is now the "export" mode. Then, we add a new "request" mode, that can be triggered externally. This mode differs in that it only flushes spacemaps that were dirtied before the current transaction, stopping when the only dirty ones remaining, if any, are newer. This commit only adds the behaviours and sets up the entry points; the next commit will add something to call them. Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
zpool condense
command.zpool condense
command
The idea is to have a single command that could signal to any background cleanup task that it should do its work faster, or care less about not getting in the way of user IO, or whatever. This adds the the `zpool condense` command, the `ZFS_IOC_POOL_CONDENSE` ioctl and counters so userspace can get progress. Because the target could be anything, there's no particular unit, just a total number of items to condense and count of how many done. It also adds a `log-spacemap` condense target, which calls the "request log flush" function. Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
6c94611
to
0aa7b7f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
May be I've missed something, but what will make transactions move and more metaslabs flushed if the pool is idle?
Also, if the pool is idle, will it flush only 5 transactions per transaction? I worry about the amount of new dirty metaslabs/transactions it may produce until it finally converge.
if (argc < 1) { | ||
(void) fprintf(stderr, gettext("missing pool name argument\n")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
zpool sync
we allow without arguments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I didn't want an "all pools" mode here, because I don't know what might be there in the future, and it might mean different things for different pools. Same way zpool scrub
requires a pool arg.
if (cb.type == POOL_CONDENSE_TYPES) { | ||
(void) fprintf(stderr, gettext("missing condense target\n")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be nice to have "all" or allow multiple?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure! I figure we could add it when the second one comes along. Comma-separated might be nice, like wait targets etc.
boolean_t | ||
spa_flush_all_logs_requested(spa_t *spa) | ||
void | ||
spa_log_flushall_start(spa_t *spa, spa_log_flushall_mode_t mode, uint64_t txg) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In both the field and function names you just say "log", but you've already mention that aside of spacemap logs we might get dedup logs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahh, maybe I wasn't clear. These are just the controls for log spacemap flushall. The next thing would be something separate, just triggered from zfs_ioc_condense()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not saying that this function will handle other logs. Just that it is not very specific for externally visible name. Even more for spa_log_flushall_mode
actually. But looking now there are other names around it that are also not metaslab-specific, so I don't insist.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahh, I see what you mean. Yeah, just following the surrounding style... :/
@amotin mm, you may be right. It didn't come up in testing, but we hadn't gone out of our way to stop pool activity. (Also I wrote this last year, so probably didn't know about this at the time!) I'll study it and post an update soon, thanks! |
Right, I think I've swapped back in everything I need. So yes, you're right - when the pool is idle, nothing is pushing things along (same for dedup log, incidentally). The 5s timeout will see some flushed out, but that's all. This is existing behaviour, so I'm ok with that, I think. So it seems to me that there's two questions. If the operator has requested spacemap log flush, should we push the sync along a bit? I think it's reasonable to say yes, in theory. Similar idea to the Then, what should the amount be. I forget why we chose minimum 5; probably it was just a number that was easy to see vs 1. Of course, we should use more the quieter things are. Is that just a much higher minimum? Maybe a % of the total, or of the amount dirty? Or is it more like, some larger minimum + inverse of the change rate, so we always get a big amount, and more if there's space to do it. I'll ask around. Let me know if you have any thoughts. |
@robn I think once user requested condense, we should do it as fast as possible, since user likely waits for it to reboot, export, whatever. This operation makes no sense to do just routinely. The only limitation is not hurt other workload too much. The amount of sync is a good question. I'd guess we don't want to extend transaction group for too long, neither consume to much memory on dirty data, etc. But my last trip to spacemaps was some time ago, so I don't have specific recommendations. |
[Sponsors: Klara, Inc., Wasabi Technology, Inc.]
Motivation and Context
Normally, log spacemaps are flushed out to the metaslabs when the pool is exported. For large logs, this can lead to export taking an inordinate amount of time.
This PR adds a an on-demand variant of the log spacemap flush, and a
zpool condense
command to trigger it. With it, an operator can request that log spacemaps be flushed ahead of time, so that there is relatively little work to be done at export time.Description
There's two halves to this.
First, we add a "mode" to the existing "flushall" behaviour in
spa_flush_metaslabs()
. The traditional behaviour is now "export mode", and flushes all logs. Some new functions are added to start and stop the flush with a given mode. Then we add a "request" mode, for use by the operator. This follows the same logic of walking the logs and flushing them out, but skips any that were modified after the flush request was made.The second part is the addition of the
zpool condense
command, and support library and ioctl additions. This takes a-t <target>
parameter, which is the "thing" to condense, flush, garbage-collect or otherwise accelerate background processing for. It's designed so it could be wired up to any similar background process in the future. In particular, I had the dedup log in mind while putting it together.All the trimmings you'd expect are there. Condense operations can be cancelled, restoring the flushing behaviour to its original pace or schedule. They can be waited on, via
condense -w
orwait -t condense
. The latter combines all condense targets into one signal value, theoretically allowing multiple things to be condensed at the same time, and wait until they're all finished. Without a second or maybe even third target it's unclear to me if this it what the user will expect, but I don't think this is far off and it can be changes when the next thing gets hooked up.I've included a kstat exposing the pool "unflushed" counters. We used this in our initial investigations. I thought it was going to be useful for the ZTS test, but it ended up being too difficult to control reliably. So nothing here uses it directly, but it doesn't hurt anything to have it there and may help someone, so why not.
How Has This Been Tested?
ztest
gets support, and has had many tens of runs without issue. ZTS test has been added, and the entire suite run to successful completion.Types of changes
Checklist:
Signed-off-by
.