Add TXG timestamp database #16853

Open: wants to merge 1 commit into master

@oshogbo (Contributor) commented Dec 11, 2024

Motivation and Context

This feature enables tracking of when TXGs are committed to disk, providing an estimated timestamp for each TXG.

With this information, it becomes possible to scrub only the data written within a specific date range, improving the granularity of data management and recovery operations.

Description

To achieve this, we implemented a round-robin database that maps commit times to TXGs. Tracking is split into three tiers with minute, day, and month resolution (the dbr_minutes, dbr_days, and dbr_months tables). We believe this split offers a reasonable trade-off between resolution for recent TXGs and the length of history that can be kept. The database does not record the exact commit time of every transaction group (TXG); it provides an estimate. It can also be used in other scenarios that require mapping dates to TXGs.
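A minimal sketch of the tiered round-robin idea, for illustration only and not the PR's implementation. The tier names dbr_minutes/dbr_days/dbr_months and the fields rrd_head, rrd_tail, rrd_length and rrdd_time follow the excerpts quoted in the review below; everything else is an assumption: the helper names rrd_add(), rrd_newest(), rrd_tier_add() and dbrrd_add(), the fields rrdd_txg and rrd_entries, the capacity RRD_MAX_ENTRIES = 256, the sampling rule (a tier records a TXG only once its resolution has elapsed since its newest entry), and a 30-day "month".

```c
#include <stddef.h>
#include <stdint.h>

#define	RRD_MAX_ENTRIES	256	/* hypothetical capacity of each tier */

typedef struct {
	uint64_t rrdd_time;	/* seconds since the epoch */
	uint64_t rrdd_txg;	/* hypothetical: TXG committed at about that time */
} rrd_data_t;

/* Ring buffer: entries from rrd_head (oldest) to rrd_tail (one past newest). */
typedef struct {
	uint64_t rrd_head;
	uint64_t rrd_tail;
	uint64_t rrd_length;
	rrd_data_t rrd_entries[RRD_MAX_ENTRIES];
} rrd_t;

typedef struct {
	rrd_t dbr_minutes;
	rrd_t dbr_days;
	rrd_t dbr_months;
} dbrrd_t;

/* Append an entry; once the ring is full, the oldest entry is overwritten. */
static void
rrd_add(rrd_t *rrd, uint64_t time, uint64_t txg)
{
	rrd->rrd_entries[rrd->rrd_tail].rrdd_time = time;
	rrd->rrd_entries[rrd->rrd_tail].rrdd_txg = txg;
	rrd->rrd_tail = (rrd->rrd_tail + 1) % RRD_MAX_ENTRIES;
	if (rrd->rrd_length < RRD_MAX_ENTRIES)
		rrd->rrd_length++;
	else
		rrd->rrd_head = (rrd->rrd_head + 1) % RRD_MAX_ENTRIES;
}

/* Return the newest entry, or NULL if the ring is empty. */
static const rrd_data_t *
rrd_newest(const rrd_t *rrd)
{
	if (rrd->rrd_length == 0)
		return (NULL);
	return (&rrd->rrd_entries[(rrd->rrd_tail + RRD_MAX_ENTRIES - 1) %
	    RRD_MAX_ENTRIES]);
}

/* Record (time, txg) only if `resolution` seconds passed since the newest entry. */
static void
rrd_tier_add(rrd_t *rrd, uint64_t resolution, uint64_t time, uint64_t txg)
{
	const rrd_data_t *last = rrd_newest(rrd);

	if (last == NULL || time >= last->rrdd_time + resolution)
		rrd_add(rrd, time, txg);
}

/* Called once per synced TXG; coarser tiers span a longer history. */
static void
dbrrd_add(dbrrd_t *db, uint64_t time, uint64_t txg)
{
	rrd_tier_add(&db->dbr_minutes, 60, time, txg);
	rrd_tier_add(&db->dbr_days, 24 * 60 * 60, time, txg);
	rrd_tier_add(&db->dbr_months, 30 * 24 * 60 * 60, time, txg); /* ~1 month */
}
```

With 256 entries per tier, this sketch would cover at least roughly the last four hours in the minute tier, about eight months in the day tier, and about twenty years in the month tier.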

How Has This Been Tested?

  • Create a pool
  • Write data
  • Wait some time
  • Write more data
  • Wait some time
  • Scrub using different time ranges

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
  • Documentation (a change to man pages or other documentation)

Checklist:

@oshogbo force-pushed the oshogbo/scrub_data_range branch 7 times, most recently from 2a20b11 to 364f813 on December 11, 2024 14:01
@amotin added the "Status: Code Review Needed" (Ready for review and testing) label on Dec 11, 2024
@oshogbo force-pushed the oshogbo/scrub_data_range branch from 364f813 to 891c8f2 on December 11, 2024 15:50
Signed-off-by: Mariusz Zaborski <mariusz.zaborski@klarasystems.com>
@amotin (Member) commented Dec 12, 2024

It crashes on VERIFY(!dmu_objset_is_dirty(dp->dp_meta_objset, txg)).

@amotin (Member) commented Dec 12, 2024

This reminds me that we recently added ddp_class_start to the new dedup table entry format so that the DDT can be pruned based on time. I wonder whether we could have saved that space had this mechanism existed back then.

Comment on lines +8441 to +8452
ret = sscanf(timestr, "%4d-%2d-%2d %2d:%2d", &tm.tm_year, &tm.tm_mon,
    &tm.tm_mday, &tm.tm_hour, &tm.tm_min);
if (ret < 3) {
	fprintf(stderr, gettext("Failed to parse the date.\n"));
	usage(B_FALSE);
}

// Adjust struct
tm.tm_year -= 1900;
tm.tm_mon -= 1;

return (timegm(&tm));
Member

I wonder if strptime() or something else specialized would be better.
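A minimal sketch of the strptime() approach, assuming the same inputs as the sscanf() excerpt (a date, optionally followed by HH:MM) interpreted as UTC through timegm(). parse_date_utc() is a hypothetical name. strptime() already stores tm_year and tm_mon in struct tm's conventions (years since 1900, months from 0), and at least on glibc it rejects out-of-range fields such as month 13. Zeroing the struct first matters because strptime() only sets the fields it parses, and checking that the whole string was consumed rejects partial input such as "2024-12-11 14".

```c
#define	_GNU_SOURCE	/* strptime() and timegm() on glibc */
#include <string.h>
#include <time.h>

/*
 * Parse "YYYY-MM-DD HH:MM" or "YYYY-MM-DD" as UTC.
 * Returns (time_t)-1 if the string does not fully match either format.
 */
static time_t
parse_date_utc(const char *timestr)
{
	static const char *const formats[] = { "%Y-%m-%d %H:%M", "%Y-%m-%d" };

	for (size_t i = 0; i < sizeof (formats) / sizeof (formats[0]); i++) {
		struct tm tm;
		const char *end;

		/* strptime() leaves fields it does not parse untouched. */
		memset(&tm, 0, sizeof (tm));
		end = strptime(timestr, formats[i], &tm);
		/* Accept only a full match; reject trailing characters. */
		if (end != NULL && *end == '\0')
			return (timegm(&tm));
	}
	return ((time_t)-1);
}
```

timegm() is not part of ISO C; the _GNU_SOURCE define targets glibc, and other C libraries may need different feature macros.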

Comment on lines +789 to +797
static const spa_feature_t txg_log_time_deps[] = {
	SPA_FEATURE_EXTENSIBLE_DATASET,
	SPA_FEATURE_NONE
};
zfeature_register(SPA_FEATURE_TXG_TIMELOG,
    "com.klarasystems:txg_log_time", "txg_log_time",
    "Log history of txg.",
    ZFEATURE_FLAG_PER_DATASET | ZFEATURE_FLAG_READONLY_COMPAT,
    ZFEATURE_TYPE_BOOLEAN, txg_log_time_deps, sfeatures);
Member

I wonder why we need this feature at all. I don't see a problem with the pool being imported by an implementation that does not support this feature. Sure, some TXGs won't be recorded, but so what? This functionality seems to be best-effort anyway.

And if we do need it for some reason, why is it ZFEATURE_FLAG_PER_DATASET? So far it looks pool-wide.

Contributor Author

Yeah, that makes sense. I will drop it.

Comment on lines +4826 to +4829
/* Load time log */
error = spa_load_txg_log_time(spa);
if (error != 0)
	return (spa_vdev_err(rvd, VDEV_AUX_CORRUPT_DATA, EIO));
Member

I think we could instead delete the log and start from scratch. Losing it is not a big deal.

Contributor Author

Hmm, maybe. But I feel that if we can't load part of it, something funky is happening and it should be investigated.
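The two positions in this thread (fail the load versus discard the log and start over) can be sketched side by side. This is illustrative only: rrd_hdr_t, RRD_MAX_ENTRIES = 256, rrd_hdr_valid(), rrd_hdr_check() and the policy enum are hypothetical, and the consistency rule (tail equals head plus length, modulo the capacity) assumes a ring buffer whose tail advances on every append.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define	RRD_MAX_ENTRIES	256	/* hypothetical capacity */

typedef struct {
	uint64_t rrd_head;
	uint64_t rrd_tail;
	uint64_t rrd_length;
} rrd_hdr_t;

typedef enum { RRD_ON_ERROR_RESET, RRD_ON_ERROR_FAIL } rrd_policy_t;

/* A loaded ring is consistent only if its indices agree with its length. */
static bool
rrd_hdr_valid(const rrd_hdr_t *h)
{
	if (h->rrd_head >= RRD_MAX_ENTRIES || h->rrd_tail >= RRD_MAX_ENTRIES ||
	    h->rrd_length > RRD_MAX_ENTRIES)
		return (false);
	return ((h->rrd_head + h->rrd_length) % RRD_MAX_ENTRIES ==
	    h->rrd_tail);
}

/*
 * Returns 0 if the header is usable (possibly after a reset),
 * or -1 if the caller should fail the load.
 */
static int
rrd_hdr_check(rrd_hdr_t *h, rrd_policy_t policy)
{
	if (rrd_hdr_valid(h))
		return (0);
	if (policy == RRD_ON_ERROR_RESET) {
		/* Best-effort log: drop the history and start over empty. */
		memset(h, 0, sizeof (*h));
		return (0);
	}
	return (-1);
}
```

Resetting keeps the pool importable at the cost of losing the history; failing surfaces the inconsistency so it can be investigated.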

@@ -10229,6 +10343,7 @@ spa_sync(spa_t *spa, uint64_t txg)
}

spa_sync_rewrite_vdev_config(spa, tx);
spa_sync_time_logger(spa, tx);
Member

It seems you are dirtying the pool too late in the sync process. It does not need to be that late; you could move it earlier, somewhere around brt_pending_apply().

Contributor Author

Thanks!

typedef struct {
	int rrd_head; /* head (beginning) */
	int rrd_tail; /* tail (end) */
	size_t rrd_length;
Member

I wonder why length is size_t, while head/tail are int. They all index the same array and are limited by RRD_MAX_ENTRIES. Plus, you are writing this data directly to the pool, while int and size_t (and maybe hrtime_t?) may have different sizes on different platforms.

Contributor Author

Great point, thanks!
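One way to address the field-width concern, sketched under assumptions: every persisted field becomes uint64_t (including the timestamp, stored as seconds rather than as an hrtime_t), and C11 _Static_assert fails the build if a field is not exactly 8 bytes or if padding appears. Only rrd_head, rrd_tail, rrd_length and rrdd_time come from the quoted excerpts; rrdd_txg, rrd_entries, RRD_MAX_ENTRIES = 256 and RRD_FIELD_IS_U64 are hypothetical. Fixed widths make the size identical across platforms; byte order is a separate problem.

```c
#include <stdint.h>

#define	RRD_MAX_ENTRIES	256	/* hypothetical capacity */

typedef struct {
	uint64_t rrdd_time;	/* seconds since the epoch, not hrtime_t */
	uint64_t rrdd_txg;	/* hypothetical field */
} rrd_data_t;

typedef struct {
	uint64_t rrd_head;	/* head, tail and length index the same array */
	uint64_t rrd_tail;
	uint64_t rrd_length;
	rrd_data_t rrd_entries[RRD_MAX_ENTRIES];
} rrd_t;

/* True if `field` of `type` is exactly 8 bytes wide (not evaluated at run time). */
#define	RRD_FIELD_IS_U64(type, field) \
	(sizeof (((type *)0)->field) == sizeof (uint64_t))

/* Catch a field switched back to int or size_t... */
_Static_assert(RRD_FIELD_IS_U64(rrd_t, rrd_head) &&
    RRD_FIELD_IS_U64(rrd_t, rrd_tail) &&
    RRD_FIELD_IS_U64(rrd_t, rrd_length) &&
    RRD_FIELD_IS_U64(rrd_data_t, rrdd_time) &&
    RRD_FIELD_IS_U64(rrd_data_t, rrdd_txg),
    "persisted fields must be 64 bits wide");

/* ...and any padding that would make the on-disk layout compiler-dependent. */
_Static_assert(sizeof (rrd_t) ==
    3 * sizeof (uint64_t) + RRD_MAX_ENTRIES * sizeof (rrd_data_t),
    "rrd_t must not contain padding");
```

A size check alone would not catch `int rrd_head`, because the compiler pads it back to 8 bytes before the next uint64_t; the per-field check does.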

Comment on lines +141 to +144
if (data == NULL || mindiff > rrd_abs(tv - cur->rrdd_time)) {
	data = cur;
	mindiff = rrd_abs(tv - cur->rrdd_time);
}
Member

For scrub, we might want strict rounding down for the start time and rounding up for the end time, rather than picking the closest entry. It is better to scrub more rather than less.

Contributor Author

That's a good idea. Will do.
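A sketch of the rounding suggested in this thread, assuming entries are visited oldest to newest (for a ring buffer, starting at rrd_head) and that each entry pairs a commit time with a TXG. rrd_start_txg(), rrd_end_txg() and rrdd_txg are hypothetical names. The start bound takes the newest entry at or before the requested time and the end bound takes the oldest entry at or after it, so the resulting TXG range can only be wider than the requested time range.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
	uint64_t rrdd_time;	/* commit time, seconds since the epoch */
	uint64_t rrdd_txg;	/* hypothetical: TXG committed at that time */
} rrd_data_t;

/*
 * Lower TXG bound for data written at or after `tv`: the newest entry at
 * or before `tv` (round down). `e` must be sorted oldest to newest.
 */
static uint64_t
rrd_start_txg(const rrd_data_t *e, size_t n, uint64_t tv)
{
	uint64_t txg = 0;	/* no entry that old: start from the beginning */

	for (size_t i = 0; i < n && e[i].rrdd_time <= tv; i++)
		txg = e[i].rrdd_txg;
	return (txg);
}

/*
 * Upper TXG bound for data written at or before `tv`: the oldest entry at
 * or after `tv` (round up).
 */
static uint64_t
rrd_end_txg(const rrd_data_t *e, size_t n, uint64_t tv)
{
	for (size_t i = 0; i < n; i++) {
		if (e[i].rrdd_time >= tv)
			return (e[i].rrdd_txg);
	}
	return (UINT64_MAX);	/* newer than every entry: scrub to the end */
}
```

With entries at times 100, 200 and 300 mapping to TXGs 10, 20 and 30, a start time of 280 rounds down to TXG 20. Closest-match selection would pick TXG 30 and could miss TXGs 21 to 29, which the database never recorded.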

Comment on lines +3373 to +3384
VERIFY0(zap_add(spa_meta_objset(spa),
    DMU_POOL_DIRECTORY_OBJECT, DMU_POOL_TXG_LOG_TIME_MINUTES,
    1, sizeof (spa->spa_txg_log_time.dbr_minutes),
    &spa->spa_txg_log_time.dbr_minutes, tx));
VERIFY0(zap_add(spa_meta_objset(spa),
    DMU_POOL_DIRECTORY_OBJECT, DMU_POOL_TXG_LOG_TIME_DAYS,
    1, sizeof (spa->spa_txg_log_time.dbr_days),
    &spa->spa_txg_log_time.dbr_days, tx));
VERIFY0(zap_add(spa_meta_objset(spa),
    DMU_POOL_DIRECTORY_OBJECT, DMU_POOL_TXG_LOG_TIME_MONTHS,
    1, sizeof (spa->spa_txg_log_time.dbr_months),
    &spa->spa_txg_log_time.dbr_months, tx));
Member

You are storing the data as an array of bytes. Where is the byteswap handling? BRT is already not endian-safe and must be fixed; please don't add more of that.
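A minimal sketch of explicit byteswap handling, assuming the persisted structure consists only of uint64_t words and carries a hypothetical magic value so that a loader can detect data written with the other byte order. rrd_hdr_t, RRD_MAGIC, bswap64(), rrd_swap_words() and rrd_hdr_fixup() are all illustrative names; inside ZFS one would reuse existing helpers such as byteswap_uint64_array() rather than a hand-written loop.

```c
#include <stddef.h>
#include <stdint.h>

#define	RRD_MAGIC	0x7478676c6f677464ULL	/* hypothetical, not byte-symmetric */

/* Hypothetical persisted header: only uint64_t words, swappable word by word. */
typedef struct {
	uint64_t rrd_magic;
	uint64_t rrd_head;
	uint64_t rrd_tail;
	uint64_t rrd_length;
} rrd_hdr_t;

/* Reverse the byte order of a 64-bit word without compiler builtins. */
static uint64_t
bswap64(uint64_t x)
{
	uint64_t r = 0;

	for (int i = 0; i < 8; i++) {
		r = (r << 8) | (x & 0xff);
		x >>= 8;
	}
	return (r);
}

/* Byteswap every 64-bit word of a buffer in place. */
static void
rrd_swap_words(void *buf, size_t size)
{
	uint64_t *words = buf;

	for (size_t i = 0; i < size / sizeof (uint64_t); i++)
		words[i] = bswap64(words[i]);
}

/*
 * Fix up a header just read from disk, swapping it if it was written with
 * the other byte order. Returns 0 on success, -1 if the magic is unknown.
 */
static int
rrd_hdr_fixup(rrd_hdr_t *hdr)
{
	if (hdr->rrd_magic == RRD_MAGIC)
		return (0);
	if (hdr->rrd_magic == bswap64(RRD_MAGIC)) {
		rrd_swap_words(hdr, sizeof (*hdr));
		return (0);
	}
	return (-1);
}
```

Keeping every field a uint64_t is what makes the word-by-word swap valid; a struct mixing int, size_t and uint64_t could not be swapped this way.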

@amotin added the "Status: Revision Needed" (Changes are required for the PR to be accepted) label on Dec 12, 2024
Labels: Status: Code Review Needed (Ready for review and testing), Status: Revision Needed (Changes are required for the PR to be accepted)

2 participants