Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HDiff in the hot DB #39

Open
wants to merge 8 commits into
base: unstable
Choose a base branch
from
Open

HDiff in the hot DB #39

wants to merge 8 commits into from

Conversation

dapplion
Copy link
Owner

Partial implementation of https://hackmd.io/6WI3idBfR2q2AQyMUfYMyg

state_root,
state_slot: Some(summary.slot),
// Delete the diff as descendants of this state will never be used.
prune_hot_diff: true,
Copy link
Collaborator

@michaelsproul michaelsproul Dec 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to change this to not delete the diff ancestors of the finalized state (use closest_layer_slots somewhere here).

We need to know the ancestor diffs of the new finalized state, so one approach would be:

  • Reorder migrate_database (hot to freezer migration) prior to pruning heads
  • Keep hot states which are either after the split, or before the split and descended from it according to the diff hierarchy.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may want to rewrite the prune_abandoned_forks code, but if we don't want to do that now, we can take the approach above

@michaelsproul
Copy link
Collaborator

Regarding state summaries stored during state advance (infamous beacon_node/beacon_chain/src/block_verification.rs:1487):

Option 1 (current strategy) is to commit eagerly:

  • Means storing more on disk in the hot-path of state advance (potentially annoying due to latency from disk I/O and diff computation). However this is mitigated by the state advance routine usually doing this work prior to block import.
  • Saves memory because we don't need to hold potentially thousands of states in memory until the block is verified and we decide to commit it.

Option 2 (old strategy from ages ago) is to commit on import:

  • Uses a lot more memory because we need to keep the states around in memory. Naively for 1000 epoch boundary states this could be like 1000 x 0.1 GB = 100 GB = death by OOM.
  • The advantage of this approach is that we could avoid computing the diffs for the intermediate states until after the block has been imported & added to fork choice. However this is probably not worth it.

Option 3 (don't store intermediate states at all):

  • Would need to recompute them (and re-do the epoch transition).

.viable_heads::<T::EthSpec>(head_slot)
.iter()
.map(|node| (node.root, node.slot))
.collect()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could use to implement the DB downgrade (recreate the headtracker after it has been deleted)

// TODO(hdiff): Is it necessary to do this read tx now? Also why is it necessary to
// check that the summary exists at all? Are double writes common? Can this txn
// lock deadlock with the `do_atomically` call?
let txn_lock = chain.store.hot_db.begin_rw_transaction();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason for this lock is a race with the write in BeaconChain::import_block:

If we were to write a non-temporary state in import_block in between setting state_already_exists (false) and the write of the temporary state in this function, we can corrupt the DB:

  • There is a state that is not temporary (required by some fully-imported block),
  • But it is marked temporary due to the race condition here
  • Temporary states risk being deleted by pruning -> invalid DB due to deletion of canonical state.

The lock prevents this case by preventing the interleaving of the read in this function with the write in the import function. However this is a bad abstraction forced upon us by LevelDB which lacks proper ACID transactions. If we move away from LevelDB eventually we can maybe have proper transactions, see:

@@ -735,11 +639,14 @@ impl<E: EthSpec, Hot: ItemStore<E>, Cold: ItemStore<E>> BackgroundMigrator<E, Ho
store.pruning_checkpoint_store_op(new_finalized_checkpoint),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in c8d82f0

StateSummariesDAG::new(state_summaries, parent_block_roots)
};

// From the DAG compute the list of roots that ascend from finalized root up to the split
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// From the DAG compute the list of roots that ascend from finalized root up to the split
// From the DAG compute the list of roots that descend from finalized root up to the split

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed with 4fa32cc

};

// From the DAG compute the list of roots that ascend from finalized root up to the split
// slot. And run `migrate_database` with it
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// slot. And run `migrate_database` with it
// slot.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed with 4fa32cc

Comment on lines 601 to 602
// keep those on the finalized canonical chain. Note that there may be lingering
// forks.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// keep those on the finalized canonical chain. Note that there may be lingering
// forks.
// keep those on the finalized canonical chain. Checking the state root ensures we avoid lingering forks.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed with 4fa32cc

// Abandoned forks will never be used, so we can safely delete the diffs
prune_hot_diff: true,
}]
// Abandoned forks will never be used, so we can safely delete the diffs
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Noting this now includes:

  • Abandoned forks (not descended from finalized)
  • Old stuff that wasn't pruned previously but is getting pruned now (older than previous finalization)
  • Finalized states from the canonical chain which are not required by the HDiff mechanism.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the comments

@@ -723,9 +629,7 @@ impl<E: EthSpec, Hot: ItemStore<E>, Cold: ItemStore<E>> BackgroundMigrator<E, Ho
let persisted_head = PersistedBeaconChain {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could remove this write, and probably delete the PersistedBeaconChain altogether.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed with 9ce4b33, but to fully remove PersistedBeaconChain we need to have a way to compute the genesis state root and genesis block root when booting from an existing DB. The issue is that while the state is in the DB it is keyed by root which we don't know.


pub fn hot_hdiff_start_slot(&self) -> Slot {
// TODO(hdiff): read the start slot from somewhere
todo!();
Copy link
Collaborator

@michaelsproul michaelsproul Dec 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking maybe we could recycle the anchor_slot for use here, as it is fairly useless and not used for much. The only load-bearing use of the anchor_slot that I can find is in try_prune_execution_payloads where it is used to halt the pruning process. However, this is not necessary as we could either:

  1. Keep iterating back to Bellatrix, or
  2. Stop iterating back once we find a payload that is missing, or
  3. Use the oldest_block_slot to determine when to stop iterating back. This is my preferred option, as it is also compatible with storing execution payloads older than the anchor_slot, i.e. Store execution payloads during backfill if --prune-payloads false sigp/lighthouse#6510

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Conclusion: we can recycle the anchor_slot field and set it to the epoch-aligned slot of the snapshot state (not the slot of the checkpoint block, which might be older due to skipped slots).

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed on 5ecf9db

@dapplion dapplion force-pushed the tree-states-hot branch 7 times, most recently from 06f6177 to f4d44e4 Compare December 24, 2024 18:44
@dapplion dapplion changed the base branch from tree-states-archive to unstable December 24, 2024 19:23
@dapplion
Copy link
Owner Author

dapplion commented Dec 24, 2024

Branch with commit history https://github.com/dapplion/lighthouse/pull/new/tree-states-hot-backup

squashed and rebase on top of sigp/unstable

@dapplion
Copy link
Owner Author

I have done manual testing and this branch works on mainnet:

  1. starting the node from genesis
  2. starting the node with fresh DB and checkpoint sync (not aligned epoch to hdiff snapshot)
  3. starting the node from an existing DB with at least 100 slots of history

All of them were broken originally, there's a lot of commits https://github.com/dapplion/lighthouse/pull/new/tree-states-hot-backup to make them work. I have also rebased to unstable, it was quite easy.

For 3. I had to add a new column to store the new hot state summaries. Otherwise when diffing it will try to read a V22 summary and break.

@dapplion
Copy link
Owner Author

@michaelsproul If we stop updating the head_tracker in the persisted head, how can we downgrade safely? If you run an old version of Lighthouse it will read the persisted head which contains an empty head tracker.

///
/// This column is populated after DB schema version 23 superseding `BeaconStateSummary`. The
/// new column is necessary to have a safe migration without data loss.
#[strum(serialize = "bs3")]
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
#[strum(serialize = "bs3")]
#[strum(serialize = "hss")]

@dapplion
Copy link
Owner Author

dapplion commented Jan 9, 2025

RE pruning routine:

Iterate summaries from finalized state jumping by diff base root, and keep only those. required_finalized_diff_state_slots = to the diff path of the finalized state

@dapplion
Copy link
Owner Author

dapplion commented Jan 9, 2025

@dapplion
Copy link
Owner Author

dapplion commented Jan 9, 2025

loop on each block and state to prune and log on separate lines so there's not a single giant dump of potentially megabytes

"states_to_prune" => ?states_to_prune,

@dapplion
Copy link
Owner Author

dapplion commented Jan 9, 2025

change to "not copying state to freezer DB"

debug!(store.log, "Pruning finalized state"; "slot" => slot);
pruning not happening here

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants