borg2: reconsider chunks_healthy approach #8559

Open
ThomasWaldmann opened this issue Nov 23, 2024 · 1 comment

ThomasWaldmann commented Nov 23, 2024

borg 1.x approach

borg 1.x archived regular file items have:

  • .chunks: a list of (chunkid, plaintext_size) tuples, referencing file content chunks
  • .chunks_healthy: the same kind of list, present only for an item whose .chunks list was patched with all-zero replacement chunks because the correct chunks are missing from the repo. .chunks_healthy holds the original, correct chunk ids. The all-zero replacement chunks are stored in the repo.

With that approach, borg code reading file content does not necessarily need special-casing for missing chunks. It might crash if there is a NEW missing chunk, but then users will run borg check --repair, which "patches" the chunks list with a new all-zero replacement chunk, and it won't crash anymore.

If a missing chunk reappears (e.g. because a new backup has created it again), borg check --repair will notice that and put the correct chunk id from the .chunks_healthy list back to the chunks list.
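
The patch/heal cycle described above can be sketched roughly as follows. This is a toy model, not borg's code: the repo is a plain dict, items are plain dicts, and `chunk_id` and `repair_item` are hypothetical names (borg uses a keyed id hash, plain SHA-256 is only a stand-in here).

```python
import hashlib


def chunk_id(data: bytes) -> bytes:
    # Assumption: plain SHA-256 stands in for borg's keyed id hash.
    return hashlib.sha256(data).digest()


def repair_item(item: dict, repo: dict) -> None:
    """Patch or heal item["chunks"] in place, as described for borg 1.x."""
    # The correct ids live in chunks_healthy if the item was patched before.
    healthy = item.get("chunks_healthy", item["chunks"])
    patched = []
    for cid, size in healthy:
        if cid in repo:
            # Chunk is present (or has reappeared): reference the real chunk.
            patched.append((cid, size))
        else:
            # Chunk is missing: store an all-zero replacement in the repo
            # and reference that instead.
            zeros = bytes(size)
            zid = chunk_id(zeros)
            repo[zid] = zeros
            patched.append((zid, size))
    if patched == healthy:
        item.pop("chunks_healthy", None)  # fully healed, second list not needed
    else:
        item["chunks_healthy"] = list(healthy)
    item["chunks"] = patched
```

Note that healing only happens when `repair_item` runs again after the chunk is back, which is the "repair, create, repair" sequence mentioned below.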

That approach worked, but has some issues:

  • some places need to deal with both lists, e.g. borg2 compact, check
  • as long as .chunks is not patched, places not dealing with missing chunks might crash
  • requires a repair, create, repair sequence to fix missing chunks (if create re-creates the missing chunks)

borg2 approach

New, better approach for borg2:

  • make the places reading file content (chunks) deal with missing chunks. They can either fill in a dynamically created (not stored) all-zero bytestring (the length is known) or raise an IOError. IIRC, borg mount already has an option for that.
  • do NOT have .chunks_healthy in the archived item (not needed) and also never modify .chunks (it will always contain the correct (chunkid, size) tuples, even for missing chunks).
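
A minimal sketch of what such a reader could look like, assuming a dict-like repo keyed by chunk id. `read_chunks` and its `zero_fill` parameter are made-up names for illustration, not borg's actual API:

```python
from typing import Iterable, Iterator, Mapping, Tuple


def read_chunks(
    chunks: Iterable[Tuple[bytes, int]],
    repo: Mapping[bytes, bytes],
    zero_fill: bool = False,
) -> Iterator[bytes]:
    """Yield file content chunk by chunk, handling missing chunks explicitly."""
    for cid, size in chunks:
        data = repo.get(cid)
        if data is None:
            if not zero_fill:
                raise IOError(f"chunk {cid.hex()} is missing")
            # The size is part of the chunks list entry, so the replacement
            # can be created on the fly and never needs to be stored.
            data = bytes(size)
        yield data
```

The item's chunks list is only read here, never modified, so the same item gives correct content again as soon as the chunk is back in the repo.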

Pros of this approach:

  • if a previously missing chunk reappears, all items referencing it will be immediately healed, no double-run of borg check --repair needed.
  • if there is a new missing chunk, borg will have some defined behaviour (IOError or zero-bytes) and not just crash, without requiring borg check --repair to achieve that behaviour.
  • code that needed to deal with .chunks and .chunks_healthy will get simpler.
  • no need to store all-zero patch chunks in the repo
  • it's simpler overall

Cons:

  • we can't track "new missing" / "new reappeared" chunks ("new" since last repair), we can only track the overall count of missing chunks. But I guess that is good enough.
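
To illustrate the overall count: since .chunks always holds the correct ids, the count can be derived from the items and the repo alone. A rough sketch, again with a dict as repo and a hypothetical `count_missing` helper (counting distinct chunk ids rather than references is an assumption):

```python
def count_missing(items, repo) -> int:
    """Count distinct chunk ids that items reference but the repo lacks."""
    referenced = {cid for item in items for cid, _size in item["chunks"]}
    return sum(1 for cid in referenced if cid not in repo)
```

There is no per-item state recording when a chunk went missing, so a "new since last repair" number can't be computed from this, only the current total.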

Transfer from borg 1.x to 2:

  • If there is a .chunks_healthy list for an archived borg 1.x item, the transfer would use that list for .chunks of the borg2 item, because it has all the correct chunk ids.
  • Also, it would transfer chunks from the borg1 repo to the borg2 repo using the .chunks_healthy list, if present (and thus not transfer any all-zero replacement chunks). It might encounter missing chunks in the borg1 repo when doing that and would silently skip them.
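
The transfer rules above could look roughly like this. Both repos are plain dicts and `transfer_item` is a hypothetical helper; the sketch also assumes chunk ids carry over unchanged between the two repos:

```python
def transfer_item(item: dict, src_repo: dict, dst_repo: dict) -> dict:
    """Copy one borg1-style item into a borg2-style repo (toy model)."""
    # Prefer chunks_healthy: it has the correct ids for patched items.
    chunks = item.get("chunks_healthy") or item["chunks"]
    for cid, _size in chunks:
        data = src_repo.get(cid)
        if data is None:
            continue  # missing in the borg1 repo: silently skip
        dst_repo[cid] = data
    # The new item has no chunks_healthy, only the correct chunks list.
    new_item = {k: v for k, v in item.items() if k != "chunks_healthy"}
    new_item["chunks"] = list(chunks)
    return new_item
```

Because only ids from the healthy list are copied, the all-zero replacement chunks never reach the destination repo.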
@ThomasWaldmann ThomasWaldmann added this to the 2.0.0rc1 milestone Nov 23, 2024
@ThomasWaldmann ThomasWaldmann modified the milestones: 2.0.0rc1, 2.0.0b15 Nov 23, 2024
@ThomasWaldmann ThomasWaldmann self-assigned this Nov 25, 2024
ThomasWaldmann added a commit to ThomasWaldmann/borg that referenced this issue Nov 25, 2024
Well, it's not totally removed, some code in Item, Archive and
borg transfer --from-borg1 needs to stay in place, so that we
can pick the CORRECT chunks list that is in .chunks_healthy
for all-zero-replacement-chunk-patched items when transferring
archives from borg1 to borg2 repos.

FUSE fs read: IOError or all-zero result

Other reads: TODO

ThomasWaldmann commented Nov 25, 2024

With the new way, the code always has to expect missing chunks when reading item.chunks:

  • borg mount / FUSE fs
  • borg extract (fetch_many)
  • borg export-tar (fetch_many)
  • borg recreate (fetch_many)
  • borg diff (fetch_many)

Also check:

  • borg transfer --from-borg1 - so it uses the right chunks list and transfers the right chunks.

ThomasWaldmann added a commit to ThomasWaldmann/borg that referenced this issue Dec 28, 2024
Well, it's not totally removed, some code in Item, Archive and
borg transfer --from-borg1 needs to stay in place, so that we
can pick the CORRECT chunks list that is in .chunks_healthy
for all-zero-replacement-chunk-patched items when transferring
archives from borg1 to borg2 repos.

transfer: do not transfer replacement chunks, deal with missing chunks in other_repo

FUSE fs read: IOError or all-zero result
ThomasWaldmann added a commit to ThomasWaldmann/borg that referenced this issue Jan 4, 2025
ThomasWaldmann added a commit to ThomasWaldmann/borg that referenced this issue Jan 8, 2025