borg2: reconsider chunks_healthy approach #8559

ThomasWaldmann · 2024-11-23T16:27:55Z

borg 1.x approach

borg 1.x archived regular file items have:

.chunks: a list of (chunkid, plaintext_size) tuples, referencing file content chunks
.chunks_healthy: the same kind of list, present only for items whose .chunks list got patched with all-zero replacement chunks because the correct chunk is missing in the repo. .chunks_healthy holds the original, correct chunk ids. The all-zero replacement chunks are stored in the repo.
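
To make the two lists concrete, here is a minimal sketch of a patched borg 1.x item. It models the item as a plain dict and chunk ids as unkeyed SHA-256 digests. Both are assumptions for illustration: borg has its own Item class and uses a keyed id hash. The second chunk is treated as missing, so .chunks points at an all-zero chunk of the same size, while .chunks_healthy keeps the original id:

```python
import hashlib


def chunk_id(data: bytes) -> bytes:
    # Assumption: plain SHA-256 stands in for borg's keyed chunk id hash.
    return hashlib.sha256(data).digest()


# File content split into two chunks; entries are (chunk_id, plaintext_size).
contents = [b"hello ", b"world"]
healthy = [(chunk_id(c), len(c)) for c in contents]

# Pretend the second chunk is missing from the repo and the item got patched:
# its .chunks entry now references an all-zero chunk of the same size.
zero_data = bytes(healthy[1][1])
item = {
    "chunks": [healthy[0], (chunk_id(zero_data), len(zero_data))],
    "chunks_healthy": healthy,  # original, correct chunk ids
}
```

The two lists are identical apart from the patched position, and the sizes match everywhere. That is why readers can use .chunks unchanged.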

With this approach, borg code reading file content does not necessarily need special-casing for missing chunks. If a NEW chunk goes missing, such code might crash, though. Users will then run borg check --repair, which "patches" the chunks list with a new all-zero replacement chunk, and it won't crash anymore.
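
The following sketch shows the patching step and why a reader without special-casing crashes before repair and works after it. The function names, the dict-based item and the dict-based repo (chunk id to plaintext) are hypothetical simplifications. The real repo stores encrypted, compressed chunks, and the real check code differs:

```python
import hashlib


def chunk_id(data: bytes) -> bytes:
    # Assumption: plain SHA-256 stands in for borg's keyed chunk id hash.
    return hashlib.sha256(data).digest()


def repair_item(item: dict, repo: dict) -> None:
    """Hypothetical borg 1.x-style patching of an item's .chunks list."""
    patched = []
    replaced = False
    for cid, size in item["chunks"]:
        if cid in repo:
            patched.append((cid, size))
        else:
            zero = bytes(size)
            zid = chunk_id(zero)
            repo[zid] = zero  # the replacement chunk really is stored in the repo
            patched.append((zid, size))
            replaced = True
    if replaced and "chunks_healthy" not in item:
        # first repair: remember the correct ids so the item can be healed later
        item["chunks_healthy"] = item["chunks"]
    item["chunks"] = patched


def read_item(item: dict, repo: dict) -> bytes:
    # Reader without special-casing: a missing chunk raises KeyError ("crash").
    return b"".join(repo[cid] for cid, _size in item["chunks"])


a, b = b"hello ", b"world"
repo = {chunk_id(a): a}  # chunk b is missing
item = {"chunks": [(chunk_id(a), len(a)), (chunk_id(b), len(b))]}
try:
    read_item(item, repo)
except KeyError:
    print("crash: missing chunk")
repair_item(item, repo)
print(read_item(item, repo))  # b'hello \x00\x00\x00\x00\x00'
```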

If a missing chunk reappears (e.g. because a new backup has created it again), borg check --repair will notice that and put the correct chunk id from the .chunks_healthy list back into the .chunks list.
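
The healing half of that repair could look roughly like this. Again, the function name and the dict-based item and repo are hypothetical. Dropping .chunks_healthy once everything is healed mirrors what borg 1.x check does as far as I know, but treat this as a sketch rather than the actual code:

```python
import hashlib


def chunk_id(data: bytes) -> bytes:
    # Assumption: plain SHA-256 stands in for borg's keyed chunk id hash.
    return hashlib.sha256(data).digest()


def heal_item(item: dict, repo: dict) -> None:
    """Hypothetical healing step: restore correct ids whose chunks are back."""
    healthy = item.get("chunks_healthy")
    if healthy is None:
        return  # item was never patched
    # Both lists are assumed to have the same length (one entry per chunk).
    item["chunks"] = [
        h if h[0] in repo else cur for cur, h in zip(item["chunks"], healthy)
    ]
    if item["chunks"] == healthy:
        # completely healed: the extra list is no longer needed
        del item["chunks_healthy"]


a, b = b"hello ", b"world"
zero = bytes(len(b))
repo = {chunk_id(a): a, chunk_id(zero): zero}
item = {
    "chunks": [(chunk_id(a), 6), (chunk_id(zero), 5)],  # patched earlier
    "chunks_healthy": [(chunk_id(a), 6), (chunk_id(b), 5)],
}
repo[chunk_id(b)] = b  # a new backup re-created the missing chunk
heal_item(item, repo)
```

Note that healing only happens when this repair pass runs. Until then, readers keep getting the all-zero data even though the correct chunk is available again.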

That approach worked, but has some issues:

some places need to deal with both lists, e.g. borg2 compact and check
as long as .chunks is not patched, places not dealing with missing chunks might crash
fixing missing chunks requires a repair, create, repair sequence (and only works if create re-creates the missing chunks)

borg2 approach

New, better approach for borg2:

make the places that read file content (chunks) deal with missing chunks: they can either fill in a dynamically created (not stored) all-zero bytestring (the length is known) or raise an IOError. IIRC, borg mount already has an option for that.
do NOT have .chunks_healthy in the archived item (not needed), and never modify .chunks: it will always contain the correct (chunkid, size) tuples, even for missing chunks.
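
A minimal sketch of such a reader follows. The read_file name, its missing parameter and the dict-based item and repo are hypothetical, not borg2's actual API. Because .chunks is never modified, the same item reads correctly as soon as the chunk is back in the repo:

```python
import hashlib


def chunk_id(data: bytes) -> bytes:
    # Assumption: plain SHA-256 stands in for borg's keyed chunk id hash.
    return hashlib.sha256(data).digest()


def read_file(item: dict, repo: dict, missing: str = "error") -> bytes:
    """Hypothetical borg2-style reader; item["chunks"] is never patched.

    missing="zeros": substitute zero bytes created on the fly (never stored)
    missing="error": raise IOError for the first missing chunk
    """
    parts = []
    for cid, size in item["chunks"]:
        data = repo.get(cid)
        if data is None:
            if missing == "zeros":
                data = bytes(size)  # length is known from the chunks list
            else:
                raise IOError(f"missing chunk {cid.hex()[:16]}")
        parts.append(data)
    return b"".join(parts)


a, b = b"hello ", b"world"
item = {"chunks": [(chunk_id(a), len(a)), (chunk_id(b), len(b))]}
repo = {chunk_id(a): a}  # chunk b is missing
print(read_file(item, repo, missing="zeros"))  # b'hello \x00\x00\x00\x00\x00'
repo[chunk_id(b)] = b  # chunk reappears, e.g. created again by a new backup
print(read_file(item, repo))  # b'hello world'
```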

Pros of this approach:

if a previously missing chunk reappears, all items referencing it are healed immediately; no second run of borg check --repair is needed.
if a chunk goes missing, borg has defined behaviour (IOError or zero bytes) instead of just crashing, without requiring borg check --repair to get there.
code that needed to deal with .chunks and .chunks_healthy will get simpler.
no need to store all-zero patch chunks in the repo
it's simpler overall

Cons:

we can't track "newly missing" / "newly reappeared" chunks ("new" since the last repair); we can only track the overall count of missing chunks. But I guess that is good enough.
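
Since nothing about missing chunks is stored in the items, that overall count can be recomputed at any time as a set difference. Here is a minimal sketch: items are dicts, the repo is reduced to a set of chunk ids, and the byte-string ids are placeholders (all assumptions). Nothing in it can tell an "old" missing chunk from a "new" one:

```python
def count_missing(items: list, repo_ids: set) -> int:
    """Hypothetical sketch: count distinct referenced chunk ids absent from the repo."""
    referenced = {cid for item in items for cid, _size in item["chunks"]}
    return len(referenced - repo_ids)


items = [
    {"chunks": [(b"id-a", 6), (b"id-b", 5)]},
    {"chunks": [(b"id-b", 5), (b"id-c", 3)]},  # id-b is shared by both items
]
print(count_missing(items, {b"id-a"}))  # 2
print(count_missing(items, {b"id-a", b"id-b"}))  # 1, after id-b reappeared
```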

Transfer from borg 1.x to 2:

If an archived borg 1.x item has a .chunks_healthy list, that is the one transfer would use as .chunks of the borg2 item, because it has all the correct chunk ids.
Also, it would transfer chunks from the borg1 repo to the borg2 repo using the .chunks_healthy list, if present (and thus not transfer any all-zero replacement chunks). It might encounter chunks that are missing in the borg1 repo while doing that and would silently skip them.
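
The transfer rules above can be sketched like this. The transfer_item name is hypothetical. Repos are modelled as dicts mapping chunk id to plaintext, and chunk ids are assumed to stay the same between the two repos. The real borg transfer --from-borg1 code may differ:

```python
import hashlib


def chunk_id(data: bytes) -> bytes:
    # Assumption: plain SHA-256 stands in for borg's keyed chunk id hash.
    return hashlib.sha256(data).digest()


def transfer_item(item: dict, src: dict, dst: dict) -> dict:
    """Hypothetical borg1 -> borg2 item transfer.

    Prefer .chunks_healthy (correct ids) over a patched .chunks, copy only the
    chunks that exist in the source repo and silently skip missing ones.
    """
    chunks = item.get("chunks_healthy", item["chunks"])
    for cid, _size in chunks:
        if cid in src and cid not in dst:
            dst[cid] = src[cid]
    # The borg2 item only has .chunks, with correct ids even for missing chunks.
    return {"chunks": list(chunks)}


a, b = b"hello ", b"world"
zero = bytes(len(b))
src = {chunk_id(a): a, chunk_id(zero): zero}  # b is missing in the borg1 repo
item = {
    "chunks": [(chunk_id(a), 6), (chunk_id(zero), 5)],  # patched by borg 1.x check
    "chunks_healthy": [(chunk_id(a), 6), (chunk_id(b), 5)],
}
dst = {}
new_item = transfer_item(item, src, dst)
```

The all-zero replacement chunk never reaches the borg2 repo. The borg2 item still references the correct id of the missing chunk, so it heals automatically if that chunk shows up later.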



Well, it's not totally removed: some code in Item, Archive and borg transfer --from-borg1 needs to stay in place, so that we can pick the CORRECT chunks list (the one in .chunks_healthy) for all-zero-replacement-chunk-patched items when transferring archives from borg1 to borg2 repos.
FUSE fs read: IOError or all-zero result
Other reads: TODO

ThomasWaldmann · 2024-11-25T14:36:10Z

With the new approach, code reading item.chunks always has to expect missing chunks:

Also check:

borg transfer --from-borg1: make sure it uses the right chunks list and transfers the right chunks.

Well, it's not totally removed: some code in Item, Archive and borg transfer --from-borg1 needs to stay in place, so that we can pick the CORRECT chunks list (the one in .chunks_healthy) for all-zero-replacement-chunk-patched items when transferring archives from borg1 to borg2 repos.
transfer: do not transfer replacement chunks, deal with missing chunks in other_repo
FUSE fs read: IOError or all-zero result

ThomasWaldmann added this to the 2.0.0rc1 milestone Nov 23, 2024

ThomasWaldmann added the cmd: check label Nov 23, 2024

ThomasWaldmann modified the milestones: 2.0.0rc1, 2.0.0b15 Nov 23, 2024

ThomasWaldmann self-assigned this Nov 25, 2024
