Implement a zimdiff2 for incremental updates #29

Closed
benoit74 opened this issue Jan 1, 2020 · 2 comments

benoit74 (Collaborator) commented Jan 1, 2020

Hello,

I'm wondering what happened to the work on incremental updates performed during GSoC 2013 summarised here. I cannot find any follow-up on the work done at that time.

Meanwhile, incremental updates still seem to be unavailable. This is an issue for me in France: downloading 40 GB every year is still difficult in many rural areas, so I can only imagine the situation in other countries.

I found that an alternative approach would be to implement incremental updates at the protocol level with zsync, but this still seems to be only an idea.

On my side, I have another idea that I tried as a PoC. It relies on the fact that many entries are not altered at all between two versions of the same archive (images, for instance, which something like zsync would probably already detect), while others receive only very small changes. Those small changes could easily be compressed with bsdiff, but not by zsync, because entries are already compressed with LZMA and usually end up at a different location in a different cluster.
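The entry-level idea above can be sketched as follows. This is a minimal, hypothetical illustration, not the PoC itself: it models each archive version as a Python dict mapping an entry path to its uncompressed content, ignores the ZIM cluster and LZMA layer entirely, and swaps in zlib with the old content as a preset dictionary as a stand-in for bsdiff (zlib only looks back 32 KiB, so this suits small text entries). The names `make_patch` and `apply_patch` and the patch layout are invented for illustration.

```python
import zlib

# Hypothetical model: an archive version is a dict mapping an entry path
# (e.g. "A/Paris") to its *uncompressed* content. Real ZIM files store
# entries inside compressed clusters; this sketch ignores that layer.

def make_patch(old: dict[str, bytes], new: dict[str, bytes]) -> dict:
    """Build an incremental patch from `old` to `new` (hypothetical format)."""
    patch = {"removed": [], "added": {}, "modified": {}}
    for path in old:
        if path not in new:
            patch["removed"].append(path)
    for path, data in new.items():
        if path not in old:
            # New entry: ship its full content, compressed.
            patch["added"][path] = zlib.compress(data)
        elif old[path] != data:
            # Small change: delta-encode against the old version by using it
            # as a zlib preset dictionary (stand-in for bsdiff).
            comp = zlib.compressobj(zdict=old[path])
            patch["modified"][path] = comp.compress(data) + comp.flush()
        # Unchanged entries (e.g. most images) cost nothing in the patch.
    return patch

def apply_patch(old: dict[str, bytes], patch: dict) -> dict[str, bytes]:
    """Rebuild the new version from the old one plus the patch."""
    new = {p: d for p, d in old.items() if p not in patch["removed"]}
    for path, blob in patch["added"].items():
        new[path] = zlib.decompress(blob)
    for path, blob in patch["modified"].items():
        # The same preset dictionary must be supplied to decompress.
        dec = zlib.decompressobj(zdict=old[path])
        new[path] = dec.decompress(blob) + dec.flush()
    return new
```

Unchanged entries contribute nothing to the patch and slightly edited ones shrink to a small delta, which is the effect the idea relies on; a real implementation would additionally have to rebuild clusters and the archive's directory.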

I have started implementing this PoC and the first figures are very promising: at least a 4x reduction, i.e. an incremental archive could be 4 times smaller than the full new version of the archive. But before spending more time on this PoC (which was fun to develop anyway), I would like more information from your side.

If I can help with incremental updates, I would be more than happy to.

kelson42 (Contributor) commented Feb 6, 2020

Hi Benoit,

Sorry for being a bit late with my feedback on your ticket. Although this is one of the most interesting topics around the ZIM format, it did not come at the best time for me.

The GSoC 2013 work has been properly integrated into the zimtools, as the zimdiff and zimpatch tools.

There has been no follow-up because this was not a priority and because the topic needs quite a lot of resources to be done properly.

Last year I came across zsync and had the same thought as you: this generic tool might simply be good enough to do the job properly. Another big advantage is that, on the backend side, Mirrorbrain seems to support it. Therefore this is, to me, the most promising path for the moment, and for now we are blocked by kiwix/container-images#37.

Without knowing much about your PoC, it sounds pretty much like the zimdiff/zimpatch approach!

If you are seriously interested in working on this topic, I'm available for a call so we can discuss it more directly.

kelson42 self-assigned this Feb 6, 2020

kelson42 (Contributor) commented
No feedback, so I'm closing the ticket. @benoit74, please feel free to reopen it.
