Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix ZipInputStream cannot read STORED method from ZipZap #155

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

liuliu
Copy link

@liuliu liuliu commented Dec 23, 2016

This PR tried to fix the problem with ZipZap generated STORED format with ZipInputStream.

ZipInputStream is a widely used mechanism from JDK that supports reading Zip files. Since it deals with a stream of data, there are limitations in its implementation. For example, ZipInputStream cannot leverage the data from central file header to offset its payload. This is OK for deflated stream because deflated stream is self-descriptive, thus, by inflating the stream, it will naturally reach the end of the stream, and then pick up the data descriptor from the end of the payload for crc32 etc. However, if the payload is in STORED format, ZipInputStream relies on the local file header to provide information such as the uncompressed size, which in ZipZap case, skipped.

This works fine for other Zip file reader implementation or Java's own ZipFile since most of them are using central file header as the source of truth.

This PR fixed the problem by having the correct uncompressed / compressed size / crc32 code at the local file header if STORED format is used, and in that case, don't use data descriptor.

However, since we can only get the payload length at zero cause for data block payload, it is left to be desired about how to implement this for NSOutputStream / CGDataConsumerRef.

Please advice. Also, cc @rsanchezsaez

@pixelglow
Copy link
Owner

Excellent idea, proposal and implementation!

This change is actually generally useful as a slight optimization for smaller zip files and follows neatly on from @ed84124.

Only a couple of issues to fix. Please rebase all your changes to a single commit:

  • Minor nit: line 108 comment should mention data and no compression.

  • Issue: lines 278 - 304: logic here is sound but embeds another dependency on dataBlock and compressionLevel == 0 in the data writing code, which is hard to figure out when we next update this to support e.g. always outputting size + crc in local header. Prefer to separate out the header update logic in a separate phase before the data writing i.e. a separate if (!_compressionLevel && _dataBlock) {...} code. Once this code updates the header with the size + crc, then it should set a flag to inhibit the data descriptor write at line 337, instead of having to recheck _dataBlock, generalPurposeBitFlag etc.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants