
fetch() stream capable? #1

Open
mindplay-dk opened this issue Dec 11, 2023 · 4 comments

@mindplay-dk

Hi,

do you know if there's any way to add body streams from fetch calls with this library?

(e.g. downloading a list of files and compressing them on the fly.)

@hpp2334
Owner

hpp2334 commented Dec 12, 2023

Currently there are only a few simple utility functions, such as adding text files and streaming-serializing JavaScript objects into JSON strings to add them as files.
I think, if necessary, a utility API could be added along these lines:

```ts
function createFetchDataGenerator(factory: () => Promise<Response>)
```

Users would probably use it like this:

```js
zip.addFile('a.bin', createFetchDataGenerator(() => fetch('xxx')))
```
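A minimal sketch of what such a helper might look like, assuming the zip writer can consume an async generator of `Uint8Array` chunks. The name and signature follow the proposal above; none of this is the library's actual API:

```typescript
// Hypothetical sketch: wrap a fetch call and yield the response body as
// Uint8Array chunks, so the zip writer can consume it as a stream
// instead of buffering the whole file in memory.
async function* createFetchDataGenerator(
  factory: () => Promise<Response>
): AsyncGenerator<Uint8Array> {
  const resp = await factory()
  if (!resp.ok || !resp.body) {
    throw new Error(`fetch failed with status ${resp.status}`)
  }
  const reader = resp.body.getReader()
  try {
    while (true) {
      const { done, value } = await reader.read()
      if (done) break
      if (value) yield value
    }
  } finally {
    reader.releaseLock()
  }
}
```

The generator only pulls the next chunk when the zip writer asks for it, so backpressure is handled by the underlying `ReadableStream`.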

@mindplay-dk
Author

Looks reasonable. :-)

createFetchDataGenerator could internally use a promise pool with a few "threads", to keep the line busy when downloading many small files by keeping a few concurrent fetches going.

Alternatively, for more control, maybe something like createFetchDataPool(4), which would return the createFetchDataGenerator function bound to a promise pool of e.g. 4 concurrent promises?

(It's usually a good idea to give people the option: some browsers limit the total number of connections, so if your app needs 1 or 2 connections for itself, you might want 4 or 5 concurrent promises, or you might want no more than 2 or 3 to ease the server load; requirements are often different here.)
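A promise pool of the kind described here can be sketched in a few lines. The helper name `createTaskPool` is an illustration, not anything this library provides; `createFetchDataPool(4)` could then be a thin wrapper binding fetch generators to such a pool:

```typescript
// Minimal promise pool: at most `concurrent` tasks run at once; further
// tasks wait in a FIFO queue until a slot frees up.
function createTaskPool(concurrent: number) {
  let active = 0
  const waiting: Array<() => void> = []
  return async function run<T>(task: () => Promise<T>): Promise<T> {
    // re-check after waking, in case another caller took the free slot
    while (active >= concurrent) {
      await new Promise<void>((resolve) => waiting.push(resolve))
    }
    active++
    try {
      return await task()
    } finally {
      active--
      waiting.shift()?.() // wake one queued task, if any
    }
  }
}
```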

@hpp2334
Owner

hpp2334 commented Dec 12, 2023

As far as I know, the files in the zip are added one after another. For example, there is no way to add 50% of the content of a.bin first, then 60% of the content of b.bin, and then the remaining 50% of a.bin: all of a.bin must be added before any of b.bin. So concurrency doesn't actually help here; when considering "streaming", requests should be made one after the other.

If "streaming" and memory usage are not a concern (for example, the requested files are small), a third-party AsyncTaskPool can be used to request and read the whole content, then package it into the zip. For example:

```js
const asyncTaskPool = new AsyncTaskPool({ concurrent: 4 })
const zip = new StreamZip()
// fetch files and add them to the zip
toRequests.forEach(([addr, filename]) => {
    asyncTaskPool.add(() => fetch(addr).then(async (resp) => {
        // read the whole body into memory
        const buf = await resp.arrayBuffer()
        zip.add(filename, createBytesDataGenerator(new Uint8Array(buf)))
    }))
})
```

@mindplay-dk
Author

> As far as I know, the files in the zip are added one after another. For example, there is no way to add 50% of the content of a.bin first, then add 60% of the content of b.bin, and then add the remaining 50% of the content of a.bin.

Yes, that's not what I'm asking for. 🙂

But you can start opening the fetch requests ahead of time. If you're downloading a lot of smaller files, that makes sense: otherwise you wait to connect/request/open each individual file in turn, which introduces a lot of latency and is going to be very slow. Opening a few requests ahead of time, before you start adding them to the zip stream, will make this a lot faster.
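This idea can be sketched as: start all the requests up front (the browser's connection limit naturally bounds how many are actually in flight), then consume the responses strictly in the order the entries should appear in the zip. `prefetchInOrder` and its `handle` callback are hypothetical names; `fetchFn` is a parameter only so the sketch can be exercised without a network:

```typescript
// Start every request eagerly, then await the responses in entry order.
// Downloads overlap, but bodies are still consumed one after another,
// matching the sequential structure of a zip stream.
async function prefetchInOrder(
  entries: Array<[url: string, filename: string]>,
  handle: (filename: string, body: Uint8Array) => void,
  fetchFn: (url: string) => Promise<Response> = fetch
): Promise<void> {
  // kick off all fetches immediately; they resolve concurrently
  const pending = entries.map(
    ([url, filename]) => [filename, fetchFn(url)] as const
  )
  for (const [filename, respPromise] of pending) {
    const resp = await respPromise // later requests are already in flight
    const buf = new Uint8Array(await resp.arrayBuffer())
    handle(filename, buf) // e.g. zip.add(filename, createBytesDataGenerator(buf))
  }
}
```

For very large files you would still want to stream each body rather than buffer it, but the ordering idea is the same: only the *start* of each request is concurrent, not the consumption.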
