
[Feature] Async iterator over ls.stream #117

Open
1 task done
ccollie opened this issue Jun 8, 2022 · 2 comments
Labels
Enhancement new feature or improvement Needs Triage needs an initial review

Comments


ccollie commented Jun 8, 2022

Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

Calling ls.stream over a large store results in a large memory hit, because all entries end up buffered in memory.

Expected Behavior

An async iterator that yields entries as they are needed.

@ccollie ccollie added the Needs Triage needs an initial review label Jun 8, 2022
@wraithgar
Member

I don't think this will make a difference either way; you're just changing how the stream is consumed. We think the real memory hog here is `const MAX_SINGLE_READ_SIZE = 64 * 1024 * 1024`, and we have plans to fix it soon.

@darcyclarke darcyclarke added the Enhancement new feature or improvement label Sep 13, 2022
@yury-kozlov

I also stumbled on a similar issue (on Windows). When the npm cache contains lots of packages, I got this error:

Uncaught Error Error: EMFILE: too many open files,
Emitted 'error' event on Minipass instance at:
at (program) (internal/process/promises:265:12)

The reason for the issue is that the lsStream function in cacache\lib\entry-index.js kicks off asynchronous reads of all files at once and only then awaits the results.
When working with streams I would expect different behavior: load data in portions (one by one, or in chunks of X) instead of reading all files in parallel.

As a workaround, I adjusted the source code locally and replaced these map calls with regular for loops:

  • await Promise.all(buckets.map(async (bucket) =>
  • await Promise.all(subbuckets.map(async (subbucket) =>
  • await Promise.all(subbucketEntries.map(async (entry) =>
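The workaround described above can be sketched generically. This is a minimal illustration, not cacache's code: `processSequentially` and `readEntry` are hypothetical names, showing how awaiting inside a plain for loop serializes the I/O that `Promise.all(items.map(async ...))` would otherwise fan out all at once (which is what exhausts file descriptors and triggers EMFILE).

```javascript
// Sequential alternative to Promise.all(items.map(async ...)):
// only one readEntry call is in flight at a time, so the number of
// simultaneously open files stays bounded.
async function processSequentially (items, readEntry) {
  const results = []
  for (const item of items) {
    // Awaiting inside the loop serializes the async work.
    results.push(await readEntry(item))
  }
  return results
}

// Usage sketch with a trivial async task standing in for a file read.
processSequentially(['a', 'b'], async x => x.toUpperCase())
  .then(r => console.log(r.join('')))
// prints "AB"
```

A middle ground is to process in chunks of X (slice the array and `Promise.all` each slice), which keeps some parallelism while still capping open files.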
