
encode is a crazy memory hog #27

Open · Domiii opened this issue Aug 26, 2021 · 4 comments

Domiii commented Aug 26, 2021

Due to an unoptimized algorithm (as also discussed in #12), encode is a memory hog (I have not looked at decode yet).
I decided to post this as a separate issue, since the other issue's title does not capture the problem, and its discussion mostly focuses on execution speed, not on memory.

In my case, I am sending data with socket.io, and this is my journey:

  • I am sending about 100M values (nested in one object).
  • Initially it crashed on me because encode ran out of memory; I had to raise Node's heap limit to --max-old-space-size=8192.
  • The final buffer size is 298,406,623 bytes.
  • It turns out that the recursive _encode call itself required 4GB of additional memory (even though, as mentioned above, the final buffer is less than 300MB).
    • Memory went from 1.2GB at the start to 5.2GB at the end; afterwards, all memory pressure disappeared again. I'm rather confident the problem is in the encode algorithm itself.
    • NOTE: I measured this via process.memoryUsage(); all three metrics (rss, heapTotal, and heapUsed) show the same trend (see the snippet below).
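
For reference, the measurement looked roughly like this (sketch: `payload` is a placeholder for my actual data, and the require assumes this repo's npm name):

```js
// Sketch of the measurement: snapshot process.memoryUsage() around the
// encode call and report how much each metric grew.
const { encode } = require("notepack.io"); // assumption: this repo's npm name

const before = process.memoryUsage();
const buf = encode(payload); // payload: placeholder for the ~100M-value object
const after = process.memoryUsage();

for (const key of ["rss", "heapTotal", "heapUsed"]) {
  const deltaMb = (after[key] - before[key]) / 1024 / 1024;
  console.log(`${key}: +${deltaMb.toFixed(1)} MB`);
}
```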

Possible Solution

I strongly suggest heeding manast's suggestion to use a direct buffer allocation approach. If the buffer size is unknown, just run the algorithm once to compute the total buffer size and index positions, then re-run it to actually populate the buffer, rather than creating temporary utility objects as the current approach does. This should come at a much lower memory (and probably CPU) cost than the current version.
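
A rough sketch of that two-pass idea (illustrative names, not this library's internals; only a few msgpack types shown):

```js
// Pass 1: compute the exact encoded size without allocating anything
// beyond the call stack. For simplicity this sketch encodes every number
// as float64 (0xcb) and every string/array with the 32-bit formats.
function byteSize(value) {
  if (typeof value === "number") return 9; // 0xcb + 8 bytes
  if (typeof value === "string") {
    return 5 + Buffer.byteLength(value, "utf8"); // 0xdb + u32 length + bytes
  }
  if (Array.isArray(value)) {
    let size = 5; // 0xdd + u32 length
    for (const item of value) size += byteSize(item);
    return size;
  }
  throw new Error("type not covered in this sketch");
}

// Pass 2: write into the preallocated buffer at known offsets.
function writeValue(value, buf, offset) {
  if (typeof value === "number") {
    buf.writeUInt8(0xcb, offset);
    buf.writeDoubleBE(value, offset + 1);
    return offset + 9;
  }
  if (typeof value === "string") {
    const len = Buffer.byteLength(value, "utf8");
    buf.writeUInt8(0xdb, offset);
    buf.writeUInt32BE(len, offset + 1);
    buf.write(value, offset + 5, "utf8");
    return offset + 5 + len;
  }
  // Array
  buf.writeUInt8(0xdd, offset);
  buf.writeUInt32BE(value.length, offset + 1);
  let pos = offset + 5;
  for (const item of value) pos = writeValue(item, buf, pos);
  return pos;
}

function encodeTwoPass(value) {
  const buf = Buffer.allocUnsafe(byteSize(value)); // the one large allocation
  writeValue(value, buf, 0);
  return buf;
}
```

Pass 1 allocates nothing but stack frames, and pass 2 writes at offsets fully determined by pass 1, so the only large allocation is the final buffer itself.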

I know the owner currently does not have time to work on this, but one can dream :)

@Domiii changed the title from "encode its up all memory" to "encode eats up all memory" on Aug 26, 2021
@Domiii changed the title from "encode eats up all memory" to "encode is a crazy memory hog" on Aug 26, 2021
darrachequesne (Owner) commented

Thanks for the detailed report 👍

What are the 100M values? Plain strings? Could you please share some code reproducing the issue?

Did you try with another messagepack implementation like @msgpack/msgpack or what-the-pack? Do you encounter the same behavior?

Domiii (Author) commented Aug 27, 2021

  1. The values are mostly objects in arrays, nested a few times (some 5 to 7 layers deep). The raw values are mostly numbers and some strings. (There are no circular references; I'm rather certain of that.)
  2. I don't think other msgpack implementations have a socket.io parser, do they?
  3. I cannot really put together an isolated repro right now (time-wise).

But I can offer a few more insights regarding defers. I just ran a sample:

  • 23.8M values
  • Memory increased by 1.6GB across the recursive _encode call (from 1.2GB before to 2.8GB after)
  • bytes.length ~ 53M
  • defers.length ~ 11.5M
  • final buf size ~ 141M

It does not seem impossible that the defers array is the culprit.
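
For context, the pattern those numbers point at looks schematically like this (just the shape, not this repo's actual source):

```js
// Schematic of the deferred-write pattern. Header bytes go into `bytes`
// one JS number per byte, and every string additionally pushes a small
// bookkeeping object into `defers`; ~11.5M deferred values therefore
// means ~11.5M short-lived objects allocated before the final buffer
// even exists.
const bytes = [];  // one array element per header/length byte
const defers = []; // one { str, length, offset } object per string

function encodeStringDeferred(str) {
  const length = Buffer.byteLength(str, "utf8");
  bytes.push(0xdb, (length >>> 24) & 0xff, (length >>> 16) & 0xff,
             (length >>> 8) & 0xff, length & 0xff); // str32 header
  defers.push({ str, length, offset: bytes.length });
}
```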

Do you want to try to create your own sample with some dummy arrays containing a ton of strings?
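
Something along these lines should reproduce the shape (hypothetical sketch; the require assumes this repo's npm name):

```js
// Hypothetical repro: nested arrays holding a ton of short strings.
const { encode } = require("notepack.io"); // assumption: this repo's npm name

function makeSample(depth, width) {
  if (depth === 0) return "some-short-string-" + Math.random();
  const arr = [];
  for (let i = 0; i < width; i++) arr.push(makeSample(depth - 1, width));
  return arr;
}

const sample = makeSample(6, 12); // 12^6 ≈ 3M leaf strings; scale width carefully
const buf = encode(sample);
console.log("encoded", buf.length, "bytes, heapUsed now",
  (process.memoryUsage().heapUsed / 1024 / 1024).toFixed(0), "MB");
```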

joshxyzhimself commented

Isn't it better to use a compression algorithm (like gzip) for data of huge sizes (e.g. 10MB and above)? On the client side (browser), pako could decode it on the main thread or on a worker thread.
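
For example, on the Node side (sketch using the built-in zlib; the payload here is a placeholder):

```js
const zlib = require("zlib");

// Placeholder payload; in practice this would be the already-encoded
// (msgpack or JSON) bytes about to go over the wire.
const encoded = Buffer.from(JSON.stringify({ values: new Array(1e6).fill("example") }));
const compressed = zlib.gzipSync(encoded); // pako.ungzip() can inflate this in the browser
console.log(encoded.length, "->", compressed.length);
```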

uWebSockets.js is a great alternative to socket.io and to other web frameworks too (e.g. express, koa, hapi, ws).

Domiii (Author) commented Jun 2, 2022

@joshxyzhimself The issue here is with encode, not with compression or the transport layer. encode has an extreme memory overhead, gobbling up 4+ GB of memory (and then crashing) to encode only 200+ MB of data (arrays, objects, strings, numbers).

@darrachequesne To answer your question: things are working now, after switching to a custom socket.io parser built around @msgpack/msgpack.
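
In case it helps others, the parser is essentially shaped like this (sketch along the lines of socket.io's documented custom-parser interface; packet validation omitted):

```js
// Sketch of a custom socket.io parser wrapping @msgpack/msgpack.
const { encode, decode } = require("@msgpack/msgpack");
const Emitter = require("component-emitter");

class Encoder {
  encode(packet) {
    return [encode(packet)]; // one binary chunk per packet
  }
}

class Decoder extends Emitter {
  add(chunk) {
    // chunk is a Buffer/ArrayBuffer coming off the transport
    this.emit("decoded", decode(chunk));
  }
  destroy() {}
}

module.exports = { Encoder, Decoder };
```

Both the server and the client then receive it via their options, e.g. { parser: require("./msgpack-parser") }.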
