
encode is a crazy memory hog #27

Open · Domiii opened this issue Aug 26, 2021 · 4 comments

Domiii commented Aug 26, 2021

Due to an unoptimized algorithm (as also discussed in #12), encode is a memory hog (I have not looked at decode yet).
I decided to post this as a separate issue, since the other issue's title does not capture the problem, and its discussion mostly focuses on execution speed, not on memory.

In my case, I am sending data with socket.io, and this is my journey:

  • I am sending about 100M values (nested in one object).
  • Initially it crashed on me because encode ran out of memory; I had to raise Node's heap limit to --max-old-space-size=8192.
  • The final buffer size is 298,406,623 bytes.
  • It turns out that the recursive _encode call itself required 4GB of additional memory (even though, as mentioned above, the final buffer is less than 300MB).
    • Memory went from 1.2GB at the start to 5.2GB at the end; afterwards, all memory pressure disappeared again. I'm rather confident the problem is in the encode algorithm itself.
    • NOTE: I measured this via process.memoryUsage(); all three metrics (rss, heapTotal, and heapUsed) show the same trend (see the snippet below).
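
For reference, the measurement looked roughly like this (sketch: `payload` is a placeholder for my actual data, and the require assumes this repo's npm name):

```js
// Sketch of the measurement: snapshot process.memoryUsage() around the
// encode call and report how much each metric grew.
const { encode } = require("notepack.io"); // assumption: this repo's npm name

const before = process.memoryUsage();
const buf = encode(payload); // payload: placeholder for the ~100M-value object
const after = process.memoryUsage();

for (const key of ["rss", "heapTotal", "heapUsed"]) {
  const deltaMb = (after[key] - before[key]) / 1024 / 1024;
  console.log(`${key}: +${deltaMb.toFixed(1)} MB`);
}
```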

Possible Solution

I strongly suggest heeding manast's suggestion to use a direct buffer allocation approach. If the buffer size is unknown, just run the algorithm once to compute the total buffer size and index positions, then re-run it to actually populate the buffer, rather than creating temporary utility objects as the current approach does. This should come at a much lower memory (and probably CPU) cost than the current version.
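
A rough sketch of that two-pass idea (illustrative names, not this library's internals; only a few msgpack types shown):

```js
// Pass 1: compute the exact encoded size without allocating anything
// beyond the call stack. For simplicity this sketch encodes every number
// as float64 (0xcb) and every string/array with the 32-bit formats.
function byteSize(value) {
  if (typeof value === "number") return 9; // 0xcb + 8 bytes
  if (typeof value === "string") {
    return 5 + Buffer.byteLength(value, "utf8"); // 0xdb + u32 length + bytes
  }
  if (Array.isArray(value)) {
    let size = 5; // 0xdd + u32 length
    for (const item of value) size += byteSize(item);
    return size;
  }
  throw new Error("type not covered in this sketch");
}

// Pass 2: write into the preallocated buffer at known offsets.
function writeValue(value, buf, offset) {
  if (typeof value === "number") {
    buf.writeUInt8(0xcb, offset);
    buf.writeDoubleBE(value, offset + 1);
    return offset + 9;
  }
  if (typeof value === "string") {
    const len = Buffer.byteLength(value, "utf8");
    buf.writeUInt8(0xdb, offset);
    buf.writeUInt32BE(len, offset + 1);
    buf.write(value, offset + 5, "utf8");
    return offset + 5 + len;
  }
  // Array
  buf.writeUInt8(0xdd, offset);
  buf.writeUInt32BE(value.length, offset + 1);
  let pos = offset + 5;
  for (const item of value) pos = writeValue(item, buf, pos);
  return pos;
}

function encodeTwoPass(value) {
  const buf = Buffer.allocUnsafe(byteSize(value)); // the one large allocation
  writeValue(value, buf, 0);
  return buf;
}
```

Pass 1 allocates nothing but stack frames, and pass 2 writes at offsets fully determined by pass 1, so the only large allocation is the final buffer itself.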

I know the owner currently does not have time to work on this, but one can dream :)

@Domiii changed the title from "encode its up all memory" to "encode eats up all memory" on Aug 26, 2021
@Domiii changed the title from "encode eats up all memory" to "encode is a crazy memory hog" on Aug 26, 2021
darrachequesne (Owner) commented

Thanks for the detailed report 👍

What are the 100M values? Plain strings? Could you please share some code reproducing the issue?

Did you try with another messagepack implementation like @msgpack/msgpack or what-the-pack? Do you encounter the same behavior?

Domiii (Author) commented Aug 27, 2021

  1. The values are mostly objects in arrays, nested a few times (some 5 to 7 layers deep). The raw values are mostly numbers and some strings. (There are no circular references; I'm rather certain of that.)
  2. I don't think other msgpack implementations have a socket.io parser, do they?
  3. I cannot really put together an isolated repro right now (time-wise).

But I can offer a few more insights regarding defers. I just ran a sample:

  • 23.8M values
  • Memory increased by 1.6GB across the recursive _encode call (from 1.2GB before to 2.8GB after)
  • bytes.length ~ 53M
  • defers.length ~ 11.5M
  • final buf size ~ 141M

It does not seem impossible that the defers array is the culprit.
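
For context, the pattern those numbers point at looks schematically like this (just the shape, not this repo's actual source):

```js
// Schematic of the deferred-write pattern. Header bytes go into `bytes`
// one JS number per byte, and every string additionally pushes a small
// bookkeeping object into `defers`; ~11.5M deferred values therefore
// means ~11.5M short-lived objects allocated before the final buffer
// even exists.
const bytes = [];  // one array element per header/length byte
const defers = []; // one { str, length, offset } object per string

function encodeStringDeferred(str) {
  const length = Buffer.byteLength(str, "utf8");
  bytes.push(0xdb, (length >>> 24) & 0xff, (length >>> 16) & 0xff,
             (length >>> 8) & 0xff, length & 0xff); // str32 header
  defers.push({ str, length, offset: bytes.length });
}
```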

Do you want to try to create your own sample with some dummy arrays containing a ton of strings?
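
Something along these lines should reproduce the shape (hypothetical sketch; the require assumes this repo's npm name):

```js
// Hypothetical repro: nested arrays holding a ton of short strings.
const { encode } = require("notepack.io"); // assumption: this repo's npm name

function makeSample(depth, width) {
  if (depth === 0) return "some-short-string-" + Math.random();
  const arr = [];
  for (let i = 0; i < width; i++) arr.push(makeSample(depth - 1, width));
  return arr;
}

const sample = makeSample(6, 12); // 12^6 ≈ 3M leaf strings; scale width carefully
const buf = encode(sample);
console.log("encoded", buf.length, "bytes, heapUsed now",
  (process.memoryUsage().heapUsed / 1024 / 1024).toFixed(0), "MB");
```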

joshxyzhimself commented

Isn't it better to use a compression algorithm (like gzip) for data of huge sizes (e.g. 10MB and above)? On the client side (browser), pako could decode it on the main thread or on a worker thread.
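
For example, on the Node side (sketch using the built-in zlib; the payload here is a placeholder):

```js
const zlib = require("zlib");

// Placeholder payload; in practice this would be the already-encoded
// (msgpack or JSON) bytes about to go over the wire.
const encoded = Buffer.from(JSON.stringify({ values: new Array(1e6).fill("example") }));
const compressed = zlib.gzipSync(encoded); // pako.ungzip() can inflate this in the browser
console.log(encoded.length, "->", compressed.length);
```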

uWebSockets.js is a great alternative to socket.io and to other web frameworks too (e.g. express, koa, hapi, ws).

Domiii (Author) commented Jun 2, 2022

@joshxyzhimself The issue here is with encode, not with compression or the transport layer. encode has an extreme memory overhead, gobbling up 4+ GB of memory (and then crashing) to encode only 200+ MB of data (arrays, objects, strings, numbers).

@darrachequesne To answer your question: things are working now, after switching to a custom socket.io parser built around @msgpack/msgpack.
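
In case it helps others, the parser is essentially shaped like this (sketch along the lines of socket.io's documented custom-parser interface; packet validation omitted):

```js
// Sketch of a custom socket.io parser wrapping @msgpack/msgpack.
const { encode, decode } = require("@msgpack/msgpack");
const Emitter = require("component-emitter");

class Encoder {
  encode(packet) {
    return [encode(packet)]; // one binary chunk per packet
  }
}

class Decoder extends Emitter {
  add(chunk) {
    // chunk is a Buffer/ArrayBuffer coming off the transport
    this.emit("decoded", decode(chunk));
  }
  destroy() {}
}

module.exports = { Encoder, Decoder };
```

Both the server and the client then receive it via their options, e.g. { parser: require("./msgpack-parser") }.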
