Enable bulk-memory by default #22873

dschuff · 2024-11-06T23:17:17Z

- Remove libbulkmemory and _emscripten_memcpy_js, and fold memcpy and memset into libc
  - Use bulk memcpy/memset for Oz builds, but keep ASan behavior the same.
- Move the zero-length check in memcpy from C into assembly, and add one for memset
- Remove the use of -mno-bulk-memory at compile time (enabling it in object files)
- Temporarily set the Safari version required to use bulk memory to 14.1 (which has the effect of enabling it by default without enabling the other 14.1 features by default). This will be reverted when nontrapping-fptoint and bigint are also enabled by default.

See #23184

sbc100

Nice! So awesome to see this land!

system/lib/libc/emscripten_memcpy.c

system/lib/libc/emscripten_memcpy_bulkmem.S

system/lib/libc/emscripten_memset.c

dschuff · 2024-11-07T00:35:56Z

@sbc100 @tlively @aheejin @kripken for opinions:
The lowering pass currently refuses to run on a module that has atomics enabled, because we shouldn't be lowering away copy/fill if atomics are being used, since we need passive segments anyway. The current logic runs the lowering pass any time the link command line doesn't ask for a new enough browser version. That combination causes any test that links a library built with pthreads to fail. For example, libfreetype is always built with -pthread (I'm not sure why), so if you link a non-pthreads binary with libfreetype, the lowering pass will error out.

Previous to this PR, I think the result of linking any atomics-using library into a non-atomic output is that it silently creates a binary that uses atomics even though that wasn't asked for. Is that the behavior we want?

1. If so, I guess the easiest thing would be just to make the lowering pass not do anything if atomics are enabled.
2. Another option would be to make the behavior stricter, so that you can't link atomics-using code into a non-atomics-using output (to avoid the surprise behavior); but that could possibly be annoying.
3. A third option would be to make it explicit inside emcc and not run the lowering pass if atomics are enabled.

I like the idea of the 2nd option, but it would require fixing the libraries that are compiled with threads (e.g. would we need a freetype-mt? Maybe not; I don't actually see any atomic instructions inside libfreetype). And it could break users who are doing this kind of linking (the fix would be to explicitly enable atomics or threads at link time; would that have bad consequences? I guess they'd have more stuff linked into their binary?)

edit: I should clarify that the case that would fail here with the current logic is when the build targets Safari 14.1 (the current default, but would be done manually in the future). The default behavior going forward would actually be the same as it is now, except that bulk memory would be enabled (i.e. the lowering pass would not be run, and linking an atomics-using library would cause the resulting binary to have atomics).

More generally speaking, our feature-enabling code in emscripten is a little inconsistent. E.g. with the current logic, enabling WASM_BIGINT will automatically cause bulk memory and nontrapping-fp to be enabled, because it implicitly causes Safari 15 to be selected, and then that selection determines whether the lowering passes run. I find that a little surprising, and there currently isn't a way to override it other than selecting a different browser version manually. Also, the -mbulk-memory et al. feature flags don't work at link time to select features at a fine grain; only the browser versions work. That seems kind of bad to me, but might also be a pain to fix, for not much benefit.
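To make the surprise concrete, here is a minimal sketch of version-driven feature selection. It is not the code in tools/feature_matrix.py: the table, the version encoding, and the function names are all hypothetical, and the version numbers are only the ones mentioned in this thread.

```python
# Hypothetical minimum Safari versions per feature, encoded as
# major * 10000 + minor * 100. These follow the numbers mentioned in the
# thread and are assumptions, not the real feature table.
MIN_SAFARI = {
    "bulk-memory": 140100,
    "nontrapping-fptoint": 150000,
    "bigint": 150000,
}


def effective_min_safari(requested, wasm_bigint):
    """Raise the target version when a setting implies a newer browser."""
    if wasm_bigint:
        return max(requested, MIN_SAFARI["bigint"])
    return requested


def enabled_features(min_safari):
    """Features that need no lowering for the given minimum Safari version."""
    return {name for name, version in MIN_SAFARI.items() if min_safari >= version}


# Asking only for bigint also switches on every other feature at that version.
print(sorted(enabled_features(effective_min_safari(140100, wasm_bigint=False))))
print(sorted(enabled_features(effective_min_safari(140100, wasm_bigint=True))))
```

In this sketch the only knob is the browser version, so a single setting that bumps the version changes which lowering passes run, which is the behavior described above.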

sbc100 · 2024-11-07T01:37:21Z

> The lowering pass currently refuses to run on a module that has atomics enabled, because we shouldn't be lowering away copy/fill if atomics are being used since we need passive segments anyway.

Why not just allow the pass to run in this case? The pass relates only to memory.copy and memory.fill, right? The passive segments and memory.init would be unaffected, no?

sbc100 · 2024-11-07T01:39:37Z

> Previous to this PR I think the result of linking any atomics-using library into a non-atomic output is that it will silently create a binary that uses atomics even though it wasn't asked for. Is that the behavior we want?

That's a good question. I can't remember the conclusion, but I do remember that we updated the spec so that browsers would allow atomic instructions even in single-threaded builds, so maybe this was why we did that. But if we want to support older browsers, we would still want to be able to lower those atomics away at link time, I think. How many such libraries do we have? It's certainly nice to be able to build libraries just once and avoid the -mt variants if we can.

dschuff · 2024-11-07T18:00:54Z

> I can't remember the conclusion, but I do remember that we updated the spec so that browsers would allow atomic instructions even in single-threaded builds, so maybe this was why we did that. But if we want to support older browsers, we would still want to be able to lower those atomics away at link time, I think. How many such libraries do we have? It's certainly nice to be able to build libraries just once and avoid the -mt variants if we can.

Ah, right, I forgot about allowing atomics in single-threaded builds. That makes sense then: the default behavior for the default targets (which support bulk memory and atomics) would be to just allow atomics to pass through any time. So for older browsers we currently do not support lowering away atomics at link time (only at compile time). Our current default target (Safari 14.1) actually does support atomics, but I guess we don't support targeting even older browsers while linking in atomics-using libraries. There aren't many emscripten builtin library variants built this way (freetype might be the only one, actually), so maybe this isn't a big problem. If we want to lean further in this direction, we could potentially even remove some '-mt' variants and just always link atomic versions.

> Why not just allow the pass to run in this case? The pass relates only to memory.copy and memory.fill, right? The passive segments and memory.init would be unaffected, no?

We could do it, it would just be a pessimization for no reason; there are no targets AFAIK that support passive segments but not memory.copy/fill.

dschuff · 2024-11-07T21:31:37Z

After some discussion with @sbc100 we might do the following:

- Keep the atomics passthrough linking behavior as-is. The consequence will (continue to) be that users targeting browsers older than Safari 14.1 will fail to load their module if they link libraries with atomics. So far this hasn't been a problem.
- Make the lowering pass not do anything (or just run as normal) if atomics are enabled. This would make it harmless to run the pass when there are atomics (even if it's not useful).
- Make the lowering pass error out if there are any other uses of bulk memory, e.g. passive segments or table operations. The reason is that it would be incorrect to remove the bulk-memory feature if there are still uses of bulk memory (and it would definitely be a bug for someone to run this pass on a module with other bulk memory usage).
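The plan above can be sketched as a small decision function. This is only an illustration, not the Binaryen pass: `ModuleInfo`, its fields, and `plan_lowering` are hypothetical names, and checking atomics before other bulk-memory uses is an assumption (it keeps pthreads modules, which need passive segments, from hitting the error).

```python
from dataclasses import dataclass


@dataclass
class ModuleInfo:
    """Hypothetical summary of what a linked module uses."""
    has_atomics: bool = False
    has_passive_segments: bool = False
    has_table_bulk_ops: bool = False


def plan_lowering(mod):
    """Decide what a memory.copy/fill lowering step should do for `mod`."""
    # Atomics pass through untouched, so the pass is a harmless no-op here.
    if mod.has_atomics:
        return "skip"
    # Other bulk-memory uses would remain after lowering, so dropping the
    # feature would be incorrect: treat this as an error.
    if mod.has_passive_segments or mod.has_table_bulk_ops:
        raise ValueError("module uses bulk memory beyond memory.copy/fill")
    # Only memory.copy/fill can be present: safe to lower them away.
    return "lower"


print(plan_lowering(ModuleInfo()))
print(plan_lowering(ModuleInfo(has_atomics=True, has_passive_segments=True)))
```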

dschuff · 2024-11-08T18:46:00Z

OK, I think the bulk memory part of this patch is ready; the tests are working and the comments so far are addressed.

As written, this PR does not update the default features; it only builds with bulk memory and turns on the lowering. This means the lowering will run by default. This isn't a state we want to release with, but it means we can land the changes for nontrapping-fp and updating the default separately. The other option is to add to this PR to enable the nontrapping-fp lowering (once we have test_sse fixed), and then update the defaults.
I maybe lean toward landing separately, but don't have a strong opinion.

system/lib/libc/emscripten_memcpy_bulkmem.S

tools/feature_matrix.py

sbc100 · 2024-11-22T01:52:11Z

tools/feature_matrix.py

@@ -156,6 +163,8 @@ def apply_min_browser_versions():
  if settings.PTHREADS:
    enable_feature(Feature.THREADS, 'pthreads')
    enable_feature(Feature.BULK_MEMORY, 'pthreads')
+  if settings.WASM_WORKERS or settings.SHARED_MEMORY:


Maybe `else` here? Otherwise the feature will be added twice in pthread mode (which also enables SHARED_MEMORY).

Done. Although I don't think it actually matters, it should be safe to enable it more than once.
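A minimal sketch of why enabling a feature twice is harmless, and what the `else` suggestion changes. This is not the real `enable_feature` from tools/feature_matrix.py: the dict-based bookkeeping, the `apply` helper, and the 'shared-mem' reason string are assumptions made for illustration.

```python
from enum import Enum, auto


class Feature(Enum):
    THREADS = auto()
    BULK_MEMORY = auto()


# Hypothetical bookkeeping: feature -> first reason it was enabled.
enabled = {}


def enable_feature(feature, reason):
    """Record a feature as enabled; repeated calls are no-ops."""
    enabled.setdefault(feature, reason)


def apply(pthreads, wasm_workers, shared_memory):
    """Mirror the shape of the hunk above, with the suggested elif."""
    enabled.clear()
    if pthreads:
        enable_feature(Feature.THREADS, 'pthreads')
        enable_feature(Feature.BULK_MEMORY, 'pthreads')
    elif wasm_workers or shared_memory:
        # 'shared-mem' is a made-up reason string for this sketch.
        enable_feature(Feature.BULK_MEMORY, 'shared-mem')
    return dict(enabled)


print(apply(pthreads=True, wasm_workers=False, shared_memory=True))
print(apply(pthreads=False, wasm_workers=True, shared_memory=False))
```

With set-like bookkeeping the `elif` only avoids a redundant call; the resulting feature set is the same either way.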

tools/system_libs.py

sbc100

lgtm!

sbc100 · 2024-12-19T19:29:28Z

system/lib/libc/emscripten_memcpy.c

@@ -8,21 +8,25 @@
 #include "libc.h"
 #include "emscripten_internal.h"

+#if !defined(__wasm_bulk_memory__)
+#error "This file must be compile with bulk memory enabled"
+#endif


I think we can just drop these 3 lines. I'm not sure what they achieve.

Ah yes, they were more useful when I had inline asm in this file.

system/lib/libc/emscripten_memset.c

test/test_other.py

dschuff · 2024-12-20T21:58:03Z

OK, I think this is actually completely working now. And I just realized @kripken doesn't have any comments so +cc just in case.

kripken

lgtm otherwise

kripken · 2024-12-20T22:04:17Z

test/code_size/hello_world_wasm2js.js

@@ -15,6 +15,7 @@ function e(b) {
            for (var v, p = 0, t = a, w = f.length, y = a + (3 * w >> 2) - ("=" == f[w - 2]) - ("=" == f[w - 1]); p < w; p += 4) a = m[f.charCodeAt(p + 1)], 
            v = m[f.charCodeAt(p + 2)], c[t++] = m[f.charCodeAt(p)] << 2 | a >> 4, t < y && (c[t++] = a << 4 | v >> 2), 
            t < y && (c[t++] = v << 6 | m[f.charCodeAt(p + 3)]);
+            return c;


Any idea what this is? Looks odd.

No, that does look odd. Hard to know why without knowing which function this actually is.

It's plausible that this is some function that in some way uses bulk memory.

Hmm, based on the charCodeAt and << 6 etc., this looks like base64Decode:

emscripten/src/base64Decode.js

Lines 38 to 58 in 4cbcb26

    function base64Decode(b64) {
    #if ENVIRONMENT_MAY_BE_NODE
      if (typeof ENVIRONMENT_IS_NODE != 'undefined' && ENVIRONMENT_IS_NODE) {
        var buf = Buffer.from(b64, 'base64');
        return new Uint8Array(buf.buffer, buf.byteOffset, buf.byteLength);
      }
    #endif

    #if ASSERTIONS
      assert(b64.length % 4 == 0);
    #endif
      var b1, b2, i = 0, j = 0, bLength = b64.length, output = new Uint8Array((bLength*3>>2) - (b64[bLength-2] == '=') - (b64[bLength-1] == '='));
      for (; i < bLength; i += 4, j += 3) {
        b1 = base64ReverseLookup[b64.charCodeAt(i+1)];
        b2 = base64ReverseLookup[b64.charCodeAt(i+2)];
        output[j] = base64ReverseLookup[b64.charCodeAt(i)] << 2 | b1 >> 4;
        output[j+1] = b1 << 4 | b2 >> 2;
        output[j+2] = b2 << 6 | base64ReverseLookup[b64.charCodeAt(i+3)];
      }
      return output;
    }

That doesn't use bulk memory AFAICT. Strange that it changed.

Anyhow, it must be a weird closure quirk, as x() is called once and the return value isn't used.

tools/feature_matrix.py

dschuff · 2024-12-21T00:15:31Z

I did find one other issue that I won't get fixed today, so documenting it here.
The current Binaryen lowering pass only removes memory.copy/fill, but it doesn't remove the datacount section. I had thought this was OK because we don't include passive segments unless we are building with shared memory. But I didn't realize that the linker includes the datacount section unconditionally (when bulk-memory is enabled), which causes old Node to reject the modules. There are a couple of possible options to fix this.

1. Make the linker not include a datacount section unless it also includes passive segments. It's valid to include when bulk-memory is enabled, but it's not needed unless there are passive segments. In theory an engine could use the datacount section to make data section handling more memory-efficient or something, and it's only 5 bytes, but that's (probably?) not a big benefit.
2. Make the Binaryen lowering also remove the datacount section. This isn't necessarily valid in general (assuming the pass doesn't also remove passive segments) but would be for the Emscripten use case.

Either would be straightforward to implement, I haven't thought about which yet.
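For anyone who wants to check whether a given output still carries the section, here is a minimal sketch that walks the top-level sections of a wasm binary and looks for the DataCount section (section id 12 in the binary format). The helper names are made up for this example, and the scanner does no validation beyond the magic number.

```python
DATACOUNT_SECTION_ID = 12  # section id of DataCount in the wasm binary format


def read_uleb128(data, pos):
    """Decode an unsigned LEB128 integer; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not (byte & 0x80):
            return result, pos
        shift += 7


def section_ids(wasm):
    """Return the ids of all top-level sections, in file order."""
    if wasm[:4] != b"\0asm":
        raise ValueError("not a wasm binary")
    pos = 8  # skip the 4-byte magic and the 4-byte version
    ids = []
    while pos < len(wasm):
        sec_id = wasm[pos]
        size, pos = read_uleb128(wasm, pos + 1)
        ids.append(sec_id)
        pos += size  # skip the section payload
    return ids


def has_datacount(wasm):
    return DATACOUNT_SECTION_ID in section_ids(wasm)


# A hand-built module: header plus a DataCount section declaring 0 segments.
header = b"\0asm\x01\0\0\0"
print(has_datacount(header + b"\x0c\x01\x00"))
print(has_datacount(header))
```

Usage on a real file would be along the lines of `has_datacount(open(path, "rb").read())`, where `path` is whatever the link step produced.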

sbc100 · 2024-12-22T19:49:39Z

> 2. Make the Binaryen lowering also remove the datacount section. This isn't necessarily valid in general (assuming the pass doesn't also remove passive segments) but would be for the Emscripten use case.

I think this makes the most sense. The lowering pass is really only designed to handle LLVM output, and I think if there are any bulk-memory usages outside of bulk-memory-core (such as passive segments) then it's OK for the Binaryen pass to simply error out. This means it should also be fine to strip the datacount section.

Actually I think either approach is fine but that binaryen change will be simpler.

dschuff · 2025-01-06T18:04:29Z

I'm pretty sure the test_offset_converter wasm64_4g failure is a flake; I'm going to land this.

sbc100 · 2025-01-06T18:05:48Z

Yay!

sbc100 · 2025-01-06T18:09:40Z

Did we forget to update the Changelog for this? I wonder if we should just combine all 3 of the new features into a single changelog entry for 4.0?

dschuff · 2025-01-06T19:42:45Z

Ah yes we did. I can roll that into #23312

sbc100 · 2025-01-06T20:12:15Z

It looks like we still have this in emcc.py:

    if '-mbulk-memory' not in user_args:
      flags.append('-mbulk-memory')

dschuff added 4 commits November 6, 2024 14:57

toward always using bulkmem and lowering

96bf43a

remove libbulkmem, fix for memory64, various fixes

d38642b

more fixes

40e4fec

add lowering pass, fix some tests

ffc87f7

sbc100 reviewed Nov 6, 2024

system/lib/libc/emscripten_memcpy.c (outdated)

system/lib/libc/emscripten_memcpy_bulkmem.S

system/lib/libc/emscripten_memset.c (outdated)

simplify return

8aa1812

require bulk memory

c099418

dschuff added 5 commits November 8, 2024 17:00

fix warnings

ec3483a

Merge branch 'main' into always-bulkmem

48c1f06

Merge branch 'main' into always-bulkmem

6a82b9b

Merge branch 'main' into always-bulkmem

aeabee7

Update safari version, update pass name, fixes

51ffac7

sbc100 reviewed Nov 22, 2024

system/lib/libc/emscripten_memcpy_bulkmem.S

dschuff commented Nov 22, 2024

tools/feature_matrix.py

add node versions

76eee63

sbc100 reviewed Nov 22, 2024

tools/system_libs.py

sbc100 approved these changes Nov 22, 2024

dschuff added 6 commits November 25, 2024 13:38

rebaseline codesize tests

cd2d0af

review suggestions

747201c

Merge branch 'main' into always-bulkmem

b4aba23

rebaseline code size

4714852

Merge branch 'main' into always-bulkmem

e140c7d

add suggested comment

9da8ecc

sbc100 added a commit that referenced this pull request Dec 10, 2024

Default to -mno-bulk-memory-opt (#23126)

82182e6

This should fix the recent CI failures on the test-node-compat bot. Once #22873 lands this can, of course, be removed.

dschuff mentioned this pull request Dec 16, 2024

Updating default features to include WASM_BIGINT, bulk-memory, and nontrapping-fptoint #23184

Open

hedwigz pushed a commit to hedwigz/emscripten that referenced this pull request Dec 18, 2024

Default to -mno-bulk-memory-opt (emscripten-core#23126)

7d68c2f

This should fix the recent CI failures on the test-node-compat bot. Once emscripten-core#22873 lands this can, of course, be removed.

Merge branch 'main' into always-bulkmem

17b47b5

dschuff changed the title ~~Always use bulk memory at compile time~~ Enable bulk-memory by default Dec 19, 2024

sbc100 approved these changes Dec 19, 2024

dschuff added 8 commits December 19, 2024 13:14

review comments

c668e90

tweak test_wasm_features

7fc041b

Use MIN_NODE_VERSION in test-node-compat

47a4908

Merge branch 'main' into always-bulkmem

8523375

remove extra blank lines

22600c1

enable printing all features in test_dwarf

c20eee9

Add node version required for threads and mutable globals

216ea09

Merge branch 'main' into always-bulkmem

679927b

move node environment from job to step

8399076

kripken reviewed Dec 20, 2024

dschuff and others added 3 commits December 20, 2024 14:26

review suggestion

3caad8d

Co-authored-by: Alon Zakai <alonzakai@gmail.com>

Make CI param for extra cflags, so node version can be injected only …

7bbbb67

…for one step

actually fix the node config by moving the env var to the intended place

c3c901b

dschuff added 2 commits January 3, 2025 16:24

Merge branch 'main' into always-bulkmem

4a35ba2

Merge branch 'main' into always-bulkmem

5f62d73

dschuff merged commit 8f63761 into emscripten-core:main Jan 6, 2025
27 of 29 checks passed
