Enable bulk-memory by default #22873

Merged
merged 37 commits into from
Jan 6, 2025

dschuff
Member

@dschuff dschuff commented Nov 6, 2024

  • Remove libbulkmemory _emscripten_memcpy_js and fold memcpy and memset into libc
    • Use bulk memcpy/memset for Oz builds, but keep ASan behavior the same.
  • Move the zero-length check in memcpy from C into assembly, and add one for memset
  • Remove the use of -mno-bulk-memory at compile time (enabling it in object files)
  • Temporarily set the Safari version required to use bulk memory to 14.1 (which has the effect of enabling it by default without enabling the other 14.1 features by default). This will be reverted when nontrapping-fptoint and bigint are also enabled by default.
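
One plausible reason for the zero-length checks (an assumption; the description above does not state it): per the WebAssembly spec, `memory.copy` and `memory.fill` bounds-check their operands before the length is considered, so a zero-length operation at an out-of-bounds address traps, while callers of `memcpy`/`memset` generally expect a zero-length call to do nothing. A minimal Python model of that behavior, using hypothetical names (`Memory`, `guarded_memcpy`) that are not part of Emscripten:

```python
class Trap(Exception):
    """Stands in for a WebAssembly trap."""


class Memory:
    def __init__(self, size):
        self.data = bytearray(size)

    def copy(self, dst, src, n):
        # Models memory.copy: the bounds check runs even when n == 0.
        if src + n > len(self.data) or dst + n > len(self.data):
            raise Trap("out of bounds memory access")
        # Slicing copies the source first, so overlapping ranges are safe.
        self.data[dst:dst + n] = self.data[src:src + n]


def guarded_memcpy(mem, dst, src, n):
    # Hypothetical wrapper: skip the bulk instruction for empty copies,
    # so an out-of-range pointer with n == 0 does not trap.
    if n == 0:
        return dst
    mem.copy(dst, src, n)
    return dst
```

In this model, `guarded_memcpy(mem, 100, 0, 0)` on a 16-byte memory returns normally, while the raw `mem.copy(100, 0, 0)` raises `Trap`.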

See #23184

Collaborator

@sbc100 sbc100 left a comment

Nice! So awesome to see this land!

@dschuff
Member Author

dschuff commented Nov 7, 2024

@sbc100 @tlively @aheejin @kripken for opinions:
The lowering pass currently refuses to run on a module that has atomics enabled, because we shouldn't be lowering away copy/fill if atomics are being used since we need passive segments anyway. The current logic runs the lowering pass any time the link command line doesn't ask for a new enough browser version. That combination causes any test that links a library built with pthreads to fail. For example, libfreetype is always built with -pthread (I'm not sure why), so if you link a non-pthreads binary with libfreetype, the lowering pass will error out.

Previous to this PR I think the result of linking any atomics-using library into a non-atomic output is that it will silently create a binary that uses atomics even though it wasn't asked for. Is that the behavior we want? If so, I guess the easiest thing would be just to make the lowering pass not do anything if atomics are enabled. Another would be to make the behavior stricter so that you can't link atomics-using code into a non-atomics-using output (to avoid the surprise behavior), but that could possibly be annoying. A third option would be to make it explicit inside emcc and not run the lowering pass if atomics are enabled.

I like the idea of the 2nd option, but it would require fixing the libraries that are compiled with threads (e.g. would we need a freetype-mt? Maybe not; I don't actually see any atomic instructions inside libfreetype). And it could break users who are doing this linking (the fix would be to explicitly enable atomics or threads at link time; would that have bad consequences? I guess they'd have more stuff linked into their binary?).

edit: I should clarify that the case that would fail here with the current logic is when the build targets Safari 14.1 (the current default, but would be done manually in the future). The default behavior going forward would actually be the same as it is now, except that bulk memory would be enabled (i.e. the lowering pass would not be run, and linking an atomics-using library would cause the resulting binary to have atomics).

More generally speaking, our feature-enabling code in emscripten is a little inconsistent. For example, with the current logic, enabling WASM_BIGINT will automatically cause bulk memory and nontrapping-fp to be enabled, because it implicitly causes Safari 15 to be selected, and then that selection determines whether the lowering passes run. I find that a little surprising, and there currently isn't a way to override it other than selecting a different browser version manually. Also, the -mbulk-memory et al. feature flags don't work at link time to select features at a fine grain; only the browser versions work. That seems kind of bad to me, but might also be a pain to fix, for not much benefit.
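
The coupling described above can be sketched as a small model. This is not the code in `tools/feature_matrix.py`; the table, the function names, and the version assigned to nontrapping-fptoint are assumptions chosen to match the comment (bulk memory at Safari 14.1, bigint at Safari 15).

```python
# Hypothetical feature table: minimum Safari version (major, minor) per feature.
MIN_SAFARI = {
    "bulk-memory": (14, 1),
    "nontrapping-fptoint": (15, 0),  # assumed for illustration
    "bigint": (15, 0),
}

# Features that have a lowering pass in this model.
LOWERABLE = ("bulk-memory", "nontrapping-fptoint")


def enable_feature(min_safari, feature):
    """Enabling a feature can only raise the minimum browser version."""
    return max(min_safari, MIN_SAFARI[feature])


def features_for(min_safari):
    """Every feature the selected browser supports counts as enabled."""
    return {f for f, v in MIN_SAFARI.items() if min_safari >= v}


def lowering_passes(min_safari):
    """Lowering runs for each lowerable feature the target lacks."""
    enabled = features_for(min_safari)
    return sorted(f for f in LOWERABLE if f not in enabled)
```

Starting from `(14, 0)`, both lowering passes run. After `enable_feature((14, 0), "bigint")` the version becomes `(15, 0)` and neither pass runs, even though only bigint was requested. That is the surprising side effect the comment describes.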

@sbc100
Collaborator

sbc100 commented Nov 7, 2024

The lowering pass currently refuses to run on a module that has atomics enabled, because we shouldn't be lowering away copy/fill if atomics are being used since we need passive segments anyway.

Why not just allow the pass to run in this case? The pass relates to memory.copy and memory.fill only, right? The passive segments and memory.init would be unaffected, no?

@sbc100
Collaborator

sbc100 commented Nov 7, 2024

Previous to this PR I think the result of linking any atomics-using library into a non-atomic output is that it will silently create a binary that uses atomics even though it wasn't asked for. Is that the behavior we want?

That's a good question. I can't remember the conclusion, but I do remember that the spec was updated such that browsers would allow atomic instructions even in single-threaded builds, so maybe this was why we did that. But if we want to support older browsers we would still want to be able to lower those atomics away at link time, I think. How many such libraries do we have? It's certainly nice to be able to build libraries just once and avoid the -mt variants if we can.

@dschuff
Member Author

dschuff commented Nov 7, 2024

I can't remember the conclusion, but I do remember that the spec was updated such that browsers would allow atomic instructions even in single-threaded builds, so maybe this was why we did that. But if we want to support older browsers we would still want to be able to lower those atomics away at link time, I think. How many such libraries do we have? It's certainly nice to be able to build libraries just once and avoid the -mt variants if we can.

Ah, right, I forgot about allowing atomics in single-threaded builds. That makes sense then, that the default behavior for the default targets (which support bulk and atomics) would be to just allow atomics to pass through anytime. So for older browsers we currently do not support lowering away atomics at link time (only at compile time). Our current default target (Safari 14.1) actually does support atomics, but I guess we don't support targeting even older browsers while linking in atomic-using libraries. There aren't many emscripten builtin library variants built this way (freetype might be the only one actually), so maybe this isn't a big problem. If we want to lean further into this direction, we could potentially even remove some '-mt' variants and just always link atomic versions.

Why not just allow the pass to run in this case? The pass relates to memory.copy and memory.fill only, right? The passive segments and memory.init would be unaffected, no?

We could do it, it would just be a pessimization for no reason; there are no targets AFAIK that support passive segments but not memory.copy/fill.

@dschuff
Member Author

dschuff commented Nov 7, 2024

After some discussion with @sbc100 we might do the following:

  1. Keep the atomics passthrough linking behavior as-is. The consequence will (continue to) be that users targeting browsers older than Safari 14.1 will fail to load their module if they link libraries with atomics. So far this hasn't been a problem.
  2. Make the lowering pass not do anything (or just run as normal) if atomics are enabled. This would make it harmless to run the pass when there are atomics (even if it's not useful).
  3. Make the lowering pass error out if there are any other uses of bulk memory, e.g. passive segments or table operations. The reason is that it would be incorrect to remove the bulk-memory feature if there are still uses of bulk memory (and it would definitely be a bug for someone to run this pass on a module with other bulk memory usage).
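
The plan above can be sketched as a small decision function. This is a hypothetical model, not Binaryen's implementation: `ModuleInfo` and `plan_lowering` are invented names, and the sketch assumes the "do nothing" variant of point 2 with the atomics check taking precedence over point 3.

```python
from dataclasses import dataclass


@dataclass
class ModuleInfo:
    # Hypothetical summary of what a module uses.
    has_atomics: bool = False
    has_passive_segments: bool = False
    has_table_bulk_ops: bool = False


def plan_lowering(mod):
    """Return "skip" or "lower", or raise if lowering would be incorrect."""
    if mod.has_atomics:
        # Point 2: harmless no-op when atomics are enabled.
        return "skip"
    if mod.has_passive_segments or mod.has_table_bulk_ops:
        # Point 3: removing the feature would leave other bulk-memory uses behind.
        raise ValueError("module has bulk-memory uses beyond memory.copy/fill")
    # Only memory.copy/memory.fill can remain, so they are safe to lower.
    return "lower"
```

Point 1 needs no code in this model: atomics simply pass through at link time, so a module with `has_atomics=True` is left untouched.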

@dschuff
Member Author

dschuff commented Nov 8, 2024

OK, I think the bulk memory part of this patch is ready; the tests are working and the comments so far are addressed.

As written, this PR does not update the default features; it only builds with bulk memory and turns on the lowering. This means the lowering will run by default. This isn't a state we want to release with, but it means we can land the changes for nontrapping-fp and updating the default separately. The other option is to add to this PR to enable the nontrapping-fp lowering (once we have test_sse fixed), and then update the defaults.
I maybe lean toward landing separately, but don't have a strong opinion.

@@ -156,6 +163,8 @@ def apply_min_browser_versions():
  if settings.PTHREADS:
    enable_feature(Feature.THREADS, 'pthreads')
    enable_feature(Feature.BULK_MEMORY, 'pthreads')
  if settings.WASM_WORKERS or settings.SHARED_MEMORY:
Collaborator

Maybe else here? Otherwise the feature will be added twice in pthread mode (which also enables SHARED_MEMORY).

Member Author

Done. Although I don't think it actually matters, it should be safe to enable it more than once.
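
Enabling a feature more than once is harmless as long as enabling only ever raises minimum browser versions. A minimal sketch of that property follows; this `enable_feature` is a hypothetical stand-in with an invented signature, not the function in `tools/feature_matrix.py`, and the version integers are illustrative.

```python
def enable_feature(min_versions, required, reason, reasons):
    """Raise each browser's minimum version to what the feature requires.

    min_versions and reasons are mutated in place. Versions only move up,
    so repeating a call leaves both dictionaries unchanged.
    """
    for browser, version in required.items():
        if version > min_versions.get(browser, 0):
            min_versions[browser] = version
            reasons[browser] = reason


# Illustrative requirement table for one feature.
BULK_MEMORY = {"safari": 140100, "node": 120500}
```

Calling `enable_feature(mins, BULK_MEMORY, 'pthreads', why)` and then `enable_feature(mins, BULK_MEMORY, 'shared memory', why)` leaves `mins` and `why` exactly as they were after the first call.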

Collaborator

@sbc100 sbc100 left a comment

lgtm!

sbc100 added a commit that referenced this pull request Dec 10, 2024
This should fix the recent CI failures on the test-node-compat bot. Once
#22873 lands this can, of course, be removed.
hedwigz pushed a commit to hedwigz/emscripten that referenced this pull request Dec 18, 2024
This should fix the recent CI failures on the test-node-compat bot. Once
emscripten-core#22873 lands this can, of course, be removed.
@dschuff dschuff changed the title from "Always use bulk memory at compile time" to "Enable bulk-memory by default" on Dec 19, 2024
@@ -8,21 +8,25 @@
#include "libc.h"
#include "emscripten_internal.h"

#if !defined(__wasm_bulk_memory__)
#error "This file must be compile with bulk memory enabled"
#endif
Collaborator

I think we can just drop these 3 lines. I'm not sure what they achieve.

Member Author

Ah yes, they were more useful when I had inline asm in this file.

@dschuff
Member Author

dschuff commented Dec 20, 2024

OK, I think this is actually completely working now. And I just realized @kripken doesn't have any comments so +cc just in case.

Member

@kripken kripken left a comment

lgtm otherwise

@@ -15,6 +15,7 @@ function e(b) {
for (var v, p = 0, t = a, w = f.length, y = a + (3 * w >> 2) - ("=" == f[w - 2]) - ("=" == f[w - 1]); p < w; p += 4) a = m[f.charCodeAt(p + 1)],
v = m[f.charCodeAt(p + 2)], c[t++] = m[f.charCodeAt(p)] << 2 | a >> 4, t < y && (c[t++] = a << 4 | v >> 2),
t < y && (c[t++] = v << 6 | m[f.charCodeAt(p + 3)]);
return c;
Member

Any idea what this is? Looks odd.

Member Author

No, that does look odd. Hard to know why without knowing which function this actually is.

Member Author

It's plausible that this is some function that in some way uses bulk memory.

Member

Hmm, based on the charCodeAt and << 6 etc., this looks like base64Decode:

function base64Decode(b64) {
#if ENVIRONMENT_MAY_BE_NODE
  if (typeof ENVIRONMENT_IS_NODE != 'undefined' && ENVIRONMENT_IS_NODE) {
    var buf = Buffer.from(b64, 'base64');
    return new Uint8Array(buf.buffer, buf.byteOffset, buf.byteLength);
  }
#endif
#if ASSERTIONS
  assert(b64.length % 4 == 0);
#endif
  var b1, b2, i = 0, j = 0, bLength = b64.length, output = new Uint8Array((bLength*3>>2) - (b64[bLength-2] == '=') - (b64[bLength-1] == '='));
  for (; i < bLength; i += 4, j += 3) {
    b1 = base64ReverseLookup[b64.charCodeAt(i+1)];
    b2 = base64ReverseLookup[b64.charCodeAt(i+2)];
    output[j] = base64ReverseLookup[b64.charCodeAt(i)] << 2 | b1 >> 4;
    output[j+1] = b1 << 4 | b2 >> 2;
    output[j+2] = b2 << 6 | base64ReverseLookup[b64.charCodeAt(i+3)];
  }
  return output;
}

That doesn't use bulk memory AFAICT. Strange that it changed.

Member

Anyhow, it must be a weird closure quirk, as x() is called once and the return value isn't used.

@dschuff
Member Author

dschuff commented Dec 21, 2024

I did find one other issue that I won't get fixed today, so documenting it here.
The current Binaryen lowering pass only removes memory.copy/fill but it doesn't remove the datacount section. I had thought this was OK because we don't include passive segments unless we are building with shared memory. But I didn't realize that the linker includes the datacount section unconditionally (when bulk-memory is enabled), which causes old Node versions to reject the modules. There are a couple of possible options to fix this.

  1. Make the linker not include a datacount section unless it also includes passive segments. It's valid to include when bulk-memory is enabled, but it's not needed unless there are passive segments. In theory an engine could use the datacount section to make data section handling more memory-efficient or something, and it's only 5 bytes, but that's (probably?) not a big benefit.
  2. Make the Binaryen lowering also remove the datacount section. This isn't necessarily valid in general (assuming the pass doesn't also remove passive segments) but would be for the Emscripten use case.

Either would be straightforward to implement, I haven't thought about which yet.
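
For illustration, here is what stripping the datacount section amounts to at the binary level. This is a standalone sketch, not the Binaryen pass or the linker change; `strip_datacount` and `read_u32_leb` are hypothetical helpers. It relies only on the WebAssembly binary format: an 8-byte header followed by sections, each framed as a one-byte id plus a LEB128 size, with the DataCount section using id 12.

```python
DATACOUNT_SECTION_ID = 12


def read_u32_leb(data, pos):
    """Decode an unsigned LEB128 integer; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def strip_datacount(wasm):
    """Return a copy of the module bytes without any DataCount section."""
    if wasm[:4] != b"\0asm":
        raise ValueError("not a WebAssembly binary")
    out = bytearray(wasm[:8])  # magic number and version
    pos = 8
    while pos < len(wasm):
        start = pos
        section_id = wasm[pos]
        size, payload_start = read_u32_leb(wasm, pos + 1)
        end = payload_start + size
        if section_id != DATACOUNT_SECTION_ID:
            out += wasm[start:end]
        pos = end
    return bytes(out)
```

As the comment notes, this is only safe when nothing in the module depends on the section; the spec requires it whenever `memory.init` or `data.drop` appear in the code.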

@sbc100
Collaborator

sbc100 commented Dec 22, 2024

2. Make the Binaryen lowering also remove the datacount section. This isn't necessarily valid in general (assuming the pass doesn't also remove passive segments) but would be for the Emscripten use case.

I think this makes the most sense. The lowering pass is really only designed to handle LLVM output, and I think if there are any bulk-memory usages outside of bulk-memory-core (such as active segments) then it's OK for the Binaryen pass to simply error out. This means it should also be fine to strip the data count section.

Actually I think either approach is fine but that binaryen change will be simpler.

@dschuff
Member Author

dschuff commented Jan 6, 2025

I'm pretty sure test_offset_converter wasm64_4g is a flake; I'm going to land this.

@dschuff dschuff merged commit 8f63761 into emscripten-core:main Jan 6, 2025
27 of 29 checks passed
@sbc100
Collaborator

sbc100 commented Jan 6, 2025

Yay!

@sbc100
Collaborator

sbc100 commented Jan 6, 2025

Did we forget to update the Changelog for this? I wonder if we should just combine all 3 of the new features into a single changelog entry for 4.0?

@dschuff
Member Author

dschuff commented Jan 6, 2025

Ah yes we did. I can roll that into #23312

@sbc100
Collaborator

sbc100 commented Jan 6, 2025

It looks like we still have this in emcc.py:

    if '-mbulk-memory' not in user_args:                                                             
      flags.append('-mbulk-memory')   
