-
-
Notifications
You must be signed in to change notification settings - Fork 539
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: Add globalPreload to ts-node/esm for node 20 #2009
base: main
Are you sure you want to change the base?
Conversation
Not sure what kind of test should be added for this, since it just returns a string. Just comparing against a fixture feels kind of redundant? |
Hm, this definitely needs a bit more work, because it breaks on node versions that don't have the off-thread loader hooks. (Ie, everything before v20.) I'm not sure the best idiomatic approach to that in this project, but I'm not sure how to detect the situation where it's required, other than sniffing the version. |
If it helps, in node 19 vs node 20, this file: export function globalPreload() {
console.log('globalPreload');
return '';
}
export function getGlobalPreloadCode() {
console.log('getGlobalPreloadCode');
return '';
} logs, in node 19:
and in node 20:
… so, you could probably intercept console.log at module level, catch the experimental warning (and suppress it), and use it to indicate whether you were in 20+ or not? |
@ljharb ooh sneaky, that might work. I'll look into that. |
Would this allow Eagerly awaiting the fix on this so we can support development on Node 20 🙏 |
Historically I use version sniffing for this stuff, so I'd go with that. It's happened several times before that ts-node needs to implement multiple behaviors depending on the version of node or ts. Something to keep in mind: In typechecking mode, is the typechecking work being repeated on both threads? Keeping in mind that typechecking one file involves parsing the others, so CJS files typecheck with type info from ESM files and vice-versa. Repeated typechecking work is not a dealbreaker, but it's something we should at least document in this thread. |
Codecov Report
... and 3 files with indirect coverage changes 📢 Have feedback on the report? Share it here. |
Added a commit to @cspotcode As far as I can tell, with this change, I haven't looked into type checking, but my guess is, if |
99a5b73
to
6b9daf0
Compare
Got some time to dig into this. The issue with |
Aha, of course. The
The only way this can work on node 20 is for |
Ok, got Verified that typechecking does not get run twice in the presence of an off-thread loader, which in hindsight makes sense, since either the source is being loaded and transpiled once in the loader thread, or once in the main (only) thread, but never both. The whole point is that the @cspotcode PTAL when you get a chance :) |
Thanks, I haven't had a chance to take a look yet, but WRT the double-typechecking: |
I've been trying to throw some complicated scenarios at it, but I'm not sure how to go about triggering the situation you're thinking of here. The userland program isn't ever loaded or typechecked by TS in the loader thread. So if there's double-typechecking happening, it seems to me that it'd be either (a) the loading of ts files from within ts-node itself (or its deps), or (b) already a problem without a loader-thread (ie, if double-checking is happening in a single-threaded loader environment). Maybe there's something I'm missing? But it seems like this can't possibly make the problem significantly worse. If you have a test or example I can poke at, I'm happy to try to dig in further. |
Er, rather, it's not executed on the loader thread. Obviously the code is loaded in the loader thread. But the typechecking happens when it compiles it to JS, and that only happens in one place. What actually ends up in the main thread is JavaScript going straight to the node VM. |
If double-typecheck is happening, the only visible side-effect would be higher CPU usage. So it's not a problem, per-se, it just means potentially a performance regression on node 20. If code on the main thread does The scenario I imagine is:
Where The loader thread's Then the main thread does |
The scenario above could also happen if |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, thanks. I wanted to give a single, complete review after testing this locally, but I've been busy. These are the notes I have so far.
- We should move as much of the logic into
child-loader.ts
andesm.ts
, out of the.mjs
files which are meant to be shims as thin as possible. globalPreload
should be part of the hooks returned by ourcreateEsmHooks
API
esm.mjs
Outdated
|
||
// Affordance for node 20, where load() happens in an isolated thread | ||
const offThreadLoader = versionGteLt(process.versions.node, '20.0.0'); | ||
export const globalPreload = () => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this hook be included in the return value of https://typestrong.org/ts-node/api/index.html#createEsmHooks?
That API is meant for anyone wanting to wrap our loader in their own logic. They'd do --loader my-loader.mjs
with their my-loader.mjs
exporting all the functions they get from calling createEsmHooks()
, perhaps wrapping them to implement new behavior.
This API predates support for multiple loaders, but also it should be possible to compose loaders in code so that end-users don't have to pass --loader
twice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Everything is just as composable as it was, with the exception that globalPreload
is a bit weird to stack. (You have to append all the strings together wrapped in IIFE's so they don't clobber each others' vars.)
But multiple loaders have been supported longer than globalPreload
was needed, so that might not matter.
child-loader.mjs
Outdated
const require = createRequire(fileURLToPath(import.meta.url)); | ||
|
||
// TODO why use require() here? I think we can just `import` | ||
/** @type {import('./dist/child-loader')} */ | ||
const childLoader = require('./dist/child/child-loader'); | ||
export const { resolve, load, getFormat, transformSource } = childLoader; | ||
|
||
// On node v20, we cannot lateBind the hooks from outside the loader thread | ||
// so it has to be done here. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Technically, can we lateBind by sending config through the MessagePort?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that's effectively what's happening here. It's calling bootstrap to late-bind the hooks with the state passed in on the loader URL.
I suppose it could get that information via the message port, but if we have the config we need on the URL, I'm not sure what that gets us?
src/child/spawn-child.ts
Outdated
const child = spawn( | ||
process.execPath, | ||
[ | ||
'--require', | ||
require.resolve('./child-require.js'), | ||
'--loader', | ||
// Node on Windows doesn't like `c:\` absolute paths here; must be `file:///c:/` | ||
pathToFileURL(require.resolve('../../child-loader.mjs')).toString(), | ||
loaderURL.toString(), | ||
require.resolve('./child-entrypoint.js'), | ||
`${argPrefix}${compress(state)}`, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note to self: can we include state only once to reduce the risk we hit CommandLine
length limit on Windows?
Consider how this will look with forthcoming register()
API. If register()
will allow sending the state payload in-memory, akin to current lateBind
logic, then consider using a pattern that looks similar today.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah good call, I guess child-entrypoint should just pull it from the loader URL if it needs it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, so the tricky bit here is that the child-entrypoint is modifying and re-stashing the value in the state's argv for child processes, so storing it only in the loader url execArgv involves a bit more refactoring. Can definitely be done, but I'd recommend putting it off for a second commit (if not second PR) just to ensure edge cases are handled properly.
Thinking through the double-checking scenario, I think you're right, but I don't think there's much to be done about it. The good news is, once source-returning commonjs loaders land in node, and ts-node starts using that instead, then the problem goes away (along with several others!), since Cleaned up where the |
The ideal until node supports CJS-via-loaders is that we use the message port to delegate all compilation into the loader thread. But that requires more work and it's not necessary to get this merged. |
I was actually going to suggest something similar, having been in this code a little bit now. Using the Then at least on node versions from 14.17 up, you could use the same approach for all loaders, whether they're running in a separate thread or not. It does add a tiny bit of unnecessary serialization overhead if loaders are already on the main thread and can just call the function directly, but not much. If the intermediate child process spawned by |
Does
I'm not too worried about that. We should use the same approach as with the
When will the port be absent? My understanding is that it's always present w/off-thread loaders. Are there any non-EOLed node versions that don't have it?
This seems not a big deal. Main and worker threads are making a blocking call to |
It is always present with off-thread loaders. But it seems I was completely mistaken about this, and while diagnostics_channel is synchronous, it doesn't (as yet) cross over to the loader thread, so you do still need to proxy through the globalPreload context.port, at least until we get |
Technically speaking, it's not present on 16.0 through 16.11, which doesn't EOL until September 2023, but in those cases we don't bother to use a globalPreload, so it's not an issue. |
It is synchronous: nodejs/node#46826 |
He is right, |
For the sake of anyone else reading along: A combination of The goal is to make blocking calls from the main thread to the loader thread, using So it is possible today for Sounds like the question is whether bootstrapping must be async. Do we have to |
At least according to the docs in the PR, |
Updated to remove line number source map tests for repl and eval. (If you'd rather take on the refactoring to try to get it to work, that's fine, but imo that should be a separate PR at least.) Added
Would be good to add some tests for |
bb2e9f1
to
be3b34f
Compare
This removes support for keeping import assertions, which were broken in swc at some point, and unconditionally transpiled into import attributes. (Ie, `import/with` instead of `import/assert`.) No version of node supports import attributes with this syntax yet, so anyone using swc to import json in ESM is out of luck no matter what. And swc 1.3.83 broke the option that ts-node was using. The position of the swc project is that experimental features are not supported, and may change in patch versions without warning, making them unsafe to rely on (as evidenced here, and the reason why this behavior changed unexpectedly in the first place). Better to just not use experimental swc features, and let it remove import assertions rather than transpile them into something that node can't run. Fix: TypeStrong#2056
As of node v20, loader hooks are executed in a separate isolated thread environment. As a result, they are unable to register the `require.extensions` hooks in a way that would (in other node versions) make both CJS and ESM work as expected. By adding a `globalPreload` method, which *does* execute in the main script environment (but with very limited capabilities), these hooks can be attached properly, and `--loader=ts-node/esm` will once again make both cjs and esm typescript programs work properly.
When running `ts-node --esm`, a child process is spawned with the `child-loader.mjs` loader, `dist/child/child-entrypoint.js` main, and `argv[2]` set to the base64 encoded compressed configuration payload. `child-loader.mjs` imports and re-exports the functions defined in `src/child/child-loader.ts`. These are initially set to empty loader hooks which call the next hook in line until they are defined by calling `lateBindHooks()`. `child-entrypoint.ts` reads the config payload from argv, and bootstraps the registration process, which then calls `lateBindHooks()`. Presumably, the reason for this hand-off is because `--loader` hooks do not have access to `process.argv`. Unfortunately, in node 20, they don't have access to anything else, either, so calling `lateBindHooks` is effectively a no-op; the `child-loader.ts` where the hooks end up getting bound is not the same one that is being used as the actual loader. To solve this, the following changes are added: 1. An `isLoaderThread` flag is added to the BootstrapState. If this flag is set, then no further processing is performed beyond binding the loader hooks. 2. `callInChild` adds the config payload to _both_ the argv and the loader URL as a query param. 3. In the `child-loader.mjs` loader, only on node v20 and higher, the config payload is read from `import.meta.url`, and `bootstrap` is called, setting the `isLoaderThread` flag. I'm not super enthusiastic about this implementation. It definitely feels like there's a refactoring opportunity to clean it up, as it adds some copypasta between child-entrypoint.ts and child-loader.mjs. A further improvement would be to remove the late-binding handoff complexity entirely, and _always_ pass the config payload on the loader URL rather than on process.argv.
When an error is thrown in the loader thread, it must be passed through the comms channel to be printed in the main thread. Node has some heuristics to try to reconstitute errors properly, but they don't function very well if the error has a custom inspect method, or properties that are not compatible with JSON.stringify, so the TSErrors raised by the source transforms don't get printed in any sort of useful way. This catches those errors, and creates a new error that can go through the comms channel intact. Another possible approach would be to update the shape of the errors raised by source transforms, but that would be a much more extensive change with further reaching consequences.
Set the default `options.pretty` value based on stderr rather than stdout, as this is where errors are printed. The loader thread does not get a process.stderr.isTTY set, because its "stderr" is actually a pipe. If `options.pretty` is not set explicitly, the GlobalPreload's `context.port` is used to send a message from the main thread indicating the state of stderr.isTTY. Adds `Service.setPrettyErrors` method to enable setting this value when needed.
The @cspotcode/source-map-support module does not function properly on Node 20, resulting in incorrect stack traces. Fortunately, the built-in source map support in Node is now quite reliable. This does have the following (somewhat subtle) changes to error output: - When a call site is in a method defined within a constructor function, it picks up the function name *as well as* the type name and method name. So, in tests where a method is called and throws within the a function constructor, we see `Foo.Foo.bar` instead of `Foo.bar`. - Call sites displays show filenames instead of file URLs. - The call site display puts the `^` character under the `throw` rather than the construction point of the error object. This is closer to how normal un-transpiled JavaScript behaves, and thus somewhat preferrable, but isn't possible when all we have to go on is the Error stack property, so it is a change. I haven't been able to figure out why exactly, but the call sites appear to be somewhat different in the repl/eval contexts as a result of this change. It almost seems like the @cspotcode/source-map-support was applying source maps to the vm-evaluated scripts, but I don't see how that could be, and in fact, there's a comment in the code stating that that *isn't* the case. But the line number showing up in an Error.stack property is `1` prior to this change (matching the location in the TS source) and is `2` afterwards (matching the location in the compiled JS). An argument could be made that specific line numbers are a bit meaningless in a REPL anyway, and the best approach is to just make those tests accept either result. One possible approach to provide built-in source map support for the repl would be to refactor the `appendCompileAndEvalInput` to diff and append the *input* TS, and compile within the `runInContext` method. If the transpiled code was prepended with `process.setSourceMapsEnabled(true);`, then Error stacks and call sites would be properly source mapped by Node internally.
This also adds a type for the loader hooks API v3, as globalPreload is scheduled for removal in node v21, at which point '--loader ts-node/esm' will no longer work, and '--import ts-node/import' will be the way forward.
8f1f73f
to
3fd7b4f
Compare
3fd7b4f
to
b614b1b
Compare
…g it in here to quiet the CI on other PRs
…g it in here to quiet the CI on other PRs
This fix (or at least |
As of node v20, loader hooks are executed in a separate isolated thread environment. As a result, they are unable to register the
require.extensions
hooks in a way that would (in other node versions) make both CJS and ESM work as expected.By adding a
globalPreload
method, which does execute in the main script environment (but with very limited capabilities), these hooks can be attached properly, and--loader=ts-node/esm
will once again make both cjs and esm typescript programs work properly.