Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

child_process.execFile may resolve with incomplete output #56430

Open
jcplist opened this issue Jan 2, 2025 · 8 comments
Open

child_process.execFile may resolve with incomplete output #56430

jcplist opened this issue Jan 2, 2025 · 8 comments

Comments

@jcplist
Copy link

jcplist commented Jan 2, 2025

Version

v20.18.0

Platform

Linux .. 6.8.0-49-generic #49~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Wed Nov  6 17:42:15 UTC 2 x86_64 x86_64 x86_64 GNU/Linux

Subsystem

child_process

What steps will reproduce the bug?

script.js:

import cp from 'child_process';
import util from 'util';

const throttle = Number(process.argv[2]);
const execFile = util.promisify(cp.execFile);

let i = 0;
const ps = [];
while (i < throttle) {
  const p = execFile(
    'python3',
    ['-c', 'import time; time.sleep(0.9); print(\'Hello, World!\')'],
    { timeout: 1000 },
  );
  ps.push(p);
  i += 1;
};

ps.forEach(async (p) => {
  try {
    const { stdout, stderr } = await p;
    if (!stdout) {
      console.log(p.child.killed);
    }
  }
  catch (e) {
    console.log(e);
  }
});

In command line, run:

node script.js 3000

How often does it reproduce? Is there a required condition?

The bug is always reproducible on a 32-core computer with low background cpu usage.

What is the expected behavior? Why is that the expected behavior?

Assuming no promise is rejected. Nothing should be logged to the terminal, and the main program should exit with code 0, indicating all subprocesses resolved successfully with complete stdout output.

What do you see instead?

Many subprocesses resolve with an empty stdout, and their child.killed attributes are logged in the terminal. The logged output consists of multiple consecutive true values, followed by multiple consecutive false values.

Additional information

Related Issue

The root cause behind the bug is the same as https://github.com/orgs/nodejs/discussions/47062. The discussion had several mistakes, which results in a different conclusion.

Root Cause

The internal kill function of execFile destroys stdout and stderr streams, leading to a race condition when the following requirement is met:

  • The child process exits with code 0, but some data remains unread by Node.js.

When this occurs, the race condition manifests as one of two scenarios:

  1. The timeout triggers before child._handle.onexit completes:

    • The kill function destroys the IO streams.
    • child._handle.onexit triggers with exitCode 0.
    • The Promise resolves with incomplete stdout, and child.killed is true since it is valid to send a signal to a zombie process.
  2. child._handle.onexit completes before the timeout triggers:

    • child._handle.onexit triggers with exitCode 0, but the close event is not emitted since streams are still open.
    • The timeout triggers, the kill function destroys the IO streams, the child.kill function is effectively a no-op because child._handle is cleared.
    • The Promise resolves with incomplete stdout, and child.killed is false since child.kill function did nothing.
Why I think this is a bug
  1. Unlike the conclusion of the discussion, there is no way in client code to know if the resolved output is complete.
  2. The timeout (kill) behavior of execFile is different from spawn, the latter one does not destroy the output streams, which means that it is possible to have a valid implementation that resolves the promise correctly. (Actually, this is our current workaround.)
@bnoordhuis
Copy link
Member

Nothing should be logged to the terminal, and the main program should exit with code 0, indicating all subprocesses resolved successfully with complete stdout output.

It feels like your unspoken assumption here is:

  • my child processes sleep for 0.9 seconds
  • my timeout is 1 second
  • therefore all my child processes wake up and exit before my timeout expires

That assumption is wrong though: your child processes sleep for at least 0.9 seconds but they can definitely sleep longer; and at that point there's intrinsically a race between waking up and getting killed.

Yes, node stops reading stdio when it receives the "child exited" signal from the operating system. What would you have it do instead? Sit around waiting for output that may never come? Even though the child process exited, its children may still be alive - and keeping the stdio alive. In the limit, the stdout and stderr streams may never see EOF.

Unless you have a specific suggestion how to improve execFile(), I think the conclusion here should be that there's nothing fundamentally wrong with it; it's working as designed. Maybe it's the wrong tool for the job for you but that's why spawn() exists.

@jcplist
Copy link
Author

jcplist commented Jan 3, 2025

That assumption is wrong though: your child processes sleep for at least 0.9 seconds but they can definitely sleep longer; and at that point there's intrinsically a race between waking up and getting killed.

In the case the child process is killed by the signal, we will see the reject path instead.

Yes, node stops reading stdio when it receives the "child exited" signal from the operating system.

This is not correct, each stream increases child._closesNeeded.

What would you have it do instead? Sit around waiting for output that may never come? Even though the child process exited, its children may still be alive - and keeping the stdio alive. In the limit, the stdout and stderr streams may never see EOF.

This is what other parts of node do, spawn does this, execFile without timeout does this. If the timeout behavior of execFile is intentional, I would expect it is well documented.

Unless you have a specific suggestion how to improve execFile(), I think the conclusion here should be that there's nothing fundamentally wrong with it; it's working as designed. Maybe it's the wrong tool for the job for you but that's why spawn() exists.

In my opinion, node should either

  1. Document that there is no guarantee of the resolved stdout and stderr if execFile is used with a timeout.
  2. Don't destroy output streams in execFile's internal kill, or make it an option.
  3. Reject the promise or have an attribute indicating the timeout of execFile is triggered

@bnoordhuis
Copy link
Member

This is what other parts of node do, spawn does this, execFile without timeout does this.

Well, of course. spawn() gives you streams that you can close (or not) at your leisure, exec() and execFile() give you buffers. Apples and oranges.

Document that there is no guarantee of the resolved stdout and stderr if execFile is used with a timeout.

I had a look at child_process.md just now and I don't think it's particularly unclear but if you think it can be improved, go ahead and send a pull request.

Don't destroy output streams in execFile's internal kill, or make it an option.

Backwards incompatible change so DOA, and since you're the first one AFAIK asking for this, and node doesn't add features until there's broad demand, I'd say a new option is unlikely to get accepted.

Reject the promise or have an attribute indicating the timeout of execFile is triggered

It does. It rejects the promise and has .killed set to true and with .signal set to 'SIGTERM' or whatever you set .killSignal to.

Maybe your confusion is caused by child processes catching and handling SIGTERM? Try sending SIGKILL instead.

@jcplist
Copy link
Author

jcplist commented Jan 6, 2025

Reject the promise or have an attribute indicating the timeout of execFile is triggered

It does. It rejects the promise and has .killed set to true and with .signal set to 'SIGTERM' or whatever you set .killSignal to.

Maybe your confusion is caused by child processes catching and handling SIGTERM? Try sending SIGKILL instead.

I'm a bit confused, do you mean that "timeout triggers" is the same as "child process killed by the signal"?

The promise may resolve even if the timeout is triggered, that's why the output may be incomplete.

@bnoordhuis
Copy link
Member

All .timeout does is send a signal of your choosing to the child process. Whether that signal terminates the child depends on the signal and if the child catches it or not.

SIGTERM usually terminates processes but can be caught or ignored.

SIGKILL is special in that it cannot be caught and always terminates the recipient immediately.

But keep in mind the time window between "timeout expires" and "process receives signal" wherein the target process may choose to exit on its own accord.

@jcplist
Copy link
Author

jcplist commented Jan 6, 2025

All .timeout does is send a signal of your choosing to the child process. Whether that signal terminates the child depends on the signal and if the child catches it or not.

SIGTERM usually terminates processes but can be caught or ignored.

SIGKILL is special in that it cannot be caught and always terminates the recipient immediately.

This is not related to the provided example as python does not catch SIGTERM by default.

But keep in mind the time window between "timeout expires" and "process receives signal" wherein the target process may choose to exit on its own accord.

I guess what you mean is:

  1. timeout expires
  2. child process exited
  3. signal sent to child process

, which is similar to the first scenario described in the report. In this case, promise resolves with child.killed = true.

In the second scenario, promise resolves with child.killed = false, and there is no other attribute related to the timeout.

@MiccWan
Copy link

MiccWan commented Jan 7, 2025

@bnoordhuis I believe the issue lies in the following behavior:

Under certain race conditions, the promise can be resolved with promise.child.killed === false, yet stdout is incomplete. This behavior can be consistently reproduced using the example provided above, regardless of whether the signal is SIGTERM or SIGKILL. If you run the author's example, you’ll notice occurrences of false in the logs, which is unexpected.

Why I Think It's a Bug

A user of execFile might:

  1. Use a try-catch block with await on the promise—but the promise resolves.
  2. Check if p.child.killed === true—but it’s false.

Both approaches produce false-negative results. From the user's perspective, there’s no reliable way to determine whether the result is "correct," unless the implementation properly rejects the promise in such cases.

Simplified Reproducible Example:

// Example usage: `node example.js 1000`

import cp from 'child_process';
import util from 'util';

const count = Number(process.argv[2]);
const execFile = util.promisify(cp.execFile);

await Promise.allSettled(
  Array.from({ length: count }, async () => {
    const p = execFile(
      'python3',
      ['-c', 'import time; time.sleep(0.9); print(\'Hello, World!\')'],
      { timeout: 1000, killSignal: 'SIGKILL' },
    );
    const { stdout } = await p;
    if (!stdout && !p.child.killed) {
      console.log(
        'OH NO! The promise resolved, child seems not killed, but stdout is empty!'
      );
    }
  }),
);

This example demonstrates how the promise may resolve successfully while leaving stdout incomplete, despite the process not being marked as killed.

Would you mind taking a closer look? Thank you!

@MiccWan
Copy link

MiccWan commented Jan 7, 2025

The issue also occurs with the non-promisified version of execFile, as demonstrated in the example below:

Example:

// Example usage: `node example.js 1000`

import { execFile } from 'child_process';

const count = Number(process.argv[2]);

for (let i = 0; i < count; i++) {
  const child = execFile(
    'python3',
    ['-c', 'import time; time.sleep(0.9); print(\'Hello, World!\')'],
    { timeout: 1000, killSignal: 'SIGKILL' },
    (err, stdout, stderr) => {
      if (err === null && child.killed === false && stdout === '') {
        console.log('OH NO! No error, child not killed, but stdout is empty!');
      }
    },
  );
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants