Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix non-graceful server shutdowns and improve signal handling #41

Open
wants to merge 8 commits into
base: main
Choose a base branch
from

Conversation

thegamecracks
Copy link

@thegamecracks thegamecracks commented Aug 14, 2024

This fixes the server not gracefully shutting down when using docker compose down / docker compose restart due to SIGINT being ignored by os.system().

The script will gracefully shutdown with with both SIGINT and SIGTERM. This PR does not modify the Dockerfile, so SIGINT remains the default stop signal.

When a SIGINT or SIGTERM is first received, the script will send SIGINT to the Reforger process and wait for it to exit. The subprocess will be killed if a second interrupt is sent, although in practice this won't matter since SIGKILL is sent when Docker times out.

The script will also pass-through the Reforger process's exit code unless it is killed by the script, in which case it will exit with code 1.

As a side effect of passing an argument list to subprocess.Popen, command injection is no longer possible through the environment variables ARMA_BINARY, ARMA_MAX_FPS, ARMA_PROFILE, or ARMA_WORKSHOP_DIR. The ARMA_PARAMS variable can still be used to pass custom arguments.

shlex.join() is now used when printing the server command, which may result in slightly different outputs.

Context

I had set up a server using this project's image and was planning to run docker compose restart periodically via cron jobs. However, when I tested the command, the logs would not show any indication of it shutting down and it would consistently exceed Docker's default timeout of 10 seconds before being killed. I increased this to 120s with the same results.

Bacon informed me that the script may not be passing down SIGINT to Reforger, so I looked in launch.py for clues and found os.system(). It's documentation didn't mention anything about SIGINT, but I coincidentally came across this in the subprocess docs:

https://docs.python.org/3.9/library/subprocess.html#replacing-os-system
The os.system() function ignores SIGINT and SIGQUIT signals while the command is running ...

I looked at the Dockerfile and saw it used STOPSIGNAL SIGINT, which is how I realized what was causing the issue. I later verified this behaviour with these test scripts, albeit on Python 3.12 and not 3.9 as used by Debian Bullseye.

I've updated my server to use the following image:

docker pull ghcr.io/thegamecracks/arma-reforger@sha256:f87928478a0660d72f1b580a33fb711456dc968b0725431571ea69125f3952e4

From preliminary testing, docker compose restart now gracefully shuts down the server, although it had gotten close to the 10s timeout.

P.S. I could not get ghcr.yaml's test job to work on my fork as it kept hanging on steam-query.py without any output.

https://docs.python.org/3/library/subprocess.html#replacing-os-system
os.system() ignores SIGINT, which is the stop signal set by the Dockerfile.
This fix should immediately kill the Reforger process upon being asked to stop.
This provides a chance for the Reforger process to cleanup.
If an unknown exception or a second interrupt occurs,
ensure the process is killed.
This allows omitting STOPSIGNAL from Dockerfile.
This ensures the command can actually be invoked by the user
in case some of the arguments contain whitespace or other
unsafe characters.
@thegamecracks thegamecracks changed the title Launch with subprocess Fix non-graceful server shutdowns and improve signal handling Aug 14, 2024
The purpose of this is making the program flow more obvious,
since proc.kill() already does nothing on completed processes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant