How to safely make sure that main() exit stops the loop? #91

sersorrel · 2024-10-15T09:55:55Z

Some context: I use aiorun.run(main(), stop_on_unhandled_errors=True) to make sure that main crashing will exit the program, because any exception in main probably indicates a programmer error (and a service manager can restart the process once it exits).

Unfortunately, main being cancelled (e.g. because something it awaits was cancelled) is not treated as an "unhandled error", even though from my perspective it's unexpected behaviour and I want the process to exit and be restarted, rather than sitting there doing nothing. So, my main looks like this:

async def main():
	try:
		await business_logic()
	finally:
		asyncio.get_running_loop().stop()

in an effort to ensure that, no matter what, if business_logic doesn't work properly, at least the program will exit.

However, this causes a new problem – when I ctrl-C the app, it prints an ugly traceback:

Traceback (most recent call last):
  File "/home/ubuntu/app/.venv/bin/app", line 8, in <module>
    sys.exit(main())
  File "/home/ubuntu/app/src/app/server/__init__.py", line 535, in main
    aiorun.run(async_main(), stop_on_unhandled_errors=True)
  File "/home/ubuntu/app/.venv/lib/python3.10/site-packages/aiorun.py", line 352, in run
    loop.run_until_complete(
  File "/usr/lib/python3.10/asyncio/base_events.py", line 647, in run_until_complete
    raise RuntimeError('Event loop stopped before Future completed.')
RuntimeError: Event loop stopped before Future completed.

I think the flow of events here is something like:

I hit ctrl-C
aiorun's _shutdown_handler runs (with a short diversion via loop.call_soon_threadsafe) and calls loop.stop()
pending tasks are polled by asyncio to see if they can make progress (that is, this iteration of the event loop runs to completion)
the loop actually stops, and aiorun's "shutdown phase" begins
pending tasks are cancelled by aiorun, to signal that they should tidy up and exit
aiorun starts up the loop again, and waits for all the pending (cancelled) tasks
one of those tasks is main, which hits its finally clause and calls loop.stop()!
this iteration of the event loop runs to completion
the loop actually stops, again – but there's a task that didn't complete (perhaps it had more cleanup work to do than main, and would've completed given a few more event loop iterations)
ugly traceback

Is there a way I can solve this? How can main tell whether it's been cancelled because it was waiting on something that got cancelled (i.e. it must stop the loop) vs because aiorun detected a ctrl-C and is running cancelled tasks to completion (in which case aiorun will stop the loop)?

The text was updated successfully, but these errors were encountered:

cjrh · 2024-10-17T08:57:33Z

Thanks for the detailed report. I will have a look in the next few days when I get a chance.

cjrh · 2024-10-17T12:26:24Z

Ok, I had a quick look at this. There are two parts to discuss:

Interpretation of CancelledError as unexpected behaviour.
Calling loop.stop inside main

Let's deal with them in turn.

Interpretation of `CancelledError` as unexpected behaviour

So, unfortunately, the design of asyncio in Python regards CancelledError as a kind of signal, not an error. This is similar to how StopIteration is a signal, not an error. This library, and much of the ecosystem treats CancelledError this way and it is difficult to change.

Therefore, when CancelledError is found we can only ever understand it to be the result of an intentional act, not an error. The fundamental parts of aiorun assume this, and it is now difficult to change. There are hacks I could incorporate to, e.g., set a flag somewhere if shutdown has been initiated by a signal versus some other way, and somehow expose this, but I think we are all better off treating cancellation as intentional. So, if something in your system is causing spurious cancellations, I think you should make it stop doing that rather than try to catch it.

For your purposes, you can at least catch the CancelledError in your main function, and if you like, reraise it as a different exception:

def test_stop_also_on_cancellation(caplog):

    async def i_get_cancelled():
        # Force the loop to actually spin, prevent testing against any special
        # optimizations
        await asyncio.sleep(5.0)
        raise asyncio.CancelledError()

    async def main():
        try:
            await i_get_cancelled()
        except asyncio.CancelledError as e:
            raise Exception("Unexpected cancellation received") from e

    run(main(), stop_on_unhandled_errors=True)

This is a test case I wrote to investigate this issue. You can catch the CancelledError in main and raise a different exception, or do something else entirely. For your scenario, this will definitely stop main, and shutdown, if stop_on_unhandled_errors is active. Hopefully this is sufficient?

However, there remains no way to tell the difference between whether the caught CancelledError is due to the shutdown process itself (like CTRL-C say), or due to some other unidentified cancellation happening somewhere else in the system.

I could do something like the following:

Set a an attribute on the loop instance inside the signal handler, like loop._aiorun_signal,
and then you could check for this inside the exception handler.

Example:

    async def main():
        try:
            await i_get_cancelled()
        except asyncio.CancelledError as e:
            loop = asyncio.get_running_loop()
            if not getattr(loop, '_aiorun_signal', False):
                raise Exception("Unexpected cancellation received") from e

This is very messy and I don't want to include this in aiorun.

However, you can do what you want in your own code, and if this works for you, you could provide your own shutdown_handler callback that can add this annotation to the loop instance, and then you can check for it yourself in your main.

Your shutdown_handler might look something like this, which is the same one inside aiorun, but with the extra attribute set:

def _shutdown_handler(loop):
    logger.debug("Entering shutdown handler")
    loop = loop or get_event_loop()

    logger.warning("Stopping the loop")
    loop._aiorun_signal = True
    loop.stop()

In any event, if you discover some part of your system that is causing spurious cancellations that are bubbling up within your tasks, you should try to fix that behaviour regardless.

Calling `loop.stop` inside `main`

Your issue has highlighted to me that loop.stop() must not be called inside main once the shutdown handling in aiorun has begun. Thank you for highlighting this :)

As you correctly identified, once the loop has already been stopped, the remainder of the cancellation+cleanup work takes place inside a run_until_complete call, which will raise if another loop.stop is encountered. In the README, I mention that a user can call loop.stop inside main. However, I should probably add a note about this specific problematic scenario in the documentation. It happens, again as you described, if the loop.stop is called inside the finally section.

async def main():
  try:
    await something()
  finally:
    asyncio.get_running_loop().stop()

When CTRL-C is pressed, CancelledError is raised at the current await point inside every Task nested hierarchy of coroutines, and in the snippet above, this happens at await something. Then, the loop is started up again with run_until_complete, which raises when the stop is called above. This causes the messy traceback.

At the moment, I am satisfied with simply adding documentation for this I think.

sersorrel · 2024-10-17T12:54:56Z

So, if something in your system is causing spurious cancellations, I think you should make it stop doing that rather than try to catch it.

Agreed! (I need to catch it once, in order to find where the cancellation actually came from, but then I absolutely intend to make it Not Do That.)

you could provide your own shutdown_handler callback

Ahh, yes, that makes total sense and gives me exactly the data I need – I can keep track of "we are shutting down now" on the loop or elsewhere. I agree that this makes more sense than asking aiorun to expose that as loop._aiorun_signal or something.

At the moment, I am satisfied with simply adding documentation for this I think.

Sounds fine to me :) thank you very much for taking the time to look at this!

cjrh added the docs label Jan 18, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

How to safely make sure that main() exit stops the loop? #91

How to safely make sure that main() exit stops the loop? #91

sersorrel commented Oct 15, 2024

cjrh commented Oct 17, 2024

cjrh commented Oct 17, 2024 •

edited

Loading

sersorrel commented Oct 17, 2024

How to safely make sure that main() exit stops the loop? #91

How to safely make sure that main() exit stops the loop? #91

Comments

sersorrel commented Oct 15, 2024

cjrh commented Oct 17, 2024

cjrh commented Oct 17, 2024 • edited Loading

Interpretation of CancelledError as unexpected behaviour

Calling loop.stop inside main

sersorrel commented Oct 17, 2024

cjrh commented Oct 17, 2024 •

edited

Loading

Interpretation of `CancelledError` as unexpected behaviour

Calling `loop.stop` inside `main`