Continuing the discussion from StackOverflow #2
I don't think this would be a very serious issue. Treating all synchronous functions as atomic is just a legacy habit, caused by the unintentional design of earlier implementations; similar legacy features, possibly soon to be discarded, include the GIL. By contrast, Haskell once replaced all the underlying libraries of type `IO a` with asynchronous implementations without affecting user-level code, because it used an m:n lightweight-thread model from the start and made no assumptions about the atomicity of code unless you explicitly mark a piece of code as atomic. I believe this is also likely to be the future direction of Python and many other popular languages, if they no longer position themselves as scripting languages but hope to take on more. Of course, I acknowledge that your analysis of the current situation is accurate and very practical, but I still hold a glimmer of hope for a better future, wishing that one day we will no longer need to repeat ourselves (that is, provide almost identical synchronous and asynchronous implementations of the same function).
Paste the link of StackOverflow here
@tomkcook I have provided my new solution in this repo. Your SO question is answered via this test case. Are you satisfied with my new solution?
Today I tried
Just a record.
There's a reason that it's not easy to do this. It gives you new and very subtle ways to shoot yourself in the foot.
The problem
First, let's state the problem, with a vaguely real-world example. You have a library of existing code which is all synchronous. It does lots of useful things and is basically the glue that holds your company's systems together. Somewhere near the bottom of the library is a function that retrieves a number from an external server:
Then let's say things change and, instead of getting that number from an HTTP service, you have to get it from dbus. You really don't want all of glib as a dependency, so `pydbus` isn't an option. Never mind, there's `dbus_next`, which is 100% pure Python! Yay!

Fine. Now suppose that someone uses the top level of your library to implement a web service using quart, which runs HTTP endpoints in an asyncio loop. Now you have a problem, because as soon as the library, running in a quart endpoint, calls `get_number()`, it will raise an exception: `RuntimeError: This event loop is already running`.

You show up at StackOverflow asking about the problem and you get a big pile of unhelpful answers that amount to "Yeah, don't do that. Why don't you rewrite your whole library in asyncio the way God intended it to be?" But you have a big ol' pile of other synchronous code that also uses your library; rewriting the library as asyncio code means either rewriting all that other stuff as asyncio code, or maintaining two versions of your library, one asyncio and one synchronous.
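The failure can be reproduced in a few lines, with no quart or dbus involved (the function bodies here are stand-ins; a bare `asyncio.run` plays the role of quart's event loop):

```python
import asyncio

async def _fetch_number():
    return 42  # stand-in for the real asynchronous dbus/HTTP call

def get_number():
    # The naive synchronous wrapper: grab the event loop and block
    # until the coroutine finishes.
    loop = asyncio.get_event_loop()
    return loop.run_until_complete(_fetch_number())

async def endpoint():
    # What a quart handler effectively does: call synchronous
    # library code from inside the already-running event loop.
    return get_number()

try:
    asyncio.run(endpoint())
except RuntimeError as exc:
    print(exc)  # This event loop is already running
```

Inside a running coroutine, `asyncio.get_event_loop()` returns the loop that is already running, and `run_until_complete()` refuses to re-enter it.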
The obvious solution
The obvious solution to this is to make asyncio event loops reentrant, so that the above code doesn't raise a `RuntimeError`; it just gets on and executes the function. There is even a library for doing exactly that. But you don't want to do that.

Why don't I want to do that?
You don't want to do that because it breaks all your concurrency assumptions. It gives you new, subtle, and extremely difficult-to-debug ways to shoot yourself in the foot.
Let's go back to our example, and say someone is using our `get_number()` function in a read-then-write pattern on some shared state.

In traditional multithreaded code, this is a disaster. We have a race condition: another thread might intervene between the line where we read the value, `if self.value < 10:`, and the line where we write it, `self.value = get_number()`. In traditional multithreaded code, we would need to put a lock around this to prevent concurrent updates.

But in asyncio code, the above is absolutely fine. We know that the only place where context-switching can happen is where there is an `await`. We can write code like this as though it is single-threaded, because it is. There is no way other code can intervene between the last two lines.

Now suppose you make asyncio event loops reentrant, so that asyncio code can call synchronous code that then calls asyncio code again. You just broke all your concurrency assumptions. We can no longer tell by looking at the code where context switches might occur, because any synchronous function might be hiding a re-entry into the asyncio event loop, and might context-switch when it is called. You now need to protect all these critical sections with locks again.
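The read-then-write pattern discussed above can be sketched as follows (the class and attribute names are assumptions; the original example code was not preserved on this page):

```python
def get_number():
    return 42  # stand-in for the real lookup described earlier

class NumberCache:
    """Hypothetical holder for the shared state in the example."""

    def __init__(self):
        self.value = 0

    def maybe_refresh(self):
        # The read ("if self.value < 10") and the write
        # ("self.value = get_number()") form a critical section under
        # threads -- but not under asyncio, where no other task can
        # interleave here as long as there is no await in between.
        if self.value < 10:
            self.value = get_number()
```

Under cooperative scheduling, `maybe_refresh()` needs no lock; under preemptive threads, or under a reentrant event loop where `get_number()` might secretly yield, it does.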
To the extent that your project uses asyncio for a good reason (and not just because someone said, "Hey, asyncio looks cool, let's use that!"), it is almost certainly this reason: asyncio lets you write concurrent code without having to worry that context switches can happen absolutely anywhere. It's easy to tell where they happen, and it's easy to see where you need locks to prevent concurrent updates. Most code that deals with state shared between tasks can be written in the simple, obvious way; you only need to worry about concurrency where there's an `await` in your code. If you take all that away, what reason do you have for using asyncio at all? You might just as well use traditional threads.
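A small sketch makes that guarantee concrete: with no `await` between a read and a write, asyncio tasks cannot interleave there, while putting an `await` in the middle reintroduces lost updates (the task bodies are illustrative):

```python
import asyncio

counter = 0

async def bump_safe(n):
    global counter
    for _ in range(n):
        # No await between read and write: no other task can run
        # here, so the increment is effectively atomic. No lock.
        counter = counter + 1
        await asyncio.sleep(0)  # yield only between iterations

async def bump_racy(n):
    global counter
    for _ in range(n):
        current = counter
        await asyncio.sleep(0)     # context switch mid-update...
        counter = current + 1      # ...so this may clobber another
                                   # task's write: a lost update

async def main():
    global counter
    counter = 0
    await asyncio.gather(bump_safe(1000), bump_safe(1000))
    print(counter)  # 2000: no lost updates
    counter = 0
    await asyncio.gather(bump_racy(1000), bump_racy(1000))
    print(counter)  # fewer than 2000: updates were lost

asyncio.run(main())
```

A reentrant event loop turns every synchronous call into a potential `asyncio.sleep(0)`, which is exactly why it silently converts the safe version into the racy one.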