-
Notifications
You must be signed in to change notification settings - Fork 119
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
async calls #105
Comments
MM, yep, that's been a gleam in my eye for a long while and thanks for the API sketch! There's some facility for this in libcurl, in terms of an async API, but I haven't fully figured how to tie it into the PgSQL scheme nicely, since once you start up an async request, the whole who-owns-it question comes up... I was thinking about an API based on callbacks rather than a queue, so your idea bears some thought. Something for me to ponder on my ride tomorrow... |
Great -- thanks for response. The challenge with callbacks is that there is no way (that I know of) to handle callbacks at the SQL level. The idea here is to implement some wrapper to something like epoll_wait() (http_wait_for_request) with a timeout. Upon receiving an interesting event, control would be returned to the SQL level with information so that you wouldn't have to loop the outstanding connections to determine which one is ready to return data. |
Which I guess would be OK, but users would have to two-step all their HTTP work... since you are building a scheduler that makes sense for you :) but other folks might find it a little fiddle to make use of, no? |
Well, it's useful in any single threaded application. If you look at asynchronous stacks like node, you can see that most i/o works this way. This type of work is not super common in database coding but since you made this project I think we can both agree that this is an underutilized frontier of development. In general, robust programming of i/o requests outside the system either require threads or asynchronous behavior. Threads are right out in stored procedures, so that leaves async. Synchronous requests are easier to code and many people prefer them, but they are dangerous; you're blocking the thread and you have no wait to deal with the situation other than a crude timeout. So even in scenarios where there would only be one outstanding query at a time you'd still be better off using an asynchronous call if you wanted precise error handling. The perfect analogy here is dblink; I use it a lot to transfer data between databases; if the operation needs to meet certain guarantees in terms of robustness it usually ends up being run over the asynchronous call (mainly so I can exploit cancelling but there are other reasons). dblink really out to have a timeout on the is_busy call and something like dblink_wait_for_query to push the polling out of the sql level, but the api is good enough to work. The synchronous model you've built is fine and will certainly be good enough for most users but really strong i/o libraries will almost always have some kind of contrivance so the calling code can maintain control of the execution state. |
reading up on the libcurl api, the library supports async requests without threading (which would be required for this approach to be technically feasible). Here's in example I found: In the API sketch above, if you were to do this, http_wait_for_request would probe into the 'select' and continue to load up the result until the SQL level timeout was hit or the still_running flag was set so that the request was complete. 'select' is of course polling underneath the hood, so we would not want to scale to huge number of concurrent requests, but this ought to be much more efficient than sql level poll. Thanks for listening, curious your thoughts. |
Just wanted to ask why http_get SQL function does not use PARALLEL keyword in it's definition, but found this. |
yeah, PARALLEL != async. PARALLEL means you can spawn multiple similar option, in query, in situations where query planner thinks it's a good idea to do so. async means you kick the operation off in the background and do other things while the operation is in progress. workarounds today would be to use dblink or pg_background. |
Hi all, set parallel_setup_cost = 0;
set parallel_tuple_cost = 0;
explain select * from http(...) UNION ALL select * from http(...); It should show something like (look for Parallel Append)
But when
It works fine in 10% of cases and I see this error in 90% of cases. Is it possible to make |
Two confusing things. None of the functions are marked as |
Hi, I am trying to reproduce my prod env and isolate the problem. Please check the gist how to make parrallel HTTP requests in PostgreSQL. Seems it works fine for GET requests! https://gist.github.com/deem0n/37f63978773d12a8de61289900c5cc03 Basically, I found glitch, when
instead of returning normal response with status 500 it produce Exception.
And finally, if we disable parrallel plan, then we getting 200 Ok responses. |
So, thank you for that gist. I tried it out, and it worked fine... parallel plan, and the 7-wide parallel query ran and returned 7 results nice and fast. Could there be something else going on, in your system? |
Thanks for verification! Actually it could be something with the network settings, it is very hard to debug. I assume that pgsql-http works fine, and if not I will post detailed report why. |
For people interested in this topic, it's possible to get async HTTP queries by combining this extension with the pg_background extension. |
@pramsey do we have example somewhere ? i struggle to understand how to mix them |
Hi. Right now I'm working on an entirely stored procedure based coordination framework called pgtask. Think of it as a centralized cron for controlling database task execution in complex environments; I believe there to be a technical niche there. Anyways, I'm interested in being able to tap web services in addition to target remote database calls. For this to work in my architecture, I'd need to have some kind of asynchronous API; something along the lines of dblink_send for asynchronous querying (which I use to send queries since the coordinator is single threaded).
An ideal API would be able to do something like:
*) http_send (name, ...) send a named http request
*) http_request_complete(name, ) is the http request resolved?
*) http_get_request (name, timeout, ...) get a response, blocking until timeout if not done
*) http_wait_for_request(timeout) wait for the first available request to complete or until timeout. (epoll like interface to prevent polling/looping in sql) -- dblink really needs this
*) http_cancel_request (name, ...) cancel an outstanding request
*) http_list_requests() get a list of outstanding http requests
Food for thought. I don't know if this api (esp wait_for_request call) is technically feasible, but it would make this a much more usable extension since database queries are essentially can be considered single threaded environment. Anyways, thanks for your work!
The text was updated successfully, but these errors were encountered: