Connection Pooling #32
There is an issue open for Ecto that solves this by "reserving" a worker.
I think table affinity is not going to be provided by existing solutions. The Ecto solution solves a very different problem (nested transactions); we are dealing with a very different set of constraints. The performance bottleneck is purely server side. The hand-rolled proposal is built on top of independently useful building blocks. The ideal drop-in solution would be something that:
Number 2 is exactly what a supervisor is for; that is already an out-of-the-box feature. I'm going to take a very lightweight pass at this and see how difficult it is.
@hamiltop after talking with @ericmj and @fishcakez, they've convinced me that maybe we don't need connection pooling and that I am probably wrong.
Table affinity is what's got me convinced I want to do some level of custom pooling. I skimmed the conversation in IRC, and I think the previous comments here still stand. I'll hopefully have my code up soon.
https://github.com/hamiltop/exrethinkdb/blob/connection_pool/lib/exrethinkdb/connection/pool.ex used like:

```elixir
hosts = [[host: "localhost", port: 28015], [host: "localhost", port: 28015]]
{:ok, conn_pool} = Exrethinkdb.Connection.Pool.start_link([hosts: hosts])
conn = Exrethinkdb.Connection.Pool.get_connection(conn_pool)
Exrethinkdb.Query.table("people") |> Exrethinkdb.Connection.run(conn)

# alternatively, run from the conn_pool and it will pick a connection for you
Exrethinkdb.Query.table("people") |> Exrethinkdb.Connection.Pool.run(conn_pool)
```

It's naive, but accomplishes the basic goals. The optimizations would be in how get_connection works in order to provide table affinity. As an already-implemented example, cursors store the pid of their connection internally and will use that connection when fetching more results.
@hamiltop, I really like the ideas that you've laid out for the hand rolled solution because it takes into account the unique characteristics of RethinkDB. Poolboy and other technologies are going to make assumptions that are part of a different paradigm. I also like the fact that you were able to build a working version based on the code that's already been written. As @jeregrine mentioned, there are bound to be gremlins lurking in the dark, but that seems to be the case for just about everything in this line of work. At this point, I can't think of a good reason not to run with your solution.
Some interesting results. Three configurations:

The first test was a non-indexed read in a table of 10k records (looking for one record), repeated 40 times across 40 concurrent processes.

The next test was for an indexed read.

Conclusions: fine tuning the pool is important. When I ran the cluster test with 1 local and 1 remote connection, it was much slower than when I ran it with 4 local and 1 remote. This was in a very lopsided network, but I've encountered lopsided clusters aplenty in production environments. I think the takeaway is that a developer will need to weight hosts in the pool according to their requirements. The configuration should be something like:
Additionally, table affinity is definitely required. Hitting the connection that isn't the master for the table was costly.
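The weighted-host idea could be sketched roughly like this (hypothetical module and data shape, not part of the driver): each host carries a `weight`, and a host is picked at random in proportion to it. Note that a host with `weight: 0` is never selected here, which matches the observation later in the thread that 0 needs special casing.

```elixir
defmodule WeightedHosts do
  # Hypothetical sketch: hosts are maps like
  # %{host: "localhost", port: 28015, weight: 4}.

  # Pick a host at random, proportionally to its weight.
  def pick(hosts) do
    total = hosts |> Enum.map(& &1.weight) |> Enum.sum()
    pick_at(hosts, :rand.uniform(total))
  end

  # Walk the list, consuming weight until n (in 1..total) is covered.
  # A weight of 0 is skipped entirely, so it would need special casing
  # to act as a "only when everything else is down" fallback.
  def pick_at([h | rest], n) do
    if n <= h.weight, do: h, else: pick_at(rest, n - h.weight)
  end
end
```

With weights 4 and 1, roughly four out of five picks land on the first host, which is the "4 local, 1 remote" shape described above.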
The results are exactly what I would expect them to be given the parameters you described. In my mind this seems to indicate that the initial design is working properly. This will add the flexibility needed to get the most out of RethinkDB and the application. A couple of things that I was thinking about were pools that had equally weighted hosts. For example, if you have a RethinkDB cluster in data-center A, and another RethinkDB cluster in data-center B, it seems logical to have the following configuration: [...] This would allow the application servers in data-center A to utilize the local network for best performance, but fall back to the cluster in data-center B. Given that they are equally weighted in each group, I would expect that the load would be balanced among them. I have been kicking around the idea of adding tags that could be used to specify things like racks, data-centers, etc., but the weight parameter may cover most cases. At this point I'm not sure if they would help or hurt, but I thought the idea was at least worth mentioning. --Nick
weight: 0 would need to be a special case, as otherwise weights are relative. I did find a bug in some code not pushed yet.
You have a couple of issues regarding failure handling with the proposed pooling method:

The case where all connections are unavailable isn't handled. This could happen if all connections are down.

A supervisor blocks while starting a child. In the current implementation, connect and handshake occur in the init callback, so the supervisor blocks while each connection is established.

If one of the rethinkdb servers is down or there is an issue with connecting/sending data, a connection process will fail to connect. This will either result in the supervisor blocking for a significant time while a connection process tries to start (as above) or, more likely, the supervisor will shut down because the max restart intensity is reached.

Finally, the pooling strategy does not provide any backpressure or load shedding to client processes beyond the connection processes' message queues. Designing around a minimal number of connection processes that pipeline requests makes this a potential bottleneck.
Thanks for chiming in @fishcakez.

The case for "unable to acquire a connection" has not been addressed at any level yet, but either a "let it crash" philosophy or responding with an error would work.

To check my understanding: the init function of a GenServer is run as part of GenServer.start_link and therefore it causes the supervisor to block. Is it a common pattern to send oneself a message in init and connect asynchronously?

How is the max_restarts issue (where one buggy child kills all the other healthy children as it brings down the supervisor) solved elsewhere? By not using supervision?

Backpressure is applied to individual clients through GenServer.call. The client process will block until it receives a response. This does nothing to address an unbounded number of client processes. It also does not deal with load shedding, as stated. Would a simple solution be for an individual connection to keep track of active clients and respond with an error when overloaded?

While I'm exploring the idea of a custom pool with table affinity, I'm committed to making the connection work with standard pooling libraries as well.
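The load-shedding idea floated above (a connection that tracks active clients and refuses new work when saturated) might look roughly like this; all names here are hypothetical, and the real driver would pipeline the query on its socket where the sketch just acknowledges it:

```elixir
defmodule SheddingConn do
  use GenServer

  # max is the number of in-flight requests allowed before shedding load.
  def start_link(max), do: GenServer.start_link(__MODULE__, max)
  def init(max), do: {:ok, %{max: max, active: 0}}

  def run(pid, query), do: GenServer.call(pid, {:run, query})
  def done(pid), do: GenServer.cast(pid, :done)

  # Over the limit: reply immediately with an error instead of queueing.
  def handle_call({:run, _query}, _from, %{active: a, max: max} = s) when a >= max do
    {:reply, {:error, :overloaded}, s}
  end

  # Under the limit: accept the request (a real connection would pipeline
  # it on the socket here) and count it as in flight.
  def handle_call({:run, query}, _from, s) do
    {:reply, {:ok, {:queued, query}}, %{s | active: s.active + 1}}
  end

  # The client signals completion so the slot frees up.
  def handle_cast(:done, s), do: {:noreply, %{s | active: s.active - 1}}
end
```

Clients that get `{:error, :overloaded}` can retry on another connection or surface the error, which gives the pool a crude but explicit form of backpressure.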
" We could probably special case it so that is -- That is the behavior that I had in mind with my sample configuration. Would it be possible to round robin any weights that are the same? In my example config above, the idea was to round robin the primary cluster, then fail over to another cluster and round robing those connections. If that's not possible, I suppose the config could look like this, as long as 0 was selected before the higher number weights: [ "This custom pooling is an optimization. Perhaps there's a hybrid solution that can be found." |
The only two extension points I'd like are:
It may be worth seeing if an existing solution can be adapted.
inaka/worker_pool@master...marianoguerra:issue_16_custom_strategies seems to be interesting. I'd rather use poolboy, just for the experience and expertise within the Elixir ecosystem. Number 1 from my previous comment could probably be accomplished via some hack layer on top of a custom strategy. Something like "when creating a worker, wrap an existing connection if X and Y, else start a new connection".
You can still pipeline without using pools, either by making the pipelining explicit or by having a single process do the pipelining on behalf of its callers.
Tasks are something I'd like to add anyway. I don't think I'll be intellectually satisfied without oversubscription, though. As far as table affinity, is there anything wrong with N pools, one per server? Then route between pools based on the table.
Re: Tasks
Exactly how you describe: casting a reconnect message to self in init.
The fault tolerance in OTP comes about by isolating errors at suitable levels. A supervision tree does this in layers: errors slowly bubble up the supervision tree as processes keep crashing. Hopefully at some point, as close to the bottom of the tree as possible, the state is fixed and normal service resumes. Not being able to connect to the database (network partition) is a realistic error and it is reasonable to handle it. Ideally this error is contained so that connections to other databases are not affected. If the other connections are shut down because the max restart limit is reached, the fault has spread further than is necessary. A circuit breaker or fuse type system (such as https://github.com/jlouis/fuse) can be used to raise an alarm (http://www.erlang.org/doc/man/alarm_handler.html) and throttle retries, possibly backing off between attempts (such as https://github.com/ferd/backoff).
Yes.
You could try a slightly different take on the usual semantics if max_overflow is set to 0. The poolboy transaction need only be used to set up the asynchronous request (or task, as in #33). It would not have to remain for the lifetime of the request (i.e. check in the connection and then await the result). I am unsure what performance would be like, but it might be interesting to try as it would not require changing your Connection process.
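The asynchronous-connect-with-backoff pattern discussed above might look roughly like this. Everything here is a hypothetical sketch: `connect_fun` stands in for a real TCP connect plus handshake, and the module name is invented. The point is that `init/1` returns immediately, so the supervisor never blocks, and a down database produces retries with backoff instead of tripping the restart limit.

```elixir
defmodule AsyncConn do
  use GenServer

  # connect_fun is a zero-arity function returning {:ok, sock} | {:error, reason};
  # in the real driver this would be :gen_tcp.connect/3 plus the handshake.
  def start_link(connect_fun), do: GenServer.start_link(__MODULE__, connect_fun)

  def init(connect_fun) do
    send(self(), :connect)  # connect after init returns, so start_link doesn't block
    {:ok, %{connect: connect_fun, sock: nil, backoff: 100}}
  end

  def handle_info(:connect, state) do
    case state.connect.() do
      {:ok, sock} ->
        {:noreply, %{state | sock: sock, backoff: 100}}
      {:error, _reason} ->
        # Retry later, doubling the delay up to a ceiling, instead of crashing
        # and counting against the supervisor's restart intensity.
        Process.send_after(self(), :connect, state.backoff)
        {:noreply, %{state | backoff: min(state.backoff * 2, 30_000)}}
    end
  end

  def connected?(pid), do: GenServer.call(pid, :connected?)
  def handle_call(:connected?, _from, state), do: {:reply, state.sock != nil, state}
end
```

A real implementation would also raise an alarm or trip a fuse once the backoff grows large, per the circuit-breaker suggestion above.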
The original goal of connection pooling pre 1.0.0 was that a connection would work easily with existing pooling libraries such as poolboy.

Modes of use:
I think the original goal of solving identified problems and then providing an example of how to use it with poolboy is still the correct approach. The other issue discussed here is table affinity, and I'm not going to include smart routing in 1.0.0 for now.

Identified problems:
Any discussions on those topics should be moved to the issues opened. Any new issues can and should be discussed in this thread.
I'm going to reduce this problem down to a fairly simple one. Table affinity is not necessary with a rethinkdb-proxy. Throttling/back pressure/load shedding is still a problem, as is handling timeouts properly. It should work with poolboy out of the box, but that has yet to be tested.
```elixir
pool_options = [name: {:local, :hello_pool}, worker_module: RethinkDB.Connection, size: 10, max_overflow: 0]
:poolboy.start_link(pool_options, [])
a = :poolboy.checkout(:hello_pool)
RethinkDB.Query.table_list |> RethinkDB.Connection.run(a)
```

Some advisable defaults:
Changefeeds: I may create a separate mechanism for these.
Removing 1.0.0 milestone. The goal was to have a connection that worked easily with poolboy, which is what we have.
Recently discovered that RethinkDB has core <-> connection affinity. So while the Elixir side scales well with multiple clients, a single connection is going to perform slightly worse on the server side. Connection pooling should be a priority. |
Agreed that connection pooling should be a priority; it is kinda frustrating without this feature.
How is it frustrating? Can you provide some examples?
I updated the README to show how to use the connection with poolboy.
I have been working on an adapter for Ecto for a while. As Ecto 2.0 requires adapters to support the new pooling model, I rebuilt the connection on top of DBConnection.

Right now, the only thing you have to do to get started is to replace

```elixir
worker(RethinkDB.Connection, [[name: :foo]])
```

with

```elixir
supervisor(RethinkDB.Connection, [[name: :foo]])
```

To use a different pool implementation (poolboy, for example):

```elixir
defmodule Repo do
  use RethinkDB.Connection, pool: DBConnection.Poolboy
end
```

If you are interested in using it, I can open a pull request. Note that I also implemented the new …
Awesome! I appreciate the help. My concerns are:
Otherwise, please submit the PR! Love the help.
DBConnection does provide pooling (and poolboy and sbroker are optional dependencies), and the behaviour with pooling should be the same as without pooling; it's just a case of adding the pool option.

Ecto 2.0 does not require all adapters to use DBConnection, only the SQL adapters (as DBConnection is required for the (concurrent) sandbox).

I have no idea about changefeeds, but I assume they are approximately pubsub. This won't play well with DBConnection, unfortunately, because the socket is passed to the client process on each call. Postgrex uses a different process to support postgres notifications. I am not sure what the best strategy for pooling changefeeds is. The simplest strategy might be to use poolboy and only carry out the subscription to a changefeed in the poolboy transaction (checkin/checkout), then remain subscribed to the changefeed after checking the process back in to poolboy. Likely it would be a good idea to use a dedicated process for each changefeed.
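The "dedicated process per changefeed" idea sketched above might look roughly like this: one process stays subscribed to the feed and fans changes out to interested processes, pruning subscribers that die. All names are hypothetical, and `publish/2` stands in for data arriving on the changefeed's socket:

```elixir
defmodule Feed do
  use GenServer

  def start_link, do: GenServer.start_link(__MODULE__, :ok)
  def init(:ok), do: {:ok, MapSet.new()}

  # The calling process registers to receive {:change, data} messages.
  def subscribe(pid), do: GenServer.call(pid, :subscribe)

  # Stand-in for a change arriving from the changefeed connection.
  def publish(pid, change), do: GenServer.cast(pid, {:publish, change})

  def handle_call(:subscribe, {from, _tag}, subs) do
    Process.monitor(from)  # so dead subscribers get pruned
    {:reply, :ok, MapSet.put(subs, from)}
  end

  def handle_cast({:publish, change}, subs) do
    Enum.each(subs, &send(&1, {:change, change}))
    {:noreply, subs}
  end

  def handle_info({:DOWN, _ref, :process, pid, _reason}, subs) do
    {:noreply, MapSet.delete(subs, pid)}
  end
end
```

This keeps the pubsub concern out of the pooled connections entirely, which is roughly why a process separate from the pool (as Postgrex does for notifications) fits better here.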
Yeah, I noticed that. In an ideal world, I think a tunable "oversubscription" setting would be what we want. Then maintain two pools, one for changefeeds and one for normal queries. Even in the query case, oversubscription on the connection is efficient.
Shackle is a pooling library that supports multiplexing: https://github.com/lpgauth/shackle. It works quite differently to DBConnection but provides similar abstractions. |
Hi @hamiltop, the fork is ready to be merged without conflicts. All the tests are passing successfully with the exception of 5 tests in …. There are a few implementation details that should be discussed though:
Opened #106 to discuss |
Poolboy
Poolboy works by providing exclusive access to a given resource in a "transaction".
Pros:
Cons:
While my assumption was to use Poolboy, it seems that there are better approaches available. There are too many tradeoffs to be made for not a whole lot of benefit.
Hand rolled
An ideal connection pool will do the following:
Initial design:
A supervisor tree will be over N connections and a coordinator.
Client processes send queries directly to the connection, just like we currently do. In order to know which connection to contact, the client can request a connection from the coordinator. When the client is finished with the query it informs the coordinator (in order for the coordinator to properly balance connections).
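A minimal sketch of the coordinator described above, under the assumption that it hands out the least-loaded connection and is told when a client finishes (module and function names are hypothetical; connections are represented by plain terms here rather than real connection pids):

```elixir
defmodule Coordinator do
  use GenServer

  # conns is the list of connection identifiers the coordinator balances over.
  def start_link(conns), do: GenServer.start_link(__MODULE__, conns)
  def init(conns), do: {:ok, %{counts: Map.new(conns, &{&1, 0})}}

  # Clients ask for a connection before running a query...
  def get_connection(pid), do: GenServer.call(pid, :get_connection)

  # ...and inform the coordinator when they are done, so it can rebalance.
  def finished(pid, conn), do: GenServer.cast(pid, {:finished, conn})

  def handle_call(:get_connection, _from, s) do
    # Hand out the connection currently serving the fewest clients.
    {conn, _count} = Enum.min_by(s.counts, fn {_c, n} -> n end)
    {:reply, conn, %{s | counts: Map.update!(s.counts, conn, &(&1 + 1))}}
  end

  def handle_cast({:finished, conn}, s) do
    {:noreply, %{s | counts: Map.update!(s.counts, conn, &(&1 - 1))}}
  end
end
```

Table affinity would slot in here by filtering the candidate set to the connections that host the query's table before taking the minimum.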
This would require zero changes to `Exrethinkdb.Connection`. The connection would be dumb to pooling. We could provide an `Exrethinkdb.ConnectionPool` module with `run`/`next`/`close` that wrap the logic for interacting with the coordinator. The end API for the user would be the same. We will also provide a `use Exrethinkdb.ConnectionPool` macro. Ideally you would replace `use Exrethinkdb.Connection` with `use Exrethinkdb.ConnectionPool` and your application would still work perfectly.

Routing queries to proper replicas can be done fairly well in the coordinator by looking at `table("foo_table") |> config |> run`. The coordinator can do this on a timer (once every minute is probably sufficient). It can then use this information to route the connection properly. In the event of stale_reads, the coordinator can load balance between non-master connections. We'll have to add `table` to the `Query` struct, but that's pretty straightforward.

`Exrethinkdb.ConnectionPool.run` will report back to the coordinator the success or failure of a query. The coordinator will also be monitoring the Connection processes. There will also be logic to retry on a different server if a failure occurs (in the event of stale_reads; otherwise, retrying won't be useful until a new master is selected).

I think this custom approach will be fairly simple. It will be designed so that failure leads to a worst case scenario of routing requests to the less optimal replica.
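The routing step described above could be sketched as a lookup over a cached table-to-connection map, where the cache would be refreshed on a timer from the table config query (the module, names, and data shape are hypothetical):

```elixir
defmodule TableRouter do
  # routes maps a table name to its master connection and replica connections,
  # e.g. %{"people" => %{master: conn1, replicas: [conn2, conn3]}}; in the
  # real design this map would be rebuilt every minute from
  # table(name) |> config |> run.
  def route(routes, table, opts \\ []) do
    case Map.fetch(routes, table) do
      {:ok, %{master: master, replicas: replicas}} ->
        if opts[:stale_reads] && replicas != [] do
          # stale reads may be load balanced across non-master connections
          Enum.random(replicas)
        else
          master
        end

      :error ->
        # unknown table: the caller falls back to any available connection
        nil
    end
  end
end
```

On query failure the caller would drop the failed connection from the entry and route again, which gives the stated worst case of merely hitting a less optimal replica.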