Skip to content

Commit

Permalink
Rewrite casync-http to use curl multi
Browse files Browse the repository at this point in the history
This commit brings parallel downloads to casync-http, by using the curl
multi interface, and more precisely the "select()" flavour. For details
see <https://curl.haxx.se/libcurl/c/libcurl-multi.html>.

Libcurl has two ways to achieve parallel downloads:
- for HTTP/1, it can open parallel connections. The maximum number of
  parallel connections is user-defined, through `MAX_HOST_CONNECTIONS`
  and `MAX_TOTAL_CONNECTIONS`.
- for HTTP/2, it can attempt to multiplex on a single connection. The
  maximum number of parallel downloads in this case is negociated
  between the client and the server (we talk about number of streams in
  the HTTP/2 jargon).
- (note that libcurl used to do pipelining over HTTP/1.1, but this is no
  longer supported since 7.62.0, and casync-http doesn't use it anyway)

Accordingly, this commit also introduces two new command-line arguments
to better control the behavior of parallel downloads:
- `--max-active-chunks` is the sum of 1. the number of chunks in the
  curl multi and 2. chunks downloaded and waiting to be sent to the
  remote.  It allows to limit the number of chunks waiting in RAM, in
  case we download faster than we can send to remote. It also gives a
  limit for the maximum number of concurrent downloads.
- `--max-host-connections` is for the case where libcurl opens parallel
  connections to the server. In all likelihood it's only used for HTTP1.

We probably want a large number for max-active-chunks, to ensure we
don't starve the libcurl multi handle, but at the same time we probably
don't want to open too many connections in parallel, and that's why
max-host-connections is a much lower number. It seems to be a sensible
default, according to my understanding so far. User might want to adjust
these number for their specific use-case.

Note that the command-line argument `--rate-limit-bps` doesn't make much
sense anymore, since it's set for each chunk, but now chunks are
downloaded in parallel, and we don't really know how many downloads are
actually happening in parallel. And from Daniel Steinberg:

    We don't have settings that limit the transfer speed of multiple,
    combined, transfers.

So we might want to completely remove this option, or rework it somehow.

Note that this commit removes the wrapper `robust_curl_easy_perform()`
introduced in 328f13d. Quick reminder: this wrapper was used to sleep
and retry on `CURLE_COULDNT_CONNECT`, and allowed to workaround what
seemed to be a misbehavior of the Python Simple HTTP Server. Now that we
do parallel downloads, we can't apply this workaround "as is", we can't
just sleep. So I removed the wrapper. The issue is still present and
reproducible though, but I just assume it's a server issue, not a casync
issue.

Regarding unit tests
--------------------

This commit also opens interesting questions regarding the unit tests.
For now we're using the Python Simple HTTP Server, which can only serve
requests sequentially. It doesn't allow to really test parallelism. It's
not really representative of real-life scenario where, I assume, chunks
are served by a production server such as apache or nginx. Additionally,
I think it would be best to run the test for both a HTTP/1 server and a
HTTP/2 server.

One possibility is to use nginx, it's easy enough to run it. nginx can
serve HTTP/2 only if TLS is enabled though, and out of the box
casync-http will fail if it can't recognize the certificate. So we might
want to add a command-line argument to trust any random certificate.

Additionally, nginx requires root, maybe not very suitable for a test
suite, however there might be some workarounds?

Another possibility is to run nghttpd. This option is light on
dependencies, in the sense that libcurl already relies on libnhgttp2,
however I didn't succeed in using this server yet.

TODOs
-----

There are a few todos left, mainly:
- add argument or env variables support in casync-tool, to match new
  arguments in casync-http.
- documentation, readme and such.
- some details in the code, see MR in GitHub.

Signed-off-by: Arnaud Rebillout <arnaud.rebillout@collabora.com>
  • Loading branch information
elboulangero committed Mar 23, 2019
1 parent 095cb73 commit ebc2243
Show file tree
Hide file tree
Showing 2 changed files with 1,128 additions and 309 deletions.
2 changes: 1 addition & 1 deletion meson.build
Original file line number Diff line number Diff line change
Expand Up @@ -129,7 +129,7 @@ libzstd = dependency(
conf.set10('HAVE_LIBZSTD', libzstd.found())

libcurl = dependency('libcurl',
version : '>= 7.29.0')
version : '>= 7.43.0')
openssl = dependency('openssl',
version : '>= 1.0')
libacl = cc.find_library('acl')
Expand Down
Loading

0 comments on commit ebc2243

Please sign in to comment.