Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Rewrite casync-http to use curl multi
This commit brings parallel downloads to casync-http, by using the curl multi interface, and more precisely the "select()" flavour. For details see <https://curl.haxx.se/libcurl/c/libcurl-multi.html>. Libcurl has two ways to achieve parallel downloads: - for HTTP/1, it can open parallel connections. The maximum number of parallel connections is user-defined, through `MAX_HOST_CONNECTIONS` and `MAX_TOTAL_CONNECTIONS`. - for HTTP/2, it can attempt to multiplex on a single connection. The maximum number of parallel downloads in this case is negociated between the client and the server (we talk about number of streams in the HTTP/2 jargon). - (note that libcurl used to do pipelining over HTTP/1.1, but this is no longer supported since 7.62.0, and casync-http doesn't use it anyway) Accordingly, this commit also introduces two new command-line arguments to better control the behavior of parallel downloads: - `--max-active-chunks` is the sum of 1. the number of chunks in the curl multi and 2. chunks downloaded and waiting to be sent to the remote. It allows to limit the number of chunks waiting in RAM, in case we download faster than we can send to remote. It also gives a limit for the maximum number of concurrent downloads. - `--max-host-connections` is for the case where libcurl opens parallel connections to the server. In all likelihood it's only used for HTTP1. We probably want a large number for max-active-chunks, to ensure we don't starve the libcurl multi handle, but at the same time we probably don't want to open too many connections in parallel, and that's why max-host-connections is a much lower number. It seems to be a sensible default, according to my understanding so far. User might want to adjust these number for their specific use-case. Note that the command-line argument `--rate-limit-bps` doesn't make much sense anymore, since it's set for each chunk, but now chunks are downloaded in parallel, and we don't really know how many downloads are actually happening in parallel. And from Daniel Steinberg: We don't have settings that limit the transfer speed of multiple, combined, transfers. So we might want to completely remove this option, or rework it somehow. Note that this commit removes the wrapper `robust_curl_easy_perform()` introduced in 328f13d. Quick reminder: this wrapper was used to sleep and retry on `CURLE_COULDNT_CONNECT`, and allowed to workaround what seemed to be a misbehavior of the Python Simple HTTP Server. Now that we do parallel downloads, we can't apply this workaround "as is", we can't just sleep. So I removed the wrapper. The issue is still present and reproducible though, but I just assume it's a server issue, not a casync issue. Regarding unit tests -------------------- This commit also opens interesting questions regarding the unit tests. For now we're using the Python Simple HTTP Server, which can only serve requests sequentially. It doesn't allow to really test parallelism. It's not really representative of real-life scenario where, I assume, chunks are served by a production server such as apache or nginx. Additionally, I think it would be best to run the test for both a HTTP/1 server and a HTTP/2 server. One possibility is to use nginx, it's easy enough to run it. nginx can serve HTTP/2 only if TLS is enabled though, and out of the box casync-http will fail if it can't recognize the certificate. So we might want to add a command-line argument to trust any random certificate. Additionally, nginx requires root, maybe not very suitable for a test suite, however there might be some workarounds? Another possibility is to run nghttpd. This option is light on dependencies, in the sense that libcurl already relies on libnhgttp2, however I didn't succeed in using this server yet. TODOs ----- There are a few todos left, mainly: - add argument or env variables support in casync-tool, to match new arguments in casync-http. - documentation, readme and such. - some details in the code, see MR in GitHub. Signed-off-by: Arnaud Rebillout <arnaud.rebillout@collabora.com>
- Loading branch information