Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rewrite casync-http to use curl multi #208

Open
wants to merge 23 commits into
base: main
Choose a base branch
from

Conversation

elboulangero
Copy link
Contributor

@elboulangero elboulangero commented Mar 23, 2019

This commit brings parallel downloads to casync-http, by using the curl
multi interface, and more precisely the "select()" flavour. For details
see https://curl.haxx.se/libcurl/c/libcurl-multi.html.

Libcurl has two ways to achieve parallel downloads:

  • for HTTP/1, it can open parallel connections. The maximum number of
    parallel connections is user-defined, through MAX_HOST_CONNECTIONS
    and MAX_TOTAL_CONNECTIONS.
  • for HTTP/2, it can attempt to multiplex on a single connection. The
    maximum number of parallel downloads in this case is negociated
    between the client and the server (we talk about number of streams in
    the HTTP/2 jargon).
  • (note that libcurl used to do pipelining over HTTP/1.1, but this is no
    longer supported since 7.62.0, and casync-http doesn't use it anyway)

Accordingly, this commit also introduces two new command-line arguments
to better control the behavior of parallel downloads:

  • --max-active-chunks is the sum of 1. the number of chunks in the
    curl multi and 2. chunks downloaded and waiting to be sent to the
    remote. It allows to limit the number of chunks waiting in RAM, in
    case we download faster than we can send to remote. It also gives a
    limit for the maximum number of concurrent downloads.
  • --max-host-connections is for the case where libcurl opens parallel
    connections to the server. In all likelihood it's only used for HTTP1.

We probably want a large number for max-active-chunks, to ensure we
don't starve the libcurl multi handle, but at the same time we probably
don't want to open too many connections in parallel, and that's why
max-host-connections is a much lower number. It seems to be a sensible
default, according to my understanding so far. User might want to adjust
these number for their specific use-case.

Note that the command-line argument --rate-limit-bps doesn't make much
sense anymore, since it's set for each chunk, but now chunks are
downloaded in parallel, and we don't really know how many downloads are
actually happening in parallel. And from Daniel Steinberg:

We don't have settings that limit the transfer speed of multiple,
combined, transfers.

So we might want to completely remove this option, or rework it somehow.

Note that this commit removes the wrapper robust_curl_easy_perform()
introduced in 328f13d. Quick reminder: this wrapper was used to sleep
and retry on CURLE_COULDNT_CONNECT, and allowed to workaround what
seemed to be a misbehavior of the Python Simple HTTP Server. Now that we
do parallel downloads, we can't apply this workaround "as is", we can't
just sleep. So I removed the wrapper. The issue is still present and
reproducible though, but I just assume it's a server issue, not a casync
issue.

Regarding unit tests

This commit also opens interesting questions regarding the unit tests.
For now we're using the Python Simple HTTP Server, which can only serve
requests sequentially. It doesn't allow to really test parallelism. It's
not really representative of real-life scenario where, I assume, chunks
are served by a production server such as apache or nginx. Additionally,
I think it would be best to run the test for both a HTTP/1 server and a
HTTP/2 server.

One possibility is to use nginx, it's easy enough to run it. nginx can
serve HTTP/2 only if TLS is enabled though, and out of the box
casync-http will fail if it can't recognize the certificate. So we might
want to add a command-line argument to trust any random certificate.

Additionally, nginx requires root, maybe not very suitable for a test
suite, however there might be some workarounds?

Another possibility is to run nghttpd. This option is light on
dependencies, in the sense that libcurl already relies on libnghttp2,
however I didn't succeed in using this server yet.

TODOs

There are a few todos left, mainly:

  • add argument or env variables support in casync-tool, to match new
    arguments in casync-http.
  • documentation, readme and such.
  • some details in the code, see MR in GitHub.

@elboulangero elboulangero force-pushed the mr/casync-http-curl-multi branch from f7cf33e to ebc2243 Compare March 23, 2019 11:05
meson.build Outdated Show resolved Hide resolved
struct CaChunkDownloader {
CaRemote *remote;
CURLM *multi;
Queue *ready; /* CURL handles waiting to be used */
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we know the size of the queues already, it would be more efficient to have static arrays, and avoid a lot of malloc/free. I can rework this.

src/casync-http.c Outdated Show resolved Hide resolved
}

static int ca_chunk_downloader_step(CaChunkDownloader *dl) {
int r;
Copy link
Contributor Author

@elboulangero elboulangero Mar 23, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, this function is tricky. This is basically what happens during a loop iteration.

See the call ca_chunk_downloader_remote_step() below? If you move it to the end, then you break the testsuite, and a bunch of casync list commands hang forever. I don't really know why. But it's not sure you will reproduce it. It's interesting to see that if I run the testuite without parallel tests (ie. MESON_TESTTHREADS=1 ninja test), then everything works fine.

So there's some subtleties going here that I didn't fully understand, and I ended up ordering things in this function by trial and failure. In other words, this needs thorough review: this function, the functions it calls, and basically the communication with the casync remote.

@elboulangero
Copy link
Contributor Author

Ping?

Copy link
Member

@poettering poettering left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, so if i get this right, then in this PR we'll synchronously run the curl logic, then synchronously the ca_remote logic and then the curl logic again and so on. Ideally both logics would run asynchronously in the same poll() loop, i.e. so that we get the fds to wait for out of curl and out of caremote (the latter is easy since it's only two) and pass them to the same poll() call. Can we make that happen with libcurl? I don't know curl well enough I must admit, and the reason I did my original http client the way i did (with its synchronous behaviour) was laziness. But if we fix this for multiple getters maybe we can fix that too?

Also, I am not sure why we need a queue for the requests? note that CaRemote already implements one anyway?

src/casync-http.c Outdated Show resolved Hide resolved
src/casync-http.c Outdated Show resolved Hide resolved
src/casync-http.c Outdated Show resolved Hide resolved
src/casync-http.c Outdated Show resolved Hide resolved
src/casync-http.c Outdated Show resolved Hide resolved
src/casync-http.c Outdated Show resolved Hide resolved
QueueItem *next;
};

typedef struct Queue {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, instead of introducing a new, local queue implementation here: let's just copy list.h from systemd's tree (i.e. https://github.com/systemd/systemd/blob/master/src/basic/list.h) and use that? it's an embedded list which means we need fewer allocations

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in commit 48f37dd and fdd4689. See my temporary branch mr/casync-http-curl-multi.

I will push-force latter.

src/casync-http.c Outdated Show resolved Hide resolved
src/casync-http.c Outdated Show resolved Hide resolved
src/casync-http.c Outdated Show resolved Hide resolved
@elboulangero elboulangero force-pushed the mr/casync-http-curl-multi branch from ebc2243 to 3c1f551 Compare April 10, 2019 03:32
@elboulangero
Copy link
Contributor Author

hmm, so if i get this right, then in this PR we'll synchronously run the curl logic, then synchronously the ca_remote logic and then the curl logic again and so on. Ideally both logics would run asynchronously in the same poll() loop, i.e. so that we get the fds to wait for out of curl and out of caremote (the latter is easy since it's only two) and pass them to the same poll() call. Can we make that happen with libcurl? I don't know curl well enough I must admit, and the reason I did my original http client the way i did (with its synchronous behaviour) was laziness. But if we fix this for multiple getters maybe we can fix that too?

Not sure I understand you, so let me detail a bit how things work.

There's actually only one poll for all events (curl and casync altogether). This happens in curl_multi_wait() at

c = curl_multi_wait(dl->multi, waitfds, ELEMENTSOF(waitfds), curl_timeout_ms, &n);

The pros of curl_multi_wait is that we don't have to deal with all fds that are in use by curl and feed them to poll manually, because that exactly what curl_multi_wait does already. Additionally, this function allows user to give additional fds to add to the poll, and that's what we do here: we give casync two fds, so that they're part of the poll.

If you want to peak into curl's implementation for curl_multi_wait, here are some quick links:

So it's an easy way to poll on both casync and curl fds, basically. The downside, maybe, is that when we're out of this function, we don't know which fds were triggered, all we can know is how many fds were triggered (which doesn't help much). That's why a "loop iteration" involves running both curl logic and casync logic, in case something happened (we're not sure).

That could be done differently, we could do the poll ourself instead of using curl_multi_wait, but basically it would mean replicating what curl does in curl_multi_wait, and I don't see any benefit. But I'm no poll expert, so I might miss something.

Also, I am not sure why we need a queue for the requests?

So we need to keep tracks of the active requests somehow. Active requests being either easy handles added to the curl multi, or either chunks downloaded, and waiting to be sent to the remote.

I implemented a queue mechanism because it seemed to be suitable, and having 3 queues for the 3 states possibles (ready, in progress and completed) seemed to make things easy. For example, by keeping a bit of stats inside the queues (n_added, n_removed), it's then super easy to display statistics at the end. And this is valuable to see where time is spent.

For example while testing locally (ie. super fast localhost network), I can see that the average size of the completed queue is 62 (out of 64 active chunks), which means that most of the time, we're waiting for chunks to be sent to the remote. So the casync communication between the two casync processes is the bottleneck.

OTOH, while testing with a remote server, obviously things are different, and this time it's the average size of the inprogress queue that is close to 64, which means that most of the time we're waiting for chunks to be downloaded.

Of course, this is not the only implementation possible, and instead of having 3 queues, we could have a "state" enum, and for each chunk we could keep track of the state it's in. No more queue, just a static array of chunks. But I'm not sure it would make the code easier to read and maintain.

Performance wise, it don't think there's any cons in having these 3 queues. It's small amount of data (max-active-chunks x 3). Right now, we use dynamic memory only because at some point I wasn't sure if the queue would be of fixed size or not. Right now, it turns out that it's a static size (fixed by the parameter --max-active-chunks), and if we agree on this implementation, I'll rewrite that part to make it static and remove all the malloc/free.

note that CaRemote already implements one anyway?

This part I don't really understand. Can I make use of that on the casync-http side?

@elboulangero
Copy link
Contributor Author

(Note that I fixed all the details you mentioned above. Since it was trivial enough I hit "Resolve" button myself, but truth to be told, I'm not familiar with reviewing stuff on github, and maybe you prefer to hit "Resolve" yourself, so please don't hesitate to tell me how it should be done).

Copy link
Member

@keszybz keszybz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be great to have some tests for this. It is far from trivial ;) I agree we should switch to one of the "real" implementations, but I'm not sure which one is appropriate. By nghttpd do you mean https://www.nghttp2.org/? That seems to be dead (website from 2015) and is not packaged for Fedora, which would make things difficult for us.

I think it'd be easier to review this patchset if some parts were split out. E.g. the part to move arg_protocol enum higher could be separated, and it would just make things easier to review. I suggest some other parts to split out inline.

Like @poettering suggested, we want to pull in the list.h implementation from systemd. This will remove a large chunk of this patch too.

As for the general approach, I think it's reasonable. According to the documentation, the curl multi interface supports both doing poll internally and integrating into an external event loop. I think it's reasonable to start with the approach you picked. We have the option to switch to an external loop later on if necessary.


c = curl_easy_setopt(handle, CURLOPT_PROTOCOLS,
arg_protocol == PROTOCOL_FTP ? CURLPROTO_FTP :
arg_protocol == PROTOCOL_SFTP? CURLPROTO_SFTP:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe PROTOCOL_SFTP ? CURLPROTO_SFTP :? Squishing it together like this looks off.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in #219

}
}

/* (void) curl_easy_setopt(handle, CURLOPT_SSL_VERIFYPEER, false); */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It'd be nice to make this into a commandline option, as you suggest. It's useful for debugging independently of this PR.

Copy link
Contributor Author

@elboulangero elboulangero Jun 25, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in #219


/* (void) curl_easy_setopt(handle, CURLOPT_SSL_VERIFYPEER, false); */

/* (void) curl_easy_setopt(handle, CURLOPT_VERBOSE, 1L); */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Didn't do it, as -v is already an option to enable casync verbosity. Maybe -vv to enable curl verbose as well?

Since there's already #194 on the same topic, I would batch that altogether. I have some WIP regarding this issue, but I wanted to wait until this PR is completed and merged before finishing it and opening a PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#217 propagates both options --log-level and -v from casync to casync-http. The cURL verbosity is enabled for log-level notice and higher (i.e. on debug).

do { \
if (ENABLE_LOG_TRACE) \
log_debug("[%d] " fmt, (int) getpid(), ##__VA_ARGS__); \
} while (false)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

log_trace should go in log.h.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gportay will follow up on that

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in commit de2ba61. See my temporary branch mr/casync-http-curl-multi.

I will push-force latter.

queue_push(dl->inprogress, handle);

/* We know there must be something to do, since we just added something. */
c = curl_multi_perform(dl->multi, &running_handles);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems strange to call curl_multi_perform in a loop. It should handle all handles at once, no?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gportay will follow up on that

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@keszybz: indeed curl_multi_perform handles all the handles.

This function handles transfers on all the added handles that need attention in an non-blocking fashion.

I will call the function once, outside the for loop.

for (;;) {
if (quit) {
log_info("Got exit signal, quitting");
r = 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just do return 0.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in commit fdd4689. See my temporary branch mr/casync-http-curl-multi.

I will push-force latter.

}

return c;
return r;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... and remove the this line completely.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in commit fdd4689. See my temporary branch mr/casync-http-curl-multi.

I will push-force latter.

@@ -180,7 +1151,7 @@ static size_t write_index(const void *buffer, size_t size, size_t nmemb, void *u

r = ca_remote_put_index(rr, buffer, product);
if (r < 0) {
log_error("Failed to put index: %m");
log_error_errno(r, "Failed to put index: %m");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This patch would be much easier to read if those fixes (which are independent) were split out into a separate patch.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in #219

@@ -623,9 +1393,6 @@ static int run(int argc, char *argv[]) {
r = process_remote(rr, PROCESS_UNTIL_FINISHED);

finish:
if (curl)
curl_easy_cleanup(curl);

return r;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this cleanup block is not necessary anymore. It would be nice to drop the label, and simply use return instead of goto everywhere above. Could be done as a separate commit if you prefer.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in #219

static curl_off_t arg_rate_limit_bps = 0;
static bool arg_verbose = false;
static curl_off_t arg_rate_limit_bps = 0;
static uint64_t arg_max_active_chunks = MAX_ACTIVE_CHUNKS;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

uint64_t seems a bit over the top. Maybe just make this unsigned?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in #219

Signed-off-by: Arnaud Rebillout <arnaud.rebillout@collabora.com>
This is to prepare the next commit, where we will use the protocol enum
for more than just the protocol passed in arguments, and the arg_ prefix
won't make sense anymore.

Signed-off-by: Arnaud Rebillout <arnaud.rebillout@collabora.com>
This commits brings in two helpers:
- protocol_str() to convert an enum protocol to a string, which is
  useful mainly for logs.
- protocol_status_ok() as a unique place to check if the protocol status
  that we get from libcurl means OK or KO.

Signed-off-by: Arnaud Rebillout <arnaud.rebillout@collabora.com>
Signed-off-by: Arnaud Rebillout <arnaud.rebillout@collabora.com>
It seems to me that the condition PROCESS_UNTIL_WRITTEN is reached when
there's no more data to write, hence ca_remote_has_unwritten() returns
0.

And it also seems that this is could be copy/paste mistake, as all the
code above is similar, but the condition matches the function we call,
ie:
- PROCESS_UNTIL_CAN_PUT_CHUNK > ca_remote_can_put_chunk
- PROCESS_UNTIL_CAN_PUT_INDEX > ca_remote_can_put_index
- PROCESS_UNTIL_CAN_PUT_ARCHIVE > ca_remote_can_put_archive
- PROCESS_UNTIL_HAVE_REQUEST > ca_remote_has_pending_requests

But here, the function returns the opposite of what we want:
- PROCESS_UNTIL_WRITTEN > ca_remote_has_unwritten

Note that I didn't observe any bug due to that, and the test suite
succeeds before and after this patch.

Signed-off-by: Arnaud Rebillout <arnaud.rebillout@collabora.com>
While working on this code, I stumbled on cases where casync got stuck
because we forgot to call PROCESS_UNTIL_WRITTEN here. Well, TBH as long
as we download chunks, we're fine because we end up calling
PROCESS_UNTIL_WRITTEN afterwards.

But in any case, it seems more correct to sync after downloading these
files, and it doesn't hurt.

Signed-off-by: Arnaud Rebillout <arnaud.rebillout@collabora.com>
The way we use curl handle is that we create it once, and then re-use it
again and again, as it's more efficient than re-allocating a new one for
each request.

By looking at the code closely, it turns out that the setup of the curl
handle needs to be done only once, then afterwards we only need to
change the URL in order to re-use the handle.

So this commit brings two helper functions to reflect that:
- make_curl_easy_handle() does the init work and set all the options for
  the handle.
- configure_curl_easy_handle() does the things that are needed in order
  to re-use the handle. In effect, it only sets the URL.

Additionally, this commit introduces curl_easy_cleanupp, in order to use
automatic pointers.

Signed-off-by: Arnaud Rebillout <arnaud.rebillout@collabora.com>
These macros aim to make setting curl options easier.

CURL_SETOPT_EASY() sets the option, and on failure it outputs a generic
error message with the name of the option that failed, and returns -EIO.

The CURL_SETOPT_EASY_CANFAIL() variant does not return, it only outputs
an error message.

Signed-off-by: Arnaud Rebillout <arnaud.rebillout@collabora.com>
This removes the need for the 'finish' label, hence a bunch of goto go
away.

Signed-off-by: Arnaud Rebillout <arnaud.rebillout@collabora.com>
Signed-off-by: Arnaud Rebillout <arnaud.rebillout@collabora.com>
@elboulangero
Copy link
Contributor Author

Hi and sorry for being that long, lately I've been under water due to other projects.

I split all the work that is not really related to multi http request, but more refactoring, into #219, which can be merged to master if it's good enough. It's mostly shuffling code, nothing polemic I think, but I'm sure it will benefit from a round of review / rework.

Due to other tasks ongoing on my side, I won't be able to work further on that, hence let me introduce @gportay who will keep the good job going. I'm still around and will help him catching up.

Thanks for the collaboration and I hope to see u again through more casync PR!

elboulangero and others added 4 commits June 25, 2019 16:59
Signed-off-by: Arnaud Rebillout <arnaud.rebillout@collabora.com>
The goal is to make the run() function more readable, and only outline
the major steps, while the bulk of the work is left to other functions.

Signed-off-by: Arnaud Rebillout <arnaud.rebillout@collabora.com>
This long option does not use a short option (see the optstring in call
of getopt_long).

Use a constant instead, as it is done in casync-tool.c.
This can be useful for testing, if ever we do HTTP2/SSL with a local,
untrusted server.

Signed-off-by: Arnaud Rebillout <arnaud.rebillout@collabora.com>
@elboulangero
Copy link
Contributor Author

It would be great to have some tests for this. It is far from trivial ;) I agree we should switch to one of the "real" implementations, but I'm not sure which one is appropriate. By nghttpd do you mean https://www.nghttp2.org/? That seems to be dead (website from 2015) and is not packaged for Fedora, which would make things difficult for us.

I think nginx is best, as it's real in the sense that it powers the Internet for real, and it's also easy enough to set up for local tests. The only downside I see is that it requires root, and in Debian we build packages and run test suites without root. I don't know about fedora.

I also proposed nghttp2 because it's a dependency of curl (under the form of libnghttp2), so it's already around when we build casync. I'm not familiar with the project though. It seems that it's maintained: https://github.com/nghttp2/nghttp2/graphs/code-frequency.

I think it'd be easier to review this patchset if some parts were split out. E.g. the part to move arg_protocol enum higher could be separated, and it would just make things easier to review. I suggest some other parts to split out inline.

Done in #219

Like @poettering suggested, we want to pull in the list.h implementation from systemd. This will remove a large chunk of this patch too.

@gportay will take care of that.

As for the general approach, I think it's reasonable. According to the documentation, the curl multi interface supports both doing poll internally and integrating into an external event loop. I think it's reasonable to start with the approach you picked. We have the option to switch to an external loop later on if necessary.

Yes indeed. It's very easy with curl, as you basically just call curl_multi_wait() and curl_multi_perform() one after another. It's a bit more difficult with casync though, as you have to understand very well the protocol that the casync remotes speak to be sure that you do the right calls, and go to poll at the right moment after you did everything you could. Otherwise it's easy to get stuck.

gportay and others added 3 commits July 3, 2019 08:35
The file is taken from systemd[1].

Note: The inclusion of the macro header is removed.

[1]: https://raw.githubusercontent.com/systemd/systemd/0d7f7c2fde377d9bf618d16aa230757f956f8191/src/basic/list.h
This commit brings parallel downloads to casync-http, by using the cURL
multi interface, and more precisely the "select()" flavour. For details
see https://curl.haxx.se/libcurl/c/libcurl-multi.html.

The cURL library has two ways to achieve parallel downloads:

    for HTTP/1, it can open parallel connections. The maximum number of
    parallel connections is user-defined, through MAX_HOST_CONNECTIONS
    and MAX_TOTAL_CONNECTIONS.

    for HTTP/2, it can attempt to multiplex on a single connection. The
    maximum number of parallel downloads in this case is negociated
    between the client and the server (we talk about number of streams in
    the HTTP/2 jargon).

Note that libcurl used to do pipelining over HTTP/1.1, but this is no
longer supported since 7.62.0, and casync-http doesn't use it anyway)

Signed-off-by: Arnaud Rebillout <arnaud.rebillout@collabora.com>
Signed-off-by: Gaël PORTAY <gael.portay@collabora.com>
gportay added 6 commits July 4, 2019 14:41
Currently, casync-http handles the option --log-level but casync does
not propagate its verbosity to its protocol helpers.

This commit propagates the --log-level to the protocol helpers as it is
done for option --rate-limit-bps.

Now, the debug messages from the helper casync-http are printed to
stderr if the --log-level is specified to casync.
Enable CURL verbosity for the notice and higher log levels (ie. debug).

When CURLOPT_VERBOSE is enabled, libcurl outputs traces to stderr, such
as headers and a lot of internal information.

Now, the libcurl verbose informations are dumped to stdout when the
options --log-level notice or debug are specified to casync.
This commit introduces a new option --max-active-chunks=<MAX> that
limits the number of simultaneous chunk transfers from the remote.

The MAX number is the sum of:

    1. the number of chunks added to the cURL multi interface, and
    2. chunks downloaded and waiting to be sent to the remote.

It limits the number of chunks stored in memory that are ready to be
sent to the casync process; this limit the memory usage in the situation
where the network is faster than the pipe communication between the
helper and casync.
This commit introduces a new option --max-host-connections=<MAX> that
limits the number of connections to a single host.

See https://curl.haxx.se/libcurl/c/CURLMOPT_MAX_HOST_CONNECTIONS.html.
Currently, casync-http handles the option --trust-peer but casync does
not propagate it to its protocol helpers.

This commit propagates the --trust-peer the protocol helpers as it is
done for option --rate-limit-bps.
The macro log_trace is imported from systemd.
@gportay gportay force-pushed the mr/casync-http-curl-multi branch from 3c1f551 to d8748c9 Compare July 4, 2019 19:12
@gportay
Copy link
Contributor

gportay commented Jul 4, 2019

I have push-force a new version. It is based on #219, with more atomic commits.

@gportay
Copy link
Contributor

gportay commented Jul 18, 2019

@poettering, @keszybz is there any chance you have time to make a review on all PR? I assume you are very busy these days :) Thanks anyway.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

4 participants