Describe the bug
I have noticed that the AWS CPP SDK does not maintain HTTP keepalive with a service when the service responds with an HTTP 4xx or 5xx status code: the connection is closed immediately after such a response is received. The result is a high volume of HTTP reconnects and, in my case, high CPU usage caused by reloading /etc/pki/ssl/certs/ca-bundle.crt for every new connection (made worse by the large number of HTTP 400 responses DynamoDB returns for ConditionalCheckFailed exceptions).
The root cause is a libcurl behavior: when a response with a status code of 300 or above arrives, code in libcurl concludes that the upload was incomplete (see https://github.com/curl/curl/blob/master/lib/http.c#L4128 ). This happens because the AWS SDK CPP provides a CURLOPT_READFUNCTION but not a CURLOPT_POSTFIELDSIZE. Normally that combination would result in a chunked upload, but CurlHttpClient.cpp overrides it by setting the Content-Length and Transfer-Encoding headers explicitly. As a result, once the response has been received, libcurl cannot determine whether the entire request body was sent, so it stops sending and marks the connection for closure. Arguably this is a bug in libcurl, but the fix in the SDK seems straightforward: also pass the body size to libcurl through CURLOPT_POSTFIELDSIZE.
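To make the reasoning concrete, here is a small C++ model of the decision as this report describes it. It is a simplified sketch, not libcurl's code: `UploadState`, `upload_done` and `closes_connection` are hypothetical names, and the model ignores the other conditions libcurl checks (authentication negotiation, 417 retries and so on). It shows why, in this model, a body with a declared size or a chunked body lets the connection survive an error response, while a body of unknown size sent with a hand-written Content-Length does not.

```cpp
// Simplified model of the keep-alive decision described above; NOT libcurl source.
// UploadState, upload_done and closes_connection are hypothetical names, and the
// real code has further conditions (auth negotiation, 417 retries, ...).
#include <cstdint>

struct UploadState {
    std::int64_t declared_size;  // size given via CURLOPT_POSTFIELDSIZE(_LARGE); -1 = unknown
    bool chunked;                // body sent with Transfer-Encoding: chunked
    std::int64_t bytes_sent;     // body bytes already written to the connection
    bool body_exhausted;         // read callback has already returned 0
};

// Can libcurl conclude that the whole request body has gone out?
bool upload_done(const UploadState& s) {
    if (s.declared_size >= 0)
        return s.bytes_sent == s.declared_size;  // known size: compare byte counts
    if (s.chunked)
        return s.body_exhausted;                 // terminating chunk marks the end
    return false;  // unknown size and no chunking (the SDK's case): no way to tell
}

// Would a POST/PUT connection be closed when a response with http_code arrives?
bool closes_connection(int http_code, const UploadState& s, bool keep_sending_on_error) {
    if (http_code < 300)
        return false;               // responses below 300 skip this check
    if (upload_done(s))
        return false;               // body complete: connection stays reusable
    return !keep_sending_on_error;  // "HTTP error before end of send, stop sending"
}
```

In this model the SDK's current configuration (`{-1, false, n, true}`) closes the connection on a 400 unless `keep_sending_on_error` is set, whereas declaring the size (`{n, false, n, true}`) keeps it open.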
On my system the fix reduced CPU usage significantly (roughly 50%) because the CA bundle no longer has to be reloaded for each request, although the gain depends heavily on how many 4xx responses are received.
Expected Behavior
HTTP keepalive should keep working after a 400 response.
Current Behavior
The HTTP connection is dropped after any 3xx, 4xx or 5xx response from a service.
Reproduction Steps
The easiest way to show this is to model the interaction with libcurl directly. The minimal libcurl code that shows the reconnection, modeled after the way CurlHttpClient uses libcurl, is:
Compile it with g++ -o curltest curltest.cpp -lcurl and create a file named small with some random content for it to upload.
The test makes two deliberately invalid requests to DynamoDB, each of which results in a 404. The second request should reuse the connection from the first, but it does not (see the output below) unless the "UNCOMMENT TO FIX" line is uncommented.
Without the fix, this will output:
...
* HTTP error before end of send, stop sending
<
* Closing connection 0
...
* Hostname dynamodb.us-east-1.amazonaws.com was found in DNS cache
* Trying 52.119.226.214:443...
* Connected to dynamodb.us-east-1.amazonaws.com (52.119.226.214) port 443 (#1)
...
Uncommenting the fix produces:
...
* We are completely uploaded and fine
...
* Connection #0 to host dynamodb.us-east-1.amazonaws.com left intact
* Found bundle for host dynamodb.us-east-1.amazonaws.com: 0x7073d0 [serially]
* Can not multiplex, even if we wanted to!
* Re-using existing connection! (#0) with host dynamodb.us-east-1.amazonaws.com
* Connected to dynamodb.us-east-1.amazonaws.com (52.119.226.80) port 443 (#0)
Additional Information/Context
For now I'm using a different workaround: setting CURLOPT_KEEP_SENDING_ON_ERROR, which also keeps the connection open. This curl flag can be set manually without modifying the AWS SDK CPP, but that feels like a bit of a hack.
AWS CPP SDK version used
I used 1.8, but the same code is present in 1.9.
Compiler and Version used
gcc
Operating System and version
Linux, Amazon Linux 2 (AL2)