Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incorrect chunked encoding with 0-length underlying stream #3259

Open
1 task done
pitrou opened this issue Jan 20, 2025 · 10 comments
Open
1 task done

Incorrect chunked encoding with 0-length underlying stream #3259

pitrou opened this issue Jan 20, 2025 · 10 comments
Labels
bug This issue is a bug. needs-triage This issue or PR still needs to be triaged.

Comments

@pitrou
Copy link

pitrou commented Jan 20, 2025

Describe the bug

On the Apache Arrow CI, newer versions of the AWS SDK lead to test failures against Minio:
apache/arrow#45304

A network traffic capture exhibits a problem around chunked encoding when the body is 0 bytes. The AWS SDK sends this request:

PUT /bucket/emptydir/ HTTP/1.1
Host: 127.0.0.1:39491
Accept: */*
amz-sdk-invocation-id: B545DDA1-9E33-4987-B074-2B147AB8BD75
amz-sdk-request: attempt=1
authorization: AWS4-HMAC-SHA256 Credential=minio/20250120/us-east-1/s3/aws4_request, SignedHeaders=amz-sdk-invocation-id;amz-sdk-request;content-encoding;content-type;host;transfer-encoding;x-amz-content-sha256;x-amz-date;x-amz-decoded-content-length;x-amz-trailer, Signature=5be6a95be231511f69fdc161843bbe83413ba52540e1607bc9de36ea42a827c5
content-encoding: aws-chunked
content-type: binary/octet-stream
transfer-encoding: chunked
user-agent: aws-sdk-cpp/1.11.488 ua/2.1 api/S3 os/Linux#5.15.0-130-generic lang/c++#C++11 md/aws-crt#0.29.9-dev+d5a3907e md/arch#x86_64 md/GCC#13.3.0
x-amz-content-sha256: STREAMING-UNSIGNED-PAYLOAD-TRAILER
x-amz-date: 20250120T094325Z
x-amz-decoded-content-length: 0
x-amz-trailer: x-amz-checksum-crc64nvme
Expect: 100-continue

To which Minio responds this:

HTTP/1.1 400 Bad Request
Accept-Ranges: bytes
Content-Length: 331
Content-Type: application/xml
Server: MinIO
Strict-Transport-Security: max-age=31536000; includeSubDomains
Vary: Origin
Vary: Accept-Encoding
X-Amz-Id-2: a4f5403139a65a8fd51b2aa0caa208291435f782459a282182de2ddf9eec98bf
X-Amz-Request-Id: 181C5D5BF688A7B0
X-Content-Type-Options: nosniff
X-Ratelimit-Limit: 11618
X-Ratelimit-Remaining: 11618
X-Xss-Protection: 1; mode=block
Date: Mon, 20 Jan 2025 09:43:25 GMT
Connection: close

<?xml version="1.0" encoding="UTF-8"?>
<Error><Code>BadRequest</Code><Message>malformed chunked encoding</Message><Key>emptydir/</Key><BucketName>bucket</BucketName><Resource>/bucket/emptydir/</Resource><RequestId>181C5D5BF688A7B0</RequestId><HostId>a4f5403139a65a8fd51b2aa0caa208291435f782459a282182de2ddf9eec98bf</HostId></Error>

The HTTP 1.1 RFC around chunked encoding makes it clear that a 0-size trailing chunk is mandatory, but the AWS SDK doesn't send one.

Regression Issue

  • Select this option if this issue appears to be a regression.

Expected Behavior

The AWS SDK should send a 0-size trailing chunk even if the underlying stream is empty.

Current Behavior

The AWS SDK sends an empty request body even though it declares the transfer-encoding as chunked, which is incorrect.

Reproduction Steps

Typical reproduction step:

    Aws::S3::Model::PutObjectRequest req;
    req.SetBucket(ToAwsString("bucket"));
    req.SetKey(ToAwsString("emptydir/"));
    req.SetBody(std::make_shared<std::stringstream>(""));
    s3_client->PutObject(req);

(with s3_client configured to access a Minio endpoint)

Possible Solution

No response

Additional Information/Context

No response

AWS CPP SDK version used

1.1.488

Compiler and Version used

Not sure, but likely irrelevant

Operating System and version

Ubuntu 22.04

@pitrou pitrou added bug This issue is a bug. needs-triage This issue or PR still needs to be triaged. labels Jan 20, 2025
@pitrou
Copy link
Author

pitrou commented Jan 20, 2025

@sbiscigl This may be related to your changes here: #3186

@pitrou
Copy link
Author

pitrou commented Jan 20, 2025

I think the problem is quite likely in the code below:

size_t BufferedRead(char *dst, size_t amountToRead) {
assert(dst != nullptr);
// only read and write to chunked stream if the underlying stream
// is still in a valid state
if (m_stream->good()) {
// Try to read in a 64K chunk, if we cant we know the stream is over
m_stream->read(m_data.GetUnderlyingData(), DataBufferSize);
size_t bytesRead = static_cast<size_t>(m_stream->gcount());
writeChunk(bytesRead);
// if we've read everything from the stream, we want to add the trailer
// to the underlying stream
if ((m_stream->peek() == EOF || m_stream->eof()) && !m_stream->bad()) {
writeTrailerToUnderlyingStream();
}
}

The gotcha is that good() returns false if EOF is reached. So, if the underlying stream is empty, writeTrailerToUnderlyingStream is never called.

Replacing the m_stream->good() condition with !m_stream->bad() might be sufficient to fix the issue.

@pitrou
Copy link
Author

pitrou commented Jan 20, 2025

Also, in the reproducer above:

    Aws::S3::Model::PutObjectRequest req;
    req.SetBucket(ToAwsString("bucket"));
    req.SetKey(ToAwsString("emptydir/"));
    req.SetBody(std::make_shared<std::stringstream>(""));
    s3_client->PutObject(req);

removing the SetBody call avoids the issue altogether (but of course, in the general case, one is not sending a constant string :-)).

@sbiscigl
Copy link
Contributor

sbiscigl commented Jan 20, 2025

Fwiw we have this integration test to try to test the case and its currently running and succeeding in AWS S3 so im curious, are you seeing issues with AWS S3 or just minio, or other 3rd party api compatible services.

also its related to Announcement: S3 default integrity change #3253 where we now send trailing checksums as a default. We extensively tested with aws s3 but usage with third party may vary. the RFC seems fairly clear here though and minio does support all of this, so let me try your solution out, it makes sense.

@pitrou
Copy link
Author

pitrou commented Jan 20, 2025

Fwiw we have this integration test to try to test the case and its currently running and succeeding in AWS S3 so im curious, are you seeing issues with AWS S3 or just minio, or other 3rd party api compatible services.

As the issue description says, we found this out with Minio as a server.

Also, regarding https://github.com/aws/aws-sdk-cpp/blob/main/tests/aws-cpp-sdk-s3-integration-tests/BucketAndObjectOperationTest.cpp#L2390-L2410 : as pointed out in #3259 (comment) above, you would need to add a SetBody to trigger the issue anyway.

@sbiscigl
Copy link
Contributor

sbiscigl commented Jan 20, 2025

as pointed out in #3259 (comment) above, you would need to add a SetBody to trigger the issue anyway.

Yep seeing that now, creating another test, pushing a fix out, will have it ready momentarily.

The HTTP 1.1 RFC around chunked encoding makes it clear that a 0-size trailing chunk is mandatory, but the AWS SDK doesn't send one.
BadRequestmalformed chunked encodingemptydir/bucket/bucket/emptydir/181C5D5BF688A7B0a4f5403139a65a8fd51b2aa0caa208291435f782459a282182de2ddf9eec98bf

This isnt exactly the issue, the issue is that we were sending one zero sized "chunk" then one trailing zero. i.e. the chunked stream would look like

0\r\n\r\n0\r\nx-amz-checksum-crc32:AAAAAA==\r\n\r\n

where as it should look like

0\r\nx-amz-checksum-crc32:AAAAAA==\r\n\r\n

also the term "chunk" here is overloaded, the error in question is referring to aws-chunked NOT transfer encoding chunked which is a little confusing.

still malformed though, missed this test case, my bad, adding the test and pushing out a fix momentarily

@pitrou
Copy link
Author

pitrou commented Jan 20, 2025

also the term "chunk" here is overloaded, the error in question is referring to aws-chunked NOT transfer encoding chunked which is a little confusing.

I see, but the client is sending the header "transfer-encoding: chunked" anyway :) Are both happening at the same time?

@sbiscigl
Copy link
Contributor

Are both happening at the same time?

Yeah both are happening at the same time, libcurl is handling transfer-encoding: chunked header, we just give it bytes in a callback. those bytes that we give it the callback are aws-chunk encoded, which is essentially double buffered and encoded into a underlying steam per the S3 spec.

A while back we actually had a nasty bug where transfer-encoding: chunked was tied to aws-chunked that lead to errors because libcurls callback doesnt guarantee more than 8kb where as aws-chunked requires that.

so hopefully that gives a little bit of a back history of whats going on there. fix coming soon though, finishing up some tests.

@sbiscigl
Copy link
Contributor

fix merged, will be tagged with todays relesase, give a shout if you dont see it work as intended

@h-vetinari
Copy link

Thanks a lot for the super-quick handling!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug This issue is a bug. needs-triage This issue or PR still needs to be triaged.
Projects
None yet
Development

No branches or pull requests

3 participants