Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GH-36770: [C++] Use custom endpoint for s3 using environment variable AWS_ENDPOINT_URL #36791

Merged
merged 1 commit into from
Jul 21, 2023

Conversation

adbmal
Copy link
Contributor

@adbmal adbmal commented Jul 20, 2023

Rationale for this change

we need a way to read custom object storage (such as minio host or other s3-like storage).
use environment variable AWS_ENDPOINT_URL

What changes are included in this PR?

set variable endpoint_override according the environment variable

Are these changes tested?

unittest and tested on pyarrow

Are there any user-facing changes?

No

@github-actions
Copy link

Thanks for opening a pull request!

If this is not a minor PR. Could you open an issue for this pull request on GitHub? https://github.com/apache/arrow/issues/new/choose

Opening GitHub issues ahead of time contributes to the Openness of the Apache Arrow project.

Then could you also rename the pull request title in the following format?

GH-${GITHUB_ISSUE_ID}: [${COMPONENT}] ${SUMMARY}

or

MINOR: [${COMPONENT}] ${SUMMARY}

In the case of PARQUET issues on JIRA the title also supports:

PARQUET-${JIRA_ISSUE_ID}: [${COMPONENT}] ${SUMMARY}

See also:

@adbmal adbmal changed the title [C++] Use custom endpoint for s3 using AWS_ENDPOINT [C++] Use custom endpoint for s3 using environment variable AWS_ENDPOINT Jul 20, 2023
@adbmal adbmal changed the title [C++] Use custom endpoint for s3 using environment variable AWS_ENDPOINT GH-36770: [C++] Use custom endpoint for s3 using environment variable AWS_ENDPOINT Jul 20, 2023
@github-actions
Copy link

⚠️ GitHub issue #36770 has been automatically assigned in GitHub to PR creator.

Copy link
Member

@kou kou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add a test to cpp/src/arrow/filesystem/s3fs_test.cc?

@@ -337,6 +338,10 @@ Result<S3Options> S3Options::FromUri(const Uri& uri, std::string* out_path) {
} else {
options.ConfigureDefaultCredentials();
}
auto endpoint_env = arrow::internal::GetEnvVar(kDefaultEndpointEnvVar);
if(endpoint_env.ok()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if(endpoint_env.ok()) {
if (endpoint_env.ok()) {

@@ -127,6 +127,7 @@ using internal::ToAwsString;
using internal::ToURLEncodedAwsString;

static const char kSep = '/';
constexpr char kDefaultEndpointEnvVar[] = "AWS_ENDPOINT";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about using AWS_ENDPOINT_URL instead of AWS_ENDPOINT because aws-sdk-cpp will support AWS_ENDPOINT_URL aws/aws-sdk-cpp#2587 ?
(How about accepting AWS_ENDPOINT_URL in arrow-rs too?)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you use the same name for constant name?
For example, kAwsEndpointEnvVar for this case.

@@ -337,6 +338,10 @@ Result<S3Options> S3Options::FromUri(const Uri& uri, std::string* out_path) {
} else {
options.ConfigureDefaultCredentials();
}
auto endpoint_env = arrow::internal::GetEnvVar(kDefaultEndpointEnvVar);
if(endpoint_env.ok()) {
options.endpoint_override = endpoint_env.MoveValueUnsafe();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we simplify this?

Suggested change
options.endpoint_override = endpoint_env.MoveValueUnsafe();
options.endpoint_override = *endpoint_env;

@github-actions github-actions bot added awaiting changes Awaiting changes awaiting change review Awaiting change review and removed awaiting review Awaiting review awaiting changes Awaiting changes labels Jul 20, 2023
@adbmal
Copy link
Contributor Author

adbmal commented Jul 21, 2023

@kou thanks for your comments, I updated PR, please review again.

Copy link
Member

@kou kou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

Could you update the PR title and description before we merge this?

@github-actions github-actions bot added awaiting merge Awaiting merge and removed awaiting change review Awaiting change review labels Jul 21, 2023
@adbmal adbmal changed the title GH-36770: [C++] Use custom endpoint for s3 using environment variable AWS_ENDPOINT GH-36770: [C++] Use custom endpoint for s3 using environment variable AWS_ENDPOINT_URL Jul 21, 2023
@github-actions
Copy link

⚠️ GitHub issue #36770 has been automatically assigned in GitHub to PR creator.

1 similar comment
@github-actions
Copy link

⚠️ GitHub issue #36770 has been automatically assigned in GitHub to PR creator.

@adbmal
Copy link
Contributor Author

adbmal commented Jul 21, 2023

+1

Could you update the PR title and description before we merge this?

updated, thanks @kou

@kou kou merged commit f43bfd6 into apache:main Jul 21, 2023
@kou kou removed the awaiting merge Awaiting merge label Jul 21, 2023
@kou
Copy link
Member

kou commented Jul 21, 2023

Thanks.
I've merged this. Could you add take only comment to #36770 to assign it to you?

@kou
Copy link
Member

kou commented Jul 21, 2023

@assignUser We have #36791 (comment) message but we couldn't assign #36770 to abdmal automatically.

@js8544
Copy link
Collaborator

js8544 commented Jul 21, 2023

@assignUser We have #36791 (comment) message but we couldn't assign #36770 to abdmal automatically.

Hi @kou, This is due to #34389, they will have to comment on the issue first.

@kou
Copy link
Member

kou commented Jul 21, 2023

Ah! Thanks!

@conbench-apache-arrow
Copy link

After merging your PR, Conbench analyzed the 6 benchmarking runs that have been run so far on merge-commit f43bfd6.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details.

R-JunmingChen pushed a commit to R-JunmingChen/arrow that referenced this pull request Aug 20, 2023
…riable AWS_ENDPOINT_URL (apache#36791)

### Rationale for this change
we need a way to read custom object storage (such as minio host or other s3-like storage).
use environment variable `AWS_ENDPOINT_URL `

### What changes are included in this PR?
set variable endpoint_override according the environment variable

### Are these changes tested?
unittest and tested on pyarrow

### Are there any user-facing changes?
No

* Closes: apache#36770

Authored-by: yiwei.wang <yiwei.wang@weride.ai>
Signed-off-by: Sutou Kouhei <kou@clear-code.com>
loicalleyne pushed a commit to loicalleyne/arrow that referenced this pull request Nov 13, 2023
…riable AWS_ENDPOINT_URL (apache#36791)

### Rationale for this change
we need a way to read custom object storage (such as minio host or other s3-like storage).
use environment variable `AWS_ENDPOINT_URL `

### What changes are included in this PR?
set variable endpoint_override according the environment variable

### Are these changes tested?
unittest and tested on pyarrow

### Are there any user-facing changes?
No

* Closes: apache#36770

Authored-by: yiwei.wang <yiwei.wang@weride.ai>
Signed-off-by: Sutou Kouhei <kou@clear-code.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[C++] Use custom url for s3 using AWS_ENDPOINT_URL
3 participants