Argo S3 Artifacts Failing in v3.6.2 #14021
Labels
area/artifacts (S3/GCP/OSS/Git/HDFS etc.)
solution/workaround (there's a workaround; it might not be great, but it exists)
type/bug
type/regression (regression from previous behavior, a specific type of bug)
Pre-requisites
I have tested with the :latest image tag (i.e. quay.io/argoproj/workflow-controller:latest) and can confirm the issue still exists on :latest. If not, I have explained why, in detail, in my description below.
What happened? What did you expect to happen?
Background
We've been using Argo Workflows in production for a while and recently upgraded to v3.6.2 in a test environment. With the new version, S3 artifacts suddenly stopped working.
Our first workflow step uploads output artifacts back to S3, which worked fine in v3.5.11. After the upgrade, the first step still runs our code successfully, but the Emissary container fails with the following error:
I've double-checked our workflow artifact configuration, and we do supply the region correctly:
Problem
I traced this through the code, and it appears that Argo first takes the S3 artifact config we supply in the workflow and builds an S3 driver using our region (here). That region is then fed into the client options (here) and passed on to the STS client (here).
The STS client (used to assume IAM roles, which provide authentication to AWS services) unpacks the region from these options (here) and uses it to resolve the endpoint of the STS server.
The failure occurs during this endpoint resolution step, where the STS SDK throws this error.
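The chain described above can be made concrete with a minimal Go sketch. The types S3ArtifactConfig and clientOptions and the functions newClientOptions and resolveSTSEndpoint are hypothetical stand-ins, not Argo's or the AWS SDK's actual code, and the regional hostname format is used only for illustration. The sketch shows that endpoint resolution depends on the region surviving every hop, and fails if it is lost somewhere along the way.

```go
package main

import (
	"errors"
	"fmt"
)

// S3ArtifactConfig is a hypothetical stand-in for the S3 artifact
// settings supplied in the workflow; only the region matters here.
type S3ArtifactConfig struct {
	Bucket string
	Region string
}

// clientOptions is a hypothetical stand-in for the options the S3
// driver builds and hands to the STS client.
type clientOptions struct {
	Region string
}

// newClientOptions copies the region from the artifact config into the
// client options, mirroring the driver -> options hop.
func newClientOptions(cfg S3ArtifactConfig) clientOptions {
	return clientOptions{Region: cfg.Region}
}

// resolveSTSEndpoint is a simplified model of endpoint resolution: it
// needs a region to build a regional STS hostname and fails without one.
func resolveSTSEndpoint(opts clientOptions) (string, error) {
	if opts.Region == "" {
		return "", errors.New("sts: cannot resolve endpoint, region is not set")
	}
	return fmt.Sprintf("https://sts.%s.amazonaws.com", opts.Region), nil
}

func main() {
	cfg := S3ArtifactConfig{Bucket: "my-bucket", Region: "us-west-2"}

	// Region carried through every hop: resolution succeeds.
	ep, err := resolveSTSEndpoint(newClientOptions(cfg))
	fmt.Println(ep, err)

	// Region lost somewhere in the chain: resolution fails.
	_, err = resolveSTSEndpoint(clientOptions{})
	fmt.Println(err)
}
```

The point of the model is only that an empty region at the final hop is enough to break resolution, regardless of which earlier hop drops it.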
Solution
To test my theory that the STS client within the Emissary executor was not resolving the AWS region correctly, I manually set this env var in the Emissary container through the workflow-controller-configmap:
This immediately fixed the issue, and my artifacts started appearing in S3. It works because the AWS SDK falls back on the standard AWS environment variables when the regular configuration doesn't provide the metadata the SDK needs to determine a service's endpoint and credentials.
Version(s)
v3.6.2
Paste a minimal workflow that reproduces the issue. We must be able to run the workflow; don't enter a workflow that uses private images.
Logs from the workflow controller
If needed, I can recreate the failure to collect these logs, but I don't think they would be of much use.