-
Notifications
You must be signed in to change notification settings - Fork 897
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Define syntax for escaping declarative config env var substitution #4375
base: main
Are you sure you want to change the base?
Define syntax for escaping declarative config env var substitution #4375
Conversation
Why Also doesn't this break being able to do something like |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approving despite my comments since $$
is what the collector uses and need for $${DOLLARS}
seem unnecessary in an SDK configuration :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The escape sequence is not fully specified.
If I want ${API_KEY}
in the environment variable value, this says it should be encoded as $${API_KEY}
, ok.
If I want $${API_KEY}
in the value, how should this be encoded ?
If '$' is used to as an escape character, then the spec must have a well defined way to print a $
non escaping character, so that any string containing $
can be represented.
Resolution could be to say that |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we should do it simply in the same way as docker compose
. See https://docs.docker.com/reference/compose-file/interpolation/
You can use a
$$
(double-dollar sign) when your configuration needs a literal dollar sign.
Is also looks that it is what OTel Collector is doing as well:
This is the syntax proposed in this PR. |
I disagree. According to this PR something like |
Yes exactly. Matching collector syntax is the motivation here.
I see. That's fair. When I was writing this PR I did look at the collector's language, which is similar to docker's, and which I found to be overly simplistic:
Its missing guidance on how For example, suppose we have: If we perform env env vars substitution after Alternatively, env var substitution might take place before So You also gave an example of
My understanding is that the regular expression / logic as proposed in this PR will indeed resolve The reason is that the regular expression I added an additional test case to my java prototype implementation of this. Note - no changes to the implementation were required to make this test pass. More examples in the specification might be helpful. |
Put this comment in the wrong repo: Oh, now I see the collector uses |
@jack-berg, I find the Collector and and Docker Compose docs clearer.
This will provide different experience for users who have experience with Collector and Docker Compose who are used to always type I agree with the intention to match with Collector (and Docker Compose) syntax 👍 |
Its seems clear on paper until you try to implement it. The docker and collector specs are incomplete. Why does If you try to implement this, and treat I would prefer to match the algorithm of the collector, but looking for advice on how to describe this in such a way that 11 language implementations will get it right. Despite the seemingly simple syntax as described in docs, the collector's substitution logic is quite complex / nuanced. The current env var substitution allows for simpler implementations that can leverage regular expressions to do the heavy lifting. I considered trying to get the both of both worlds (i.e. collector behavior and a regular expression based spec), by changing the regular expression to match:
I.e. make the whole
I'm actually not sure there's a way to accomplish the collector's behavior without requiring implementations to carefully recreate its complex recursive algorithm which statefully walks values, character by character. (Not sure how I would approach trying to to specify this.) And I think a regular expression based approach is essential to having a low enough complexity bar to implement. |
Docker composer also does it recursively: https://github.com/compose-spec/compose-go/blob/main/interpolation/interpolation.go
I think "specification by example" will help achieving it. Meaning specifying expected results for given input.
Maybe this would help: https://github.com/compose-spec/compose-go/blob/main/template/template.go
As far as I remember, I was calling it out long ago during some SIG meeting that env var substitution is complex. I think that I even said that each language should have fuzz tests for such parsing because of its complexity. I was proposing that the users could simply use However, the feedback was that the Configuration SIG wants to try (challenge) it given the assumption that users need it. If this won't be working like in OTel Collector then I think we will be giving bad user experience. Given that OTel Collector is still not stable, the alternative way could be simplifying the env var substitution in both Collector config and SDK config. |
We could also provide a seperate OTel tool which does the env var substitution in the same way as the OTel Collector. |
It doesn't have to be complex. With a regex-based specification, the java implementation is only 28 lines long, plus a little more boiler plate to make sure that logic is invoked by the YAML parser.
We actually already have this. The configuration validator is a tool written in go, and available as a binary or docker container, which accepts a target YAML config file and:
The problem is, not all ecosystems will want or be able to bundle in a binary specify to a particular platform / architecture. So this tool is available to help implementations, but we cannot rely on it being used everywhere.
As of this PR, the differences I know of are:
So there are already differences between collector and language implementations. Its probably safe to say that the env var substitution for declarative config is a subset of the collector. And this seems fine to me. Especially if its a strict subset (i.e. declarative config takes care not to introduce new things the collector will not add). Not sure if @open-telemetry/collector-maintainers would be open to it, but it doesn't seem like much is lost by removing generic support for replacing I get the appeal "docker does this" argument, but docker env var substitution breaks from collector substitution and declarative configuration is other key ways. All try to have a syntax inspired by shell-style substitution, but each supports shell-style to varying extents, and the collector extends it with the provider concept which is distinctly different. |
I see. This is not trivial when reading the regexp that checks for 1 or 2 The Having examples so that every language implements the proper behavior is critical here. |
This is not how parser works, the second dollar character can not be used to reduce two different rules at the same time. Either
Or
This is a typical shift/reduce conflict in the grammar that needs to be resolved by reformulating the grammar (so that The nasty part here is to convey sufficient information in the spec without having to produce a full parser grammar and reference litterature. Proposal: (1) The dollar character is used as an escape sequence, so that (2) Expression matching the regular expression: ${(?:env:)?(?<ENV_NAME>[a-zA-Z_][a-zA-Z0-9_])(:-(?<DEFAULT_VALUE>[^\n]))?} are replaced with an environment variable substitution (3) The yaml input string is consumed from left to right, with rule (1) taking precedence other rule (2). Data printed in the output string does not participates in further parsing. Examples, with FOO=BAR (a)
(b)
(c)
(d)
|
I created a separate issue: #4384 Regarding this PR I just expect the same env var substitution behavior from all OTel Components. |
Nice proposal @marcalff - I implemented this in my prototype PR in this commit. Its only slightly more complex that a pure regex based approach.
I agree, but I think your proposal balances it nicely. Let me take a shot at updating the text in this PR accordingly. |
I agree with @marcalff's proposal at #4375 (comment) as this is the collector's current behavior. Additionally, the collector will escape See https://github.com/open-telemetry/opentelemetry-collector/blob/main/confmap/internal/e2e/expand_test.go as a good list of examples for how the collector handles expansion/escaping (input for the test can be found here). If you're wondering how the collector handles a specific scenario that is not already covered in those tests let me know and I'll add it. |
Dissmissing as I will be OOO next week and I see that this in going into the right direction
Dismissing review to avoid blocking, will take a look at changes.
Ok I've pushed a commit to reflect @marcalff's proposal, including some pseudocode of what an implementation might look like. I've also converted the examples section to a markdown table and added more examples per this conversation. The re-org improves readability as the set of examples grows. Let me know what you think. |
@@ -75,6 +75,29 @@ more non line break characters (i.e. any character except `\n`). If a referenced | |||
environment variable is not defined and does not have a `DEFAULT_VALUE`, it MUST | |||
be replaced with an empty value. | |||
|
|||
The `$` character is an escape sequence, such that `$$` in the input is |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TODO: add changelog entry before merging
@@ -75,6 +75,29 @@ more non line break characters (i.e. any character except `\n`). If a referenced | |||
environment variable is not defined and does not have a `DEFAULT_VALUE`, it MUST | |||
be replaced with an empty value. | |||
|
|||
The `$` character is an escape sequence, such that `$$` in the input is | |||
translated to a single `$` in the output. The resolved `$` from an escape | |||
sequence MUST not be considered when matching input against the environment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
MUST NOT ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think (correct me if I'm wrong) this is a critical part of the mechanics:
Assuming API_KEY=123
.
If the resolved $
is not considered when matching against environment variables, we have:
"$${API_KEY}"
=> "$" + resolveEnvVars("{API_KEY}")
=> "${API_KEY}"
If instead, the resolve $
is considered when matching against environment variables, we could have something like:
"$${API_KEY}"
=> "$" + "{API_KEY}"
=> resolveEnvVars("${API_KEY}")
=> "123"
.
In other words, without this, an parser might think that it can do the substitution in two separate passes. The first pass replacing $$
with $
. The second pass replacing env var substitution references. A two pass approach produces incorrect results.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess Marc may simply had this in mind:
sequence MUST not be considered when matching input against the environment | |
sequence MUST NOT be considered when matching input against the environment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, sorry about the short comment. I meant to write MUST NOT in uppercase, per pellared change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
…y-specification into escape-env-var-substitution
Resolves #3914.
An environment variable substitution reference which starts with
$$
(i.e.$${API_KEY}
) is escaped. Implementations should strip the leading$
character and not perform substitution (i.e.$${API_KEY} => ${API_KEY}
).See java implementation here: open-telemetry/opentelemetry-java#7033