Recommendation for provenance revision strategy when aggregating policy and data #513
HarshPathakhp asked this question in OPA and Rego
Hi OPA community,
I am posting this discussion to get some insight into the best way to create revisions for aggregated bundles.
Background
When OPA loads multiple bundles, it can run into some problems, as stated in the docs below -
When implementing aggregation, a question arises: what is an appropriate revision for the aggregate bundle, such that it hints at the provenance of the individual bundles? With multiple bundles, each individual bundle has its own revision, which naturally hints at the source of that bundle. For example, if the bundles are generated from commits on GitLab, the revision can be set to the commit hash.
In a centralized, multi-tenant deployment of OPA that serves authorization requests for multiple independent teams, the aggregation of policies is abstracted away from the end users: they need not care that their individual bundles are being aggregated into one in order to make requests to it. In such a scenario, picking an appropriate revision for the aggregated bundle becomes tricky.
I have listed a few options here with their pros and cons. Please let me know your thoughts on the approaches below.
Option 1 - Encode the revision of individual bundles
Consider the following JSON
The JSON represents the revision of each individual bundle. We can then do a base64 encoding and present that as the revision of the aggregate.
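As a minimal sketch of this option (the bundle names and short revisions below are hypothetical placeholders, not taken from a real deployment), the per-bundle revisions could be serialized to compact JSON and base64-encoded, and decoded again by whoever reads the aggregate revision:

```python
import base64
import json
from typing import Dict


def encode_revision(revisions: Dict[str, str]) -> str:
    """Serialize per-bundle revisions to compact, key-sorted JSON and base64-encode it."""
    payload = json.dumps(revisions, sort_keys=True, separators=(",", ":"))
    return base64.b64encode(payload.encode("utf-8")).decode("ascii")


def decode_revision(aggregate_revision: str) -> Dict[str, str]:
    """Recover the per-bundle revisions from an aggregate revision string."""
    return json.loads(base64.b64decode(aggregate_revision.encode("ascii")))


# Hypothetical bundle names and revisions, for illustration only.
revisions = {"team-a/authz": "3f2c1ab", "team-b/billing": "9d8e7f6"}
aggregate_revision = encode_revision(revisions)
```

Sorting the keys is a design choice in this sketch: it makes the aggregate revision depend only on the set of bundle revisions, not on the order in which they were collected.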
Advantage
Since the aggregated bundle revision itself captures the revisions of individual bundles, application teams can find the revision of their concerned bundle(s) when calling data APIs with provenance=true. This will be helpful during diagnosis if some authorization calls don't return proper responses.
Disadvantage
The base64-encoded revision grows as the number of individual bundles increases. In some simulations I ran with 30 individual bundles, the above JSON reaches 1 KB when base64-encoded. This may slightly impact the latency of authorization calls, but it is definitely a problem when versioning and storing the aggregate bundles. A fixed-size revision would have been perfect.
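The growth can be estimated with a rough sketch like the one below. The bundle naming scheme and the 40-character placeholder revision (the length of a full Git SHA-1 hash) are assumptions; the actual size depends on the real bundle names and revision format, so this is not meant to reproduce the 1 KB figure exactly.

```python
import base64
import json


def encoded_revision_length(bundle_count: int, revision_length: int = 40) -> int:
    """Length in characters of the base64-encoded revision map for `bundle_count` bundles."""
    # Hypothetical bundle names and placeholder revisions of a fixed length.
    revisions = {
        f"bundle-{i:02d}": "a" * revision_length for i in range(bundle_count)
    }
    payload = json.dumps(revisions, separators=(",", ":")).encode("utf-8")
    return len(base64.b64encode(payload))


for count in (1, 10, 30):
    print(count, encoded_revision_length(count))
```

Base64 expands its input by roughly a third, so the encoded revision grows linearly with the number of bundles.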
Option 2 - Use a fixed size revision string and store actual revisions elsewhere
To solve the versioning problem, we can use fixed-size revisions, but then end users cannot decode the aggregate revision. They will have to refer to some database to get the individual bundle revisions that actually matter to them. The disadvantages are clear. First, there is resource overhead (an additional database for storing the revisions). Further, the abstraction I talked about previously is violated. Also, end users need to make an extra hop to get their bundle revisions, which some may object to on principle (if teams are concerned with just their own bundle revisions, why should they need an extra hop to get them?).
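One way this could look, as a sketch only: derive the fixed-size revision from a SHA-256 hash of the revision map, and record the mapping in a separate store. The use of SHA-256, the in-memory dict standing in for the database, and all function and bundle names here are assumptions for illustration.

```python
import hashlib
import json
from typing import Dict

# Stand-in for the external database; a real deployment would use a persistent store.
revision_store: Dict[str, Dict[str, str]] = {}


def fixed_size_revision(revisions: Dict[str, str]) -> str:
    """Hash the key-sorted revision map so the aggregate revision is always 64 hex characters."""
    canonical = json.dumps(revisions, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def publish_aggregate(revisions: Dict[str, str]) -> str:
    """Record the per-bundle revisions under the aggregate revision and return it."""
    aggregate = fixed_size_revision(revisions)
    revision_store[aggregate] = dict(revisions)
    return aggregate


def lookup_bundle_revision(aggregate: str, bundle_name: str) -> str:
    """The extra hop: resolve one bundle's revision from the aggregate revision."""
    return revision_store[aggregate][bundle_name]


# Hypothetical bundle names and revisions, for illustration only.
aggregate = publish_aggregate({"team-a/authz": "3f2c1ab", "team-b/billing": "9d8e7f6"})
```

Because the store keeps an entry for every aggregate revision ever published, older aggregates remain resolvable for audit purposes.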
Option 3 - Use a fixed size revision string and store actual revisions in manifest metadata.
Here we do away with the additional database. However, the extra hop problem remains, as users will have to query the /v1/data/system endpoint on OPA.
Further, if we choose this over Option 2, we lose the ability to view the individual bundle revisions of older aggregate bundles as part of an audit.
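A sketch of what the aggregate bundle's manifest might contain under this option is shown below. The `revision`, `roots`, and `metadata` fields are part of the OPA bundle manifest format; the `bundle_revisions` key inside `metadata`, the function name, and the bundle names are hypothetical.

```python
import json
from typing import Dict, List


def build_manifest(
    aggregate_revision: str, roots: List[str], bundle_revisions: Dict[str, str]
) -> dict:
    """Build the content of a .manifest file that carries per-bundle revisions as metadata."""
    return {
        "revision": aggregate_revision,
        "roots": sorted(roots),
        # "bundle_revisions" is a hypothetical key chosen for this sketch.
        "metadata": {"bundle_revisions": dict(bundle_revisions)},
    }


# Hypothetical values, for illustration only.
manifest = build_manifest(
    aggregate_revision="agg-0001",
    roots=["team_b/billing", "team_a/authz"],
    bundle_revisions={"team-a/authz": "3f2c1ab", "team-b/billing": "9d8e7f6"},
)
manifest_json = json.dumps(manifest, indent=2)
```

Since only the currently loaded manifest is available from a running OPA, the metadata of earlier aggregates would have to be archived separately to support audits.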
Overall, this seems to me like a complicated problem, which has arisen because OPA is being used in a centralized, multi-tenant environment, something it is not designed for. However, as open-policy-agent/opa#6166 shows, many people prefer to use OPA as a centralized cluster for audit preparedness and other use cases.
I am leaning toward Option 2. As always, many real-world problems have no perfect solution, but I am happy to read advice from others in the community. Thanks!