Recommendation for provenance revision strategy when aggregating policy and data #513

HarshPathakhp · 2023-11-09T07:30:25Z


Hi OPA community,
I am posting this discussion to get some insight into the best way to create revisions for aggregated bundles.

Background

When OPA loads multiple bundles, it can run into problems, as the docs state:

We recommend that whenever possible, you implement policy and data aggregation centrally, however, in some cases that’s not possible (e.g., due to latency requirements.). When using multiple sources there are no ordering guarantees for which bundle loads first and takes over some root. If multiple bundles conflict, but are loaded at different times, OPA may go into an error state. It is highly recommended to use the health check and include bundle state: Monitoring OPA

When implementing aggregation, a question arises: what revision should the aggregate bundle carry so that it hints at the provenance of the individual bundles? When OPA loads multiple bundles separately, each bundle has its own revision, which naturally points to the bundle's source. For example, if the bundles are generated from commits on GitLab, the revision can be set to the commit hash.

In a centralized, multi-tenant deployment of OPA that serves authorization requests for multiple independent teams, policy aggregation is abstracted away from end users: they need not know that their individual bundles are being merged into one in order to make requests against it. In such a scenario, picking an appropriate revision for the aggregated bundle becomes tricky.

I have listed a few options below with their pros and cons. Please let me know your thoughts on these approaches.

Option 1 - Encode the revision of individual bundles

Consider the following JSON:

{
  "bundle1": "abcdef",
  "bundle2": "efgh12",
  "bundle3": "zxygjk"
}

The JSON maps each individual bundle to its revision. We can base64-encode it and use the result as the revision of the aggregate bundle.
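A minimal sketch of that encoding in Python, to make the round trip concrete. The function names are hypothetical, and the choice of sorted keys, compact separators, and the URL-safe base64 alphabet is an assumption made here so that the same set of revisions always yields the same string; none of it is prescribed by OPA.

```python
import base64
import json


def encode_revision(revisions: dict[str, str]) -> str:
    """Pack per-bundle revisions into a single base64 string."""
    # Sort keys and drop whitespace so identical input always gives identical output.
    payload = json.dumps(revisions, sort_keys=True, separators=(",", ":"))
    return base64.urlsafe_b64encode(payload.encode("utf-8")).decode("ascii")


def decode_revision(aggregate_revision: str) -> dict[str, str]:
    """Recover the per-bundle revisions from the aggregate revision."""
    payload = base64.urlsafe_b64decode(aggregate_revision.encode("ascii"))
    return json.loads(payload)


revisions = {"bundle1": "abcdef", "bundle2": "efgh12", "bundle3": "zxygjk"}
aggregate = encode_revision(revisions)
print(aggregate)
print(decode_revision(aggregate)["bundle2"])
```

A team that only cares about bundle2 would decode the aggregate revision it receives and read its own key, without needing any other service.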

Advantage

Since the aggregated bundle revision itself captures the revisions of the individual bundles, application teams can find the revision of the bundle(s) they care about when calling the data APIs with provenance=true. This helps during diagnosis when some authorization calls don't return the expected responses.

Disadvantage

The base64-encoded revision grows with the number of individual bundles. In some simulations I ran with 30 individual bundles, the above JSON reached 1 KB once base64-encoded. This may slightly impact the latency of authorization calls, but it is definitely a problem when versioning and storing the aggregate bundles. A fixed-size revision would have been ideal.

Option 2 - Use a fixed size revision string and store actual revisions elsewhere

To solve the versioning problem we can use fixed-size revisions, but then end users cannot decode the aggregate revision. They will have to consult a database to get the individual bundle revisions that actually matter to them. The disadvantages are clear. First, there is resource overhead (an additional database for storing the revisions). Further, the abstraction I described earlier is violated. Finally, end users need to make an extra hop to get their bundle revisions, which some may object to on principle (if teams care only about their own bundle revisions, why should they need an extra hop to get them?).
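One way to picture this option, as a sketch under stated assumptions: the fixed-size revision is taken to be a SHA-256 digest of the revision map (any fixed-length identifier would do), and an in-memory dict stands in for the external database. All names here are hypothetical.

```python
import hashlib
import json

# Stand-in for the external database; a real deployment would use a persistent store.
revision_store: dict[str, dict[str, str]] = {}


def fixed_size_revision(revisions: dict[str, str]) -> str:
    """Derive a 64-character revision and record which bundle revisions it covers."""
    payload = json.dumps(revisions, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    revision_store[digest] = dict(revisions)
    return digest


def lookup(aggregate_revision: str, bundle_name: str) -> str:
    """The extra hop: resolve one bundle's revision from the aggregate revision."""
    return revision_store[aggregate_revision][bundle_name]


aggregate = fixed_size_revision(
    {"bundle1": "abcdef", "bundle2": "efgh12", "bundle3": "zxygjk"}
)
print(len(aggregate))
print(lookup(aggregate, "bundle3"))
```

The revision stays the same length no matter how many bundles are aggregated, and because every aggregate revision is retained in the store, older aggregates remain resolvable for audit.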

Option 3 - Use a fixed size revision string and store actual revisions in manifest metadata.

Here we do away with the additional database. However, the extra-hop problem remains, as users will have to query the /v1/data/system endpoint on OPA.
Further, if we choose this over Option 2, we lose the ability to view the individual bundle revisions of older aggregate bundles as part of an audit.
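As a rough sketch of what the aggregator could write into the bundle's .manifest file for this option: the manifest's free-form metadata object carries the per-bundle revisions, while revision stays fixed-size. The bundle_revisions key, the example roots, and the use of a SHA-256 digest are assumptions for illustration only.

```python
import hashlib
import json


def build_manifest(revisions: dict[str, str], roots: list[str]) -> dict:
    """Build .manifest content with a fixed-size revision and per-bundle metadata."""
    payload = json.dumps(revisions, sort_keys=True, separators=(",", ":"))
    return {
        "revision": hashlib.sha256(payload.encode("utf-8")).hexdigest(),
        "roots": roots,
        # "bundle_revisions" is a hypothetical key inside the free-form metadata object.
        "metadata": {"bundle_revisions": revisions},
    }


manifest = build_manifest(
    {"bundle1": "abcdef", "bundle2": "efgh12", "bundle3": "zxygjk"},
    roots=["team1", "team2", "team3"],  # hypothetical roots, one per tenant
)
print(json.dumps(manifest, indent=2))
```

Only the manifest of the currently loaded aggregate is visible this way, which is why older aggregates cannot be inspected without keeping the bundles or their manifests somewhere else.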

Overall, this seems to me like a complicated problem, one that arises because OPA is being used in a centralized, multi-tenant environment, which it is not designed for. However, as open-policy-agent/opa#6166 shows, many people prefer to run OPA as a centralized cluster for audit preparedness and other use cases.

I am leaning toward Option 2. As always, many real-world problems have no perfect solution, but I would be happy to hear advice from others in the community. Thanks!
