[Docs] Add DR002 for versioning in codegen
Part of OpenAssetIO#88. Consolidate the discussion, provoked by iterations of the
design proposal, into a decision record.

Signed-off-by: David Feltell <david.feltell@foundry.com>
feltech committed May 16, 2024
1 parent 928408a commit ebfb363
Showing 1 changed file with 226 additions and 0 deletions.
226 changes: 226 additions & 0 deletions decisions/DR002-Versioning-traits-and-specifications-codegen.md
@@ -0,0 +1,226 @@
# DR002 Versioning Traits and Specification - generated view classes

- **Status:** Decided
- **Impact:** High
- **Driver:** @feltech
- **Approver:** @elliotcmorris @themissingcow
- **Outcome:** Traits and specifications will be versioned independently
of the schema, there will be no concept of a schema version, and
Trait/Specification view classes will be generated with version
suffixes on the class name.

## Background

The medium of data exchange between a host and a manager is a logically
opaque data blob, i.e. a `TraitsData` object. In order to extract
information from this object, Trait and/or Specification view classes must be
used[^1]. These classes wrap a `TraitsData` instance, and provide a
suite of accessor and mutator methods that are relevant to the target
trait. The classes are generated from a YAML schema (e.g. see
[traits.yml](../traits.yml)).

Hosts and managers may use different versions of the schema, and hence
different versions of the view classes, and yet still wish to work
together.

This decision record follows on from a previous decision (OpenAssetIO
[DR023](https://github.com/OpenAssetIO/OpenAssetIO/blob/main/doc/decisions/DR023-Versioning-traits-and-specifications-method.md))
that communicating a trait's version should be done by bundling the
version number with the data blob that is communicated across the API,
i.e. within `TraitsData`, most likely by appending the version number to
the unique trait ID.
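
As a rough illustration of what "appending the version number to the unique trait ID" could look like, the sketch below uses a plain `dict` as a stand-in for `TraitsData`. The `.v<N>` suffix format, the `example:` ID prefix and both helper functions are assumptions made for this sketch, not a confirmed format or API.

```python
from typing import Tuple


def versioned_trait_id(base_id: str, version: int) -> str:
    """Append a version suffix to a trait ID (the ".v<N>" format is assumed)."""
    return f"{base_id}.v{version}"


def split_trait_id(trait_id: str) -> Tuple[str, int]:
    """Split a versioned trait ID back into its base ID and version number."""
    base_id, separator, suffix = trait_id.rpartition(".v")
    if not separator or not suffix.isdigit():
        raise ValueError(f"Trait ID has no version suffix: {trait_id!r}")
    return base_id, int(suffix)


# A plain dict standing in for TraitsData: trait ID -> property dictionary.
trait_id = versioned_trait_id("example:content.LocatableContent", 2)
traits_data = {trait_id: {"url": "file:///tmp/image.exr"}}

base_id, version = split_trait_id(trait_id)
```

Because the version travels inside the ID, the receiving side can tell which version of a trait was imbued without any out-of-band negotiation.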

With this previous decision in mind, we then need to decide on how the
trait versions are represented in the high level interface, i.e. in
the definition and usage of Trait/Specification view classes.

A motivating example should make this problem clear.

[^1]: In reality, a `TraitsData` is a simple dictionary-like structure,
and the `TraitsData` type has a low-level interface for interacting with
it, but usage of this is discouraged.

### Motivating example

An example usage of the current form of these generated classes might
be:

```python
url = LocatableContentTrait(trait_data).getLocation()
```

Imagine that we want to rename the LocatableContent trait's `"location"`
property to a more descriptive `"url"` property, hence changing the
generated view class's method from `getLocation` to `getUrl`.

Given that hosts and managers are developed independently, we may end up
with a situation where one side is setting `"location"` (using
`setLocation`) in the data, handing it over to the other side, who then
attempts to read `"url"` (using `getUrl`). I.e. we have a version
mismatch.

There is therefore an incompatibility at the data layer (i.e. field
names differ for the same semantic information). With C++, the data
layer is where the incompatibility ends. The Trait/Specification view
classes are private utility classes whose symbols should not be
exported, so there will be no source or binary incompatibility.

However, with Python there is no such concept of a private, build-time
only, class. The manager plugin and host application must use the same
`openassetio-mediacreation` distribution package in the Python
environment (not considering, for the moment, custom vendoring). So one
side or the other will hit an `AttributeError` exception when trying to
use a method from the version they developed against, rather than the
version installed into the environment.
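
The two failure modes can be sketched with simplified stand-ins. Here a plain `dict` plays the role of `TraitsData`, and the class below is a hypothetical, hand-written imitation of the newer generated view class, not the real generated code.

```python
class LocatableContentTrait:
    """Stand-in for the *installed* view class, after the rename to "url"."""

    def __init__(self, traits_data):
        self._data = traits_data

    def getUrl(self):
        return self._data.get("url")


# Data written by the other side, which was developed against the older
# version and therefore set "location" rather than "url".
traits_data = {"location": "file:///tmp/image.exr"}

# Source-level mismatch: code written against the old class calls a method
# that no longer exists on the installed class.
try:
    LocatableContentTrait(traits_data).getLocation()
    source_error = None
except AttributeError as exc:
    source_error = exc

# Data-level mismatch: the new accessor runs, but finds nothing under "url".
url = LocatableContentTrait(traits_data).getUrl()
```

In this sketch the first call raises `AttributeError`, while the second silently yields no value even though the information is present under the old property name.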

### Assumptions

We need a way for host and manager plugin authors to work with multiple
trait versions.

* A Trait/Specification view class is needed for each version, such
that a user can imbue a particular version of a trait in some data;
and can detect that a particular version of a trait is imbued in some
data.
* Trait unique IDs will be suffixed with a version number. This means
two Trait view classes for the same trait, but for different versions,
will be treated as if they are entirely separate traits.
Version-agnostic utility functions may be added in the future, but it
is out of scope for now.
* If a Specification view class is used to construct/imbue a trait
set/data, that data will _not_ have the Specification version encoded
in the data directly (only implicitly through the versioned IDs of the
composite traits).
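
A minimal sketch of the first two assumptions follows, again using a plain `dict` in place of `TraitsData`. The base class, the `example:` trait IDs, their `.v<N>` suffixes and the `read_location` helper are all hypothetical; the real generated classes will differ.

```python
class _TraitViewBase:
    """Hypothetical base for generated view classes wrapping dict-like data."""

    kId = ""

    def __init__(self, traits_data):
        self._data = traits_data

    def imbue(self):
        # Mark the data as having this exact version of the trait.
        self._data.setdefault(self.kId, {})

    def isImbued(self):
        return self.kId in self._data


class LocatableContentTrait_v1(_TraitViewBase):
    kId = "example:content.LocatableContent.v1"

    def setLocation(self, location):
        self.imbue()
        self._data[self.kId]["location"] = location

    def getLocation(self):
        return self._data.get(self.kId, {}).get("location")


class LocatableContentTrait_v2(_TraitViewBase):
    kId = "example:content.LocatableContent.v2"

    def setUrl(self, url):
        self.imbue()
        self._data[self.kId]["url"] = url

    def getUrl(self):
        return self._data.get(self.kId, {}).get("url")


def read_location(traits_data):
    """Prefer v2, fall back to v1; the two are unrelated traits in the data."""
    v2 = LocatableContentTrait_v2(traits_data)
    if v2.isImbued():
        return v2.getUrl()
    v1 = LocatableContentTrait_v1(traits_data)
    if v1.isImbued():
        return v1.getLocation()
    return None


# One side only knows v1 and imbues that version.
traits_data = {}
LocatableContentTrait_v1(traits_data).setLocation("file:///tmp/image.exr")
```

Since the IDs differ, data imbued with `_v1` is not reported as imbued by the `_v2` class, so a consumer that wants to support both must check each version explicitly, as `read_location` does.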

## Relevant data

[OpenTimelineIO schema
versioning](https://opentimelineio.readthedocs.io/en/latest/tutorials/otio-file-format-specification.html#example)
is perhaps the closest analog. The version of the schema is appended to
the schema ID whenever it appears within an OTIO JSON document.

The options presented were arrived at by sketching a proposal in [a Pull
Request](https://github.com/OpenAssetIO/OpenAssetIO-MediaCreation/pull/90),
soliciting feedback, and iterating. The final form of that PR reflects
the chosen option.

## Options considered

### Option 1 - Per schema versioning

When traits or specifications in the YAML document are updated, a
top-level schema version is incremented. During codegen, top-level
namespaces are created by providing multiple YAML documents, one for
each schema version.

For example:

```python
from openassetio_mediacreation.v1.traits.content import LocatableContent as LocatableContent_v1
from openassetio_mediacreation.v2.traits.content import LocatableContent as LocatableContent_v2
from openassetio_mediacreation.v2.specifications.twoDimensional import ImageSpecification
```

#### Pros

- Tantalising possibility to use [Python namespace
packages](https://packaging.python.org/en/latest/guides/packaging-namespace-packages)
to allow different schema versions to be installed independently
side-by-side.
- The schema version a specification comes from instantly tells you the
schema version of the constituent traits.
- The YAML is kept small and focussed just on the latest versions.
- Minimal changes to the `traitgen` tool and existing YAML documents.
- Maintaining only the latest versions in the live YAML document
prevents accidental changes to old versions that could break backward
compatibility.
- The consumer is in charge of deciding which versions they support.
I.e. once a host/manager determines that they no longer wish to
support a particular version, they can stop
generating/installing/bundling subpackages for it.
- Once it is clear that a host/manager understands a particular schema
version (via `managementPolicy` or otherwise), the communicating
manager/host can be confident in using that schema version for other
traits/specifications.

#### Cons

- A source-incompatible breaking change, unless significant
special-casing is added.
- Verbose when using two versions in the same source file, either
requiring use of qualified names (e.g. `v1.traits.LocatableContent`)
or additional aliasing (e.g.
`from ... import LocatableContent as LocatableContent_v1`).
- Not clear at-a-glance which traits have changed between schema
versions, e.g. it's not clear if
`v2.traits.content.LocatableContentTrait` is the same as
`v1.traits.content.LocatableContentTrait`.
- Must compare multiple YAML documents side-by-side in order to discover
the history of changes to a particular trait/specification.
- Traits/specifications that are unchanged between versions imply
duplicated code across namespaces (though likely simply aliased).
- Independently generated/installed subpackages for each schema version
would mean that deprecation warnings could not be added to old
versions. This is mitigated if multiple versions are generated
together, where the older version can be detected and deprecation
warnings added by codegen.

### Option 2 - Per Trait/Specification versioning

A single YAML document is maintained, where each trait/specification
definition branches off into a list of versions. Old
trait/specification versions can be marked as deprecated and
eventually removed, to prevent unbounded growth.

For example:

```python
from openassetio_mediacreation.traits.content import LocatableContent_v1
from openassetio_mediacreation.traits.content import LocatableContent_v2
from openassetio_mediacreation.specifications.twoDimensional import ImageSpecification_v2
```

#### Pros

- It is fairly trivial to treat the "`_v1`" suffix as equivalent to no
suffix, so that e.g. `import LocatableContent` continues to work as
before. This makes the option fully source compatible, i.e. not a
breaking change.
- Placing versions alongside one-another in the YAML definition allows
easy discovery of the history of changes.
- IDE code completion will list all versions of a Trait/Specification
view class next to one-another.
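
The source-compatibility point can be sketched as a module-level alias. The classes below are hypothetical stand-ins wrapping a plain `dict`, and the way the alias is emitted is an assumption about how codegen might do it.

```python
class LocatableContentTrait_v1:
    """Stand-in for the original view class, now carrying a version suffix."""

    def __init__(self, traits_data):
        self._data = traits_data

    def getLocation(self):
        return self._data.get("location")


class LocatableContentTrait_v2:
    """Stand-in for the newer view class with the renamed property."""

    def __init__(self, traits_data):
        self._data = traits_data

    def getUrl(self):
        return self._data.get("url")


# Unsuffixed alias for the first version, so code written before versioning
# was introduced keeps importing and calling the same name.
LocatableContentTrait = LocatableContentTrait_v1

# Existing call sites continue to work unchanged.
location = LocatableContentTrait({"location": "file:///tmp/image.exr"}).getLocation()
```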

#### Cons

- No indication of the version of the constituent traits from the
version of a Specification view class.
- Large change to `traitgen` tool and non-trivial breaking change to
YAML documents.
- Keeping old versions in a living document (as opposed to e.g. git
history) is a potential source of accidental breakages to backward
compatibility.
- Generating all possible versions bloats an application's distribution,
when it may only use a small subset of them.
- Higher-level branching on a schema version is not possible, since
there is no schema version to branch on.
- Specification version must be bumped when a constituent trait has a
version bump, even if nothing else in the specification has changed.
Conceptually, specifications are trait version agnostic, but must
become version-aware for the purposes of codegen, which is
inconsistent.

## Outcome

We will implement Option 2 - Per Trait/Specification versioning.

A major benefit is that this option is much easier to deliver as a
non-breaking change for current users.

In addition, it has better discoverability through IDE code completion,
and it is easier to view history through a single YAML document rather
than across several documents.

There will be a rather large change to the `traitgen` tool and to the
JSON schema for the YAML documents, causing a headache for any early
adopters who are generating their own traits. However, this is less
critical than changes to the generated output in use within pipelines.
