Skip to content

Commit

Permalink
decide whether each API should be server- or client-side-versioned (#…
Browse files Browse the repository at this point in the history
  • Loading branch information
davepacheco authored Dec 14, 2024
1 parent 89d6c76 commit 3449a64
Show file tree
Hide file tree
Showing 8 changed files with 591 additions and 11 deletions.
3 changes: 2 additions & 1 deletion .github/buildomat/build-and-test.sh
Original file line number Diff line number Diff line change
Expand Up @@ -69,7 +69,8 @@ banner ls-apis
source ./tools/include/force-git-over-https.sh;
ptime -m cargo xtask ls-apis apis &&
ptime -m cargo xtask ls-apis deployment-units &&
ptime -m cargo xtask ls-apis servers
ptime -m cargo xtask ls-apis servers &&
ptime -m cargo xtask ls-apis check
)

#
Expand Down
24 changes: 24 additions & 0 deletions dev-tools/ls-apis/README.adoc
Original file line number Diff line number Diff line change
Expand Up @@ -104,6 +104,30 @@ You can generate a PNG image of the graph like this:
$ dot -T png -o deployment-units.png deployment-units.dot
```

=== Checking the upgrade DAG

As of this writing:

* All Dropshot/Progenitor APIs in the system are identified with this tool (as far as we know).
* Every API is annotated with metadata in `api-manifest.toml`.
* The API metadata specifies whether versioning for the API is managed on the server side exclusively or using client-side versioning as well. We've made this choice for every existing API.
* There are no cycles in the server-side-managed-API dependency graph.

This tool verifies these properties. If you add a new API or a new API dependency without adding valid metadata, or if that metadata breaks these constraints, you will have to fix this up before you can land the change to Omicron.

You can verify the versioning metadata using:

```
$ cargo xtask ls-apis check
```

You might see any of the following errors:

* `+missing field `versioned_how`+`: this probably means that you added an API to the metadata but neglected to specify the `versioned_how` field. This must be either `client` or `server`. In general, prefer `server`. The tool should tell you if you try to use `server` but that can't work. If in doubt, ask the update team.
* `at least one API has unknown version strategy (see above)`: this probably means you added an API with `version_how = "unknown"`. (The specific API(s) with this problem are listed immediately above this error.) You need to instead decide if it should be client-side or server-side versioned. If in doubt, ask the update team.
* `+missing field `versioned_how_reason`+`: this probably means that you added an API with `versioned_how = "client"`, but did not add a `versioned_how_reason`. This field is required. It's a free-text field that explains why we couldn't use server-side versioning for this API.
* `graph of server-managed components has a cycle (includes node: "omicron-nexus")`: this probably means that you added an API with `versioned_how = "server"`, but that created a circular dependency with other server-side-versioned components. You'll need to break the cycle by changing one of these components (probably your new one) to be client-managed instead.
* `API identified by client package "nexus-client" (Nexus Internal API) is the "client" in a "non-dag" dependency rule, but its "versioned_how" is not "client"` (the specific APIs may be different): this (unlikely) condition means just what it says: the API manifest file contains a dependency rule saying that some API is *not* part of the update DAG, but that API is not marked accordingly. See the documentation below on filter rules.

== Details

Expand Down
76 changes: 76 additions & 0 deletions dev-tools/ls-apis/api-manifest.toml
Original file line number Diff line number Diff line change
Expand Up @@ -132,6 +132,28 @@ packages = [ "oximeter-collector" ]
#
# `server_package_name`: the package that contains the Dropshot API definition.
#
# `versioned_how`: one of:
#
# "server": this component uses server-side versioning only, meaning that the
# update system can ensure that servers are always updated before
# clients.
#
# "client": this component uses client-side versioning, meaning that the
# update system cannot ensure that servers are always updated
# before clients, so clients need to deal with multiple different
# server versions.
#
# You must also specify `versioned_how_reason` if you use "client".
#
# "unknown": this has not been determined. This will break tests. But the
# tooling supports this value so that during development you can
# relax these constraints.
#
# [`versioned_how_reason`]: free text string explaining why `versioned_how` must
# be "client" for this API. This is printed in documentation and command output
# and serves as documentation for developers to understand why we made this
# choice. This should be present only if `versioned_how = "client"`.
#
# [`notes`]: optional free-form human-readable summary documentation about this
# API
################################################################################
Expand All @@ -140,11 +162,14 @@ packages = [ "oximeter-collector" ]
client_package_name = "bootstrap-agent-client"
label = "Bootstrap Agent"
server_package_name = "bootstrap-agent-api"
versioned_how = "client"
versioned_how_reason = "depends on itself (i.e., instances call each other)"

[[apis]]
client_package_name = "clickhouse-admin-keeper-client"
label = "Clickhouse Cluster Admin for Keepers"
server_package_name = "clickhouse-admin-api"
versioned_how = "server"
notes = """
This is the server running inside multi-node Clickhouse keeper zones that's \
responsible for local configuration and monitoring.
Expand All @@ -154,6 +179,7 @@ responsible for local configuration and monitoring.
client_package_name = "clickhouse-admin-server-client"
label = "Clickhouse Cluster Admin for Servers"
server_package_name = "clickhouse-admin-api"
versioned_how = "server"
notes = """
This is the server running inside multi-node Clickhouse server zones that's \
responsible for local configuration and monitoring.
Expand All @@ -163,6 +189,7 @@ responsible for local configuration and monitoring.
client_package_name = "clickhouse-admin-single-client"
label = "Clickhouse Single-Node Cluster Admin"
server_package_name = "clickhouse-admin-api"
versioned_how = "server"
notes = """
This is the server running inside single-node Clickhouse server zones that's \
responsible for local configuration and monitoring.
Expand All @@ -172,6 +199,7 @@ responsible for local configuration and monitoring.
client_package_name = "cockroach-admin-client"
label = "CockroachDB Cluster Admin"
server_package_name = "cockroach-admin-api"
versioned_how = "server"
notes = """
This is the server running inside CockroachDB zones that performs \
configuration and monitoring that requires the `cockroach` CLI.
Expand All @@ -181,11 +209,14 @@ configuration and monitoring that requires the `cockroach` CLI.
client_package_name = "crucible-agent-client"
label = "Crucible Agent"
server_package_name = "crucible-agent"
versioned_how = "server"

[[apis]]
client_package_name = "repair-client"
label = "Crucible Repair"
server_package_name = "crucible-downstairs"
versioned_how = "client"
versioned_how_reason = "depends on itself (i.e., instances call each other)"
notes = """
The repair service offered by a crucible-downstairs supports both repairing \
one downstairs from another, and making a clone of a read-only downstairs \
Expand All @@ -196,11 +227,13 @@ when creating a new region in the crucible agent.
client_package_name = "crucible-pantry-client"
label = "Crucible Pantry"
server_package_name = "crucible-pantry"
versioned_how = "server"

[[apis]]
client_package_name = "ddm-admin-client"
label = "Maghemite DDM Admin"
server_package_name = "ddmd"
versioned_how = "server"
notes = """
The `ddmd` server runs in each sled GZ and each switch zone. These daemons \
provide an interface for advertising network prefixes, and observing what \
Expand All @@ -216,11 +249,17 @@ observability APIs in the future.
client_package_name = "dns-service-client"
label = "DNS Server"
server_package_name = "dns-server-api"
versioned_how = "server"

[[apis]]
client_package_name = "dpd-client"
label = "Dendrite DPD"
server_package_name = "dpd"
# TODO We might need to pick this apart more carefully. DPD invokes and is
# invoked by several different services. It's possible that some of them can
# treat it as server-managed.
versioned_how = "client"
versioned_how_reason = "Sled Agent calling DPD creates a circular dependency"
notes = """
Dendrite's data plane daemon (`dpd`) is the interface to configure and manage \
the rack switches. It's consumed by sled-agent to get the rack off the \
Expand All @@ -235,17 +274,29 @@ repo is not currently open source.
client_package_name = "gateway-client"
label = "Management Gateway Service"
server_package_name = "gateway-api"
versioned_how = "server"
notes = "Wicketd is deployed in a unit with MGS so we can ignore that one."

[[apis]]
client_package_name = "installinator-client"
label = "Wicketd Installinator"
server_package_name = "installinator-api"
# The installinator-client is used only by Installinator. This is part of the
# recovery OS image. Today, this is not really "shipped" in the traditional
# sense. The only way it gets used today is by an operator uploading it to
# Wicket and then performing a MUPdate. In this sense, we do not control the
# client, so we mark this client-managed -- it's up to the operator to make sure
# they're using a TUF repo whose recovery image contains an Installinator that's
# compatible with the API served by Wicket on the deployed system. In practice,
# we're almost certainly just going to avoid changing this API.
versioned_how = "client"
versioned_how_reason = "client is provided implicitly by the operator"

[[apis]]
client_package_name = "mg-admin-client"
label = "Maghemite MG Admin"
server_package_name = "mgd"
versioned_how = "server"
notes = """
The `mgd` daemon runs in each switch zone. This daemon is responsible for all \
external route management for a switch. It provides interfaces for static \
Expand All @@ -258,17 +309,25 @@ bring the rack up.
client_package_name = "nexus-client"
label = "Nexus Internal API"
server_package_name = "nexus-internal-api"
# nexus-client has to be client-versioned today because it's got a cyclic
# dependency with sled-agent-client, which is server-versioned.
versioned_how = "client"
versioned_how_reason = "Circular dependencies between Nexus and other services"

[[apis]]
client_package_name = "oxide-client"
label = "External API"
server_package_name = "nexus-external-api"
# The versioning strategy for the external API is outside the scope of this
# tool. It doesn't matter what we put here.
versioned_how = "server"
notes = "Special case, since we don't fully control all clients"

[[apis]]
client_package_name = "oximeter-client"
label = "Oximeter"
server_package_name = "oximeter-api"
versioned_how = "server"
notes = """
Shared types for this interface are in `omicron-common`. The producer makes \
requests to Nexus, and receives them from `oximeter-collector`. \
Expand All @@ -281,6 +340,7 @@ and makes the periodic renewal requests to `oximeter-collector`.
client_package_name = "propolis-client"
label = "Propolis"
server_package_name = "propolis-server"
versioned_how = "server"
notes = """
Sled Agent is deployed in a unit with Propolis so we can ignore that one.
"""
Expand All @@ -289,16 +349,20 @@ Sled Agent is deployed in a unit with Propolis so we can ignore that one.
client_package_name = "sled-agent-client"
label = "Sled Agent"
server_package_name = "sled-agent-api"
versioned_how = "server"

[[apis]]
client_package_name = "repo-depot-client"
label = "Repo Depot API"
server_package_name = "repo-depot-api"
versioned_how = "client"
versioned_how_reason = "depends on itself (i.e., instances call each other)"

[[apis]]
client_package_name = "wicketd-client"
label = "Wicketd"
server_package_name = "wicketd-api"
versioned_how = "server"
notes = """
wicketd-client is only used by wicket, which is deployed in a unit with wicketd.
"""
Expand All @@ -307,6 +371,7 @@ wicketd-client is only used by wicket, which is deployed in a unit with wicketd.
client_package_name = "crucible-control-client"
label = "Crucible Control (for testing only)"
server_package_name = "crucible"
versioned_how = "server"
notes = """
Exposed by Crucible upstairs for debugging via the `cmon` debugging tool.
"""
Expand All @@ -315,6 +380,7 @@ Exposed by Crucible upstairs for debugging via the `cmon` debugging tool.
client_package_name = "dsc-client"
label = "Downstairs Controller (debugging only)"
server_package_name = "dsc"
versioned_how = "server"
dev_only = true
notes = """
`dsc` is a control program for spinning up and controlling instances of Crucible
Expand Down Expand Up @@ -400,6 +466,16 @@ DNS server depends on itself only to provide TransientServer, which is not a
deployed component.
"""

[[dependency_filter_rules]]
ancestor = "omicron-sled-agent"
client = "sled-agent-client"
evaluation = "not-deployed"
note = """
Sled Agent uses a Sled Agent client in two ways: RSS, which we don't care about
for the purpose of upgrade, and in the `zone-bundle` binary, which is not
deployed.
"""

#
# "Bogus" dependencies. See above for details.
#
Expand Down
34 changes: 34 additions & 0 deletions dev-tools/ls-apis/src/api_metadata.rs
Original file line number Diff line number Diff line change
Expand Up @@ -108,6 +108,20 @@ impl AllApiMetadata {

Ok(which_rules[0].evaluation)
}

/// Returns the list of APIs that have non-DAG dependency rules
pub(crate) fn non_dag_apis(&self) -> impl Iterator<Item = &ApiMetadata> {
self.dependency_rules.iter().filter_map(|(client_pkgname, rules)| {
rules.iter().any(|r| r.evaluation == Evaluation::NonDag).then(
|| {
// unwrap(): we previously verified that the "client" for
// all dependency rules corresponds to an API that we have
// metadata for.
self.apis.get(client_pkgname).unwrap()
},
)
})
}
}

/// Format of the `api-manifest.toml` file
Expand Down Expand Up @@ -186,6 +200,23 @@ impl TryFrom<RawApiMetadata> for AllApiMetadata {
}
}

/// Describes how an API in the system manages drift between client and server
#[derive(Deserialize, PartialEq, Eq)]
#[serde(rename_all = "snake_case")]
#[serde(tag = "versioned_how", content = "versioned_how_reason")]
pub enum VersionedHow {
/// We have not yet determined how this API will be versioned.
Unknown,

/// This API will be versioned solely on the server. (The update system
/// will ensure that servers are always updated before clients.)
Server,

/// This API will be versioned on the client. (The update system cannot
/// ensure that servers are always updated before clients.)
Client(String),
}

/// Describes one API in the system
#[derive(Deserialize)]
pub struct ApiMetadata {
Expand All @@ -199,6 +230,9 @@ pub struct ApiMetadata {
pub server_package_name: ServerPackageName,
/// human-readable notes about this API
pub notes: Option<String>,
/// describes how we've decided this API will be versioned
#[serde(default, flatten)]
pub versioned_how: VersionedHow,
/// If `dev_only` is true, then this API's server is not deployed in a
/// production system. It's only used in development environments. The
/// default is that APIs *are* deployed.
Expand Down
Loading

0 comments on commit 3449a64

Please sign in to comment.