
Store trait #21

Merged
merged 45 commits into from
Aug 30, 2024
Conversation

sgwilym
Contributor

@sgwilym sgwilym commented Jun 24, 2024

Work in progress, currently being discussed and iterated upon.

@sgwilym sgwilym added the enhancement New feature or request label Jun 24, 2024
@AljoschaMeyer
Contributor

Design Notes on a Willow Store Trait: Mutation

For our rust implementation of Willow, we are designing a trait to abstract over data stores. Among the features that a store must provide are ingesting new entries, querying areas, resolving payloads, and subscribing to change notifications. Turns out this trait becomes rather involved. In this writeup, I want to focus on a small subset of the trait: everything that allows user code to mutate a data store.

On the surface, Willow stores support only a single mutating operation: ingesting new entries. Does the following trait (heavily simplified on the type-level) do the trick?

trait Store {
    async fn ingest(&mut self, entry: Entry) -> Result<(), StoreError>;
}

Nope, there is actually a whole lot more to consider. We'll start with something simple: while the data model only considers adding new entries to a store, there is a second operation that all implementations should support: locally deleting entries. We want to support this both for explicitly specified entries and for whole areas:

trait Store {
    async fn ingest(&mut self, entry: Entry) -> Result<(), StoreError>;
    async fn forget(&mut self, entry: Entry) -> Result<(), StoreError>;
    async fn forget_area(&mut self, area: Area3d) -> Result<(), StoreError>;
    // We also need ingestion and forgetting of payloads,
    // but we'll leave that for later.
}

Now we have a sketch of a somewhat usable API, but it does not admit particularly efficient implementations. We want stores to (potentially) be backed by persistent storage. But persisting data to disk is expensive. Typically, writes are performed to an in-memory buffer, and then periodically (or explicitly on demand) flushed to disk (compare the fsync syscall in posix). To support this pattern, we should change the semantics of our methods to "kindly asking the store to eventually do something", and add a flush method to force an expensive commitment of all prior mutations (typically by fsyncing a write-ahead log).

trait Store {
    async fn ingest(&mut self, entry: Entry) -> Result<(), StoreError>;
    async fn forget(&mut self, entry: Entry) -> Result<(), StoreError>;
    async fn forget_area(&mut self, area: Area3d) -> Result<(), StoreError>;
    async fn flush(&mut self) -> Result<(), StoreError>;
}

Another typical optimisation is that of efficiently operating on slices of bulk data instead of individual pieces of data. Forgetting should hopefully be rare enough, but we should definitely have a method for bulk ingestion.

trait Store {
    async fn ingest(&mut self, entry: Entry) -> Result<(), StoreError>;
    async fn ingest_bulk(&mut self, entries: &[Entry]) -> Result<(), StoreError>;
    async fn forget(&mut self, entry: Entry) -> Result<(), StoreError>;
    async fn forget_area(&mut self, area: Area3d) -> Result<(), StoreError>;
    async fn flush(&mut self) -> Result<(), StoreError>;
}

The next issue is one of interior versus exterior mutability. Those async methods desugar to returning Futures. And since the methods take a mutable reference to the store as an argument, no other store methods can be called while such a Future is in scope. Hence, this trait forces sequential issuing of commands, with no concurrency. While it might be borderline acceptable to force linearization of all mutating store accesses (especially since most method calls would simply place an instruction in a buffer rather than actually performing and committing side-effects), the inability to mutate the store while also querying it (say, by subscribing to changes) is a dealbreaker. Thus, we should change those methods to take an immutable &self reference, forcing store implementations to employ interior mutability. To the experienced rust programmer, this shouldn't be too surprising: the whole point of a store is to provide mutable, aliased state, after all.

trait Store {
    async fn ingest(&self, entry: Entry) -> Result<(), StoreError>;
    async fn ingest_bulk(&self, entries: &[Entry]) -> Result<(), StoreError>;
    async fn forget(&self, entry: Entry) -> Result<(), StoreError>;
    async fn forget_area(&self, area: Area3d) -> Result<(), StoreError>;
    async fn flush(&self) -> Result<(), StoreError>;
}

Finally, there is a line of thought that I'm less confident about yet. Starting with simple ingest operations and then adding support for buffering (and flushing of the buffer) and bulk operations duplicates a lot of the design of the ufotofu BulkConsumer. So perhaps it would make sense to simply expose a BulkConsumer, whose items are an enum for the different operations (ingest, forget, forget_area). Replacing method calls with explicitly moving around enum values might sound inefficient, but there's an argument to be made that any efficient buffering operation would implement the method calls by storing a reification of the operation inside some buffer anyways. The main upside would be that of using a principled abstraction instead of providing a zoo of methods that end up implementing the same semantics anyways.
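To make the reification idea concrete, here is a toy sketch (not the actual willow-rs or ufotofu API; `Entry`, `Area3d`, and `BufferingStore` are stand-ins invented for illustration) of an enum whose variants are the mutating operations, consumed into a buffer that is only committed on flush:

```rust
// Hypothetical stand-ins for the real Willow types.
#[derive(Debug, Clone, PartialEq)]
struct Entry {
    path: String,
}

#[derive(Debug, Clone, PartialEq)]
struct Area3d {
    prefix: String,
}

// One variant per mutating operation; a consumer of `StoreOp`s
// could replace the zoo of individual methods.
#[derive(Debug, Clone, PartialEq)]
#[allow(dead_code)]
enum StoreOp {
    Ingest(Entry),
    Forget(Entry),
    ForgetArea(Area3d),
}

// A toy buffering consumer: operations accumulate in memory and are
// only "committed" on flush, mirroring the write-ahead-log pattern.
#[derive(Default)]
struct BufferingStore {
    buffer: Vec<StoreOp>,
    committed: Vec<StoreOp>,
}

impl BufferingStore {
    // Consuming an op just records its reification in the buffer.
    fn consume(&mut self, op: StoreOp) {
        self.buffer.push(op);
    }

    // Flushing moves all buffered ops into the committed log.
    fn flush(&mut self) {
        self.committed.append(&mut self.buffer);
    }
}
```

This is exactly the "any efficient buffering implementation stores a reification of the operation anyways" argument made explicit: the enum value is that reification.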

Aside from the drawback of forcing the explicit reification of operations, another drawback of the consumer approach would be the question of how to obtain the consumer. If the store itself implemented BulkConsumer, then the issue of exterior mutability would make it unusable. If the store had a function that took a &self and returned an owned handle that implements BulkConsumer, then arbitrarily many handles could be created, i.e., the consumer would be forced to support many writers. I don't really see a way around that. So the two options seem to be either a multi-writer consumer, or a collection of (interiorly mutable) methods on the store trait without an explicit consumer.

This is the main issue I wanted to convey in this writeup. I have glossed over various details (generics and associated types, precise errors, information to return on successful operations, an ingest-unless-it-would-overwrite method, payload handling, etc.), because those are comparatively superficial. But I'd love to hear some feedback on the issues of interior vs exterior mutability and explicit vs implicit (buffered, bulk) consumer semantics.

@AljoschaMeyer
Contributor

AljoschaMeyer commented Jul 1, 2024

Storage requirements

  • user-facing
    • mutation
      • ingest entry
        • optionally do nothing if it would overwrite a (strict) extension of the path
      • append to payload prefix
      • forget entry by pair of path and subspace id
      • forget by area (of interest)
      • forget all but area (of interest) (within a containing area (of interest)) (?)
      • all forget functionality but for payloads instead of entries
      • all forget functionality comes with a traceless flag
        • if traceless, the storage keeps no record of ever having had the data
        • if not traceless, the storage is allowed to persist the forget query for an arbitrary amount of time, in order to accurately inform persistent subscribers about the forgetting in the future
          • non-traceless forgetting does not imply rejecting future data that matches the forget operation, such a service must live at a different level (and must be implemented by not ingesting the data again in the first place)
      • flush (force persistence of all prior mutations)
        • one-shot queries may or may not observe mutations before they were flushed
        • subscriptions are only notified of mutations that have been flushed successfully
    • query
      • query entry by pair of path and subspace id
        • yields Entry, AuthorisationToken, available payload prefix (length)
        • option to filter out results with incomplete payloads
      • query entries by area (of interest)
        • all of the functionality of singleton queries, also
        • ordering results (arbitrary, PTS (optionally reversed), TSP (optionally reversed), SPT (optionally reversed)) (?)
      • subscriptions: all queries should also work as long-lived subscriptions
        • this includes subscription to payload prefix appending
        • this also includes notifications about overwriting and forgetting things
      • persistent subscriptions
        • all subscription notifications come with a u64 progress id, client code can stop consuming a subscription at any time and can later (even between program shutdown and restart) resume it at the same point by supplying its progress id
        • this is a best-effort service, the storage may reject a resumption because it is too outdated
          • a simplistic implementation can effectively opt out of providing persistent subscriptions by simply considering all ids as outdated
        • forgetting entries/payloads can cause progress ids to become outdated, unless the storage tracks the things it forgot
          • this is why all forget functionality comes with a flag whether it should be completely traceless or whether the storage is allowed to track the act of forgetting
  • internal (for replication)
    • all of the public-facing functionality may also be used internally
    • query entries by 3dRange
      • yields Entry, AuthorisationToken, available payload prefix (length)
      • option to filter out results with incomplete payloads (?)
      • optionally order according to the ReconciliationAnnounceEntries:will_sort field
      • optionally as a subscription (?)
    • summarise 3dRange (count, fingerprint)
    • convert area of interest to 3dRange
    • partition a 3dRange into k roughly evenly sized 3dRanges
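The persistent-subscription requirement above (u64 progress ids, best-effort resumption, ids invalidated by pruning) might be sketched like this; all names here are hypothetical illustrations, not the actual willow-rs API:

```rust
// Hypothetical sketch of best-effort persistent subscriptions.
// Every committed mutation gets a monotonically increasing progress id;
// resuming from an id the store no longer remembers fails with
// `Outdated`, which a simplistic implementation may always return.
#[derive(Debug, Clone, PartialEq)]
#[allow(dead_code)]
enum Event {
    Ingested(String),  // stand-in for an entry notification
    Forgotten(String), // only recorded for non-traceless forgets
}

#[derive(Debug, PartialEq)]
enum ResumeError {
    Outdated,
}

struct SubscriptionLog {
    first_retained_id: u64, // ids below this have been pruned
    events: Vec<Event>,     // events[i] has id first_retained_id + i
}

impl SubscriptionLog {
    // Resume a subscription: return all events from `progress_id`
    // onwards, or report that the id is too outdated to honour.
    fn resume(&self, progress_id: u64) -> Result<&[Event], ResumeError> {
        if progress_id < self.first_retained_id {
            return Err(ResumeError::Outdated);
        }
        let offset = (progress_id - self.first_retained_id) as usize;
        Ok(&self.events[offset.min(self.events.len())..])
    }
}
```

Traceless forgetting would prune `events` (raising `first_retained_id`), which is precisely how forgetting can invalidate progress ids.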

@sgwilym
Contributor Author

sgwilym commented Aug 20, 2024

I'm starting to work in earnest on this again.

One thing I have thought about is how a store in our implementations is not the same as a store in the spec:

A store is a set of AuthorisedEntries ...

Whereas in our implementations, a store is a set of authorised entries AND a partial set of corresponding payloads.

In willow-js, this led to this situation where you need to specify how a store's payloads should be stored and retrieved via PayloadDrivers, ramping up the complexity of the store.

I suggested the opportunity to separate the two concepts to @AljoschaMeyer, so that we'd have Stores concerned only with entries and, err, PayloadStores concerned only with Payloads. Rather than have Store.ingest_payload, we could have Store.update_available_payload_length to update the fingerprints.

But Aljoscha raised a good point:

I feel like internally the implementations need to be coupled though, say, for locking.

Which is persuasive, but maybe there's some way around it? That's why I'm sharing this here.
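The proposed split might be sketched as two traits, roughly like the following. All names here (`EntryStore`, `PayloadStore`, `update_available_payload_length`, the digest type) are illustrative assumptions based on the comment above, not the actual willow-rs API; the `RefCell` in the toy implementation stands in for the interior mutability discussed earlier:

```rust
use std::cell::RefCell;
use std::collections::HashMap;

// Hypothetical stand-in for a payload digest.
type PayloadDigest = [u8; 32];

#[allow(dead_code)]
trait EntryStore {
    // Entry-only concerns: ingestion, forgetting, queries, ...
    fn ingest_entry(&self, path: &str);

    // Instead of `ingest_payload`, the entry store is merely told how
    // much of a payload is now locally available, so its summaries
    // (fingerprints) stay accurate.
    fn update_available_payload_length(&self, digest: PayloadDigest, length: u64);
}

trait PayloadStore {
    // Payload-only concerns: append bytes, report available length.
    fn append(&self, digest: PayloadDigest, bytes: &[u8]);
    fn available_length(&self, digest: PayloadDigest) -> u64;
}

// Toy in-memory payload store, using interior mutability so the
// trait methods can take &self.
#[derive(Default)]
struct MemPayloadStore {
    payloads: RefCell<HashMap<PayloadDigest, Vec<u8>>>,
}

impl PayloadStore for MemPayloadStore {
    fn append(&self, digest: PayloadDigest, bytes: &[u8]) {
        self.payloads
            .borrow_mut()
            .entry(digest)
            .or_default()
            .extend_from_slice(bytes);
    }

    fn available_length(&self, digest: PayloadDigest) -> u64 {
        self.payloads
            .borrow()
            .get(&digest)
            .map_or(0, |p| p.len() as u64)
    }
}
```

The coupling concern Aljoscha raises would live in whatever glue calls `update_available_payload_length` after each `append`; that glue is exactly the part that needs locking.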

@sgwilym
Contributor Author

sgwilym commented Aug 21, 2024

@AljoschaMeyer I have now added all mutation methods (unless we want a bulk payload ingestion method?)

@sgwilym sgwilym mentioned this pull request Aug 22, 2024
/// Attempt to ingest many [`AuthorisedEntry`] in the [`Store`].
///
/// The result being `Ok` does **not** indicate that all entry ingestions were successful, only that each entry had an ingestion attempt, some of which *may* have returned [`EntryIngestionError`]. The `Err` type of this result is only returned if there was some internal error.
fn bulk_ingest_entry(
Contributor

Returning a Vec of errors causes an allocation that might be avoidable (or at least configurable) if the method took a consumer of BulkIngestionResults as an argument instead? Not entirely sure whether that would really be an improvement. CC @Frando

Contributor Author

@sgwilym sgwilym Aug 27, 2024

I mean, I think you would always want to know about errors, right? The consumer pattern is cool, but is it maybe a little fancy?
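For comparison, a toy sketch of the two shapes under discussion; the entry type, error type, and function names are invented for illustration, not the real willow-rs signatures:

```rust
// Hypothetical stand-ins, not the real willow-rs types.
#[derive(Debug, Clone)]
struct Entry {
    path: String,
}

#[derive(Debug, PartialEq)]
enum IngestError {
    Rejected,
}

// Pretend per-entry ingestion logic: entries under "forbidden/" fail.
fn try_ingest(entry: &Entry) -> Result<(), IngestError> {
    if entry.path.starts_with("forbidden/") {
        Err(IngestError::Rejected)
    } else {
        Ok(())
    }
}

// Shape 1: allocate and return all per-entry results in a Vec.
fn bulk_ingest_vec(entries: &[Entry]) -> Vec<Result<(), IngestError>> {
    entries.iter().map(try_ingest).collect()
}

// Shape 2: hand each result to a caller-supplied sink, so the caller
// decides whether (and how) to allocate.
fn bulk_ingest_sink(entries: &[Entry], mut sink: impl FnMut(Result<(), IngestError>)) {
    for entry in entries {
        sink(try_ingest(entry));
    }
}
```

The sink shape still reports every error; it just leaves the choice of collecting them (or merely counting them) to the caller.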

@sgwilym sgwilym requested a review from AljoschaMeyer August 27, 2024 14:04
@sgwilym sgwilym requested a review from AljoschaMeyer August 29, 2024 11:18
@Frando
Copy link
Contributor

Frando commented Aug 29, 2024

Note: Many of the structs still need the standard derives: Debug, Clone, and Eq/PartialEq where applicable.

@sgwilym sgwilym merged commit 442854e into main Aug 30, 2024
14 checks passed
@sgwilym sgwilym deleted the store branch August 30, 2024 10:33