# Using Async Rust

* Status: draft
* Deciders:
* Date: ???
* Previous discussion: https://github.com/mozilla/application-services/pull/5910

## Context and Problem Statement

Our Rust components are currently written in synchronous Rust; however, all current consumers wrap them in another class to present an async-style interface:

* Firefox Android uses `CoroutineScope.runBlocking` to adapt methods to be `suspend`
* Firefox iOS mostly uses `DispatchQueue` and completion handlers, but the long-term plan is to transition to `async` functions.
Some components already do this, using `withCheckedThrowingContinuation` on top of `DispatchQueue` to define async functions.
* Firefox Desktop auto-generates C++ wrapper code to make them async using a [TOML config file](https://searchfox.org/mozilla-central/rev/cdfe21b20eacfaa6712dd9821d6383859ce386c6/toolkit/components/uniffi-bindgen-gecko-js/config.toml).
This was chosen because JS is single-threaded and doesn't provide a simple way to run blocking functions in a task queue.
One drawback from this choice is that Desktop has a very different async system compared to the mobile apps.

With the new UniFFI async capabilities, it's possible to move the async code into Rust and avoid this wrapper layer.
This ADR discusses whether we should do this, how we could implement it, and what our general approach to async should be.
To make things concrete, the Suggest component is used as an example of what would change if we moved to async Rust.

### Scope

This ADR covers the question of using a wrapper layer to implement async functionality.
It does not cover some related questions:

* **Scheduling this work.**
If we decide to embrace async Rust, we do not need to commit to any particular deadline for switching to it.
* **Wrappers in general.**
[The previous PR](https://github.com/mozilla/application-services/pull/5910/) dedicated a separate ADR to this question, but we never came to a decision on it.
It seems there's no single answer to the question; the general consensus was that we should evaluate things on a case-by-case basis, so this ADR will just focus on the case of async.
* **Wrappers in consumer code.**
If we change things so that the current async wrapping layer is no longer needed, consumer engineers will still have a choice about whether to keep their current wrapper layer.
This choice should be left to consumer engineers.
This ADR will touch on this, but not recommend anything.

## Considered Options

* **Option A: Do nothing.**
Keep the current system in place.
* **Option B: Async Rust using the current UniFFI.**
Make our component methods async by requiring the foreign code to pass in an interface that runs tasks in a worker queue, then using a
`oneshot::Channel` to communicate the results back. See below for how this would work in practice.
* **Option C: Async Rust with additional UniFFI support.**
Like `B`, but make `WorkerQueue` and `RustTask` built-in UniFFI types.
This would eliminate the need to define our own interfaces for these; instead, UniFFI would allow foreign code to pass a `DispatchQueue`/`CoroutineScope` into Rust, and Rust could use it to run blocking tasks in a work queue.
* **Option D: Extend the uniffi-bindgen-gecko-js config system to Mobile.**
Extend the Gecko-JS system, where the config specifies that functions/methods should be wrapped as async, to also work for Kotlin/Swift.

### Example code

I (Ben) made some diffs to illustrate how the code would change for each option.
When doing that, I realized the wrapper layer was actually implementing important functionality for the component:

* The Kotlin code extended our interrupt support to also interrupt pending `query()` calls.
* The Kotlin code also catches all errors and converts them into regular returns (`query()` returns an empty list and `ingest()` returns false).
* The Swift code splits async methods into two categories: low-priority calls like `ingest()` and high-priority calls like `query()`.

As part of the example changes, I moved this functionality to our Rust components.
This results in some extra code in the diffs, but I thought it was good to explore the messy details of this transition.
An important factor for deciding this ADR is where we want this functionality to live.

#### Option B
- [app-services changes](./files/0009/option-b-app-services.diff)
- [android-components changes](./files/0009/option-b-android-components.diff)
- [Firefox iOS changes](./files/0009/option-b-firefox-ios.diff)
- [Firefox Desktop changes](./files/0009/option-b-firefox-desktop.diff)

This option is possible to implement today, so I was able to test that the app-services and android-components changes actually compiled.
I didn't do that for iOS and desktop, mostly because it's harder to perform a local build.
I think we can be confident that the actual changes would look similar to this.

Summary of changes:
* Added the `WorkerQueue` and `RustTask` interfaces
* `RustTask` encapsulates a single task
* `WorkerQueue` is implemented by the foreign side and runs a `RustTask` in a work queue where it's okay to block.
* Use the above interfaces to wrap sync calls to be async, by running them in the `WorkerQueue` then sending the result via a `oneshot::Channel` that the
original async function was `await`ing.
* This is also a good place to do the error conversion/reporting.
* SuggestStoreBuilder gets a `build_async` method, which creates a `SuggestStoreAsync`.
It takes two `WorkerQueue`s: one for ingest and one for everything else.
* Added supporting code so that `query()` can create its `SqlInterruptScope` ahead of time, outside of the scheduled task.
That way `interrupt()` can also interrupt pending calls to `query()`.
* Updated the `ingest()` method to catch errors and return `false` rather than propagating them.
In general, I think this is a better API for consumers since there's nothing meaningful they can do with errors other than report them.
`query()` should probably also be made infallible.
* Removed the wrapper class from `android-components`, but kept the wrapper class in `firefox-ios`.
The goal here was to show the different options for consumer code.
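
To make Option B more concrete, below is a minimal sketch of what the `WorkerQueue`/`RustTask` interfaces and the async wrapping could look like.
The names follow the prose above, but the exact signatures (and the use of `futures::channel::oneshot` for the result channel) are assumptions rather than the actual code in the diffs; the two-queue split and the ahead-of-time `SqlInterruptScope` are elided.

```rust
use std::sync::{Arc, Mutex};

use futures::channel::oneshot;

/// A single blocking task, defined on the Rust side and handed to the foreign
/// work queue to execute.
pub trait RustTask: Send + Sync {
    fn run(&self);
}

/// Implemented on the foreign side (e.g. on top of a `CoroutineScope` or a
/// `DispatchQueue`) to run a task somewhere it's okay to block.
pub trait WorkerQueue: Send + Sync {
    fn add_task(&self, task: Arc<dyn RustTask>);
}

/// Holds the closure and the oneshot sender until the queue runs the task.
/// `run(&self)` only gets a shared reference, so the closure lives behind a
/// `Mutex<Option<..>>` -- roughly the role `RustTaskContainer` plays in the diffs.
struct TaskContainer<T, F> {
    inner: Mutex<Option<(F, oneshot::Sender<T>)>>,
}

impl<T, F> RustTask for TaskContainer<T, F>
where
    T: Send,
    F: FnOnce() -> T + Send,
{
    fn run(&self) {
        if let Some((work, sender)) = self.inner.lock().unwrap().take() {
            // Ignore send errors: the awaiting future may already be dropped.
            let _ = sender.send(work());
        }
    }
}

/// Wrap a blocking closure as async: run it on the worker queue and await the result.
pub async fn run_in_queue<T, F>(queue: &dyn WorkerQueue, work: F) -> T
where
    T: Send + 'static,
    F: FnOnce() -> T + Send + 'static,
{
    let (sender, receiver) = oneshot::channel();
    queue.add_task(Arc::new(TaskContainer {
        inner: Mutex::new(Some((work, sender))),
    }));
    receiver.await.expect("worker queue dropped the task")
}
```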

#### Option C
- [app-services changes](./files/0009/option-c-app-services.diff)
- [android-components changes](./files/0009/option-c-android-components.diff)
- [Firefox iOS changes](./files/0009/option-c-firefox-ios.diff)
- [Firefox Desktop changes](./files/0009/option-c-firefox-desktop.diff)

This option assumes that UniFFI will provide something similar to `WorkerQueue`, so we don't need to define/implement that in app-services or the consumer repos.
This requires changes to UniFFI core, so none of this code works today.
However, I think we can be fairly confident that these changes will work since we have a long-standing [UniFFI PR](https://github.com/mozilla/uniffi-rs/pull/1837) that implements a similar feature -- in fact it's a more complex version.

Summary of changes: essentially the same as `B`, but we don't need to define/implement the `WorkerQueue` and `RustTask` interfaces.
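
Purely as an illustration of how the component code could shrink, here's a sketch that assumes UniFFI exposes a built-in worker-queue handle with a `run_blocking` helper.
None of these UniFFI names exist today; they stand in for whatever the final API ends up being.

```rust
use std::sync::Arc;

// Hypothetical: `uniffi::WorkerQueue` and `run_blocking` are placeholders for a
// UniFFI API that does not exist yet. The foreign code would construct the queue
// from a CoroutineScope / DispatchQueue and pass it in via `build_async()`.
pub struct SuggestStoreAsync {
    store: Arc<SuggestStore>,
    query_queue: uniffi::WorkerQueue, // hypothetical built-in type
}

impl SuggestStoreAsync {
    pub async fn query(&self, query: SuggestionQuery) -> Vec<Suggestion> {
        let store = Arc::clone(&self.store);
        // UniFFI would schedule the closure on the foreign queue, guarantee it
        // runs exactly once, and send the result back over a oneshot channel.
        self.query_queue
            .run_blocking(move || store.query(query).unwrap_or_default())
            .await
    }
}
```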

#### Option D

I'm not completely sure how this one would work in practice, but I assume that it would mean TOML configuration for Kotlin/Swift similar to the current Desktop configuration:

```toml
[suggest.async_wrappers]
# All functions/methods are wrapped to be async by default and must be `await`ed.
enable = true
# These are exceptions to the async wrapping. These functions must not be `await`ed.
main_thread = [
"raw_suggestion_url_matches",
"SuggestStore.new",
"SuggestStore.interrupt",
"SuggestStoreBuilder.new",
"SuggestStoreBuilder.data_path",
"SuggestStoreBuilder.load_extension",
"SuggestStoreBuilder.remote_settings_bucket_name",
"SuggestStoreBuilder.remote_settings_server",
"SuggestStoreBuilder.build",
]
```

This would auto-generate async wrapper code.
For example, the Kotlin code would look something like this:

```kotlin
class SuggestStore {
    /**
     * Queries the database for suggestions.
     */
    suspend fun query(query: SuggestionQuery): List<Suggestion> =
        withContext(Dispatchers.IO) {
            // ...current auto-generated code here
        }
}
```

## Pros and Cons of the Options

### (A) Do nothing

* Good, because it requires the least amount of work
* Good, because it's proven to work
* Bad, because async is handled very differently in JS vs Kotlin+Swift
* Bad, because some of our component functionality is implemented in wrapper code.
It either needs to be duplicated or will only be available on one platform.
For example, the Kotlin functionality to interrupt pending `query()` calls would also be useful on Swift, but we never implemented that.
* Bad, because our component functionality is implemented in other repos, which makes it likely to go stale.
However, we could mitigate this by moving that Kotlin/Swift wrapper code into app-services.
* Bad, because it makes our documentation system worse. For example, our docs for [query()](https://mozilla.github.io/application-services/kotlin/kotlin-components-docs/mozilla.appservices.suggest/-suggest-store/query.html) say it's a sync function, but in practice it's actually async.

### (B) Async Rust using the current UniFFI

* Good, because we'll have a common system for async that works similarly on all platforms.
* Good, because our generated docs will match how the methods are used in practice.
* Good, because it encourages us to move async complexity into the core code.
This makes it available to all platforms and more likely to be maintained.
* Good, because it opens the door for more efficient thread usage (see the sketch after this list).
For example, we could make methods more fine-grained and only use the work queue for SQL operations, but not for network requests.
* Bad, because we're taking on risk by introducing the async UniFFI code to app-services.
* Bad, because our consumers need to define their own `WorkerQueue` implementations, which is a bit of a burden.
This feels especially bad on JS, where the whole concept of a work queue feels alien.
* Bad, because it makes it harder to provide bindings for new languages that don't support async, like C and C++.
Maybe we could bridge the gap with some sort of callback-based async system, but it's not clear how that would work.
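
To illustrate the fine-grained idea above, here's a sketch of an `ingest()` that awaits the network fetch directly and only pushes the blocking SQL work onto the queue.
It reuses the `run_in_queue` helper from the Option B sketch; `fetch_remote_suggestions` and `write_suggestions` are hypothetical stand-ins for the real ingest internals, and the `client`/`db`/`ingest_queue` fields are assumed.

```rust
// Sketch only: split ingest so that just the blocking SQL write goes through the
// worker queue, while the network fetch uses an async client directly.
pub async fn ingest(&self) -> bool {
    // Network I/O is already async -- no worker queue needed here.
    let records = match self.client.fetch_remote_suggestions().await {
        Ok(records) => records,
        // Matching the API change above: swallow the error and report failure.
        Err(_) => return false,
    };
    // Only the blocking SQL work runs on the foreign-provided worker queue.
    let db = std::sync::Arc::clone(&self.db);
    run_in_queue(self.ingest_queue.as_ref(), move || {
        db.write_suggestions(records).is_ok()
    })
    .await
}
```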

### (C) Async Rust with additional UniFFI support

* Good/Bad/Risky for mostly the same reasons as (B).
* Good, because it removes the need for us and consumers to define `WorkerQueue` traits/impls.
* Good, because it can simplify the `WorkerQueue` code.
In particular, we can guarantee that the task is only executed once, which removes the need for `RustTaskContainer`.
* Bad, because we'll have to maintain the UniFFI worker queue code.

### (D) Extend the uniffi-bindgen-gecko-js config system to Mobile

* Good, because we'll have a common system for async that works similarly on all platforms.
* Mostly good, because our generated docs will match how the methods are used in practice.
However, it could be weird to write docstrings for sync functions that are wrapped to be async.
* Good, because it's less risky than B/C.
The system would just auto-generate the kind of wrappers we already use.
* Bad, because it's hard for consumer engineers to configure.
For example, how can firefox-ios devs pick which QoS they want to use with a `DispatchQueue`?
They would probably have to specify it in the config file, which is less convenient than passing it in from code.
* Bad, because it's not clear how we could handle complex cases like using both a low-priority and high-priority queue.

## Decision Outcome
?