From 4cb891a0b26d1b723a4febbf60def0ea017d55c3 Mon Sep 17 00:00:00 2001 From: Alex Garcia Date: Tue, 12 Nov 2024 21:35:39 -0800 Subject: [PATCH] doc internal --- ARCHITECTURE.md | 54 +++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 54 insertions(+) create mode 100644 ARCHITECTURE.md diff --git a/ARCHITECTURE.md b/ARCHITECTURE.md new file mode 100644 index 0000000..8ac9501 --- /dev/null +++ b/ARCHITECTURE.md @@ -0,0 +1,54 @@ +## `vec0` + +### idxStr + +The `vec0` idxStr is a string composed of single "header" character and 0 or +more "blocks" of 4 characters each. + +The "header" charcter denotes the type of query plan, as determined by the +`enum vec0_query_plan` values. The current possible values are: + +| Name | Value | Description | +| -------------------------- | ----- | ---------------------------------------------------------------------- | +| `VEC0_QUERY_PLAN_FULLSCAN` | `'1'` | Perform a full-scan on all rows | +| `VEC0_QUERY_PLAN_POINT` | `'2'` | Perform a single-lookup point query for the provided rowid | +| `VEC0_QUERY_PLAN_KNN` | `'3'` | Perform a KNN-style query on the provided query vector and parameters. | + +Each 4-character "block" is associated with a corresponding value in `argv[]`. For example, the 1st block at byte offset `1-4` (inclusive) is the 1st block and is associated with `argv[1]`. The 2nd block at byte offset `5-8` (inclusive) is associated with `argv[2]` and so on. Each block describes what kind of value or filter the given `argv[i]` value is. + + +#### `VEC0_IDXSTR_KIND_KNN_MATCH` (`'{'`) + +`argv[i]` is the query vector of the KNN query. + +The remaining 3 characters of the block are `_` fillers. + +#### `VEC0_IDXSTR_KIND_KNN_K` (`'}'`) + +`argv[i]` is the limit/k value of the KNN query. + +The remaining 3 characters of the block are `_` fillers. + +#### `VEC0_IDXSTR_KIND_KNN_ROWID_IN` (`'['`) + +`argv[i]` is the optional `rowid in (...)` value, and must be handled with [`sqlite3_vtab_in_first()` / +`sqlite3_vtab_in_next()`](https://www.sqlite.org/c3ref/vtab_in_first.html). + +The remaining 3 characters of the block are `_` fillers. + +#### `VEC0_IDXSTR_KIND_KNN_PARTITON_CONSTRAINT` (`']'`) + +`argv[i]` is a "constraint" on a specific partition key. + +The second character of the block denotes which partition key to filter on, using `A` to denote the first partition key column, `B` for the second, etc. It is encoded with `'A' + partition_idx` and can be decoded with `c - 'A'`. + +The third character of the block denotes which operator is used in the constraint. It will be one of the values of `enum vec0_partition_operator`, as only a subset of operations are supported on partition keys. + +The fourth character of the block is a `_` filler. + + +#### `VEC0_IDXSTR_KIND_POINT_ID` (`'!'`) + +`argv[i]` is the value of the rowid or id to match against for the point query. + +The remaining 3 characters of the block are `_` fillers.