Commit
Add support for Vector Search Indexes (#3266)
* Add support for Vector Search Indexes

  This is initial work - right now things are blocked by errors in OpenAPI spec, and as result, Go SDK has incorrect structs. So we need to wait until spec is fixed
* fix schema
* Upgrade to the latest SDK and update schema customization
* Add unit tests
* Fix bad merge
* Wait for deletion
* Add documentation
* Use `VectorIndex` instead of `CreateVectorIndexRequest` as the base structure
* Fix copy/paste error
Showing 9 changed files with 367 additions and 56 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,79 @@ | ||
--- | ||
subcategory: "Vector Search" | ||
--- | ||
# databricks_vector_search_index Resource | ||
|
||
-> **Note** This resource can only be used in a Unity Catalog-enabled workspace! | ||
|
||
This resource allows you to create a [Vector Search Index](https://docs.databricks.com/en/generative-ai/create-query-vector-search.html) in Databricks. Vector Search is a serverless similarity search engine that allows you to store a vector representation of your data, including metadata, in a vector database. The Vector Search Index provides the ability to search data in the linked Delta Table. | ||
|
||
## Example Usage | ||
|
||
```hcl | ||
resource "databricks_vector_search_index" "sync" { | ||
  name = "main.default.vector_search_index" | ||
  endpoint_name = databricks_vector_search_endpoint.this.name | ||
  primary_key = "id" | ||
  index_type = "DELTA_SYNC" | ||
  delta_sync_index_spec { | ||
    source_table = "main.default.source_table" | ||
    pipeline_type = "TRIGGERED" | ||
    embedding_source_columns { | ||
      name = "text" | ||
      embedding_model_endpoint_name = databricks_model_serving.this.name | ||
    } | ||
  } | ||
} | ||
``` | ||
|
||
## Argument Reference | ||
|
||
The following arguments are supported (changing any argument leads to recreation of the resource): | ||
|
||
* `name` - (required) Three-level name of the Vector Search Index to create (`catalog.schema.index_name`). | ||
* `endpoint_name` - (required) The name of the Vector Search Endpoint that will be used for indexing the data. | ||
* `primary_key` - (required) The column name that will be used as a primary key. | ||
* `index_type` - (required) Vector Search index type. Currently supported values are: | ||
  * `DELTA_SYNC`: An index that automatically syncs with a source Delta Table, automatically and incrementally updating the index as the underlying data in the Delta Table changes. | ||
  * `DIRECT_ACCESS`: An index that supports the direct read and write of vectors and metadata through our REST and SDK APIs. With this model, the user manages index updates. | ||
* `delta_sync_index_spec` - (object) Specification for Delta Sync Index. Required if `index_type` is `DELTA_SYNC`. | ||
  * `source_table` - (required) The name of the source table. | ||
  * `embedding_source_columns` - (required if `embedding_vector_columns` isn't provided) array of objects representing columns that contain the embedding source. Each entry consists of: | ||
    * `name` - The name of the column. | ||
    * `embedding_model_endpoint_name` - The name of the embedding model endpoint. | ||
  * `embedding_vector_columns` - (required if `embedding_source_columns` isn't provided) array of objects representing columns that contain the embedding vectors. Each entry consists of: | ||
    * `name` - The name of the column. | ||
    * `embedding_dimension` - Dimension of the embedding vector. | ||
  * `pipeline_type` - Pipeline execution mode. Possible values are: | ||
    * `TRIGGERED`: If the pipeline uses the triggered execution mode, the system stops processing after successfully refreshing the source table in the pipeline once, ensuring the table is updated based on the data available when the update started. | ||
    * `CONTINUOUS`: If the pipeline uses continuous execution, the pipeline processes new data as it arrives in the source table to keep the vector index fresh. | ||
* `direct_access_index_spec` - (object) Specification for Direct Vector Access Index. Required if `index_type` is `DIRECT_ACCESS`. | ||
  * `schema_json` - The schema of the index in JSON format. Check the [API documentation](https://docs.databricks.com/api/workspace/vectorsearchindexes/createindex#direct_access_index_spec-schema_json) for a list of supported data types. | ||
  * `embedding_source_columns` - (required if `embedding_vector_columns` isn't provided) array of objects representing columns that contain the embedding source. Each entry consists of: | ||
    * `name` - The name of the column. | ||
    * `embedding_model_endpoint_name` - The name of the embedding model endpoint. | ||
  * `embedding_vector_columns` - (required if `embedding_source_columns` isn't provided) array of objects representing columns that contain the embedding vectors. Each entry consists of: | ||
    * `name` - The name of the column. | ||
    * `embedding_dimension` - Dimension of the embedding vector. | ||
|
||
## Attribute Reference | ||
|
||
In addition to all arguments above, the following attributes are exported: | ||
|
||
* `id` - The same as the name of the index. | ||
* `creator` - Creator of the index. | ||
* `delta_sync_index_spec`: | ||
  * `pipeline_id` - ID of the associated Delta Live Tables pipeline. | ||
* `status` - Object describing the current status of the index consisting of the following fields: | ||
  * `message` - Message associated with the index status. | ||
  * `indexed_row_count` - Number of rows indexed. | ||
  * `ready` - Whether the index is ready for search. | ||
  * `index_url` - Index API URL to be used to perform operations on the index. | ||
|
||
## Import | ||
|
||
The resource can be imported using the name of the Vector Search Index: | ||
|
||
```bash | ||
terraform import databricks_vector_search_index.this <index-name> | ||
``` |
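
The argument rules documented above (a three-level `name`, and a spec block that matches `index_type`) can be illustrated with a small Go sketch. This is not part of the commit: `indexConfig` and `validateIndexConfig` are hypothetical names, the struct is a simplified stand-in for the real resource schema, and the name check assumes identifiers without quoted dots. The provider itself only declares the `ExactlyOneOf` constraint in its schema; the `index_type` match here follows the wording of the documentation.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// indexConfig is a hypothetical, simplified view of the resource arguments.
type indexConfig struct {
	Name            string // expected form: catalog.schema.index_name
	IndexType       string // "DELTA_SYNC" or "DIRECT_ACCESS"
	HasDeltaSync    bool   // true if delta_sync_index_spec is set
	HasDirectAccess bool   // true if direct_access_index_spec is set
}

// validateIndexConfig checks the documented argument rules.
func validateIndexConfig(c indexConfig) error {
	parts := strings.Split(c.Name, ".")
	if len(parts) != 3 {
		return fmt.Errorf("name %q must have the form catalog.schema.index_name", c.Name)
	}
	for _, p := range parts {
		if p == "" {
			return fmt.Errorf("name %q has an empty component", c.Name)
		}
	}
	// Mirrors the ExactlyOneOf constraint on the two spec blocks.
	if c.HasDeltaSync == c.HasDirectAccess {
		return errors.New("exactly one of delta_sync_index_spec or direct_access_index_spec must be set")
	}
	switch c.IndexType {
	case "DELTA_SYNC":
		if !c.HasDeltaSync {
			return errors.New("index_type DELTA_SYNC requires delta_sync_index_spec")
		}
	case "DIRECT_ACCESS":
		if !c.HasDirectAccess {
			return errors.New("index_type DIRECT_ACCESS requires direct_access_index_spec")
		}
	default:
		return fmt.Errorf("unsupported index_type %q", c.IndexType)
	}
	return nil
}

func main() {
	cfg := indexConfig{
		Name:         "main.default.vector_search_index",
		IndexType:    "DELTA_SYNC",
		HasDeltaSync: true,
	}
	if err := validateIndexConfig(cfg); err != nil {
		fmt.Println("invalid:", err)
		return
	}
	fmt.Println("configuration looks valid")
}
```

A check like this is only a pre-flight aid; the API remains the final authority on which combinations are accepted.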
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,118 @@ | ||
package vectorsearch | ||
|
||
import ( | ||
	"context" | ||
	"errors" | ||
	"fmt" | ||
	"log" | ||
	"time" | ||
|
||
	"github.com/databricks/databricks-sdk-go" | ||
	"github.com/databricks/terraform-provider-databricks/common" | ||
	"github.com/hashicorp/terraform-plugin-sdk/v2/helper/retry" | ||
	"github.com/hashicorp/terraform-plugin-sdk/v2/helper/schema" | ||
|
||
	"github.com/databricks/databricks-sdk-go/apierr" | ||
	"github.com/databricks/databricks-sdk-go/service/vectorsearch" | ||
) | ||
|
||
const defaultIndexProvisionTimeout = 15 * time.Minute | ||
|
||
// waitForVectorSearchIndexDeletion polls the index until the API reports that it no longer exists. | ||
func waitForVectorSearchIndexDeletion(w *databricks.WorkspaceClient, ctx context.Context, searchIndexName string) error { | ||
	return retry.RetryContext(ctx, defaultIndexProvisionTimeout, func() *retry.RetryError { | ||
		_, err := w.VectorSearchIndexes.GetIndexByIndexName(ctx, searchIndexName) | ||
		if err == nil { | ||
			return retry.RetryableError(fmt.Errorf("vector search index %s is still not deleted", searchIndexName)) | ||
		} | ||
		if errors.Is(err, apierr.ErrResourceDoesNotExist) || errors.Is(err, apierr.ErrNotFound) { | ||
			return nil | ||
		} | ||
		return retry.NonRetryableError(fmt.Errorf("error waiting for deletion of vector search index %s: %w", searchIndexName, err)) | ||
	}) | ||
} | ||
|
||
// waitForSearchIndexCreation polls the index until its status reports that it is ready for search. | ||
func waitForSearchIndexCreation(w *databricks.WorkspaceClient, ctx context.Context, searchIndexName string) error { | ||
	return retry.RetryContext(ctx, defaultIndexProvisionTimeout-deleteCallTimeout, func() *retry.RetryError { | ||
		index, err := w.VectorSearchIndexes.GetIndexByIndexName(ctx, searchIndexName) | ||
		if err != nil { | ||
			return retry.NonRetryableError(err) | ||
		} | ||
		if index.Status.Ready { // We really need to depend on the detailed status of the index, but it's not available in the API yet | ||
			return nil | ||
		} | ||
		return retry.RetryableError(fmt.Errorf("vector search index %s is still pending", searchIndexName)) | ||
	}) | ||
} | ||
|
||
func ResourceVectorSearchIndex() common.Resource { | ||
	s := common.StructToSchema( | ||
		vectorsearch.VectorIndex{}, | ||
		func(s map[string]*schema.Schema) map[string]*schema.Schema { | ||
			common.MustSchemaPath(s, "delta_sync_index_spec", "embedding_vector_columns").MinItems = 1 | ||
			exof := []string{"delta_sync_index_spec", "direct_access_index_spec"} | ||
			s["delta_sync_index_spec"].ExactlyOneOf = exof | ||
			s["direct_access_index_spec"].ExactlyOneOf = exof | ||
|
||
			common.CustomizeSchemaPath(s, "endpoint_name").SetRequired() | ||
			common.CustomizeSchemaPath(s, "primary_key").SetRequired() | ||
			common.CustomizeSchemaPath(s, "status").SetReadOnly() | ||
			common.CustomizeSchemaPath(s, "creator").SetReadOnly() | ||
			common.CustomizeSchemaPath(s, "name").SetRequired() | ||
			common.CustomizeSchemaPath(s, "index_type").SetRequired() | ||
			common.CustomizeSchemaPath(s, "delta_sync_index_spec", "pipeline_id").SetReadOnly() | ||
			return s | ||
		}) | ||
|
||
	return common.Resource{ | ||
		Create: func(ctx context.Context, d *schema.ResourceData, c *common.DatabricksClient) error { | ||
			w, err := c.WorkspaceClient() | ||
			if err != nil { | ||
				return err | ||
			} | ||
			var req vectorsearch.CreateVectorIndexRequest | ||
			common.DataToStructPointer(d, s, &req) | ||
			_, err = w.VectorSearchIndexes.CreateIndex(ctx, req) | ||
			if err != nil { | ||
				return err | ||
			} | ||
			err = waitForSearchIndexCreation(w, ctx, req.Name) | ||
			if err != nil { | ||
				nestedErr := w.VectorSearchIndexes.DeleteIndexByIndexName(ctx, req.Name) | ||
				if nestedErr != nil { | ||
					log.Printf("[ERROR] Error cleaning up search index: %s", nestedErr.Error()) | ||
				} | ||
				return err | ||
			} | ||
			d.SetId(req.Name) | ||
			return nil | ||
		}, | ||
		Read: func(ctx context.Context, d *schema.ResourceData, c *common.DatabricksClient) error { | ||
			w, err := c.WorkspaceClient() | ||
			if err != nil { | ||
				return err | ||
			} | ||
			index, err := w.VectorSearchIndexes.GetIndexByIndexName(ctx, d.Id()) | ||
			if err != nil { | ||
				return err | ||
			} | ||
			return common.StructToData(*index, s, d) | ||
		}, | ||
		Delete: func(ctx context.Context, d *schema.ResourceData, c *common.DatabricksClient) error { | ||
			w, err := c.WorkspaceClient() | ||
			if err != nil { | ||
				return err | ||
			} | ||
			err = w.VectorSearchIndexes.DeleteIndexByIndexName(ctx, d.Id()) | ||
			if err != nil { | ||
				return err | ||
			} | ||
			return waitForVectorSearchIndexDeletion(w, ctx, d.Id()) | ||
		}, | ||
		StateUpgraders: []schema.StateUpgrader{}, | ||
		Schema:         s, | ||
		SchemaVersion:  0, | ||
		Timeouts: &schema.ResourceTimeout{ | ||
			Create: schema.DefaultTimeout(defaultIndexProvisionTimeout), | ||
		}, | ||
	} | ||
} |