Merge branch 'main' into exporter/jobs-refactoring
Merge branch 'main' into exporter/jobs-refactoring
tanmay-db authored Jan 15, 2025
2 parents c0cd3c9 + 1f7c015 commit 15e2160
Showing 71 changed files with 27,302 additions and 19,480 deletions.
2 changes: 1 addition & 1 deletion .codegen/_openapi_sha
@@ -1 +1 @@
a6a317df8327c9b1e5cb59a03a42ffa2aabeef6d
779817ed8d63031f5ea761fbd25ee84f38feec0d
2 changes: 1 addition & 1 deletion CONTRIBUTING.md
@@ -123,7 +123,7 @@ We are migrating the resource from SDKv2 to Plugin Framework provider and hence
2. Create a file named `resource_resource-name.go` and write the CRUD methods and schema for that resource. For reference, take a look at existing resources, e.g. `resource_app.go`.
- Make sure to set the user agent in all the CRUD methods.
- In the `Metadata()`, use the method `GetDatabricksProductionName()`.
- In the `Schema()` method, import the appropriate struct from the `internal/service/{package}_tf` package and use the `ResourceStructToSchema` method to convert the struct to schema. Use the struct that does not have the `_SdkV2` suffix.
- In the `Schema()` method, import the appropriate struct from the `internal/service/{package}_tf` package and use the `ResourceStructToSchema` method to convert the struct to schema. Use the struct that does not have the `_SdkV2` suffix. The schema for the struct is automatically generated and maintained within the `ApplySchemaCustomizations` method of that struct. If you need to customize the schema further, pass in a `CustomizableSchema` to `ResourceStructToSchema` and customize the schema there. If you need to use a manually crafted struct in place of the auto-generated one, you must implement the `ApplySchemaCustomizations` method in a similar way.
3. Create a file with `resource_resource-name_acc_test.go` and add integration tests here.
4. Create a file with `resource_resource-name_test.go` and add unit tests here. Note: please make sure to abstract specific methods of the resource so that they are unit-test friendly and do not test internals of the Terraform plugin framework library. You can compare the diagnostics; for example, take a look at `data_cluster_test.go`.
5. Add the resource under `internal/providers/pluginfw/pluginfw.go` in `Resources()` method. Please update the list so that it stays in alphabetically sorted order.
2 changes: 2 additions & 0 deletions docs/resources/automatic_cluster_update_setting.md
@@ -6,6 +6,8 @@ subcategory: "Settings"

-> This resource can only be used with a workspace-level provider!

~> On Azure you need to use the [azurerm_databricks_workspace](https://registry.terraform.io/providers/hashicorp/azurerm/latest/docs/resources/databricks_workspace#automatic_cluster_update_enabled-1) resource to configure this setting.

The `databricks_automatic_cluster_update_workspace_setting` resource allows you to control whether automatic cluster update is enabled for the current workspace. By default, it is turned off. Enabling this feature on a workspace requires that you add the Enhanced Security and Compliance add-on.

## Example Usage
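A minimal sketch (assuming a nested `automatic_cluster_update_workspace` block with an `enabled` flag, per the resource description above):

```hcl
resource "databricks_automatic_cluster_update_workspace_setting" "this" {
  automatic_cluster_update_workspace {
    # Assumed flag name: turns automatic cluster update on for this workspace.
    enabled = true
  }
}
```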
10 changes: 8 additions & 2 deletions docs/resources/cluster.md
@@ -34,6 +34,8 @@ resource "databricks_cluster" "shared_autoscaling" {
* `cluster_name` - (Optional) Cluster name, which doesn’t have to be unique. If not specified at creation, the cluster name will be an empty string.
* `spark_version` - (Required) [Runtime version](https://docs.databricks.com/runtime/index.html) of the cluster. Any supported [databricks_spark_version](../data-sources/spark_version.md) id. We advise using [Cluster Policies](cluster_policy.md) to restrict the list of versions for simplicity while maintaining enough control.
* `runtime_engine` - (Optional) The type of runtime engine to use. If not specified, the runtime engine type is inferred based on the spark_version value. Allowed values include: `PHOTON`, `STANDARD`.
* `use_ml_runtime` - (Optional, Boolean, only with `kind`) Whether the ML runtime should be selected. The actual runtime is determined by `spark_version` (DBR release), this `use_ml_runtime` field, and whether `node_type_id` is a GPU node.
* `is_single_node` - (Optional, Boolean, only with `kind`) When set to true, Databricks will automatically set single-node-related `custom_tags`, `spark_conf`, and `num_workers`.
* `driver_node_type_id` - (Optional) The node type of the Spark driver. This field is optional; if unset, API will set the driver node type to the same value as `node_type_id` defined above.
* `node_type_id` - (Required - optional if `instance_pool_id` is given) Any supported [databricks_node_type](../data-sources/node_type.md) id. If `instance_pool_id` is specified, this field is not needed.
* `instance_pool_id` (Optional - required if `node_type_id` is not given) - To reduce cluster start time, you can attach a cluster to a [predefined pool of idle instances](instance_pool.md). When attached to a pool, a cluster allocates its driver and worker nodes from the pool. If the pool does not have sufficient idle resources to accommodate the cluster’s request, it expands by allocating new instances from the instance provider. When an attached cluster changes its state to `TERMINATED`, the instances it used are returned to the pool and reused by a different cluster.
@@ -43,8 +45,12 @@ resource "databricks_cluster" "shared_autoscaling" {
* `autotermination_minutes` - (Optional) Automatically terminate the cluster after being inactive for this time in minutes. If specified, the threshold must be between 10 and 10000 minutes. You can also set this value to 0 to explicitly disable automatic termination. Defaults to `60`. *We highly recommend having this setting present for Interactive/BI clusters.*
* `enable_elastic_disk` - (Optional) If you don’t want to allocate a fixed number of EBS volumes at cluster creation time, use autoscaling local storage. With autoscaling local storage, Databricks monitors the amount of free disk space available on your cluster’s Spark workers. If a worker begins to run too low on disk, Databricks automatically attaches a new EBS volume to the worker before it runs out of disk space. EBS volumes are attached up to a limit of 5 TB of total disk space per instance (including the instance’s local storage). To scale down EBS usage, make sure you have `autotermination_minutes` and `autoscale` attributes set. More documentation available at [cluster configuration page](https://docs.databricks.com/clusters/configure.html#autoscaling-local-storage-1).
* `enable_local_disk_encryption` - (Optional) Some instance types you use to run clusters may have locally attached disks. Databricks may store shuffle data or temporary data on these locally attached disks. To ensure that all data at rest is encrypted for all storage types, including shuffle data stored temporarily on your cluster’s local disks, you can enable local disk encryption. When local disk encryption is enabled, Databricks generates an encryption key locally unique to each cluster node and uses it to encrypt all data stored on local disks. The scope of the key is local to each cluster node and is destroyed along with the cluster node itself. During its lifetime, the key resides in memory for encryption and decryption and is stored encrypted on the disk. *Your workloads may run more slowly because of the performance impact of reading and writing encrypted data to and from local volumes. This feature is not available for all Azure Databricks subscriptions. Contact your Microsoft or Databricks account representative to request access.*
* `data_security_mode` - (Optional) Select the security features of the cluster. [Unity Catalog requires](https://docs.databricks.com/data-governance/unity-catalog/compute.html#create-clusters--sql-warehouses-with-unity-catalog-access) `SINGLE_USER` or `USER_ISOLATION` mode. `LEGACY_PASSTHROUGH` for passthrough cluster and `LEGACY_TABLE_ACL` for Table ACL cluster. If omitted, default security features are enabled. To disable security features use `NONE` or legacy mode `NO_ISOLATION`. In the Databricks UI, this has been recently been renamed *Access Mode* and `USER_ISOLATION` has been renamed *Shared*, but use these terms here.
* `single_user_name` - (Optional) The optional user name of the user to assign to an interactive cluster. This field is required when using `data_security_mode` set to `SINGLE_USER` or AAD Passthrough for Azure Data Lake Storage (ADLS) with a single-user cluster (i.e., not high-concurrency clusters).
* `kind` - (Optional, enum) The kind of compute described by this compute specification. Possible values (see [API docs](https://docs.databricks.com/api/workspace/clusters/create#kind) for full list): `CLASSIC_PREVIEW` (if corresponding public preview is enabled).
* `data_security_mode` - (Optional) Select the security features of the cluster (see the [API docs](https://docs.databricks.com/api/workspace/clusters/create#data_security_mode) for the full list of values). [Unity Catalog requires](https://docs.databricks.com/data-governance/unity-catalog/compute.html#create-clusters--sql-warehouses-with-unity-catalog-access) `SINGLE_USER` or `USER_ISOLATION` mode. Use `LEGACY_PASSTHROUGH` for passthrough clusters and `LEGACY_TABLE_ACL` for Table ACL clusters. If omitted, default security features are enabled. To disable security features, use `NONE` or the legacy mode `NO_ISOLATION`. If `kind` is specified, the following options are also available (see the sketch at the end of this list):
* `DATA_SECURITY_MODE_AUTO`: Databricks will choose the most appropriate access mode depending on your compute configuration.
* `DATA_SECURITY_MODE_STANDARD`: Alias for `USER_ISOLATION`.
* `DATA_SECURITY_MODE_DEDICATED`: Alias for `SINGLE_USER`.
* `single_user_name` - (Optional) The optional user name of the user (or group name if `kind` is specified) to assign to an interactive cluster. This field is required when `data_security_mode` is set to `SINGLE_USER` or when using AAD Passthrough for Azure Data Lake Storage (ADLS) with a single-user cluster (i.e., not high-concurrency clusters).
* `idempotency_token` - (Optional) An optional token to guarantee the idempotency of cluster creation requests. If an active cluster with the provided token already exists, the request will not create a new cluster, but it will return the existing running cluster's ID instead. If you specify the idempotency token, upon failure, you can retry until the request succeeds. Databricks platform guarantees to launch exactly one cluster with that idempotency token. This token should have at most 64 characters.
* `ssh_public_keys` - (Optional) SSH public key contents that will be added to each Spark node in this cluster. The corresponding private keys can be used to log in with the user name `ubuntu` on port 2200. You can specify up to 10 keys.
* `spark_env_vars` - (Optional) Map with environment variable key-value pairs to fine-tune Spark clusters. Key-value pairs of the form (X,Y) are exported (i.e., X='Y') while launching the driver and workers.
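A sketch of how the new `kind`-related arguments fit together (the data source references and names are illustrative, not part of this page):

```hcl
resource "databricks_cluster" "single_node_auto" {
  cluster_name  = "single-node-with-kind"
  spark_version = data.databricks_spark_version.latest_lts.id
  node_type_id  = data.databricks_node_type.smallest.id

  # `kind` gates the arguments below; CLASSIC_PREVIEW requires the
  # corresponding public preview to be enabled on the workspace.
  kind               = "CLASSIC_PREVIEW"
  is_single_node     = true  # Databricks sets single-node tags, conf, and num_workers.
  use_ml_runtime     = false
  data_security_mode = "DATA_SECURITY_MODE_AUTO"

  autotermination_minutes = 60
}
```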
2 changes: 2 additions & 0 deletions docs/resources/compliance_security_profile_setting.md
@@ -8,6 +8,8 @@ subcategory: "Settings"

~> This setting can NOT be disabled once it is enabled.

~> On Azure you need to use the [azurerm_databricks_workspace](https://registry.terraform.io/providers/hashicorp/azurerm/latest/docs/resources/databricks_workspace#compliance_security_profile_enabled-1) resource to configure this setting.

The `databricks_compliance_security_profile_workspace_setting` resource allows you to control whether to enable the
compliance security profile for the current workspace. Enabling it on a workspace is permanent. By default, it is
turned off. This setting can NOT be disabled once it is enabled.
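A minimal sketch (assuming a nested `compliance_security_profile_workspace` block with `is_enabled` and `compliance_standards`; remember that enabling is permanent):

```hcl
resource "databricks_compliance_security_profile_workspace_setting" "this" {
  compliance_security_profile_workspace {
    is_enabled           = true
    compliance_standards = ["HIPAA"] # Illustrative standard.
  }
}
```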
3 changes: 3 additions & 0 deletions docs/resources/enhanced_security_monitoring_setting.md
@@ -6,6 +6,9 @@ subcategory: "Settings"

-> This resource can only be used with a workspace-level provider!

~> On Azure you need to use the [azurerm_databricks_workspace](https://registry.terraform.io/providers/hashicorp/azurerm/latest/docs/resources/databricks_workspace#enhanced_security_monitoring_enabled-1) resource to configure this setting.

The `databricks_enhanced_security_monitoring_workspace_setting` resource allows you to control whether enhanced security monitoring
is enabled for the current workspace. By default, it is disabled. However, if the compliance security profile is enabled,
this is automatically enabled. If the compliance security
2 changes: 1 addition & 1 deletion docs/resources/permissions.md
@@ -901,7 +901,7 @@ General Permissions API does not apply to access control for tables and they hav

## Data Access with Unity Catalog

Initially in Unity Catalog all users have no access to data, which has to be later assigned through [databricks_grants](grants.md) resource.
Initially in Unity Catalog, all users have no access to data; access must be assigned later through the [databricks_grants](grants.md) or [databricks_grant](grant.md) resources.
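For example, a table grant through `databricks_grants` might look like this (catalog, schema, table, and principal names are illustrative):

```hcl
resource "databricks_grants" "customers" {
  table = "main.default.customers"

  grant {
    principal  = "Data Engineers"
    privileges = ["SELECT"]
  }
}
```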

## Argument Reference

32 changes: 17 additions & 15 deletions docs/resources/sql_table.md
@@ -3,12 +3,14 @@ subcategory: "Unity Catalog"
---
# databricks_sql_table (Resource)

Within a metastore, Unity Catalog provides a 3-level namespace for organizing data: Catalogs, databases (also called schemas), and tables / views.
Within a metastore, Unity Catalog provides a 3-level namespace for organizing data: Catalogs, databases (also called schemas), and tables/views.

A `databricks_sql_table` is contained within [databricks_schema](schema.md), and can represent either a managed table, an external table or a view.
A `databricks_sql_table` is contained within [databricks_schema](schema.md), and can represent either a managed table, an external table, or a view.

This resource creates and updates a Unity Catalog table/view by executing the necessary SQL queries on a special auto-terminating cluster that it creates for this operation. You can also specify a SQL warehouse or cluster on which to execute the queries.

~> This resource doesn't handle complex cases of schema evolution due to the limitations of Terraform itself. If you need to implement schema evolution, it's recommended to use specialized tools such as [Liquibase](https://medium.com/dbsql-sme-engineering/advanced-schema-management-on-databricks-with-liquibase-1900e9f7b9c0) and [Flyway](https://medium.com/dbsql-sme-engineering/databricks-schema-management-with-flyway-527c4a9f5d67).

## Example Usage

```hcl
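# A minimal managed-table sketch with illustrative catalog, schema,
# and column names.
resource "databricks_sql_table" "thing" {
  catalog_name = "main"
  schema_name  = "default"
  name         = "thing"
  table_type   = "MANAGED"

  column {
    name = "id"
    type = "bigint"
  }

  column {
    name = "name"
    type = "string"
  }
}
```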
@@ -153,22 +155,22 @@ resource "databricks_sql_table" "thing" {

The following arguments are supported:

* `name` - Name of table relative to parent catalog and schema. Change forces creation of a new resource.
* `catalog_name` - Name of parent catalog. Change forces creation of a new resource.
* `schema_name` - Name of parent Schema relative to parent Catalog. Change forces creation of a new resource.
* `table_type` - Distinguishes a view vs. managed/external Table. `MANAGED`, `EXTERNAL` or `VIEW`. Change forces creation of a new resource.
* `name` - Name of table relative to parent catalog and schema. Change forces the creation of a new resource.
* `catalog_name` - Name of parent catalog. Change forces the creation of a new resource.
* `schema_name` - Name of parent Schema relative to parent Catalog. Change forces the creation of a new resource.
* `table_type` - Distinguishes a view vs. managed/external Table. `MANAGED`, `EXTERNAL`, or `VIEW`. Change forces the creation of a new resource.
* `storage_location` - (Optional) URL of storage location for Table data (required for EXTERNAL Tables). Not supported for `VIEW` or `MANAGED` table_type.
* `data_source_format` - (Optional) External tables are supported in multiple data source formats. The string constants identifying these formats are `DELTA`, `CSV`, `JSON`, `AVRO`, `PARQUET`, `ORC`, `TEXT`. Change forces creation of a new resource. Not supported for `MANAGED` tables or `VIEW`.
* `data_source_format` - (Optional) External tables are supported in multiple data source formats. The string constants identifying these formats are `DELTA`, `CSV`, `JSON`, `AVRO`, `PARQUET`, `ORC`, and `TEXT`. Change forces the creation of a new resource. Not supported for `MANAGED` tables or `VIEW`.
* `view_definition` - (Optional) SQL text defining the view (for `table_type == "VIEW"`). Not supported for `MANAGED` or `EXTERNAL` table_type.
* `cluster_id` - (Optional) All table CRUD operations must be executed on a running cluster or SQL warehouse. If a cluster_id is specified, it will be used to execute SQL commands to manage this table. If empty, a cluster will be created automatically with the name `terraform-sql-table`.
* `warehouse_id` - (Optional) All table CRUD operations must be executed on a running cluster or SQL warehouse. If a `warehouse_id` is specified, that SQL warehouse will be used to execute SQL commands to manage this table. Conflicts with `cluster_id`.
* `cluster_keys` - (Optional) a subset of columns to liquid cluster the table by. Conflicts with `partitions`.
* `storage_credential_name` - (Optional) For EXTERNAL Tables only: the name of storage credential to use. Change forces creation of a new resource.
* `owner` - (Optional) Username/groupname/sp application_id of the schema owner.
* `comment` - (Optional) User-supplied free-form text. Changing comment is not currently supported on `VIEW` table_type.
* `storage_credential_name` - (Optional) For EXTERNAL Tables only: the name of storage credential to use. Change forces the creation of a new resource.
* `owner` - (Optional) User name/group name/sp application_id of the schema owner.
* `comment` - (Optional) User-supplied free-form text. Changing the comment is not currently supported on the `VIEW` table type.
* `options` - (Optional) Map of user defined table options. Change forces creation of a new resource.
* `properties` - (Optional) Map of table properties.
* `partitions` - (Optional) a subset of columns to partition the table by. Change forces creation of a new resource. Conflicts with `cluster_keys`. Change forces creation of a new resource.
* `properties` - (Optional) A map of table properties.
* `partitions` - (Optional) a subset of columns to partition the table by. Change forces the creation of a new resource. Conflicts with `cluster_keys`.

### `column` configuration block

@@ -177,15 +179,15 @@ Currently, changing the column definitions for a table will require dropping and

* `name` - User-visible name of column
* `type` - Column type spec (with metadata) as SQL text. Not supported for `VIEW` table_type.
* `identity` - (Optional) Whether field is an identity column. Can be `default`, `always` or unset. It is unset by default.
* `identity` - (Optional) Whether the field is an identity column. Can be `default`, `always`, or unset. It is unset by default; a sketch follows this list.
* `comment` - (Optional) User-supplied free-form text.
* `nullable` - (Optional) Whether the field is nullable (Default: `true`)
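For instance, an identity column might be declared as follows (a sketch using the values listed above):

```hcl
column {
  name     = "id"
  type     = "bigint"
  identity = "always" # Values are always generated by the engine.
  nullable = false
}
```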

## Attribute Reference

In addition to all arguments above, the following attributes are exported:
In addition to all the arguments above, the following attributes are exported:

* `id` - ID of this table in form of `<catalog_name>.<schema_name>.<name>`.
* `id` - ID of this table in the form of `<catalog_name>.<schema_name>.<name>`.

## Import
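A sketch using a Terraform 1.5+ `import` block (an assumed approach; the ID follows the `<catalog_name>.<schema_name>.<name>` form above):

```hcl
import {
  to = databricks_sql_table.this
  id = "main.default.thing"
}
```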
