[MLOPS-596] Add Lakehouse Monitor Resource #3238

Merged
merged 28 commits into from
Mar 7, 2024

Conversation

aravind-segu
Contributor

@aravind-segu aravind-segu commented Feb 9, 2024

Changes

Adds Lakehouse Monitor Resource to the Terraform Provider

Tests

  • make test run locally
  • relevant change in docs/ folder
  • covered with integration tests in internal/acceptance
  • relevant acceptance tests are passing
  • using Go SDK

@aravind-segu aravind-segu requested review from a team as code owners February 9, 2024 10:30
@aravind-segu aravind-segu requested review from tanmay-db and removed request for a team February 9, 2024 10:30
@codecov-commenter

codecov-commenter commented Feb 9, 2024

Codecov Report

Attention: Patch coverage is 78.57143%, with 15 lines in your changes missing coverage. Please review.

Project coverage is 83.47%. Comparing base (d4812c5) to head (ad9481d).
Report is 4 commits behind head on main.

Additional details and impacted files

@@           Coverage Diff           @@
##             main    #3238   +/-   ##
=======================================
  Coverage   83.46%   83.47%           
=======================================
  Files         174      176    +2     
  Lines       16067    16165   +98     
=======================================
+ Hits        13410    13493   +83     
- Misses       1845     1854    +9     
- Partials      812      818    +6     
| Files | Coverage Δ |
|---|---|
| provider/provider.go | 94.65% <100.00%> (+0.02%) ⬆️ |
| catalog/resource_lakehouse_monitor.go | 78.26% <78.26%> (ø) |

... and 2 files with indirect coverage changes

monitoring/resourse_lakehouse_monitor.go: 4 resolved review threads (outdated)
@aravind-segu aravind-segu requested a review from alexott February 14, 2024 00:04
@alexott
Contributor

alexott commented Feb 14, 2024

@aravind-segu integration tests are passing (at least on AWS)

Contributor

@mgyucht mgyucht left a comment

Thank you for contributing this resource! We do need to make some adjustments to improve the long-term maintainability story, and they may necessitate deeper changes. If it isn't possible to do this right from the start, we need to prioritize the work needed for this to be properly supported.

monitoring/resource_lakehouse_monitor_test.go: 3 resolved review threads (outdated)
monitoring/resourse_lakehouse_monitor.go: 9 resolved review threads (outdated)
internal/acceptance/lakehouse_monitor_test.go: 2 resolved review threads (1 outdated)
@aravind-segu aravind-segu requested a review from alexott February 27, 2024 10:45
@alexott
Contributor

alexott commented Feb 29, 2024

Integration test is still failing:

    init_test.go:236: Step 1/2 error: After applying this test step, the plan was not empty.
        stdout:
        
        
        Terraform used the selected providers to generate the following execution
        plan. Resource actions are indicated with the following symbols:
          ~ update in-place
        
        Terraform will perform the following actions:
        
          # databricks_lakehouse_monitor.testMonitorInference will be updated in-place
          ~ resource "databricks_lakehouse_monitor" "testMonitorInference" {
              - drift_metrics_table_name   = "sandbox$tdkkfakkkfadj.things$tdkkfakkkfadj.bar$tdkkfakkkfadj_inference_drift_metrics" -> null
                id                         = "sandbox$tdkkfakkkfadj.things$tdkkfakkkfadj.bar$tdkkfakkkfadj_inference"
              - monitor_version            = "0" -> null
              - profile_metrics_table_name = "sandbox$tdkkfakkkfadj.things$tdkkfakkkfadj.bar$tdkkfakkkfadj_inference_profile_metrics" -> null
              - status                     = "MONITOR_STATUS_PENDING" -> null
                # (3 unchanged attributes hidden)
        
                # (1 unchanged block hidden)
            }
        
        Plan: 0 to add, 1 to change, 0 to destroy.

@alexott
Contributor

alexott commented Feb 29, 2024

After fixing the computed fields, there is another error in the update test:

    init_test.go:236: Step 2/2 error: Error running apply: exit status 1
        
        Error: cannot update lakehouse monitor: Data Monitor 'sandbox$tijgeledgkhhb.things$tejcfjihaeblg.bar$tejcfjihaeblg_inference' does not exist.
        
          with databricks_lakehouse_monitor.testMonitorInference,
          on terraform_plugin_test.tf line 36, in resource "databricks_lakehouse_monitor" "testMonitorInference":
          36:         resource "databricks_lakehouse_monitor" "testMonitorInference" {
        

@aravind-segu aravind-segu requested a review from mgyucht March 4, 2024 22:28
@aravind-segu
Contributor Author

Contributor

@alexott alexott left a comment

Small changes are still required in the code.

Plus there is no documentation yet

create.FullName = d.Get("table_name").(string)

endpoint, err := w.LakehouseMonitors.Create(ctx, create)
WaitForMonitor(w, ctx, create.FullName)
Contributor

We need to put

			if err != nil {
				return err
			}

before this line, and then have this line as:

err = WaitForMonitor(w, ctx, create.FullName)

Otherwise we don't capture wait errors
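
The ordering requested here can be sketched in isolation as follows. This is a minimal illustration, not the provider's code: `createMonitor` and `waitForMonitor` are hypothetical stand-ins for `w.LakehouseMonitors.Create` and `WaitForMonitor`, and the `ready` flag exists only to simulate a failed wait.

```go
package main

import (
	"errors"
	"fmt"
)

// createMonitor is a hypothetical stand-in for w.LakehouseMonitors.Create.
func createMonitor(name string) (string, error) {
	if name == "" {
		return "", errors.New("table name is empty")
	}
	return name, nil
}

// waitForMonitor is a hypothetical stand-in for WaitForMonitor; the ready
// flag simulates whether the monitor reaches a usable state.
func waitForMonitor(name string, ready bool) error {
	if !ready {
		return fmt.Errorf("monitor %q did not become ready", name)
	}
	return nil
}

// create checks the create error before waiting, then captures the wait
// error instead of discarding it.
func create(name string, ready bool) (string, error) {
	endpoint, err := createMonitor(name)
	if err != nil {
		return "", err
	}
	err = waitForMonitor(endpoint, ready)
	if err != nil {
		return "", err
	}
	return endpoint, nil
}

func main() {
	_, err := create("main.default.t", false)
	fmt.Println("wait failure surfaced:", err != nil)
}
```

With this ordering, a failed create never reaches the wait call, and a failed wait is returned to the caller rather than silently dropped.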

Comment on lines 89 to 93
err = common.StructToData(endpoint, monitorSchema, d)
if err != nil {
return err
}
return nil
Contributor

Just rewrite these lines as

return common.StructToData(endpoint, monitorSchema, d)

@aravind-segu aravind-segu requested a review from alexott March 5, 2024 20:48
Contributor

@mgyucht mgyucht left a comment

Two main comments:

  1. Waiters are supported in the SDK by default; you just need to annotate your API appropriately. I'll send you the link to this offline.
  2. How exactly is table_name supposed to work? It seems like it is a required field but it is not in the Create or Update requests.

catalog/resource_lakehouse_monitor.go: 4 resolved review threads
Contributor

@alexott alexott left a comment

In general looks good - we need to decide about readiness waiting - should we wait for new Go SDK, or implement it later

catalog/resource_lakehouse_monitor.go: 2 resolved review threads
docs/resources/lakehouse_monitor.md: 1 resolved review thread
Contributor

@alexott alexott left a comment

in general good, pending decision on if we should wait for OpenAPI spec changes for wait command

Comment on lines 118 to 123
### Computed Fields
* `monitor_version` - The version of the monitor config (e.g. 1,2,3). If negative, the monitor may be corrupted
* `drift_metrics_table_name` - The full name of the drift metrics table. Format: __catalog_name__.__schema_name__.__table_name__.
* `profile_metrics_table_name` - The full name of the profile metrics table. Format: __catalog_name__.__schema_name__.__table_name__.
* `status` - Status of the Monitor
* `dashboard_id` - The ID of the generated dashboard.
Contributor

small nit - it's better to move it to the Attribute Reference section

@aravind-segu
Contributor Author

in general good, pending decision on if we should wait for OpenAPI spec changes for wait command

I checked with Miles, and he is ok to push this for now. I will wait for his approval as well

Contributor

@mgyucht mgyucht left a comment

LGTM, provided you address @alexott's comment to move computed fields under the Attribute Reference section of the doc.

catalog/resource_lakehouse_monitor.go: 2 resolved review threads
docs/resources/lakehouse_monitor.md: 4 resolved review threads (3 outdated)
}

resource "databricks_lakehouse_monitor" "testTimeseriesMonitor" {
table_name = "${databricks_catalog.sandbox.name}.${databricks_schema.things.name}.${databricks_table.myTestTable.name}"
Contributor

Hmm, I wonder if this is a bit too much. We ended up separating this into 3 parameters in a UC Model for this reason: https://registry.terraform.io/providers/databricks/databricks/latest/docs/resources/registered_model

Contributor Author

According to their Python API docs (https://api-docs.databricks.com/python/lakehouse-monitoring/latest/databricks.lakehouse_monitoring.html#databricks.lakehouse_monitoring.create_monitor), it can be {catalog}.{schema}.{table}, {schema}.{table}, or {table}, and the API fills in the current catalog or schema. So we technically don't need to split it up and force users to fill in all three fields. I think the recommendation should be to fill in all three, but users who read the other docs should be able to use any of the three forms.
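
To make the three accepted forms concrete, here is a rough sketch of how a one-, two-, or three-part name could be expanded to a full three-level name. The `qualify` function and its default catalog and schema arguments are hypothetical; the thread does not show how the API actually resolves the "current" catalog and schema, and the sketch ignores quoted identifiers that contain dots.

```go
package main

import (
	"fmt"
	"strings"
)

// qualify expands {table}, {schema}.{table}, or {catalog}.{schema}.{table}
// to a three-level name. The defaults are assumptions standing in for
// whatever "current" catalog and schema the API would use.
func qualify(name, defaultCatalog, defaultSchema string) (string, error) {
	parts := strings.Split(name, ".")
	for _, p := range parts {
		if p == "" {
			return "", fmt.Errorf("empty component in %q", name)
		}
	}
	switch len(parts) {
	case 1:
		return strings.Join([]string{defaultCatalog, defaultSchema, parts[0]}, "."), nil
	case 2:
		return defaultCatalog + "." + name, nil
	case 3:
		return name, nil
	default:
		return "", fmt.Errorf("expected 1 to 3 components in %q, got %d", name, len(parts))
	}
}

func main() {
	for _, n := range []string{"bar", "things.bar", "sandbox.things.bar"} {
		full, err := qualify(n, "sandbox", "things")
		fmt.Println(full, err)
	}
}
```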

Contributor

Got it, so if someone doesn't specify schema and catalog via Terraform though, what would be the "current" catalog and schema it would infer?

Contributor

No, for databricks_sql_table you always specify the catalog and schema, and the id will be the three-level name

table_name = "${databricks_catalog.sandbox.name}.${databricks_schema.things.name}.${databricks_table.myTestTable.name}"
assets_dir = "/Shared/provider-test/databricks_lakehouse_monitoring/${databricks_table.myTestTable.name}"
output_schema_name = "${databricks_catalog.sandbox.name}.${databricks_schema.things.name}"
snapshot {}
Contributor

The syntax looks a bit weird. Would it be better to expose this as a boolean instead and then convert it in the resource implementation?

Contributor Author

We debated this. We changed the Go SDK to use an empty struct in place of any. This is also the example expected in the Go SDK, so we thought it would be clear. Miles also did not want too many changes between the Go SDK structs and the Terraform input structs.

Also, if the team introduces any snapshot-relevant parameters in the future, we won't need additional changes in the Terraform provider, as we are already using the struct.
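
As a rough illustration of the empty-struct approach, the sketch below shows how an optional pointer to an empty struct serializes. `Snapshot` and `MonitorRequest` are hypothetical types written for this example, not the Go SDK's actual definitions.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Snapshot is a hypothetical empty struct; fields could be added later
// without changing how callers opt in to a snapshot monitor.
type Snapshot struct{}

// MonitorRequest is a hypothetical request shape, not the SDK's type.
type MonitorRequest struct {
	AssetsDir string    `json:"assets_dir"`
	Snapshot  *Snapshot `json:"snapshot,omitempty"`
}

// encode marshals the request and returns an empty string on failure.
func encode(r MonitorRequest) string {
	b, err := json.Marshal(r)
	if err != nil {
		return ""
	}
	return string(b)
}

func main() {
	// Present but empty: serialized as an empty JSON object.
	fmt.Println(encode(MonitorRequest{AssetsDir: "/Shared/m", Snapshot: &Snapshot{}}))
	// Nil pointer: the field is omitted entirely.
	fmt.Println(encode(MonitorRequest{AssetsDir: "/Shared/m"}))
}
```

The presence of the block, rather than a boolean value, is what selects the monitor type, which mirrors the `snapshot {}` syntax in the HCL example.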


* `table_name` - (Required) - The full name of the table to attach the monitor to. It is of the format {catalog}.{schema}.{tableName}
* `assets_dir` - (Required) - The directory to store the monitoring assets (e.g. dashboard and metric tables)
* `output_schema_name` - (Required) - Schema where output metric tables are created
Contributor

Let's clarify that it needs to be catalog.schema

}
}

resource "databricks_lakehouse_monitor" "testTimeseriesMonitor" {
table_name = "${databricks_catalog.sandbox.name}.${databricks_schema.things.name}.${databricks_table.myTestTable.name}"
assets_dir = "/Shared/provider-test/databricks_lakehouse_monitoring/${databricks_table.myTestTable.name}"
table_name = "${databricks_catalog.sandbox.name}.${databricks_schema.things.name}.${databricks_sql_table.myTestTable.name}"
Contributor

This could be simplified to databricks_sql_table.myTestTable.id now: https://registry.terraform.io/providers/databricks/databricks/latest/docs/resources/sql_table#id

### Snapshot Monitor
```hcl
resource "databricks_lakehouse_monitor" "testMonitorInference" {
table_name = "${databricks_catalog.sandbox.name}.${databricks_schema.things.name}.${databricks_table.myTestTable.name}"
Contributor

same here

@alexott alexott added this pull request to the merge queue Mar 7, 2024
Merged via the queue into databricks:main with commit 57ee88b Mar 7, 2024
5 checks passed
@samuhepp

Hey! Will this be included in the next release? If so, when is the approximate target date for that? Keen to use this instead of the notebook approach.

7 participants