-
Notifications
You must be signed in to change notification settings - Fork 400
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[MLOPS-596] Add Lakehouse Monitor Resource #3238
Conversation
b1f910f
to
6891b57
Compare
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #3238 +/- ##
=======================================
Coverage 83.46% 83.47%
=======================================
Files 174 176 +2
Lines 16067 16165 +98
=======================================
+ Hits 13410 13493 +83
- Misses 1845 1854 +9
- Partials 812 818 +6
|
@aravind-segu integration tests are passing (at least on AWS) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for contributing this resource! We do need to make some adjustments to improve the long-term maintainability story, and they may necessitate deeper changes. If it isn't possible to do this right from the start, we need to prioritize the work needed for this to be properly supported.
Integration test is still failing:
|
after fixing computed fields, another error in update test:
|
996f913
to
589ad01
Compare
Integration tests are passing: https://github.com/databricks-eng/eng-dev-ecosystem/actions/runs/8147735527 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Small changes are still required in the code.
Plus there is no documentation yet
create.FullName = d.Get("table_name").(string) | ||
|
||
endpoint, err := w.LakehouseMonitors.Create(ctx, create) | ||
WaitForMonitor(w, ctx, create.FullName) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to put
if err != nil {
return err
}
before this line, and then have this line as:
err = WaitForMonitor(w, ctx, create.FullName)
Otherwise we don't capture wait errors
err = common.StructToData(endpoint, monitorSchema, d) | ||
if err != nil { | ||
return err | ||
} | ||
return nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just rewrite these lines as
return common.StructToData(endpoint, monitorSchema, d)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Two main comments:
- Waiters are supported in the SDK by default, you just need to annotate your API appropriately. I'll send you the link to this offline.
- How exactly is table_name supposed to work? It seems like it is a required field but it is not in the Create or Update requests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general looks good - we need to decide about readiness waiting - should we wait for new Go SDK, or implement it later
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in general good, pending decision on if we should wait for OpenAPI spec changes for wait command
docs/resources/lakehouse_monitor.md
Outdated
### Computed Fields | ||
* `monitor_version` - The version of the monitor config (e.g. 1,2,3). If negative, the monitor may be corrupted | ||
* `drift_metrics_table_name` - The full name of the drift metrics table. Format: __catalog_name__.__schema_name__.__table_name__. | ||
* `profile_metrics_table_name` - The full name of the profile metrics table. Format: __catalog_name__.__schema_name__.__table_name__. | ||
* `status` - Status of the Monitor | ||
* `dashboard_id` - The ID of the generated dashboard. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
small nit - it's better to move it to the Attribute Reference
section
I checked with Miles, and he is ok to push this for now. I will wait for his approval as well |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, provided you address @alexott's comment to move computed fields under the Attribute Reference section of the doc.
docs/resources/lakehouse_monitor.md
Outdated
} | ||
|
||
resource "databricks_lakehouse_monitor" "testTimeseriesMonitor" { | ||
table_name = "${databricks_catalog.sandbox.name}.${databricks_schema.things.name}.${databricks_table.myTestTable.name}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm I wonder if this is a bit too much, we ended up separating this in a UC Model into 3 parameters for this reason: https://registry.terraform.io/providers/databricks/databricks/latest/docs/resources/registered_model
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
According to their python api docs (https://api-docs.databricks.com/python/lakehouse-monitoring/latest/databricks.lakehouse_monitoring.html#databricks.lakehouse_monitoring.create_monitor) , it can be {catalog}.{schema}.{table}
or {schema}.{table}
or {table}
and the api fills in the current catalog or schema. So we technically dont need to split it up and force users to fill in all three fields. I think this should be the recommendation to fill in all three, but if the user reads other docs, they should be able to just use all three of the options
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it, so if someone doesn't specify schema and catalog via Terraform though, what would be the "current" catalog and schema it would infer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, in for databricks_sql_table
you always specify catalog & schema and id
will be three-level name
table_name = "${databricks_catalog.sandbox.name}.${databricks_schema.things.name}.${databricks_table.myTestTable.name}" | ||
assets_dir = "/Shared/provider-test/databricks_lakehouse_monitoring/${databricks_table.myTestTable.name}" | ||
output_schema_name = "${databricks_catalog.sandbox.name}.${databricks_schema.things.name}" | ||
snapshot {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like a bit weird syntax, would it be better to expose this as a boolean instead and then convert in the resource implementation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We debated about this. We changed the Go SDK to use an empty struct in place of any
. This is also the the example expected in the Go SDK, so thought it would be clear. Miles also did not want too many changes between Go SDK structs and the terraform input structs.
Also in the future if there are any snapshot relevant parameters the team introduces, we dont need additional changes in the terraform provider as we are already using the struct.
|
||
* `table_name` - (Required) - The full name of the table to attach the monitor too. Its of the format {catalog}.{schema}.{tableName} | ||
* `assets_dir` - (Required) - The directory to store the monitoring assets (Eg. Dashboard and Metric Tables) | ||
* `output_schema_name` - (Required) - Schema where output metric tables are created |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's clarify that it needs to be catalog.schema
} | ||
} | ||
|
||
resource "databricks_lakehouse_monitor" "testTimeseriesMonitor" { | ||
table_name = "${databricks_catalog.sandbox.name}.${databricks_schema.things.name}.${databricks_table.myTestTable.name}" | ||
assets_dir = "/Shared/provider-test/databricks_lakehouse_monitoring/${databricks_table.myTestTable.name}" | ||
table_name = "${databricks_catalog.sandbox.name}.${databricks_schema.things.name}.${databricks_sql_table.myTestTable.name}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could be simplified to databricks_sql_table.myTestTable.id
now: https://registry.terraform.io/providers/databricks/databricks/latest/docs/resources/sql_table#id
### Snapshot Monitor | ||
```hcl | ||
resource "databricks_lakehouse_monitor" "testMonitorInference" { | ||
table_name = "${databricks_catalog.sandbox.name}.${databricks_schema.things.name}.${databricks_table.myTestTable.name}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same here
Hey! Will this be included in the next release? If so, when is the approximate target date for that? Keen to use this instead of the notebook approach. |
Changes
Adds Lakehouse Monitor Resource to the Terraform Provider
Tests
make test
run locallydocs/
folderinternal/acceptance