Snapshot Config snapshot_meta_column_names Returns a Compilation Error on the Second dbt snapshot Invocation #887

Open
migueldichoso opened this issue Dec 19, 2024 · 4 comments
Labels
bug Something isn't working

Comments

@migueldichoso

migueldichoso commented Dec 19, 2024

Describe the bug

The dbt documentation (snapshot_meta_column_names) describes the snapshot_meta_column_names config, available as of dbt 1.9, which customizes the names of the metadata columns within each snapshot.

With dbt-databricks 1.9.1, a snapshot configured with snapshot_meta_column_names completes successfully on the first dbt snapshot invocation.

The second dbt snapshot invocation fails with a compilation error like the one below.

15:22:49    Compilation Error in snapshot customers_snapshot (snapshots/customers_snapshot.yml)
  Snapshot target is missing configured columns (missing "dbt_scd_id", "dbt_valid_from", "dbt_valid_to"). See https://docs.getdbt.com/docs/build/snapshots#snapshot-meta-fields for more information.
  
  > in macro materialization_snapshot_databricks (macros/materializations/snapshot.sql)
  > called by snapshot customers_snapshot (snapshots/customers_snapshot.yml)
15:22:49

Steps To Reproduce

  1. Create a snapshot file following the documentation (https://docs.getdbt.com/docs/build/snapshots). In this case, the file is customers_snapshot.yml:
snapshots:
  - name: customers_snapshot
    relation: ref('stg_customers')
    config:
      strategy: check
      unique_key: [id]
      check_cols: [id]
      snapshot_meta_column_names:
        dbt_valid_from: _scd_valid_from
        dbt_valid_to: _scd_valid_to
        dbt_scd_id: _scd_id
        dbt_updated_at: _scd_updated_att
        dbt_is_deleted: _scd_is_deleted
  2. Run dbt snapshot. In this case, I ran dbt snapshot --select customers_snapshot --log-level debug.
15:18:48  Using databricks connection "snapshot.dbt_fundamentals_course_project.customers_snapshot"
15:18:48  On snapshot.dbt_fundamentals_course_project.customers_snapshot: /* 
{
    "app": "dbt",
    "dbt_version": "1.9.1",
    "dbt_databricks_version": "1.9.1",
    "databricks_sql_connector_version": "3.6.0",
    "profile_name": "default",
    "target_name": "mdichoso_databricks",
    "node_id": "snapshot.dbt_fundamentals_course_project.customers_snapshot"
} */
    
        create or replace table `miguel_catalog`.`dbt_mdichoso`.`customers_snapshot`
      
      using delta

      as
      
    select *,
        md5(coalesce(cast(id as string ), '')
         || '|' || coalesce(cast(
    current_timestamp()
 as string ), '')
        ) as _scd_id,
        
    current_timestamp()
 as _scd_updated_att,
        
    current_timestamp()
 as _scd_valid_from,
        
  
  coalesce(nullif(
    current_timestamp()
, 
    current_timestamp()
), null)
  as _scd_valid_to
from (
        select * from `miguel_catalog`.`dbt_mdichoso`.`stg_customers`
    ) sbq
 
15:18:48  Databricks adapter: Cursor(session-id=01efbe1c-8ad2-1a2f-b885-e953da2139cc, command-id=Unknown) - Created cursor
15:18:51  SQL status: OK in 2.990 seconds
...
15:18:52  Completed successfully
15:18:52  
15:18:52  Done. PASS=1 WARN=0 ERROR=0 SKIP=0 TOTAL=1

See attached log for the first run.

  3. Run dbt snapshot again. The second run errors out.
15:22:49  Completed with 1 error, 0 partial successes, and 0 warnings:
15:22:49  
15:22:49    Compilation Error in snapshot customers_snapshot (snapshots/customers_snapshot.yml)
  Snapshot target is missing configured columns (missing "dbt_scd_id", "dbt_valid_from", "dbt_valid_to"). See https://docs.getdbt.com/docs/build/snapshots#snapshot-meta-fields for more information.
  
  > in macro materialization_snapshot_databricks (macros/materializations/snapshot.sql)
  > called by snapshot customers_snapshot (snapshots/customers_snapshot.yml)
15:22:49  
15:22:49  Done. PASS=0 WARN=0 ERROR=1 SKIP=0 TOTAL=1

See attached log for the second run.
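
The steps above can be scripted as a rough sketch, assuming the dbt CLI is on the PATH and the command is run from the project directory. The helper names (snapshot_command, run_cli, reproduce) are hypothetical and exist only for this illustration; the dbt arguments are the ones used in this report.

```python
import subprocess
from typing import Callable, List, Sequence


def snapshot_command(select: str) -> List[str]:
    """Build the dbt invocation used in this report for one snapshot."""
    return ["dbt", "snapshot", "--select", select]


def run_cli(cmd: Sequence[str]) -> int:
    """Run the command and return its exit code without raising on failure."""
    return subprocess.run(cmd, check=False).returncode


def reproduce(select: str, run: Callable[[Sequence[str]], int] = run_cli) -> List[int]:
    """Invoke the same snapshot twice and collect both exit codes.

    The runner is injectable so the sketch can be exercised without dbt.
    """
    cmd = snapshot_command(select)
    return [run(cmd) for _ in range(2)]
```

Calling reproduce("customers_snapshot") runs the snapshot twice. With the behavior described in this issue, the first exit code would be zero and the second non-zero.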

Expected behavior

The second dbt snapshot run should complete successfully, with no compilation error about missing configured columns.

Screenshots and log output

  1. First dbt snapshot run

dbt_snapshot_1_databricks.log

  2. Second dbt snapshot run
    dbt_snapshot_2_databricks.log

System information

The output of dbt --version:

(core_databricks) core_snowflakemigueldichoso@Miguel-Dichoso Fundamentals % dbt --version
Core:
  - installed: 1.9.1
  - latest:    1.9.1 - Up to date!

Plugins:
  - databricks: 1.9.1 - Up to date!
  - spark:      1.9.0 - Up to date!

The operating system you're using: macOS Sequoia Version 15.2
The output of python --version: Python 3.12.7

Additional context

The dbt-snowflake adapter completes successfully for both dbt snapshot runs. See the logs below.
dbt_snapshot_1_snowflake.log
dbt_snapshot_2_snowflake.log

@migueldichoso migueldichoso added the bug Something isn't working label Dec 19, 2024
@benc-db
Collaborator

benc-db commented Dec 19, 2024

Not certain this will help, but since we're on code freeze until Jan 6, it's the best suggestion I have:

snapshots:
  - name: customers_snapshot
    relation: ref('stg_customers')
    config:
      strategy: check
      unique_key: [id]
      check_cols: [id]
      snapshot_meta_column_names:
        dbt_valid_from: "`_scd_valid_from`"
        dbt_valid_to: "`_scd_valid_to`"
        dbt_scd_id: "`_scd_id`"
        dbt_updated_at: "`_scd_updated_att`"
        dbt_is_deleted: "`_scd_is_deleted`"

I've been moving to using backticks more liberally because there are places where I don't have access to the 'quote' attribute from columns, but I need to quote columns in order to support names with otherwise unaccepted symbols. dbt does an exact match, so my hope is that this will pass the check.

@peterallenwebb
Contributor

Other dbt-core users on dbt-databricks have also reported this issue, as you can see in the cases referenced just above.

At root, this is happening because dbt-databricks has not yet been modified to account for recent changes to snapshots. Compare the implementation in dbt-databricks (here) with the corresponding implementation in the base adapter (here).

Note that dbt-databricks is using an older version of the snapshot validation function, and does not pass the column dictionary mapping meta columns to custom names. Resolving this issue will require using the newer function, and reviewing dbt-databricks' implementation against the recent snapshot changes to ensure that meta columns are handled correctly elsewhere as well.
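
To illustrate the difference, here is a minimal Python sketch of a validation check with and without the column mapping. It is not the dbt-core or dbt-databricks code: missing_snapshot_columns and REQUIRED_META_COLUMNS are hypothetical names, and the required columns are assumed from the error message in this issue.

```python
from typing import Dict, Iterable, List, Optional

# Assumption: the three columns named in the error message are the ones checked.
REQUIRED_META_COLUMNS = ("dbt_scd_id", "dbt_valid_from", "dbt_valid_to")


def missing_snapshot_columns(
    target_columns: Iterable[str],
    column_names: Optional[Dict[str, str]] = None,
) -> List[str]:
    """Return the required meta columns that the target table lacks.

    Without column_names, the default dbt_* names are expected. With it,
    each default name is first translated to its configured custom name.
    """
    existing = {column.lower() for column in target_columns}
    mapping = column_names or {}
    missing = []
    for default_name in REQUIRED_META_COLUMNS:
        expected = mapping.get(default_name, default_name)
        if expected.lower() not in existing:
            missing.append(expected)
    return missing


# Columns created by the first run in this report.
table = ["id", "_scd_id", "_scd_updated_att", "_scd_valid_from", "_scd_valid_to"]
custom = {
    "dbt_scd_id": "_scd_id",
    "dbt_valid_from": "_scd_valid_from",
    "dbt_valid_to": "_scd_valid_to",
}

print(missing_snapshot_columns(table))          # ['dbt_scd_id', 'dbt_valid_from', 'dbt_valid_to']
print(missing_snapshot_columns(table, custom))  # []
```

The first call mirrors a check that never receives the mapping and therefore reports the same three columns as the compilation error. The second call passes the mapping and finds nothing missing.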

@Jasper-Ma

Jasper-Ma commented Jan 17, 2025

I've replaced this line with this:

{% set columns = config.get("snapshot_meta_column_names") or get_snapshot_table_column_names() %}
{{ adapter.valid_snapshot_target(target_relation, columns) }}
  • The first line takes the custom meta column names from the config, or falls back to the default meta column names.
  • The second line validates the snapshot target against those columns; valid_snapshot_target accepts an optional columns parameter.
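
The fallback in the first line can be sketched in Python as follows. This is an illustration only: resolve_meta_columns is a hypothetical helper, and DEFAULT_META_COLUMNS is an assumed stand-in for whatever get_snapshot_table_column_names() returns, built from the five meta column keys shown in the config earlier in this issue.

```python
from typing import Dict, Mapping

# Assumption: each default meta column maps to itself.
DEFAULT_META_COLUMNS: Dict[str, str] = {
    "dbt_valid_from": "dbt_valid_from",
    "dbt_valid_to": "dbt_valid_to",
    "dbt_scd_id": "dbt_scd_id",
    "dbt_updated_at": "dbt_updated_at",
    "dbt_is_deleted": "dbt_is_deleted",
}


def resolve_meta_columns(config: Mapping[str, object]) -> Dict[str, str]:
    """Mimic `config.get(...) or defaults`: use the configured mapping if it
    is present and non-empty, otherwise fall back to the defaults."""
    configured = config.get("snapshot_meta_column_names")
    return dict(configured or DEFAULT_META_COLUMNS)
```

One property of the `or` form is that a partial mapping is used as-is rather than merged with the defaults, so in this sketch any consumer has to treat keys missing from the mapping as unrenamed.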

I believe this issue will be resolved once this MR gets merged. (thanks @benc-db)

@benc-db
Collaborator

benc-db commented Jan 17, 2025

Yeah, I missed that 1.9 final from dbt included these new snapshot behaviors, so the PR from yesterday is to remedy that.
