Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow customizing names of metadata fields in snapshots #10185

Closed
Tracked by #10151
gshank opened this issue May 20, 2024 · 12 comments · Fixed by #10608
Closed
Tracked by #10151

Allow customizing names of metadata fields in snapshots #10185

gshank opened this issue May 20, 2024 · 12 comments · Fixed by #10608
Assignees
Labels
snapshots Issues related to dbt's snapshot functionality user docs [docs.getdbt.com] Needs better documentation

Comments

@gshank
Copy link
Contributor

gshank commented May 20, 2024

From #9848

Description

There are a handful of Snapshot meta fields:

  • dbt_valid_from
  • dbt_valid_to
  • dbt_scd_id
  • dbt_updated_at

These are fields dbt creates when creating the snapshot.

Folks want the ability to rename these meta fields - “I want to rename dbt_valid_from to grace_valid_from”- because of some corporate standard

Customers currently have to create a view on top of the snapshot

select 
    dbt_valid_to as grace_valid_to,
    ...
from {{ ref('my_snapshot') }}

We should instead provide the ability to rename these meta fields within the actual snapshot configuration.

draft of YML config:

{{
    config(
      target_database='analytics',
      target_schema='snapshots',
      unique_key='id',

      strategy='timestamp',
      updated_at='updated_at',
      
      snapshot_meta_column_names={
              dbt_valid_from: grace_valid_from,
              dbt_valid_to: grace_valid_to
      }
    )
}}

Open questions/thoughts:

  • You should also be able to set a default for this at the project level
snapshots:
  my_project:
    +snapshot_meta_column_names={
      dbt_valid_from: grace_valid_from,
      dbt_valid_to: grace_valid_to
    }
  • If I have an existing snapshot, and I want to rename the fields (add this config)… what should happen?
    • [let's go with this to start] option 1: throw an error
      • the error message should indicate that they’ve have to go in and manually update their existing snapshots with the new column names
    • option 2: map old names → new names
    • option 3: don’t touch the snapshots, automatically create a view on top of it
      • this breaks our "1:1" relationship from snapshot to warehouse object :((
    • option 4: some sort of codegen that codegens the view for them
    • option 5: execute the rename for them
      • but we’d need both the old name (old config) and the new name
      • not possible :(
    • option 6: we create new fields, and coalesce with old?
      • same problems as option 5

Acceptance Critera

  • new config for snapshots to alias / override the names of the metadata fields dbt_field_name_overrides
  • can set that config for individual snapshot, or project default
  • throw an error if they have an existing snapshot and they try to add/change this config - this should error on insert if you change the meta field names, but [one idea] we could try to throw an error before running the snapshot DML by running adapters.get_columns_in_relation() on the existing snapshot and comparing the meta fields that exist in the current config to what exists in the warehouse
    • Create a follow-up issue for migration.
@graciegoheen graciegoheen added the snapshots Issues related to dbt's snapshot functionality label May 21, 2024
@graciegoheen graciegoheen added the user docs [docs.getdbt.com] Needs better documentation label May 29, 2024
@lukzard
Copy link

lukzard commented Jun 7, 2024

up vote!

this should have been considered already at implementation...

@ChenyuLInx
Copy link
Contributor

ChenyuLInx commented Jun 10, 2024

We need to think a bit about migration:

  • we need to rename the column when user change these configs
  • we only do it the first time when user does the change(maybe detect whether the snapshot query would fail).

Idea from @aranke : add comment to the dbt_valid_from field and leverage that information during migration.
https://docs.getdbt.com/reference/resource-configs/persist_docs

@graciegoheen
Copy link
Contributor

graciegoheen commented Jun 10, 2024

Migration Example:

  1. I update my config to:
snapshots:
  my_project:
    +snapshot_meta_column_names ={
      dbt_valid_from: grace_valid_from,
      dbt_valid_to: grace_valid_to
    }
  1. I run my snapshots, dbt handles my renames from dbt_valid_from -> grace_valid_from and so on for me.
  2. I update my config to:
snapshots:
  my_project:
    +snapshot_meta_column_names ={
      dbt_valid_from: gerda_valid_from,
      dbt_valid_to: gerda_valid_to
    }
  1. I run my snapshots, dbt handles my renames from grace_valid_from -> gerda_valid_from and so on for me.

@FishtownBuildBot
Copy link
Collaborator

Opened a new issue in dbt-labs/docs.getdbt.com: dbt-labs/docs.getdbt.com#6112

@GRVsati
Copy link

GRVsati commented Dec 19, 2024

{{
config(
target_database='ABC',
target_schema='snapshots',
unique_key='id',

  strategy='timestamp',
  updated_at='updated_at',
  
  snapshot_meta_column_names={
          'dbt_valid_from': 'date_eff_from',
          'dbt_valid_to': 'date_eff_to'
  }
)

}}

Running dbt-core = 1.7.18 with snowflake = 1.7.5 snapshot completed with no error but column didn't changed.

@graciegoheen
Copy link
Contributor

@GRVsati Hi! To avoid any unintentional data modification, dbt will not automatically apply any column renames. So if a user applies snapshot_meta_column_names config for a snapshot without updating the pre-existing table, they will get an error. We recommend either only using these settings for net-new snapshots, or arranging an update of pre-existing tables prior to committing a column name change.

@czasoprzestrzenny
Copy link

It does not work with spark databricks too. Even if i added in project.yml a snippet with default snapshot column names convention or to specific snapshot sql|yml file it does not work properly.
At the first time this dictionary is used and creating a renamed scd2 snapshot correctly, but always failed if i try to run dbt snapshot with exception like this.

Snapshot target is missing configured columns (missing "dbt_scd_id", "dbt_valid_from", "dbt_valid_to"). See https://docs.getdbt.com/docs/build/snapshots#snapshot-meta-fields for more information.

I guess that macro for databricks spark is not preparing a rendered sql correctly and is trying to use a default snapshot names.

This is snapshot created (running correctly only when i am invoking dbt snapshot -s first time

image

@graciegoheen
Copy link
Contributor

This is snapshot created (running correctly only when i am invoking dbt snapshot -s first time

@czasoprzestrzenny Hi - as I mentioned above, we recommend either only using these settings for net-new snapshots, or arranging an update of pre-existing tables prior to committing a column name change. You are getting this error, because dbt does not automatically manage the schema change from one column set to another.

@czasoprzestrzenny
Copy link

czasoprzestrzenny commented Jan 9, 2025

Hi @graciegoheen not sure whether i understand it correctly. I have created a snapshot from scratch with adjusted columns using a dictionary:

    snapshot_meta_column_names = {
        "dbt_valid_from": "from_date",
        "dbt_valid_to": "to_date",
        "dbt_scd_id": "scd_key",
        "dbt_updated_at": "dbt_updated_at",
        "dbt_is_deleted": "active_flag"
    }

Snapshot has been created with that adjusted dbt column names.

image

Issue is if i am trying to run snapshot second time. Query is not even sent to databrick cluster- it fails during compilation (Compilation Error in snapshot snapshot__animals_scd2_1 (snapshots/snapshot__animals_scd2_1.sql)

image

Can you advice me how to avoid that?
Thank you

@graciegoheen
Copy link
Contributor

That is strange! I'm going to open a new bug report so we can triage the issue

@czasoprzestrzenny
Copy link

czasoprzestrzenny commented Jan 9, 2025

Thank you very much @graciegoheen , if you need a more information let me know.

@peterallenwebb
Copy link
Contributor

@graciegoheen @czasoprzestrzenny I've investigate this issue and found that the issue is that the dbt-databricks adapter has not yet added support for custom meta column names. There is already an issue open against dbt-databricks: databricks/dbt-databricks#887

I will follow up with further details in that issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
snapshots Issues related to dbt's snapshot functionality user docs [docs.getdbt.com] Needs better documentation
Projects
None yet
Development

Successfully merging a pull request may close this issue.

8 participants