-
Notifications
You must be signed in to change notification settings - Fork 181
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[CT-339] Persist dbt tags to Snowflake using object tagging feature #104
Comments
Hiya 👋 @dweaver33 Thanks for submitting an issue, especially with a link to the docs -- helps by making sure we're both talking about the same thing 👍 I totally appreciate the want to minimize the surface area where errors may be introduced. DRY code principles are espoused so often for a reason--a darn good reason. However, my major concern with this feature isn't a demonstrated use case or lack of principle but rather that tags in dbt seem to be semantically different from tags in Snowflake. As you can see in our docs, dbt tags are string values that can be assigned to objects in lists. But, as seen in those docs you linked, Snowflake tags
dbt tags do not support that key-value pair functionality. They are a different kind of metadata mechanism (one can argue whether snowflake's or dbt's are truly more or less useful but that's another design discussion...). Therefore, to tie your dbt tags to your Snowflake tags is to make a deliberate design choice, namely to restrict your own ability to take advantage of Snowflake's featureset. That's a perfectly reasonable thing to do if your team decides that's in its best interest. That said, having that built-in to dbt-snowflake would be to force/encourage other users into this paradigm (i.e. where they may use Seeing as we don't have evidence is a widespread requested feature, and it doesn't seem to support different database management approaches other enterprise users may have, I think the best solution would be a macro as opposed to a built-in feature, as you said! |
@dweaver33 did you end up building a macro for this? Combined with tag-based masking policies, this would be a really great workflow |
@stkbailey , unfortunately I did not. |
I realise this is a closed issue, but I'm looking at this same thing and wanted to propose an alternative. Firstly to broaden this a little, BigQuery has an equivalent key-value pair concept called "labels". I also wouldn't be surprised to see something like this appear in Databrick's upcoming Unity Catalog. Redshift and Azure both have resource tagging, but they appear to relate to surrounding infrastructure rather than database objects so we can leave them out for now. The subtle differences between BigQuery and Snowflake are:
I want to propose that instead of leveraging dbt tags, we use a convention within the "meta" config element named "database_tags". Within that, you can define name-value tags at the table/view level and at the individual column level (Snowflake only).
I have a prototype of a macro which applies these tags during on-run-end, currently it's Snowflake-only but could be extended to BigQuery. Planning to publish it within a new package, unless anyone can think of a better place (dbt-utils?). I don't think it belongs in core, given it's a convention that not everyone will agree with. Happy to open a new issue with the different scope, just wanted to gauge people's reaction first. |
The other option is to use delimiters to shoehorn the key-value pairs into the tags, plus some indicator that it's a tag intended for the database. |
@jamesweakley, I would be very interested in seeing this macro. We are a snowflake shop. |
This is an incredibly rough draft, still some bugs but it should work as a POC.
|
Another snowflake shop here, and was looking for a way to make tag-based masking policies applied at dbt-runtime. |
@jamesweakley Thank you for sharing! I was wondering if this macro has published as a Package? The macro that you shared is it the final version? or since August there have been modifications to this? |
@mahsa-elemy what I linked is the extent of it at the moment, and I didn't end up putting it in a package yet. @MartinGuindon, would you be happy to have this macro in dbt-snowflake-utils? I'm happy to open a PR if you agree |
New to the chat here, was directed to this thread from our snowflake rep. @jamesweakley Appreciate the PoC and would love to see this as a macro in dbt-snowflake-utils. I did have to tweak the model_alias variable to include the schema. From a use case - I have tested the tagging through model configs, but would love to see the column based meta tags happen on .yml files next to the documentation and testing lines. If that can already be done any help on the syntax would be great! Still learning advanced usage. |
Sure. That package is in dire need of love, this would be the opportunity for us to spend some time on it! :) |
In the PR above, I switched it to use yml, check out the readme. I agree it looks nicer and better to manage, and allowed me to get rid of a level of nesting. |
I also think this is very interesting. Since, according to my information, tag-based masking has also been possible in Snowflake since yesterday (GA), I would find this extremely helpful for a holistic development workflow including the incorporation of data governance. |
We are working on getting tag-based masking in Snowflake using dbt, so also very keen on hearing from anyone trying this out. |
For those looking for a quick summary:
|
I don't think that this macro is a sufficient solution. It applies all tags in the on-run-end hook, which might cause 2-10 mins gaps (in some of our cases) between creating a populated table, and creating a tag. This is true for non-incremental only, but still is a deal-breaker for using the tag-based masking policies. We cannot have our data masked "some time" after inserting, it should be transactional. |
Hi folks. We've developed a dbt package for tag-based masking policy in Snowflake. Check it out here |
Describe the feature
When tags are defined for various objects (tables, columns, etc), allow the option to persist these tags down to Snowflake using Snowflake's object tagging feature.
Describe alternatives you've considered
I've considered writing a macro to do this, but development time would prefer something more built-in.
Additional context
Tags are used in Snowflake to apply security and masking. Having to maintain tags in two different areas is cumbersome and prone to error.
Who will this benefit?
Tags are useful in dbt for a number of reasons already known. Persisting them into Snowflake would allow users to utilize Snowflake's security methods and prevent having to define tags in two different areas.
Are you interested in contributing this feature?
I can certainly contribute expertise and technical assistance.
The text was updated successfully, but these errors were encountered: