-
Notifications
You must be signed in to change notification settings - Fork 181
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[CT-333] Macro for warehouse selection #103
Comments
Hi @javiCalvo. Appreciate the issue submission and the PR! Just to be clear, 'warehouse' here refers to "virtual warehouse," right? Want to make sure we define our terms at the outset!
I'm hoping I read this right. Although I'm familiar with both terms, I feel like I'm missing some context as to what "wildcard" and "pre hook" refer to in this precise context. I was scratching my brain about what could possibly make this more flexible. I can't necessarily give you Bash-style wildcard matching. Nor does Jinja 2 have a built-in filter by matching by full regexes. But, you can achieve a rather flexible selection of warehouses based on specified We have a built-in feature for this. Consider the following model configuration option: models:
+snowflake_warehouse: "EXTRA_SMALL"
my_project:
clickstream:
+snowflake_warehouse: "EXTRA_LARGE" For added flexibility in selecting the warehouse, you can use some house-favorite Jinja. This lets you build something like so: +snowflake_warehouse: |
{%- if false -%} any_boolean_can_be_used
{%- elif target.name == "qa" -%} use_any_custom_target_string
{%- elif "substring" in target.name -%} substring_matching
<any boolean you can achieve with Jinja>
{%- else -%} you_even_get_a_default_option
{%- endif -%} Lastly, you can use an automatically loaded regex python module to get a true regex match: {% set my_string = 's3://example/path' %}
{% set s3_path_pattern = 's3://[a-z0-9-_/]+' %}
{% modules.re.match(s3_path_pattern, my_string, re.IGNORECASE) %}a # returns a boolean, so you can use it in the pattern above I did some tests and this value does in fact change the This model configuration tag can be set with any level of desired variance for any singular model or group of models in your project hierarchy. Therefore, from where I'm currently standing, I feel like this issue is already addressed by this built-in feature. I'm going to lean this towards wont-fix and intend to close it. But, if I've missed something along the way in your reasoning/use-case, please let me know and I reassess that tagging :) Looking forward to your followup! |
Hi @VersusFacit ! Thanks a lot for the answer.
Yes, sorry for the ambiguity I was talking about snowflake's virtual warehouse. We are using already the The improve is precisely to avoid having so much jinja code in the dbt_project, specially repeated code. We have a project which involves 12 snowflake databases and we are using around 60 different virtual warehouses splitted in 3 different environments. Example of what we have now in the dbt_project: |
I saw that #115 is still open, and then came across this discussion to read more. Everything said above is super valid! After some more reflection, and time spent looking at #115, I'm going to reopen this issue for two reasons:
@javiCalvo If this is something you'd still be interested in contributing, I'd be open to it. I'll leave a comment or two on the PR with recommendations for the implementation. |
Sure @jtcohen6 ! I'm still interested on this. I will review your PR comments and proceed that way. Thanks! |
+1 for this. For anyone looking for a workaround, we ended up using a custom macro that gets called in the model config. config(
materialized='incremental',
snowflake_warehouse=get_warehouse(size='xxl'),
...
) The macro was similar to this: {% macro get_warehouse(size=none) %}
{% if target.name not in ('prod', '...') %}
{% do return(target.warehouse) %}
{% elif size == none %}
{% do return(target.warehouse) %}
{% else %}
{% set warehouse = "transforming_prod_" ~ size | lower %}
{% do return(warehouse) %}
{% endif %}
{% endmacro %} So this enabled models to "opt in" to the warehouse size they need by calling the |
It would be awesome to have such macro, and be able to choose warehouse depending on:
|
@javiCalvo closing due to being covered by pr #503 |
Re-opening as we've reverted the PR solving it, the implementation was causing side effects. |
@Fleid as requested, im sharing my last comment in #533 to keep track of the bug: my original intentions where to setup a macro that select different warehouses depending on whether an analyst is testing dbt locally versus running automated runs on airflow. We use environment variables on our profiles.yml
My original implementation for the macro was something like this:
this was failing with the error i mentioned. In the end some very helpful person at the dbt slack helped me and suggested replacing But still, it sounds like allowing environment variables in this macro like any other macro would be useful for other use cases. Another use case that just arose at my company, we want to have different warehouses per environment, but we want a single profiles.yml (since the prod dbt runs on docker ECS but local runs for multiple analysts) if we have an environment variable to define the environment dbt is running, we can include that environment variable in the mapping function i shared above |
is there any way I can help with getting this feature developed? At my company we are at the verge of needing this feature, and I would rather use a standard process instead of using a custom one. Thanks! |
Hi @manugarri, |
@Fleid is there any update on this issue? We're currently running a full refresh on our biggest model and it reminded me that our |
Nope, we had to lower the priority to get all of the materialized view implementations through the door. |
linking to a community Slack thread in #db-snowflake |
Following as this functionality would be very beneficial |
looking forward to it! |
huge~ |
@Fleid any updates on this ? |
On our side, we just specify which warehouse to use for each run using a variable in
So if we need to do a full refresh on a large table, we just need to do: The default can also come from an env var: Bonus: no Downsides:
|
Here's another idea to make this feature much more flexible and benefit other use cases as well: Rather than creating yet another macro which is specific only to this very use case, we could instead add the ability to use macros in models:
+snowflake_warehouse: "{{ get_snowflake_warehouse() }}" However, the assumption here is that In order to overcome that, we could consider adding another modifier (maybe another models:
++snowflake_warehouse: "{{ get_snowflake_warehouse() }}" That way, the To give an even better idea of how this would work, the following would basically be the equivalent of the ability to override macros which currently exists (but without having to override anything): models:
++database: "{{ generate_database_name() }}"
++schema: "{{ generate_schema_name() }}"
++alias: "{{ generate_alias_name() }}" And what is even nicer with this approach is that we can use different overrides for different parts of the config hierarchy, whereas macro overrides are inherently global by obligation: models:
++alias: "{{ generate_alias_name() }}"
project:
special_db:
++alias: "{{ generate_special_alias_name() }}" Obviously, this would require macros to be parsed first (before the configurations for models/sources/etc.). But it's not like macros have dependencies/refs on models (it's usually the other way around). And even if they do, this feature could be limited to a special "project macros" context, just like other dbt contexts (i.e. macros which have access to less stuff, namely cannot use One challenge though is around the order of execution. Right now (override or not), |
@mroy-seedbox thanks for the input, i dont think this feature is going to happen, there's been silence for almost a year :( |
Describe the feature
It would be nice to have a macro similar to
generate_schema_name
for the selection of the snowflake warehouse to use. For example for change the warehouse depending on the environment.Describe alternatives you've considered
I tried to do a pre hook with the
use warehouse
query but don't gives the flexibility needed, for example if you use a wildcard in the name all the queries that don't use the pre hook will fail.Additional context
Please include any other relevant context here.
Who will this benefit?
Every one that wants more control and flexibility on the warehouse selected.
Are you interested in contributing this feature?
Yes, I have some idea about how to do it. I've performed some local tests and it's working. With some help as it's my first contribution I will be able to raise a PR.
The text was updated successfully, but these errors were encountered: