Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Feature] parallel create table statements #65

Closed
geoHeil opened this issue Jan 17, 2023 · 17 comments
Closed

[Feature] parallel create table statements #65

geoHeil opened this issue Jan 17, 2023 · 17 comments
Assignees
Labels
enhancement New feature or request

Comments

@geoHeil
Copy link

geoHeil commented Jan 17, 2023

Describe the Feature

How can I specify parallel hints for the CREATE TABLE (default materializations) which dbt-oracle is creating? If these are not yet part of dbt-oracle - do I need to run a full-blown custom materialization or is there an easier path forward?

How about specifying options on CREATE TABLE such as compression? I.e. : compress for query high?

Describe alternatives you've considered

No response

Who will this benefit?

No response

Anything else?

No response

@geoHeil geoHeil added the enhancement New feature or request label Jan 17, 2023
@aosingh aosingh self-assigned this Jan 17, 2023
@geoHeil
Copy link
Author

geoHeil commented Jan 19, 2023

I am currently using oracle_create_table_as

{% macro oracle__create_table_as(temporary, relation, sql) -%}
  {%- set sql_header = config.get('sql_header', none) -%}
  {%- set parallel = config.get('parallel', none) -%}

  {{ sql_header if sql_header is not none }}

  create {% if temporary: -%}temporary{%- endif %} table
    {{ relation.include() }}

    {% if parallel: %}parallel {{ parallel }} compress for query high {% endif %}
  as 
    {{ sql }}

{%- endmacro %}

@aosingh
Copy link
Member

aosingh commented Jan 19, 2023

@geoHeil

I have added support for parallel hint, table compression and fix for using alias (identifier too long for Oracle 11)

You can use parallel and table_compression_clause config parameters

{{config(materialized='table', parallel=4, table_compression_clause='COLUMN STORE COMPRESS FOR QUERY')}}

select c.cust_id, c.cust_first_name, c.cust_last_name, t.country_iso_code, t.country_name, t.country_region
from {{ ref('sales_internet_channel') }} s, {{ source('sh_database', 'countries') }} t, {{ source('sh_database', 'customers') }} c
WHERE s.cust_id = c.cust_id
AND c.country_id = t.country_id

or simpler COMPRESS clause

{{config(materialized='table', parallel=4, table_compression_clause='COMPRESS')}}

Could you test it with the first release cut uploaded to PyPI ?

pip install dbt-oracle==1.3.2rc1

Let me know your feedback.

@geoHeil
Copy link
Author

geoHeil commented Jan 21, 2023

create table schema.table
    parallel 8
    COLUMN STORE COMPRESS FOR QUERY
as
WITH foo AS (SELECT 1 as bar from dual)
select *
from foo
;

This fails with:

42000][905] ORA-00905: Schlüsselwort fehlt Position: 65

The invalid SQL code above is created by the current RC1

@geoHeil
Copy link
Author

geoHeil commented Jan 21, 2023

It looks like the other bug for the length is fixed though!

@geoHeil
Copy link
Author

geoHeil commented Jan 21, 2023

But the reason is the compression.
I think you need to modify the example from: COLUMN STORE COMPRESS FOR QUERY to compress for query high (at least for Oracle 11)

@geoHeil
Copy link
Author

geoHeil commented Jan 21, 2023

I can see many:

14:53:57 oracle adapter: Affected row count 0
14:53:57 oracle adapter: Affected row count 1

logs - but not only in the log file even in the normal DBT stdout output. Can this be disabled/ configured or ideally only stored in the outputted JSON?

@geoHeil
Copy link
Author

geoHeil commented Jan 21, 2023

Interestingly,

oracle adapter: Affected row count 0
15:13:18 oracle adapter: Affected row count 0
15:13:18 oracle adapter: Affected row count 0
15:13:18 oracle adapter: Affected row count 1

is logged usually only with 0 or 1 instead of the actual materialized row count number

@aosingh
Copy link
Member

aosingh commented Jan 21, 2023

@geoHeil

You can pass any compression clause in dbt config. That is dbt user controlled. Not hard-coded

{{config(materialized='table', parallel=4, table_compression_clause='COMPRESS')}}

@geoHeil
Copy link
Author

geoHeil commented Jan 22, 2023

Exactly. This is what I did. I just wanted to let you know that the default one you suggested in the docs was incompatible with Oracle 11.

What about the row count logs?

  • the logs on stdout seem to be polluted. Would it be possible to only send these to the DBT log file logs? Or perhaps only show these during debugging?
  • to me, it looks like these row counts are invalid i.e. 0 or 1 instead of the actual rows being inserted.

Is this related to #60?

@aosingh
Copy link
Member

aosingh commented Jan 22, 2023

It is related to #60

We use the row count value as returned.

I think it could be due to CTAS statement. The python-oracledb driver team can confirm how this value is set.

Anyway, if it is too noisy, I can remove it in 1.3.2. Just waiting for @Mirko-T to confirm on #60

@geoHeil
Copy link
Author

geoHeil commented Jan 22, 2023

I still had the old cx.oracle installed not yet the new python-oracledb could this be the rootcause for the bug that I only get Affected row counts logged as either 0 or 1 @aosingh ?

@geoHeil
Copy link
Author

geoHeil commented Jan 23, 2023

@aosingh I have upgraded to the new version (python-oracledb) but the rowcounts are still o or 1 and not the actual numbers.

@aosingh
Copy link
Member

aosingh commented Jan 23, 2023

@geoHeil

python-oracledb is installed with dbt-oracle. You did not have to install that explicitly

The default driver is still cx_oracle. Please read how you can switch driver modes here

In short, set the environment variable to use the new python driver

ORA_PYTHON_DRIVER_TYPE=thin

I simply use cursor.rowcount to set the value.

This read-only attribute specifies the number of rows that have currently been fetched from the cursor (for select statements), that have been affected by the operation (for insert, update, delete and merge statements), or the number of successful executions of the statement (for PL/SQL statements).

@geoHeil
Copy link
Author

geoHeil commented Jan 24, 2023

Interesting - but why can it be that this counter is 0/1 and not the actual value?

@aosingh
Copy link
Member

aosingh commented Jan 25, 2023

@geoHeil I will include you in #60 so that you are on the same page and all discussion is kept under the relevant issue

aosingh added a commit that referenced this issue Feb 2, 2023
Removed noisy log line as reported in #65
@aosingh
Copy link
Member

aosingh commented Feb 3, 2023

@geoHeil - Please test with the released dbt-oracle==1.3.2

@geoHeil
Copy link
Author

geoHeil commented Feb 7, 2023

seems to be quite well

@aosingh aosingh closed this as completed Feb 7, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants