⚗️ Test supporting Python Models (via Create-A-Derived-Table) #3293

Closed
5 tasks
jhpyke opened this issue Feb 12, 2024 · 2 comments
Labels
data-platform-apps-and-tools (This issue is owned by Data Platform Apps and Tools), enhancement (enhancing an existing feature), stale

Comments

@jhpyke
Contributor

jhpyke commented Feb 12, 2024

User Story

As a… user of Create-A-Derived-Table
I want to… be able to work with tables in Python code, rather than pure SQL
So that… I can run machine learning tasks in a reproducible deployment pipeline.

Value / Purpose

Currently, we do not support using Spark via Athena. However, customers have legitimate reasons why they cannot produce their modelled data with SQL alone: they may be applying machine learning techniques, or other processes that are easier to implement in Python. We should therefore use data-engineering-sandbox to explore what we would need to do technically to allow a user to deploy a Python model.

Useful Contacts

@jhpyke

User Types

No response

Hypothesis

If we... [do a thing]
Then... [this will happen]

Proposal

  1. Create a Spark-enabled Athena workgroup in the sandbox.
  2. Use that workgroup to create a notebook via the console.
  3. Use this notebook to pull in some TPC-DS data and apply some Python transformations (modelling of data, conditional transformations that would be hard to express in SQL, handling datestamps that don't conform to standard SQL, etc.).
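As an illustration of the datestamp case in step 3, a transformation like this is awkward in SQL but short in Python. This is a minimal sketch, assuming the source data mixes a few date formats; the format list `CANDIDATE_FORMATS` and the function name `normalise_datestamp` are hypothetical, not taken from any real dataset.

```python
from datetime import datetime
from typing import Optional

# Hypothetical set of non-standard formats assumed to appear in the source data,
# tried in order until one parses.
CANDIDATE_FORMATS = ("%d/%m/%Y", "%Y%m%d", "%d-%b-%Y", "%Y-%m-%d")


def normalise_datestamp(raw: Optional[str]) -> Optional[str]:
    """Return the date as an ISO-8601 string (YYYY-MM-DD), or None if no format matches."""
    if raw is None:
        return None
    text = raw.strip()
    for fmt in CANDIDATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            # This format did not match; try the next one.
            continue
    return None
```

In an Athena Spark notebook, a helper like this could be applied column-wise by wrapping it with `pyspark.sql.functions.udf(normalise_datestamp, StringType())`, although a UDF is slower than built-in Spark functions and would be worth comparing on a sizeable TPC-DS table.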

Additional Information

If you have time within the spike, you should then try to apply this to Create-A-Derived-Table specifically.

  1. Can you use Create-A-Derived-Table to create a Python model? (If 📈 Update Create-A-Derived-Table to newest DBT-Core/DBT-Athena Versions #3290 is not complete before this ticket is done, you will need to manually bump your dbt-core/dbt-athena-community versions to support this.)
  2. Attempt to deploy the model in the sandbox.
  3. If any changes are required for Athena with Spark to work with dbt in the account, document what they are.
  4. Document the user experience of deploying models.
  • Can we ensure all output tables from Python models follow the current (Hive-compliant) output naming conventions?
  • Ideally, can we run a sizeable transformation so we can estimate costs?
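For step 1, a dbt Python model is a `.py` file in the models directory that defines `model(dbt, session)` and returns a DataFrame. The sketch below assumes dbt-athena-community's Python model support executes it on a Spark-enabled workgroup, so that `dbt.ref` returns a PySpark DataFrame. The upstream model `store_sales` (standing in for the TPC-DS `store_sales` table) and the output name `store_sales_positive` are hypothetical.

```python
# models/store_sales_positive.py  (hypothetical model name and path)


def model(dbt, session):
    # Ask dbt to materialise the returned DataFrame as a table.
    dbt.config(materialized="table")

    # Assumption: `store_sales` is an upstream model exposing TPC-DS store_sales columns.
    sales = dbt.ref("store_sales")

    # Keep only rows with a positive quantity; Spark's DataFrame.filter accepts
    # a SQL expression string.
    positive = sales.filter("ss_quantity > 0")

    # dbt writes whatever DataFrame is returned as this model's output table.
    return positive.select("ss_sold_date_sk", "ss_item_sk", "ss_quantity", "ss_net_paid")
```

If deployment works as expected, SQL models could then select from this output with `{{ ref('store_sales_positive') }}`, which is the behaviour the spike needs to confirm.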

Definition of Done

  • We are able to send queries using Athena via Spark and get tables in the Glue Catalog as outputs.
  • Those tables are themselves queryable using standard SQL from other models.
  • Any technical changes required to support this functionality are understood.
  • The user journey for creating Python models (as opposed to SQL models) is understood.
  • A go/no-go decision can be made on whether we want to explore user interest in supporting this at all.
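Output tables are only usable from standard SQL models if their names are ones Athena accepts. A sketch of a check the spike could run over Python-model output names, assuming the convention is lowercase letters, digits and underscores, not starting with an underscore, and at most 255 characters. These limits are assumptions to verify against Athena's naming documentation and the platform's own conventions; `is_hive_compliant` and `non_compliant` are hypothetical helpers.

```python
import re
from typing import Iterable, List

# Assumed rule: first character is a lowercase letter or digit,
# followed by lowercase letters, digits or underscores.
_HIVE_NAME = re.compile(r"[a-z0-9][a-z0-9_]*")
MAX_NAME_LENGTH = 255  # assumed upper bound on name length


def is_hive_compliant(name: str) -> bool:
    """True if `name` satisfies the assumed naming rule."""
    return len(name) <= MAX_NAME_LENGTH and _HIVE_NAME.fullmatch(name) is not None


def non_compliant(names: Iterable[str]) -> List[str]:
    """Return the names that break the assumed convention, in input order."""
    return [n for n in names if not is_hive_compliant(n)]


if __name__ == "__main__":
    outputs = ["store_sales_positive", "StoreSales", "_tmp_model", "sales-2024"]
    print(non_compliant(outputs))  # ['StoreSales', '_tmp_model', 'sales-2024']
```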
@jhpyke jhpyke added enhancement enhancing an existing feature data-platform-apps-and-tools This issue is owned by Data Platform Apps and Tools labels Feb 12, 2024
@jhpyke jhpyke changed the title ⚗️ Test supporting Python Models via Create-A-Derived-Table ⚗️ Test supporting Python Models (via Create-A-Derived-Table) Feb 14, 2024
@jacobwoffenden jacobwoffenden moved this to 👀 TODO in Analytical Platform Feb 15, 2024
Contributor

This issue is being marked as stale because it has been open for 60 days with no activity. Remove stale label or comment to keep the issue open.

@github-actions github-actions bot added the stale label Apr 15, 2024
Contributor

This issue is being closed because it has been open for a further 7 days with no activity. If this is still a valid issue, please reopen it. Thank you!

@github-actions github-actions bot closed this as not planned (won't fix, can't repro, duplicate, stale) Apr 23, 2024
@github-project-automation github-project-automation bot moved this from 👀 TODO to 🎉 Done in Analytical Platform Apr 23, 2024
Projects
Archived in project
Development

No branches or pull requests

1 participant