Releases: dyvenia/viadot

Viadot 0.2.9

29 Oct 15:26
12de657

Release due to CI/CD error

Viadot 0.2.8

29 Oct 15:23
1086f47

Changed

  • CI/CD: dev image is now only published on push to the dev branch
  • Docker:
    • updated registry links to use the new ghcr.io domain
    • run.sh now also accepts the -t option. When run in standard mode, it will only spin up the viadot_jupyter_lab service.
      When run with -t dev, it also spins up the viadot_testing and viadot_docs containers.

Fixed

  • ADLSToAzureSQL - fixed path parameter issue.

Viadot 0.2.7

04 Oct 11:30
0cb2c5d

Added

  • Added SQLiteQuery task
  • Added CloudForCustomers source
  • Added CloudForCustomersToDF and CloudForCustomersToCSV tasks
  • Added CloudForCustomersToADLS flow
  • Added support for parquet in CloudForCustomersToDF
  • Added style guidelines to the README

Changed

  • Changed CI/CD algorithm
    • the latest Docker image is now only updated on release and is the same exact image as the latest release
    • the dev image is released only on pushes and PRs to the dev branch (so dev branch = dev image)
  • Modified ADLSToAzureSQL - read_sep and write_sep parameters added to the flow.

Fixed

  • Fixed ADLSToAzureSQL breaking in "append" mode if the table didn't exist (#145).
  • Fixed ADLSToAzureSQL breaking in promotion path for csv files.

Viadot 0.2.6

22 Sep 11:48
c39c75d

Added

  • Added flows library docs to the references page

Changed

  • Moved task library docs page to topbar
  • Updated docs for task and flows

Viadot 0.2.5

20 Sep 15:30
1a29407

Added

  • Added start_date and end_date parameters to the SupermetricsToADLS flow
  • Added a tutorial on how to pull data from Supermetrics

Viadot 0.2.4

06 Sep 12:59
b2688e1

Added

  • Added documentation (both docstrings and MkDocs docs) for multiple tasks
  • Added start_date and end_date parameters to the SupermetricsToAzureSQL flow
  • Added a temporary workaround df_to_csv_task task to the SupermetricsToADLS flow to handle mixed dtype columns not handled automatically by DataFrame's to_parquet() method

Viadot 0.2.3

19 Aug 14:29
03883f9

Added

  • Added a test for SupermetricsToADLS flow
  • Added a test for AzureDataLakeList task
  • Added PR template for new PRs
  • Added a write_to_json util task to the SupermetricsToADLS flow. This task dumps the input expectations dict to the local filesystem as is required by Great Expectations.
    This allows the user to simply pass a dict with their expectations and not worry about the project structure required by Great Expectations
  • Added Shapely and imagehash dependencies required for full visions functionality (installing visions[all] breaks the build)
  • Added more parameters to control CSV parsing in the ADLSGen1ToAzureSQLNew flow
  • Added keep_output parameter to the RunGreatExpectationsValidation task to control Great Expectations output to the filesystem
  • Added keep_validation_output parameter and cleanup_validation_clutter task to the SupermetricsToADLS flow to control Great Expectations output to the filesystem
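
The write-to-JSON and cleanup behaviour described in this list can be pictured roughly as follows. This is a minimal sketch using only the standard library, not viadot's implementation: the function names `write_suite_to_json` and `cleanup_output`, the directory layout, and the sample suite are all assumptions made for illustration.

```python
import json
import shutil
import tempfile
from pathlib import Path


def write_suite_to_json(suite: dict, path: Path) -> Path:
    """Dump an expectations dict to a JSON file, creating parent directories."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", encoding="utf-8") as f:
        json.dump(suite, f, indent=2)
    return path


def cleanup_output(directory: Path, keep_output: bool = False) -> None:
    """Remove the output directory unless the caller asked to keep it."""
    if not keep_output and directory.exists():
        shutil.rmtree(directory)


# Hypothetical suite passed in as a plain dict.
suite = {
    "expectation_suite_name": "demo",
    "expectations": [
        {
            "expectation_type": "expect_column_values_to_not_be_null",
            "kwargs": {"column": "date"},
        }
    ],
}

workdir = Path(tempfile.mkdtemp())
suite_path = write_suite_to_json(suite, workdir / "expectations" / "demo.json")
loaded = json.loads(suite_path.read_text(encoding="utf-8"))

# With keep_output=False the temporary files are removed again.
cleanup_output(workdir, keep_output=False)
```

The point of the pattern is that the caller only handles a dict and a flag; where the files live on disk stays an internal detail.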

Changed

  • Modified RunGreatExpectationsValidation task to use the built-in support for evaluation parameters added in Prefect v0.15.3
  • Modified SupermetricsToADLS and ADLSGen1ToAzureSQLNew flows to align with this recipe for reading the expectation suite JSON.
    The suite now has to be loaded before flow initialization in the flow's Python file and passed as an argument to the flow's constructor.
  • Modified RunGreatExpectationsValidation's expectations_path parameter to point to the directory containing the expectation suites instead of the
    Great Expectations project directory, which was confusing. The project directory is now only used internally and not exposed to the user
  • Changed the logging of docs URL for RunGreatExpectationsValidation task to use GE's recipe from the docs
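
As a rough illustration of loading the expectation suite before flow initialization and passing it to the constructor, consider the sketch below. `DemoFlow`, its `expectation_suite` argument, and `load_suite` are hypothetical stand-ins, not viadot's classes or signatures; check the flow's own docstring for the real parameter names.

```python
import json
import tempfile
from pathlib import Path


class DemoFlow:
    """Hypothetical stand-in for a flow class; not part of viadot."""

    def __init__(self, name: str, expectation_suite: dict):
        self.name = name
        self.expectation_suite = expectation_suite


def load_suite(path: Path) -> dict:
    """Read an expectation suite JSON file into a dict."""
    with path.open(encoding="utf-8") as f:
        return json.load(f)


# Write a sample suite file so the example is self-contained.
suite_file = Path(tempfile.mkdtemp()) / "demo_suite.json"
suite_file.write_text(
    json.dumps({"expectation_suite_name": "demo", "expectations": []}),
    encoding="utf-8",
)

# Load the suite first, then hand it to the flow's constructor.
expectation_suite = load_suite(suite_file)
flow = DemoFlow(name="demo flow", expectation_suite=expectation_suite)
```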

Removed

  • Removed SupermetricsToAzureSQLv2 and SupermetricsToAzureSQLv3 flows
  • Removed geopy dependency

Viadot 0.2.2

27 Jul 16:16
480df8c

Added

  • Added support for parquet in AzureDataLakeToDF
  • Added proper logging to the RunGreatExpectationsValidation task
  • Added the viz Prefect extra to requirements to allow flow visualization
  • Added a few utility tasks in task_utils
  • Added geopy dependency
  • Tests
  • Tasks:
    • AzureDataLakeList - for listing files in an ADLS directory
  • Flows:
    • ADLSToAzureSQL - promoting files to conformed, operations,
      creating an SQL table and inserting the data into it
    • ADLSContainerToContainer - copying files between ADLS containers

Changed

  • Renamed ReadAzureKeyVaultSecret and RunAzureSQLDBQuery tasks to match Prefect naming style
  • Flows:
    • SupermetricsToADLS - changed csv to parquet file extension. File and schema info are loaded to the RAW container.

Fixed

  • Removed the broken version autobump from CI

Viadot 0.2.1

14 Jul 14:56
1038ec5

Added

  • Flows:
    • SupermetricsToAdls - supporting immutable ADLS setup

Changed

  • A default value for the ds_user parameter in SupermetricsToAzureSQLv3 can now be
    specified in the SUPERMETRICS_DEFAULT_USER secret
  • Updated multiple dependencies

Fixed

  • Fixed "Local run of SupermetricsToAzureSQLv3 skips all tasks after union_dfs_task" (#59)
  • Fixed the release GitHub action

Viadot 0.2.0

13 Jul 13:04
7b15a20

This release brings many new tasks and flows, as well as improved stability, security, and monitoring.

Added

  • Sources:

    • AzureDataLake (supports gen1 & gen2)
    • SQLite
  • Tasks:

    • DownloadGitHubFile
    • AzureDataLakeDownload
    • AzureDataLakeUpload
    • AzureDataLakeToDF
    • ReadAzureKeyVaultSecret
    • CreateAzureKeyVaultSecret
    • DeleteAzureKeyVaultSecret
    • SQLiteInsert
    • SQLiteSQLtoDF
    • AzureSQLCreateTable
    • RunAzureSQLDBQuery
    • BCPTask
    • RunGreatExpectationsValidation
    • SupermetricsToDF
  • Flows:

    • SupermetricsToAzureSQLv1
    • SupermetricsToAzureSQLv2
    • SupermetricsToAzureSQLv3
    • AzureSQLTransform
    • Pipeline
    • ADLSGen1ToGen2
    • ADLSGen1ToAzureSQL
    • ADLSGen1ToAzureSQLNew
  • Examples:

    • Hello world flow
    • Supermetrics Google Ads extract

Changed

  • Tasks now use secrets for credential management (azure tasks use Azure Key Vault secrets)
  • SQL source now has a default query timeout of 1 hour

Fixed

  • Fixed SQLite tests
  • Multiple stability improvements with retries and timeouts