Releases · dyvenia/viadot
Viadot 0.2.9
Release due to CI/CD error
Viadot 0.2.8
Changed
- CI/CD: the `dev` image is now only published on push to the `dev` branch
- Docker:
  - updated registry links to use the new `ghcr.io` domain
  - `run.sh` now also accepts the `-t` option. When run in standard mode, it will only spin up the `viadot_jupyter_lab` service. When run with `-t dev`, it will also spin up the `viadot_testing` and `viadot_docs` containers.
Fixed
- `ADLSToAzureSQL` - fixed path parameter issue.
Viadot 0.2.7
Added
- Added `SQLiteQuery` task
- Added `CloudForCustomers` source
- Added `CloudForCustomersToDF` and `CloudForCustomersToCSV` tasks
- Added `CloudForCustomersToADLS` flow
- Added support for parquet in `CloudForCustomersToDF`
- Added style guidelines to the `README`
Changed
- Changed CI/CD algorithm
  - the `latest` Docker image is now only updated on release and is the exact same image as the latest release
  - the `dev` image is released only on pushes and PRs to the `dev` branch (so dev branch = dev image)
- Modified `ADLSToAzureSQL` - added `read_sep` and `write_sep` parameters to the flow
Fixed
- Fixed `ADLSToAzureSQL` breaking in `"append"` mode if the table didn't exist (#145)
- Fixed `ADLSToAzureSQL` breaking in the promotion path for CSV files
Viadot 0.2.6
Added
- Added flows library docs to the references page
Changed
- Moved the task library docs page to the topbar
- Updated the docs for tasks and flows
Viadot 0.2.5
Added
- Added `start_date` and `end_date` parameters to the `SupermetricsToADLS` flow
- Added a tutorial on how to pull data from `Supermetrics`
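For context, a minimal sketch of how the new date parameters might be passed when building the flow. Everything except `start_date` and `end_date` (the flow name, `ds_id`, `ds_accounts`, `fields`, and `adls_dir_path`) is a hypothetical parameter set for illustration, not the confirmed constructor signature.

```python
# Hypothetical usage sketch; only start_date/end_date are confirmed by the release
# notes, the remaining keyword arguments are illustrative placeholders.
from viadot.flows import SupermetricsToADLS

flow = SupermetricsToADLS(
    "Google Ads extract",                         # flow name
    ds_id="AW",                                   # placeholder Supermetrics data source
    ds_accounts=["1234567890"],                   # placeholder account id(s)
    fields=["Date", "Campaignname", "Clicks"],    # placeholder report fields
    start_date="2021-08-01",                      # added in 0.2.5
    end_date="2021-08-31",                        # added in 0.2.5
    adls_dir_path="raw/supermetrics/google_ads",  # placeholder ADLS destination
)

flow.run()  # local run; in practice the flow would be registered with Prefect
```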
Viadot 0.2.4
Added
- Added documentation (both docstrings and MkDocs docs) for multiple tasks
- Added `start_date` and `end_date` parameters to the `SupermetricsToAzureSQL` flow
- Added a temporary workaround `df_to_csv_task` task to the `SupermetricsToADLS` flow to handle mixed dtype columns not handled automatically by DataFrame's `to_parquet()` method (see the sketch below)
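For background, the underlying problem is a general pandas/pyarrow one: an object column containing mixed Python types cannot be written to Parquet directly. The snippet below only illustrates that failure mode and the kind of CSV/string-cast fallback a workaround task can apply; it is not the actual `df_to_csv_task` code.

```python
import pandas as pd

# An object column holding mixed types (int and str here) breaks Parquet serialization.
df = pd.DataFrame({"id": [1, 2], "value": [42, "forty-two"]})

try:
    df.to_parquet("data.parquet")
except Exception:
    # Fallback 1: CSV is type-agnostic, so dumping to CSV always succeeds.
    df.to_csv("data.csv", index=False)

    # Fallback 2: cast object columns to strings so Parquet accepts them.
    safe = df.copy()
    for col in safe.select_dtypes(include="object").columns:
        safe[col] = safe[col].astype(str)
    safe.to_parquet("data.parquet")
```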
Viadot 0.2.3
Added
- Added a test for the `SupermetricsToADLS` flow
- Added a test for the `AzureDataLakeList` task
- Added a PR template for new PRs
- Added a `write_to_json` util task to the `SupermetricsToADLS` flow. This task dumps the input expectations dict to the local filesystem, as required by Great Expectations. This allows the user to simply pass a dict with their expectations and not worry about the project structure required by Great Expectations (see the sketch after this list)
- Added the `Shapely` and `imagehash` dependencies required for full `visions` functionality (installing `visions[all]` breaks the build)
- Added more parameters to control CSV parsing in the `ADLSGen1ToAzureSQLNew` flow
- Added a `keep_output` parameter to the `RunGreatExpectationsValidation` task to control Great Expectations output to the filesystem
- Added a `keep_validation_output` parameter and a `cleanup_validation_clutter` task to the `SupermetricsToADLS` flow to control Great Expectations output to the filesystem
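A rough sketch of what a `write_to_json`-style utility boils down to: serializing the in-memory expectations dict to a JSON file that Great Expectations can later pick up. The function and the sample suite below are illustrative, not the task's actual code.

```python
import json
from pathlib import Path


def write_to_json(expectations: dict, path: str) -> None:
    """Dump an expectations dict to a JSON file, creating parent directories as needed."""
    target = Path(path)
    target.parent.mkdir(parents=True, exist_ok=True)
    with target.open("w") as f:
        json.dump(expectations, f, indent=4)


# A minimal expectations dict of the kind a user could pass to the flow.
suite = {
    "expectation_suite_name": "failure",
    "expectations": [
        {
            "expectation_type": "expect_column_values_to_not_be_null",
            "kwargs": {"column": "Date"},
        }
    ],
}
write_to_json(suite, "expectations/failure.json")
```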
Changed
- Modified the `RunGreatExpectationsValidation` task to use the built-in support for evaluation parameters added in Prefect v0.15.3
- Modified the `SupermetricsToADLS` and `ADLSGen1ToAzureSQLNew` flows to align with this recipe for reading the expectation suite JSON: the suite now has to be loaded before flow initialization in the flow's Python file and passed as an argument to the flow's constructor (see the sketch after this list)
- Modified `RunGreatExpectationsValidation`'s `expectations_path` parameter to point to the directory containing the expectation suites instead of the Great Expectations project directory, which was confusing. The project directory is now only used internally and not exposed to the user
- Changed the logging of the docs URL for the `RunGreatExpectationsValidation` task to use GE's recipe from the docs
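A sketch of the new pattern described above: load the suite JSON in the flow's Python file first, then hand it to the flow's constructor. The keyword name `expectation_suite` and the plain `json.load` are assumptions made for illustration.

```python
import json

from viadot.flows import SupermetricsToADLS

# Load the expectation suite before the flow is initialized...
with open("expectations/failure.json") as f:
    expectation_suite = json.load(f)  # assumption: a plain dict is accepted

# ...and pass it to the flow's constructor (keyword name assumed for illustration).
flow = SupermetricsToADLS(
    "Supermetrics extract with validation",
    expectation_suite=expectation_suite,
    # ...remaining extraction parameters...
)
```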
Removed
- Removed the `SupermetricsToAzureSQLv2` and `SupermetricsToAzureSQLv3` flows
- Removed the `geopy` dependency
Viadot 0.2.2
Added
- Added support for parquet in `AzureDataLakeToDF`
- Added proper logging to the `RunGreatExpectationsValidation` task
- Added the `viz` Prefect extra to requirements to allow flow visualization
- Added a few utility tasks in `task_utils`
- Added the `geopy` dependency
- Tests
- Tasks:
  - `AzureDataLakeList` - for listing files in an ADLS directory
- Flows:
  - `ADLSToAzureSQL` - promoting files to conformed and operations, creating an SQL table and inserting the data into it
  - `ADLSContainerToContainer` - copying files between ADLS containers
Changed
- Renamed the `ReadAzureKeyVaultSecret` and `RunAzureSQLDBQuery` tasks to match Prefect naming style
- Flows:
  - `SupermetricsToADLS` - changed the file extension from CSV to parquet. File and schema info are loaded to the `RAW` container.
Fixed
- Removed the broken version autobump from CI
Viadot 0.2.1
Added
- Flows:
  - `SupermetricsToAdls` - supporting immutable ADLS setup
Changed
- A default value for the `ds_user` parameter in `SupermetricsToAzureSQLv3` can now be specified in the `SUPERMETRICS_DEFAULT_USER` secret (see the sketch after this list)
- Updated multiple dependencies
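A sketch of how such a secret can be provided for a local run in Prefect 0.15.x; setting the secret through an environment variable and reading it back with `prefect.client.secrets.Secret` are standard Prefect mechanisms, while how the flow falls back to it internally is not shown here.

```python
import os

# For local runs, Prefect reads secrets from config.toml or from environment variables
# of the form PREFECT__CONTEXT__SECRETS__<NAME>; set it before importing prefect.
os.environ["PREFECT__CONTEXT__SECRETS__SUPERMETRICS_DEFAULT_USER"] = "user@example.com"

from prefect.client.secrets import Secret

# The flow can fall back to this value when ds_user is not passed explicitly.
default_user = Secret("SUPERMETRICS_DEFAULT_USER").get()
print(default_user)
```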
Fixed
- Fixed "Local run of
SupermetricsToAzureSQLv3
skips all tasks afterunion_dfs_task
" (#59) - Fixed the
release
GitHub action
Viadot 0.2.0
This release brings many new tasks and flows, as well as improved stability, security, and monitoring.
Added
- Sources:
  - `AzureDataLake` (supports gen1 & gen2)
  - `SQLite`
- Tasks:
  - `DownloadGitHubFile`
  - `AzureDataLakeDownload`
  - `AzureDataLakeUpload`
  - `AzureDataLakeToDF`
  - `ReadAzureKeyVaultSecret`
  - `CreateAzureKeyVaultSecret`
  - `DeleteAzureKeyVaultSecret`
  - `SQLiteInsert`
  - `SQLiteSQLtoDF`
  - `AzureSQLCreateTable`
  - `RunAzureSQLDBQuery`
  - `BCPTask`
  - `RunGreatExpectationsValidation`
  - `SupermetricsToDF`
- Flows:
  - `SupermetricsToAzureSQLv1`
  - `SupermetricsToAzureSQLv2`
  - `SupermetricsToAzureSQLv3`
  - `AzureSQLTransform`
  - `Pipeline`
  - `ADLSGen1ToGen2`
  - `ADLSGen1ToAzureSQL`
  - `ADLSGen1ToAzureSQLNew`
- Examples:
  - Hello world flow
  - Supermetrics Google Ads extract
Changed
- Tasks now use secrets for credential management (Azure tasks use Azure Key Vault secrets)
- SQL source now has a default query timeout of 1 hour
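For orientation, a sketch of the general pattern behind Key Vault-backed credentials, using the Azure SDK directly; the vault URL and secret name are placeholders, and viadot's tasks wrap this kind of lookup rather than exposing it like this. The 1-hour default query timeout mentioned above is, presumably, a per-connection query timeout expressed in seconds (3600) on the underlying database driver.

```python
# Placeholder vault URL and secret name; viadot tasks wrap this kind of lookup
# behind Prefect secrets rather than calling the SDK directly like this.
from azure.identity import DefaultAzureCredential
from azure.keyvault.secrets import SecretClient

credential = DefaultAzureCredential()
client = SecretClient(
    vault_url="https://my-key-vault.vault.azure.net/",
    credential=credential,
)

# Fetch a database password stored as a Key Vault secret.
db_password = client.get_secret("azure-sql-password").value
```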
Fixed
- Fixed `SQLite` tests
- Multiple stability improvements with retries and timeouts