v2.1.0
What's Changed
- ✨ Add complete proxy settings in `SAPRFC` example by @trymzet in #403
- Update tests by @winiar93 in #396
- SQLServer To DuckDB by @angelika233 in #404
- ✨ Added databricks-connect support by @afraijat in #409
- ✨ Added databricks source to viadot by @afraijat in #434
- Decrease Docker image size by @trymzet in #458
- Added rollback feature, improved comments formatting by @afraijat in #452
- 🔊 Replaced Prefect logging with Python logging by @afraijat in #459
- Add databricks cleanup task by @trymzet in #502
- Fix orion prefect deployment by @trymzet in #515
- Fix import error from `datahub_cleanup_task` by @trymzet in #516
- ✨ Added `ExchangeRates` source to the library by @djagoda881 in #535
- ✨ Add `Sharepoint` 2.0 source by @trymzet in #534
- ♻️ Update `ExchangeRates` to use new config by @trymzet in #536
- 🔥 Remove unused requirements after having removed tasks by @trymzet in #537
- ♻️ Re-add Databricks to init by @trymzet in #538
- 🔥 Databricks - remove `env` parameter by @trymzet in #539
- ♻️ Re-add the `AzureDataLake` source by @trymzet in #540
- ✨ Add a default cluster port to `Databricks` source by @trymzet in #541
- 🐛 Add missing info about the required `org_id` credential by @trymzet in #542
- ✨ Add `TableDoesNotExist` exception by @trymzet in #543
- Add `from_df()` method to `AzureDataLake` source by @trymzet in #546
- ✨ Added parameters to functions in config.py by @djagoda881 in #548
- ♻️ Update Databricks to work with 2.0 configs by @trymzet in #553
- ✨ Adding a column to the to_df function by @djagoda881 in #550
- ✨ Created decorator to include viadot source to df by @fgoiriz in #567
- ✨ Added migration of cloud_for_customers.py source to 2.0 by @fgoiriz in #552
- 🐛 Modified Sources init.py by @fgoiriz in #569
- 🐛 Databricks bug fix by @afraijat in #571
- Add SAPRFC source by @AnnaGerlich in #582
- Add s3 source by @trymzet in #587
- 🐛 Fix credential handling to handle AWS region by @trymzet in #588
- 🐛 Fix logger being called before initialization by @trymzet in #589
- 🐛 Fix credential handling when not specified by @AnnaGerlich in #590
- 🐛 Fixed and Extended S3 Source by @TillPickha in #603
- 🐛 Fix handling of empty viadot config by @trymzet in #611
- 📝 Update docs by @trymzet in #612
- Add VSCode setup by @trymzet in #613
- Remove prefect as base image by @trymzet in #617
- 📌 Fix boto dependency hell by @trymzet in #618
- 📌 Fix boto to particular version by @trymzet in #619
- Remove ARM architecture by @trymzet in #620
- Remove old dependencies by @trymzet in #622
- ➕ Add missing `pyyaml` dependency by @trymzet in #624
- ➕ Add `pydantic` dependency by @trymzet in #625
- ⬆️ Bump `viadot` package version by @trymzet in #626
- ⬆️ Upgrade dependencies to fix conflicts by @trymzet in #628
- ✨ Add `test_download_file()` test for `Sharepoint` source by @AnnaGerlich in #608
- 📝 Minor docs change by @trymzet in #635
- Added tests for the `RedshiftSpectrum` source by @AnnaGerlich in #636
- Fixed bug in exchange rates unit tests by @djagoda881 in #638
- Databricks replace bug by @djagoda881 in #658
- Utils response function by @djagoda881 in #661
- Genesys source migration by @djagoda881 in #665
- Genesys bug fix by @djagoda881 in #668
- Databricks snakecase column bug by @djagoda881 in #673
- 📝 Added howto migrate from viadot 1 to viadot 2 by @afraijat in #680
- ⚡️ Enhanced `S3()` and `RedshiftSpectrum()` sources by @AnnaGerlich in #678
- 📝 Refined viadot migration docs by @afraijat in #684
- Automatically create table folder in `RedshiftSpectrum.from_df()` by @trymzet in #693
- Cleanup compose by @trymzet in #696
- Upgrade Databricks connector for 11.3+ runtimes by @trymzet in #697
- ♻️ Standardize credentials validation by @trymzet in #699
- Databricks pandas types casting by @djagoda881 in #701
- ✨ Added `close_connection()` to `sap_rfc` by @AnnaGerlich in #709
- 🚑 Fixed AWS credentials handling by @AnnaGerlich in #713
- 🚑 Fixed typo in `chunksize` parameter name by @AnnaGerlich in #718
- ✨ Add MSSQL ODBC driver to image by @trymzet in #719
- Add Trino source by @trymzet in #726
- Source MinIO by @trymzet in #729
- ✨ Added `SAPRFCV2` source class by @AnnaGerlich in #835
- Update PyPI pipeline to use Trusted Publishing by @trymzet in #837
- ✨ Add `partition_cols` param to `Minio.from_df()` by @trymzet in #838
- Update version to 2.0a15 by @trymzet in #839
- ✨ Optimize `MinIO` and `Trino` sources by @trymzet in #845
- ⬆️ Bump version by @trymzet in #859
- Implement `recursive` param in `MinIO.rm()` by @trymzet in #861
- Add `Decimal` type support to Trino by @trymzet in #862
- 🔖 Bump version by @trymzet in #864
- Trino connection context manager by @trymzet in #865
- ⚡️ Performance upgrade of `SAPRFCV2` and `Dockerfile` update by @marcinpurtak in #863
- ➕ Add missing dependencies in `setup.py` by @djagoda881 in #871
- 🔖 Bumped `viadot2` version to `2.0a19` by @djagoda881 in #872
- 🐛 Added dependencies directly to `setup.py` by @djagoda881 in #875
- ⬆️ Upgraded dependency versions by @djagoda881 in #876
- ✨ Added `validate()` util for dataframe validation by @burzekj in #869
- 🚀 Bumped viadot2 version to `2.0a20` by @djagoda881 in #877
- 🧱 Added new dependency management to the project by @djagoda881 in #878
- Bump `azure-identity` by @trymzet in #879
- ♻️ Apply downstream improvements to the Dockerfile by @trymzet in #881
- 👷 Add CI to 2.0 by @trymzet in #882
- 🐛 Fix pip not installing to viadot user's `.local` dir by @trymzet in #883
- 🐛 Fix Dockerfile build failing due to file permissions by @trymzet in #884
- 📝 Improve contributing docs by @trymzet in #889
- 🐛 Fix crash when no data returned in `SAPRFCV2` by @marcinpurtak in #887
- ♻️ Added Databricks as optional dependency and source by @djagoda881 in #880
- 🚀 Added new functionality to saprfc source regarding where statement by @adrian-wojcik in #899
- Modified reading the file and inspecting the provided URL by @Rafalz13 in #914
- Bugfix/saprfcv2 column shift by @marcinpurtak in #916
- 📝 Make docs great again by @trymzet in #924
- 🐛 Fix MkDocs build failing due to incorrect YAML parsing by @trymzet in #925
- 🐛 Update lockfiles by @trymzet in #927
- Sharepoint handle null values by @Rafalz13 in #931
- 🔖 Bump version by @trymzet in #933
- Sharepoint convert data types to string by @Rafalz13 in #937
- 🔖 Bump version by @Rafalz13 in #938
- ⬆️ Relax sql-metadata version requirement by @trymzet in #940
- ✨ Add orchestration module by @djagoda881 in #917
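Among the entries above, #869 adds a `validate()` util for dataframe validation. As a rough illustration of what such a helper does (a hypothetical sketch — the test names and parameters here are assumptions, not viadot's actual API):

```python
import pandas as pd


def validate(df: pd.DataFrame, tests: dict) -> None:
    """Run simple sanity checks on a DataFrame and raise on failure.

    `tests` maps check names to expectations, e.g.
    {"column_list_to_match": ["a", "b"], "dataset_row_count": {"min": 1, "max": 100}}.
    Illustrative only; not viadot's actual implementation.
    """
    failed = []
    if "column_list_to_match" in tests:
        # Column names and order must match exactly
        if list(df.columns) != tests["column_list_to_match"]:
            failed.append("column list does not match")
    if "dataset_row_count" in tests:
        bounds = tests["dataset_row_count"]
        rows = len(df)
        if not bounds.get("min", 0) <= rows <= bounds.get("max", rows):
            failed.append(f"row count {rows} outside expected range")
    if "columns_to_verify_types" in tests:
        # Each listed column must have the expected dtype (as a string)
        for col, dtype in tests["columns_to_verify_types"].items():
            if str(df[col].dtype) != dtype:
                failed.append(f"column {col!r} has dtype {df[col].dtype}, expected {dtype}")
    if failed:
        raise ValueError("Validation failed: " + "; ".join(failed))
```

Raising instead of returning a status lets a flow fail fast before loading bad data downstream.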
New Contributors
- @afraijat made their first contribution in #409
- @fgoiriz made their first contribution in #567
- @TillPickha made their first contribution in #603
Full Changelog: v0.4.3...v2.1.0
Old changelog
Added
- Added new version of `Genesys` connector and test files.
- Added new version of `Outlook` connector and test files.
- Added new version of `Hubspot` connector and test files.
- Added `Mindful` connector and test file.
- Added `sql_server_to_parquet` Prefect flow.
- Added `sap_to_parquet` Prefect flow.
- Added `duckdb_to_sql_server`, `duckdb_to_parquet`, `duckdb_transform` Prefect flows.
- Added `bcp` and `duckdb_query` Prefect tasks.
- Added `DuckDB` source class.
- Added `sql_server_to_minio` Prefect flow.
- Added `df_to_minio` Prefect task.
- Added handling for `DatabaseCredentials` and `Secret` blocks in `prefect/utils.py:get_credentials`.
- Added `SQLServer` source and tasks `create_sql_server_table`, `sql_server_to_df`, `sql_server_query`.
- Added `basename_template` to `MinIO` source.
- Added `_empty_column_to_string` and `_convert_all_to_string_type` to convert data types to string.
- Added `na_values` parameter to the `Sharepoint` class to parse `N/A` values coming from the Excel file columns.
- Added `get_last_segment_from_url` function to the Sharepoint file.
- Added `validate` function to `viadot/utils.py`.
- Fixed `Databricks.create_table_from_pandas()` failing to overwrite a table in some cases, even with `replace="True"`.
- Enabled Databricks Connect in the image. To enable, follow this guide.
- Added `Databricks` source.
- Added `ExchangeRates` source.
- Added `from_df()` method to `AzureDataLake` source.
- Added `SAPRFC` source.
- Added `S3` source.
- Added `RedshiftSpectrum` source.
- Added `upload()` and `download()` methods to `S3` source.
- Added `Genesys` source.
- Fixed a bug in `Databricks.create_table_from_pandas()`: the function that converts column names to snake_case was not used in every case. (#672)
- Added `howto_migrate_sources_tasks_and_flows.md` document explaining the viadot 1 -> 2 migration process.
- `RedshiftSpectrum.from_df()` now automatically creates a folder for the table if it is not specified in `to_path`.
- Fixed a bug in `Databricks.create_table_from_pandas()`: the function now automatically casts DataFrame types. (#681)
- Added `close_connection()` to `SAPRFC`.
- Added `Trino` source.
- Added `MinIO` source.
- Added `gen_split()` method to the `SAPRFCV2` class to allow looping over a DataFrame with a generator, which improves performance.
- Added `adjust_where_condition_by_adding_missing_spaces()` to `SAPRFC`, which checks the raw SQL query and modifies it if needed.
Changed
- Changed location of `task_utils.py` and removed unused/prefect1-related tasks.
- Changed the way of handling `NA` string values and mapped column types to `str` for the `Sharepoint` source.
- Added `SQLServerToDF` task.
- Added `SQLServerToDuckDB` flow, which downloads data from a SQLServer table, loads it into a Parquet file, and then uploads it to DuckDB.
- Added complete proxy setup in the `SAPRFC` example (`viadot/examples/sap_rfc`).
- Added Databricks/Spark setup to the image. See the README for setup & usage instructions.
- Added rollback feature to `Databricks` source.
- Changed all Prefect logging instances in the `sources` directory to native Python logging.
- Changed `rm()`, `from_df()`, `to_df()` methods in the `S3` source.
- Changed `get_request()` to `handle_api_request()` in `utils.py`.
- Changed the `to_df()` for loop in `SAPRFCV2` to a generator.
- Updated `Dockerfile` to remove the obsolete `adoptopenjdk` and replace it with `temurin`.
Removed
- Removed the `env` param from the `Databricks` source, as users can now store multiple configs for the same source using different config keys.
- Removed Prefect dependency from the library (Python library, Docker base image).
- Removed `catch_extra_separators()` from the `SAPRFCV2` class.
Fixed
- Fixed the `bcp` Prefect task to run correctly.
- Fixed a typo in credentials in the `SQLServer` source.