v2.1.0
What's Changed
- ✨ Add complete proxy settings in `SAPRFC` example by @trymzet in #403
- Update tests by @winiar93 in #396
- SQLServer To DuckDB by @angelika233 in #404
- ✨ Added databricks-connect support by @afraijat in #409
- ✨ Added databricks source to viadot by @afraijat in #434
- Decrease Docker image size by @trymzet in #458
- Added rollback feature, improved comments formatting by @afraijat in #452
- 🔊 Replaced Prefect logging with Python logging by @afraijat in #459
- Add databricks cleanup task by @trymzet in #502
- Fix orion prefect deployment by @trymzet in #515
- Fix import error from `datahub_cleanup_task` by @trymzet in #516
- ✨ Added `ExchangeRates` source to the library by @djagoda881 in #535
- ✨ Add `Sharepoint` 2.0 source by @trymzet in #534
- ♻️ Update `ExchangeRates` to use new config by @trymzet in #536
- 🔥 Remove unused requirements after having removed tasks by @trymzet in #537
- ♻️ Re-add Databricks to init by @trymzet in #538
- 🔥 Databricks - remove `env` parameter by @trymzet in #539
- ♻️ Re-add the `AzureDataLake` source by @trymzet in #540
- ✨ Add a default cluster port to `Databricks` source by @trymzet in #541
- 🐛 Add missing info about the required `org_id` credential by @trymzet in #542
- ✨ Add `TableDoesNotExist` exception by @trymzet in #543
- Add `from_df()` method to `AzureDataLake` source by @trymzet in #546
- ✨ Added parameters to functions in config.py by @djagoda881 in #548
- ♻️ Update Databricks to work with 2.0 configs by @trymzet in #553
- ✨ Adding a column to the to_df function by @djagoda881 in #550
- ✨ Created decorator to include viadot source to df by @fgoiriz in #567
- ✨ Added migration of cloud_for_customers.py source to 2.0 by @fgoiriz in #552
- 🐛 Modified Sources init.py by @fgoiriz in #569
- 🐛 Databricks bug fix by @afraijat in #571
- Add SAPRFC source by @AnnaGerlich in #582
- Add s3 source by @trymzet in #587
- 🐛 Fix credential handling to handle AWS region by @trymzet in #588
- 🐛 Fix logger being called before initialization by @trymzet in #589
- 🐛 Fix credential handling when not specified by @AnnaGerlich in #590
- 🐛 Fixed and Extended S3 Source by @TillPickha in #603
- 🐛 Fix handling of empty viadot config by @trymzet in #611
- 📝 Update docs by @trymzet in #612
- Add VSCode setup by @trymzet in #613
- Remove prefect as base image by @trymzet in #617
- 📌 Fix boto dependency hell by @trymzet in #618
- 📌 Fix boto to particular version by @trymzet in #619
- Remove ARM architecture by @trymzet in #620
- Remove old dependencies by @trymzet in #622
- ➕ Add missing `pyyaml` dependency by @trymzet in #624
- ➕ Add `pydantic` dependency by @trymzet in #625
- ⬆️ Bump `viadot` package version by @trymzet in #626
- ⬆️ Upgrade dependencies to fix conflicts by @trymzet in #628
- ✨ Add `test_download_file()` test for `Sharepoint` source by @AnnaGerlich in #608
- 📝 Minor docs change by @trymzet in #635
- Added tests for the `RedshiftSpectrum` source by @AnnaGerlich in #636
- Fixed bug in exchange rates unit tests by @djagoda881 in #638
- Databricks replace bug by @djagoda881 in #658
- Utils response function by @djagoda881 in #661
- Genesys source migration by @djagoda881 in #665
- Genesys bug fix by @djagoda881 in #668
- Databricks snakecase column bug by @djagoda881 in #673
- 📝 Added howto migrate from viadot 1 to viadot 2 by @afraijat in #680
- ⚡️ Enhanced `S3()` and `RedshiftSpectrum()` sources by @AnnaGerlich in #678
- 📝 Refined viadot migration docs by @afraijat in #684
- Automatically create table folder in `RedshiftSpectrum.from_df()` by @trymzet in #693
- Cleanup compose by @trymzet in #696
- Upgrade Databricks connector for 11.3+ runtimes by @trymzet in #697
- ♻️ Standardize credentials validation by @trymzet in #699
- Databricks pandas types casting by @djagoda881 in #701
- ✨ Added `close_connection()` to `sap_rfc` by @AnnaGerlich in #709
- 🚑 Fixed AWS credentials handling by @AnnaGerlich in #713
- 🚑 Fixed typo in `chunksize` parameter name by @AnnaGerlich in #718
- ✨ Add MSSQL ODBC driver to image by @trymzet in #719
- Add Trino source by @trymzet in #726
- Source MinIO by @trymzet in #729
- ✨ Added `SAPRFCV2` source class by @AnnaGerlich in #835
- Update PyPI pipeline to use Trusted Publishing by @trymzet in #837
- ✨ Add `partition_cols` param to `Minio.from_df()` by @trymzet in #838
- Update version to 2.0a15 by @trymzet in #839
- ✨ Optimize `MinIO` and `Trino` sources by @trymzet in #845
- ⬆️ Bump version by @trymzet in #859
- Implement `recursive` param in `MinIO.rm()` by @trymzet in #861
- Add `Decimal` type support to Trino by @trymzet in #862
- 🔖 Bump version by @trymzet in #864
- Trino connection context manager by @trymzet in #865
- ⚡️ Performance upgrade of `SAPRFCV2` and `Dockerfile` update by @marcinpurtak in #863
- ➕ Add missing dependencies in `setup.py` by @djagoda881 in #871
- 🔖 Bumped `viadot2` version to `2.0a19` by @djagoda881 in #872
- 🐛 Added dependencies directly to `setup.py` by @djagoda881 in #875
- ⬆️ Upgraded dependency versions by @djagoda881 in #876
- ✨ Added `validate()` util for dataframe validation by @burzekj in #869
- 🚀 Bumped viadot2 version to `2.0a20` by @djagoda881 in #877
- 🧱 Added new dependency management to the project by @djagoda881 in #878
- Bump `azure-identity` by @trymzet in #879
- ♻️ Apply downstream improvements to the Dockerfile by @trymzet in #881
- 👷 Add CI to 2.0 by @trymzet in #882
- 🐛 Fix pip not installing to viadot user's `.local` dir by @trymzet in #883
- 🐛 Fix Dockerfile build failing due to file permissions by @trymzet in #884
- 📝 Improve contributing docs by @trymzet in #889
- 🐛 Fix crash when no data returned in `SAPRFCV2` by @marcinpurtak in #887
- ♻️ Added Databricks as optional dependency and source by @djagoda881 in #880
- 🚀 Added new functionality to saprfc source regarding where statement by @adrian-wojcik in #899
- Modified reading the file and inspecting the provided URL by @Rafalz13 in #914
- Bugfix/saprfcv2 column shift by @marcinpurtak in #916
- 📝 Make docs great again by @trymzet in #924
- 🐛 Fix MkDocs build failing due to incorrect YAML parsing by @trymzet in #925
- 🐛 Update lockfiles by @trymzet in #927
- Sharepoint handle null values by @Rafalz13 in #931
- 🔖 Bump version by @trymzet in #933
- Sharepoint convert data types to string by @Rafalz13 in #937
- 🔖 Bump version by @Rafalz13 in #938
- ⬆️ Relax sql-metadata version requirement by @trymzet in #940
- ✨ Add orchestration module by @djagoda881 in #917
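Among the entries above, #869 adds a `validate()` util for dataframe validation. As a rough illustration of what such a helper does (a hypothetical sketch — the test names and parameters here are assumptions, not viadot's actual API):

```python
import pandas as pd


def validate(df: pd.DataFrame, tests: dict) -> None:
    """Run simple sanity checks on a DataFrame and raise on failure.

    `tests` maps check names to expectations, e.g.
    {"column_list_to_match": ["a", "b"], "dataset_row_count": {"min": 1, "max": 100}}.
    Illustrative only; not viadot's actual implementation.
    """
    failed = []
    if "column_list_to_match" in tests:
        # Column names and order must match exactly
        if list(df.columns) != tests["column_list_to_match"]:
            failed.append("column list does not match")
    if "dataset_row_count" in tests:
        bounds = tests["dataset_row_count"]
        rows = len(df)
        if not bounds.get("min", 0) <= rows <= bounds.get("max", rows):
            failed.append(f"row count {rows} outside expected range")
    if "columns_to_verify_types" in tests:
        # Each listed column must have the expected dtype (as a string)
        for col, dtype in tests["columns_to_verify_types"].items():
            if str(df[col].dtype) != dtype:
                failed.append(f"column {col!r} has dtype {df[col].dtype}, expected {dtype}")
    if failed:
        raise ValueError("Validation failed: " + "; ".join(failed))
```

Raising instead of returning a status lets a flow fail fast before loading bad data downstream.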
New Contributors
- @afraijat made their first contribution in #409
- @fgoiriz made their first contribution in #567
- @TillPickha made their first contribution in #603
Full Changelog: v0.4.3...v2.1.0
Old changelog
Added
- Added new version of `Genesys` connector and test files.
- Added new version of `Outlook` connector and test files.
- Added new version of `Hubspot` connector and test files.
- Added `Mindful` connector and test file.
- Added `sql_server_to_parquet` Prefect flow.
- Added `sap_to_parquet` Prefect flow.
- Added `duckdb_to_sql_server`, `duckdb_to_parquet`, `duckdb_transform` Prefect flows.
- Added `bcp` and `duckdb_query` Prefect tasks.
- Added `DuckDB` source class.
- Added `sql_server_to_minio` Prefect flow.
- Added `df_to_minio` Prefect task.
- Added handling for `DatabaseCredentials` and `Secret` blocks in `prefect/utils.py:get_credentials`.
- Added `SQLServer` source and tasks `create_sql_server_table`, `sql_server_to_df`, `sql_server_query`.
- Added `basename_template` to `MinIO` source.
- Added `_empty_column_to_string` and `_convert_all_to_string_type` to convert data types to string.
- Added `na_values` parameter to the `Sharepoint` class to parse `N/A` values coming from the Excel file columns.
- Added `get_last_segment_from_url` function to the Sharepoint file.
- Added `validate` function to `viadot/utils.py`.
- Fixed `Databricks.create_table_from_pandas()` failing to overwrite a table in some cases, even with `replace="True"`.
- Enabled Databricks Connect in the image. To enable, follow this guide.
- Added `Databricks` source.
- Added `ExchangeRates` source.
- Added `from_df()` method to `AzureDataLake` source.
- Added `SAPRFC` source.
- Added `S3` source.
- Added `RedshiftSpectrum` source.
- Added `upload()` and `download()` methods to `S3` source.
- Added `Genesys` source.
- Fixed a bug in `Databricks.create_table_from_pandas()`: the function that converts column names to snake_case was not used in every case. (#672)
- Added `howto_migrate_sources_tasks_and_flows.md` document explaining the viadot 1 -> 2 migration process.
- `RedshiftSpectrum.from_df()` now automatically creates a folder for the table if it is not specified in `to_path`.
- Fixed a bug in `Databricks.create_table_from_pandas()`: the function now automatically casts DataFrame types. (#681)
- Added `close_connection()` to `SAPRFC`.
- Added `Trino` source.
- Added `MinIO` source.
- Added `gen_split()` method to the `SAPRFCV2` class to allow looping over a DataFrame with a generator, which improves performance.
- Added `adjust_where_condition_by_adding_missing_spaces()` to `SAPRFC`, which checks the raw SQL query and modifies it if needed.
Changed
- Changed location of `task_utils.py` and removed unused/prefect1-related tasks.
- Changed the way of handling `NA` string values and mapped column types to `str` for the `Sharepoint` source.
- Added `SQLServerToDF` task.
- Added `SQLServerToDuckDB` flow, which downloads data from a SQLServer table, loads it into a Parquet file, and then uploads it to DuckDB.
- Added complete proxy setup in the `SAPRFC` example (`viadot/examples/sap_rfc`).
- Added Databricks/Spark setup to the image. See the README for setup & usage instructions.
- Added rollback feature to `Databricks` source.
- Changed all Prefect logging instances in the `sources` directory to native Python logging.
- Changed `rm()`, `from_df()`, `to_df()` methods in the `S3` source.
- Changed `get_request()` to `handle_api_request()` in `utils.py`.
- Changed the `to_df()` for loop in `SAPRFCV2` to a generator.
- Updated `Dockerfile` to remove the obsolete `adoptopenjdk` and replace it with `temurin`.
Removed
- Removed the `env` param from the `Databricks` source, as users can now store multiple configs for the same source using different config keys.
- Removed Prefect dependency from the library (Python library, Docker base image).
- Removed `catch_extra_separators()` from the `SAPRFCV2` class.
Fixed
- Fixed the `bcp` Prefect task to run correctly.
- Fixed a typo in credentials in the `SQLServer` source.