Releases · dyvenia/viadot
Viadot 0.4.9
Added
- Added new column named `_viadot_downloaded_at_utc` in Genesys files with the datetime when it is created.
- Added SFTP source class `SftpConnector`
- Added SFTP tasks `SftpToDF` and `SftpList`
- Added SFTP flows `SftpToAzureSQL` and `SftpToADLS`
- Added new source file `mindful` to connect with the Mindful API.
- Added new task file `mindful` to be called by the Mindful flow.
- Added new flow file `mindful_to_adls` to upload data from the Mindful API to ADLS.
- Added `recursive` parameter to `AzureDataLakeList` task
Viadot 0.4.8
Added
- Added `protobuf` library to requirements
Viadot 0.4.7
Added
- Added new flow `SQLServerTransform` and new task `SQLServerQuery` to run queries on SQLServer
- Added `duckdb_query` parameter to `DuckDBToSQLServer` flow to enable creating a table from the output of SQL queries
- Added handling of empty DF in `set_new_kv()` task
- Added `update_kv` and `filter_column` params to `SAPRFCToADLS` and `SAPToDuckDB` flows and added `set_new_kv()` task in `task_utils`
- Added Genesys API source `Genesys`
- Added tasks `GenesysToCSV` and `GenesysToDF`
- Added flows `GenesysToADLS` and `GenesysReportToADLS`
- Added `query` parameter to `PrefectLogs` flow
Changed
- Updated requirements.txt
- Changed `handle_api_response()` method by adding support for more request methods and a context manager
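The context-manager usage mentioned above can be illustrated with a short sketch. This is a hedged illustration of the idea only: the signature, parameters, and URL below are assumptions, not viadot's actual `handle_api_response()` implementation.

```python
# Illustrative sketch (assumed names, not viadot's actual code): a response
# handler that accepts any HTTP method and is used as a context manager, so
# the underlying session is always closed.
from contextlib import contextmanager

import requests


@contextmanager
def handle_api_response(url: str, method: str = "GET", timeout: int = 10, **kwargs):
    """Send a request with the given HTTP method and yield the response."""
    session = requests.Session()
    try:
        response = session.request(method=method, url=url, timeout=timeout, **kwargs)
        response.raise_for_status()
        yield response
    finally:
        session.close()


# Usage: a POST request is handled the same way as a GET.
with handle_api_response("https://httpbin.org/post", method="POST", json={"a": 1}) as response:
    print(response.status_code)
```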
Viadot 0.4.6
Added
- Added `rfc_character_limit` parameter in `SAPRFCToDF` task, `SAPRFC` source, `SAPRFCToADLS` and `SAPToDuckDB` flows
- Added `on_bcp_error` and `bcp_error_log_path` parameters in `BCPTask`
- Added ability to process queries whose results exceed SAP's character-per-row limit in `SAPRFC` source
- Added new flow `PrefectLogs` for extracting all logs from Prefect with details
- Added `PrefectLogs` flow
Changed
- Changed `CheckColumnOrder` task and `ADLSToAzureSQL` flow to handle appending to a non-existing table
- Changed tasks order in `EpicorOrdersToDuckDB`, `SAPToDuckDB` and `SQLServerToDuckDB` - casting DF to string before adding metadata
- Changed `add_ingestion_metadata_task()` to not add the metadata column when the input DataFrame is empty (see the sketch after this list)
- Changed `check_if_empty_file()` logic according to changes in `add_ingestion_metadata_task()`
- Changed accepted values of `if_empty` parameter in `DuckDBCreateTableFromParquet`
- Updated `.gitignore` to ignore files with the `*.bak` extension and to ignore `credentials.json` in any directory
- Changed logger messages in `AzureDataLakeRemove` task
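The empty-DataFrame behaviour of `add_ingestion_metadata_task()` described above can be sketched as follows. The function name and body are simplified assumptions for illustration, not the actual task code.

```python
# Hedged sketch: add the ingestion timestamp column only when the input
# DataFrame has rows; empty inputs are returned untouched.
from datetime import datetime, timezone

import pandas as pd


def add_ingestion_metadata(df: pd.DataFrame) -> pd.DataFrame:
    """Add a `_viadot_downloaded_at_utc` column unless the DataFrame is empty."""
    if df.empty:
        # Leave empty DataFrames as-is so downstream empty-file checks still apply.
        return df
    df = df.copy()
    df["_viadot_downloaded_at_utc"] = datetime.now(timezone.utc)
    return df
```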
Fixed
- Fixed handling of an empty response in `SAPRFC` source
- Fixed issue in `BCPTask` when the log file couldn't be opened.
- Fixed log being printed too early in `Salesforce` source, which would sometimes cause a `KeyError`
- `raise_on_error` now behaves correctly in `upsert()` when receiving incorrect return codes from Salesforce
Removed
- Removed option to run multiple queries in `SAPRFCToADLS`
Viadot 0.4.5
Added
- Added `error_log_file_path` parameter in `BCPTask` that enables setting the name of the error logs file
- Added `on_error` parameter in `BCPTask` that tells what to do if a bcp error occurs.
- Added error log file and `on_bcp_error` parameter in `ADLSToAzureSQL`
- Added handling of POST requests in `handle_api_response()` and added it to `Epicor` source.
- Added `SalesforceToDF` task
- Added `SalesforceToADLS` flow
- Added `overwrite_adls` option to `BigQueryToADLS` and `SharepointToADLS`
- Added `cast_df_to_str` task in `utils.py` and added this to `EpicorToDuckDB`, `SAPToDuckDB`, `SQLServerToDuckDB`
- Added `if_empty` parameter in `DuckDBCreateTableFromParquet` task and in `EpicorToDuckDB`, `SAPToDuckDB`, `SQLServerToDuckDB` flows to check if the output Parquet is empty and handle it properly.
- Added `check_if_empty_file()` and `handle_if_empty_file()` in `utils.py`
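A minimal sketch of the empty-file helpers listed above, assuming a `warn`/`skip`/`fail` policy; the function names mirror the changelog entries, but the bodies and accepted `if_empty` values are assumptions, not the actual `utils.py` code.

```python
# Hedged sketch of empty-file checking and handling.
import os


def check_if_empty_file(path: str) -> bool:
    """Return True when the file is missing or has zero bytes."""
    return not os.path.isfile(path) or os.path.getsize(path) == 0


def handle_if_empty_file(path: str, if_empty: str = "warn") -> None:
    """Apply an `if_empty` policy ("warn", "skip" or "fail") to an empty file."""
    if not check_if_empty_file(path):
        return
    if if_empty == "fail":
        raise ValueError(f"Input file {path} is empty.")
    if if_empty == "warn":
        print(f"Warning: input file {path} is empty.")
    # "skip": do nothing and let the caller decide how to proceed.
```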
Viadot 0.4.4
Added
- Added new connector - Outlook. Created `Outlook` source, `OutlookToDF` task and `OutlookToADLS` flow.
- Added new connector - Epicor. Created `Epicor` source, `EpicorToDF` task and `EpicorToDuckDB` flow.
- Enabled Databricks Connect in the image. To enable, follow this guide
- Added `MySQL` source and `MySqlToADLS` flow
- Added `SQLServerToDF` task
- Added `SQLServerToDuckDB` flow which downloads data from a SQLServer table, loads it to a Parquet file and then uploads it to DuckDB (see the sketch below)
- Added complete proxy setup in `SAPRFC` example (`viadot/examples/sap_rfc`)
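The Parquet-staging pattern behind `SQLServerToDuckDB` can be sketched like this. The DataFrame stands in for the result of a SQL Server query, and the table and file names are placeholders; this is not the flow's actual code.

```python
# Hedged sketch: stage a DataFrame as Parquet, then load it into DuckDB.
import duckdb
import pandas as pd

# Pretend this DataFrame was downloaded from a SQL Server table.
df = pd.DataFrame({"id": [1, 2], "name": ["a", "b"]})

# Stage the data as a Parquet file...
df.to_parquet("staging.parquet", index=False)

# ...then create a DuckDB table from that file.
con = duckdb.connect("example.duckdb")
con.execute("CREATE TABLE orders AS SELECT * FROM 'staging.parquet'")
con.close()
```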
Changed
- Changed default name for the Prefect secret holding the name of the Azure KV secret storing Sendgrid credentials
Viadot 0.4.3
Added
- Added `adls_file_name` in `SupermetricsToADLS` and `SharepointToADLS` flows
- Added `BigQueryToADLS` flow class which enables extracting data from BigQuery.
- Added `Salesforce` source
- Added `SalesforceUpsert` task
- Added `SalesforceBulkUpsert` task
- Added C4C secret handling to `CloudForCustomersReportToADLS` flow (`c4c_credentials_secret` parameter)
Fixed
- Fixed `get_flow_last_run_date()` incorrectly parsing the date
- Fixed C4C secret handling (tasks now correctly read the secret as the credentials, rather than assuming the secret is a container for credentials for all environments and trying to access a specific key inside it). In other words, tasks now assume the secret holds credentials, rather than a dict of the form `{env: credentials, env2: credentials2}`
- Fixed `utils.gen_bulk_insert_query_from_df()` failing with > 1000 rows due to the INSERT clause limit by chunking the data into multiple INSERTs (see the sketch below)
- Fixed `MultipleFlows` when one flow is passed and when the last flow fails.
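A simplified sketch of the chunking idea behind the `gen_bulk_insert_query_from_df()` fix: SQL Server accepts at most 1000 rows per `INSERT ... VALUES` statement, so rows are split into batches and one statement is generated per batch. The function name, literal quoting, and return type below are naive assumptions, not the library's implementation.

```python
# Hedged sketch of chunked bulk-insert query generation.
import pandas as pd


def _format(value) -> str:
    # Very simplified SQL literal formatting, for illustration only.
    return f"'{value}'" if isinstance(value, str) else str(value)


def gen_bulk_insert_queries(df: pd.DataFrame, table: str, chunksize: int = 1000) -> list:
    """Generate one INSERT statement per chunk of at most `chunksize` rows."""
    columns = ", ".join(df.columns)
    queries = []
    for start in range(0, len(df), chunksize):
        chunk = df.iloc[start : start + chunksize]
        values = ",\n".join(
            "(" + ", ".join(_format(value) for value in row) + ")"
            for row in chunk.itertuples(index=False)
        )
        queries.append(f"INSERT INTO {table} ({columns})\nVALUES\n{values};")
    return queries


# A 2500-row frame yields three INSERTs (1000 + 1000 + 500 rows).
df = pd.DataFrame({"id": range(2500), "name": "x"})
print(len(gen_bulk_insert_queries(df, "dbo.example")))  # 3
```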
Viadot 0.4.2
Added
- Added `AzureDataLakeRemove` task
Changed
- Changed name of task file from `prefect` to `prefect_data_range`
Fixed
- Fixed an out-of-range issue in `prefect_data_range`
Viadot 0.4.1
Changed
Hot fix - bumped version
Viadot 0.4.0
Added
- Added `custom_mail_state_handler` function that sends a mail notification using a custom SMTP server.
- Added new function `df_clean_column` that cleans DataFrame columns from special characters
- Added `df_clean_column` util task that removes special characters from a pandas DataFrame (see the sketch below)
- Added `MultipleFlows` flow class which enables running multiple flows in a given order.
- Added `GetFlowNewDateRange` task to change the date range based on Prefect flows
- Added `check_col_order` parameter in `ADLSToAzureSQL`
- Added new source `ASElite`
- Added KeyVault support in `CloudForCustomers` tasks
- Added `SQLServer` source
- Added `DuckDBToDF` task
- Added `DuckDBTransform` flow
- Added `SQLServerCreateTable` task
- Added `credentials` param to `BCPTask`
- Added `get_sql_dtypes_from_df` and `update_dict` util tasks
- Added `DuckDBToSQLServer` flow
- Added `if_exists="append"` option to `DuckDB.create_table_from_parquet()`
- Added `get_flow_last_run_date` util function
- Added `df_to_dataset` task util for writing DataFrames to data lakes using `pyarrow`
- Added retries to Cloud for Customers tasks
- Added `chunksize` parameter to `C4CToDF` task to allow pulling data in chunks
- Added `chunksize` parameter to `BCPTask` task to allow more control over the load process
- Added support for SQL Server's custom `datetimeoffset` type
- Added `AzureSQLToDF` task
- Added `AzureSQLUpsert` task
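A minimal sketch of what a `df_clean_column`-style helper might do; the signature and the set of characters removed are assumptions for illustration, not viadot's actual utility.

```python
# Hedged sketch: strip control characters, tabs and newlines from string columns.
import pandas as pd


def df_clean_column(df: pd.DataFrame, columns=None) -> pd.DataFrame:
    """Remove control characters from the chosen (or all object-typed) columns."""
    df = df.copy()
    columns = columns or df.select_dtypes(include="object").columns.tolist()
    for col in columns:
        df[col] = df[col].astype(str).str.replace(r"[\x00-\x1f]", "", regex=True)
    return df


# Usage: newlines and tabs are stripped from every object column.
frame = pd.DataFrame({"comment": ["line one\nline two", "tab\there"]})
print(df_clean_column(frame)["comment"].tolist())  # ['line oneline two', 'tabhere']
```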
Changed
- Changed the base class of `AzureSQL` to `SQLServer`
- `df_to_parquet()` task now creates directories if needed (see the sketch below)
- Added several more separators to check for automatically in `SAPRFC.to_df()`
- Upgraded `duckdb` version to 0.3.2
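A small sketch of the `df_to_parquet()` change noted above, assuming a plain path-based signature (not the task's actual interface): the parent directory is created before writing so nested output paths no longer fail.

```python
# Hedged sketch: create the target directory before writing the Parquet file.
import os

import pandas as pd


def df_to_parquet(df: pd.DataFrame, path: str) -> None:
    """Write the DataFrame to Parquet, creating parent directories if needed."""
    directory = os.path.dirname(path)
    if directory:
        os.makedirs(directory, exist_ok=True)
    df.to_parquet(path, index=False)


# Usage: the nested "output/nested" directory is created automatically.
df_to_parquet(pd.DataFrame({"a": [1]}), "output/nested/data.parquet")
```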
Fixed
- Fixed bug with `CheckColumnOrder` task
- Fixed OpenSSL config for old SQL Servers still using TLS < 1.2
- `BCPTask` now correctly handles custom SQL Server port
- Fixed `SAPRFC.to_df()` ignoring user-specified separator
- Fixed temporary CSV generated by the `DuckDBToSQLServer` flow not being cleaned up
- Fixed some mappings in `get_sql_dtypes_from_df()` and optimized performance
- Fixed `BCPTask` - the case when the file path contained a space
- Fixed credential evaluation logic (`credentials` is now evaluated before `config_key`)
- Fixed "$top" and "$skip" values being ignored by `C4CToDF` task if provided in the `params` parameter
- Fixed `SQL.to_df()` incorrectly handling queries that begin with whitespace
Removed
- Removed `autopick_sep` parameter from `SAPRFC` functions. The separator is now always picked automatically if not provided.
- Removed `dtypes_to_json` task (moved to `task_utils.py`)