Documentation: https://viadot.docs.dyvenia.com
Source Code: https://github.com/dyvenia/viadot/tree/2.0
A simple data ingestion library to guide data flows from some places to other places.
Viadot supports several API and RDBMS sources, both private and public. The examples below are based on one of the supported public sources, the UK Carbon Intensity API.
```python
from viadot.sources.uk_carbon_intensity import UKCarbonIntensity

ukci = UKCarbonIntensity()
ukci.query("/intensity")  # request the current carbon intensity
df = ukci.to_df()  # convert the response to a pandas DataFrame

print(df)
```
Output:
| | from | to | forecast | actual | index |
|---|---|---|---|---|---|
| 0 | 2021-08-10T11:00Z | 2021-08-10T11:30Z | 211 | 216 | moderate |
The above `df` is a pandas `DataFrame` object. It contains data downloaded by `viadot` from the Carbon Intensity UK API.
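To illustrate where the columns in that output come from, here is a minimal sketch that flattens a payload shaped like the public Carbon Intensity API's `/intensity` response into flat rows. It uses only the standard library and makes no network call. The sample payload reuses the values from the output above, and `flatten_intensity` is a hypothetical helper written for this illustration; it is not viadot's implementation.

```python
import json

# Sample payload shaped like the public API's /intensity response.
# The values are copied from the output shown above.
SAMPLE_RESPONSE = """
{
  "data": [
    {
      "from": "2021-08-10T11:00Z",
      "to": "2021-08-10T11:30Z",
      "intensity": {"forecast": 211, "actual": 216, "index": "moderate"}
    }
  ]
}
"""


def flatten_intensity(payload: dict) -> list:
    """Hypothetical helper: turn nested records into flat, table-like rows."""
    rows = []
    for record in payload["data"]:
        row = {"from": record["from"], "to": record["to"]}
        row.update(record["intensity"])  # adds forecast, actual, index
        rows.append(row)
    return rows


rows = flatten_intensity(json.loads(SAMPLE_RESPONSE))
print(rows[0])
```

Each resulting dictionary corresponds to one row of the table, with the nested `intensity` fields promoted to top-level columns.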
Depending on the destination, `viadot` provides different methods of uploading data. For instance, for databases, this would be bulk inserts. For data lakes, it would be file uploads.
For example:
```python
from viadot.sources import AzureDataLake, UKCarbonIntensity

# Download the data into a pandas DataFrame.
ukci = UKCarbonIntensity()
ukci.query("/intensity")
df = ukci.to_df()

# Upload the DataFrame to the data lake as a Parquet file.
adls = AzureDataLake(config_key="my_adls_creds")
adls.from_df(df, "my_folder/my_file.parquet")
```
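The difference between a bulk insert and a file upload can be sketched generically with the Python standard library. This example does not use viadot and is not how viadot implements either method: it assumes an in-memory SQLite database as the database destination and a local CSV file as a stand-in for the file that would be uploaded to a data lake. The second data row is made up for illustration.

```python
import csv
import sqlite3
import tempfile
from pathlib import Path

# Rows shaped like the DataFrame above. The first row comes from the example
# output; the second row is invented for illustration.
columns = ["from", "to", "forecast", "actual", "index"]
rows = [
    ("2021-08-10T11:00Z", "2021-08-10T11:30Z", 211, 216, "moderate"),
    ("2021-08-10T11:30Z", "2021-08-10T12:00Z", 208, 212, "moderate"),
]

# Database destination: send all rows in one bulk insert
# instead of issuing a separate statement per row.
conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE intensity '
    '("from" TEXT, "to" TEXT, forecast INTEGER, actual INTEGER, "index" TEXT)'
)
conn.executemany("INSERT INTO intensity VALUES (?, ?, ?, ?, ?)", rows)
conn.commit()
inserted = conn.execute("SELECT COUNT(*) FROM intensity").fetchone()[0]

# File-based destination: serialize the rows to a single file,
# which would then be uploaded as a whole.
out_path = Path(tempfile.mkdtemp()) / "intensity.csv"
with out_path.open("w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(columns)
    writer.writerows(rows)

print(inserted, out_path.name)
```

In both cases the data moves as one batch, which is the property the two upload styles share; only the unit of transfer (a set of rows versus a file) differs.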
We use Rye. You can install it like so:

```console
curl -sSf https://rye-up.com/get | bash
```

Then install `viadot` with pip:

```console
pip install viadot2
```
In order to start using sources, you must configure them with the required credentials. Credentials can be specified either in the viadot config file (by default, `$HOME/.config/viadot/config.yaml`), or passed directly to each source's `credentials` parameter.
You can find specific information about each source's credentials in the documentation.
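The two ways of supplying credentials can be pictured with a small sketch. Everything in it is an assumption made for illustration: `resolve_credentials` is a hypothetical helper, the dictionary stands in for a parsed config file, the key names are invented, and the rule that explicit credentials take precedence over a config key is assumed rather than taken from viadot. Consult the documentation for the actual config schema and behavior.

```python
from typing import Optional

# Stand-in for the parsed contents of a config file. The layout and key names
# are assumptions made for this sketch, not viadot's documented schema.
PARSED_CONFIG = {
    "my_adls_creds": {"account_name": "example_account", "token": "<secret>"},
}


def resolve_credentials(
    credentials: Optional[dict] = None,
    config_key: Optional[str] = None,
    config: Optional[dict] = None,
) -> dict:
    """Hypothetical helper: use explicit credentials, else look up a config key."""
    if credentials is not None:
        return credentials  # assumed precedence: explicit credentials win
    if config_key is None:
        raise ValueError("Provide either credentials or config_key.")
    config = PARSED_CONFIG if config is None else config
    if config_key not in config:
        raise KeyError(f"No credentials found under config key {config_key!r}.")
    return config[config_key]


from_config = resolve_credentials(config_key="my_adls_creds")
direct = resolve_credentials(credentials={"account_name": "other_account"})
print(from_config["account_name"], direct["account_name"])
```

Keeping credentials in a config file and referring to them by key, as in `AzureDataLake(config_key="my_adls_creds")`, avoids repeating secrets in code.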
Check out the documentation for more information on how to use `viadot`.