In dask-deltatable, when calling dd.read_parquet, perhaps we can reuse the metadata already preserved in the Delta JSON log, instead of collecting it from the Parquet files all over again.
I think adding dataset={"schema": dt.schema().to_pyarrow()} as a keyword to this read_parquet call should do the trick, though it would be nice if someone could confirm that this is the case.
I think the Delta log also contains column stats, so maybe we can avoid gathering those as well.
Here: dask-deltatable/dask_deltatable/core.py, line 196 (commit cd731a9).
It looks like dd.read_parquet will have to go through the Parquet files to read the metadata, but the DeltaTable should have all that info already.