Output npy or hdf5 file using the processor #475
Hi, I'm using Coffea in my physics analysis. I'd like to know how to write npy or hdf5 files from a processor. As I understand it, the accumulator can only stack histograms and write histogram output. Thanks
Replies: 3 comments 1 reply
@kondratyevd has contributed a feature in #368 to optionally output a Dask dataframe from
@nsmith @kondratyevd
Hi @ico1036,

If you can convert the outputs of your processor to Pandas DataFrames, then you should be able to use the Dask executor with the argument `use_dataframes=True`. The output will be a distributed Dask dataframe.

If you want to continue working with it, or print it out as a single dataframe, you will also need to call `output.compute()` after you retrieve the outputs from `run_uproot_job`. Otherwise, you can directly save chunks of the output dataframe as Parquet files using `dd.to_parquet(df=output)`.

Please let me know if you run into any issues, I will be happy to help.
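
For the npy part of the original question, here is a minimal sketch of one way to go from the computed dataframe to a `.npy` file. It is not part of Coffea. The helper `save_as_npy`, the column names `pt` and `eta`, and the file name `events.npy` are all made up for illustration, and the small `df` below only stands in for what `output.compute()` would return.

```python
import os
import tempfile

import numpy as np
import pandas as pd


def save_as_npy(df: pd.DataFrame, path: str) -> None:
    """Write a pandas DataFrame to a .npy file as a structured array.

    Hypothetical helper for illustration, not a Coffea function.
    """
    # to_records keeps the column names as field names of the array.
    records = df.to_records(index=False)
    np.save(path, records)


# Assumption: stand-in for `output.compute()`; the columns are invented.
df = pd.DataFrame({"pt": [25.3, 41.0, 17.8], "eta": [0.5, -1.2, 2.1]})

# Write into a temporary directory so the example leaves no files behind
# in the working directory.
out_dir = tempfile.mkdtemp()
npy_path = os.path.join(out_dir, "events.npy")
save_as_npy(df, npy_path)

# Read the file back and access one column by name.
loaded = np.load(npy_path)
print(loaded.dtype.names)
print(loaded["pt"])
```

Note that this pulls the whole dataframe into memory on one machine, so it only makes sense when the result is small enough. For HDF5, pandas offers `DataFrame.to_hdf`, which needs the optional PyTables package.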