Parallelize encoding of a single row
Fields that contain data not natively supported by the Parquet format, such as numpy arrays, are serialized into byte arrays. Images may be compressed using PNG or JPEG compression. Serializing fields on a thread pool speeds up this process in some cases (e.g. when a row contains multiple images). This PR adds a pool executor argument to `dict_to_spark_row`, enabling the user to pass a pool executor that is then used to parallelize this serialization. If no pool executor is specified, the encoding/serialization is performed on the caller thread.
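The pattern described above can be sketched in plain Python with `concurrent.futures`. This is a minimal illustration, not the code from this commit: `encode_row` and its `codecs` mapping are hypothetical names, and `zlib.compress` stands in for the PNG/JPEG or numpy codecs. Each field that has a codec is submitted to the executor; when `pool_executor` is `None`, everything runs on the caller thread, mirroring the fallback described in the commit message.

```python
import zlib
from concurrent.futures import Executor, ThreadPoolExecutor
from typing import Any, Callable, Dict, Optional

# Hypothetical codec type: turns a raw field value into its serialized form.
Codec = Callable[[Any], Any]


def encode_row(row: Dict[str, Any],
               codecs: Dict[str, Codec],
               pool_executor: Optional[Executor] = None) -> Dict[str, Any]:
    """Hypothetical helper: serialize each field of `row` using its codec.

    Fields without a codec are passed through unchanged. Field order is preserved.
    """
    if pool_executor is None:
        # No executor supplied: encode sequentially on the caller thread.
        return {name: codecs[name](value) if name in codecs else value
                for name, value in row.items()}

    # Submit one task per encodable field so independent fields
    # (e.g. several images in one row) can be serialized concurrently.
    futures = {name: pool_executor.submit(codecs[name], value)
               for name, value in row.items() if name in codecs}
    # .result() blocks until each task finishes and re-raises codec errors.
    return {name: futures[name].result() if name in futures else value
            for name, value in row.items()}


if __name__ == "__main__":
    row = {"id": 7, "image_a": b"\x00" * 1000, "image_b": b"\xff" * 1000}
    codecs = {"image_a": zlib.compress, "image_b": zlib.compress}
    with ThreadPoolExecutor(max_workers=2) as pool:
        encoded = encode_row(row, codecs, pool_executor=pool)
    print({name: len(value) if isinstance(value, bytes) else value
           for name, value in encoded.items()})
```

Because of the GIL, a thread pool is likely to pay off mainly when the codecs spend their time in C code that releases the GIL (as compression libraries commonly do); for pure-Python codecs the sequential path may be just as fast.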
Yevgeni Litvin committed Apr 20, 2020
1 parent 83a02df, commit 3fe68d4
Showing 2 changed files with 73 additions and 15 deletions.