Parallelize feats extraction with opensmile #181
I have addressed your comments here and parallelized feature extraction with opensmile. I followed @wilke0818's suggestion to create a custom serializer for the opensmile.Smile object. I remember trying this some time ago without success, but this time I spent more time studying opensmile's documentation and got it working. The issue was that opensmile.Smile holds a reference to the process, which the serializer cannot handle. After removing that reference, everything seems to work fine.
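The comment does not include the serializer itself. The sketch below illustrates the general technique in plain Python under stated assumptions: `SmileLike` is a hypothetical stand-in for `opensmile.Smile`, its `_process` attribute (a `threading.Lock`, which `pickle` refuses to serialize) plays the role of the process reference, and the reducer registered with `copyreg.pickle` keeps only the configuration so the object is rebuilt by calling its constructor on the receiving side. This is not the code merged in this PR, and opensmile's real attributes may differ.

```python
import copyreg
import pickle
import threading


class SmileLike:
    """Hypothetical stand-in for an extractor such as opensmile.Smile.

    It holds plain configuration plus a handle (here a threading.Lock)
    that pickle cannot serialize, mimicking the process reference.
    """

    def __init__(self, feature_set: str, feature_level: str) -> None:
        self.feature_set = feature_set
        self.feature_level = feature_level
        # Hypothetical unpicklable runtime handle, recreated on every construction.
        self._process = threading.Lock()

    def process_signal(self, n_samples: int) -> dict:
        # Placeholder "feature": a real extractor would analyze audio here.
        with self._process:
            return {"set": self.feature_set, "n": n_samples}


def _reduce_smile_like(obj: SmileLike):
    # Serialize only the configuration; the handle is dropped here and
    # recreated when the constructor is called again during unpickling.
    return SmileLike, (obj.feature_set, obj.feature_level)


# Register the reducer without modifying the class, which is how one would
# handle a third-party class whose source you do not control.
copyreg.pickle(SmileLike, _reduce_smile_like)

extractor = SmileLike("demo_set", "demo_level")
clone = pickle.loads(pickle.dumps(extractor))
print(clone.feature_set, clone.process_signal(16000))
```

Registering the reducer through `copyreg` avoids subclassing or patching the third-party class. The reducer matters only on the pickling side; the worker process just needs to be able to import the class to call its constructor.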
@satra I followed your suggestion here to use
In case you want to try any alternative solutions or have ideas, please let me know.
thanks @fabiocat93 for these enhancements and attempts. i think the parselmouth one is good enough for now, no need to try to make it more pickleable. efficient parallelization is going to be a combined function of dataset diversity (number of samples x duration of sample), the types of features we will be extracting, and the resources needed (hardware, job scheduler, etc.). with the b2ai dataset i ran into many of these considerations (without even considering gpu options). so let's merge something like this in, and when we do the code review let's consider possible options for efficiency. also, let's get feedback as people use this.
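To make the "number of samples x duration of sample" point concrete, here is one illustrative way (not something this PR implements) to spread files across workers so each gets a similar total duration. It assumes per-file durations are known up front and that extraction cost grows roughly linearly with duration; `balance_by_duration` is a hypothetical helper using the greedy longest-processing-time heuristic.

```python
import heapq


def balance_by_duration(durations: dict[str, float], n_workers: int) -> list[list[str]]:
    """Greedy longest-processing-time split of files across workers.

    Hypothetical helper: assumes per-file durations (seconds) are known and
    that extraction cost is roughly proportional to duration.
    """
    if n_workers < 1:
        raise ValueError("n_workers must be >= 1")
    # Min-heap of (total assigned seconds, worker index).
    heap = [(0.0, i) for i in range(n_workers)]
    buckets: list[list[str]] = [[] for _ in range(n_workers)]
    # Assign the longest files first, each to the currently least-loaded worker.
    for path, dur in sorted(durations.items(), key=lambda kv: kv[1], reverse=True):
        load, idx = heapq.heappop(heap)
        buckets[idx].append(path)
        heapq.heappush(heap, (load + dur, idx))
    return buckets


files = {"a.wav": 120.0, "b.wav": 60.0, "c.wav": 50.0, "d.wav": 10.0}
print(balance_by_duration(files, 2))
```

Placing long recordings first keeps a single long file from landing on an already busy worker at the end. For feature sets whose cost is not proportional to duration, a measured per-file cost could replace duration as the weight.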
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #181      +/-   ##
==========================================
+ Coverage   60.24%   63.98%   +3.74%
==========================================
  Files         113      116       +3
  Lines        4017     4101      +84
==========================================
+ Hits         2420     2624     +204
+ Misses       1597     1477     -120
could you perhaps merge the other PR that i had (without a release) and then release it with this?
@fabiocat93 - upgrade to latest pydra release to try. and do post what the issues are with cf.
defaulting to
Done.
While testing pydra with
yes, i should have told you that (that's what i debugged over the weekend on linux). on macos
see here: btw, there was some weird behavior where it would not work if placed in cli.py under
do you think we can merge now? @satra
just updated the multiprocessing bit with a try/except
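The thread does not show what the try/except wraps. Assuming it guards the choice of multiprocessing start method (the macOS discussion above hints at this; since Python 3.8, macOS defaults to `spawn`), a minimal sketch could look like the following. `configure_start_method` is a hypothetical name; `multiprocessing.set_start_method` raises `RuntimeError` once the context has already been fixed, which is the case the `except` tolerates.

```python
import multiprocessing


def configure_start_method(preferred: str = "spawn") -> str:
    """Hypothetical helper: set the start method once, tolerating repeat calls.

    set_start_method raises RuntimeError if the context was already fixed
    (by an earlier call, or by code that queried or used the default), so the
    error is caught and whichever method is already active is kept.
    """
    try:
        multiprocessing.set_start_method(preferred)
    except RuntimeError:
        pass  # already configured elsewhere; keep the existing method
    return multiprocessing.get_start_method()


if __name__ == "__main__":
    print(configure_start_method())
    print(configure_start_method())  # a second call does not raise
```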
feel free to merge/release after tests pass.
thank you @fabiocat93
This PR introduces parallelization for feature extraction using opensmile and praat_parselmouth to improve performance on large datasets. Key changes include: improving the Pydra workflow for parselmouth audio processing (maybe by making parselmouth.Sound objects picklable).