Is it possible to run multiple files using FileAudioSource and get one final inference result? #253
-
Hi @esphoenixc, If you want to increase the duration of audio sent to the model at once, you should take a look at Concerning the splitting of files into smaller files to process sequentially. If you keep using the same May I ask why you want to split a file into multiple ones? The processing time will not be impacted by this splitting when you process in streaming. The only advantage I see is that you'd be able to "pause" and "resume" the pipeline.
-
I attempted to increase the duration (audio chunk size) for processing, but when I set it too high, errors like NaN values occur, and I never receive any diarization results.
audio_source = FileAudioSource(file="long_audio.wav", sample_rate=16000, block_duration=60.0)
self.block_size = int(np.rint(block_duration * self.sample_rate))
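To make the effect of `block_duration` concrete, here is the arithmetic behind that line as a small standalone sketch. The helper name `block_size` is hypothetical and exists only for illustration; only the formula itself is taken from the line quoted above.

```python
import numpy as np


def block_size(block_duration: float, sample_rate: int) -> int:
    """Number of samples in one block (same formula as the quoted line)."""
    return int(np.rint(block_duration * sample_rate))


sample_rate = 16000

# The setting from the snippet above: one block holds a full minute of audio.
print(block_size(60.0, sample_rate))  # 960000

# A 1-second block, for comparison.
print(block_size(1.0, sample_rate))  # 16000
```

So with `block_duration=60.0`, each block passed downstream is 60 times larger than a 1-second block.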
My idea is to maintain the duration (audio chunk size) that the model was designed and trained for, for example 1 second. Instead of processing a single long audio file, I plan to split it into multiple smaller files. For example, a 10-minute audio file would be divided into five 2-minute chunks. Then, I would run FileAudioSource on each of these smaller files and combine the results into a final inference for the entire 10-minute audio.
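A minimal sketch of the split-and-combine idea, under stated assumptions: it does not call the diarization pipeline at all, and the function names `split_waveform` and `merge_segments` as well as the `(start, end, speaker)` tuple format are hypothetical. It only shows how chunk offsets could be tracked and applied when combining per-chunk results. One caveat that is visible in the code: speaker labels produced by independent runs are not guaranteed to refer to the same person, so the merge keeps them distinct per chunk rather than pretending they match.

```python
import numpy as np


def split_waveform(waveform, sample_rate, chunk_seconds):
    """Yield (offset_seconds, chunk) pairs that cover the waveform in order."""
    chunk_len = int(round(chunk_seconds * sample_rate))
    for start in range(0, len(waveform), chunk_len):
        yield start / sample_rate, waveform[start:start + chunk_len]


def merge_segments(per_chunk_segments):
    """Shift each chunk's segments by the chunk offset and sort them.

    Labels are prefixed with the chunk index because independent runs
    may assign the same label to different speakers.
    """
    merged = []
    for index, (offset, segments) in enumerate(per_chunk_segments):
        for start, end, speaker in segments:
            merged.append((start + offset, end + offset, f"chunk{index}_{speaker}"))
    return sorted(merged)


sample_rate = 16000
# Placeholder for a 10-minute recording (silence, for illustration only).
waveform = np.zeros(10 * 60 * sample_rate, dtype=np.float32)

chunks = list(split_waveform(waveform, sample_rate, chunk_seconds=120.0))
offsets = [offset for offset, _ in chunks]

# Made-up per-chunk results standing in for real diarization output.
fake_results = [(offset, [(0.0, 1.5, "speaker0")]) for offset in offsets]
timeline = merge_segments(fake_results)
```

Linking `chunk0_speaker0` to `chunk1_speaker0` (or to a different label) would need an extra step that is not shown here.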
Is this approach feasible? If so, how might it impact the accuracy and processing speed compared to using FileAudioSource on the entire audio file at once? Specifically:
Accuracy: Will splitting the audio into smaller chunks affect the diarization performance? Could it potentially improve or degrade the results?
Performance: How will processing multiple smaller files compare in terms of speed and resource usage versus processing one large file?
Any insights or recommendations based on similar experiences would be greatly appreciated!