How to interpret pyannote results #1454
Closed
PaulSZH95 started this conversation in Development
https://herve.niderb.fr/posts/2022-10-23-One-speaker-segmentation-model-to-rule-them-all.html
Hi, I am using pyannote for voice activity detection. My code is identical to the voice_activity_detection notebook provided in the tutorials.
My question:
I observe that the Inference class returns one probability for roughly every 17 ms of audio.
However, inference.py in the pyannote repo sets the default duration of each chunk to 2 seconds, so self.step, computed as 0.1 * duration, is 0.2 seconds.
How does a sliding window of length 2 s with a 0.2 s step produce a prediction for every 17 ms frame of the audio?
I have taken a look at the inference.py script and couldn't quite figure out the missing link.
Many thanks for any help.
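For anyone landing here with the same question, here is a rough sketch of what the missing link probably is. The 2 s duration and 0.2 s step only describe how chunks are cut from the audio. The model itself returns many scores per chunk (about 2 s / 17 ms ≈ 118 frames), and scores from overlapping chunks are then combined per frame. Everything below is an illustration under assumptions, not the code in inference.py: `fake_model` is a hypothetical stand-in for the segmentation model, the constants are the numbers quoted in the question, and plain averaging is used for simplicity.

```python
import numpy as np

# Assumed values, taken from the question rather than from the library source.
CHUNK_DURATION = 2.0   # seconds per sliding-window chunk
CHUNK_STEP = 0.2       # seconds between chunk starts (0.1 * duration)
FRAME_STEP = 0.017     # approximate seconds per output frame


def fake_model(chunk_index: int, frames_per_chunk: int) -> np.ndarray:
    """Hypothetical stand-in for the model: one score in [0, 1) per frame."""
    rng = np.random.default_rng(chunk_index)
    return rng.random(frames_per_chunk)


def aggregate(num_chunks: int) -> np.ndarray:
    """Average the frame-level scores of overlapping chunks (overlap-add)."""
    frames_per_chunk = round(CHUNK_DURATION / FRAME_STEP)  # frames in one chunk
    frames_per_step = round(CHUNK_STEP / FRAME_STEP)       # chunk hop, in frames
    total_frames = (num_chunks - 1) * frames_per_step + frames_per_chunk

    score_sum = np.zeros(total_frames)
    overlap_count = np.zeros(total_frames)
    for i in range(num_chunks):
        start = i * frames_per_step
        stop = start + frames_per_chunk
        score_sum[start:stop] += fake_model(i, frames_per_chunk)
        overlap_count[start:stop] += 1  # how many chunks cover each frame

    # Every frame is covered by at least one chunk, so no division by zero.
    return score_sum / overlap_count


scores = aggregate(num_chunks=20)
print(scores.shape)
```

So the output resolution comes from the model's frame rate, while the chunk step only controls how many chunks overlap on a given frame. The linked blog post describes the segmentation model itself.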