-
There is no easy way to feed your VAD into the existing diarization pipeline, as the latter does not explicitly rely on a VAD step. You'd have to design your own speaker diarization pipeline: you can use PretrainedSpeakerEmbedding to extract embeddings and then any clustering algorithm, from scikit-learn for instance.
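For illustration, here is a minimal sketch of that recipe. The file name, the VAD segment times, the speaker count of two, and the speechbrain/spkrec-ecapa-voxceleb checkpoint are all placeholder assumptions to adapt:

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

from pyannote.audio import Audio
from pyannote.audio.pipelines.speaker_verification import PretrainedSpeakerEmbedding
from pyannote.core import Annotation, Segment

# Speech regions from your own VAD, as (start, end) in seconds -- placeholder values.
vad_segments = [Segment(0.5, 3.2), Segment(4.0, 7.8), Segment(9.1, 12.0)]

# Any embedding checkpoint supported by PretrainedSpeakerEmbedding works here.
embedding_model = PretrainedSpeakerEmbedding("speechbrain/spkrec-ecapa-voxceleb")
audio = Audio(sample_rate=16000, mono="downmix")

# Extract one speaker embedding per speech segment.
embeddings = []
for segment in vad_segments:
    waveform, sample_rate = audio.crop("file.wav", segment)
    # The model expects a (batch, channel, sample) tensor and returns (batch, dimension).
    embeddings.append(embedding_model(waveform[None]))
embeddings = np.vstack(embeddings)

# Cluster embeddings into speakers; use distance_threshold instead of
# n_clusters if the number of speakers is not known in advance.
labels = AgglomerativeClustering(n_clusters=2).fit_predict(embeddings)

# Wrap the result in an Annotation, the usual diarization output format.
diarization = Annotation(uri="file")
for segment, label in zip(vad_segments, labels):
    diarization[segment] = f"SPEAKER_{label:02d}"

print(diarization)
```

Agglomerative clustering is just one option; any scikit-learn clustering algorithm that accepts a (num_segments, dimension) array would work in its place.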
-
I have already performed the VAD step using a different model. If required, I can create a pyannote.core.Annotation object from my VAD data. I believe this data more or less represents the output of the segmentation stage. Is my understanding correct?
Now the question is: how can I feed this object into the embedding and clustering steps to perform diarization? I am currently using the develop branch.
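For reference, a minimal sketch of building such an Annotation from external VAD output, assuming it comes as a list of (start, end) times in seconds (placeholder values):

```python
from pyannote.core import Annotation, Segment

# (start, end) pairs in seconds from the external VAD model -- placeholder values.
vad_output = [(0.5, 3.2), (4.0, 7.8)]

vad = Annotation(uri="file")
for start, end in vad_output:
    vad[Segment(start, end)] = "SPEECH"

# Speech regions can then be iterated for embedding extraction.
for segment in vad.itersegments():
    print(segment)
```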