Pyannote.audio toolkit with MFCC #1680

sumansamui · 2024-04-03T05:41:09Z

I have a few questions:

How can pyannote.audio be set up to use MFCC features, i.e., to run the speech segmentation model on MFCC input? Is there a pre-trained model available for that setting, or do we have to train from scratch?
What is the impact of the sampling frequency on SincNet? I know all input audio is downsampled or upsampled to 16 kHz.

We observed that pyannote gives the same result for the 8 kHz and 16 kHz versions of a WAV file with the SincNet architecture. Is that because the number of sinc filters in the low-frequency range is the same for both 8 kHz and 16 kHz?

hbredin · 2024-04-03T07:01:17Z

Maintainer

There is no pretrained model available that relies on MFCC. You would have to train one from scratch.
I have only ever trained 16 kHz models, so I do not really have any intuition here. I guess that if you want to train 8 kHz models, you would have to slightly change the kernel size and stride of the first convolutional layer of SincNet.
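
To make the kernel size and stride suggestion concrete, here is a minimal sketch of one way to rescale them so that the first layer covers roughly the same duration at 8 kHz as it does at 16 kHz. The helper `scale_conv_params` is hypothetical and not part of pyannote.audio, and the values 251 (kernel size) and 10 (stride) are assumed here as the 16 kHz settings; they are not confirmed in this thread, so check the SincNet source before relying on them.

```python
# Hypothetical helper, not part of pyannote.audio: rescale the kernel size
# and stride of a 1-D convolution so that they span roughly the same
# duration (in seconds) at a different sample rate.

def scale_conv_params(kernel_size, stride, src_rate, dst_rate):
    """Return (kernel_size, stride) rescaled from src_rate to dst_rate."""
    # Integer arithmetic avoids floating-point rounding surprises.
    new_kernel = max(1, kernel_size * dst_rate // src_rate)
    new_stride = max(1, stride * dst_rate // src_rate)
    # An odd length keeps a symmetric filter centered on a single sample.
    if new_kernel % 2 == 0:
        new_kernel += 1
    return new_kernel, new_stride


# Assumed 16 kHz values for the first SincNet layer (to be verified).
KERNEL_16K, STRIDE_16K = 251, 10

kernel_8k, stride_8k = scale_conv_params(KERNEL_16K, STRIDE_16K, 16000, 8000)

# Compare the time span covered at each sample rate, in milliseconds.
print(kernel_8k, stride_8k)
print(1000 * KERNEL_16K / 16000, 1000 * kernel_8k / 8000)
print(1000 * STRIDE_16K / 16000, 1000 * stride_8k / 8000)
```

Under these assumptions the 8 kHz layer would use a kernel of 125 samples and a stride of 5, which keeps the hop at 0.625 ms and the window close to 15.7 ms. This only preserves durations; a model changed this way would still need to be trained from scratch, as noted above.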

sumansamui Apr 4, 2024
Author

Thank you, Herve, for the clarification.
