This is an implementation of voice activity detection based on Google's WebRTC.
A VAD (<https://en.wikipedia.org/wiki/Voice_activity_detection>) classifies a piece of audio data as being voiced or unvoiced. It can be useful for telephony and speech recognition. The VAD that Google developed for the WebRTC project (<https://webrtc.org/>) is reportedly one of the best available, being fast, modern and free.
Install all required packages with `pip install -r requirements.txt`.
Check that C++ library support is available on your PC.
Put your raw audio file into the webrtcvad folder, then run the VAD with `python vad.py <aggressiveness level (0/1/2/3)> <frame length (10/20/30)> <padding duration (100-5000 recommended, depending on your purpose)> <filepath>`.
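For reference, the frame-level classification that `vad.py` builds on is provided by the py-webrtcvad package. The sketch below shows the core API call on 30 ms frames of 16-bit mono PCM; the file name and frame handling are illustrative assumptions, not the repo's actual code.

```python
import wave

import webrtcvad

# Minimal sketch of frame-level classification with py-webrtcvad.
# Assumes a 16-bit *mono* WAV at 8000/16000/32000 Hz; the file name
# below is hypothetical. vad.py adds chunking and padding on top.
vad = webrtcvad.Vad(0)  # aggressiveness mode 0 (least aggressive) to 3

with wave.open("example_mono_16000.wav", "rb") as wf:
    sample_rate = wf.getframerate()
    pcm = wf.readframes(wf.getnframes())

frame_ms = 30  # 10, 20, or 30 ms frames are accepted
frame_bytes = int(sample_rate * frame_ms / 1000) * 2  # 2 bytes per sample

for offset in range(0, len(pcm) - frame_bytes + 1, frame_bytes):
    frame = pcm[offset:offset + frame_bytes]
    print("voiced" if vad.is_speech(frame, sample_rate) else "unvoiced")
```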
- Multichannel audio is accepted.
- Sample rates of 8000 Hz, 16000 Hz, and 32000 Hz are accepted. If your sample rate is not among these three, resample your file with `resample.py` (see the resampling sketch after this list).
- `.pcm` and `.wav` files are accepted. Convert any other format before running the VAD.
- Choose your aggressiveness level, frame duration, and padding duration carefully; they strongly affect the final result.
- One second of silence is attached at the beginning and the end of each chunk file. For a different length, adjust the parameter of the `padding` function in `vad.py` (see the padding sketch after this list).
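For reference, resampling to one of the accepted rates can be done along these lines. This is a sketch using scipy; the actual `resample.py` in this repo may be implemented differently, and the file names are placeholders.

```python
from math import gcd

import numpy as np
from scipy.io import wavfile
from scipy.signal import resample_poly

# Sketch: resample a WAV file to 16000 Hz so the VAD accepts it.
# resample.py in this repo may differ; file names are placeholders.
target_rate = 16000
src_rate, data = wavfile.read("input_44100.wav")

g = gcd(target_rate, src_rate)
resampled = resample_poly(data, target_rate // g, src_rate // g, axis=0)

wavfile.write("input_16000.wav", target_rate,
              np.round(resampled).astype(np.int16))
```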
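The one-second silence attached to each chunk could be produced along the lines of the sketch below; this is only illustrative, and the actual `padding` code in `vad.py` may differ.

```python
def pad_with_silence(chunk: bytes, sample_rate: int, channels: int = 1,
                     pad_seconds: float = 1.0) -> bytes:
    """Prepend and append `pad_seconds` of silence to 16-bit PCM bytes.

    Sketch only: the real padding logic in vad.py may differ. Change
    pad_seconds to get a different silence length.
    """
    silence = b"\x00" * (int(sample_rate * pad_seconds) * 2 * channels)
    return silence + chunk + silence
```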
`king_32000.wav` is a 2-channel, 32000 Hz sample rate audio file. Run the VAD with `python vad.py 0 30 500 king_32000.wav`. Ten parts are separated out from the original file. For noise parts mistakenly recognized as "voiced", we can either throw them away manually or filter them out with additional code (see the sketch below).
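One simple way to filter out mistakenly voiced noise chunks is an RMS energy threshold, as in the sketch below; the chunk file pattern and threshold value are assumptions to be adapted to your own output.

```python
import glob
import os
import wave

import numpy as np

# Sketch: delete chunk files whose RMS energy is below a threshold.
# The glob pattern and threshold value are illustrative assumptions.
RMS_THRESHOLD = 500.0  # tune this for your recordings

for path in sorted(glob.glob("chunk-*.wav")):
    with wave.open(path, "rb") as wf:
        samples = np.frombuffer(wf.readframes(wf.getnframes()), dtype=np.int16)
    rms = float(np.sqrt(np.mean(samples.astype(np.float64) ** 2)))
    if rms < RMS_THRESHOLD:
        os.remove(path)  # or move it to a "rejected" folder instead
        print(f"removed low-energy chunk: {path} (rms={rms:.1f})")
```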