The overall pipeline for training a speaker representation network has two main components, which are referred to in this repo as a generator and a classifier. The generator is the network which actually produces the embedding:
input (acoustic features) -> generator (e.g. ResNet) -> embedding
Acting on this embedding is the classifier:
embedding -> classifier (e.g. a linear layer projecting to num_classes, followed by softmax) -> class prediction
In a classic scenario, this classifier is a feed-forward network that projects to the number of classes and is trained with cross-entropy loss. This repo also includes alternative options, such as angular penalty losses.
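To make the generator/classifier split concrete, here is a minimal pure-Python sketch (no PyTorch) of two possible classifier heads acting on an embedding. The first is a plain linear layer with softmax cross entropy. The second is additive-margin softmax (AM-Softmax, also known as CosFace), which is one well-known angular penalty loss. This README does not say which angular penalty variants the repo implements or what their defaults are, so the scale `s=30` and margin `m=0.2` are illustrative assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def cross_entropy(logits, target):
    """Softmax cross entropy for one example, computed stably with log-sum-exp."""
    peak = max(logits)
    log_sum_exp = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum_exp - logits[target]


def softmax_loss(embedding, class_weights, target):
    """Classic head: linear projection (no bias) to num_classes, then cross entropy."""
    logits = [sum(a * b for a, b in zip(embedding, w)) for w in class_weights]
    return cross_entropy(logits, target)


def am_softmax_loss(embedding, class_weights, target, s=30.0, m=0.2):
    """Additive-margin softmax: logit_j = s * (cos(theta_j) - m * [j == target])."""
    cosines = [cosine(embedding, w) for w in class_weights]
    logits = [s * (c - m) if j == target else s * c for j, c in enumerate(cosines)]
    return cross_entropy(logits, target)


if __name__ == "__main__":
    emb = [1.0, 0.0]                  # toy embedding from the "generator"
    weights = [[1.0, 0.0], [0.0, 1.0]]  # one weight vector per class (2 speakers)
    print(softmax_loss(emb, weights, 0), am_softmax_loss(emb, weights, 0))
```

The margin is subtracted from the target-class cosine before scaling. As a result, the loss stays higher until the embedding is clearly closer to its own class weight than to the others, which pulls same-speaker embeddings together on the hypersphere.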
To install the Simple-tklia toolkit, follow these steps:
- We recommend using a conda virtual environment: https://anaconda.org/anaconda/conda
- Install PyTorch (http://pytorch.org/).
- Clone the Simple-tklia repository:
git clone https://github.com/Chaanks/stklia
- Go into the project folder and install the required packages with:
pip install -r requirement.txt
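After installation, a quick sanity check can confirm that the main dependencies are importable from the active environment. This is a minimal sketch. The package names `torch` (PyTorch) and `tensorboard` are taken from this README, and the full dependency list lives in requirement.txt. `missing_packages` is a hypothetical helper, not part of the toolkit.

```python
import importlib
import importlib.util


def missing_packages(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(["torch", "tensorboard"])
    print("missing packages:", ", ".join(missing) or "none")
    if "torch" not in missing:
        torch = importlib.import_module("torch")
        # True only if PyTorch was installed with CUDA support and a GPU is visible.
        print("CUDA available:", torch.cuda.is_available())
```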
Example scripts can be found in exemples/. These examples are ready to be run on LIA's servers. We recommend trying them out before going further 😉.
To use this toolkit, please prepare your data with Kaldi. When passing a Kaldi dataset folder to the toolkit, make sure that this folder contains the files feats.scp, utt2spk and spk2utt.
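Kaldi ships `utils/validate_data_dir.sh` for a thorough check of a data folder. If you only want a lightweight pre-flight check from Python, the sketch below verifies that the three required files exist and that utt2spk, spk2utt and feats.scp agree with each other. It assumes the standard Kaldi layouts (`<utt> <spk>` in utt2spk, `<spk> <utt1> <utt2> ...` in spk2utt, `<utt> <rxfilename>` in feats.scp). `check_kaldi_dir` is a hypothetical helper, not part of this toolkit.

```python
from pathlib import Path

REQUIRED_FILES = ("feats.scp", "utt2spk", "spk2utt")


def check_kaldi_dir(data_dir):
    """Return a list of problems found in a Kaldi data folder (empty list means OK)."""
    data_dir = Path(data_dir)
    problems = [f"missing {name}" for name in REQUIRED_FILES
                if not (data_dir / name).is_file()]
    if problems:
        return problems

    # utt2spk: one "<utt> <spk>" pair per line.
    utt2spk = {}
    for line in (data_dir / "utt2spk").read_text().splitlines():
        if line.strip():
            utt, spk = line.split()
            utt2spk[utt] = spk

    # spk2utt: "<spk> <utt1> <utt2> ..." per line; it must agree with utt2spk.
    seen = set()
    for line in (data_dir / "spk2utt").read_text().splitlines():
        if not line.strip():
            continue
        spk, *utts = line.split()
        for utt in utts:
            seen.add(utt)
            if utt2spk.get(utt) != spk:
                problems.append(f"{utt}: spk2utt says {spk}, utt2spk says {utt2spk.get(utt)}")
    problems += [f"{utt} missing from spk2utt" for utt in utt2spk if utt not in seen]

    # Every labelled utterance should have features in feats.scp.
    feats = {line.split()[0]
             for line in (data_dir / "feats.scp").read_text().splitlines() if line.strip()}
    problems += [f"{utt} has no entry in feats.scp" for utt in utt2spk if utt not in feats]
    return problems
```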
Tutorials on how to prepare some popular datasets can be found here.
The training and testing of a model is handled by the script run.py:
python run.py [-h] -m {train,test} --cfg CFG [--checkpoint CHECKPOINT]
To train a model, simply specify the train mode and a configuration file to run.py.
Example:
python run.py --mode train --cfg config/example_speaker.cfg
To resume an experiment from an existing checkpoint, add the --checkpoint argument with the iteration to resume from.
Example:
python run.py --mode train --cfg config/example_speaker.cfg --checkpoint 1000
To test a model, simply specify the test mode and a configuration file to run.py.
Example:
python run.py --mode test --cfg config/example_speaker.cfg
A checkpoint can also be specified when testing. If no checkpoint is specified, the last iteration is used. Example:
python run.py --mode test --cfg config/example_speaker.cfg --checkpoint 1250
Training and validation metrics are saved as TensorBoard logs in the folder stklia/runs/. To visualize them, use the command:
tensorboard --logdir runs/
If launching TensorBoard from a remote server, use the --port <your_port> and --bind_all options:
tensorboard --logdir runs/ --port <your_port> --bind_all
Note: make sure you are in the conda virtual environment, or otherwise have tensorboard installed.
To extract the x-vectors of a dataset, use the extract.py script with the following command:
python extract.py [-h] -m MODELDIR [--checkpoint CHECKPOINT] -d DATA [-f {ark,txt,pt}]
--modeldir should be the path to the trained model you want to extract the x-vectors with. This folder should contain at least the checkpoints folder and the experiment_settings.cfg file. The experiment_settings.cfg file (or experiment_settings_resume.cfg, if present) is used to create the generator.
--checkpoint is an optional parameter that specifies the checkpoint to use for the extraction. If it is not specified, the last checkpoint is used.
--data can be used in two ways: you can specify a Kaldi folder, in which case that folder's data is extracted, or you can simply specify test, eval or train, in which case the corresponding dataset of experiment_settings.cfg is extracted.
--format specifies the output format of the x-vectors: kaldi (ark), text (txt) or PyTorch (pt). The default is kaldi.
An example .cfg file for speaker training is provided below and in configs/example_speaker.cfg:
These are the locations of the datasets. The train field is mandatory for train mode. The test and test_trial fields are mandatory for test mode. The eval and eval_trial fields are optional; if they are not specified, no evaluation is done during training.
It is possible to specify multiple folders for a field. If so, the folders are merged into one dataset class containing all the data.
Make sure to specify the number of features of your data with features_per_frame.
[Datasets]
train = path/to/kaldi/train/data/
eval = path/to/kaldi/eval/data/
eval_trial = path/to/trials/file1
test = path/to/kaldi/test/data/
    path/to/kaldi/enroll/data
test_trial = path/to/trials/file1
    path/to/trials/file2
    path/to/trials/file3
features_per_frame = 61
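If the .cfg file is read with Python's standard configparser (an assumption, since the toolkit's loader is not shown here), each extra folder in a multi-folder field must sit on an indented continuation line. configparser joins those lines with newlines, so a field can be turned into a list of paths as in this sketch. `read_datasets` is a hypothetical helper.

```python
import configparser

EXAMPLE = """
[Datasets]
train = path/to/kaldi/train/data/
test = path/to/kaldi/test/data/
    path/to/kaldi/enroll/data
test_trial = path/to/trials/file1
    path/to/trials/file2
features_per_frame = 61
"""


def read_datasets(text):
    """Parse the [Datasets] section; multi-line values become lists of paths."""
    # inline_comment_prefixes lets values such as "2000 # comment" parse cleanly.
    cfg = configparser.ConfigParser(inline_comment_prefixes=("#",))
    cfg.read_string(text)
    section = cfg["Datasets"]
    datasets = {}
    for key in ("train", "eval", "eval_trial", "test", "test_trial"):
        if key in section:  # eval and eval_trial are optional
            datasets[key] = [p.strip() for p in section[key].splitlines() if p.strip()]
    datasets["features_per_frame"] = section.getint("features_per_frame")
    return datasets
```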
The format of trials is as follows:
1 <utterance_a> <utterance_b>
0 <utterance_a> <utterance_c>
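Once x-vectors have been extracted, a trials file in this format can be scored outside the toolkit. The sketch below assumes the usual convention that label 1 marks a same-speaker (target) trial and 0 a different-speaker trial. It also assumes that the x-vectors have already been loaded into a plain dict mapping utterance IDs to lists of floats. It uses cosine scoring and a simple threshold sweep for the equal error rate (EER), which is not necessarily how run.py scores trials.

```python
import math


def read_trials(lines):
    """Parse '<label> <utt_a> <utt_b>' lines (1 = target, 0 = non-target, assumed)."""
    trials = []
    for line in lines:
        if line.strip():
            label, utt_a, utt_b = line.split()
            trials.append((int(label), utt_a, utt_b))
    return trials


def cosine_score(u, v):
    """Cosine similarity between two non-zero embeddings."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def equal_error_rate(scores, labels):
    """Sweep thresholds over observed scores; return (FAR + FRR) / 2 where they are closest."""
    n_target = sum(labels)
    n_nontarget = len(labels) - n_target
    best_gap, eer = float("inf"), None
    for threshold in sorted(set(scores)):
        # A trial is accepted when its score >= threshold.
        far = sum(1 for s, l in zip(scores, labels) if l == 0 and s >= threshold) / n_nontarget
        frr = sum(1 for s, l in zip(scores, labels) if l == 1 and s < threshold) / n_target
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer


if __name__ == "__main__":
    xvectors = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0], "d": [0.1, 0.9]}
    trials = read_trials(["1 a b", "0 a c", "1 c d", "0 b d"])
    scores = [cosine_score(xvectors[x], xvectors[y]) for _, x, y in trials]
    print("EER:", equal_error_rate(scores, [label for label, _, _ in trials]))
```

The sweep is O(n²) and only meant for illustration. For large trial lists, sort the scores once and accumulate the counts.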
Most of these configurable hyper-parameters are fairly self-explanatory.
[Hyperparams]
lr = 0.2
batch_size = 128
max_seq_len = 400
no_cuda = False
seed = 1234
num_iterations = 2000 # total num batches to train
momentum = 0.5
scheduler_steps = [1000, 1500, 1750]
scheduler_lambda = 0.5 # multiplies lr by this value at each step above
multi_gpu = False # dataparallel
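One way to read scheduler_steps and scheduler_lambda is as a step decay applied per iteration. This is an assumption about how the toolkit applies them. It behaves like PyTorch's `MultiStepLR` with `milestones=scheduler_steps` and `gamma=scheduler_lambda`, stepped once per batch. With the example values, the learning rate is 0.2 up to iteration 999, 0.1 from iteration 1000, 0.05 from 1500 and 0.025 from 1750. `learning_rate` is a hypothetical helper.

```python
from bisect import bisect_right


def learning_rate(iteration, lr=0.2, scheduler_steps=(1000, 1500, 1750), scheduler_lambda=0.5):
    """lr multiplied by scheduler_lambda once for every step already reached."""
    steps_passed = bisect_right(sorted(scheduler_steps), iteration)
    return lr * scheduler_lambda ** steps_passed
```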
This section is used to specify the model size, the embedding size and the pooling mode. Pooling can be min, max, mean, std or statistical.
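Pooling turns a variable-length sequence of frame-level vectors into one fixed-size vector before the embedding layer. The sketch below shows what each mode computes per feature dimension over time. In x-vector systems, statistical pooling is commonly the concatenation of the mean and the standard deviation, which doubles the output size. The toolkit's exact definitions are not given in this README, so details such as biased versus unbiased standard deviation are assumptions. `pool` is a hypothetical helper.

```python
import math


def pool(frames, mode="statistical"):
    """Pool a (num_frames x feat_dim) list of frame vectors into one vector."""
    n = len(frames)
    columns = list(zip(*frames))  # one tuple per feature dimension, across time
    mean = [sum(c) / n for c in columns]
    if mode == "mean":
        return mean
    if mode == "max":
        return [max(c) for c in columns]
    if mode == "min":
        return [min(c) for c in columns]
    # Population (biased) standard deviation: an assumption for this sketch.
    std = [math.sqrt(sum((x - m) ** 2 for x in c) / n) for c, m in zip(columns, mean)]
    if mode == "std":
        return std
    if mode == "statistical":
        return mean + std  # list concatenation: output is 2 * feat_dim long
    raise ValueError(f"unknown pooling mode: {mode}")
```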
[Model]
emb_size = 256
layers = [3, 4, 6, 3]
num_filters = [32, 64, 128, 256]
zero_init_residual = True
pooling = statistical
The model_dir is the folder in which models are stored. Every checkpoint_interval iterations, both the generator and the classifier are saved as .pt models in <model_dir>/<checkpoints_dir>/, named g_<iterations>.pt and c_<iterations>.pt respectively. This matters when resuming from a previous checkpoint, as described above: for example, to resume from the 1000th iteration, both g_1000.pt and c_1000.pt must exist in checkpoints_dir.
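Given this naming scheme, a checkpoint is only usable if both its generator and classifier files are present. The sketch below lists such complete iterations and picks the latest one when no checkpoint is requested. It assumes that this is what "the last iteration" means, since the toolkit's own lookup code is not shown here. Both helpers are hypothetical.

```python
import re
from pathlib import Path


def available_checkpoints(checkpoints_dir):
    """Sorted iterations for which both g_<it>.pt and c_<it>.pt exist."""
    found = {"g": set(), "c": set()}
    for path in Path(checkpoints_dir).glob("*.pt"):
        match = re.fullmatch(r"([gc])_(\d+)\.pt", path.name)
        if match:
            found[match.group(1)].add(int(match.group(2)))
    return sorted(found["g"] & found["c"])


def resolve_checkpoint(checkpoints_dir, checkpoint=None):
    """Return the requested iteration, or the latest complete one if checkpoint is None."""
    iterations = available_checkpoints(checkpoints_dir)
    if checkpoint is None:
        if not iterations:
            raise FileNotFoundError(f"no g_/c_ checkpoint pairs in {checkpoints_dir}")
        return iterations[-1]
    if checkpoint not in iterations:
        raise FileNotFoundError(f"g_{checkpoint}.pt and c_{checkpoint}.pt must both exist")
    return checkpoint
```

For example, with model_dir = exp/example_exp_speaker and checkpoints_dir = checkpoints, you would call `resolve_checkpoint("exp/example_exp_speaker/checkpoints")`.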
[Outputs]
model_dir = exp/example_exp_speaker # place where models are stored
checkpoint_interval = 10 # Interval to save models and also evaluate
checkpoints_dir = checkpoints # checkpoints will be stored in <model_dir>/<checkpoints_dir>/
log_interval = 1
Use a batch size of 128, since a batch size that is too large can lead to CUDA out-of-memory errors.
#!/bin/bash
#SBATCH --job-name="👾"
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --partition=gpu
#SBATCH --gpus-per-node=2
#SBATCH --mem=16G
#SBATCH --time=120:00:00
source /etc/profile.d/conda.sh
source /etc/profile.d/cuda.sh
conda activate <venv name>
python run.py ...
https://github.com/cvqluu/dropclass_speaker
https://github.com/4uiiurz1/pytorch-adacos
https://github.com/ZhaoJ9014/face.evoLVe.PyTorch