BMVC'23 Learning Separable Hidden Unit Contributions for Speaker-Adaptive Lip-Reading
This is the official implementation of the BMVC 2023 paper "Learning Separable Hidden Unit Contributions for Speaker-Adaptive Lip-Reading", a speaker-adaptive lip-reading method.
.
├── config.py # Configuration file for training parameters and data paths
├── cvtransforms.py # Computer vision transformations for data augmentation
├── dataloader # Data loading utilities
│ ├── dataset_op.py # Dataset used when speaker ID information is needed
│ ├── dataset_pl.py # Dataset implemented with PyTorch Lightning
├── img
│ ├── overview.jpg
│ └── overview.pdf
├── label_sorted.txt # Sorted word labels of LRW
├── models # Model architectures
│ ├── model_enhance.py # Feature enhancement model
│ ├── model_ensemble.py # Ensemble model
│ ├── model_r2plus1d.py # Baseline lip-reading model
│ ├── model_SD.py # Speaker verification module
├── README.md # Project documentation
├── requirements.txt # Project dependencies
├── scripts # Data preparation scripts
│ └── prepare_lrw.py # Script for preparing the LRW dataset
├── train_baseline.py # Training script for the baseline model
├── train_enhance.py # Training script for the feature enhancement model
├── train_ensemble.py # Training script for the ensemble model
└── train_SD.py # Training script for the speaker verification module
- Download the LRW dataset.
- Run scripts/prepare_lrw.py to generate the training samples of LRW:
python scripts/prepare_lrw.py
The mouth videos, labels, and word boundary information will be saved in .pkl format. We pack each image sequence as JPEG into the .pkl files and decode it with PyTurboJPEG. Please remember to set path in config.py to the location where you have downloaded and pre-processed the dataset.
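For reference, here is a minimal sketch of how such a .pkl sample might be read back with PyTurboJPEG; the file name and the dictionary keys ('video', 'label') are illustrative assumptions, not the exact field names produced by prepare_lrw.py:

```python
import pickle
import numpy as np
from turbojpeg import TurboJPEG  # PyTurboJPEG

jpeg = TurboJPEG()

# Load one pre-processed sample (path and key names are assumptions for illustration).
with open('ABOUT_00001.pkl', 'rb') as f:
    sample = pickle.load(f)

# Each frame is stored as a JPEG-encoded byte string; decode the sequence back to arrays.
frames = [jpeg.decode(buf) for buf in sample['video']]
video = np.stack(frames)  # (T, H, W, C)
print(video.shape, sample.get('label'))
```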
LRW-ID is a re-partitioning of the LRW data. You need to download the corresponding splits from LRW-ID. Please remember to set split_path in config.py to the location where you have downloaded them.
Set up environment
pip install -r requirements.txt --extra-index-url https://download.pytorch.org/whl/cu113
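After installation, a quick sanity check that the CUDA-enabled PyTorch build was picked up (not part of the original scripts):
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"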
We propose a 3-step training strategy for our speaker-adaptive model, focusing on a separate task at each step. Please refer to the 'Training Details' subsection in the 'More Detailed Experiments' section of the supplementary materials.
Firstly, we train the speaker verification module (the left branch in the overview figure) with $L^{ID}_{triple}$ and the lip-reading module (the right branch) with $L^{VSR}_{CE}$ separately.
Then, we introduce the feature enhancement module together with the learned speaker verification module and the lip-reading module to continue the training process.
Finally, we freeze the feature enhancement module and the speaker verification module, and introduce the suppression module to continue training until convergence.
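As a rough illustration of the loss terms above, here is a minimal PyTorch sketch; the margin, the 256-dimensional embeddings, and the batch shapes are assumptions for illustration and do not reproduce the exact implementation in models/:

```python
import torch
import torch.nn as nn

# L^{VSR}_{CE}: word-level cross-entropy for the lip-reading branch.
ce_loss = nn.CrossEntropyLoss()
# L^{ID}_{triple}: triplet margin loss for the speaker verification branch
# (anchor/positive from the same speaker, negative from a different speaker).
triplet_loss = nn.TripletMarginLoss(margin=1.0)

# Toy tensors just to show the shapes involved (LRW has 500 word classes).
word_logits = torch.randn(4, 500)
word_labels = torch.randint(0, 500, (4,))
anchor, positive, negative = (torch.randn(4, 256) for _ in range(3))

l_vsr = ce_loss(word_logits, word_labels)        # trains the lip-reading branch
l_id = triplet_loss(anchor, positive, negative)  # trains the speaker branch
```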
Train the speaker verification module (step 1). Please set up config.py, e.g.
path = '/data1/lrw_roi_80_116_175_211_npy_gray_pkl_jpeg/'
split_path = '/home/luosongtao/code/LRW_ID-main/Splits/'
random_seed = 251
batch_size = 130
gpus = 2
base_lr = 2e-4 * batch_size/32.0
num_workers = 8
max_epoch = 40
resume_path = None
reg = 0.5
precision = 16
verison = 0
alpha = 0.1
and train with the command:
python train_SD.py
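The config fields above correspond to standard PyTorch Lightning training options. A hedged sketch of how they might feed a Trainer (argument names follow Lightning 1.x and may differ from the actual training scripts):

```python
import pytorch_lightning as pl
import config  # the repository's config.py

# Map the config values onto a Lightning 1.x Trainer (illustrative sketch only).
trainer = pl.Trainer(
    gpus=config.gpus,                           # e.g. 2
    precision=config.precision,                 # e.g. 16 for mixed precision
    max_epochs=config.max_epoch,                # e.g. 40 for the SD module
    resume_from_checkpoint=config.resume_path,  # None to start from scratch
)
# trainer.fit(model, datamodule)  # model/datamodule are defined by the training script
```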
Train the baseline lip-reading model (step 1). Please set up config.py, e.g.
path = '/data1/lrw_roi_80_116_175_211_npy_gray_pkl_jpeg/'
split_path = '/home/luosongtao/code/LRW_ID-main/Splits/'
random_seed = 251
batch_size = 130
gpus = 2
base_lr = 2e-4 * batch_size/32.0
num_workers = 8
max_epoch = 10
resume_path = None
reg = 0.5
precision = 16
verison = 0
alpha = 0.1
and train with the command:
python train_baseline.py
Train the feature enhancement model (step 2), which loads the step-1 speaker verification and baseline checkpoints. Please set up config.py, e.g.
path = '/data1/lrw_roi_80_116_175_211_npy_gray_pkl_jpeg/'
split_path = '/home/luosongtao/code/LRW_ID-main/Splits/'
random_seed = 251
batch_size = 130
gpus = 2
base_lr = 2e-4 * batch_size/32.0
num_workers = 8
max_epoch = 10
resume_path = None
reg = 0.5
precision = 16
verison = 0
alpha = 0.1
and train with the command:
python train_enhance.py --SD_model_path /home/luosongtao/code/LSHUC/SD_logs/crop_flip_cl_251/version_63/checkpoints/checkpoints-epoch=09-val_loss=0.01.ckpt --baseline_model_path /home/luosongtao/code/LSHUC/Baseline_logs/crop_flip_cl_251/version_1/checkpoints/checkpoints-epoch=37-val_loss=0.57-val_wer=0.13.ckpt
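The two checkpoint paths are the outputs of step 1. A minimal sketch of how such Lightning checkpoints can be restored before continuing training; the class names SpeakerModel and BaselineModel are hypothetical placeholders for whatever LightningModules the repository defines in models/:

```python
import argparse
# Hypothetical class names, used only to illustrate restoring the step-1 checkpoints.
from models.model_SD import SpeakerModel
from models.model_r2plus1d import BaselineModel

parser = argparse.ArgumentParser()
parser.add_argument('--SD_model_path', type=str, required=True)
parser.add_argument('--baseline_model_path', type=str, required=True)
args = parser.parse_args()

# PyTorch Lightning restores weights and hyperparameters from a .ckpt file.
sd_model = SpeakerModel.load_from_checkpoint(args.SD_model_path)
baseline = BaselineModel.load_from_checkpoint(args.baseline_model_path)
```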
Train the ensemble model with the suppression module (step 3), which loads the step-2 enhancement checkpoint. Please set up config.py, e.g.
path = '/data1/lrw_roi_80_116_175_211_npy_gray_pkl_jpeg/'
split_path = '/home/luosongtao/code/LRW_ID-main/Splits/'
random_seed = 251
batch_size = 130
gpus = 2
base_lr = 2e-4 * batch_size/32.0
num_workers = 8
max_epoch = 10
resume_path = None
reg = 0.5
precision = 16
verison = 0
alpha = 0.1
and train with the command:
python train_ensemble.py --enhance_model_path /home/luosongtao/code/LSHUC/Enhance_logs/crop_flip_cl_251/version_1/checkpoints/checkpoints-epoch=09-val_loss=0.55-val_wer=0.12.ckpt
Method | $L^{Enh}_{triple}$ | $L^{Sup}_{triple}$ | $L^{VSR}_{CE}$ | Acc (%) |
---|---|---|---|---|
Baseline | - | - | ✓ | 87.25 |
Ours | ❌ | ❌ | ✓ | 87.73 |
Ours | ✓ | ❌ | ✓ | 87.74 |
Ours | ❌ | ✓ | ✓ | 87.75 |
Ours | ✓ | ✓ | ✓ | 87.91 |
@inproceedings{Luo_2023_BMVC,
author = {Songtao Luo and Shuang Yang and Shiguang Shan and Xilin Chen},
title = {Learning Separable Hidden Unit Contributions for Speaker-Adaptive Visual Speech Recognition},
booktitle = {34th British Machine Vision Conference 2023, {BMVC} 2023, Aberdeen, UK, November 20-24, 2023},
publisher = {BMVA},
year = {2023},
url = {https://bmvc2022.mpi-inf.mpg.de/BMVC2023/0146.pdf}
}