Contrastive Semantic Alignment for Speech Referring Expression Comprehension

This repository contains the code for "Contrastive Semantic Alignment for Speech Referring Expression Comprehension (CSRef)".

Data Preparation

Download the speech referring expressions, the speech encoder weights from the Contrastive Semantic Alignment (CSA) stage, and the pre-processing annotations JSON file to the data folder, keeping the same paths as in the Google Drive folder
Download and unzip the LibriSpeech ASR dataset for CSA pre-training to the data/audios/ folder
Download and unzip the train2014 images from COCO to the data/images/ folder
Download bert-base-uncased and wav2vec2-base from HuggingFace to the data/weights/ folder
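
Before training, it can help to confirm that the folders named above are in place. The snippet below is a minimal sketch, not part of the repository: `check_data_layout` is a hypothetical helper, and it only checks the three sub-folders mentioned in this section (data/audios, data/images, data/weights), not the individual files inside them.

```shell
# Hypothetical helper (not shipped with this repo): reports which of the
# data folders named in this README are missing under a project root.
check_data_layout() {
    root="${1:-.}"      # project root; defaults to the current directory
    missing=0
    for dir in data/audios data/images data/weights; do
        if [ ! -d "$root/$dir" ]; then
            echo "missing: $root/$dir"
            missing=1
        fi
    done
    # Exit status 0 when every folder exists, 1 otherwise.
    return "$missing"
}
```

Running `check_data_layout . && echo "data folders found"` from the repository root prints the missing folders, if any, and only prints the final message when all three exist.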

Installation

Clone this repo
Create a conda virtual environment and activate it

conda create -n csref python=3.7.16
conda activate csref

Install PyTorch
Install other packages in requirements.txt

Training

Train for the CSA stage

CUDA_VISIBLE_DEVICES=1,2,3,4 PORT=23450 bash tools/train_CSA.sh configs/csref_CSA_librispeech.py 4

Train for the SREC stage

CUDA_VISIBLE_DEVICES=0 PORT=23450 bash tools/train_speech.sh configs/csref_refcoco+_speech.py 1
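
In both commands above, the trailing number matches the number of devices listed in CUDA_VISIBLE_DEVICES (4 for the CSA stage, 1 for the SREC stage). Assuming that this last argument is the GPU count, which the README does not state explicitly, the sketch below derives it from the device list so the two values cannot drift apart. `count_visible_gpus` is a hypothetical helper, not part of the repository.

```shell
# Hypothetical helper: counts the entries in a comma-separated device list
# such as the one passed via CUDA_VISIBLE_DEVICES.
count_visible_gpus() {
    # Put each device id on its own line, then count the non-empty lines.
    echo "$1" | tr ',' '\n' | grep -c .
}

# Example launch for the CSA stage (commented out so that sourcing this
# snippet does not start training). It assumes the last argument of the
# training script is the number of GPUs.
# DEVICES=1,2,3,4
# CUDA_VISIBLE_DEVICES=$DEVICES PORT=23450 bash tools/train_CSA.sh \
#     configs/csref_CSA_librispeech.py "$(count_visible_gpus "$DEVICES")"
```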

Acknowledgement

Thanks to the following repos for their great work:

SimREC
