Liong-Steve/CSRef

Contrastive Semantic Alignment for Speech Referring Expression Comprehension

This repository contains the code for "Contrastive Semantic Alignment for Speech Referring Expression Comprehension (CSRef)".

Data Preparation

  1. Download the speech referring expressions, the speech encoder weights from the Contrastive Semantic Alignment (CSA) stage, and the pre-processing annotations JSON file into the data folder, keeping the same paths as in Google Drive
  2. Download and unzip the LibriSpeech ASR dataset for CSA pre-training into the data/audios/ folder
  3. Download and unzip the train2014 images from COCO to the data/images folder
  4. Download bert-base-uncased and wav2vec2-base from HuggingFace to the data/weights/ folder
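Once the downloads are in place, a quick check of the folder layout can catch a misplaced archive before training starts. The sketch below is not part of the repository: the helper name `missing_data_dirs` is hypothetical, and it only checks the three sub-folders named in the steps above (data/audios, data/images, data/weights), not the files inside them.

```python
from pathlib import Path

# Sub-folders of data/ named in the preparation steps above.
# Deeper paths (for example the Google Drive layout) are not checked here.
EXPECTED_DIRS = ("audios", "images", "weights")


def missing_data_dirs(data_root="data"):
    """Return the expected sub-folders that do not exist under data_root."""
    root = Path(data_root)
    return [name for name in EXPECTED_DIRS if not (root / name).is_dir()]


if __name__ == "__main__":
    missing = missing_data_dirs()
    if missing:
        print("Missing folders under data/:", ", ".join(missing))
    else:
        print("All expected data folders are present.")
```

Run it from the repository root so that the relative path data/ resolves correctly.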

Installation

  • Clone this repo
  • Create a conda virtual environment and activate it
conda create -n csref python=3.7.16
conda activate csref
  • Install PyTorch
  • Install other packages in requirements.txt
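After installation, it can help to confirm that the active interpreter is the one from the conda environment and that PyTorch is visible to it. The following is a minimal sketch, not something shipped with the repository: `check_environment` is a hypothetical helper, and the expected version (3, 7) is taken from the conda command above.

```python
import importlib.util
import sys


def check_environment(expected_python=(3, 7)):
    """Report whether the interpreter and PyTorch match the setup steps."""
    return {
        # Compare only major.minor, e.g. (3, 7) for Python 3.7.16.
        "python_ok": sys.version_info[:2] == tuple(expected_python),
        # find_spec locates the package without importing it.
        "torch_installed": importlib.util.find_spec("torch") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(name, "OK" if ok else "FAILED")
```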

Training

Train the CSA stage:

CUDA_VISIBLE_DEVICES=1,2,3,4 PORT=23450 bash tools/train_CSA.sh configs/csref_CSA_librispeech.py 4

Train the speech referring expression comprehension (SREC) stage:

CUDA_VISIBLE_DEVICES=0 PORT=23450 bash tools/train_speech.sh configs/csref_refcoco+_speech.py 1
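Both commands follow the same pattern: the GPU ids go in CUDA_VISIBLE_DEVICES, the port in PORT, and the script receives a config path followed by a number. In both examples that number equals the count of visible GPUs, so the sketch below assumes it is the number of GPUs; the scripts themselves are not shown here, so treat that as an assumption. The helpers `build_train_command` and `launch` are hypothetical and only assemble the same command line from Python.

```python
import os
import subprocess


def build_train_command(script, config, gpu_ids, port=23450):
    """Build (env, argv) mirroring the shell commands above.

    Assumption: the trailing argument is the number of GPUs, so it is
    derived from gpu_ids to keep the two values consistent.
    """
    if not gpu_ids:
        raise ValueError("at least one GPU id is required")
    env = {
        "CUDA_VISIBLE_DEVICES": ",".join(str(i) for i in gpu_ids),
        "PORT": str(port),
    }
    argv = ["bash", script, config, str(len(gpu_ids))]
    return env, argv


def launch(script, config, gpu_ids, port=23450):
    """Run the training script with the extra variables added to the environment."""
    env, argv = build_train_command(script, config, gpu_ids, port)
    return subprocess.run(argv, env={**os.environ, **env}, check=True)


if __name__ == "__main__":
    env, argv = build_train_command(
        "tools/train_CSA.sh", "configs/csref_CSA_librispeech.py", [1, 2, 3, 4]
    )
    print(env, argv)
```

Deriving the count from the id list avoids the easy mistake of changing CUDA_VISIBLE_DEVICES without updating the trailing number.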

Acknowledgement

Thanks to the following repos for their great work:
