This repository provides an audio data synthesis tool for the AudioDiffCaps dataset and its captions.
Please consider citing our paper if you find this repository useful in your work.
@inproceedings{takeuchi2023audiodiffcaps,
author = "Takeuchi, Daiki and Ohishi, Yasunori and Niizumi, Daisuke and Harada, Noboru and Kashino, Kunio”,
title = "Audio Difference Captioning Utilizing Similarity-Discrepancy Disentanglement",
booktitle = "Proceedings of the 8th Detection and Classification of Acoustic Scenes and Events 2023 Workshop (DCASE2023)”,
address = “Tampere, Finland”,
month = “September”,
year = "2023”,
}
The AudioDiffCaps dataset consists of (i) pairs of similar but slightly different audio clips and (ii) human-annotated descriptions of their differences. The pairs of audio clips were artificially synthesized by mixing foreground event sounds with background sounds taken from existing environmental sound datasets (FSD50K and ESC-50) using the Scaper library for soundscape synthesis and augmentation.
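For reference, Scaper synthesizes a single soundscape roughly as follows. This is an illustrative sketch with placeholder folders, labels, and parameter ranges, not the repository's actual synthesis code; the scripts described below generate pairs of such clips with small controlled differences.

import scaper

# Placeholder folders: each must contain one subdirectory per sound label.
sc = scaper.Scaper(duration=10.0, fg_path="foreground", bg_path="background")
sc.ref_db = -50

# Background sound (e.g., rain or traffic ambience).
sc.add_background(label=("const", "rain"),
                  source_file=("choose", []),
                  source_time=("const", 0))

# A foreground event mixed on top of the background at a sampled onset time and SNR.
sc.add_event(label=("choose", []),
             source_file=("choose", []),
             source_time=("const", 0),
             event_time=("uniform", 0, 8),
             event_duration=("const", 2),
             snr=("uniform", 0, 20),
             pitch_shift=None,
             time_stretch=None)

# Write the mixed audio and its annotation.
sc.generate("example.wav", "example.jams")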
Install the dependencies listed in requirements.txt; these are the modules required to run the tools in this repository.
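For example, using pip:

pip install -r requirements.txt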
You can download the FSD50K and ESC-50 datasets from the following URLs.
After downloading, update the two variables in utils.py (FSD50K and ESC50) so that they point to the dataset locations in your environment.
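For example, the variables can be set to the root directories of the extracted datasets (the paths below are placeholders; replace them with your own):

# in utils.py -- example paths, replace with your local dataset locations
FSD50K = "/path/to/FSD50K"
ESC50 = "/path/to/ESC-50"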
Run the following command to prepare the audio files for synthesis.
python preprocess_org_audio.py
There are two scenes (rain and traffic) and two splits (dev and eval) in this dataset. The audio files for each scene and split are generated with the following commands.
Rain_dev
python synthesize_audio.py -d datasets/adc_rain/dev
Rain_eval
python synthesize_audio.py -d datasets/adc_rain/eval
Traffic_dev
python synthesize_audio.py -d datasets/adc_traffic/dev
Traffic_eval
python synthesize_audio.py -d datasets/adc_traffic/eval
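To generate all four scene/split combinations in one go, a simple shell loop over the directories above can be used (a convenience sketch, not a script provided by this repository):

for scene in adc_rain adc_traffic; do
    for split in dev eval; do
        python synthesize_audio.py -d datasets/${scene}/${split}
    done
done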
Please check LICENSE.pdf for details.
- E. Fonseca, X. Favory, J. Pons, F. Font, and X. Serra, “FSD50K: an open dataset of human-labeled sound events,” arXiv preprint arXiv:2010.00475, 2020.
- K. J. Piczak, “ESC: Dataset for Environmental Sound Classification,” in Proc. 23rd Annual ACM Conf. Multimedia, 2015, pp. 1015–1018.
- J. Salamon, D. MacConnell, M. Cartwright, P. Li, and J. P. Bello, “Scaper: A library for soundscape synthesis and augmentation,” in Proc. IEEE Workshop Appl. Signal Process. Audio Acoust. (WASPAA). IEEE, 2017, pp. 344–348.