Audio Latent Composition

This project accompanies the paper "An Example-Based Framework for Perceptually Guided Audio Texture Generation", published in IEEE/ACM Transactions on Audio, Speech, and Language Processing (2024).

Paper | Demo Webpage | Citation

In this paper, we use an exemplar-based approach together with a pre-trained StyleGAN2 and GAN inversion techniques to find user-defined directions for semantic control.

We generate synthetic examples based on William Gaver's "Everyday Listening" approach and find their matching real-world samples by inverting the synthetic samples into the latent space of a pre-trained StyleGAN2. These samples and their latent space embeddings are used to derive directional vectors that provide semantic guidance over audio texture generation. Such vectors provide "synthesizer-like" continuous control while generating sounds from the latent space of the GAN.
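As a rough illustration of how a directional vector could be derived from latent embeddings, the sketch below takes the difference between the mean embeddings of two example sets and normalizes it. This difference-of-means recipe is an assumption made for illustration, not necessarily the procedure used in the paper (see the paper and notebooks for the actual algorithm). The function names and the 2-D toy latents are hypothetical. Plain Python lists keep the sketch dependency-free, whereas real embeddings would be higher-dimensional tensors.

```python
from statistics import fmean


def mean_vector(vectors):
    """Element-wise mean of a list of equal-length latent vectors."""
    return [fmean(column) for column in zip(*vectors)]


def direction_from_examples(with_attribute, without_attribute):
    """Unit vector pointing from the 'without' mean to the 'with' mean.

    Assumption: a difference of cluster means stands in for the
    direction-finding step; the paper may use a different method.
    """
    mu_with = mean_vector(with_attribute)
    mu_without = mean_vector(without_attribute)
    diff = [a - b for a, b in zip(mu_with, mu_without)]
    norm = sum(x * x for x in diff) ** 0.5
    if norm == 0:
        raise ValueError("the two example sets have identical means")
    return [x / norm for x in diff]


def edit_latent(w, direction, alpha):
    """Move latent w by alpha along direction (w' = w + alpha * d)."""
    return [wi + alpha * di for wi, di in zip(w, direction)]


# Toy 2-D latents standing in for inverted synthetic examples (hypothetical data).
examples_with = [[2.0, 0.0], [4.0, 0.0]]
examples_without = [[0.0, 0.0], [0.0, 0.0]]

d = direction_from_examples(examples_with, examples_without)
edited = edit_latent([0.5, 0.5], d, alpha=2.0)
```

Varying `alpha` continuously is what would give the "synthesizer-like" control described above: the edited latent is passed to the generator in place of the original one.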

This repo is adapted from Chai et al., "Using latent space regression to analyze and leverage compositionality in GANs", and modified for use with audio. Paper and Code.

Setup

  • Clone this repo
  • Install dependencies by creating a new conda environment called audio-latent-composition
conda env create -f environment.yml

  • Add the newly created environment to Jupyter as a kernel

python -m ipykernel install --user --name audio-latent-composition

Notebooks

The notebooks outline how to generate synthetic Gaver sounds (see the paper for the algorithms) and invert them to real-world audio. Directional vectors generated in the notebooks can be used to edit any randomly generated audio sample.

Training

We use StyleGAN2 models pre-trained on audio textures from the Greatest Hits dataset and the Water Filling a Container dataset. All StyleGAN2 checkpoints are downloaded when you run the notebooks in the section above.

To start training the encoder, run the command below. See config.json for the parameter settings.

python -m training.train_sgan_encoder

Interfaces

We demonstrate how easily the directional vectors derived with this method can be used to edit randomly generated samples by exposing the vectors as sliders in a web interface.

The interfaces are developed using Streamlit.
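To make the slider idea concrete, here is a minimal sketch of how several slider values could be combined into one latent edit, with each slider scaling its own directional vector. It is an illustration under stated assumptions and is not taken from the interface scripts in this repo: the function name, the attribute names ("rate", "hardness"), and the 3-D toy vectors are all hypothetical. In a Streamlit app, each value would typically come from a call such as `st.slider`.

```python
def apply_sliders(w, directions, slider_values):
    """Return w + sum over attributes of slider_value * direction.

    w             -- latent vector as a list of floats
    directions    -- dict mapping attribute name to its directional vector
    slider_values -- dict mapping attribute name to a slider position;
                     attributes without a slider value are left unchanged
    """
    edited = list(w)
    for name, direction in directions.items():
        amount = slider_values.get(name, 0.0)
        edited = [e + amount * d for e, d in zip(edited, direction)]
    return edited


# Hypothetical attribute names and toy 3-D directions, for illustration only.
directions = {
    "rate": [1.0, 0.0, 0.0],
    "hardness": [0.0, 1.0, 0.0],
}
sliders = {"rate": 0.5, "hardness": -1.0}

w_random = [0.0, 0.0, 0.0]
w_edited = apply_sliders(w_random, directions, sliders)
```

Because the edits are additive, the order in which sliders are applied does not matter, and setting every slider to zero leaves the original sample untouched.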

To run the interface that generates Gaver sounds and performs analysis-by-synthesis in the latent space of the StyleGAN2 (as shown in the demo video):

cd interface
streamlit run gaver-sounds-interface.py

To perceptually edit randomly generated samples for the Greatest Hits dataset:

streamlit run dim-control-interface-greatesthits.py

To perceptually edit randomly generated samples for the Water dataset:

streamlit run dim-control-interface-water.py

Citation

If you use this code for your research, please cite it as:

@ARTICLE{kamath2024example,
  author={Kamath, Purnima and Gupta, Chitralekha and Wyse, Lonce and Nanayakkara, Suranga},
  journal={IEEE/ACM Transactions on Audio, Speech, and Language Processing}, 
  title={Example-Based Framework for Perceptually Guided Audio Texture Generation}, 
  year={2024},
  volume={},
  number={},
  pages={1-11},
  doi={10.1109/TASLP.2024.3393741}
}