StereoSampleGAN: A computationally inexpensive approach to high-fidelity stereo audio sample generation.
- Optional but highly recommended: Set up a Python virtual environment.
- The audio loader package librosa requires an outdated version of NumPy.
- Install requirements by running
pip3 install -r requirements.txt
Specify the sample count to generate, output, etc. in usage_params.py
- Generate audio from the Curated Kick model by running
python3 src/run_pretrained/generate_curated_kick.py
- Generate audio from the Diverse Kick model by running
python3 src/run_pretrained/generate_diverse_kick.py
- Generate audio from the Instrument One Shot model by running
python3 src/run_pretrained/generate_instrument_one_shot.py
Specify training data parameters in usage_params.py
- I recommend anywhere between 4,000 and 8,000 training examples (any multiple of 8) and audio shorter than 1.5 sec (longer audio hasn't been fully tested)
- Prepare training data by running
python3 src/data_processing/encode_audio_data.py
- Train model by running
python3 src/stereo_sample_gan.py
- Generate audio (based on current usage_params.py) by running
python3 src/generate.py
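The training data guidance above (4,000 to 8,000 examples, a multiple of 8, audio shorter than 1.5 sec) can be checked before encoding. The following is a minimal sketch, not part of this repository: the function names and constants are hypothetical, and it assumes "multiple of 8" refers to the total number of training examples. Clip durations are passed in as plain numbers, so they can be measured with whichever audio library is already installed.

```python
# Hypothetical pre-flight check; NOT part of the StereoSampleGAN source.
# Assumption: "any multiple of 8" refers to the total number of examples.
MAX_DURATION_SEC = 1.5             # guidance: audio shorter than 1.5 sec
COUNT_MULTIPLE = 8                 # guidance: example count is a multiple of 8
RECOMMENDED_RANGE = (4000, 8000)   # guidance: 4,000 to 8,000 examples


def select_training_clips(durations_sec):
    """Return the indices of clips to keep, given each clip's duration.

    Clips that are empty or not shorter than MAX_DURATION_SEC are dropped,
    then the tail is trimmed so the kept count is a multiple of COUNT_MULTIPLE.
    """
    keep = [i for i, d in enumerate(durations_sec) if 0 < d < MAX_DURATION_SEC]
    usable = len(keep) - (len(keep) % COUNT_MULTIPLE)
    return keep[:usable]


def in_recommended_range(count):
    """Report whether a dataset size falls inside the recommended range."""
    low, high = RECOMMENDED_RANGE
    return low <= count <= high


if __name__ == "__main__":
    # Toy durations (seconds) standing in for a real dataset.
    durations = [0.4, 0.9, 2.1, 1.2, 0.7, 1.49, 0.3, 1.0, 0.8, 0.6, 1.5]
    kept = select_training_clips(durations)
    print(f"kept {len(kept)} clips, in recommended range: "
          f"{in_recommended_range(len(kept))}")
```

Trimming the tail is only one way to reach a multiple of 8; padding the dataset with extra examples would work equally well if discarding data is undesirable.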
Training progress visualization (training Diverse Kick Drum Model):
Kick drum generation model trained on ~8000 essentially random kick drums.
- More variation between generated samples; audio is occasionally inconsistent and noisy.
Kick drum generation model trained on ~4400 kick drums with closer matching overall characteristics.
- Less variation between each drum sample's decay and auditory tone.
Instrument one shot generation model, trained on ~3000 semi-curated instrument one shots.
- Demonstrates the model's capability to generate longer audio, yet it fails to generate coherent and usable instrument one shots.
- outputs: Trained model and generated audio
- paper: Research paper / model writeup
- static: Static resources
- src: Model source code
- utils: Model and data utilities
- data_processing: Training data processing scripts