The complete dataset (197 GB) can be downloaded directly from the CVSSP server:
Alternatively, the dataset is mirrored across four Zenodo repositories:
The dataset consists of 1,344 one-hour recordings obtained in the homes of 8 participants in Belgium during 2023. In each participant's home, one AudioMoth was installed in the living room and another in the kitchen, recording simultaneously. The anonymization software presented below was then used to detect spoken voice and remove it. In addition, the 7 most probable sound event estimates computed with PANNs (the Cnn14_DecisionLevelAtt_mAP=0.425 model) are provided for each 0.021-second audio frame.
The dataset is organized into 14 folders, with each pair of consecutive folders containing the recordings from the same house. Due to its size, the dataset has been distributed across three separate Zenodo repositories.
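For orientation, the sketch below shows how comparable top-7 framewise estimates can be computed with the `panns_inference` package. Note that it uses that package's default checkpoint (Cnn14_DecisionLevelMax) rather than the Cnn14_DecisionLevelAtt model used for the released predictions, and the file name, frame hop, and output handling are illustrative assumptions rather than the dataset's actual format.

```python
# Sketch: top-7 framewise sound event estimates with PANNs.
# The audio path and output handling are placeholders, not the dataset's format.
import numpy as np
import librosa
from panns_inference import SoundEventDetection, labels

# PANNs models expect 32 kHz mono audio.
audio, _ = librosa.load('example_recording.wav', sr=32000, mono=True)

sed = SoundEventDetection(checkpoint_path=None, device='cpu')
framewise = sed.inference(audio[None, :])[0]      # (time_steps, 527 classes)

# Keep the 7 most probable classes per frame, highest first.
top7 = np.argsort(framewise, axis=1)[:, -7:][:, ::-1]
for frame_idx, class_idx in enumerate(top7[:5]):  # inspect the first frames
    print(frame_idx, [(labels[i], round(float(framewise[frame_idx, i]), 3))
                      for i in class_idx])
```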
This repository contains the resources and Jupyter Notebook developed as part of the research paper: "The Sounds of Home: A Speech-Removed Residential Audio Dataset for Sound Event Detection," authored by Thomas Deacon, Gabriel Bibbó, Arshdeep Singh, and Mark D. Plumbley from the Centre for Vision, Speech and Signal Processing, University of Surrey, UK.
The project focuses on developing a privacy-compliant audio dataset to support sound event detection research aimed at promoting wellbeing for older adults in smart home environments. The core of this research involves an automated speech removal pipeline that detects and eliminates spoken voice segments while preserving other sound events within residential audio recordings.
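The full pipeline is implemented in the scripts and notebook in this repository. As a rough illustration of the underlying idea only, the sketch below thresholds framewise PANNs probabilities for speech-related classes and silences the corresponding samples; the class list, threshold, and frame-to-sample mapping are assumptions for the example, not the configuration used in the paper.

```python
# Illustrative threshold-based speech removal (not the exact pipeline
# from the paper): frames whose speech-related probability exceeds a
# threshold are zeroed out in the waveform.
import numpy as np
import librosa
import soundfile as sf
from panns_inference import SoundEventDetection, labels

SPEECH_CLASSES = ['Speech', 'Male speech, man speaking',
                  'Female speech, woman speaking', 'Narration, monologue']
THRESHOLD = 0.5           # assumed value; tune for your recordings
SR = 32000                # PANNs operate on 32 kHz audio

audio, _ = librosa.load('example_recording.wav', sr=SR, mono=True)
sed = SoundEventDetection(checkpoint_path=None, device='cpu')
framewise = sed.inference(audio[None, :])[0]         # (frames, 527)

speech_idx = [labels.index(name) for name in SPEECH_CLASSES]
speech_prob = framewise[:, speech_idx].max(axis=1)   # per-frame speech score

# Map each frame back to a span of samples and mute the speech frames.
hop = len(audio) // len(speech_prob)                 # approximate frame hop
anonymized = audio.copy()
for i, p in enumerate(speech_prob):
    if p > THRESHOLD:
        anonymized[i * hop:(i + 1) * hop] = 0.0

sf.write('example_recording_anonymized.wav', anonymized, SR)
```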
- Anonymize.ipynb: An interactive Jupyter Notebook that demonstrates the speech removal process used in our research. This notebook allows users to upload their audio files and apply our pre-trained models to anonymize the audio by removing speech segments.
- Scripts: Includes all the Python scripts used for audio processing and sound event detection as detailed in our research.
- Data: Sample data files used for demonstration in the notebook.
The image below provides a visualization of the speech removal process applied to a sample audio recording. This visualization consists of three main components:
- Audio Spectrogram: The top panel shows the spectrogram of the original audio recording. Various sound events, including speech, are visible as brighter areas representing higher energy at the corresponding frequencies.
- Sound Event Predictions: The middle panel displays the probabilities of the different sound events detected by the model. Each color represents a different type of sound, such as male speech, narration, or other noises like clicking. This graph helps in identifying when each sound event occurs and how likely it is.
- Spectrogram with Speech Removed: The bottom panel shows the spectrogram of the audio after the speech has been removed. Note the significant reduction in intensity in areas where speech was previously detected, indicating the effectiveness of the speech removal process (a plotting sketch that produces a similar figure follows below).
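A minimal sketch of how a similar three-panel figure could be drawn with librosa and matplotlib is shown below. It continues from the speech-removal sketch above and reuses its variables (`audio`, `anonymized`, `framewise`, `speech_idx`, `labels`, `SR`); it is not the script that produced the figure in the paper.

```python
# Sketch: three-panel visualization (original spectrogram, framewise
# event probabilities, speech-removed spectrogram). Reuses `audio`,
# `anonymized`, `framewise`, `speech_idx`, `labels`, and `SR` from the
# speech-removal sketch above.
import numpy as np
import librosa
import librosa.display
import matplotlib.pyplot as plt

def plot_spectrogram(ax, y, title):
    S = librosa.amplitude_to_db(np.abs(librosa.stft(y)), ref=np.max)
    librosa.display.specshow(S, sr=SR, x_axis='time', y_axis='hz', ax=ax)
    ax.set_title(title)

fig, axes = plt.subplots(3, 1, figsize=(12, 9))

plot_spectrogram(axes[0], audio, 'Original audio spectrogram')

# Framewise probabilities for the speech-related classes.
for i in speech_idx:
    axes[1].plot(framewise[:, i], label=labels[i])
axes[1].set_title('Sound event predictions')
axes[1].set_ylabel('Probability')
axes[1].legend(fontsize=8)

plot_spectrogram(axes[2], anonymized, 'Spectrogram with speech removed')

plt.tight_layout()
plt.show()
```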
Before you begin, ensure you have the following installed:
- Python 3.6+
- Jupyter Notebook or JupyterLab
- Required Python libraries: librosa, numpy, matplotlib, and others as listed in requirements.txt.
Clone this repository to your local machine using:
git clone https://github.com/gbibbo/voice_anonymization.git
Navigate to the cloned directory and install the necessary Python packages:
pip install -r requirements.txt
To use the Anonymize.ipynb notebook:
Open the notebook in Jupyter Notebook or JupyterLab. Follow the instructions within the notebook to upload your audio file and run the cells to process the audio. You can access the notebook directly in Google Colab for an interactive experience without any local setup:
Anonymize Audio Notebook
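To run the notebook locally instead, launch it from the repository root with the standard Jupyter command (shown here for convenience):

jupyter notebook Anonymize.ipynb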
If you use the data or the anonymization pipeline in your research, please cite our paper:
    @inproceedings{bibbo2024sounds,
      title={The Sounds of Home: A Speech-Removed Residential Audio Dataset for Sound Event Detection},
      author={Bibbò, Gabriel and Deacon, Thomas and Singh, Arshdeep and Plumbley, Mark D.},
      booktitle={INTERSPEECH 2024},
      year={2024},
      organization={University of Surrey}
    }
This work was supported by the Engineering and Physical Sciences Research Council (EPSRC) under Grant EP/T019751/1 "AI for Sound (AI4S)."