Official repo of INTERSPEECH 2024 paper Genhancer: High-Fidelity Speech Enhancement via Generative Modeling on Discrete Codec Tokens. This repo provides additional audio samples.

haiciyang/Genhancer

High-Fidelity Speech Enhancement via Generative Modeling on Discrete Codec Tokens

Official repo for INTERSPEECH 2024 paper Genhancer: High-Fidelity Speech Enhancement via Generative Modeling on Discrete Codec Tokens, providing additional audio samples.

Abstract

We present a high-fidelity generative speech enhancement model, Genhancer, which generates clean speech as discrete codec tokens while conditioning on the input speech features. Discrete codec tokens provide an efficient latent domain in place of the conventional time or time-frequency domain of signals, so as to enable complex modeling of speech and allow generative modeling to enforce speaker consistency and content continuity. We provide insights into the best-fit generation scheme for enhancement among parallel prediction, auto-regression, and masking to demonstrate the benefits of conditioning on both pre-trained and jointly learned speech features. Subjective and objective tests show that Genhancer significantly improves audio quality and speaker-identity retention over the SOTA baselines, including conventional and generative ones, while preserving content accuracy.
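The abstract compares three ways of generating codec tokens: parallel prediction, auto-regression, and masking. The toy sketch below only illustrates how those three decoding loops differ in control flow. It is not the Genhancer implementation: `toy_predictor`, `MASK`, `VOCAB`, and the integer "conditioning features" are hypothetical stand-ins invented for this example, and a real system would use a trained network over a neural codec's codebooks.

```python
import math

MASK = -1   # hypothetical id for a token position that is not generated yet
VOCAB = 8   # toy codebook size; real codecs use much larger codebooks


def toy_predictor(cond, tokens):
    """Stand-in for a trained network (NOT the Genhancer model).

    Returns one (token, confidence) pair per position. The prediction uses
    the conditioning value and the left neighbour when it is already known,
    purely so that the three schemes give different results.
    """
    out = []
    for i, c in enumerate(cond):
        left = tokens[i - 1] if i > 0 else MASK
        known = left != MASK
        token = (c + (left if known else 0)) % VOCAB
        confidence = 0.9 if known else 0.5 + 0.04 * (c % 10)
        out.append((token, confidence))
    return out


def parallel_decode(cond):
    """Predict every token in a single pass, with no token context."""
    preds = toy_predictor(cond, [MASK] * len(cond))
    return [token for token, _ in preds]


def autoregressive_decode(cond):
    """Generate left to right, one token per model call."""
    tokens = [MASK] * len(cond)
    for i in range(len(cond)):
        tokens[i] = toy_predictor(cond, tokens)[i][0]
    return tokens


def masked_decode(cond, steps=3):
    """Start fully masked and fill the most confident positions each step."""
    tokens = [MASK] * len(cond)
    for step in range(steps):
        masked = [i for i, t in enumerate(tokens) if t == MASK]
        if not masked:
            break
        preds = toy_predictor(cond, tokens)
        # Spread the remaining positions evenly over the remaining steps.
        k = math.ceil(len(masked) / (steps - step))
        best = sorted(masked, key=lambda i: preds[i][1], reverse=True)[:k]
        for i in best:
            tokens[i] = preds[i][0]
    return tokens


if __name__ == "__main__":
    cond = [3, 1, 4, 1, 5]  # made-up conditioning values, one per frame
    print("parallel:      ", parallel_decode(cond))
    print("autoregressive:", autoregressive_decode(cond))
    print("masked:        ", masked_decode(cond, steps=3))
```

In this sketch the parallel scheme needs one model call, the auto-regressive scheme needs one call per token, and the masked scheme sits in between with a fixed number of refinement steps. Which scheme the paper finds best for enhancement is not stated in this README.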
