

Style / Prosody guidance? #23

Open
richardburleigh opened this issue Nov 10, 2022 · 3 comments
Labels: documentation (Improvements or additions to documentation), enhancement (New feature or request)

Comments

@richardburleigh

This is really impressive work!

Do you have any ideas or code changes to guide the generated speech style? For example, having the appropriate emotion if we are reading a news story about a tragedy.

I understand there are a few Tacotron projects that achieve this, but their methods often lead to degraded voice quality (in my opinion).

One crazy idea that is easy to try, but probably won't work, is to train on a new dataset and embed the emotion into the generated sequence encoding.

@shivammehta25
Owner

shivammehta25 commented Nov 10, 2022

Hello! Thank you for your interest in our work!

We tried something similar, to get more control over the synthesised speech, in one of our follow-up works, which has already been submitted.
The key idea is to process the additional information with an additional encoder that maps it into some controllable space. The output of this additional encoder is then concatenated with the output of the text encoder, and the result forms the input sequence of states to the neural HMM. To provide stronger conditioning, we also concatenated the additional information to the input of the system's output net. This worked and gave us control over the feature space.
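A rough sketch of the conditioning scheme described above, in plain Python with no ML framework, just to make the two concatenations concrete. Everything here is hypothetical and for illustration only: `ConditionEncoder`, `hmm_state_inputs` and `output_net_input` are made-up names, a single tanh layer with random weights stands in for a trained encoder, the one-hot emotion label is an assumed input format, and the idea that the output net also receives a previous-frame context vector is an assumption, not something stated in this thread.

```python
import math
import random
from typing import List

Vector = List[float]


class ConditionEncoder:
    """Hypothetical "additional encoder": maps a condition vector
    (assumed here to be a one-hot emotion label) into a small continuous
    control space. One tanh layer with random weights stands in for a
    trained network."""

    def __init__(self, in_dim: int, out_dim: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.weight = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)]
                       for _ in range(out_dim)]
        self.bias = [0.0] * out_dim

    def __call__(self, cond: Vector) -> Vector:
        # y = tanh(W @ cond + b), computed one output unit at a time
        return [math.tanh(sum(w * c for w, c in zip(row, cond)) + b)
                for row, b in zip(self.weight, self.bias)]


def hmm_state_inputs(text_states: List[Vector], cond_emb: Vector) -> List[Vector]:
    """Concatenate the same condition embedding onto every text-encoder
    output; the result is the sequence of state vectors for the neural HMM."""
    return [state + cond_emb for state in text_states]


def output_net_input(state: Vector, prev_frame_ctx: Vector, cond_emb: Vector) -> Vector:
    """Stronger conditioning: the output net sees the condition again,
    next to the current state vector and (assumed) previous-frame context."""
    return state + prev_frame_ctx + cond_emb


if __name__ == "__main__":
    text_states = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]  # 2 states, dim 3
    emotion = [0.0, 1.0, 0.0, 0.0]                      # hypothetical one-hot label
    cond_emb = ConditionEncoder(in_dim=4, out_dim=2)(emotion)
    states = hmm_state_inputs(text_states, cond_emb)
    print(len(states), len(states[0]))                  # 2 5
    x = output_net_input(states[0], [0.0] * 4, cond_emb)
    print(len(x))                                       # 11
```

In this sketch the text encoder is left untouched: one per-utterance embedding is simply repeated across every state, and the second concatenation lets the condition reach the acoustic output directly instead of only through the state sequence.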

Hope this helps! Feel free to ping in case you have further questions :D

P.S. In the coming days, we will release an upgraded version of the neural HMM that performs better than other baseline systems, including Neural-HMM, Tacotron 2 with Postnet, and Glow TTS, in terms of clarity and naturalness.

@shivammehta25
Owner

Hello!

We have released OverFlow. With OverFlow, we not only get better naturalness and more accurate pronunciation, but we also show speaker adaptation in a low-resource setting by simply fine-tuning the model on a much smaller dataset.

Hopefully, it will be useful for your use case as well.

shivammehta25 added the documentation and enhancement labels on Nov 23, 2022
@richardburleigh
Author

Amazing work!! Really appreciate you and the team for pushing TTS further 🥇
