How is streaming evaluation performed for the Wav2Vec2.0 model? #3

Open
steventan0110 opened this issue Nov 26, 2023 · 0 comments
Labels
question Further information is requested

Comments

@steventan0110

Hi authors,

Thanks for releasing the code for your great work! After reading the paper, I am trying to replicate the simultaneous translation experiment using CIF's output as the pre-decision policy.

However, I have a question about the underlying model, Wav2Vec2.0. As I understand it, the model needs the complete audio input to encode features, because it uses a standard (non-causal) convolutional network and Transformer encoder and does not support streaming. How, then, is inference performed for the wav2vec_cif model? From the paper, it seems that you did not do any additional training to make Wav2Vec2 support partial input during streaming-style inference.
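
To make the concern concrete, here is a toy sketch of the mismatch I have in mind. It is not the wav2vec 2.0 or wav2vec_cif code: `noncausal_smooth`, the 3-tap kernel, and the random "audio" are all made up for illustration, and a single centered convolution stands in for the real encoder. It only shows that a non-causal layer gives different outputs near the end of a prefix than it gives for the same frames when the full input is available.

```python
import numpy as np

def noncausal_smooth(x, kernel):
    """Centered (non-causal) 1-D convolution with zero padding.

    With a 3-tap kernel, output frame t depends on x[t-1], x[t], x[t+1],
    so it looks one step into the future.
    """
    return np.convolve(x, kernel, mode="same")

rng = np.random.default_rng(0)
audio = rng.standard_normal(16)          # hypothetical stand-in for a waveform
kernel = np.array([0.25, 0.5, 0.25])     # hypothetical symmetric filter

# Offline: encode the complete input.
full = noncausal_smooth(audio, kernel)

# "Streaming": encode only the first 8 samples received so far.
prefix_len = 8
partial = noncausal_smooth(audio[:prefix_len], kernel)

# Frames whose receptive field lies entirely inside the prefix agree.
interior_match = np.allclose(partial[:prefix_len - 1], full[:prefix_len - 1])

# The last prefix frame is missing its future sample, so it differs.
last_frame_gap = abs(partial[-1] - full[prefix_len - 1])

print("interior frames match:", interior_match)
print("gap on last prefix frame:", last_frame_gap)
```

Here only the final frame changes because the receptive field is one sample wide on each side. With full self-attention in the Transformer encoder, every frame attends to every other frame, so I would expect all prefix outputs to differ from their full-input counterparts, which is why I am unsure how the streaming evaluation was run.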

I would greatly appreciate it if you could provide additional details about how evaluation is performed for the streaming case!

@steventan0110 added the question label on Nov 26, 2023