Warning: No pronunciation for word #71

Open
DanielSWolf opened this issue Jan 5, 2022 · 2 comments

Comments

@DanielSWolf

When running phonetisaurus-apply, I sometimes get the warning "No pronunciation for word: ...". These warnings are very rare (in my case, about one word in 3,000) and usually indicate a spelling error in the word.

What baffles me, however, is that the output does include these words, along with (mostly) plausible pronunciations.

So I wonder:

  • Are there scenarios where Phonetisaurus actually won't generate a pronunciation for a given word? (This is important to me because my pipeline requires a pronunciation for each word.)
  • Might it make sense to change the wording of the warning to something more nuanced?
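Since my pipeline needs a pronunciation for every word, a quick way to check that is to compare the input word list against the output. This is a minimal sketch that assumes each output line has the form `word<TAB>pronunciation`; that layout is an assumption, so adjust the parsing if your output looks different. `parse_pronunciations` and `find_missing` are hypothetical helper names, not part of Phonetisaurus, and the sample strings are made up.

```python
def parse_pronunciations(output_text: str) -> dict[str, list[str]]:
    """Map each word to every non-empty pronunciation listed for it.

    Assumption: one "word<TAB>pronunciation" pair per line. Adjust the split
    if your output uses a different layout.
    """
    prons: dict[str, list[str]] = {}
    for line in output_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        word, _, pron = line.partition("\t")
        entries = prons.setdefault(word, [])
        if pron.strip():
            entries.append(pron.strip())
    return prons


def find_missing(words: list[str], output_text: str) -> list[str]:
    """Return the input words that have no non-empty pronunciation in the output."""
    prons = parse_pronunciations(output_text)
    return [w for w in words if not prons.get(w)]


words = ["hello", "wrold"]
output = "hello\tHH AH0 L OW1\n"
print(find_missing(words, output))  # ['wrold']
```

Words are matched exactly, so a word whose spelling changed along the way (for example through invisible characters) would also be reported as missing.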
@DanielSWolf
Author

BTW, I'm using the Docker version of Phonetisaurus.

@danijel3
Contributor

danijel3 commented Oct 4, 2022

Just to make sure you're on the right track: phonetisaurus-apply is a Python script that runs phonetisaurus-g2pfst in the background, so its behavior can be affected by either of these programs.

I recommend also running phonetisaurus-g2pfst on its own to see how the output differs.
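To spot those differences, a small diff over the two outputs can help. This is a minimal sketch, assuming tab-separated lines where the word is the first field and the pronunciation is the last one (meant to tolerate an extra score column in between, but the exact formats of both programs are an assumption here, so check them against your own output). `first_prons` and `compare_outputs` are hypothetical names, and the sample strings are made up.

```python
from typing import Optional


def first_prons(output_text: str) -> dict[str, str]:
    """Map each word to the first pronunciation listed for it.

    Assumption: fields are tab-separated, the word is the first field and the
    pronunciation is the last, so an optional score column in between is ignored.
    """
    result: dict[str, str] = {}
    for line in output_text.splitlines():
        fields = line.split("\t")
        if len(fields) < 2 or not fields[-1].strip():
            continue  # no pronunciation on this line
        result.setdefault(fields[0], fields[-1].strip())
    return result


def compare_outputs(
    text_a: str, text_b: str
) -> dict[str, tuple[Optional[str], Optional[str]]]:
    """Return {word: (pron_a, pron_b)} for every word whose first pronunciation
    differs, including words pronounced in only one output (the other is None)."""
    a, b = first_prons(text_a), first_prons(text_b)
    return {
        word: (a.get(word), b.get(word))
        for word in sorted(a.keys() | b.keys())
        if a.get(word) != b.get(word)
    }


apply_out = "hello\tHH AH0 L OW1\nwrold\tW R OW1 L D\n"
g2pfst_out = "hello\t9.5\tHH AH0 L OW1\n"
print(compare_outputs(apply_out, g2pfst_out))
# {'wrold': ('W R OW1 L D', None)}
```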

phonetisaurus-apply adds a bit more functionality: it lets you use a predefined lexicon in parallel with the FST model. The script first looks up each word in your lexicon and uses the model only if the word is not already in the lexicon. This has two benefits:

  1. it speeds up processing, as the lexicon serves as a kind of "cache" for already processed words
  2. it allows you to define manual exceptions to how the model works
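The lookup order described above (lexicon first, model as a fallback) can be sketched like this. It is not the actual code of phonetisaurus-apply: the lexicon format (`word<TAB>pronunciation`) is an assumption, and `model` is a hypothetical callable standing in for a call to the FST model.

```python
from typing import Callable, Optional


def load_lexicon(text: str) -> dict[str, str]:
    """Parse "word<TAB>pronunciation" lines (assumed format); the first entry wins."""
    lexicon: dict[str, str] = {}
    for line in text.splitlines():
        word, sep, pron = line.partition("\t")
        if sep and pron.strip():
            lexicon.setdefault(word, pron.strip())
    return lexicon


def pronounce(
    words: list[str],
    lexicon: dict[str, str],
    model: Callable[[str], Optional[str]],
) -> dict[str, Optional[str]]:
    """Look each word up in the lexicon first; call the model only on a miss."""
    result: dict[str, Optional[str]] = {}
    for word in words:
        if word in lexicon:
            result[word] = lexicon[word]  # manual exception or cached entry
        else:
            result[word] = model(word)  # hypothetical G2P call; may return None
    return result


lexicon = load_lexicon("hello\tHH AH0 L OW1\n")
print(pronounce(["hello", "wrold"], lexicon, lambda w: "<model output>"))
# {'hello': 'HH AH0 L OW1', 'wrold': '<model output>'}
```

Because the model is only consulted on a lexicon miss, adding a manually corrected entry for a problem word overrides whatever the model would produce for it.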

Apart from that, I'm not sure what the problem could be. In my experience, a word that cannot be processed simply doesn't get a pronunciation. Maybe you have some duplicate words with minor differences? Maybe it's a matter of hidden characters (Unicode can be quite a minefield)? You'd have to make a minimal example to be sure. Next time you get this output, try extracting one of these words into a separate file and see how it behaves on its own.
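To hunt for hidden characters and near-duplicate words before building such a minimal example, something like the following could be run over the word list. It is only a sketch: which Unicode categories count as suspicious (control, format and space characters here) and the choice to ignore case when grouping are assumptions, and all function names are hypothetical.

```python
import unicodedata
from collections import defaultdict

# Assumed "suspicious" categories inside a word: control (Cc), format (Cf,
# e.g. zero-width space U+200B or BOM U+FEFF) and space separators (Zs, e.g. U+00A0).
SUSPICIOUS = {"Cc", "Cf", "Zs"}


def hidden_chars(word: str) -> list[str]:
    """Describe each suspicious character found in `word`."""
    return [
        f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}"
        for ch in word
        if unicodedata.category(ch) in SUSPICIOUS
    ]


def canonical(word: str) -> str:
    """Drop suspicious characters, apply NFKC normalization and ignore case."""
    stripped = "".join(ch for ch in word if unicodedata.category(ch) not in SUSPICIOUS)
    return unicodedata.normalize("NFKC", stripped).casefold()


def near_duplicates(words: list[str]) -> list[list[str]]:
    """Group distinct spellings that collapse to the same canonical form."""
    groups: dict[str, set[str]] = defaultdict(set)
    for w in words:
        groups[canonical(w)].add(w)
    return [sorted(g) for g in groups.values() if len(g) > 1]


print(hidden_chars("wo\u200brd"))  # ['U+200B ZERO WIDTH SPACE']
print(near_duplicates(["Word", "word", "tea"]))  # [['Word', 'word']]
```

Words flagged this way are good candidates to copy into a separate one-word file for testing in isolation.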
