The Method Hebrew.no_niqqud is misleading #21
Comments
Oh wow, thank you! This is really valuable feedback! As for adding a method that intelligently adds letters, this will need some study. If we claim to do this, we have to do it correctly. Are there other replacements besides the ones you mentioned? Are there cases where you would not add a letter? And so on. This should definitely be possible, thanks for bringing this up!
Hi Avi! Thank you for the kind words. I'll make sure to add the new
As for the Ktiv Male rules, you can find all of them in this link. As you can see, some of them are really simple (like changing every Kubutz to Vav), and some of them are more complicated (like when to add Vav for the O vowel). If we are to implement this method, I think we'll have to do it step by step. That means some of the rules will not be implemented right away, and we'll have some gaps for a while. Let me know what you think.
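To make the simplest rule mentioned above concrete, here is a minimal sketch of the Kubutz-to-Vav replacement followed by plain stripping. The helper name `kubutz_to_vav` and the simplified niqqud set are assumptions for illustration and are not part of the library; none of the other Ktiv Male rules are handled.

```python
import re

KUBUTZ = "\u05BB"  # HEBREW POINT QUBUTS
VAV = "\u05D5"     # HEBREW LETTER VAV

# Assumed, simplified niqqud set: points U+05B0-U+05BC, shin/sin dots, Kamatz Katan.
NIQQUD = re.compile("[\u05B0-\u05BC\u05C1\u05C2\u05C7]")


def kubutz_to_vav(text: str) -> str:
    """Hypothetical helper: replace every Kubutz with Vav, then strip the remaining niqqud."""
    return NIQQUD.sub("", text.replace(KUBUTZ, VAV))


# "shulchan" (table) with niqqud:
# shin, shin dot, kubutz, lamed, sheva, chet, kamatz, final nun
word = "\u05E9\u05C1\u05BB\u05DC\u05B0\u05D7\u05B8\u05DF"
print(kubutz_to_vav(word))  # שולחן
```

Because the replacement happens before the stripping step, the new Vav takes the position of the Kubutz mark, directly after the letter that carried it.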
I don't think it's proper to publish a method that does not do everything it claims; as users, we would be quite upset at that. I'm afraid it's all or nothing on the letter replacement. But there is no reason not to roll out the deprecation separately. Thank you so much for the Ktiv Male rule source, looks like we have an excellent place to pull some unit tests from 😀 It'll take some careful design to implement that pattern, with consideration for things like the other non-letter characters that are not niqqud, such as the trop. Can we maintain those characters while adding letters? Should those characters be moved over to the new letter in some cases (in which case it likely will not work to keep them)? Efficiency needs to be considered, and certainly the grapheme characters need to be considered throughout.
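The question of keeping the trop while removing niqqud can be explored with a small sketch. It classifies marks by their Unicode character names, which is an assumption made here for illustration (for example, Meteg is named "HEBREW POINT METEG" and would be treated as niqqud); the helper names are hypothetical and not part of the library.

```python
import unicodedata


def is_niqqud(ch: str) -> bool:
    """Assumption: treat every code point named 'HEBREW POINT ...' as niqqud."""
    return unicodedata.name(ch, "").startswith("HEBREW POINT")


def is_taam(ch: str) -> bool:
    """Assumption: treat every code point named 'HEBREW ACCENT ...' as a cantillation mark."""
    return unicodedata.name(ch, "").startswith("HEBREW ACCENT")


def strip_niqqud_keep_taamim(text: str) -> str:
    """Hypothetical helper: drop niqqud only; letters, taamim and punctuation are kept."""
    return "".join(ch for ch in text if not is_niqqud(ch))


# A word with niqqud and one cantillation mark (U+0596, HEBREW ACCENT TIPEHA) on the shin:
# bet, sheva, dagesh, resh, tsere, alef, shin, shin dot, hiriq, tipeha, yod, tav
word = "\u05D1\u05B0\u05BC\u05E8\u05B5\u05D0\u05E9\u05C1\u05B4\u0596\u05D9\u05EA"
stripped = strip_niqqud_keep_taamim(word)
print([f"U+{ord(ch):04X}" for ch in stripped])
```

In this case the accent stays directly after its base letter, so it still renders on the shin. If a Ktiv Male rule inserted a new letter between a base letter and its accent, the accent would attach to the inserted letter instead, which is exactly the design question raised above.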
I'll admit that I have zero knowledge about Taamim, so I'm not sure how to answer your question, and I don't want to say anything that might mislead you.
As for your comment about the all-or-nothing approach with the Kamatz can be either Kamatz Gadol, which is pronounced as the A sound (meaning that no added letter is needed), or Kamatz Katan, which is pronounced as the O sound (meaning we need to add Vav after the previous letter). While there are two different characters for Kamatz Gadol (Unicode: U+05B8) and Kamatz Katan (Unicode: U+05C7), I'm afraid some users may use the Kamatz Gadol character for ALL Kamatz appearances, which will lead to wrong results in the
It would be a reasonable decision to avoid implementing the Ktiv Male method altogether, as it is not an easy method to implement, especially if we're in an all-or-nothing situation. I leave it to you, as the owner of the library, to decide whether to go through with it or not. If you choose to go ahead, I'll do my best to help you with that.
P.S. If you still live in Florida, you and I are in the same time zone. Maybe it would be easier to schedule a Zoom meeting between the two of us and discuss all of the possible issues we might face implementing Ktiv Male. Feel free to email me so we can talk :)
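The Kamatz ambiguity can be illustrated with the two code points named above. This is only a sketch with hypothetical names: it counts which Kamatz characters a text uses, and it cannot decide whether a U+05B8 was meant as Kamatz Katan.

```python
KAMATZ = "\u05B8"        # HEBREW POINT QAMATS (Kamatz Gadol, but often typed for every Kamatz)
KAMATZ_KATAN = "\u05C7"  # HEBREW POINT QAMATS QATAN


def kamatz_counts(text: str) -> dict:
    """Hypothetical helper: count how often each Kamatz code point appears."""
    return {"kamatz": text.count(KAMATZ), "kamatz_katan": text.count(KAMATZ_KATAN)}


# "chochma" (wisdom); its first vowel is a Kamatz Katan, typed two different ways:
# chet, kamatz, kaf, sheva, mem, kamatz, he
typed_with_generic_kamatz = "\u05D7\u05B8\u05DB\u05B0\u05DE\u05B8\u05D4"
typed_with_kamatz_katan = "\u05D7\u05C7\u05DB\u05B0\u05DE\u05B8\u05D4"

print(kamatz_counts(typed_with_generic_kamatz))  # {'kamatz': 2, 'kamatz_katan': 0}
print(kamatz_counts(typed_with_kamatz_katan))    # {'kamatz': 1, 'kamatz_katan': 1}
```

Only the second spelling tells a program that a Vav belongs after the first letter; with the first spelling, a rule keyed on U+05C7 would silently leave the Vav out.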
Hi there!
Thank you for this awesome library. It is very useful! I'm currently using it to create a machine-learning model for automatic niqqud.
I looked at the method Hebrew.no_niqqud, and its current form is very misleading. This method strips all of the niqqud characters from the word, but stripping alone doesn't produce the correct spelling of the word without niqqud.
Here is an example:
Look at the word אֹהֶל (tent). If I want to write this word without niqqud, I need to add the letter ו to the word: אוהל.
Currently, the function Hebrew.no_niqqud will turn the word into אהל, which is an incorrect result.
My suggestion:
Rename the method Hebrew.no_niqqud to Hebrew.strip_niqqud. This is a much more accurate and less misleading name for the method.
After that, create a new function named Hebrew.ctiv_male (full writing) that removes niqqud smartly, adding ו and י (vav and yud) whenever needed.
Let me know what you think about the idea!
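A minimal sketch of the rename, written as standalone functions rather than the library's actual methods: `strip_niqqud` does the plain stripping, and `no_niqqud` stays available as a deprecated alias. The niqqud set and the warning text are assumptions for illustration.

```python
import re
import warnings

# Assumed, simplified niqqud set: points U+05B0-U+05BC, shin/sin dots, Kamatz Katan.
NIQQUD = re.compile("[\u05B0-\u05BC\u05C1\u05C2\u05C7]")


def strip_niqqud(text: str) -> str:
    """Remove niqqud characters only; no letters are added."""
    return NIQQUD.sub("", text)


def no_niqqud(text: str) -> str:
    """Deprecated alias kept for backward compatibility."""
    warnings.warn(
        "no_niqqud is deprecated; use strip_niqqud instead",
        DeprecationWarning,
        stacklevel=2,
    )
    return strip_niqqud(text)


ohel = "\u05D0\u05B9\u05D4\u05B6\u05DC"  # alef, holam, he, segol, lamed
print(strip_niqqud(ohel))  # אהל (the full spelling אוהל would need an added vav)
```

Keeping the old name as a thin wrapper lets existing callers keep working while the warning points them to the more accurate name.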