question: is it true / on purpose that a text is "corrected" to modern standard Yiddish? #85
Comments
And now I have found it in Zalmen Rejzen's Leksikon, Vol. 2, col. 314 as well! װוּנטש is written with an aleph in the original! |
I also stumbled over the 'correction' of adding a khirek under the second yud in the word yidish. Neither Rejzen nor Weinreich has it. This, too, is a correction from a modern point of view that does not reflect the original. |
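A minimal sketch of how such niqqud-only changes could be flagged automatically, assuming one has both the spelling as printed and the spelling in the OCR output. This is not Jochre's code; the function names `strip_marks` and `differs_only_in_niqqud` are hypothetical, and the check only relies on Python's standard `unicodedata` module.

```python
import unicodedata


def strip_marks(word: str) -> str:
    """Return the word without combining marks (niqqud such as khirek or pasekh)."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def differs_only_in_niqqud(printed: str, ocr: str) -> bool:
    """True when the two spellings have the same letters but different pointing."""
    identical = unicodedata.normalize("NFD", printed) == unicodedata.normalize("NFD", ocr)
    return not identical and strip_marks(printed) == strip_marks(ocr)


# "yidish" without and with a khirek (U+05B4) under the second yud.
printed_yidish = "\u05d9\u05d9\u05d3\u05d9\u05e9"
ocr_yidish = "\u05d9\u05d9\u05b4\u05d3\u05d9\u05e9"
khirek_added = differs_only_in_niqqud(printed_yidish, ocr_yidish)

# A changed letter (final gimel vs. kuf) is a different kind of edit.
letter_changed = differs_only_in_niqqud("\u05d3\u05d9\u05d2", "\u05d3\u05d9\u05e7")
```

A check like this only separates pointing changes from letter changes; it cannot tell which of the two spellings is actually on the page.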
I came here to report this exact issue. I think it's absolutely wrong for anyone to change the orthography. I can see the need to correct where the OCR has made a mistake, either in encoding or interpretation or both. But to change to some other orthography, modern standard or otherwise, is absolutely wrong. The whole purpose is to recognize text as written accurately. I've come across examples of this and found them very disconcerting. (I'll try to send one of my own if I find it.) Where is there a set of conventions and rules for editing the OCR output? Who monitors it? |
It is true, and it is wrong to do so. This will be corrected in the next version of Jochre. The original text will remain exactly as it was. The conventions and rules are simple: you write exactly what is on the page, including typographical errors (if there are any), and including full niqqud as it appears on the printed page. The fixes are applied automatically, but we can easily undo them (including all fixes from a given user) if we find the user is over-fixing or fixing wrongly. However, in the current case, it was Jochre itself that was over-fixing. |
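One way to picture "fixes are applied automatically but can be undone per user" is a log of fixes that is replayed over the untouched OCR output. The sketch below is only an illustration of that idea, not a description of how Jochre stores corrections; the `Fix` record, the `apply_fixes` function, the user names, and the Latin placeholder words are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fix:
    user: str       # who submitted the fix
    word_id: int    # which OCR word it targets
    new_text: str   # the text the user typed


def apply_fixes(ocr_words, fixes, reverted_users=frozenset()):
    """Replay fixes in order over a copy of the OCR output, skipping reverted users."""
    words = dict(ocr_words)
    for fix in fixes:
        if fix.user not in reverted_users:
            words[fix.word_id] = fix.new_text
    return words


ocr_words = {0: "teh", 1: "page"}
fixes = [Fix("careful_user", 0, "the"), Fix("over_fixer", 1, "Page")]

all_applied = apply_fixes(ocr_words, fixes)
without_over_fixer = apply_fixes(ocr_words, fixes, reverted_users={"over_fixer"})
```

Because the OCR output itself is never modified, undoing a user's fixes amounts to replaying the log without them.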
Just read #85 (comment), after I just noticed this in Yudel Mark's Heft far Yidish (https://archive.org/details/nybc204715/page/20/mode/2up). I was moved to make one correction, וווּ => וואו, but it would be a slog to go through each case. Will this book, or ones like it, ever be automatically rescanned? Can that be done? Of course, I could see it being a huge waste of work if books get rescanned and actually valid corrections get thrown away. On the other hand, it's too onerous to go through by hand to make all these corrections. |
@markhdavid All of the books will be re-analyzed using the new version of Jochre (currently being written). We've made good progress, but it isn't yet ready. I say "re-analyzed" and not "re-scanned", since there is no plan to re-digitize the books, only to re-analyze the digital content using the OCR software. The plan is to re-ocr everything, and then to re-apply the user corrections. So no: there is no need to manually correct everything. |
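The "re-OCR, then re-apply the user corrections" plan could be sketched as follows. Since the page images are not re-digitized, this sketch assumes each correction can be matched to a newly recognized word by its position on the page. That matching strategy, the `Box`, `Word`, and `Correction` types, and the 0.5 overlap threshold are assumptions made for illustration, not Jochre's actual design.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    left: int
    top: int
    right: int
    bottom: int


@dataclass(frozen=True)
class Word:
    page: int
    box: Box
    text: str


@dataclass(frozen=True)
class Correction:
    page: int
    box: Box
    corrected_text: str


def overlap_ratio(a: Box, b: Box) -> float:
    """Intersection-over-union of two boxes; 0.0 when they do not intersect."""
    width = min(a.right, b.right) - max(a.left, b.left)
    height = min(a.bottom, b.bottom) - max(a.top, b.top)
    if width <= 0 or height <= 0:
        return 0.0
    intersection = width * height
    area_a = (a.right - a.left) * (a.bottom - a.top)
    area_b = (b.right - b.left) * (b.bottom - b.top)
    return intersection / (area_a + area_b - intersection)


def reapply(words, corrections, min_overlap=0.5):
    """Replace each new OCR word with the best-overlapping correction on the same page."""
    result = []
    for word in words:
        best, best_score = None, min_overlap
        for correction in corrections:
            if correction.page != word.page:
                continue
            score = overlap_ratio(word.box, correction.box)
            if score >= best_score:
                best, best_score = correction, score
        text = best.corrected_text if best else word.text
        result.append(Word(word.page, word.box, text))
    return result


# New OCR output for one line, plus one stored correction made on the old output.
words = [Word(20, Box(100, 50, 160, 80), "old"), Word(20, Box(170, 50, 230, 80), "next")]
corrections = [Correction(20, Box(102, 51, 161, 79), "fixed")]
fixed = reapply(words, corrections)
```

Words with no sufficiently overlapping correction keep the new OCR text, so only positions a user actually corrected are overwritten.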
Dear Assaf and Mark (@markhdavid <https://github.com/markhdavid>),
Many thanks for bringing this up again and for working on this problem!
We are looking forward to the new version!
(And it is good that you answer, en passant, the question about training the program from our corrections.)
Best,
Mirjam
Mirjam Gutschow
On Sun, Sep 10, 2023 at 4:11 PM Assaf Urieli wrote:
@markhdavid <https://github.com/markhdavid> All of the books will be re-analyzed using the new version of Jochre (currently being written). We've made good progress, but it isn't yet ready.
I say "re-analyzed" and not "re-scanned", since there is no plan to re-digitize the books, only to re-analyze the digital content using the OCR software.
The plan is to re-ocr everything, and then to re-apply the user corrections. So no: there is no need to manually correct everything.
We will also try to learn from the manual user corrections, but that's a later phase.
|
I just went over a section of nybc200407, p. 11f. I noticed that some words were reproduced in a standardized form, whereas the Yiddish source clearly uses a spelling other than the modern YIVO klal spelling. Precisely here it is important for today's reader to know that even the great Max Weinreich's texts were first printed in a different spelling. Two examples: vu with melupn-vov instead of the original tsvey-vovn - aleph - vov. (https://bit.ly/3wARw8A)
I also noticed other instances where a gimel at the end of a word was "corrected" to a kuf, for example in the word באַװײַזנדיג (https://bit.ly/3wDtEkW)
(There are other issues here too: « » is not recognized, text in Latin letters is not recognized, and there are spaces before punctuation marks. As for correcting in Jochre, it is tedious that the text cannot be scrolled forward, i.e. it is hardly possible to correct larger portions of text.)
see screenshot