question: is it true / on purpose that a text is "corrected" to modern standard Yiddish? #85

mirjam-amsterdam · 2022-08-28T08:54:35Z

I just went over a section of nybc200407, p. 11f. I noticed that some words are reproduced in a standardized form, whereas the Yiddish source clearly uses a spelling that differs from the modern YIVO klal spelling. This is exactly where it matters for a reader nowadays to know that even the great Max Weinreich's texts were first printed in a different spelling. Two examples. First, vu appears with melupn-vov instead of the original tsvey-vovn, aleph, vov. (https://bit.ly/3wARw8A)
Second, I noticed some other instances where a letter gimel at the end of a word was "corrected" into a kuf, for example in the word באַװײַזנדיג
(https://bit.ly/3wDtEkW)
(There are other issues here too: « » is not recognized, Latin-lettered text is not recognized, and there are spaces before punctuation marks. As for correcting in Jochre, it is tedious that the text cannot be scrolled forward, i.e. it is hardly possible to correct larger portions of text.)
see screenshot

mirjam-amsterdam · 2022-08-29T07:44:25Z

And now I found it in Zalmen Rejzen's Leksikon, vol. 2, col. 314, as well! װוּנטש is written with an alef in the original!
If all dots in the letters beys and kaf are kept as in the original, the alef has to be kept as well!

mirjam-amsterdam · 2022-08-29T07:47:37Z

I also stumbled over the 'correction' that adds a khirek under the second yud in the word yidish. Neither Rejzen nor Weinreich has it. This is also a correction from a modern point of view, not reflecting the original.

mirjam-amsterdam · 2022-09-18T12:06:06Z

At least it is not only Weinreich and Rejzen who get corrected for their old-fashioned spelling; it happens to Der Pinkes as well.

markhdavid · 2023-03-30T19:40:34Z

I came here to report this exact issue. I think it's absolutely wrong for anyone to change the orthography. I can see the need to correct where the OCR has made a mistake, either in encoding or interpretation or both. But to change to some other orthography, modern standard or otherwise, is absolutely wrong. The whole purpose is to recognize text as written accurately. I've come across examples of this and found them very disconcerting. (I'll try to send one of my own if I find it.) Where is there a set of conventions and rules for editing the OCR output? Who monitors it?

urieli · 2023-04-26T12:22:33Z

It is true and it is wrong to do so.

This will be corrected in the next version of Jochre.

The original text will remain exactly as it was.
We will make an attempt to guess what was meant in YIVO spelling, and store this as a hidden synonym, to facilitate the search mechanism (so that a search for "װוּ" will still return "װאו").
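
As a rough illustration of the hidden-synonym idea (a minimal sketch, not Jochre's actual code): the page text is stored exactly as printed, and each token is additionally indexed under a guessed YIVO form. The names `guess_yivo` and `SearchIndex`, and the two rules shown, are hypothetical and chosen only for this example.

```python
from collections import defaultdict

# Code points are written as escapes to avoid bidi display issues.
ORIGINAL_VU = "\u05f0\u05d0\u05d5"  # tsvey-vovn + alef + vov, as printed
YIVO_VU = "\u05f0\u05d5\u05bc"      # tsvey-vovn + melupn-vov, YIVO spelling
YUD, GIMEL, KUF = "\u05d9", "\u05d2", "\u05e7"

# Hypothetical rule set, for illustration only.
WHOLE_WORD_GUESSES = {ORIGINAL_VU: YIVO_VU}


def guess_yivo(token):
    """Guess a YIVO spelling for a token; return it unchanged if no rule applies."""
    if token in WHOLE_WORD_GUESSES:
        return WHOLE_WORD_GUESSES[token]
    if token.endswith(YUD + GIMEL):  # word-final yud-gimel -> yud-kuf
        return token[:-1] + KUF
    return token


class SearchIndex:
    def __init__(self):
        self.pages = {}                   # page_id -> tokens exactly as printed
        self.postings = defaultdict(set)  # search key -> {(page_id, position)}

    def add_page(self, page_id, tokens):
        self.pages[page_id] = list(tokens)  # the original text is never altered
        for pos, token in enumerate(tokens):
            self.postings[token].add((page_id, pos))
            guess = guess_yivo(token)
            if guess != token:
                # Hidden synonym: searchable, but never shown as the text.
                self.postings[guess].add((page_id, pos))

    def search(self, query):
        """Return (page_id, position, original_token) for every hit."""
        hits = sorted(self.postings.get(query, ()))
        return [(p, i, self.pages[p][i]) for p, i in hits]
```

With this layout, a query in either spelling finds the same position, and what comes back is always the token as it stands on the page.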

The set of conventions and rules is simple: you write exactly what is on the page, including typographical errors (if there are any), and including full niqqud as it appears on the printed page.

The fixes are automatically applied, but we can easily undo them (including all fixes from a given user) if we find the user is over-fixing or fixing wrongly.

However, in the current case, it's Jochre itself that was over-fixing.

markhdavid · 2023-09-09T20:06:36Z

Just read #85 (comment), after I just noticed this in Yudel Mark's Heft far Yidish (https://archive.org/details/nybc204715/page/20/mode/2up). I was moved to make one correction, וווּ => וואו, but it would be a slog to go through each case. Will this book, or ones like it, ever be automatically rescanned? Can that be done? Of course, I could see it being a huge waste of work if books get rescanned and actually valid corrections get thrown away. On the other hand, it's too onerous to go through by hand to make all these corrections.

urieli · 2023-09-10T14:11:23Z

@markhdavid All of the books will be re-analyzed using the new version of Jochre (currently being written). We've made good progress, but it isn't yet ready.

I say "re-analyzed" and not "re-scanned", since there is no plan to re-digitize the books, only to re-analyze the digital content using the OCR software.

The plan is to re-ocr everything, and then to re-apply the user corrections. So no: there is no need to manually correct everything.
We will also try to learn from the manual user corrections, but that's a later phase.
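
To make the "re-OCR, then re-apply" step concrete, here is a minimal sketch under stated assumptions; it is not how Jochre actually stores or replays corrections. It assumes each correction is keyed by page number and a word bounding box, and that the new OCR pass yields words at the same keys. `Correction`, `reapply_corrections`, and `excluded_users` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Correction:
    page: int
    box: tuple[int, int, int, int]  # assumed key: word bounding box on the page image
    corrected: str                  # the text exactly as printed, entered by a user
    user: str


def reapply_corrections(new_ocr, corrections, excluded_users=frozenset()):
    """Overlay stored user corrections on freshly re-analyzed OCR output.

    new_ocr maps (page, box) -> recognized word. Corrections from users in
    excluded_users are skipped, which models undoing all fixes from a user.
    """
    result = dict(new_ocr)  # leave the raw OCR output untouched
    for c in corrections:
        if c.user in excluded_users:
            continue
        key = (c.page, c.box)
        if key in result:  # a correction with no matching word is not applied
            result[key] = c.corrected
    return result
```

A real pipeline would need a more tolerant match than exact box equality, since a new OCR version may segment words slightly differently.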

mirjam-amsterdam · 2023-09-10T18:49:11Z

Dear Assaf and Mark,

Many thanks for bringing this up again and for working on this problem! We are looking forward to the new version! (And it is good that you answer, en passant, the question about training the program from our corrections.)

Best,
Mirjam

