This repository has been archived by the owner on Jul 26, 2024. It is now read-only.

Incomplete Word, Nothing to Correct - there should be a way to flag such cases #116

Open
markhdavid opened this issue Dec 27, 2023 · 0 comments


There should be a way to flag a word whose problem goes beyond a few misread characters, because part of the word is missing from the image entirely.

Here's a case of an incomplete word to correct. It's on this page:

https://archive.org/details/doslidfundemyidi00rose/page/n72/mode/1up

This is the OCR for the same:

https://ocr.yiddishbookcenter.org/contents?doc=doslidfundemyidi00rose#page73

It shows a fragment of a word, supposedly עט, but this is in fact just the last two letters of the word. The entire word on the page is אַרבעט, but the image shown for correction contains only the last two letters. So how can this be corrected? There should be a way to flag this "word" as needing to be rescanned completely. It would make no sense to correct the image of just "עט" to "אַרבעט".

Here are images:

The word in context, with the actual entire word surrounded in red and the fragment mistaken for an entire word highlighted in gray:
[Screenshot: bad fragment in context]

The correction dialog for this fragment of the word:
[Screenshot: bad fragment in the correction dialog]

OK, I see the instruction in the correction dialog:

אױב אַ װאָרט איז שלעכט סעגמענטירט (ד“ה אױב נאָר אַ טײל פֿונעם װאָרט באַװײַזט זיך אױבן), טאָר מען עס נישט אױסבעסערן.

(Translation: "If a word is badly segmented, i.e., if only a part of the word shows up above, you must not correct it.") But then what are you supposed to do? There should be a way to flag such cases so that they can get corrected. What's the plan?
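
To make the request concrete, here is a rough sketch of what such a flag might look like on the data side. I don't know how the OCR site actually stores corrections, so every name here (`WordStatus`, `WordReview`, `flag_bad_segmentation`, `needs_resegmentation`) is hypothetical. The idea is only that a fragment gets a "bad segmentation" status instead of a text correction, and that flagged items can be collected later for re-segmentation.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class WordStatus(Enum):
    """Hypothetical review states for one OCR word image."""
    UNREVIEWED = "unreviewed"
    CORRECTED = "corrected"
    BAD_SEGMENTATION = "bad_segmentation"  # image does not cover the whole word


@dataclass
class WordReview:
    """Hypothetical record for one word shown in the correction dialog."""
    doc_id: str
    page: int
    ocr_text: str
    status: WordStatus = WordStatus.UNREVIEWED
    corrected_text: Optional[str] = None
    note: Optional[str] = None


def flag_bad_segmentation(review: WordReview, note: Optional[str] = None) -> WordReview:
    """Mark the word as badly segmented without storing any text correction."""
    review.status = WordStatus.BAD_SEGMENTATION
    review.corrected_text = None  # a fragment must not be "corrected" to the full word
    review.note = note
    return review


def needs_resegmentation(reviews: List[WordReview]) -> List[WordReview]:
    """Collect every flagged word so it can be re-segmented later."""
    return [r for r in reviews if r.status is WordStatus.BAD_SEGMENTATION]


# Usage with the fragment from this issue (doc id and page taken from the OCR URL above).
fragment = WordReview(doc_id="doslidfundemyidi00rose", page=73, ocr_text="עט")
flag_bad_segmentation(fragment, note="only the last two letters of אַרבעט")
queue = needs_resegmentation([fragment])
```

The point of keeping `corrected_text` empty is that the flag records a segmentation problem, not a reading of the word, which matches the instruction in the dialog not to correct such fragments.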
