Detects correct rotation/degrees, but fails to read the text #940
Comments
Thanks for opening this issue and providing a sample document. I was able to replicate the problem using the provided code and image. I am not sure why this is happening, but it appears to be a bug inherited from the main Tesseract codebase rather than something introduced in the Tesseract.js repo: I tested with my local version of the Tesseract CLI and saw the same behavior. As a path forward, we should check the existing issues on the Tesseract GitHub page to see whether this has already been reported. I would assume a bug this notable has been reported at some point. The possible outcomes as they pertain to Tesseract.js are:
I am not sure what the root cause is, but I tested this image at various angles, and it appears to recognize correctly at 0 degrees and 90 degrees, but incorrectly at 180 and 270 degrees. Therefore, it appears that orientation is sometimes working as intended, which makes this more perplexing.
Hi Balearica, did you already file an issue with the main Tesseract codebase, or did you find a similar issue there? Curious to know what you found out.
@apexkid I have not made any progress since writing the post above.
Tesseract.js version (version number for npm/GitHub release, or specific commit for repo)
https://cdn.jsdelivr.net/npm/tesseract.js@5/dist/tesseract.min.js
Describe the bug
When uploading pictures that are rotated to the left, to the right, or upside down, TesseractJS successfully reports 90, 180 or 270 degrees in most cases. But when reading the text, it only writes "rubbish" like this:
The same image uploaded with correct rotation gives this output, which is great:
To Reproduce
Save the following code in a .html file, and load it in the browser. Upload the image below.
Image used:
Expected behavior
Uploading the image gives this:
osdAngle: 270 (degrees)
autoRotateAngle: 0 (degrees)
totalAngle: 270 (degrees)
Which is the correct angle. But shouldn't TesseractJS be able to rotate the image to 0 degrees and then perform OCR to get more accurate output?
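Until the underlying bug is resolved, one possible workaround is to rotate the image upright yourself, using the detected angle, before running recognition. The sketch below only illustrates the rotation arithmetic on a plain row-major pixel buffer. The function names (`rotateClockwise90`, `rotateQuarterTurns`, `correctionTurns`) are hypothetical helpers and not part of Tesseract.js. It also assumes that the reported angle is the clockwise rotation of the page content, so the correction is `360 - angle`. If the library uses the opposite convention, the correction for 90 and 270 degrees would be swapped.

```javascript
// Hypothetical helpers, not part of Tesseract.js.
// Pixels are stored row-major, one value per pixel.

// Rotate a buffer 90 degrees clockwise.
function rotateClockwise90(pixels, width, height) {
  const out = new Array(pixels.length);
  for (let y = 0; y < height; y++) {
    for (let x = 0; x < width; x++) {
      // Source (x, y) moves to column (height - 1 - y), row x.
      // Each row of the rotated image is `height` pixels wide.
      out[x * height + (height - 1 - y)] = pixels[y * width + x];
    }
  }
  return { pixels: out, width: height, height: width };
}

// Rotate clockwise by a number of quarter turns (negative values allowed).
function rotateQuarterTurns(pixels, width, height, turns) {
  let image = { pixels: Array.from(pixels), width, height };
  const n = ((turns % 4) + 4) % 4;
  for (let i = 0; i < n; i++) {
    image = rotateClockwise90(image.pixels, image.width, image.height);
  }
  return image;
}

// Assumption: `reportedAngle` is how far the content has been turned
// clockwise, so (360 - angle) more degrees clockwise brings it upright.
function correctionTurns(reportedAngle) {
  if (!Number.isInteger(reportedAngle) || reportedAngle % 90 !== 0) {
    throw new RangeError("angle must be a multiple of 90 degrees");
  }
  const normalized = ((reportedAngle % 360) + 360) % 360;
  return ((360 - normalized) % 360) / 90;
}

// Example: a 3x2 image reported at 270 degrees needs one clockwise turn.
const upright = rotateQuarterTurns([1, 2, 3, 4, 5, 6], 3, 2, correctionTurns(270));
console.log(upright.width, upright.height, upright.pixels);
```

In a browser, the same correction would typically be applied by drawing the uploaded image onto a rotated canvas and passing that canvas to the recognizer. Whether this avoids the bad output at 180 and 270 degrees has not been verified in this thread.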
Device Version: