Tesseract fails on MacOS when passed an absolute path to files in /tmp #4333

Open
philmcmahon opened this issue Oct 28, 2024 · 7 comments

philmcmahon commented Oct 28, 2024

Current Behavior

On my M2 Mac, when I run tesseract on the file /tmp/sample.png, I get a Leptonica 'image file not found' error:

➜  / tesseract /tmp/sample.png /tmp/out
Error in fopenReadStream: failed to open locally with tail sample.png for filename /tmp/sample.png
Leptonica Error in findFileFormat: image file not found: /tmp/sample.png
Error in fopenReadStream: failed to open locally with tail �PNG for filename �PNG
Leptonica Error in pixRead: image file not found: �PNG
Image file �PNG cannot be read!
Error during processing.

If I pass a relative path instead of the absolute path, it works fine:

cd /
tesseract tmp/sample.png /tmp/out
<works>

Similarly, passing an absolute path to e.g. a file in my home directory doesn't cause any problems. This problem does not occur on the Ubuntu EC2 instance I tested it on.

A sample PNG is linked in the original issue, but this applies to every PNG file I've tested.

Expected Behavior

I would expect tesseract to work when passed absolute paths to files in the /tmp directory.

Suggested Fix

No response

tesseract -v

tesseract 5.4.1
leptonica-1.85.0
libgif 5.2.2 : libjpeg 8d (libjpeg-turbo 3.0.4) : libpng 1.6.44 : libtiff 4.7.0 : zlib 1.2.12 : libwebp 1.4.0 : libopenjp2 2.5.2
Found NEON
Found libarchive 3.7.7 zlib/1.2.12 liblzma/5.6.3 bz2lib/1.0.8 liblz4/1.10.0 libzstd/1.5.6
Found libcurl/8.6.0 SecureTransport (LibreSSL/3.3.6) zlib/1.2.12 nghttp2/1.61.0

Operating System

macOS 14 Sonoma

Other Operating System

14.5

uname -a

Darwin 31814.gnm.int 23.5.0 Darwin Kernel Version 23.5.0: Wed May 1 20:14:38 PDT 2024; root:xnu-10063.121.3~5/RELEASE_ARM64_T6020 arm64

Compiler

No response

CPU

Apple Silicon M2 Max

Virtualization / Containers

No response

Other Information

No response

philmcmahon changed the title from "Tesseract fails when passed an absolute path to files in /tmp" to "Tesseract fails on MacOS when passed an absolute path to files in /tmp" on Oct 28, 2024
@stweil
Member

stweil commented Oct 28, 2024

That's a "feature" of Leptonica for macOS and Windows. An easy workaround is using //tmp or /./tmp instead of /tmp because these variants don't trigger Leptonica's internal translation.
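The workaround above can be scripted. The following is only a minimal sketch, assuming that a leading "/tmp/" is what triggers the translation, as described in this comment. The function name leptonica_safe_path is hypothetical and is not part of Tesseract or Leptonica.

```shell
# Hypothetical helper (not part of Tesseract or Leptonica).
# Rewrites a leading "/tmp/" to "//tmp/" so the path no longer starts with
# the prefix that, per the comment above, Leptonica translates internally.
# Any other path is printed unchanged.
leptonica_safe_path() {
  case "$1" in
    /tmp/*) printf '/%s\n' "$1" ;;
    *)      printf '%s\n' "$1" ;;
  esac
}

# Usage sketch (assumes tesseract is installed and /tmp/sample.png exists):
#   tesseract "$(leptonica_safe_path /tmp/sample.png)" /tmp/out
```

Only the input image path is rewritten here; the output base is passed through unchanged, as in the original report.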

@stweil
Member

stweil commented Oct 28, 2024

Maybe we should apply the workaround automatically for image names in Tesseract code. But then users might be confused about why Tesseract changes the image name, which is visible in log messages and some OCR output formats.

@philmcmahon
Author

Thanks @stweil - following your comment I found DanBloomberg/leptonica#735

Happy to close this or proceed as you think.

@stweil
Member

stweil commented Oct 28, 2024

This is a duplicate of issue #4233. I am afraid that people will have the same problem again and again, unless we provide either a more detailed error message or a fix.

@amitdo
Collaborator

amitdo commented Oct 28, 2024

CC: @DanBloomberg

@DanBloomberg

Yes, we've been here before! Several times, in fact. Trying to find something that works for MacOS has been the latest struggle in the "/tmp rewrite" saga. Within the last year we had a fairly extensive discussion about what to do with MacOS, and in the end we decided not to support its particular form of tmp directory.

Given how much effort has already gone into the rewrite code, perhaps we can consider two possible approaches:
(1) revisit the 'fix' we came up with about a year ago for MacOS
(2) put an error message in an appropriate place -- likely in Tesseract, but additionally in leptonica -- that offers a work-around for MacOS
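To illustrate what the hint in option (2) might say, here is a rough shell sketch. It is not Tesseract or Leptonica code; the function name tmp_rewrite_hint and the message wording are assumptions made for illustration only. The OS name is passed as an argument (for example, the output of uname -s) so the logic does not depend on the machine it runs on.

```shell
# Hypothetical helper (illustration only, not Tesseract or Leptonica code).
# $1: OS name as printed by `uname -s`, $2: image path given by the user.
# Prints a workaround hint only for macOS ("Darwin") paths under /tmp/,
# and prints nothing otherwise.
tmp_rewrite_hint() {
  case "$1:$2" in
    Darwin:/tmp/*)
      printf 'Hint: "%s" may be rewritten by Leptonica on macOS; try "//tmp/%s" instead.\n' \
        "$2" "${2#/tmp/}"
      ;;
  esac
}

# Usage sketch:
#   tmp_rewrite_hint "$(uname -s)" /tmp/sample.png
```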

@amitdo
Collaborator

amitdo commented Oct 29, 2024

+1 for option 2.
