
Getting error SEVERE: Cannot read JBIG2 image: jbig2-imageio is not installed #117

Open
pgarz opened this issue May 27, 2021 · 1 comment



pgarz commented May 27, 2021

Describe the bug

I'm getting the following error and stack trace when running pdftotree on a PDF that contains scientific chemical information:

SEVERE: Cannot read JBIG2 image: jbig2-imageio is not installed
[DEBUG] pdftotree.TreeExtract - Tabula recognized 0 table(s).
Traceback (most recent call last):
  File "/opt/anaconda3/envs/noble_app_env/bin/pdftotree", line 94, in <module>
    args.visualize,
  File "/opt/anaconda3/envs/noble_app_env/lib/python3.7/site-packages/pdftotree/core.py", line 66, in parse
    pdf_html = extractor.get_html_tree()
  File "/opt/anaconda3/envs/noble_app_env/lib/python3.7/site-packages/pdftotree/TreeExtract.py", line 319, in get_html_tree
    page.appendChild(table_element)
  File "/opt/anaconda3/envs/noble_app_env/lib/python3.7/xml/dom/minidom.py", line 114, in appendChild
    if node.nodeType == self.DOCUMENT_FRAGMENT_NODE:
AttributeError: 'NoneType' object has no attribute 'nodeType'
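The final `AttributeError` comes from `xml.dom.minidom`: `appendChild` reads `node.nodeType` on its argument, so passing `None` fails, and the traceback shows that `table_element` was `None` at that point in `get_html_tree`. A minimal sketch of a defensive guard, assuming the table-building step can return `None` when extraction fails; the helper name `append_if_present` is hypothetical and not part of pdftotree:

```python
from typing import Optional
from xml.dom import minidom


def append_if_present(parent: minidom.Element, child: Optional[minidom.Node]) -> bool:
    """Append child to parent only if child exists.

    Hypothetical helper: minidom's appendChild() dereferences child.nodeType,
    so appending None raises AttributeError, as in the traceback above.
    """
    if child is None:
        return False  # skip a table that could not be built
    parent.appendChild(child)
    return True


if __name__ == "__main__":
    doc = minidom.Document()
    page = doc.createElement("div")
    doc.appendChild(page)

    # Reproduce the original failure mode with an unguarded call.
    try:
        page.appendChild(None)
    except AttributeError as exc:
        print("unguarded:", exc)

    # With the guard, a missing table is skipped instead of crashing.
    table_element = None  # e.g. table extraction produced nothing
    print("appended:", append_if_present(page, table_element))
    print("children:", len(page.childNodes))
```

A guard like this only avoids the crash; any table that failed to extract would simply be missing from the output.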

I've installed the latest Java version for Mac OS X, and pdftotree seems to work fine on simple PDFs. I also haven't been able to figure out how to install jbig2-imageio manually; I'm not familiar with how to add that JAR file to the pdftotree installation.

To Reproduce
Steps to reproduce the behavior:

  1. Install the Java JDK for Mac OS X
  2. Install ImageMagick with brew
  3. Run hOCR extraction with pdftotree on a file containing images of chemical molecules
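To confirm that a failing file actually contains JBIG2-encoded images (the images behind the `SEVERE` message), one rough heuristic is to search the raw bytes for the `/JBIG2Decode` filter name, which PDF uses to mark JBIG2 image streams. This is a sketch under assumptions, not a PDF parser: it relies on stream dictionaries not being stored inside compressed object streams, it misses names written with `#xx` escapes, and the function names are hypothetical.

```python
import sys
from pathlib import Path

# PDF filter name that marks a JBIG2-compressed image stream.
JBIG2_FILTER = b"/JBIG2Decode"


def bytes_contain_jbig2(data: bytes) -> bool:
    """Heuristic: True if the raw PDF bytes mention the JBIG2Decode filter."""
    return JBIG2_FILTER in data


def has_jbig2_images(pdf_path: str) -> bool:
    """Read a PDF from disk and apply the byte-level heuristic."""
    return bytes_contain_jbig2(Path(pdf_path).read_bytes())


if __name__ == "__main__":
    # Usage: python check_jbig2.py file1.pdf file2.pdf ...
    for path in sys.argv[1:]:
        label = "contains JBIG2 images" if has_jbig2_images(path) else "no JBIG2 filter found"
        print(f"{path}: {label}")
```

If the simple PDFs that work report no JBIG2 filter while the chemistry PDF does, that points at the missing JBIG2 decoder rather than at the PDF layout itself.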

Expected behavior

The hOCR output should be generated correctly and the command should complete successfully.


Environment:

  • OS: Mac OS X 10.15
  • pdftotree Version: [e.g. v0.5.0]
  • pdfminer.six Version: [e.g. 20201018]


@redbrain

Same here, any updates on this?
