Is there a way to access the Image Input of the OCR? #696

AIMPED · 2025-01-07T13:51:00Z

Hello everyone!

Take this example:

from docling.document_converter import DocumentConverter

conv_res = DocumentConverter().convert("https://pdfobject.com/pdf/sample.pdf")
print(conv_res.model_dump()["pages"])

prints:

[{'page_no': 0, 'size': {'width': 612.0, 'height': 792.0}, 'cells': [{'id': 0, 'text': 'Sample PDF', 'bbox': {'l': 72.0, 't': 72.48399999999992, 'r': 252.648, . . .

My guess is that each PDF page was rendered to an image of the given size and that this image was used as input for the OCR.

I would like to draw the bounding boxes on this image, hence the question: can I access it anywhere?

TIA!

Answered by AIMPED

AIMPED · 2025-01-07T16:04:33Z

Well, I found a way to access the images, but is there a better way to do so?

from docling.datamodel.base_models import InputFormat
from docling.datamodel.pipeline_options import PdfPipelineOptions
from docling.document_converter import DocumentConverter, PdfFormatOption

pipeline_options = PdfPipelineOptions()
pipeline_options.generate_page_images = True  # keep a rendered image for every page

doc_converter = DocumentConverter(
    format_options={InputFormat.PDF: PdfFormatOption(pipeline_options=pipeline_options)}
)

conv_res = doc_converter.convert("https://pdfobject.com/pdf/sample.pdf")

# Collect the URI string of each page image.
image_strings = []
for page in conv_res.document.pages.values():
    image_strings.append(page.image.uri.unicode_string())
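
For the original goal of drawing the bounding boxes, here is a minimal sketch using Pillow only. It is not part of docling: `draw_bboxes` is a hypothetical helper, and the page image here is a blank stand-in rather than a real docling page image. It assumes the boxes use a top-left origin and the same units as the reported page size, and it rescales them in case the rendered image is not exactly 612 x 792 pixels.

```python
from PIL import Image, ImageDraw


def draw_bboxes(image, page_size, bboxes, color="red"):
    """Return a copy of `image` with each bounding box outlined.

    page_size: dict with 'width' and 'height' in page units.
    bboxes: iterable of dicts with 'l', 't', 'r', 'b' in page units,
            assumed (not verified) to use a top-left origin.
    """
    # The rendered image may be larger or smaller than the page, so rescale.
    scale_x = image.width / page_size["width"]
    scale_y = image.height / page_size["height"]

    annotated = image.convert("RGB")  # convert() returns a new image
    draw = ImageDraw.Draw(annotated)
    for bbox in bboxes:
        draw.rectangle(
            [bbox["l"] * scale_x, bbox["t"] * scale_y,
             bbox["r"] * scale_x, bbox["b"] * scale_y],
            outline=color,
            width=2,
        )
    return annotated


# Stand-in for a page image rendered at twice the 612 x 792 page size.
page_image = Image.new("RGB", (1224, 1584), "white")
cells = [{"l": 72.0, "t": 72.5, "r": 252.6, "b": 100.0}]  # made-up box
annotated = draw_bboxes(page_image, {"width": 612.0, "height": 792.0}, cells)
```

If the boxes turn out to use a bottom-left origin instead, `t` and `b` would need to be flipped against the page height before scaling.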
