need info working with `page.images` #1217

mratanusarkar · 2024-10-19T15:41:43Z

The current `page.images` dump looks like:

{'x0': 37.4602, 'y0': 180.816, 'x1': 53.6929, 'y1': 196.9833, 'width': 16.2327, 'height': 16.16730000000001, 'stream': <PDFStream(24254): raw=213, {'BitsPerComponent': 1, 'DecodeParms': {'Quality': 65}, 'Filter': /'JBIG2Decode', 'Height': 34, 'ImageMask': True, 'Intent': /'RelativeColorimetric', 'Length': 213, 'Subtype': /'Image', 'Type': /'XObject', 'Width': 34}>, 'srcsize': (34, 34), 'imagemask': True, 'bits': 1, 'colorspace': [None], 'mcid': None, 'tag': None, 'object_type': 'image', 'page_number': 33, 'top': 647.7407000000001, 'bottom': 663.908, 'doctop': 27679.82469999998}
{'x0': 56.6807, 'y0': 471.272, 'x1': 317.3447, 'y1': 795.7760000000001, 'width': 260.664, 'height': 324.5040000000001, 'stream': <PDFStream(145): raw=47341, {'BitsPerComponent': 8, 'ColorSpace': <PDFObjRef:74050>, 'Filter': /'JPXDecode', 'Height': 676, 'Intent': /'RelativeColorimetric', 'Length': 47341, 'Subtype': /'Image', 'Type': /'XObject', 'Width': 543}>, 'srcsize': (543, 676), 'imagemask': None, 'bits': 8, 'colorspace': [[/'Separation', /'Black', /'DeviceCMYK', {'C0': [0, 0, 0, 0], 'C1': [0, 0, 0, 1], 'Domain': [0, 1], 'FunctionType': 2, 'N': 1, 'Range': [0, 1, 0, 1, 0, 1, 0, 1]}]], 'mcid': None, 'tag': None, 'object_type': 'image', 'page_number': 33, 'top': 48.94799999999998, 'bottom': 373.45200000000006, 'doctop': 27081.03199999998}

I need help working with this and extracting the image data. I would like to export it to a PNG image or use Pillow. At this point, getting hold of the images in any format would work, and I can convert and use them as desired.

Could anyone help me get access to the image data from `page.images`? I am trying to extract and export all images, figures, and diagrams from each page of a PDF.
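For illustration, here is a minimal sketch of one possible way to do this. It is not a confirmed pdfplumber recipe. It takes each entry's `x0`, `top`, `x1`, `bottom` keys (as seen in the dump above), crops the page to that box with `page.crop`, and renders the crop to a PNG with `to_image`. The helper names `image_bbox` and `export_page_images`, the output file naming, and the default resolution are assumptions made up for this example.

```python
from pathlib import Path


def image_bbox(img, page_bbox):
    """Build an (x0, top, x1, bottom) box from a page.images entry.

    The box is clamped to page_bbox, since cropping to a box that
    extends past the page is expected to fail.
    """
    px0, ptop, px1, pbottom = page_bbox
    return (
        max(img["x0"], px0),
        max(img["top"], ptop),
        min(img["x1"], px1),
        min(img["bottom"], pbottom),
    )


def export_page_images(pdf_path, out_dir, resolution=150):
    """Render each image region of each page to a PNG; return the paths."""
    import pdfplumber  # imported here so image_bbox is usable without it

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    saved = []
    with pdfplumber.open(pdf_path) as pdf:
        for page in pdf.pages:
            for i, img in enumerate(page.images):
                bbox = image_bbox(img, page.bbox)
                # Skip boxes that end up empty after clamping.
                if bbox[2] <= bbox[0] or bbox[3] <= bbox[1]:
                    continue
                target = out_dir / f"page{page.page_number}_img{i}.png"
                # Rasterize only the cropped region of the page.
                page.crop(bbox).to_image(resolution=resolution).save(str(target))
                saved.append(target)
    return saved
```

Note that this rasterizes the page region rather than decoding the embedded stream, so the PNG reflects how the image is drawn on the page, not the original pixel data. The still-encoded bytes should be reachable through `img["stream"].get_rawdata()`, but JBIG2- or JPX-encoded data like the two entries above would need a separate decoder.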

#1207 helps a bit, but I am struggling with some errors and issues with that!

Some insight on this might even encourage me or someone else to write an image-handling class in pdfplumber.

Thanks!


jsvine · 2024-11-22T01:51:10Z

Hi @mratanusarkar — can you provide a minimal, runnable Python script and PDF that reproduces the errors you're encountering?

mratanusarkar added the `feature-request` label Oct 19, 2024

Repository owner deleted a comment Oct 24, 2024
