Bug
I have a 39-page document: 29 pages are in portrait orientation and the other 10 are in landscape. The text itself is upright (not rotated); only the page orientation differs. Docling doesn't read the landscape pages. All pages contain tables, and on the landscape pages the tables are not read correctly either. On the portrait pages, tables are read fine.
Steps to reproduce
Take a PDF file that mixes page orientations (some pages portrait, some landscape), then convert the PDF to Markdown.
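A minimal sketch of the conversion step, assuming the standard Docling Python entry point (`DocumentConverter` and `export_to_markdown`). The function name `pdf_to_markdown` and its optional `converter` argument are hypothetical helpers introduced here for illustration, and the file name in the usage note is a placeholder, not the reporter's document.

```python
from pathlib import Path


def pdf_to_markdown(pdf_path, converter=None):
    """Convert one PDF to Markdown with Docling and return the Markdown text.

    `converter` is a hypothetical hook: pass any object exposing
    `convert(path)` to swap in a stub; by default a Docling
    DocumentConverter is created.
    """
    if converter is None:
        # Imported lazily so this module still loads where Docling is absent.
        from docling.document_converter import DocumentConverter

        converter = DocumentConverter()

    # Convert the file, then export the resulting document as Markdown.
    result = converter.convert(Path(pdf_path))
    return result.document.export_to_markdown()
```

Calling `pdf_to_markdown("mixed_orientation.pdf")` on a file with both portrait and landscape pages should be enough to compare the output for the two kinds of pages.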
Docling version
2.8.3
Python version
3.10.14
Hi Nikos,
I have the same problem.
Here is an example of a landscape PDF. Some pages work fine, others do not work at all. Marketing.pdf
After checking closer, @JeandeBalzac your issue does not appear to be connected to portrait layout. It is simply because there are many elements identified as figures, and these will export as bitmap resources in the markdown / HTML. The contained text elements of figures are in the JSON representation of the DoclingDocument but not exported to the other formats by default.
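As a rough illustration of recovering that text from the JSON representation, the sketch below walks a parsed JSON structure and gathers every string stored under a `text` key. The key name and the helper `collect_text` are assumptions made for this example; the thread does not spell out the JSON schema, so the key may need adjusting.

```python
def collect_text(node, key="text"):
    """Recursively gather every non-empty string stored under `key`.

    Works on plain dicts and lists, i.e. the result of json.load().
    The default key name "text" is an assumption about the export.
    """
    found = []
    if isinstance(node, dict):
        for name, value in node.items():
            if name == key and isinstance(value, str) and value.strip():
                found.append(value)
            else:
                # Descend into nested containers; scalars yield nothing.
                found.extend(collect_text(value, key))
    elif isinstance(node, list):
        for item in node:
            found.extend(collect_text(item, key))
    return found
```

The input would be the exported document loaded with `json.load`. The walker deliberately ignores document structure, so it trades reading order guarantees for independence from the exact schema.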
Hi. Yes, we are aware that the pages are included as images. However, our goal is to extract text, not images, so this is still a bug for us.
I can also provide another landscape PDF, which is messed up quite a bit. We analyzed the problem: on landscape pages x and y are swapped, and x behaves differently. It increases from right to left instead of left to right as on portrait pages. Moreover, the top-left point is no longer the top-left corner, and the same is true for the bottom-right point.
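The symptoms described above are what one would see if box coordinates were reported in the frame of the unrotated page. The sketch below shows one way to normalize such a box, assuming the landscape page is a portrait page displayed with a 90-degree clockwise rotation and that both frames use a bottom-left origin. This model, and the names `Box`, `rotate_point_cw90`, and `normalize_box`, are assumptions for illustration, not Docling's implementation.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Axis-aligned box in a bottom-left-origin frame."""

    left: float
    bottom: float
    right: float
    top: float


def rotate_point_cw90(x, y, page_width):
    """Map a point from the unrotated page frame to the displayed frame
    after a 90-degree clockwise rotation (assumed model).

    `page_width` is the width of the unrotated page.
    """
    return y, page_width - x


def normalize_box(x0, y0, x1, y1, page_width):
    """Rotate both corners, then re-sort them.

    After rotation the first corner is no longer the bottom-left one,
    so min/max is used to restore a well-formed box.
    """
    ax, ay = rotate_point_cw90(x0, y0, page_width)
    bx, by = rotate_point_cw90(x1, y1, page_width)
    return Box(
        left=min(ax, bx),
        bottom=min(ay, by),
        right=max(ax, bx),
        top=max(ay, by),
    )
```

For example, on a page 595 units wide, the corners (100, 700) and (200, 750) map to (700, 495) and (750, 395): the former bottom-left corner ends up at the top-left, which is why the corners are re-sorted rather than trusted in their original order.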