-
Notifications
You must be signed in to change notification settings - Fork 522
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix viewer.py Adapt page size on the fly if necessary #2492
base: master
Are you sure you want to change the base?
Conversation
It fixes the issue of documents with page sizes different from the first one, these might not be viewed correctly afterwards since the viewer assumes all pages being the size of the first page. This is a typical scenario in documents where a first page with extra information is added, for instance on exporting this PDF https://eudml.org/doc/202780 (Access to full text + Export)
The approach in this PR renders the content of all pages in bounding boxes of the same size, choosing a bounding box large enough to fit the largest page. This means that, for smaller pages, very large and usightly page gaps are produced. For example compare the output produced with this document by this approach when compared to choosing a bounding box for each page that exactly fits the page, as is done in PR #2480. Result when using this PRResult when using PR #2480(Disclaimer: I am the author of PR #2480) |
Hi Jorge, testing your approach #2480 "choosing a bounding box for each page that exactly fits the page" with the example pdf, I see that the zoom changes on each page in order to do that fit of the page. But I don't think that this decision should be taken automatically. If the PDF is designed with pages of different sizes for whatever reason this should be kept in that way. Both Chrome and Edge browsers keep the different sizes of the pages in the example pdf, but certanly in contrast to my approach (#2492 ) they center the pages and eliminate the gaps. |
Hi Alejandro, Thank you for taking the time to look at this.
In PR #2480 The zoom changes automatically when the pdfviewer is set to "fit page width" (the initial setting of the pdfviewer widget) or "fit page height" . In that case, the viewer adjusts the zoom for each page so the page fits the width or height as intended. When the pdfviewer is set to display at a constant zoom then the zoom stays at the target zoom and each page is scaled to that level of zoom. IMHO this is the intended and most correct behavior. In all cases the page gap is adjusted to not leave extraneous space between pages. If other people are interested in this topic, I think it would be helpful to get them to test both approaches and get their opinion. Also, if anyone has real-world examples of pdf's with different page sizes that can be made publicly available, that would be helpful. (I have such examples, hence the motivation for my PR, but they contain private patient data and I cannot share them) |
Hi Jorge,
In general, not only your change but in all try / except AttributeError in viewer.py a "except:" line should be added with the right action, for instance in RenderPage it could be simply return
regarding the adaption to the zoom on each page, testing your example pdf in other viewers all them in "page width mode" take the maximum width in the document and show all pages centered to it, which I think is the right approach.
by now personally I will only work on pdfViewerMuPDF.py and I will publish it in my fork when finished. |
Hello Alejandro, Thank you again for your time. I am glad we have reached consensus. I wonder what is the best link to move/copy the discussion to PR #2480 since we have agreed that is the approach we would like to merge. Being able to handle partially broken documentsI think this would be a good feature. And from your comments, it seems its implementation is straight forward. I can add support for it in PR #2480 before it gets merged or later in a separate PR if it gets merged before then. Could you add there a link to the broken document(s) you used for testing this, if they are public, so I can test this new feature when the user navigates to broken pages, or scrolls around broken pages? Thank you. Dropping support for PyPDF2I don't use PyPDF2 either. The last time I tried was four year ago and I could not get it to work either. But I just checked and the library seems to have a new version that is actively developed: https://github.com/py-pdf/pypdf, so perhaps before considering to drop support for it, we should upgrade to the latest version and try it. Regardless, I think this work should be done as part of a separate PR though. |
It fixes the issue of documents with page sizes different from the first one, these might not be viewed correctly afterwards since the viewer assumes all pages being the size of the first page. This is a typical scenario in documents where a first page with extra information is added, for instance on exporting this PDF https://eudml.org/doc/202780
(Access to full text + Export)
Related with pull request #2480 "Add support to python only pdfviewer for displaying pdf files where not all pages are the same size"
but with the adaption solution it is not needed that all pages of the document are computed in advance