Compatibility with pypdf 5 #8

Open
wyatt-wong opened this issue Jan 16, 2025 · 7 comments

Comments

@wyatt-wong

I get this error when I run:

python pypdf_strreplace.py --input pdfs/Inkscape.pdf

error.txt

@hoehermann
Owner

hoehermann commented Jan 16, 2025

Thank you for the report. Unfortunately, it does not happen on my machine. Please check your version of pypdf with python -c 'import pypdf ; print(pypdf.__version__)'. It should be 4.0.2.

@wyatt-wong
Author

> python -c 'import pypdf ; print(pypdf.__version__)'

It looks like your code needs an update to support the latest pypdf. I am using pypdf 5.1.0:

python -c 'import pypdf ; print(pypdf.__version__)'
5.1.0
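
For anyone comparing versions the same way, here is a minimal sketch of the check as a small script instead of a one-liner. The helper names `major_version` and `installed_pypdf_version` are made up for illustration; only `pypdf.__version__` and the version numbers 4.0.2 and 5.1.0 come from this thread.

```python
from importlib import metadata
from typing import Optional


def major_version(version: str) -> int:
    """Return the leading integer of a dotted version string such as '5.1.0'."""
    return int(version.split(".")[0])


def installed_pypdf_version() -> Optional[str]:
    """Return the installed pypdf version, or None if pypdf is not installed."""
    try:
        return metadata.version("pypdf")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_pypdf_version()
    if found is None:
        print("pypdf is not installed")
    else:
        print(f"pypdf {found} (major version {major_version(found)})")
```

A major version of 4 versus 5 is the difference reported above.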

@hoehermann changed the title from "Error in getting lines from pdfs/Inkscape.pdf" to "Compatibility with pypdf 5" on Jan 17, 2025
@hoehermann
Owner

Should be working now with 6e37bb4.

@wyatt-wong
Author

> Should be working now with 6e37bb4.

Thanks, it works now. But there is another problem.

The PDF file I ran the text replacement on has a size of 14,308,334 bytes. After replacing the text, the file size grew to 18,296,859 bytes, which is an increase of about 28%.
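
The 28% figure can be checked with a few lines of arithmetic. This is only a worked calculation on the two sizes quoted above; the variable names are arbitrary.

```python
before = 14_308_334  # bytes, PDF before the text replacement
after = 18_296_859   # bytes, PDF after the text replacement

growth = after - before
percent = growth / before * 100

print(f"{growth:,} bytes larger ({percent:.1f}% increase)")
# prints: 3,988,525 bytes larger (27.9% increase)
```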

@hoehermann
Owner

To edit the text, the tool needs to decompress the data streams. I guess they are not automatically recompressed when the output file is written. You could try running the output through qpdf: its --compress-streams option is enabled by default, and the compression is lossless.
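
To illustrate what "lossless" means here: PDF streams are commonly compressed with the Flate filter, which is the same deflate algorithm Python exposes through `zlib`. The sketch below is not what qpdf or this tool does internally. It only round-trips a made-up content stream fragment to show that the bytes come back unchanged while the stored size shrinks.

```python
import zlib

# A made-up content stream fragment; real pages hold far more operators.
raw = b"BT /F1 12 Tf 72 720 Td (Hello, world) Tj ET\n" * 200

compressed = zlib.compress(raw, level=9)
restored = zlib.decompress(compressed)

print(len(raw), "bytes uncompressed")
print(len(compressed), "bytes compressed")
print("lossless round trip:", restored == raw)
```

Repetitive operator text like this compresses very well, which would explain why a file with uncompressed streams is noticeably larger.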

@wyatt-wong
Author

wyatt-wong commented Jan 25, 2025

> For editing the text, the tool needs to uncompress the data streams. I guess, they are not automatically compressed when writing the output file. You may try running the output through qpdf. qpdf's option compress-streams is enabled by default. This compression method is lossless.

I have no idea what qpdf is. I just followed your sample command line below:

pypdf_strreplace.py --input mypdf.pdf --search "some value" --replace "another value" --output output.pdf

Am I responsible for compressing the modified data streams and writing them to another PDF file myself? Or is it the job of your script to handle the compression of the modified data streams?

@hoehermann
Owner

I added the option --compress for you. Thanks to pypdf, this was trivial.
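
For readers curious how such a flag might be wired up, here is a rough sketch, not the actual code in pypdf_strreplace.py. The command-line flags are the ones mentioned in this thread; the function names `build_parser`, `write_output` and `main` are hypothetical, and the text replacement itself is left out. It assumes a pypdf version that supports `PdfWriter(clone_from=...)` and `page.compress_content_streams()`.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with the flags mentioned in this thread."""
    parser = argparse.ArgumentParser(description="Sketch: replace text, optionally compress")
    parser.add_argument("--input", required=True)
    parser.add_argument("--output")
    parser.add_argument("--search")
    parser.add_argument("--replace")
    parser.add_argument("--compress", action="store_true",
                        help="recompress content streams before writing")
    return parser


def write_output(input_path: str, output_path: str, compress: bool) -> None:
    """Copy a PDF, optionally recompressing each page's content stream."""
    # Imported lazily so the argument parsing can be tried without pypdf installed.
    from pypdf import PdfWriter

    writer = PdfWriter(clone_from=input_path)
    # The text replacement would happen here; it is omitted in this sketch.
    if compress:
        for page in writer.pages:
            page.compress_content_streams()  # lossless, but CPU intensive
    writer.write(output_path)


def main(argv=None) -> None:
    """Parse arguments and write the output file if one was requested."""
    args = build_parser().parse_args(argv)
    if args.output:
        write_output(args.input, args.output, args.compress)
```

Calling `main(["--input", "mypdf.pdf", "--output", "output.pdf", "--compress"])` would write a recompressed copy, assuming mypdf.pdf exists and pypdf is installed.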
