Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature: Warn on unicode decoding errors in PDF annotations #1195

Merged
merged 5 commits into from
Dec 9, 2024

Conversation

stolarczyk
Copy link
Contributor

@stolarczyk stolarczyk commented Aug 30, 2024

In certain scenarios, annotations may include invalid or extraneous data that can obstruct the annotation processing workflow. To mitigate this, the warn_unicode_error parameter in the PDF initializer and the .open() method provides a configurable option to bypass these errors and generate warnings instead, ensuring smoother handling of such anomalies.

Example warning, if activated:

UserWarning: Could not decode contents for annotation. Annotation contents will be missing.

in some cases the the annotations may contain some junk that hinders annotations processing altogether. This allows to ignore the error and warn instead, which is configurable via warn_unicode_error arguments in the PDF initializer and/or open() method.
@stolarczyk stolarczyk force-pushed the warn-on-unicodedecodeerror branch from 37a8e39 to 396c5e3 Compare August 30, 2024 08:24
@jsvine
Copy link
Owner

jsvine commented Sep 9, 2024

Hi @stolarczyk, and thank you for this suggestion. It makes sense to provide such warnings, although I'd lean toward a more generalizable approach rather than specifying parameters for each type of warning. To that end, I'm more inclined to use Python's built-in warning filtering. I'm open to other opinions, though. What do you think?

@stolarczyk
Copy link
Contributor Author

stolarczyk commented Sep 10, 2024

thanks for looking into this, @jsvine!

I’m not entirely clear on how your idea regarding built-in warning filtering would address the issue I'm focusing on. The proposed change turns an exception (UnicodeDecodeError) into a warning, which prevents the PDF processing from crashing entirely. So the warning is the result, not the issue at hand.

@jsvine
Copy link
Owner

jsvine commented Oct 3, 2024

My apologies for the misunderstanding! I think the name of the proposed parameter threw me off, but I also should have looked more closely. I think I understand it now. This proposal makes sense, though what about tweaking the name?: raise_unicode_errors=True/False?

additionally change the default accordingly
@stolarczyk
Copy link
Contributor Author

Thanks for the suggestion. Just renamed it.

@jsvine
Copy link
Owner

jsvine commented Nov 22, 2024

Thanks, @stolarczyk — I've pushed a small tweak, above, so that the linter is happy. But looks like we're missing a bit of test coverage:

@stolarczyk
Copy link
Contributor Author

@jsvine, the test has been added.

@jsvine jsvine merged commit 871770a into jsvine:stable Dec 9, 2024
5 checks passed
jsvine added a commit that referenced this pull request Dec 9, 2024
@jsvine
Copy link
Owner

jsvine commented Dec 9, 2024

Thanks! Merged into develop

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants