original_path extraction error regarding LTCurve #1057
Hi @KaboChow, and thanks for providing this interesting example. It appears to relate to … There is some discussion of this general issue here: pdfminer/pdfminer.six#861 (comment). As it happens, however, the piece of … One solution would be to propose reverting the behavior so that complex paths are not decomposed, with the downside that some clearly rectangle-like shapes would no longer be recognized as such. Another would be to tweak the behavior so that complex paths are mostly left intact, except for those composed entirely of rectangles. The downside is that this may be a confusing rule, and some all-rectangle complex paths are still meant to be understood as shapes with holes in them. Thanks again. I'll keep thinking on this, and I welcome suggestions from others, too.
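The second option above (decompose a complex path only when every subpath is a rectangle) could be expressed roughly as follows. This is a minimal sketch, not pdfminer.six code: the segment format (an operator letter followed by `(x, y)` tuples) is an assumption modeled loosely on `original_path`, and the helpers `split_subpaths`, `is_axis_aligned_rectangle`, and `should_decompose` are hypothetical names.

```python
from typing import List, Sequence

# Hypothetical segment format, modeled loosely on pdfminer.six's original_path:
# an operator letter followed by (x, y) tuples, for example
# ("m", (0, 0)), ("l", (10, 0)), ("c", (1, 2), (3, 4), (5, 6)), ("h",)
Segment = tuple


def split_subpaths(path: Sequence[Segment]) -> List[List[Segment]]:
    """Split a path into subpaths, starting a new one at every 'm' (moveto)."""
    subpaths: List[List[Segment]] = []
    for seg in path:
        if seg[0] == "m" or not subpaths:
            subpaths.append([])
        subpaths[-1].append(seg)
    return subpaths


def is_axis_aligned_rectangle(subpath: Sequence[Segment]) -> bool:
    """True if the subpath is a moveto plus straight lines tracing an axis-aligned box."""
    if any(seg[0] not in ("m", "l", "h") for seg in subpath):
        return False  # any curve segment ('c', 'v', 'y') rules out a rectangle
    pts = [seg[1] for seg in subpath if seg[0] in ("m", "l")]
    if len(pts) == 5 and pts[0] == pts[-1]:
        pts = pts[:-1]  # ignore an explicit line back to the start point
    if len(pts) != 4:
        return False
    edges = [(pts[i], pts[(i + 1) % 4]) for i in range(4)]
    horiz = [a[1] == b[1] and a[0] != b[0] for a, b in edges]
    vert = [a[0] == b[0] and a[1] != b[1] for a, b in edges]
    if not all(h or v for h, v in zip(horiz, vert)):
        return False  # a diagonal or zero-length edge
    # Horizontal and vertical edges must alternate around the loop.
    return all(horiz[i] != horiz[(i + 1) % 4] for i in range(4))


def should_decompose(path: Sequence[Segment]) -> bool:
    """Proposed rule: split a complex path only if every subpath is a rectangle."""
    subpaths = split_subpaths(path)
    return len(subpaths) > 1 and all(is_axis_aligned_rectangle(sp) for sp in subpaths)
```

Under this rule a curved glyph such as 'o' stays in one piece. It would still split an outer rectangle with a rectangular hole, which is exactly the downside mentioned in the comment.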
@jsvine Thank you for your answer.
Hello @jsvine, I found a problem regarding the 'evenodd' value of the object.
Thank you for these additional examples, @KaboChow. I'm still unsure of the best solution, given the tradeoffs described above and the fact that any changes would have to be made to …
Thank you, @jsvine. The incorrect 'evenodd' value has a big impact on my project, and I have been looking for ways to solve it. If there is a new solution, please let me know.
See the discussion at pdfminer above. The issue is that pdfminer doesn't apply any fill rules in layout analysis. Ideally, you should be looking at the …
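To illustrate why the fill rule matters once the subpaths of a glyph like 'o' are considered separately, here is a hedged sketch that tests whether a point is painted under the even-odd and nonzero winding rules. It is not how pdfminer.six or pdfplumber work (per the comment above, pdfminer does not apply fill rules at all); it assumes each subpath has already been flattened into a closed polygon, and the helpers `crossings_and_winding` and `is_filled` are hypothetical.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]
Polygon = Sequence[Point]  # implicitly closed: the last vertex connects to the first


def crossings_and_winding(pt: Point, polygons: Sequence[Polygon]) -> Tuple[int, int]:
    """Cast a ray from pt toward +x; count edge crossings and the signed winding number."""
    x, y = pt
    crossings = 0
    winding = 0
    for poly in polygons:
        n = len(poly)
        for i in range(n):
            (x0, y0), (x1, y1) = poly[i], poly[(i + 1) % n]
            # Half-open comparison so a vertex exactly at height y is counted once.
            if (y0 <= y) != (y1 <= y):
                # x-coordinate where this edge meets the horizontal line through pt
                x_cross = x0 + (y - y0) * (x1 - x0) / (y1 - y0)
                if x_cross > x:
                    crossings += 1
                    winding += 1 if y1 > y0 else -1  # upward edge +1, downward -1
    return crossings, winding


def is_filled(pt: Point, polygons: Sequence[Polygon], evenodd: bool) -> bool:
    """Even-odd: odd crossing count is filled. Nonzero: nonzero winding is filled."""
    crossings, winding = crossings_and_winding(pt, polygons)
    return crossings % 2 == 1 if evenodd else winding != 0
```

With an outer contour and an inner contour drawn together, the point in the middle is unpainted under even-odd regardless of direction, and under nonzero only if the inner contour runs the opposite way. Evaluated against the inner contour alone, the same point is painted, which is the "solid hole" symptom.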
You're right, @dhdaines: the "fill" property is a good way to determine whether a subpath is a hole. However, the LTCurve child objects that are currently split out inherit their "fill" value from the parent object, so that value is clearly wrong for the children. I wasn't able to solve this myself, so in the end I chose to remove the rule that splits LTCurve shapes.
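Because the split-out children inherit the parent's "fill", one possible workaround is to recompute, per subpath, whether it acts as a hole. This is only a sketch under stated assumptions, not a feature of pdfminer.six or pdfplumber: it covers the even-odd rule only, assumes each subpath has been flattened to a polygon, and assumes subpaths neither intersect nor touch, so a single vertex decides containment. Under even-odd, a subpath nested inside an odd number of other subpaths is a hole. The helper names are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Polygon = Sequence[Point]  # implicitly closed


def point_in_polygon(pt: Point, poly: Polygon) -> bool:
    """Even-odd ray-casting test against a single closed polygon."""
    x, y = pt
    inside = False
    n = len(poly)
    for i in range(n):
        (x0, y0), (x1, y1) = poly[i], poly[(i + 1) % n]
        if (y0 <= y) != (y1 <= y):
            if x0 + (y - y0) * (x1 - x0) / (y1 - y0) > x:
                inside = not inside
    return inside


def classify_subpaths_evenodd(subpaths: Sequence[Polygon]) -> List[bool]:
    """Return True for each subpath that acts as a hole under the even-odd rule.

    Assumes subpaths do not intersect or touch, so testing one vertex (the
    first) is enough to decide whether a subpath lies inside another."""
    holes = []
    for i, sp in enumerate(subpaths):
        depth = sum(
            1
            for j, other in enumerate(subpaths)
            if j != i and point_in_polygon(sp[0], other)
        )
        holes.append(depth % 2 == 1)  # odd nesting depth means a hole
    return holes
```

For the nonzero rule, the direction of each contour would also have to be taken into account, so nesting depth alone is not enough there.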
While testing shape-data extraction from a PDF, I created the letter 'o' as text and converted it into a shape object.
Here is the curve data I obtained.
I expected only one set of curve data.
However, there appear to be two. Here is the graphic drawn on the canvas from the obtained data:
The fill color obtained for the second set of curve data is incorrect.
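One likely way to get the expected 'o' on a canvas is to draw both sets of curve data as a single path and let the even-odd rule punch out the inner contour, rather than filling each set on its own. The sketch below builds an SVG `<path>` element from path segments; HTML canvas offers the same choice through `ctx.fill("evenodd")`. It assumes a hypothetical segment format (an operator letter followed by `(x, y)` tuples, similar in shape to `original_path`), handles only the `m`, `l`, `c`, and `h` operators, and the helper names are illustrative.

```python
from typing import Sequence

# Hypothetical mapping from path operators to SVG path commands.
# PDF 'v' and 'y' curves would need their implied control point filled in,
# so this sketch rejects them.
_SVG_CMD = {"m": "M", "l": "L", "c": "C", "h": "Z"}


def path_to_svg_d(path: Sequence[tuple]) -> str:
    """Turn a list of (op, *points) segments into an SVG path 'd' string."""
    parts = []
    for op, *points in path:
        if op not in _SVG_CMD:
            raise ValueError(f"unsupported operator: {op!r}")
        coords = " ".join(f"{x:g} {y:g}" for x, y in points)
        parts.append(f"{_SVG_CMD[op]} {coords}".rstrip())
    return " ".join(parts)


def svg_for_shape(path: Sequence[tuple], evenodd: bool, fill: str = "black") -> str:
    """Emit one <path> so all subpaths share a single fill and fill rule."""
    rule = "evenodd" if evenodd else "nonzero"
    return f'<path d="{path_to_svg_d(path)}" fill="{fill}" fill-rule="{rule}"/>'
```

PDF coordinates have y pointing up while SVG and canvas have y pointing down, so in practice the shape also needs a vertical flip (for example, `y' = page_height - y`).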
This is the PDF I tested with:
LTCurve.pdf
Is there any way to resolve this?
Thank you very much.