Ok, this is a bit complicated because the Document AI Custom Splitter specifically detected those two "form1" entries as separate documents.
If we combine them together by default, it could create ambiguity when there are multiple separate documents of the same type in a file.
We could add a parameter such as combine_like_document_types, but I think this issue would be best resolved in the Custom Splitter itself.
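To make the trade-off concrete, here is a rough sketch of what combining adjacent entities of the same type could look like. It is not the toolbox's implementation: `SplitEntity`, `combine_adjacent`, and the sample values are hypothetical stand-ins, and keeping the lowest confidence of a merged group is just one possible choice.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SplitEntity:
    """Hypothetical stand-in for one splitter entity (not the Document AI type)."""
    type_: str
    confidence: float
    pages: List[int]


def combine_adjacent(entities: List[SplitEntity]) -> List[SplitEntity]:
    """Merge consecutive entities that share a type into one page group."""
    merged: List[SplitEntity] = []
    for entity in entities:
        if merged and merged[-1].type_ == entity.type_:
            prev = merged[-1]
            # Assumption: keep the lowest confidence seen in the merged group.
            merged[-1] = SplitEntity(
                prev.type_,
                min(prev.confidence, entity.confidence),
                prev.pages + entity.pages,
            )
        else:
            # Copy the page list so the input entities are not mutated.
            merged.append(SplitEntity(entity.type_, entity.confidence, list(entity.pages)))
    return merged


# Made-up sample: two adjacent "form1" entities followed by an "invoice".
entities = [
    SplitEntity("form1", 0.98, [0, 1]),
    SplitEntity("form1", 0.71, [2]),
    SplitEntity("invoice", 0.90, [3]),
]
combined = combine_adjacent(entities)
```

Note that this merge cannot tell one "form1" spread over three pages from two separate "form1" documents that happen to be adjacent, which is the ambiguity described above.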
holtskinner changed the title to "split_pdf splits too much, since it does not take into account that different entities might have same type (but different confidence)" on Jul 15, 2024
Here is an example of the entities returned by the splitter:
In this case, all pages are actually of the same type, so the file should not be split. However, document.Document.split_pdf does not detect that.