From the experiment tracker, retrieve the JSON containing the ground-truth layout for each PDF page.

We consider the array `layout_dets` and, for each element, check for the `text` key. If the key is present, we extract the text together with its corresponding `poly`, which encodes the position information: the (x, y) coordinates of the top-left, top-right, bottom-right, and bottom-left corners of the bounding box.

We extract the same information from the Megaparse output, i.e. for each page we extract the text and the bounding box, group the pages per block category, document type, language, and layout type, and:

- in each page, find the matching bounding boxes (within some tolerance). We compare texts only for matching bounding boxes; when no box matches, we can consider that the problem is in the layout detection step.
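The extraction and matching steps can be sketched roughly as below. This is only an illustration under several assumptions: `poly` is taken to be a flat list of eight numbers (x, y for each of the four corners), the Megaparse output is assumed to have already been converted into the same `(text, poly)` pairs, and the helper names, the `tol` parameter, and the use of `difflib` as a text-similarity score are all hypothetical choices, not something the issue prescribes.

```python
from difflib import SequenceMatcher


def extract_blocks(layout_dets):
    """Return (text, poly) pairs for the elements that carry a `text` key."""
    return [(d["text"], d["poly"]) for d in layout_dets if "text" in d]


def poly_to_bbox(poly):
    """Convert 8 numbers (4 corners as x, y) to (x_min, y_min, x_max, y_max)."""
    xs, ys = poly[0::2], poly[1::2]
    return min(xs), min(ys), max(xs), max(ys)


def boxes_match(a, b, tol=5.0):
    """True when every edge of box `a` lies within `tol` units of box `b`."""
    return all(abs(p - q) <= tol for p, q in zip(a, b))


def compare_page(gt_blocks, parsed_blocks, tol=5.0):
    """Pair blocks whose boxes match and score the similarity of their texts.

    Returns (results, unmatched): `results` holds
    (ground_truth_text, parsed_text, similarity) triples, and `unmatched`
    holds ground-truth texts with no matching box, which points at the
    layout detection step rather than at text extraction.
    """
    results, unmatched = [], []
    remaining = list(parsed_blocks)
    for gt_text, gt_poly in gt_blocks:
        gt_box = poly_to_bbox(gt_poly)
        hit = next(
            (b for b in remaining if boxes_match(gt_box, poly_to_bbox(b[1]), tol)),
            None,
        )
        if hit is None:
            unmatched.append(gt_text)
            continue
        remaining.remove(hit)  # each parsed block is matched at most once
        score = SequenceMatcher(None, gt_text, hit[0]).ratio()
        results.append((gt_text, hit[0], score))
    return results, unmatched
```

The first-match strategy is the simplest option; if boxes overlap heavily, picking the candidate with the smallest coordinate difference (or the highest IoU) would probably be more robust.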
We can also compute the metrics above across all document types.
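One possible way to aggregate is sketched below, assuming each compared page has been reduced to a record with its grouping attributes and a single numeric `score`. The record layout, the key names, and the choice of a plain mean are hypothetical; any per-page metric could be substituted.

```python
from collections import defaultdict
from statistics import mean


def aggregate(records, key):
    """Average the per-page `score` within each value of `key`."""
    groups = defaultdict(list)
    for record in records:
        groups[record[key]].append(record["score"])
    return {group: mean(scores) for group, scores in groups.items()}


def overall(records):
    """Average the per-page `score` across all records, ignoring groups."""
    return mean(record["score"] for record in records)
```

For example, `aggregate(records, "document_type")` gives one value per document type, while `overall(records)` gives the figure across all of them; the same call with `"language"` or `"layout_type"` as the key covers the other groupings.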