Describe the bug
I am trying to parse Markdown files for chunking. I first used partition_md, but because of several open issues related to it I was not able to parse my .md file directly. Instead, I converted the Markdown to HTML with markdown-it and then used partition_html. The problem is that a `<strong>` tag inside a `<p>` tag is classified as a Title by partition_html, which breaks chunking with chunk_by_title, because every bold paragraph starts a new chunk.
To Reproduce
```python
import json

from unstructured.partition.html import partition_html

# HTML as produced by markdown-it for the Markdown input "**Example:**"
text = "<p><strong>Example:</strong></p>"
elements = partition_html(text=text)
element_dict = [el.to_dict() for el in elements]
print(json.dumps(element_dict, indent=2))
```
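As a possible stopgap until this is fixed, here is a minimal sketch that strips `<strong>`/`<b>` tags from markdown-it's HTML output before it is passed to partition_html, so bold text inside a paragraph can no longer trigger the Title classification. This is an assumption-based workaround, not an unstructured feature; `unwrap_bold` is a hypothetical helper name, and the bold formatting is discarded, which is usually acceptable for chunking.

```python
import re

# Matches opening/closing <strong> and <b> tags (with optional attributes).
# The \b after "b" prevents matching <br>, <blockquote>, <body>, etc.
_BOLD_TAG = re.compile(r"</?(?:strong|b)\b[^>]*>", re.IGNORECASE)


def unwrap_bold(html: str) -> str:
    """Remove bold tags but keep their text content (hypothetical helper)."""
    return _BOLD_TAG.sub("", html)


print(unwrap_bold("<p><strong>Example:</strong></p>"))
```

The cleaned string can then be passed as `text=` to `partition_html` in place of the raw markdown-it output.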
Expected behavior
The element should not be parsed as a Title; it should be parsed as NarrativeText.
Environment Info
Name: unstructured
Version: 0.16.11
Python 3.11.9