`<?xml-stylesheet ?>` element breaks relative xpaths #81

uhlikfil · 2024-11-10T09:57:20Z

The same XPath query with a relative path from the root child node does not return results if there is a <?xml-stylesheet ?> tag present in the XML document.

Minimal repro code:

import elementpath
from lxml import etree

# xml1 contains the xml-stylesheet tag
xml1 = b"""<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type='text/xsl' href='test.xsl'?>
<root>
    <first>
        <second>
            value
        </second>
    </first>
</root>
"""

# the same as xml1, but without the xml-stylesheet tag
xml2 = b"""<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<root>
    <first>
        <second>
            value
        </second>
    </first>
</root>
"""

root1 = etree.XML(xml1)
root2 = etree.XML(xml2)
query = "first/second"

elementpath.select(root1, query)
# returns []
elementpath.select(root2, query)
# returns [<Element second at ... >]

Is that expected? Why is it happening?


brunato · 2024-12-15T20:40:53Z

Hi,
if you parse with the etree.XML API, the resulting tree is in general a fragment (an XML tree without a document node, which in this case would be an ElementTree instance wrapped in a DocumentNode instance). But xml1 has a root sibling (the processing instruction) in addition to the standard XML declaration, so it is interpreted as a document. You can force elementpath to skip the PI sibling by providing fragment=True, so the result will be the same:

 elementpath.select(root1, query, fragment=True)
 # returns [<Element second at ... >]
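The document-versus-fragment distinction above can be illustrated with the standard library alone. The sketch below uses `xml.dom.minidom`, not elementpath or lxml, so it only mirrors the data model and makes no claim about elementpath internals. The helper `child_elements` is a hypothetical name introduced here to mimic a single XPath child step.

```python
from xml.dom import Node, minidom

XML_WITH_PI = b"""<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type='text/xsl' href='test.xsl'?>
<root>
    <first>
        <second>
            value
        </second>
    </first>
</root>
"""


def child_elements(context, name):
    """Return the element children of `context` named `name`.

    Hypothetical helper: it mimics one XPath child step (e.g. `first`).
    """
    return [
        child for child in context.childNodes
        if child.nodeType == Node.ELEMENT_NODE and child.tagName == name
    ]


doc = minidom.parseString(XML_WITH_PI)  # document node
root = doc.documentElement              # the <root> element

# The document node has two children: the PI and the root element.
doc_children = [child.nodeName for child in doc.childNodes]

# The first step of "first/second", evaluated from the document node,
# finds nothing: <first> is a child of <root>, not of the document.
from_document = child_elements(doc, "first")

# The same step evaluated from the root element finds <first>.
from_root = child_elements(root, "first")
```

When the context item is the document node, the relative path has to start with `root/`. When the context item is the root element, `first/second` matches directly.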

Anyway, the behavior may not match what the argument description intends:

:param fragment: if `True` a root element is considered a fragment, if `False` \
a root element is considered the root of an XML document. If `None` is provided, \
the root node kind is preserved.

so something has to be fixed, at least when fragment is False or None.

thank you

brunato · 2024-12-21T08:30:31Z

A fix for the fragment argument usage is available in v4.7.0. The default is changed to None; when False is provided, a document node is added to the tree.

By default the root node kind is not changed, except in cases like xml1 with lxml, where an effective document node is added unless you provide fragment=True.

This default behavior with lxml could be changed, but with the drawback that root siblings could not be selected (in that case an explicit fragment=False would be needed).

Waiting for feedback on this; otherwise the issue can be closed.

Thank you

uhlikfil · 2024-12-30T09:31:34Z

Just to be clear. Given an XML containing a PI.

If I want to use a relative query starting from the root (e.g. first/second), I need to set fragment=True. However, with fragment=True I am not able to select the root node (e.g. //root)? Is there a way to make both cases work? The lxml Element.xpath method works in both cases:

import elementpath
from lxml import etree

xml = b"""<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet type='text/xsl' href='test.xsl'?>
<root>
    <first>
        <second>
            value
        </second>
    </first>
</root>
"""

root = etree.XML(xml)
relative_query = "first/second"
root_query = "//root"

elementpath.select(root, relative_query, fragment=True)
root.xpath(relative_query)
# both return the same element now thanks to the fragment changes

elementpath.select(root, root_query, fragment=True)
# returns []
root.xpath(root_query)
# returns [<Element root at ...>]

brunato · 2025-01-06T09:38:58Z

Hi,
a fragment doesn't have a document node, so an absolute path (/ or //) necessarily starts at the root element and is matched against its children. lxml in this case evaluates from the document position, like elementpath does for non-fragments.
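As a rough analogy for why `//root` comes back empty on a fragment, the sketch below uses the standard library's `xml.dom.minidom`, whose `getElementsByTagName` searches the descendants of the node it is called on but not the node itself. This is only an illustration of the idea; it is not how elementpath or lxml evaluate XPath.

```python
from xml.dom import minidom

XML_WITH_PI = b"""<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet type='text/xsl' href='test.xsl'?>
<root>
    <first>
        <second>
            value
        </second>
    </first>
</root>
"""

doc = minidom.parseString(XML_WITH_PI)  # document node
root = doc.documentElement              # the <root> element

# Searching from the document node: <root> is a descendant of the
# document, so it is found (comparable to //root on a document).
found_from_document = doc.getElementsByTagName("root")

# Searching from the root element: only its descendants are visited,
# so <root> itself is not found (comparable to //root on a fragment).
found_from_root = root.getElementsByTagName("root")

# Descendants are still reachable from either starting point.
second_from_root = root.getElementsByTagName("second")
```

The search that starts one level above the root element can return the root; the search that starts at the root element cannot.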

If you need both relative and absolute paths, a solution is to pass the item=root argument to the selector, which keeps the XML tree as a document but sets the initial context item to the root Element instead of the document node.

import elementpath
from lxml import etree

xml = b"""<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet type='text/xsl' href='test.xsl'?>
<root>
    <first>
        <second>
            value
        </second>
    </first>
</root>
"""

root = etree.XML(xml)
relative_query = "first/second"
root_query = "//root"

res1 = elementpath.select(root, relative_query, item=root)
res2 = root.xpath(relative_query)
assert res1 == res2 == [root[0][0]]
# both return [<Element second at ...>]

res1 = elementpath.select(root, root_query, item=root)
res3 = elementpath.select(root, root_query)
res2 = root.xpath(root_query)
assert res1 == res2 == res3 == [root]
# all return [<Element root at ...>]

uhlikfil closed this as completed Dec 30, 2024

uhlikfil reopened this Dec 30, 2024
