MathML support in the HTML Sanitizer API #227
Comments
I see the current sanitization algorithms have a configurable on/off toggle for
Similarly to |
Right, I believe we should have both 1) a strict default subset implemented natively in browsers and 2) a way to relax it for web developers using the API. I haven't read the spec for a while, but that's how I had understood the situation for HTML. It would be good if MathML folks could spend some time to ensure this is the case for MathML too. |
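To make the "strict default plus a way to relax it" idea concrete, here is a minimal sketch in Python. It is not the Sanitizer API: the names `DEFAULT_MATHML_ELEMENTS`, `effective_allow_list` and `is_allowed` are hypothetical, and the default set is an arbitrary small subset chosen only for illustration.

```python
# Hypothetical sketch: a strict built-in default that a developer can relax.
# The default set below is illustrative, not a proposed safelist.
DEFAULT_MATHML_ELEMENTS = frozenset(
    {"math", "mrow", "mi", "mn", "mo", "mfrac", "msup"}
)


def effective_allow_list(extra_elements=(), removed_elements=()):
    """Return the element allow-list after applying a developer's overrides."""
    allowed = set(DEFAULT_MATHML_ELEMENTS)
    allowed.update(extra_elements)              # relax: accept more elements
    allowed.difference_update(removed_elements)  # tighten: accept fewer
    return frozenset(allowed)


def is_allowed(element_name, allow_list=DEFAULT_MATHML_ELEMENTS):
    """Check one element name against an allow-list (the default if omitted)."""
    return element_name in allow_list
```

The point of the design is that relaxing never mutates the default: a caller who passes nothing always gets the strict subset.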
I think that MathML in general is safe. Any element, except maybe those that contain script-oriented ones, should be in the accept-list. |
As requested in the Math WG meeting on March 28, here is a link to the MathML-related CVEs on record, currently 8: NIST National Vulnerability Database, keyword MathML. I remember noticing back in the day that cases such as CVE-2021-38193 and CVE-2020-26870 appeared to be examples where switching between parsing contexts made exploits possible. This was the reason I flagged
|
Hello @fred-wang and all, we discussed the subject in the MathML-core meeting yesterday and I think that the following seems to have met everyone's agreement: We converged on the fact that the skepticism about the security of
Other than that, we see that sanitization needs to wipe out:
We have also considered it important that this issue carries a few examples of the potential that the sanitizer's inclusion of the MathML elements may bring. Finally, we have highlighted the potential of TrustedTypes as an application that may be relevant for the sanitizers. But so far, I see this as a potential only. I would suggest that we request that the Math WG or Math CG be "called back" when TrustedTypes may intersect the sanitizer APIs beyond its current scope (which I understand to be a baseline converter that transforms web content into something that can be exchanged in a way considered safer beyond the browser's current page). Do you agree with the approach proposed in numbers 1 to 4? If so, I suggest we go to the sanitizer API issues and make that suggestion as a safe list. Thanks in advance. Paul |
Hi, Sorry for the late reply, I overlooked that this was directed to me. In general I don't have a strong opinion on this; the sanitizer API is implemented in a part of browser code that is relatively independent from MathML rendering. It should be fine to go ahead and talk to the people working on the sanitizer API spec, finding a consensus there. I didn't check what the latest status was regarding non-HTML namespaces. Probably the main thing to pay attention to is that MathML Core is targeted at browsers while MathML Full is used in other applications. So we would need to decide whether we only accept MathML Core markup or allow MathML Full markup (with maybe more sanitization for security/privacy-sensitive markup that will need to be figured out). 1-4 seem to be about things that are not in MathML Core. Note that Firefox's sanitization currently accepts content markup but at the cost of adding many atomic strings for each content MathML tag: https://bugzilla.mozilla.org/show_bug.cgi?id=1787594#c8 Regarding security/safety in browsers, the ones I'm aware of are described in https://w3c.github.io/mathml-core/#security-considerations and https://w3c.github.io/mathml-core/#privacy-considerations ; in particular href is the one that can cause problems (unfortunately the discussions regarding its inclusion in MathML Core are on hold). Note also the case of maction statusline (whose support was removed from browsers). |
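Since href is singled out above as the attribute that can cause problems, here is a small sketch of what stripping link attributes from a parsed MathML tree could look like. It uses Python's standard `xml.etree.ElementTree` rather than any browser API; the function name `strip_link_attributes` is hypothetical, and whether a default sanitizer should actually drop these attributes is not settled in this thread.

```python
import xml.etree.ElementTree as ET

XLINK_NS = "http://www.w3.org/1999/xlink"

# ElementTree spells a namespaced attribute as "{namespace-uri}localname".
LINK_ATTRIBUTES = ("href", f"{{{XLINK_NS}}}href")


def strip_link_attributes(root):
    """Remove href and xlink:href from every element; return how many were removed."""
    removed = 0
    for element in root.iter():
        for name in LINK_ATTRIBUTES:
            if name in element.attrib:
                del element.attrib[name]
                removed += 1
    return removed
```

The element itself and its content are kept; only the attribute that would make it a link is dropped.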
Your comment made me wonder why/how the elements annotation, annotation-xml and their container semantics made it into MathML Core. If they are to be useful, their contents should be able to survive sanitization, at least in some cases. If not, maybe they are better thought of as MathML Full elements? As a cross-spec thought: SVG has a construct similar to Content MathML is indeed the classic use of |
Maybe the right thing to do here is to be minimal first so that something comes through. The While I agree we should strive for something useful, and |
Per the working group meeting today, we resolved that we feel it is ok to begin with MathML-core and we'd like to move that forward |
Something near this was discussed in, with, and about the fediverse tools:
|
Here is my proposal:
I was unsure about two aspects:
Thanks for your feedback. |
I think all of the global event handlers (we just use the standard mixins) would be handled by If I'm honest though, I do think there's a kind of hard "race" between these two specs, because in practice it probably has to force some conversations about things that Core Level 1 specifically punted on because we couldn't get the people talking: mainly, for example, links in MathML-Core (see whatwg/html#5248 (comment) and #29). Currently there are a few kinds of disagreements here in implementation (see https://wpt.fyi/results/mathml/relations/html5-tree/link-color-001.tentative.html?label=master&label=experimental&aligned&q=mathml%20link for example), but also (unless this has changed) in what can be linked... It feels difficult to reason genuinely about standardized sanitization without addressing that. Also, |
From the comments on [Sanitizer's 103](https://github.com/WICG/sanitizer-api/issues/103) it seems that the more minimal the set we provide, the better its chances of succeeding. From your comment @bkardell, I seem to understand that it is unclear whether links will be there or not. But can't we make a recommendation independently of this decision? I agree |
We have discussed the possible relations between the sanitizer safe list and the evolution of the last bits of the specs with @bkardell and have come to the following (probably minimal) version of the elements and attributes that should be processed by the sanitizer. We should discuss and decide on this list at the next MathML-core meeting at the end of January 2025.
MathML Safe List
Short Version
MathML-core considers all elements and attributes of MathML-core (as listed in section 2.1 of MathML-core) as safe and not needing sanitization, except the following elements. We recommend that the Sanitizer API sanitize MathML by keeping all elements and attributes except the following:
Detailed Version
MathML-core considers the following elements and attributes of MathML-core as safe and not needing sanitization:
Safe "as-is" Elements of MathML-core:
Attributes of MathML-core:
Moreover, the following attributes have their syntax and semantics specified in the HTML specification. The sanitizer behaviour on these attributes should be the same as on HTML elements:
The elements of MathML-core which need treatment by the sanitizers are the following:
|
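As an illustration of what "keeping everything on the safe list and dropping the rest" means mechanically, here is a minimal, namespace-aware sketch using Python's standard `xml.etree.ElementTree`. The `ALLOWED` set is a placeholder subset, not the list proposed in this comment, and the names `local_name` and `sanitize` are hypothetical. The sketch assumes that a disallowed element is removed together with its subtree; a real sanitizer might instead keep the children.

```python
import xml.etree.ElementTree as ET

MATHML_NS = "http://www.w3.org/1998/Math/MathML"

# Placeholder subset for illustration only; not the safe list proposed here.
ALLOWED = frozenset(
    {"math", "mrow", "mi", "mn", "mo", "mfrac", "msup", "msub", "msqrt", "mtext"}
)


def local_name(tag):
    """Split an ElementTree tag "{ns}name" into (ns, name); ns is "" if absent."""
    ns, _, name = tag.rpartition("}")
    return ns.lstrip("{"), name


def sanitize(element, allowed=ALLOWED):
    """Remove, in place, every descendant that is not an allowed MathML element.

    The element passed in is not checked itself. Removing a child also drops
    the text that follows it (its tail), which in MathML is usually whitespace.
    """
    for child in list(element):  # copy: we mutate the child list while looping
        ns, name = local_name(child.tag)
        if ns != MATHML_NS or name not in allowed:
            element.remove(child)
        else:
            sanitize(child, allowed)
    return element
```

Checking the namespace as well as the local name matters here: an element named like a MathML one but living in another namespace is treated as foreign and removed.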
I have created mathml-safe-list to track the evolution of this document and to be considered for further inclusion. There was the error that the Both are done in the commit 1eb208 of the mathml-doc repository. |
What's wrong with
Sanitizing that away would quietly break content, wouldn't it? Shouldn't it get at least marked as deprecated if so? Edit: I see there is also a "compatibility-only" note here. So likely good enough, or only needing that note extended with a warning it may get sanitized away by default. |
@dginev see also WICG/sanitizer-api#103 (comment) - I'm not sure why it's not linked up already, but basically there are 3 not 2 variants here. One would be the 'default' which would remove |
At the meeting yesterday, this was briefly discussed and consensus emerged against removing mphantom as part of the sanitizer. |
Hi Paul,
Thanks for the update.
Regards
Louis Maher
|
See https://wicg.github.io/sanitizer-api/
Some work has been done to handle the MathML/SVG namespaces, but the spec should likely specify a default safelist, see WICG/sanitizer-api#103 (comment) (IIRC, the API allows web developers to accept more elements/attributes that are not in the safelist, though)
So this issue is about discussing what we want to suggest as a default safelist for MathML.
In another issue, I had commented to try and follow MathML Core as much as possible as that's what browsers are expected to implement: WICG/sanitizer-api#167 (comment)
Some more comments:
Firefox has some safe list already, but I guess it is not very strict; for example, it still allows XLink href or Content MathML markup. The bug is https://bugzilla.mozilla.org/show_bug.cgi?id=1787594
For Chromium, I don't remember without checking more. But it probably does not include more than what is in MathML Core, since we never implemented more.
I'm not sure if the Sanitizer API is actually being implemented in WebKit.
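To see how far a given safe list or document strays from MathML Core (for instance by accepting Content MathML), one could audit markup against the Core element list. The sketch below does this with Python's standard `xml.etree.ElementTree`. The function name `non_core_elements` is hypothetical, and the element list is transcribed by hand from the MathML Core element list, so it should be checked against the spec before being relied on.

```python
import xml.etree.ElementTree as ET

# Transcribed by hand from the MathML Core element list; verify against the spec.
MATHML_CORE_ELEMENTS = frozenset("""
annotation annotation-xml maction math merror mfrac mi mmultiscripts mn mo
mover mpadded mphantom mprescripts mroot mrow ms mspace msqrt mstyle msub
msubsup msup mtable mtd mtext mtr munder munderover semantics
""".split())


def non_core_elements(root):
    """Return the sorted local names of elements that are not in MathML Core."""
    found = set()
    for element in root.iter():
        # ElementTree tags look like "{namespace-uri}localname".
        name = element.tag.rpartition("}")[2]
        if name not in MATHML_CORE_ELEMENTS:
            found.add(name)
    return sorted(found)
```

This only compares local names; it does not check namespaces or attributes, so it is an audit aid rather than a sanitizer.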