Implement HTML escaping for arbitrary string input #31
Comments
Okay, the following seems to fail:
let element: HTMLElement = HTMLElement(tagName: "div")
element.innerHTML = "This is an <b>email</b>: John Do <john@do.com>"
print("\(element.textContent)")
outputs:
What do I have to do to make this work so that it ignores anything that doesn't look like HTML?
@guidedways Hey there. Let me see if I understood you correctly. You want to input an HTML string and have all HTML tags stripped? If so, the easiest way to do it is to escape all HTML reserved characters so that they are not interpreted as HTML. In your case:
let element: HTMLElement = HTMLElement(tagName: "div")
element.innerHTML = "This is an <b>email</b>: John Do &lt;john@do.com&gt;"
print("\(element.textContent)")
// This is an email: John Do <john@do.com>
Some Details
What does it mean? Well, your input gets parsed to this DOM:
Take a look here for more info: MDN Element.innerHTML
Does this answer your question? Do you have any follow-up questions?
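For anyone landing here later: the escaping step suggested in this reply can be sketched roughly as below. This is only an illustration, assuming you escape the text yourself before assigning it to innerHTML. `escapeHTML` is a hypothetical helper name, not something this library is known to provide.

```swift
/// Hypothetical helper (not part of the library discussed in this thread):
/// replaces the characters that HTML treats as markup with character references.
func escapeHTML(_ text: String) -> String {
    var escaped = ""
    for character in text {
        switch character {
        case "&": escaped += "&amp;"   // must be escaped so existing "&" is not read as an entity
        case "<": escaped += "&lt;"
        case ">": escaped += "&gt;"
        case "\"": escaped += "&quot;"
        case "'": escaped += "&#39;"
        default: escaped.append(character)
        }
    }
    return escaped
}

let userInput = "John Do <john@do.com>"
print(escapeHTML(userInput))
// John Do &lt;john@do.com&gt;
```

Note that this escapes everything, including real tags such as <b>, so it only helps when the whole string is meant to be treated as plain text.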
Yes, that is the output I'm after, but I am not in control of the string being received from the user. It could be anything.
I should explain. I'm receiving input directly from the user as notes. The notes could be actual HTML, or they could be partial or invalid HTML. There's no way to tell, since users are free to type in whatever they wish. What I need is to parse the HTML and extract the plain-text version of whatever they entered, while retaining any odd entries, links etc. that weren't entered as HTML.
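One possible workaround for mixed input like this, until the library handles it, is a pre-processing pass that escapes only the angle brackets that do not belong to a recognizable tag. The sketch below is an assumption-heavy illustration, not the library's behavior: `knownTags`, `looksLikeTag` and `escapeNonTagBrackets` are hypothetical names, the allow-list is deliberately tiny, and "&" is left untouched.

```swift
/// Hypothetical allow-list; a real one would cover many more element names.
let knownTags: Set<String> = ["a", "b", "br", "div", "em", "i", "p", "span", "strong"]

/// Returns true if the text between "<" and ">" starts like a known tag.
func looksLikeTag(_ inner: Substring) -> Bool {
    if inner.contains("<") { return false }
    var rest = inner
    if rest.first == "/" { rest = rest.dropFirst() }   // closing tag such as </b>
    let name = rest.prefix(while: { $0.isLetter || $0.isNumber })
    guard knownTags.contains(name.lowercased()) else { return false }
    // The name must be followed by nothing, whitespace, or "/" (as in <br/>).
    guard let next = rest.dropFirst(name.count).first else { return true }
    return next.isWhitespace || next == "/"
}

/// Escapes "<" and ">" unless they delimit a tag from the allow-list.
func escapeNonTagBrackets(_ text: String) -> String {
    var result = ""
    var index = text.startIndex
    while index < text.endIndex {
        let character = text[index]
        if character == "<" {
            let afterOpen = text.index(after: index)
            if let close = text[afterOpen...].firstIndex(of: ">"),
               looksLikeTag(text[afterOpen..<close]) {
                // Recognized tag: copy it through untouched.
                result.append(contentsOf: text[index...close])
                index = text.index(after: close)
                continue
            }
            result += "&lt;"
        } else if character == ">" {
            result += "&gt;"
        } else {
            result.append(character)
        }
        index = text.index(after: index)
    }
    return result
}

print(escapeNonTagBrackets("This is an <b>email</b>: John Do <john@do.com>"))
// This is an <b>email</b>: John Do &lt;john@do.com&gt;
```

The result could then be assigned to innerHTML as in the earlier example. An allow-list is used here on the assumption that unknown names are more likely to be user text than markup. Custom elements or unusual tags would be escaped too, so this is a heuristic rather than a real HTML tokenizer.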
@guidedways I see. I'll rename the issue, then, and mark it as a feature request.
Thank you, that would be extremely helpful!
This looks like a powerful library for navigating around HTML nodes. However, what would be the simplest method of obtaining cleaned-up 'plain text' from HTML input? I'd like it to preserve any 'invalid' non-HTML tags such as John Do <john@do.com> and not try to parse them as NSAttributedString's initWithHTML does.