Implement HTML escaping for arbitrary string input #31

guidedways · 2018-05-10T18:30:28Z

This looks like a powerful library to navigate around HTML nodes, however what would be the simplest method of obtaining cleaned up 'plain text' from HTML input? I'd like it to preserve any 'invalid' non-html tags such as John Do <john@do.com> and not try and parse it as NSAttributedString's initWithHTML does.


guidedways · 2018-05-10T18:40:40Z

Okay the following seems to fail

let element:HTMLElement = HTMLElement(tagName: "div")
element.innerHTML = "This is an <b>email</b>: John Do <john@do.com>"
print("\(element.textContent)")

outputs: This is an email: John Do

What do I have to do to make this work so that it ignores anything that doesn't look like HTML?

iabudiab · 2018-05-10T20:34:48Z

@guidedways Hey there. Let me see if I understood you correctly.

You want to input an HTML string and have all HTML tags stripped, so that This is an <b>email</b>: John Do <john@do.com> returns This is an email: John Do <john@do.com>?

If so, then the easiest way to do it is to escape the HTML reserved characters that should not be interpreted as markup. In your case:

let element: HTMLElement = HTMLElement(tagName: "div")
element.innerHTML = "This is an <b>email</b>: John Do &lt;john@do.com&gt;"
print("\(element.textContent)")
// This is an email: John Do <john@do.com>
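As a rough illustration of the escaping step itself, here is a minimal sketch. `escapeHTML` is a hypothetical helper, not part of HTMLKit, and it should only be applied to the parts of the input that are meant as literal text, since it escapes real tags too.

```swift
/// Hypothetical helper (not an HTMLKit API): replaces the characters an
/// HTML parser would treat as markup with their entity equivalents.
func escapeHTML(_ text: String) -> String {
    var result = ""
    for character in text {
        switch character {
        case "&": result += "&amp;"
        case "<": result += "&lt;"
        case ">": result += "&gt;"
        case "\"": result += "&quot;"
        default: result.append(character)
        }
    }
    return result
}

let escaped = escapeHTML("John Do <john@do.com>")
print(escaped) // John Do &lt;john@do.com&gt;
```

The "&" case is listed first only for readability; because each character is handled once, the entities produced here are never escaped a second time.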

Some Details

innerHTML in HTMLKit behaves like it would in a browser, i.e. it sets the HTML content of an element to the string that is passed. The string is interpreted as an HTML fragment and parsed with the element as its context.

What does it mean? Well, your input gets parsed to this DOM:

<div>This is an <b>email</b>: John Do <john@do.com></john@do.com></div>

Take a look here for more info: MDN Element.innerHTML

Does this answer your question? Do you have any follow-up questions?

guidedways · 2018-05-10T20:37:15Z

Yes that is the output I'm after, but I am not in control of the string being received from the user. It could be anything <some strange non-html tag>. I need the library to be able to do this for me so I can escape < as &lt;. Can HTMLKit find and escape non-html 'tags' for me?

guidedways · 2018-05-10T20:40:03Z

I should explain. I'm receiving input directly from the user as notes. The notes could be actual HTML or could be partial / invalid HTML. There's no way to tell since they're free to type in whatever they wish. What I need to do is be able to parse HTML and extract the plain text version of whatever they entered, however I need to retain any such odd entries, links etc that aren't otherwise entered as HTML.

iabudiab · 2018-05-10T20:42:58Z

@guidedways I see, currently HTMLKit does not provide this functionality. I'll see if I could implement this in the next couple of days. Will let you know as soon as I have something.

I'll rename the issue then and mark as feature request.
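Until something like this exists in the library, one possible workaround is to preprocess the input before assigning it to innerHTML. The following is only a heuristic sketch under stated assumptions: `escapeUnknownTags` and `knownTags` are hypothetical names, the allow-list is deliberately short, and anything between "<" and ">" whose name is not in that list is treated as literal text.

```swift
/// Assumed allow-list of tag names treated as real HTML; extend as needed.
let knownTags: Set<String> = ["a", "b", "i", "u", "p", "br", "div", "span",
                              "em", "strong", "ul", "ol", "li"]

/// Hypothetical preprocessing step (not an HTMLKit API): keeps tags whose
/// name is in `knownTags` and escapes every other "<" and ">".
func escapeUnknownTags(_ input: String) -> String {
    var output = ""
    var index = input.startIndex
    while index < input.endIndex {
        let character = input[index]
        if character == "<",
           let close = input[index...].firstIndex(of: ">") {
            // Text between "<" and ">", e.g. "b", "/b" or "john@do.com".
            var inner = input[input.index(after: index)..<close]
            if inner.first == "/" { inner = inner.dropFirst() }
            // The tag name runs up to the first whitespace or "/".
            let name = inner.prefix(while: { !$0.isWhitespace && $0 != "/" })
                .lowercased()
            if knownTags.contains(name) {
                // Looks like a real tag: copy it through unchanged.
                output.append(contentsOf: input[index...close])
                index = input.index(after: close)
                continue
            }
        }
        switch character {
        case "<": output += "&lt;"
        case ">": output += "&gt;"
        default: output.append(character)
        }
        index = input.index(after: index)
    }
    return output
}

let prepared = escapeUnknownTags("This is an <b>email</b>: John Do <john@do.com>")
print(prepared) // This is an <b>email</b>: John Do &lt;john@do.com&gt;
```

The prepared string matches the manually escaped input from the earlier comment, so assigning it to innerHTML and reading textContent should keep the address. Being a heuristic, it will misjudge some inputs, for example text that happens to start with a known tag name.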

guidedways · 2018-05-10T20:57:01Z

Thank you, that would be extremely helpful!

iabudiab added the question label May 10, 2018

iabudiab changed the title ~~Plain text?~~ Implement HTML escaping for arbitrary string input May 10, 2018

iabudiab added the feature request label May 10, 2018

iabudiab mentioned this issue May 13, 2018

textContent strips <br/>s #32

Open
