Let's say I allow users to enter raw HTML in a web page. Let's say users are stupid and don't know how to write valid HTML. How can I clean up and simplify their code?

Examples:

<b>lorem</b><b> ipsum</b>
is simplified to

<b>lorem ipsum</b>
---

<b>lorem <i>ipsum</b> dolor</i>
is cleaned up to

<b>lorem <i>ipsum</i></b> dolor (or something similar).

Is there any Ext JS function or plugin to do that? Or any external JS library?
I've been trying to make my own algorithm but it's not really trivial...

skirtle

30 Nov 2011, 12:31 PM

Please give your threads meaningful titles. 'mp' doesn't give much of a hint what the thread is about.

What about just setting it as the innerHTML of a DOM node then reading it back out again?
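A minimal sketch of that round-trip, assuming a browser environment. `normalizeHtml` and its optional `doc` parameter are hypothetical names introduced here for illustration, not part of Ext JS.

```javascript
// Let the browser's HTML parser repair the markup, then read the result back.
// `doc` is a hypothetical optional parameter; it defaults to the global document.
function normalizeHtml(html, doc) {
  doc = doc || document;
  var container = doc.createElement('div'); // detached node, never shown on the page
  container.innerHTML = html;               // the parser builds a well-formed tree
  return container.innerHTML;               // serialized back with tags properly nested
}
```

With the mis-nested example from the first post, a parser following the HTML5 algorithm should return something like `<b>lorem <i>ipsum</i></b><i> dolor</i>`, though the exact output can vary by browser. This only fixes nesting; it does not sanitize or simplify the markup.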

Araberen

30 Nov 2011, 12:47 PM

Sorry... I put a good title, but something probably happened with my keyboard then... ^^ (and now I can't edit the post title).

It seems to work for cleaning up the code. At least, when I type <b>lorem <i>ipsum</b> dolor</i>, Chrome's element inspector shows a well-formed HTML tree. So I guess I could read it back from the DOM and get the tags in the right order.

But I also (and mostly) want to simplify the code. Stuff like <b><b>OK</b></b> shouldn't exist...

skirtle

30 Nov 2011, 1:08 PM

Technically there's nothing wrong with nested <b> tags. Given suitable CSS the inner tag could easily be styled differently from the outer tag. While I understand where you're coming from, the requirement not to have nested <b> tags isn't really part of HTML; it's a requirement you've layered on top. As such I suspect you'll struggle to find a library to do it.

My first thought for how I'd implement this is also to use DOM nodes. That avoids any issues with invalid HTML and you can then navigate the tree and make your modifications before reading back the contents.
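A rough sketch of such a tree pass, written against a plain tree model so it stays self-contained: each node is either a text string or a hypothetical `{ tag, children }` object, which you would build from (and write back to) real DOM nodes via `childNodes` and `tagName`. The `MERGEABLE` list, the function names, and the decision to ignore attributes are all assumptions; elements carrying classes or styles should probably not be merged.

```javascript
// Tags assumed to be purely presentational, so merging or unwrapping them is safe.
const MERGEABLE = new Set(['b', 'i', 'u', 'em', 'strong']);

// Append a node to a child list, merging it with the previous sibling when possible.
function append(out, node) {
  const prev = out[out.length - 1];
  if (typeof node === 'string') {
    if (node === '') return;
    if (typeof prev === 'string') out[out.length - 1] = prev + node; // join adjacent text
    else out.push(node);
  } else if (prev && typeof prev !== 'string' &&
             prev.tag === node.tag && MERGEABLE.has(node.tag)) {
    // <b>a</b><b>b</b> becomes <b>ab</b>; children are re-appended so they can merge too.
    for (const child of node.children) append(prev.children, child);
  } else {
    out.push(node);
  }
}

// Return a simplified copy of a child list. `open` tracks the enclosing tags.
function simplify(children, open = new Set()) {
  const out = [];
  for (const node of children) {
    if (typeof node === 'string') {
      append(out, node);
    } else if (MERGEABLE.has(node.tag) && open.has(node.tag)) {
      // Redundant nesting such as <b><b>OK</b></b>: keep the content, drop the inner tag.
      for (const child of simplify(node.children, open)) append(out, child);
    } else {
      const inner = simplify(node.children, new Set(open).add(node.tag));
      append(out, { tag: node.tag, children: inner });
    }
  }
  return out;
}

// Serialize the tree back to markup (no attribute support, no text escaping).
function toHtml(children) {
  return children
    .map(n => (typeof n === 'string' ? n : `<${n.tag}>${toHtml(n.children)}</${n.tag}>`))
    .join('');
}
```

For example, `toHtml(simplify([{ tag: 'b', children: ['lorem'] }, { tag: 'b', children: [' ipsum'] }]))` gives `<b>lorem ipsum</b>`, and the nested `<b><b>OK</b></b>` case collapses to `<b>OK</b>`.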

Given your markup rules don't appear to match the rules of HTML, you might want to consider using an alternative markup format that can be converted to HTML, like the ones used by wikis. Conventions like using stars to surround bold text avoid the nesting issue as the opening and closing tags are identical.
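To illustrate the star convention, here is a small sketch that converts `*bold*` spans to `<b>` tags. The exact syntax and the `starsToHtml` and `escapeHtml` names are assumptions made up for this example; real wiki formats define many more rules.

```javascript
// Escape characters that would otherwise be read as HTML.
function escapeHtml(text) {
  return text.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;');
}

// Convert *bold* spans to <b> tags. Unpaired stars are left as literal text.
function starsToHtml(text) {
  return escapeHtml(text).replace(/\*([^*]+)\*/g, '<b>$1</b>');
}
```

Because users never type tags themselves, the generated HTML is well-formed by construction, and a star span cannot contain another star span, so nested <b> tags cannot occur.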