I am a leader by default, because nature abhors a vacuum. (Desmond Tutu)

July 01, 2009

Movable Type rich text editors and stripping weird tags

Another part of day 3 -

We're going through Movable Type training again today (thanks Natalie!), and a question came up, because it's going to be part of the process: when someone pastes in what they wrote in some other text editor, how do we make sure it gets published without weird tags, weird HTML, or funky or smelly JavaScript? Microsoft Word is one version of that process; there are lots more paths that get there.

I know that when I used to use Typepad, the rich text editor I had was pretty good about stripping out weird tags and attributes. Font tags disappeared, for instance, which made it possible to be sure that stuff didn't show up as Comic Sans unless I really wanted it to be in Comic Sans.

The Rich Text Editor produces tons of ugly fucking code behind the screen, especially when using Safari. So it's an old-school editor such as Midas, only a slight improvement over WYSIFUC in the sense that it hides it from you (unless you select "none" in the Format menu or "HTML Mode" and scream in horror at the FUC it generates, like <span class="Apple-style-span" style="font-style: italic;">...</span> instead of <em>...</em>!). For example, on Safari it generates a DIV per paragraph, while on Firefox it just spits out <br> tags or cosmetic <i> or <b> tags (not even <br />, or <strong> or <em> tags, mind you).

And generally that's my experience to date: weird tags, bad semantics, and unpredictable behavior. What I don't know yet (and hope to figure out) is what data path through the system would let things be cleaned up and filtered on the back end so that they are consistent, or fixed on the front end so that the rich text editor generates better code.
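To make the "cleaned up and filtered on the back end" idea concrete, here's a rough sketch of what such a filter could look like. This is not how Movable Type or Typepad actually does it. It's a standalone Python script using only the standard library, and the tag whitelist, the rename table, and the names `Cleaner` and `clean` are all my own assumptions for illustration. It also assumes reasonably well-nested input, and that a DIV means "paragraph" the way Safari's output uses it.

```python
from html import escape
from html.parser import HTMLParser

# All of these tables are hypothetical choices, not anything MT ships with.
ALLOWED = {"p", "br", "em", "strong", "a", "ul", "ol", "li", "blockquote"}
RENAME = {"b": "strong", "i": "em", "div": "p"}  # div -> p assumes one DIV per paragraph
DROP_WITH_CONTENT = {"script", "style"}
VOID = {"br"}
KEEP_ATTRS = {"a": {"href"}}
SAFE_HREF = ("http://", "https://", "mailto:", "/", "#")


class Cleaner(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.stack = []  # (original tag, emitted tag or None if dropped)
        self.skip = 0    # non-zero while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in DROP_WITH_CONTENT:
            self.skip += 1
            return
        if self.skip:
            return
        name = RENAME.get(tag, tag)
        if tag == "span":
            # Recover semantics from Safari-style inline styling.
            style = "".join((dict(attrs).get("style") or "").split()).lower()
            if "font-style:italic" in style:
                name = "em"
            elif "font-weight:bold" in style:
                name = "strong"
            else:
                name = None
        elif name not in ALLOWED:
            name = None  # unknown tag: drop the tag, keep its text
        if name in VOID:
            self.out.append("<br />")
            return
        self.stack.append((tag, name))
        if name:
            kept = ""
            for key, value in attrs:
                if key in KEEP_ATTRS.get(name, ()):
                    if value and value.strip().lower().startswith(SAFE_HREF):
                        kept += f' {key}="{escape(value, quote=True)}"'
            self.out.append(f"<{name}{kept}>")

    def handle_endtag(self, tag):
        if tag in DROP_WITH_CONTENT:
            self.skip = max(0, self.skip - 1)
            return
        if self.skip or tag in VOID:
            return
        if self.stack and self.stack[-1][0] == tag:
            _, name = self.stack.pop()
            if name:
                self.out.append(f"</{name}>")

    def handle_data(self, data):
        if not self.skip:
            self.out.append(escape(data, quote=False))


def clean(html_text):
    parser = Cleaner()
    parser.feed(html_text)
    parser.close()
    while parser.stack:  # close anything the input left open
        _, name = parser.stack.pop()
        if name:
            parser.out.append(f"</{name}>")
    return "".join(parser.out)


if __name__ == "__main__":
    print(clean('<span class="Apple-style-span" style="font-style: italic;">hi</span>'))
    print(clean('<font face="Comic Sans MS">no thanks</font>'))
```

The point of the whitelist approach is that anything not explicitly allowed goes away, so the output looks the same whichever browser or word processor the text came from. A real version would need to handle badly nested markup and a longer list of tags, which this sketch doesn't attempt.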

The funny thing is, this used to work better: old versions of Typepad, and old versions of MT, both had a very conservative rich text editor that denatured the text quite a bit, removing a lot of tags and operating in "safe mode" much more so than "rich mode".