For HTML

    from lxml.builder import E

    def CLASS(*args):  # "class" is a reserved word in Python
        return {"class": " ".join(args)}

    html = page = (
        E.html(  # create an Element called "html"
            E.head(
                E.title("This is a sample document")
            ),
            E.body(
                E.h1("Hello!", CLASS("title")),
                E.p("This is a paragraph with ", E.b("bold"), " text in it!"),
                E.p("This is another paragraph, with a", "\n ",
                    E.a("link", href="http://www.python.org"), "."),
                E.p("Here are some reserved characters: <spam&egg>."),
            )
        )
    )

For meta-programming

The Reason

— Oh, so we have this markup that we can generate simply using string concatenation? Yeah, let’s just do that.

— But if some string contains <img/src/onerror=alert(1)>, it will cause JavaScript code execution!

— Ugh, let’s write the htmlspecialchars function and every time we put some string into HTML, pass the string through this function…

— But if you write "<img class=" + htmlspecialchars($_GET['cls']) + " src=dot.gif>", you can still inject JavaScript!

— But only an idiot would write that code.

— What should we do if our code emits invalid HTML in the first place, without string substitution?

— Just write it carefully next time.

So here we are, generating HTML using string concatenation.
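The unquoted-attribute case from the dialogue can be reproduced in a few lines of Python. This is only a sketch under stated assumptions: `html.escape` stands in for PHP's htmlspecialchars, `render_img` and the `cls` value are hypothetical, and the standard-library `HTMLParser` is used as a rough stand-in for how a browser would read the attributes.

```python
from html import escape
from html.parser import HTMLParser


def render_img(cls: str) -> str:
    # The value is escaped, but the attribute is left unquoted,
    # so whitespace in the value starts a new attribute.
    return "<img class=" + escape(cls) + " src=dot.gif>"


class AttrCollector(HTMLParser):
    """Collects the attributes of every start tag it sees."""

    def __init__(self):
        super().__init__()
        self.attrs = []

    def handle_starttag(self, tag, attrs):
        self.attrs.extend(attrs)


# Hypothetical user input: no <, >, & or quotes, so escaping changes nothing.
payload = "x onload=alert(1)"
markup = render_img(payload)

collector = AttrCollector()
collector.feed(markup)
collector.close()
injected = dict(collector.attrs)
```

Here `injected` ends up with three attributes (`class`, `onload`, `src`) instead of two: the escaping function did its job, and the markup is still wrong.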

Of course, if you are being really, really careful, you can emit valid HTML and SQL with no injection problems.

But you don’t use manual memory management and pointer arithmetic when you generate your web-site, do you?

Why would you walk on a razor’s edge if you can program in a safe way?
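One possible reading of "a safe way" is to build the document as a tree and leave quoting and escaping to a serializer. The sketch below uses Python's standard `xml.etree.ElementTree` with `method="html"`; the helper names and the sample inputs are illustrative assumptions, not something this text prescribes.

```python
import xml.etree.ElementTree as ET


def render_img(cls: str) -> str:
    # Attributes are set as data; the serializer decides how to quote them.
    img = ET.Element("img")
    img.set("class", cls)
    img.set("src", "dot.gif")
    return ET.tostring(img, encoding="unicode", method="html")


def render_paragraph(text: str) -> str:
    # Text is stored as data too; reserved characters are escaped on output.
    p = ET.Element("p")
    p.text = text
    return ET.tostring(p, encoding="unicode", method="html")


# Hypothetical hostile inputs.
img_markup = render_img("x onload=alert(1)")
p_markup = render_paragraph("<spam&egg>")
```

With this approach the hostile string stays inside a quoted `class` value, and the reserved characters in the paragraph come out as entities, without any call to an escaping function at the call site.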

Browsers

There is, of course, a funnier side to this story: browsers.

Browser vendors had to tune their parsers for broken HTML because many web-sites gave the browsers invalid HTML.
People chose browsers that were able to process nearly arbitrary chunks of bytes as HTML because these browsers were able to display their favorite web-sites.

Browser vendors had to implement XSS filters, because web-sites are prone to putting raw user requests straight into HTML.
The rationale for XSS filters is simple: if a browser vendor can prevent 90% of XSS attacks on the user, the user will be happy.

These two examples are browsers dealing with symptoms of the problem, and not with the problem itself.
The problem is in the head of a programmer who thinks that it is reasonable to generate dynamic HTML using string manipulation.

Conclusion

The state of HTML as a structured data format is terrible because people started manipulating it as a string from the very beginning.
There are many problems (including, but not limited to: XSS, invalid HTML, browser parsing differences) caused by this mistreatment of the HTML format.

Maybe, just maybe, if available tools did not encourage people to generate HTML as a string, the web would be a better place.

Maybe, just maybe, if we chose a different serialization format for documents on the web, we would not treat it as a string that can be written using printf.

We would definitely have fewer vulnerabilities if programmers did not think that constructing structured data formats using string functions is an acceptable idea.