Tuesday, October 26, 2004

The Triumph of Semantic Markup

Semantic markup is the new big thing: the semantic web and all that. I see it less as a cool new thing and more as a return to our roots, or at least to the roots of the web as I saw them when it first started in the early 90s.

Back in the early days of the web (1993), I had a mantra I used when explaining HTML markup to people: "Semantic, not Literal". That is, an HTML H1 tag does not mean "bigger font"; it means "1st level heading". HTML markup was meant to be semantic, and not for any aesthetic reason. It wasn't even for the sake of browser independence (though, had people stuck to semantic markup, this whole WML thing would have been simpler). It was because semantic markup is much more powerful from an information access point of view. Specifying a font color is nice, but it comes at the cost of semantic ambiguity. Of course, HTML was never really fully semantic markup; that was just an appealing dream. People fell in love with the FONT tag and a variety of other literal markup tags and mechanisms. The hope for semantic markup was lost, and HTML became very much a literal presentation markup language.
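To make the information-access point concrete, here's a small sketch, written in modern Python using only the standard library's `html.parser` module, that pulls an outline out of a page. The two sample pages and the `outline` helper are hypothetical illustrations, not anything from a real site. With semantic headings, the outline falls right out; with FONT-tag markup that merely looks like headings, a program has nothing to find.

```python
from html.parser import HTMLParser


class OutlineParser(HTMLParser):
    """Collects (level, text) pairs for every <h1>..<h6> element."""

    HEADINGS = {"h1": 1, "h2": 2, "h3": 3, "h4": 4, "h5": 5, "h6": 6}

    def __init__(self):
        super().__init__()
        self.outline = []   # list of (heading level, heading text)
        self._level = None  # level of the heading currently open, if any
        self._buf = []      # text fragments seen inside that heading

    def handle_starttag(self, tag, attrs):
        # html.parser passes tag names already lowercased.
        if tag in self.HEADINGS:
            self._level = self.HEADINGS[tag]
            self._buf = []

    def handle_endtag(self, tag):
        if tag in self.HEADINGS and self._level is not None:
            self.outline.append((self._level, "".join(self._buf).strip()))
            self._level = None

    def handle_data(self, data):
        # Only keep text that sits inside a heading element.
        if self._level is not None:
            self._buf.append(data)


def outline(html):
    """Hypothetical helper: return the heading outline of an HTML string."""
    parser = OutlineParser()
    parser.feed(html)
    parser.close()
    return parser.outline


# Semantic: the markup says what each piece of text *is*.
semantic = "<h1>Semantic Markup</h1><p>Intro.</p><h2>Early Days</h2><p>...</p>"
# Literal: the markup only says how the text should *look*.
literal = ('<font size="6"><b>Semantic Markup</b></font><p>Intro.</p>'
           '<font size="4">Early Days</font>')

print(outline(semantic))  # [(1, 'Semantic Markup'), (2, 'Early Days')]
print(outline(literal))   # []
```

Both pages might render almost identically in a browser, but only the semantic one tells a program (a search engine, a screen reader, a table-of-contents generator) where the headings are.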

Then, to add insult to injury, things like DHTML came along, flaunting the fact that HTML had become a presentation markup language, not a semantic one. I resisted learning DHTML for a while because of this, though once you give in and accept HTML as a presentation language, DHTML is sort of neat. When I first learned about CSS, I had some hope that it might restore some of that semantic quality. Sadly, it did not. It does a decent job of abstracting out "style", but HTML is still essentially presentational. CSS does allow it to be presented in a variety of styles, though, and it's nice having a relatively universal standard for those style details.

But looking at the web now, the power of semantic markup is winning out again. RSS (http://www.xml.com/pub/a/2002/12/18/dive-into-xml.html) has become an extremely popular and rapidly growing scheme for information delivery. RSS is purely semantic markup, with all the presentation details left up to HTML. I currently read well over 100 RSS feeds, only a fraction of which come from the blogs that prompted the success of RSS. This blog itself is accessible via RSS, and many readers access it that way. I read the comics, the Times, a grab bag of web sites, a few search engine queries, and even a mailing list, all via RSS. It's very gratifying to see "semantic, not literal" finally getting a lot of traction.
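Here's a rough sketch of what "presentation left to someone else" means in practice, using Python's standard `xml.etree.ElementTree`. The feed below is a made-up RSS 2.0 document (example.com and its contents are hypothetical); `title`, `link`, and `description` are real RSS 2.0 elements. The feed only says what each piece of data *is*, and the reading program decides how to show it, here once as plain text and once as HTML. The helper names `parse_items`, `render_text`, and `render_html` are my own, for illustration only.

```python
import xml.etree.ElementTree as ET
from html import escape

# A hypothetical, minimal RSS 2.0 feed: no fonts, colors, or layout anywhere.
FEED = """<rss version="2.0">
  <channel>
    <title>Example Blog</title>
    <link>http://example.com/</link>
    <description>A hypothetical feed</description>
    <item>
      <title>The Triumph of Semantic Markup</title>
      <link>http://example.com/2004/10/semantic.html</link>
      <description>Semantic, not literal.</description>
    </item>
  </channel>
</rss>"""


def parse_items(xml_text):
    """Return each <item> as a dict of its semantic fields."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        items.append({
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "description": item.findtext("description", default=""),
        })
    return items


def render_text(items):
    """One presentation choice: a plain-text list of titles and links."""
    return "\n".join(f"{i['title']} <{i['link']}>" for i in items)


def render_html(items):
    """Another presentation choice: an HTML list, escaping feed text."""
    rows = [
        f'<li><a href="{escape(i["link"])}">{escape(i["title"])}</a>: '
        f'{escape(i["description"])}</li>'
        for i in items
    ]
    return "<ul>" + "".join(rows) + "</ul>"


items = parse_items(FEED)
print(render_text(items))
print(render_html(items))
```

The same parsed items could just as easily feed an aggregator's summary view or a mail-style reader; none of that requires the feed author to change anything.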

Two other random thoughts on RSS: First, I expect it won't be long before some RSS feeds start including advertising entries. It surprises me that I haven't really seen any yet. I just hope that when the time comes it isn't overwhelming. There are some feeds I would keep reading even with a reasonable dose of ads; most, however, I'd stop reading if ads started appearing. Until then, it's a nice little garden.

The other thought is that blogs, as the progenitors of RSS, share other qualities with the early days of the web. Back then, the thing to do was to create your own personal homepage. Because most people had nothing they wanted to put on such a page, homepages quickly became something more commonly run by a university or corporation than by an individual, even though many individuals still had them. Further, in the early days of the web, the goal (as much as there was a "goal") was to share information. You weren't trying to drive banner hits, collect demographic data or derive revenue. While some (many?) blog writers may now have those goals, the biggest goal I've observed is wanting to be read. RSS is a great way to make it easier to be read.

(Footnote: If you're looking for a great way to read RSS feeds, there are a variety of RSS aggregator applications (see http://www.google.com/search?hl=en&ie=ISO-8859-1&q=rss%20aggregator&btnG=Google+Search), but I highly recommend Bloglines as an outstanding web-based reader.)

Professional

I am an Engineering Director at Google. My team and I work on Search.

Previously, I was the CTO at Newbury Networks, an 802.11 location and security company in Boston. In June 1999, I received my Master's degree from the MIT Media Lab. I graduated from MIT with an undergraduate degree in physics in June 1997. Prior to that, I was CTO of net.Genesis from 1994 to 1996.