"You can buy cars but you can't buy respect in the hood" - Curtis Jackson

Navigation for January, 2004 - Dare Obasanjo's weblog

Download it from here. The differences between v1.2.0.61 and v1.2.0.89 are listed below.

FEATURE: Support for customizing position of Reading Pane to give Outlook 2003 style reading pane layout. This is located under the View menu.

FEATURE: One can search downloaded feeds using full-text, regular expression, XPath or date-based searches. Searches can be saved in persistent "Search Folders". By default, an "Unread" items search folder is created on installation.

FEATURE: Visual indication provided when downloading feeds and when errors occur when processing a feed.

FEATURE: Popup (Toast) windows are displayed when new items are received for a feed. By default this feature is off, but it can be enabled on a per-feed basis.

FEATURE: Enhanced the Web Search Engines dialog to support Web search engines that produce an RSS feed, such as Feedster. Issuing such a search results in a temporary search folder that displays the returned items in the list view.

FEATURE: All links to RSS feeds found on visited web pages are provided in a drop down list on the toolbar to enable easy subscribing.

FEATURE: Support for uploading/downloading feed subscription list as OPML to and from a dasBlog 1.5 weblog.

FEATURE: Support for importing feeds from an OPML file given a URL.

FEATURE: Dragging and dropping a feed link onto the tree view now tries to gather all the information required to subscribe to the feed (no dialog on success; the new feed dialog is displayed on errors, e.g. the feed needs authentication).

FEATURE: Double-clicking a node now opens the feed's home page.

FEATURE: Added Copy to context menu of items in list view and feeds in tree view.

FIXED: Can't type a space character in the name of a feed or category during a rename.

FIXED: Ftp remote storage not always working (passive/active connection mode is now auto-detected)

FIXED: information about GUI layout not always saved.

FIXED: Carriage return and linefeeds are now stripped from item titles

FIXED: Sometimes old items are no longer displayed for some feeds if you hit [Update All Feeds] on startup

FIXED: Clicking [Locate Feed] for some URIs results in an exception about the "incorrect child node being inserted"

This is your chance to own a piece of Internet history. This is the book shown on TV, the Internet and in magazines, talked about on the radio and seen by millions of people world-wide. I am selling the WIPO book with the 25-page letter I received from Microsoft's lawyers on January 14, 2004. I have two copies of these and I will be keeping one for my own personal memoirs. This inch-thick book contains copies of web pages, registrations, trademarks, other WIPO cases, emails between me and Microsoft's lawyers and much more. There are 27 annexes filled with information. This package also comes with the 25-page complaint transmittal coversheet that was sent with the inch-thick book. In this letter you can find policies, rules, supplemental rules, model responses, a copy of the complaint and much more.

I wonder how high the bidding is going to go for this piece of Internet history.

Box said technologies such as Java's Remote Method Invocation (RMI) and CORBA (Common Object Request Broker Architecture) all suffered similar problems. "The metaphor of objects as a primary distribution media is flawed. CORBA started out with wonderful intentions, but by the time they were done, they fell into the same object pit as COM."

The problem with most distributed object technologies, Box said, is that programs require particular class files or .jar files (referring to Java), or .dll files (Microsoft's own dynamic linked libraries). "We didn't have (a) true arms-length relationship between programs," Box said. "We were putting on an appearance that we did, but the programs had far more intimacy with each other than anyone felt comfortable with."

"How do we discourage unwanted intimacy?" he asked. "The metaphor we're going to use for integrating programs (on Indigo) is service orientation. I can only interact by sending and receiving messages. Message-based (communications) gives more flexibility."

I guess Don didn't get the memo - OO is all about the messages between the objects, and less about the actual objects themselves. Look at that last sentence - "Message-based communications" gives more flexibility? What does he think OO is about? You know, CORBA can be simple - in VisualWorks, it's amazingly, astoundingly simple. It takes a curly brace language like Java or C# to make it complex (at the developer level - I'm not talking about the implementation layer here).

James Robertson completely misses the point of Don's comments on distributed computing with objects versus using message passing. An example of a service oriented architecture that uses message passing is HTTP on the World Wide Web. It is flexible, scalable and loosely coupled. No one can say with a straight face that using CORBA, Java RMI or DCOM is as scalable or as loosely coupled unless they're trying to sell you something. What Don and the folks on the Indigo team are trying to do is apply the lessons learned from the Web to solving problems traditionally tackled by distributed object systems.

It is just an unfortunate naming collision that some object oriented languages use the term “message passing” to describe invoking methods on objects, which I'm sure is what's confused James Robertson, given that he is a Smalltalk fan.

OK media consumers, let's look forward to 2006. It's always good to look at where you'll end up when you consider buying into a platform of any kind -- and both Apple and Microsoft want you to look at their offerings as just a piece of their platform offerings. It's sort of like picking a football team -- if you're gonna be locked into a team for a few years, wouldn't you rather pick a Superbowl winner than someone who'll go 1-18?

Over the next three years, it won't be uncommon for many of you to buy 500 songs if you want to buy legitimate music from legitimate sources (translate: official services approved by the recording industry like Napster or iTunes). That'll cost you $300 to $500. It's pretty clear that the world will come down to two or three major "systems." Disclaimer: MSN is rumored to be working on such a system. See, when you buy music from a service like Apple's iTunes or Napster (or MSN), it comes with DRM attached.

When you hear DRM think "lockin." So, when you buy music off of Napster or Apple's iTunes, you're locked into the DRM systems that those applications decided on. Really you are choosing between two competing lockin schemes.

But, not all lockin schemes are alike, I learned on Friday. First, there are two major systems. The first is Apple's AAC/Fairtunes-based DRM. The second is Microsoft's WMA.

Let's say it's 2006. You have 500 songs you've bought on iTunes for your iPod. But, you are about to buy a car with a digital music player built into it. Oh, but wait, Apple doesn't make a system that plays its AAC format in a car stereo. So, now you can't buy a real digital music player in your car. Why's that? Because if you buy songs off of Apple's iTunes system, they are protected by the AAC/Fairtunes DRM system, and can't be moved to other devices that don't recognize AAC/Fairtunes. Apple has you locked into their system and their devices. (And, vice versa is true, as any Apple fan will gladly point out to you). What does that mean if you buy into Apple's system? You've gotta buy an FM transmitter that transmits songs from your iPod to your car stereo. What does that do? Greatly reduces the quality. How do I know that? Cause the Microsoft side of the fence has FM transmitters too. I saw a few on Friday. But, what we have on our side is a format (WMA) that's already being adopted by car stereo manufacturers. So, now when you buy a new song on Napster, it can play on your car stereo, or on your portable music player. Is the choice to do that important to you? If not, then you can buy an iPod and music off of iTunes.

I'm not going to be too critical about Scoble's post since he's basically doing his job as an evangelist, and the last thing I want is yet more hate mail from folks in the B0rg cube who believe that every personal blog by a Microsoft employee should be a mini-pep rally for Microsoft products. But I do want to point out some counter arguments that I believe people on both sides of the debate [especially in the B0rg cube] should pay attention to. The first is Cory Doctorow's rant Protect your investment: buy open. He writes:

Well, says Scoble, all of the music that we buy from these legit services is going to have DRM use-restriction technology ("See, when you buy music from a service like Apple's iTunes or Napster (or MSN), it comes with DRM attached."). So the issue becomes "choosing between two competing lockin schemes."

And in that choice, says Scoble, Microsoft wins, because it has more licensees of its proprietary, lock-in format. That means that when you want to play your music in your car, it's more likely that you'll find a car-stereo manufacturer that has paid Microsoft to play Microsoft music than that you'll find one that has coughed up to Apple to play Apple music.

And this is the problem with Scoble's reasoning. We have a world today where we can buy CDs, we can download DRM-music, we can download non-DRM music from legit services, we can download "pirate" music from various services, and we can sometimes defeat DRM using off-the-shelf apps for Linux (which has a CD recovery tool that handily defeats CD DRM), the Mac (with tools like AudioHijack that make it easy to convert DRM music to MP3s or other open formats) and Windows (I assume, since I don't use Windows, but as Scoble points out, there's lots of Windows software out there.).

In this world where we have consumer choices to make, Scoble argues that our best buy is to pick the lock-in company that will have the largest number of licensees

That's just about the worst choice you can make.

If I'm going to protect my investment in digital music, my best choice is clearly to invest in buying music in a format that anyone can make a player for.

I have an iPod and I have to agree with Cory. I don't buy DRMed music, but I do buy CDs and sometimes look on Kazaa for remixes of singles no longer available in stores. I use a tape deck connector to plug my iPod into my car stereo, and it often sounds better than CDs. An argument about how many devices can play Microsoft's file formats versus Apple's sounds silly to me given that I'll only ever use one player at a time. Scoble's argument [which I hope isn't a marketing strategy that Microsoft is seriously going to pursue] is that folks will transfer music between multiple players during regular usage, which in practice just isn't likely. And even if it were, the best bet for people in such cases would be to use the most widely supported format, in which case that would be MP3. Either way an iPod still seems like an attractive buy. Arguing about music file formats for portable music players is like arguing about address book formats in cell phones, and trying to sell the fact that you can move your address book more easily between two cell phones that run the same OS as some sort of selling point that matters to regular people.

The saddest part of all this is watching Scoble describe feedback from people pointing out the obvious holes in his sales pitch as hate mail. It isn't his fault; we all act that way once we've been assimilated. :) I just don't see this "more choice" argument convincing many people. Scoble is better off focusing on the price points and design aesthetics of media players competing with the iPod than on the artificial differences he's trying to construct. I was particularly fond of his statement:

It's interesting the religiousness of the debates. Brings me back to when I was a Macintosh fanatic back in the late 1980s. Oh, if only religious support won markets. Course if that were the case, I'd be working for Steve Jobs now in Cupertino, huh?

3. Programming language neutrality. Here's a statement, from an early Jeff Richter article about .NET, that provoked oohs and ahhs at the time: "It is possible to create a class in C++ that derives from a class implemented in Visual Basic." Well, does anybody do this now? Is it useful? Meanwhile, the dynamic language support we were going to get, for the likes of Perl and Python, hasn't arrived. Why not?

The primary benefit of programming language neutrality is that library designers can build a library in one language and developers using other languages can use it without having to worry about language differences. The biggest example of this is the .NET Framework's Base Class Library: it is mainly written in C#, but this doesn't stop VB.NET developers from writing .NET applications, implementing interfaces or subclassing types from the various System.* namespaces.

Examples closer to home for me are in RSS Bandit, which depends on a couple of third party libraries such as Chris Lovett's SgmlReader and Tim Dawson's SandBar. I've personally never had to worry about what language they were written in, nor do I care. All I need to know is that they are targeted at the .NET Framework.

On the other hand, when the .NET Framework was first being hyped there were a lot of overenthusiastic evangelists and marketers who tried to sell programming language neutrality as something that would also allow developers working on a single project to use different languages. Although theoretically possible, that always seemed like an idea that would be unwise to implement in practice. I can imagine how problematic it would be if Torsten wrote managed C++ code and I wrote VB.NET code for the different parts of RSS Bandit we worked on. Fixing each other's bugs would be a painful experience.

Actually we've decided to rethink having a Validate() method on the class that is currently called XPathDocument2 because it may lead users down the wrong path. Our worry is that users will end up loading an XML document and then call Validate() on it thus incurring the cost of two passes over the document as opposed to the more efficient approach of loading the document with a validating XmlReader. For this reason we've removed the Validate() method from the class.

Also there is no plan to have XmlDocument support any DOM L3 feature. Moving forward, the primary representation of in-memory XML documents on the .NET Framework will be the class currently called XPathDocument2 and that is where the Microsoft WebData XML team's efforts will be spent.

As Joshua Allen mentioned in his blog, our team recently had a WinFX Review. This is basically a design review with a number of top architects from across the .NET Framework to ensure that the API you are building is consistent with the design guidelines for an API that will be shipping in the next version of Windows (i.e. Longhorn). We got a lot of good feedback which we are in the process of responding to, and it has caused a few design changes. The good news is that we've come up with a story for XPathNavigator2, XPathEditor and XPathDocument2 that most people who've heard it are happy with.

After the review we were pinged by Anders Hejlsberg, who missed the original design review and asked if we could do a mini-review with just him. He gave lots of good feedback, questioned some of our scenarios and was quite amiable. I think he was mostly satisfied with the design decisions we'd made but thought we could do more in making processing XML dead easy, as opposed to the current situation where the developer needs to know a bit too much about XML and our programming model. He also talked about the tradeoffs of going to a cursor-based model (XPathNavigator2/XPathEditor) from a tree-based model (XmlNode) and the disconnects developers may feel once they make the shift. I suspect it will be similar to the disconnect developers initially felt when moving from MSXML and Java, which had a push-based model (SAX) for processing XML in a streaming fashion, to the .NET Framework, which uses a pull-based model (XmlReader). At first it was unfamiliar, but once they started using it and saw the benefits they preferred it to the old way.
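The push-versus-pull distinction is easy to show in miniature. As a rough Python analogy (this is not the .NET XmlReader API, just the same idea using the standard library's `iterparse`): in the pull model the caller drives the parse by asking for the next event, instead of registering callbacks and being driven by the parser as in SAX.

```python
import io
import xml.etree.ElementTree as ET

# Pull-style streaming parse: the loop below *asks* for each parse event,
# rather than handing the parser a callback object (the push/SAX style).
xml = io.StringIO("<rss><item>one</item><item>two</item></rss>")

titles = []
for event, elem in ET.iterparse(xml, events=("end",)):
    if elem.tag == "item":          # we only care about completed <item>s
        titles.append(elem.text)

print(titles)  # ['one', 'two']
```

The control flow stays in the application's hands, which is the property developers tend to prefer once they adjust to it.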

That said, we do need to think some more about how to better serve the “XML as config file format, AKA CSV on steroids” demographic. A large number of developers just see XML as nothing more than a format for configuration and log files, in which case a lot of the cruft of XML, such as entities, processing instructions and CDATA sections, is meaningless to them.

OMAHA, Neb. -- Four Westside High School students are suspended for promoting a white student for an African-American award. Flyers featured junior Trevor Richards, a South African native who moved to the United States in 1997.

Trevor said he is as African as anyone else.

This is just ridiculous. The kid is obviously African-American if the words are expected to be taken literally. Of course, like most euphemisms the words aren't meant to be taken literally but instead are supposed to map to a [sometimes unrelated] concept. I hope the kids take the school district to court.

I just registered into Orkut (Google's version of Friendster according to Slashdot), thanks to an invitation from Don Park -- thanks Don. Part of the registration process contained one of my pet peeves, a question about ethnicity that had [african american (black)] as one of the options. As if the two terms are interchangeable. I almost picked [other] since the designers of the software seemed to think that people of African descent who aren't citizens of the United States weren't a large enough demographic to have their own option in the drop down list. I ended up going with [african american (black)] since I didn't want to confuse people who'd be looking up my profile.

Checking out a couple of people's friend networks it seems the misgivings I had about Friendster which kept me from using it when I first heard about it are accurate if Orkut is anything like it. Online folks seem to have a weird definition of friend. When I think friend, I think of someone you'd give a call and say "Hello, I just killed someone" and after a pause their response is "Shit, so what are we going to do about the body?" That isn't to be taken literally but you get the idea. A friend is someone you'd go to the ends of the Earth for and who'd do the same for you. People with whom my primary interaction involves reading their weblogs and exchanging mail on various mailing lists don't really fall into the "friend" category for me. Lumping those people together with folks I've known all my life who've been with me through thick and thin who've done things like let me hold their bank card with the PIN number to use in case of emergencies when I was broke, trusted me to come up with my share of the rent and bills when I had no job and no prospects because I gave my word, and helped me get out of trouble when I thought I was in over my head just seems wrong to me.

There are acquaintances, friends and folks I'd die for. Lumping them all into one uber-category called friends just doesn't jibe with me. I'll play with the site some more later today but I doubt I'll be on it for long. I've got some stuff coming in from IKEA this morning.

There's been some recent surprise in the blogosphere about the recent Democratic Party caucus in Iowa, in which John Kerry won 38 percent of the state convention delegates, with 32 percent for John Edwards, 18 percent for Howard Dean and 11 percent for Dick Gephardt. Many had assumed that Howard Dean's highly successful Internet campaign, with its adoption of blogging technologies and support from many bloggers, was an indication of strong grassroots support. Yeah, right.

Along these lines, by tomorrow, I'm sure there'll be more than one person that will gnash their teeth and write "weblogs failed Dean."

Well, the weblog hype did get overboard the past few weeks. Weblogs do matter. Why? The influentials read weblogs. The press. The insiders. The passionate ones.

But, the average Joe doesn't read these. Come on, be real. Instapundit gets, what, 100,000 to 200,000 visitors a day? I get 2,000. That's a small little dinky number in a country of 290 million.

Weblogs and online technologies have helped Dean and others collect a lot of money, but you still gotta have a TV persona that hits home. Just reality in 2004. I'm not bitter about that.

The lessons for big-company evangelism (or small company, for that matter) are the same. If your product isn't something that average people like, it doesn't matter how good the weblogs are.

Considering that Robert Scoble is one of the weblog hypesters who may have gone “overboard,“ as he puts it, I find his post particularly telling. Folks like Robert Scoble have trumpeted that weblogs would triumph over traditional marketing, and in many posts he's berated product teams at the company he works for [Microsoft] that don't consider weblogging part of their marketing message. Weblogs are currently a fairly low-cost way of communicating with a certain class of Internet-savvy people. However, nothing beats traditional communication channels such as television, billboards and the print media for spreading a message amongst all and sundry.

Don't be blinded by the hype.

There's one other response to the recent events in Iowa that made me smile. Doc Searls wrote

I see that my positive spin yesterday on Howard Dean's "barbaric yawp" speech got approximately no traction at all. Worse, the speech was (predictably) mocked by everybody in the major media from Stern in the morning to Letterman and Leno in the evening.

Clearly, its effects were regrettable. It hurt the campaign. But it was also honest and authentic, and in the long run that can only help, for the simple reason that it was real. So. What to do? Here's my suggestion... Look at media coverage as nothing more than transient conditions, like weather. And navigate by the stars of your own constituency. The main lesson from Cluetrain is "smart markets get smarter faster than most companies." The same goes for constituencies and candidates. Your best advice will come from the people who know you best, who hear your voice, who understand the missions of your campaign and write about it clearly, thoughtfully and with great insight. They're out there. Your staff can help you find them. Navigate by their stars, not the ones on television.

I've always found people who espouse the Cluetrain Manifesto to be particularly naive about the realities of markets and marketing. Telling someone to ignore media coverage and keep it real is not how elections are won in America. Any student of recent American history knows the increased significance of the media in presidential elections ever since the televised Kennedy-Nixon debates of the 1960 election.

Being a hobbyist developer interested in syndication technologies, I'm always on the lookout for articles that provide useful critiques of the current state of the art. I recently stumbled on an article entitled 10 reasons why RSS is not ready for prime time by Dylan Greene which fails to hit the mark in this regard. Of the ten complaints, about three seem like real criticisms grounded in fact while the others seem like contrived issues which ignore reality. Below is the litany of issues the author brings up and my comments on each.

1) RSS feeds do not have a history. This means that when you request the data from an RSS feed, you always get the newest 10 or 20 entries. If you go on vacation for a week and your computer is not constantly requesting your RSS feeds, when you get back you will only download the newest 10 or 20 entries. This means that even if more entries were added than that while you were gone, you will never see them.

Practically every information medium has this problem. When I travel out of town for a week or more I miss the newspaper, my favorite TV news shows and talk radio shows, which, once gone, I'll likely never get to enjoy. Blogs are different, since most blogs provide an online archive; on the other hand, most news sites archive their content and require paid access once it's no longer current.

In general, most individual blogs aren't updated regularly enough that being gone for a week or two means entries are missed. On the other hand, most news sites are. In such cases one could leave one's aggregator of choice connected, reduce its refresh rate (to something like once a day) and let your fingers do the walking. That's exactly what one would have to do with a TiVo (i.e. leave your cable box on).

2) RSS wastes bandwidth. When you "subscribe" to an RSS feed, you are telling your RSS reader to automatically download the RSS file on a set interval to check for changes. Let's say it checks for news every hour, which is typical. Even if just one item is changed, the RSS reader must still download the entire file with all of the entries.

The existing Web architecture provides a couple of ways for polling-based applications to save bandwidth, including HTTP conditional GET and gzip compression over HTTP. Very few websites actually support both of these well-known bandwidth-saving techniques, including Dylan Greene's own site, based on a quick check with Rex Swain's HTTP Viewer. Using both techniques can cut bandwidth costs by an order of magnitude (by a factor of 10 for the mathematically challenged). Before coming up with sophisticated hacks for perceived problems, it'd be nice if website administrators actually used existing best practices instead of trying to reinvent the wheel in more complex ways.

That said, it would be a nice additional optimization for websites to provide only the items that hadn't been read by a particular client for each request for the RSS feed. However, I'd like to see us learn to crawl before we try to walk.
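The conditional GET mechanism mentioned above can be sketched in a few lines. This is a hedged illustration (the function names are made up, not from RSS Bandit or any real aggregator): the client remembers the `ETag` and `Last-Modified` validators from the previous response, echoes them back on the next poll, and reuses its cached copy when the server answers 304 Not Modified with an empty body.

```python
# Sketch of a bandwidth-friendly feed poll using HTTP conditional GET.
# Names here are illustrative, not any real aggregator's API.

def conditional_headers(etag=None, last_modified=None):
    """Build request headers that let the server answer 304 Not Modified."""
    headers = {"Accept-Encoding": "gzip"}       # also ask for compression
    if etag:
        headers["If-None-Match"] = etag         # validator from last ETag
    if last_modified:
        headers["If-Modified-Since"] = last_modified
    return headers

def handle_poll(status, cached_body, new_body=None):
    """Decide which feed content to use after a poll."""
    if status == 304:        # unchanged: the server sent no body, reuse cache
        return cached_body
    return new_body          # 200: replace the cached copy

headers = conditional_headers(etag='"abc123"',
                              last_modified="Sat, 24 Jan 2004 10:00:00 GMT")
print(headers["If-None-Match"])                       # '"abc123"'
print(handle_poll(304, "<rss>cached</rss>"))          # the cached body
```

A 304 response carries only headers, which is where the order-of-magnitude savings for unchanged feeds comes from; gzip covers the case where the feed did change.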

3) Reading RSS requires too much work. Today, in 2004, we call it "browsing the Web" - not "viewing HTML files". That is because the format that Web pages happen to be in is not important. I can just type in "msn.com" and it works. RSS requires much more than that: We need to find the RSS feed location, which is always labeled differently, and then give that URL to my RSS reader.

Yup, there isn't a standard way to find the feed for a website. RSS Bandit tries to make this easier via feed lookup on Syndic8 and by supporting one-click subscription to RSS feeds. However, aggregator authors can't do this alone; the blogging tools and major websites that use RSS need to get in on the act as well.
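One convention that helps here is RSS autodiscovery: a page advertises its feed with a `<link rel="alternate" type="application/rss+xml" href="...">` element in its `<head>`, which an aggregator can scan for. A minimal Python sketch of that scan (the class name is made up for illustration):

```python
from html.parser import HTMLParser

class FeedLinkFinder(HTMLParser):
    """Collect feed URLs advertised via <link rel="alternate" ...> tags."""

    def __init__(self):
        super().__init__()
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if (tag == "link"
                and a.get("rel", "").lower() == "alternate"
                and a.get("type") in ("application/rss+xml",
                                      "application/atom+xml")
                and "href" in a):
            self.feeds.append(a["href"])

page = """<html><head>
<title>Some weblog</title>
<link rel="alternate" type="application/rss+xml" href="/rss.xml">
</head><body>...</body></html>"""

finder = FeedLinkFinder()
finder.feed(page)
print(finder.feeds)  # ['/rss.xml']
```

This is roughly how a browser toolbar or aggregator can populate a drop-down of subscribable feeds from a visited page, as described in the RSS Bandit feature list above.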

4) An RSS Reader must come with Windows. Until this happens too, RSS reading will only be for a certain class of computer users that are willing to try this new technology. The web became mainstream when Microsoft started including Internet Explorer with Windows. MP3's became mainstream when Windows Media Player added MP3 support.

I guess my memory is flawed, but I always thought Netscape Navigator and Winamp/Napster were the applications that brought the Web and MP3s to the mainstream, respectively. I'm always amused by folks who think that unless Microsoft supports some particular technology it is going to fail. It'd be nice if an RSS aggregator shipped with Windows, but that doesn't mean a technology cannot become popular until it ships in Windows. Being a big company, Microsoft is generally slow to react to trends until they've proven themselves in the market, which means that if an aggregator ever ships in Windows it will happen when news aggregators are mainstream, not before.

5) RSS content is not User-Friendly. It has taken about 10 years for the Web to get to the point where it is today that most web pages we visit render in our browser the way that the designer intended. It's also taken about that long for web designers to figure out how to lay out a web page such that most users will understand how to use it. RSS takes all of that usability work and throws it away. Most RSS feeds have no formatting, no images, no tables, no interactive elements, and nothing else that we have come to rely on for optimal content readability. Instead we are kicked back to the pre-web days of simple text.

I find it hard to connect tables, interactive elements and images with “optimal content readability” but maybe that's just me. Either way, there's nothing stopping folks from using HTML markup in RSS feeds. Most of the major aggregators are either browser-based or embed a web browser, so viewing HTML content is not a problem. Quite frankly, I like the fact that I don't have to deal with cluttered websites when reading content in my aggregator of choice.

6) RSS content is not machine-friendly. There are search engines that search RSS feeds but none of them are intelligent about the content they are searching because RSS doesn't describe the properties of the content well enough. For example, many bloggers quote other blogs in their blog. Search engines cannot tell the difference between new content and quoted content, so they'll show both in the search results.

I'm curious as to which search engine he's used that doesn't have this problem. Is there an “ignore items that are part of a quote” option on Google or MSN Search? As search engines go I've found Feedster to be quite good, and better than Google for a certain class of searches. It would be cool to be able to execute ad-hoc, structured queries against RSS feeds, but this would be icing on the cake, and in fact is much more likely to happen in the next few years than it is that we will ever be able to perform such queries against [X]HTML websites.

7) Many RSS Feeds show only an abridged version of the content. Many RSS feeds do not include the full text. Slashdot.org, one of the most popular geek news sites, has an RSS feed but they only put the first 30 words of each 100+ word entry in their feed. This means that RSS search engines do not see the full content. This also means that users who syndicate their feed only see the first few words and must click to open a web browser to read the full content.

This is annoying but understandable. Such sites are primarily using an RSS feed as a way to lure you to the site not as a way to provide users with content. I don't see this as a problem with RSS any more than the fact that some news sites need you to register or pay to access their content is a problem with HTML and the Web.

8) Comments are not integrated with RSS feeds. One of the best features of many blogs is the ability to reply to posts by posting comments. Many sites are noteworthy and popular because of their comments and not just the content of the blogs.

9) Multiple Versions of RSS cause more confusion. There are several different versions of RSS, such as RSS 0.9, RSS 1.0, RSS 2.0, and RSS 3.0, all controlled by different groups and all claiming to be the standard. RSS Readers must support all of these versions because many sites only support one of them. New features can be added to RSS 1.0 and 2.0 by adding new XML namespaces, which means that anybody can add new features to RSS, but this doesn't mean that any RSS Reader will support those new features.

I assume he has RSS 3.0 in there as a joke. Anyway, the existence of multiple versions of RSS is not that much more confusing to end users than the existence of multiple versions of [X]HTML, HTTP, Flash and Javascript some of which aren't all supported by every web browser.

That said a general plugin mechanism to deal with items from new namespaces would be an interesting problem to try and solve but sounds way too hard to successfully provide a general solution for.
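For illustration only, here is one rough shape such a plugin mechanism could take (the namespace URI, element names and registry are all hypothetical): handlers are registered per XML namespace URI, and extension elements from namespaces nobody registered are silently ignored, which is also what RSS's "must-ignore" extension rule implies.

```python
import xml.etree.ElementTree as ET

# Hypothetical plugin registry keyed by XML namespace URI.
handlers = {}

def register(namespace_uri, handler):
    """Install a handler for extension elements in the given namespace."""
    handlers[namespace_uri] = handler

def process_item(item):
    """Run registered handlers over an <item>'s namespaced children."""
    results = []
    for child in item:
        # ElementTree spells namespaced tags as "{uri}localname".
        if child.tag.startswith("{"):
            ns, _, local = child.tag[1:].partition("}")
            handler = handlers.get(ns)
            if handler:                      # unknown namespaces: ignored
                results.append(handler(local, child.text))
    return results

# A made-up "rating" extension namespace for the sake of the example.
register("http://example.com/rating", lambda name, text: ("rating", text))

item = ET.fromstring(
    '<item xmlns:r="http://example.com/rating">'
    '<title>Test</title><r:stars>4</r:stars></item>')
print(process_item(item))  # [('rating', '4')]
```

The hard part, as noted above, isn't the dispatch itself but designing a handler interface general enough for arbitrary unforeseen extensions, which is why a truly general solution is elusive.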

10) RSS is Insecure. Let's say a site wants to charge for access to their RSS feed. RSS has no standard way of inputting a user name and password. Some RSS readers support HTTP Basic Authentication, but this is not a secure method because your password is sent as plain text. A few RSS readers support HTTPS, which is a start, but it is not good enough. Once somebody has access to the "secure" RSS file, that user can share the RSS file with anybody.

Two points. (A) RSS is a Web technology so the standard mechanisms for providing restricted yet secure access to content on the Web apply to RSS and (B) there is no way known to man short of magic to provide someone with digital content on a machine that they control and restrict them from copying it in some way, shape or form.
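Point (A) just means using the same machinery a browser would. Here's a sketch in Python (illustrative only; the URL and credentials are made up) of building a request for a protected feed with HTTP Basic Authentication, which is reasonably safe as long as it goes over HTTPS:

```python
import base64
import urllib.request

def authenticated_feed_request(url, username, password):
    """Build a request for a password-protected feed using HTTP Basic Auth.
    Only sensible over HTTPS, since Basic Auth sends the credentials
    base64-encoded rather than encrypted."""
    credentials = base64.b64encode(f"{username}:{password}".encode()).decode()
    request = urllib.request.Request(url)
    request.add_header("Authorization", f"Basic {credentials}")
    return request

req = authenticated_feed_request("https://example.com/feed.rss", "alice", "s3cret")
print(req.get_header("Authorization"))
```

An aggregator would then pass the request to `urllib.request.urlopen` the same way it fetches any other feed.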

Mark Pilgrim has a post entitled The history of draconian error handling in XML where he excerpts a couple of the discussions on the draconian error handling rules of XML, which state that if an XML processor encounters a syntax error in an XML document it should stop parsing and indicate a fatal error as opposed to muddling along or trying to fix up the error in some way. According to Tim Bray:

What happened was, we had a really big, really long, really passionate argument on the subject; the camps came to be called “Draconians” and “Tolerants.” After this had gone on for some weeks and some hundreds of emails, we took a vote and the Draconians won 7-4.

Reading some of the posts from 6 years ago on Mark Pilgrim's blog, it is interesting to note that most of the arguments on the side of the Tolerants are simply no longer relevant today, while the Draconians turned out to be the reason for XML's current widespread success in the software marketplace.

The original goal of XML was to create a replacement for HTML which allowed you to create your own tags yet have them work in some fashion on the Web (i.e. SGML on the Web). Time has shown that placing XML documents directly on the Web for human consumption just isn't that interesting to the general Web development community. Most content on the Web for human consumption is still HTML tag soup. Even when Web content claims to be XHTML it often is really HTML tag soup either because it isn't well-formed or is invalid according to the XHTML DTD. Even applications that represent data internally as XML tend to use XSLT to transform the content to HTML as opposed to putting the XML directly on the Web and styling it with CSS. As I've mentioned before, the dream of the original XML working group of replacing HTML by inventing "SGML on the Web" is a failed dream. Looking back in hindsight it doesn't seem that the choice of tolerant over draconian error handling would have made a difference to the lack of adoption of XML as a format for representing content targeted for human consumption on the Web today.

On the other hand, XML has flourished as a general data interchange format for machine-to-machine interactions in wide-ranging areas from distributed computing and database applications to being a format for describing configuration files and business documents. There are a number of reasons for XML's rise to popularity:

The ease with which XML technologies and APIs enabled developers to process documents and data, in a more flexible manner than previous formats and technologies allowed.

The ubiquity of XML implementations and the consistency of the behavior of implementations across platforms.

The fact that XML documents were fairly human-readable and seemed familiar to Web developers since it was HTML-like markup.

Considering the above points, does it seem likely that XML would be as popular outside of its original [failed] design goal of being a replacement for HTML if the specification allowed parsers to pick and choose which parts of the spec to honor with regards to error recovery? Would XML Web Services be as useful for interoperability between platforms if different parser implementations could recover from syntax errors at will in a non-deterministic manner? Looking at some of the comments linked from Mark Pilgrim's blog it does seem to me that a lot of the arguments on the side of the Tolerants came from the perspective of “XML as an HTML replacement” and don't stand up under scrutiny in today's world.

Programming languages that barf on a syntax error do so because a partial executable image is a useless thing. A partial document is *not* a useless thing. One of the cool things about XML as a document format is that some of the content can be recovered even in the face of error. Compare this to our binary document friends where a blown byte can render the entire content inaccessible.

Given that today XML is used for building documents that are effectively programs such as XSLT, XAML and SVG it does seem like the same rules that apply for partial programs should apply as well.

Browsers do not just need a well-formed XML document. They need a well-formed XML document with a stylesheet in a known location that is syntactically correct and *semantically correct* (actually applies reasonable styles to the elements so that the document can be read). They need valid hyperlinks to valid targets and pretty soon they may need some kind of valid SGML catalog. There is still so much room for a document author to screw up that well-formedness is a very minor step down the path.

I have to agree here with the spirit of the post [not the content since it assumed that XML was going to primarily be a browser based format]. It is far more likely and more serious that there are logic errors in an XML document than syntax errors. For example, there are more RSS feeds out there with dates that are invalid according to the RSS spec they support than there are ill-formed feeds. And in a number of these cases it is a lot easier to fix the common well-formedness errors than it is to fix violations of the spec (HTML in descriptions or titles, incorrect date formats, data other than email addresses in the <author> element, etc).
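The date problem is easy to detect mechanically. Here's a quick sketch (Python for illustration, not RSS Bandit's actual code) of checking whether a pubDate is a parseable RFC 822 date, as RSS 2.0 requires:

```python
from email.utils import parsedate_to_datetime  # RSS 2.0 pubDate uses RFC 822 dates

def valid_pubdate(value):
    """Check whether an RSS 2.0 pubDate value is a parseable RFC 822 date."""
    try:
        parsedate_to_datetime(value)
        return True
    except (TypeError, ValueError):
        return False

print(valid_pubdate("Sat, 17 Jan 2004 12:00:00 GMT"))  # True
print(valid_pubdate("2004-01-17T12:00:00Z"))           # False: ISO 8601, wrong for RSS 2.0
```

The ISO 8601 form in the second call is exactly the kind of spec violation that a well-formedness check would never catch.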

The basic point against the Draconian case is that a single (monolithic?) policy towards error handling is a recipe for failure. ...

XML is many things but I doubt that one could call it a failure except when it comes to its original [flawed] intent of replacing HTML. As a mechanism for describing structured and semi-structured content in a robust, platform-independent manner IT IS KING.

So why do I say everyone lost yet everyone won? Today most XML on the Web targeted at human consumption [i.e. XHTML] isn't well-formed, so in this case the Tolerants were right and the Draconians lost since well-formed XML has been a failure on the human Web. However in the places where XML is getting the most traction today, the draconian error handling rules promote interoperability and predictability, which is the opposite of what a number of the Tolerants expected would happen with XML in the wild.

One of the hats I wear as part of my day job is that I'm the Community Lead for the WebData XML team. This means I'm responsible for a lot of our interactions with our developer community. One of the things I'm proudest of is that I got the Microsoft Most Valuable Professional (MVP) Program to create an award category for XML which was tough at first but they eventually buckled.

I'm glad to note that a number of folks I nominated to become MVPs this year have been awarded including Daniel Cazzulino, Oleg Tkachenko, DonXML Demsak and Dimitre Novatchev among others. These folks have gone above and beyond in helping fellow developers working with XML technologies on Microsoft platforms. You guys rock...

My next major task as community lead is to launch an XML Developer Center on MSDN. I've been working at this for over a year but it looks like there is now a light at the end of the tunnel. If you are interested in writing about XML technologies on Microsoft platforms on MSDN you should give me a holler via my work email address.

The core of the UI for search folders is done. Now all that's left is to test it and make sure the feature is fully baked. I'm beginning to have reservations about XPath search but time will tell whether it'll actually be problematic or not. Screenshot below.

Saturday's Scripting News asked an important question: What do users want from RSS? The context of the question is the upcoming RSS Winterfest... Over the weekend I received a draft of the RSS Winterfest agenda along with a request for feedback. Here's mine: focus on users. In an October posting from BloggerCon I present video testimony from several of them who make it painfully clear that the most basic publishing and subscribing tasks aren't yet nearly simple enough.

Here's more testimony from the comments attached to Dave's posting:

One message: MAKE IT SIMPLE. I've given up on trying to get RSS. My latest attempt was with Friendster: I pasted in the "coffee cup" and ended up with a string of text in my sidebar. I was lost and gave up. I'm fed up with trying to get RSS. I don't want to understand RSS. I'm not interested in learning it. I just want ONE button to press that gives me RSS.... [Ingrid Jones]

Like others, I'd say one-click subscription is a must-have. Not only does this make it easier for users, it makes it easier to sell RSS to web site owners as a replacement/enhancement for email newsletters... [Derek Scruggs]

For average users RSS is just too cumbersome. What is needed to make it simpler to subscribe is something analogous to the mailto tag. The user would just click on the XML or RSS icon, the RSS reader would pop up and would ask the user if he wants to add this feed to his subscription list. A simple click on OK would add the feed and the reader would confirm it and quit. The user would be back on the web site right where he was before. [Christoph Jaggi]

One of the biggest problems that faces designers of XML vocabularies is how to make them extensible and design them in a way that applications that process said vocabularies do not break in the face of changes to versions of the vocabulary. One of the primary benefits of using XML for building data interchange formats is that the APIs and technologies for processing XML are quite resistant to additions to vocabularies. If I write an application which loads RSS feeds looking for item elements, then processes their link and title elements using any one of the various technologies and APIs for processing XML such as SAX, the DOM or XSLT, it is quite straightforward to make that processing resistant to changes in the RSS spec or extensions to it, as long as the link and title elements always appear in a feed.
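A quick sketch of this resilience (Python for illustration; the extension namespace and URLs are made up): a consumer that only asks for title and link never even sees elements it doesn't know about.

```python
import xml.etree.ElementTree as ET

# An RSS 2.0 item carrying an extension element from a hypothetical namespace;
# consumers that only ask for <title> and <link> are unaffected by it.
feed = """<rss version="2.0"><channel>
  <item xmlns:ex="http://example.org/hypothetical-extension">
    <title>Extensible by default</title>
    <link>http://example.org/post</link>
    <ex:rating>5</ex:rating>
  </item>
</channel></rss>"""

root = ET.fromstring(feed)
for item in root.findall("./channel/item"):
    # Unknown siblings like <ex:rating> are simply never selected.
    print(item.findtext("title"), item.findtext("link"))
```

Dropping the `<ex:rating>` element, or adding three more extension elements, changes nothing about what this consumer sees.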

On the other hand, actually describing such extensibility using the most popular XML schema language, W3C XML Schema, is difficult because of several limitations in its design which make it very difficult to describe extension points in a vocabulary in a way that is idiomatic to how XML vocabularies are typically processed by applications. Recently, David Orchard, a standards architect at BEA Systems wrote an article entitled Versioning XML Vocabularies which does a good job of describing the types of extensibility XML vocabularies should allow and points out a number of the limitations of W3C XML Schema that make it difficult to express these constraints in an XML schema for a vocabulary. David Orchard has written a followup to this article entitled Providing Compatible Schema Evolution which contains a lot of assertions and suggestions for improving extensibility in W3C XML Schema that mostly jibe with my experiences working as the Program Manager responsible for W3C XML Schema technologies at Microsoft.

The scenario outlined in his post is

We start with a simple use case of a name with a first and last name, and its schema. We will then evolve the language and instances to add a middle name. The base schema is:

At this point I'd like to note that this is a versioning problem, which is a special instance of the extensibility problem. The extensibility problem is how does one describe an XML vocabulary in a way that allows producers to add elements and attributes to the core vocabulary without causing problems for consumers that may not know about them. The versioning problem is specific to when the added elements and attributes actually are from a subsequent version of the vocabulary (i.e. a version 2.0 server talking to a version 1.0 client). The additional wrinkle in the specific scenario outlined by David Orchard is that elements from newer versions of the vocabulary have the same namespace as elements from the old version.

A strategy for simplifying the problem statement would be if additions in subsequent versions of the vocabulary were in a different namespace (i.e. a version 2.0 document would have elements from the version 1.0 namespace and the version 2.0 namespace), which would then make the versioning problem the same as the extensibility problem. However most designers of XML vocabularies would balk at creating a vocabulary which used elements from multiple namespaces for its core [once past version 2.0] and often cite that this makes it more cumbersome for applications that process said vocabularies because they have to deal with multiple namespaces. This is a tradeoff which every XML vocabulary designer should consider during the design and schema authoring process.
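Here's a sketch of the multi-namespace approach (Python, with made-up namespace URIs): a version 1.0 consumer keeps working against a version 2.0 document because the additions live in a namespace it simply never queries.

```python
import xml.etree.ElementTree as ET

V1 = "http://example.org/name/v1"  # hypothetical version 1.0 namespace
V2 = "http://example.org/name/v2"  # hypothetical version 2.0 namespace

# A v2.0 producer adds <middle> in a new namespace; a v1.0 consumer that
# only looks for v1 elements keeps working unchanged.
doc = f"""<name xmlns="{V1}" xmlns:v2="{V2}">
  <first>Dave</first>
  <v2:middle>B</v2:middle>
  <last>Orchard</last>
</name>"""

name = ET.fromstring(doc)
first = name.findtext(f"{{{V1}}}first")
last = name.findtext(f"{{{V1}}}last")
print(first, last)  # Dave Orchard
```

A v2.0-aware consumer would additionally query for `{http://example.org/name/v2}middle`, but nothing forces the v1.0 client to know that name exists.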

David Orchard takes a look at various options for solving the extensibility problem outlined above using current XML Schema design practices.

Type extension

Use type extension or substitution groups for extensibility. A sample schema is:

This requires that both sides simultaneously update their schemas and breaks backwards compatibility. It only allows the extension after the last element

There is a [convoluted] way to ensure that both sides do not have to update their schemas. The producer can send a <name> element that contains xsi:type attribute which has the NameExtendedType as its value. The problem is then how the client knows about the definition for the NameExtendedType type which is solved by the root element of the document containing an xsi:schemaLocation attribute which points to a schema for that namespace which includes the schema from the previous version. There are at least two caveats to this approach (i) the client has to trust the server since it is using a schema defined by the server not the client's and (ii) since the xsi:schemaLocation attribute is only a hint it is likely the validator may ignore it since the client would already have provided a schema for that namespace.

Change the namespace name or element name

The author simply updates the schema with the new type. A sample is:

This does not allow extension without changing the schema, and thus requires that both sides simultaneously update their schemas. If a receiver has only the old schema and receives an instance with middle, this will not be valid under the old schema

Most people would state that this isn't really extensibility since [to XML namespace aware technologies and APIs] the names of all elements in the vocabulary have changed. However for applications that key off the local-name of the element or are unsavvy about XML namespaces this is a valid approach that doesn't cause breakage. Ignoring namespaces, this approach is simply adding more stuff in a later revision of the spec which is generally how XML vocabularies evolve in practice.

Use wildcard with ##other

This is a very common technique. A sample is:

The problems with this approach are summarized in Examining elements and wildcards as siblings. A summary of the problem is that the namespace author cannot extend their schema with extensions and correctly validate them because a wildcard cannot be constrained to exclude some extensions.

I'm not sure I agree with David Orchard's summary of the problem here. The problem described in the article he linked to is that a schema author cannot refine the schema in subsequent versions to contain optional elements and still preserve the wildcard. This is due to the Unique Particle Attribution constraint which states that a validator MUST always have only one choice of which schema particle it validates an element against. Given an element declaration for an element and a wildcard in sequence, the schema validator has a CHOICE of two particles it could validate an element against if its name matches that of the element declaration. There are a number of disambiguating rules the W3C XML Schema working group could have come up with to allow greater flexibility for this specific case such as (i) using a first match rule or (ii) allowing exclusions in wildcards.

Use wildcard with ##any or ##targetNamespace

This is not possible with optional elements, due to XML Schema's Unique Particle Attribution rule; the rationale is described in the Versioning XML Languages article. An invalid schema sample is:

The Unique Particle Attribution rule does not allow a wildcard adjacent to optional elements or before elements in the same namespace.

Agreed. This is invalid.

Extension elements

This is the solution proposed in the versioning article. A sample of the pre-extended schema is:

An extended instance is

DaveBOrchard

This is the only solution that allows backwards and forwards compatibility, and correct validation using the original or the extended schema. This article shows a number of the difficulties remaining, particularly the cumbersome syntax and the potential for some documents to be inappropriately valid. This solution also has the problem that each subsequent version will increase the nesting by 1 level. Personally, I think that the difficulties, including potentially deep nesting levels, are not major compared to the ability to do backwards and forwards compatible evolution with validation.

The primary problem I have with this approach is that it is a very unidiomatic way to process XML especially when combined with the problem with nesting in concurrent versions. For example, take a look at

DaveBMr.Orchard

Imagine if this is the versioning strategy that had been used with HTML, RSS or DocBook. That gets real ugly, real fast. Unfortunately this is probably the best you can do if you want to use W3C XML Schema to strictly define an XML vocabulary with extensibility yet allow backwards & forwards compatibility.

David Orchard goes on to suggest a number of potential additions to future versions of W3C XML Schema which would make it easier to use it in defining extensible XML vocabularies. However given that my personal opinion is that adding features to W3C XML Schema is not only trying to put lipstick on a pig but also trying to build a castle on a foundation of sand, I won't go over each of his suggestions. My recent suggestion to some schema authors at Microsoft about solving this problem is that they should have two validation phases in their architecture. The first phase does validation according to W3C XML Schema rules while the other performs validation of “business rules” specific to their scenarios. Most non-trivial vocabularies end up having such an architecture anyway since there are a number of document validation capabilities missing from W3C XML Schema, so schema authors shouldn't be too focused on trying to force fit their vocabulary into the various quirks of W3C XML Schema.

For example, one could solve the original problem with a type definition such as

where the validation layer above the W3C XML Schema layer ensures that an element doesn't occur twice (i.e. there can't be two <first> elements in a <name>). It adds more code to the clients & servers but it doesn't result in butchering the vocabulary either.
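To make the two-phase idea concrete, here's a sketch (Python for illustration, not an actual schema validator) of the second, "business rules" phase enforcing the no-duplicate-children constraint that a permissive schema content model can't express:

```python
import xml.etree.ElementTree as ET
from collections import Counter

def check_no_duplicate_children(element):
    """Second-phase 'business rules' check: no child element may repeat.
    This enforces what a permissive schema content model cannot."""
    counts = Counter(child.tag for child in element)
    return [tag for tag, n in counts.items() if n > 1]

ok = ET.fromstring("<name><first>Dave</first><last>Orchard</last></name>")
bad = ET.fromstring("<name><first>Dave</first><first>Dave</first></name>")
print(check_no_duplicate_children(ok))   # []
print(check_no_duplicate_children(bad))  # ['first']
```

The first phase (ordinary schema validation) would run before this, so the business-rules layer only has to worry about constraints the schema language can't state.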

I read somewhere that the cost of going to Mars may eventually total up to $170 billion, which is nowhere close to the $12 billion the US President has stated will flow into NASA's coffers over the next 5 years to help finance the Mars dream. I don't want to knock the US government's spending on AIDS (supposedly $1 billion this year) but aren't there significant, higher priority problems on Earth that need tackling before one starts dabbling in interplanetary conquest?

Gil Scott-Heron's poem Whitey's on the Moon is still quite relevant today. I guess the more things change, the more they stay the same.

The main problem is that there are a number of websites which have the same information but do not provide a uniform way to access this information and when access mechanisms to information are provided do not allow ad-hoc queries. So the first thing that is needed is a shared view (or schema) of what this information looks like which is the shared information model Adam talks about...

Once an XML representation of the relevant information users are interested in has been designed (i.e. the XML schema for books, reviews and wishlists that could be exposed by sites like Amazon or Barnes & Nobles) the next technical problem to be solved is uniform access mechanisms... Then there's deployment, adoption and evangelism...

We still need a way to process the data exposed by these web services in arbitrary ways. How does one express a query such as "Find all the CDs released between 1990 and 1999 that Dare Obasanjo rated higher than 3 stars"?..

At this point if you are like me you might suspect that defining that the web service endpoints return the results of performing canned queries which can then be post processed by the client may be more practical than expecting to be able to ship arbitrary SQL/XML, XQuery or XPath queries to web service end points.
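Here's a sketch of what that client-side post-processing looks like (Python, with made-up data; the canned query is assumed to return all of a user's rated CDs):

```python
# Hypothetical result of a canned web service query: all of one user's
# rated CDs. The client applies the ad-hoc part of the query
# ("released 1990-1999, rated above 3 stars") locally.

cds = [
    {"title": "CD A", "year": 1992, "rating": 4},
    {"title": "CD B", "year": 1988, "rating": 5},
    {"title": "CD C", "year": 1995, "rating": 2},
    {"title": "CD D", "year": 1999, "rating": 5},
]

hits = [cd["title"] for cd in cds
        if 1990 <= cd["year"] <= 1999 and cd["rating"] > 3]
print(hits)  # ['CD A', 'CD D']
```

The endpoint only has to support one coarse query shape; anything finer-grained stays on the client.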

The main problem with what I've described is that it takes a lot of effort. Coming up with standardized schema(s) and distributed computing architecture for a particular industry then driving adoption is hard even when there's lots of cooperation let alone in highly competitive markets.

A few days ago I got a response to this post from Michael Brundage, author of XQuery: The XML Query Language and a lead developer of the XML<->relational database technologies the WebData XML team at Microsoft produces, on a possible solution to this problem that doesn't require lots of disparate parties to agree on schemas, data models or web service endpoints. Michael wrote

Dare, there's already a solution to this (which Adam created at MS five years ago) -- virtual XML views to unify different data sources. So Amazon and BN and every other bookseller comes up with their own XML format. Somebody else comes along and creates a universal "bookstore" schema and maps each of them to it using an XML view. No loss of performance in smart XML Query implementations.

And if that universal schema becomes widely adopted, then eventually all the booksellers adopt it and the virtual XML views can go away. I think eventually you'll get this for documents, where instead of translating WordML to XHTML (as Don is doing), you create a universal document schema and map both WordML and XHTML into it. (And if the mappings are reversible, then you get your translators for free.)

This is basically putting an XML Web Service front end that supports some degree of XML query on aggregator sites such as AddALL or MySimon. I agree with Michael that this would be a more bootstrapable approach to the problem than trying to get a large number of sites to support a unified data model, query interface and web service architecture.

Come to think of it, we're already halfway to creating something similar for querying information in RSS feeds thanks to sites such as Feedster and Technorati. All that is left is for either site, or others like them, to provide richer APIs for querying and one would have the equivalent of an XML View of the blogosphere (God, that is such a pretentious word) which you could query to your heart's delight.

For my last post via w::bloggar I used the configuration settings described in the dasBlog documentation on the Blogger API. Weirdly enough, my post showed up without a title.
Torsten Rendelmann comes to the rescue with the alternate instructions for posting to your dasBlog weblog with w::bloggar that supports titles and categories. Hope this works.

I'm in the process of tweaking the RSS Bandit installer and implementing a stop gap measure for supporting posting to one's blog from RSS Bandit while waiting for a SOAP version of the ATOM API. The next release of RSS Bandit will ship with a plugin for posting about a particular blog entry to your weblog via w::bloggar, similar to Luke's w.bloggar plugin for SharpReader.

RSS Bandit already supports Luke's plugin but most people don't know about it, so I decided to implement a similar plugin [as opposed to redistributing Luke's since I didn't see anything in the license allowing free redistribution] and add it during the install process. PS: This is my first post from w::bloggar. Hope it works.

Overall, the premium paid for IT workers with specific skills was 23 percent lower in 2003 than in 2001, and the pay for certification in particular skills dropped 11 percent, Foote Partners LLC said. ...In a yearlong study of 400 Fortune 1000 companies, researchers found that by 2006, the organizations expected from 35 percent to 45 percent of their current full-time IT jobs to go to workers overseas, David Foote, president and chief research officer for Foote Partners, said.

"That showed a definite declining onshore workforce--fewer jobs for IT people in this country," Foote said.

Perhaps it is time to go back to school and get started on my backup plan of being a lawyer specializing in intellectual property law.

Users who subscribe to NewsGator Online Services can now synchronize their subscriptions across multiple machines. This is an industry first - NewsGator 2.0 for Outlook and NewsGator Online Services are the first commercially available tools to provide this capability in such a flexible manner. This sophisticated system ensures that subscriptions follow users wherever they go, users never have to read the same content twice (unless they choose to), and even supports multiple subscription lists so users can have separate, but overlapping, subscription lists at home and at the office.

Interesting. Synchronizing subscriptions for a news reader across multiple machines doesn't strike me as unprecedented functionality that NewsGator pioneered, let alone an industry first. The first public pass I've seen at doing this was Dave Winer's subscription harmonizer, which seemed more of a prototype than an actual product expected to be used by regular users. I implemented and shipped the ability to synchronize subscriptions across multiple machines with RSS Bandit about 2 months ago. As for providing an aggregator that supports this feature plus a commercial site that hosts feed synchronization information, Shrook has NewsGator beat by about a month if its website is to be believed (I don't have a Mac to test whether it actually works as advertised).

I find it unfortunate that we seem to be headed for a world where multiple proprietary, non-interoperable solutions exist for providing basic functionality that users take for granted when it comes to other technologies like email. This was the impetus for starting work on Synchronization of Information Aggregators using Markup (SIAM). Unfortunately between my day job, the girlfriend and trying to get another release of RSS Bandit out the door, I haven't had time to brush up the spec and actually work on an implementation. It'll be a few weeks before I can truly focus on SIAM; hopefully it'll be worth waiting for and it'll gain some traction amongst aggregator developers.
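The kind of merge a SIAM-style protocol would need is conceptually simple. Here's a toy sketch (Python, with made-up feed URLs and item IDs; not the SIAM spec itself) of unioning per-feed read-item state from two machines:

```python
# Toy model of the state a synchronization protocol would merge:
# per-feed sets of read item IDs from two machines.

def merge_read_items(home, office):
    """Union the read-item IDs per feed so nothing gets marked unread again."""
    merged = {}
    for state in (home, office):
        for feed, read_ids in state.items():
            merged.setdefault(feed, set()).update(read_ids)
    return merged

home = {"http://example.org/feed.rss": {"item-1", "item-2"}}
office = {"http://example.org/feed.rss": {"item-2", "item-3"},
          "http://example.org/other.rss": {"item-9"}}
print(merge_read_items(home, office))
```

The hard part, of course, is not the merge but getting multiple aggregators to agree on the wire format for this state, which is what SIAM is about.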

Pitt junior Brandon Smith wanted a tattoo that proclaimed his manliness, so he decided to get the Chinese characters for “strength” and “honor” on his chest. After 20 minutes under the needle of local tattoo artist Andy Sakai, he emerged with the symbol for “small penis” embedded in his flesh.

“I had it for months before I knew what it really meant,” Smith said. “Then I went jogging through the Carnegie Mellon campus and a group of Asian kids started laughing and calling me ‘Shorty.’ That’s when I knew something was up.”

Sakai, an award-winning tattoo artist, was tired of seeing sacred Japanese words, symbols of his heritage, inked on random white people. So he used their blissful ignorance to make an everlasting statement. Any time a customer came to Sakai’s home studio wanting Japanese tattooed on them, he modified it into a profane word or phrase.

“All these preppy sorority girls and suburban rich boys think they’re so cool ‘cause they have a tattoo with Japanese characters. But it doesn’t mean shit to them!” Sakai said. “The dumbasses don’t even realize that I’ve written ‘slut’ or ‘pervert’ on their skin!”

I'm surprised that reports of actions like this are not more widespread. I keep waiting for someone to start the Japanese version of Engrish.com that makes fun of all the folks in the USA who have misspelled Japanese characters on their T-shirts or tattooed on their skin the same way Engrish.com does for misspelled, grammatically incorrect English that shows up in Japan all over the place.

I've always thought it was really ghetto (i.e. ignorant) to have characters in a language you can't freaking understand tattooed on your skin. Anyone who's ever done this needs to be awarded 100 ghettofabulous points when they pass Go! and should also collect a free copy of Kardinal Offishall's UR Ghetto. Dang!

A recent spate of discussions about well-formed XML in the context of the ATOM syndication format, kicked off by the There are no exceptions to Postel's Law post, has reminded me that besides using an implementation of the W3C DOM most developers do not have a general means of generating well-formed, correct XML in their applications. In the .NET Framework we provide the XmlWriter class for generating XML in a streaming manner but it is not without its issues. In a recent blog post entitled Well-Formed XML in .NET, and Postel's Rebuttal Kirk Allen Evans writes

At any rate, Tim successfully convinced me that aggregators should not have the dubious task of “correcting” feeds or displaying feeds that are not well-formed.

Yet I still have a concern about Tim's post, concerning XmlWriter and well-formedness:

PostScript: I just did the first proof on the first draft of this article. It had a mismatched tag and wasn’t well-formed. The publication script runs an XML parser over the draft and it told me the problem and I fixed it. It took less time than writing this postscript.

The problem is that it is quite possible to emit content using the XmlWriter that is not well-formed. From MSDN online's “Customized XML Writer Creation“ topic:

The XmlTextWriter does not verify that element or attribute names are valid.

The XmlTextWriter writes Unicode characters in the range 0x0 to 0x20, and the characters 0xFFFE and 0xFFFF, which are not XML characters.

The XmlTextWriter does not detect duplicate attributes. It will write duplicate attributes without throwing an exception.

Even using the custom XmlWriter implementation that is mentioned in the MSDN article does not remove the possibility of a developer circumventing the writing process:

Kirk provides a code sample that shows that even with an XmlWriter implementation that performs the well-formedness checks that are missing from the XmlTextWriter provided in v1.0 & v1.1 of the .NET Framework, a developer could still inadvertently write out malformed XML if they hand out the XML stream without closing the XmlTextWriter and thus closing all the unclosed tags.
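The checks involved aren't exotic. Here's a minimal sketch (in Python, purely illustrative and not the .NET XmlWriter API) of a writer that escapes character data, tracks open elements, and refuses both mismatched end tags and handing out the stream while tags are still unclosed:

```python
from xml.sax.saxutils import escape

class CheckingWriter:
    """Toy XML writer that enforces two well-formedness rules:
    end tags must match the innermost open element, and the
    serialized output is unavailable until every element is closed."""

    def __init__(self):
        self.parts, self.open_tags = [], []

    def start(self, tag):
        self.open_tags.append(tag)
        self.parts.append(f"<{tag}>")

    def text(self, data):
        self.parts.append(escape(data))  # escape &, <, > in character data

    def end(self, tag):
        if not self.open_tags or self.open_tags[-1] != tag:
            raise ValueError(f"mismatched end tag: {tag}")
        self.open_tags.pop()
        self.parts.append(f"</{tag}>")

    def to_string(self):
        if self.open_tags:  # handing out the stream early would be malformed
            raise ValueError(f"unclosed elements: {self.open_tags}")
        return "".join(self.parts)

w = CheckingWriter()
w.start("item"); w.text("AT&T"); w.end("item")
print(w.to_string())  # <item>AT&amp;T</item>
```

A real conformant writer also has to check name validity, character ranges and (ideally) duplicate attributes, which is exactly the list Kirk quotes above.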

In the next version of the .NET Framework we plan to provide an XmlWriter implementation that performs all the conformance checks required by the W3C XML 1.0 recommendation when generating XML [except for duplicate attribute checking].

Sam Ruby posted an RSS feed that was malformed XML which can be subscribed to from RSS Bandit without any complaints. I mentioned in a response to the post on Sam Ruby's blog that this is because RSS Bandit uses the XmlTextReader class in the .NET Framework, which by default doesn't perform character range checking for numeric entities to ensure that the XML document does not contain invalid XML characters. To get conformant behavior from the XmlTextReader one needs to set its Normalization property to true. In retrospect this was an unfortunate design decision; we should have made conformant behavior the default but allowed users the option to switch to non-conformant behavior if it suited their needs, not the other way around.
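By way of illustration, here's what conformant behavior looks like with a parser that does perform character range checking (Python's ElementTree, which sits on expat; the feed snippet is made up):

```python
import xml.etree.ElementTree as ET

# A conformant parser must reject character references outside the XML
# character range, such as &#x0; expat (used by ElementTree) does so.
bad_feed = "<item><title>bad &#x0; title</title></item>"

try:
    ET.fromstring(bad_feed)
    print("parsed")
except ET.ParseError as e:
    print("rejected:", e)
```

A reader with the range check turned off would happily hand the invalid character through to the application, which is exactly how the malformed feed slipped past RSS Bandit.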

In the next version of the .NET Framework we plan to provide an implementation of the XmlReader which is fully conformant to the W3C XML 1.0 recommendation by default.

My boss, Mark Fussell, just purchased a Smart Watch with MSN Direct and I got to see one of them up close. In his post entitled Not an iPod Sheep [which is a reference to the fact that more and more folks on our team are geeking out over Apple's lil' marvel] he writes

Today I picked up my rash and purely impulsive Christmas buy, a Fossil Wrist.NET Smart watch. It was probably sub-consciously induced by the new kid who came to our school (around 1977) with a calculator on his watch. No matter that it was impossible to press any of the buttons to do even the most simple sums and that this was tremendously useless, the fact that it was on a watch with a calculator built in made it ultra cool and an instant friend maker.

Now that I have my smart watch up and running (I had to leave the building and drive halfway home before it picked up a signal) I will say that it has some value. The #1 killer feature has to be the syncing with your Outlook calender appointments.... Of course having a wireless PC to look at the news and weather pretty much makes the other features on Smart watch useless, but “hey” I've just been told that George Bush wants to build a moon base by my watch - Wow! Now I can tell everyone all sorts of useless information. The #2 killer feature has to be the atomic clock accuracy, not that this is that necessary, but timing between meetings is everything. The #3 feature is the ability to send short (15 word?) instance messages to it.

Having a handy device that syncs to my Outlook calendar is something I definitely like but I consider a watch a fashion accessory not something that is primarily a gadget. The geek appeal of the watch is definitely high but I suspect I'll end up getting a SmartPhone instead. The main problem is that I'd like to be able to sync with Outlook when away from my work machine which may turn out to be quite expensive based on current cellular plans compared to having something like a SmartWatch.

Mark Pilgrim's recent post entitled There are no exceptions to Postel's Law among other things implies that news aggregators should process ill-formed XML feeds given that it is better for end users since they don't care about esoteric rules defined in the XML 1.0 recommendation but they do care if they can't read the news from their favorite sites.

This has unleashed some feedback from XML standards folks such as Tim Bray's On Postel, Again and Norman Walsh's On Atom and Postel's Law, who argue that if a feed isn't well-formed XML then it is a fatal error. Aggregator authors have also gotten in the mix. Brent Simmons has posted a number of entries on the topic in which he mentions that NetNewsWire currently doesn't error on RSS feeds that are ill-formed XML if it can work around the error, but plans to change this for ATOM so that it errors on ill-formed feeds. Nick Bradbury has posted similar thoughts with regards to how FeedDemon has behaved in the past and will behave in the future. On the other end of the spectrum is Greg Reinacker, the author of NewsGator, who has stated that NewsGator will process ill-formed RSS or ATOM feeds because he feels this is the best choice for his customers.

Personally I disagree with the first half of the law when applied to XML -- the idea that aggregators should bend over backwards to accept poorly formed XML. I always understood that XML was trying to do something different, as a response to the awful mess that HTML became because browser vendors adopted the first half of Postel's philosophy.

When I adopted XML, in 1997, as I understood it -- I signed onto the idea of rejecting invalid XML. It was considered a bug if you accepted invalid XML, not a bug if you didn't.

Brent Simmons, an early player in this market, says users are better served if he reads bad feeds, but when he does that, he's raising the barrier to entry, in undocumented ways that are hard to reproduce.

His interests are served by high barriers to entry, but the users do better if they have more choice.

Now, the users are happy as long as Brent is around to keep updating his aggregator to work around feed bugs, but he might move on, it happens for all kinds of reasons. It's better to insist on tight standards, so users can switch if they want to, for any reason; so that next year's feed will likely work with this year's aggregator, even if it doesn't dominate the market.

I yearn for just one market with low barriers to entry, so that products are differentiated by features, performance and price; not compatibility.

I work on the XML team at Microsoft and one of the things I have to do is coordinate with all the other teams using XML at Microsoft. The ability to consume and produce XML is or will be baked into a wide range of products including BizTalk, SQL Server, Word, Excel, InfoPath, Windows, and Visual Studio. This is besides the number of developer technologies for processing XML, from XQuery and XSLT to databinding XML documents to GUI components. In a previous post I mentioned my XML Litmus Test for deciding whether XML would benefit your project

Using XML for a software development project buys you two things (a) the ability to interoperate better with others and (b) a number of off-the-shelf tools for dealing with the format.

Encouraging the production and consumption of ill-formed XML damages both these benefits of using XML since interoperability is lost when different tools treat the same XML document differently and off-the-shelf tools can no longer be reliably used to process the documents of that format. This poisons the well for the entire community of developers and users.

Developers and users of RSS or ATOM can't reap the benefits of the various Microsoft technologies and products (e.g. querying feeds using XQuery or storing feeds in SQL Server) if there is a proliferation of ill-formed feeds. So far this is not the case (ill-formed feeds are a minority) but every time an aggregator vendor decides to encourage content producers to generate ill-formed XML by working around it and displaying the feed to the user with no visible problems, that is one more drop of cyanide in the well.
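For a concrete sense of what "ill-formed" means here, this Python sketch shows a conformant parser refusing a feed with one of the most common bugs in the wild, an unescaped ampersand:

```python
import xml.etree.ElementTree as ET

# An unescaped ampersand is one of the most common causes of ill-formed
# feeds; a conformant parser treats it as a fatal error instead of
# guessing at the producer's intent.
feed = "<rss><channel><title>AT&T News</title></channel></rss>"
try:
    ET.fromstring(feed)
    well_formed = True
except ET.ParseError:
    well_formed = False
print(well_formed)  # False
```

The correct feed would escape the ampersand as `&amp;`, and every conformant tool would then agree on its contents.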

You know, I've been darn supportive of Microsoft's strategies lately. But, not this time. This strategy of "whine about lack of choice" isn't a winning one.

When someone is beating you in the marketplace, the thing to do isn't to whine about choice (and, if anyone says Apple isn't winning in the marketplace with its iPod then they are drinking far better Merlot than the $5.49 Columbia Crest stuff I can afford). A winning strategy, instead, would be to give consumers a better product and if you believe you have one, tell its story and don't knock the competition!

Shortly after joining Microsoft I attended a class on interacting with customers and competitiveness, where the presenter emphatically pointed out that Microsoft has zero credibility when it tries to attack other companies for being the 800 lb. gorilla in a particular market. At the time I thought this was mind-numbingly obvious and wondered why she was wasting our time telling us what everyone should know. After two years at Microsoft I now realize I was mistaken and many worker drones in the B0rg cube have no idea what the external perception of the company is actually like.

If it wasn't so sad given that I work here I'd find it hilarious that a Microsoft executive is actually trying to pull a “freedom of choice” argument given the company's history. Of course, the folks on Slashdot had a field day with that one.

TMC today launched TheServerSide.NET, an enterprise .NET architecture and development community. The launch is part of a vision aimed at building communities (online sites, conferences, user groups, etc) to serve all technology practitioners in the middleware industry. TSS.NET will be similar to TSS.com in style and quality, but both communities will be operated independently.

It looks like Ted Neward will be the editor-in-chief of the site. The site seems top-heavy in its focus on XML Web Services so I doubt I'll be subscribing to its RSS feed, but if that's your bag it looks like a good site to check out. I've been reading TheServerSide.com for about two years and I've found it useful for getting insight into what's going on in the Java world.

Speaking of the Java world, which community blog site has the better signal-to-noise ratio: Weblogs @ Java.net or Java.Blogs? I've been considering subscribing to one of them but I'm already swamped with lots of content of dubious quality from Weblogs @ ASP.NET and don't want to repeat the experience.

The MSN Direct (Wrist.NET) watch is the best PDA-Watch I've seen so far. As a Palm fan, I was stoked about the Palm PDA Watch, but it was WAY too big, and tried to be too much. I don't want to use a freakin' stylus on my watch. At the same time, I have been one of the 'bat-belt' people with a cell-phone, pager, PDA, digital camera and laptop (not to mention a Glucose Meter and Insulin Pump). I really don't need another battery to charge!

I don't expect a watch to replace my current ONE device - a Blackberry Phone (that handles email, calendar, web and cell phone on a single device) - but I would like something to provide me with a little more information than just the time, without making me feel overloaded with information. Plus, it has to look good and not make my one arm 5 lbs heavier than the other.

I have to agree with Rory and Scott, this is the dork watch. So far I haven't seen anyone at work rocking one of these but I'd definitely like to see one up close to see if it confirms some suspicions I have about their trendiness and utility factor.

Yahoo plans to test RSS technology for its personalization tools, giving people the ability to automatically receive news and information feeds from third parties onto MyYahoo pages.

The Sunnyvale, Calif.-based company has been experimenting with technology called Really Simple Syndication (RSS), a format that is widely used to syndicate blogs, discussion threads and other Web content. Yahoo already started using RSS for its Yahoo News service, allowing other sites to automatically "scrape" Yahoo's top stories daily.

Last week, the company started beta testing RSS for MyYahoo, but soon pulled the experiment shortly after.

I've been using My Yahoo! almost from the beginning and have always wanted a way to plug in to their syndication architecture. Being able to syndicate RSS feeds directly into My Yahoo! content is a killer feature. I wonder if they'll go the Slashbox route or allow users to subscribe to feeds directly, thus using My Yahoo! as an RSS news aggregator.

There’s tens of thousands of us who evangelize the company’s precedent-setting digital video recorder and how it has changed our lives. Online, 40,000 of TiVo’s customers have self-organized the TiVo Community forum, which we joined a year ago. The group is Beyond Thunderdome-loyal.

Browse the forums and you will find spirited discussions on topics as varied as these:

Why TiVo customers often take over for a hapless retail store salesperson

How-to guides on the best ways to convince a loved one to buy and keep a TiVo

The May 2004 conference in Las Vegas for TiVo enthusiasts that forum members are organizing

For most companies, a self-organized community of 40,000 passionate fans is unfathomable—a Holy Grail and marketing nirvana that many wish for but few attain.

The interesting thing is that I find myself to be one of these people. Whenever I start talking to someone who doesn't have a TiVo about owning one, the conversation eventually turns into a sales pitch. I've found talking to people about the iPod to be the same way. Halfway through the conversation frustration washes over me because I can't seem to find the words to truly express how much the iPod or TiVo would change that aspect of their lives.

Watching TV hasn't been the same since I bought the TiVo and I can't imagine ever going back to not having one. Now that I have my iPod I can't imagine what would possess me to ever buy a CD again, since I can listen to almost any song I've ever liked, from James Brown to Metallica to 50 Cent, anywhere I want, whenever I want.

I can't remember any technology ever affecting me this significantly. I believe when I first got a broadband connection it was the same thing and before that probably the first time I got on the World Wide Web. Before that nothing...

There have been a number of unhelpful suggestions recently on the Atom mailing list...

Another suggestion was that we do away with the Atom autodiscovery <link> element and “just” use an HTTP header, because parsing HTML is perceived as being hard and parsing HTTP headers is perceived as being simple. This does not work for Bob either, because he has no way to set arbitrary HTTP headers. It also ignores the fact that the HTML specification explicitly states that all HTTP headers can be replicated at the document level with the <meta http-equiv="..."> element. So instead of requiring clients to parse HTML, we should “just” require them to parse HTTP headers... and HTML.

Given that I am the one who made this unhelpful suggestion on the ATOM list it only seems fair that I clarify it. The current proposal for how an ATOM client (e.g. a future version of RSS Bandit) determines how to locate the ATOM feed for a website or post a blog entry or comment is Mark Pilgrim's ATOM autodiscovery RFC, which basically boils down to parsing the webpage for <link> tags that point to the ATOM feed or web service endpoints. This is very similar to RSS autodiscovery, which has been a feature of RSS Bandit for several months.
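Autodiscovery via <link> tags boils down to scanning the page for the right elements with a parser that tolerates tag soup. A minimal sketch in Python (the feed types and the sample page are illustrative):

```python
from html.parser import HTMLParser

class AutodiscoveryParser(HTMLParser):
    """Collect feed URLs from <link rel="alternate"> tags; the parser
    tolerates unclosed tags, mixed case and other tag soup."""
    FEED_TYPES = ("application/atom+xml", "application/rss+xml")

    def __init__(self):
        super().__init__()
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        if tag == "link":
            a = dict(attrs)
            if (a.get("rel", "").lower() == "alternate"
                    and a.get("type") in self.FEED_TYPES):
                self.feeds.append(a.get("href"))

# A deliberately sloppy sample page: uppercase tag, unclosed <p>.
page = ('<html><head><LINK rel="alternate" type="application/atom+xml" '
        'href="/atom.xml"><p>tag soup</head></html>')
parser = AutodiscoveryParser()
parser.feed(page)
print(parser.feeds)  # ['/atom.xml']
```

The sketch works on this tame example, but real pages with broken encodings and unbalanced markup are exactly where such parsers start needing workarounds.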

The problem with this approach is that it means that an ATOM client has to know how to parse HTML on the Web in all its screwed-up glory, including broken XHTML documents that aren't even well-formed XML, documents that use incorrect encodings and other forms of tag soup. Thankfully, on major platforms developers don't have to worry about rewriting the equivalent of the Internet Explorer or Mozilla parser themselves because others have done so and made the libraries freely available. For Java there's John Cowan's TagSoup parser while for C# there's Chris Lovett's SgmlReader (speaking of which, it looks like he just updated it a few days ago, meaning I need to upgrade the version used by RSS Bandit). In RSS Bandit I use SgmlReader, which in general works fine until confronted with weirdness such as the completely broken HTML produced by old versions of Microsoft Word including tags such as

Over time I've figured out how to work past the markup that SgmlReader can't handle but it's been a pain to track down what the problems were and I often ended up finding out about them via bug reports from frustrated users. Now Mark Pilgrim is proposing that ATOM clients will have to go through the same problems faced by folks like me who've had to deal with RSS autodiscovery.

So I proposed an alternative: instead of every ATOM client having to require an HTML parser, this information would be provided in a custom HTTP header returned by the website. Custom HTTP headers are commonplace on the World Wide Web and are widely supported by most web development technologies. The most popular extension header I've seen is the X-Powered-By header, although I'd say the most entertaining is the X-Bender header returned by Slashdot, which contains a quote from Futurama's Bender. You can test for yourself which sites return custom HTTP headers by trying out Rex Swain's HTTP Viewer. Not only is generating custom headers widely supported by web development technologies like PHP and ASP.NET but extracting them from an HTTP response is also fairly trivial on most platforms, since practically every HTTP library gives you a handy way to extract the headers from a response in a collection or similar data structure.
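To make the comparison concrete, here's what the client side of the header-based proposal could look like in Python. Note the "X-Atom-Endpoint" header name is hypothetical; it is the kind of header the proposal suggests, not an agreed-upon name:

```python
def atom_endpoint(headers):
    """Return the feed/service URI advertised in the response headers,
    or None if the site doesn't advertise one. HTTP header names are
    case-insensitive, so compare them case-insensitively."""
    for name, value in headers.items():
        if name.lower() == "x-atom-endpoint":   # hypothetical header name
            return value
    return None

response_headers = {
    "Content-Type": "text/html",
    "X-Powered-By": "ASP.NET",   # a common real-world extension header
    "X-Atom-Endpoint": "http://example.com/atom.xml",
}
print(atom_endpoint(response_headers))  # http://example.com/atom.xml
```

Compare those dozen lines of dictionary lookup with the tag-soup parser a <link>-based client has to carry around.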

If ATOM autodiscovery used a custom header as opposed to requiring clients to use an HTML parser it would make the process more reliable (no more worrying about malformed [X]HTML borking the process), which is good for users, as I can attest from my experiences with RSS Bandit, and it would reduce the complexity of client applications (no dependence on a tag-soup parsing library).

Reading Mark Pilgrim's post, the only major objection he raises seems to be that the average user (Bob) doesn't know how to add custom HTTP headers to their site. This is a fallacious argument given that the average user similarly doesn't know how to generate an XML feed from their weblog either; the expectation is that Bob's blogging software should do this, not that Bob will be generating this stuff by hand.

The http-equiv attribute can be used in place of the name attribute and has a special significance when documents are retrieved via the Hypertext Transfer Protocol (HTTP). HTTP servers may use the property name specified by the http-equiv attribute to create an [RFC822]-style header in the HTTP response. Please see the HTTP specification ([RFC2616]) for details on valid HTTP headers.

That's right, the HTML spec says that authors can put <meta http-equiv="..."> in their HTML documents and that when a web server gets a request for a document it should parse out these tags and use them to add HTTP headers to the response. In reality this turned out to be infeasible because it would be highly inefficient and require web servers to run a tag soup parser over a file each time they served it up to determine which headers to send in the response. So what ended up happening is that certain browsers support a limited subset of the HTTP headers if they appear as <meta http-equiv="..."> in the document.
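Pulling <meta http-equiv="..."> pairs out of a page is itself a small parsing job. A sketch of what a browser (or the hypothetical header-generating web server) has to do:

```python
from html.parser import HTMLParser

class HttpEquivParser(HTMLParser):
    """Collect <meta http-equiv="..."> declarations as pseudo HTTP
    headers, the way browsers honor a limited subset of them."""
    def __init__(self):
        super().__init__()
        self.headers = {}

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            a = dict(attrs)
            if "http-equiv" in a and "content" in a:
                self.headers[a["http-equiv"]] = a["content"]

p = HttpEquivParser()
p.feed('<head><meta http-equiv="Content-Type" '
       'content="text/html; charset=utf-8"></head>')
print(p.headers)  # {'Content-Type': 'text/html; charset=utf-8'}
```

Doing this on every request for every served file is the inefficiency that made servers skip this part of the spec.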

It is unsurprising that Mark mistakes what ended up being implemented by the major browsers and web servers for what was in the spec; after all, he who writes the code makes the rules.

At this point I'd definitely like to see an answer to the questions Dave Winer asked on the atom-syntax list about its decision making process. So far it's seemed like there's a bunch of discussion on the mailing list or on the Wiki which afterwards may be ignored by the powers that be who end up writing the specs (he who writes the spec makes the rules). The choice of <link> tags over using RSD for ATOM autodiscovery is just one of many examples of this occurrence. It'd be nice to see some documentation of the actual process as opposed to the anarchy and “might is right” approach that currently exists.

The Boondocks comic is consistently funny unlike other online comics that have recently started falling off *cough*Sluggy*cough*. Also it is the only other regular newspaper comic that decries the insanity of the current situation in the US.

Slashdot ran yet another article on outsourcing today, this one about how Tech Firms Defend Moving Jobs Overseas. It had the usual comments one's come to expect from such stories. It's been quite interesting watching the attitudes of the folks on Slashdot over the past few years. I started reading the site around the time of the RedHat IPO, when everyone was cocky and folks used to brag about getting cars as signing bonuses. Then came the beginning of the downturn, when the general sentiment was that only those who couldn't hack it were getting fired. Then the feeling that job loss was more commonplace started to spread and the xenophobic phase began, with railing against H1Bs. Now it seems every other poster is either out of work or just got a job after being out of work for a couple of months. The same folks who used to laugh at the problems the RIAA had dealing with the fact that "their business model was obsolete in a digital world" now seek protectionist government policies to deal with the fact that their IT careers are obsolete in a global economy.

Anyway, I digress. I found an interesting link in one of the posts to an article on FastCompany entitled The Wal-Mart You Don't Know. It begins

A gallon-sized jar of whole pickles is something to behold. The jar is the size of a small aquarium. The fat green pickles, floating in swampy juice, look reptilian, their shapes exaggerated by the glass. It weighs 12 pounds, too big to carry with one hand. The gallon jar of pickles is a display of abundance and excess; it is entrancing, and also vaguely unsettling. This is the product that Wal-Mart fell in love with: Vlasic's gallon jar of pickles.

Wal-Mart priced it at $2.97--a year's supply of pickles for less than $3! "They were using it as a 'statement' item," says Pat Hunn, who calls himself the "mad scientist" of Vlasic's gallon jar. "Wal-Mart was putting it before consumers, saying, This represents what Wal-Mart's about. You can buy a stinkin' gallon of pickles for $2.97. And it's the nation's number-one brand."

Therein lies the basic conundrum of doing business with the world's largest retailer. By selling a gallon of kosher dills for less than most grocers sell a quart, Wal-Mart may have provided a service for its customers. But what did it do for Vlasic? The pickle maker had spent decades convincing customers that they should pay a premium for its brand. Now Wal-Mart was practically giving them away. And the fevered buying spree that resulted distorted every aspect of Vlasic's operations, from farm field to factory to financial statement.

and has this somewhere in the middle

Wal-Mart has also lulled shoppers into ignoring the difference between the price of something and the cost. Its unending focus on price underscores something that Americans are only starting to realize about globalization: Ever-cheaper prices have consequences. Says Steve Dobbins, president of thread maker Carolina Mills: "We want clean air, clear water, good living conditions, the best health care in the world--yet we aren't willing to pay for anything manufactured under those restrictions."

which is particularly interesting given the various points I've seen raised about outsourcing in the IT field. The US is definitely headed for interesting times.

For the last couple of months I've noticed a rather annoying bug with my cellphone, an LG 5350. Whenever I enter a new contact it also copies the person's number over the number of an existing contact. If I later delete the new entry it also deletes the copied-over number from the other contact. I've lost a couple of folks' phone numbers due to this annoyance. I'm now in the market for a new phone.

The main features I want besides the standard cell phone features (makes calls, addressbook) are the ability to sync with my calendar in Outlook and perhaps the ability to get information on traffic conditions as well.

In fact, this separation of the private and more general query mechanism from the public facing constrained operations is the essence of the movement we made years ago to 3 tier architectures. SQL didn't allow us to constrain the queries (subset of the data model, subset of the data, authorization) so we had to create another tier to do this.

What would it take to bring the generic functionality of the first tier (database) into the 2nd tier, let's call this "WebXQuery" for now. Or will XQuery be hidden behind Web and WSDL endpoints?

Every way I try to interpret this it seems like a step back to me. It seems like in general the software industry decided that exposing your database and query language directly to client applications was the wrong way to build software, and 2-tier client-server architectures giving way to N-tier architectures was an indication of this trend. I fail to see why one would think it is a good idea to allow clients to issue arbitrary XQuery queries but not think the same of SQL. From where I sit there is basically little if any difference between either choice for queries. Note that although SQL also has a Data Definition Language (DDL) and a Data Manipulation Language (DML) as well as a query language, for the purposes of this discussion I'm only considering the query aspects of SQL.

David then puts forth some questions about this idea that I can't help offering my opinions on

If this is an interesting idea, of providing generic and specific query interfaces to applications, what technology is necessary? I've listed a number of areas that I think need examination before we can get to XQuery married to the Web and to make a generic second tier.

1. How to express that a particular schema is queryable and the related bindings and endpoint references to send and receive the queries. Some WSDL extensions would probably do the trick.

One thing lacking in the XML Web Services world is the simple REST-like notions of GET and POST. In the RESTful HTTP world one would simply specify a URI which one could perform an HTTP GET on and get back an XML document. One could then either use the hierarchy of the URI to select subsets of the document or perhaps use HTTP POST to send more complex queries. All this indirection with WSDL files and SOAP headers, yet functionality such as what Yahoo has done with their Yahoo! News Search RSS feeds isn't straightforward. I agree that WSDL annotations would do the trick but then you have to deal with the fact that WSDLs themselves are not discoverable. *sigh* Yet more human intervention is needed instead of loosely coupled application building.
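The REST-style alternative being contrasted here is just a query expressed in a URI, which any HTTP cache or proxy can key on. A minimal Python sketch (the endpoint and parameter names are invented for illustration):

```python
from urllib.parse import urlencode

# In the REST style, the query lives in the URI itself, so any HTTP
# cache can store the response keyed by URL. The endpoint and parameter
# names below are invented for illustration.
base = "http://example.com/newssearch/rss"
params = {"query": "XML", "results": 10}
url = base + "?" + urlencode(params)
print(url)  # http://example.com/newssearch/rss?query=XML&results=10
```

An HTTP GET on that URL returns an XML document, and two clients issuing the same query hit the same cache entry for free; that's the property a query buried in a SOAP header gives up.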

2. Limit the data set returned in a query. There's simply no way a large provider of data is going to let users retrieve the data set from a query. Amazon is just not going to let "select * from *" happen. Perhaps formal support in XQuery for ResultSets to be layered on any query result would do the trick. A client would then need to iterate over the result set to get all the results, and so a provider could more easily limit the # of iterations. Another mechanism is to constrain the Return portion of XQuery. Amazon might specify that only book descriptions with reviews are returnable.

This is just a difficult problem. Some queries are complex and computationally intensive but return few results. In some cases it is hard to tell just by looking at the query how badly it'll perform. A notion of returning result sets makes sense in a mid-tier application that's talking to a database but not for a client app halfway across the world talking to a website.
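The result-set idea David floats amounts to server-controlled paging. A rough sketch of the shape of it, independent of XQuery:

```python
def paged(query_results, page_size):
    """Yield successive fixed-size pages of a result sequence so a
    provider can cap the work done per request and per iteration."""
    page = []
    for item in query_results:
        page.append(item)
        if len(page) == page_size:
            yield page
            page = []
    if page:          # final partial page
        yield page

pages = list(paged(range(7), 3))
print(pages)  # [[0, 1, 2], [3, 4, 5], [6]]
```

Paging caps how much data goes over the wire per round trip, but, as noted above, it does nothing about a query that is expensive to evaluate yet returns only a handful of rows.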

3. Subset the Xquery functionality. Xquery is a very large and complicated specification. There's no need for all that functionality in every application. This would make implementation of XQuery more wide spread as well. Probably the biggest subset will be Read versus Update.

Finally something I agree with, although David shows some ignorance of XQuery by assuming that there is an update aspect to it when DML was shelved for the 1.0 version. XQuery is just a query language. However it is an extremely complex query language whose specification runs to hundreds of pages. The most relevant specs from the W3C XML Query page are linked to below

I probably should also link to the W3C XML Schema: Structures and W3C XML Schema: Datatypes specs since they are the basis of the type system of XQuery. My personal opinion is that XQuery is probably too complex to use as the language for such an endeavor since you want something that is simple to implement and fairly straightforward so that there can be ubiquitous implementations and therefore lots of interoperability (unlike the current situation with W3C XML Schema). I personally would start with XPath 1.0 and subset or modify that instead of XQuery.
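As a taste of what an XPath-1.0-style subset buys you, Python's ElementTree implements only a small slice of XPath, which is roughly the kind of constrained query language being suggested:

```python
import xml.etree.ElementTree as ET

feed = ET.fromstring("""
<rss><channel>
  <item><title>First</title><category>tech</category></item>
  <item><title>Second</title><category>music</category></item>
</channel></rss>""")

# ElementTree deliberately supports only a small subset of XPath 1.0;
# even so, simple predicate queries over a feed are expressible.
titles = [t.text for t in feed.findall(".//item[category='tech']/title")]
print(titles)  # ['First']
```

A deliberately small language like this is easy to implement everywhere and easy for a provider to reason about when deciding which queries to allow.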

4. Data model subsets. Particular user subsets will only be granted access to a subset of the data model. For example, Amazon may want to say that book publishers can query all the reviews and sales statistics for their books but users can only query the reviews. Maybe completely separate schemas for each subset. The current approach seems to be to do an extract of the data subset according to each subset, so there's a data model for publishers and a data model for users. Maybe this will do for WebXQuery.

5. Security. How to express in the service description (wsdl or policy?) that a given class of users can perform some subset of the functionality, either the query, the data model or the data set. Some way of specifying the relationship between the set of data model, query functionality, data set and authorization.

I'd say the above two features are tied together. You need some way to restrict what the sys admin vs. the regular user executing such a query over the wire can do as well as a way to authenticate them.

6. Performance. The Web has a great ability to increase performance because resources are cachable. The design of URIs and HTTP specifically optimizes for this. The ability to compare URIs is crucial for caching, hence why so much work went into specifying how they are absolutized and canonically compared. But clearly XQuery inputs are not going to be sent in URIs, so how do we have cachable XQueries given that the query will be in a soap header? There is a well defined place in URIs for the query, but there isn't such a thing in SOAP. There needs to be some way of canonicalizing an Xquery and knowing which portions of the message contain the query. Canonicalizing a query through c14n might do the trick, though I wonder about performance. And then there's the figuring out of which header has the query. There are 2 obvious solutions: provide a description annotation or an inline marker. I don't think that requiring any "XQuery cache" engine to parse the WSDL for all the possible services is really going to scale, so I'm figuring a well-defined SOAP header is the way to go.

Sounds like overthinking the problem and yet not being general enough. The first problem is that there should be standard ways that proxies and internet caches know how to cache XML Web Service results in the same way that they know how to cache results of HTTP GET requests today. After that, figuring out how to canonicalize a query expression (I'm not even sure what that means: will /root/child and /root/*[local-name()='child'] be canonicalized into the same thing?) is probably a couple of Ph.D. theses worth of work.

Then there's just the fact that allowing clients to ship arbitrary queries to the server is a performance nightmare waiting to happen...

Your thoughts? Is WebXQuery an interesting idea and what are the hurdles to overcome?

It's an interesting idea but I suspect not a very practical or useful one outside of certain closed applications with strict limitations on the number of users or the type of queries issued.

Anyway, I'm off to play in the snow. I just saw someone skiing down the street. Snow storms are fun.

By the way, Adam Bosworth said a great many other interesting things in his XML 2003 talk. For those of you not inclined to watch this QuickTime clip -- and in particular for the search crawlers -- I would like to enter the following quote into the public record.

The reason people get scared of queries is that it's hard to say 'You can send me this kind of query, but not that kind of query.' And therefore it's hard to have control, and people end up building other systems. It's not clear that you always want query. Sometimes people can't handle arbitrary statements. But we never have queries. I don't have a way to walk up to Salesforce and Siebel and say tell me everything I know about the customer -- in the same way. I don't even have a way to say tell me everything about the customers who meet the following criteria. I don't have a way to walk up to Amazon and Barnes and Noble and in a consistent way say 'Find me all the books reviewed by this person,' or even, 'Find me the reviews for this book.' I can do that for both, but not in the same way. We don't have an information model. We don't have a query model. And for that, if you remember the dream we started with, we should be ashamed.

I think we can fix this. I think we can take us back to a world that's a simple world. I think we can go back to a world where there are just XML messages flowing back and forth between...resources. <snipped />

Three things jump out at me from that passage. First, the emphasis on XML query. My instincts have been leading me in that direction for a while now, and much of my own R&D in 2003 was driven by a realization that XPath is now a ubiquitous technology with huge untapped potential. Now, of course, XQuery is coming on like a freight train.

When Don and I hung out over the holidays this was one of the things we talked about. Jon's post has been sitting flagged for follow up in my aggregator for a while. Here are my thoughts...

The main problem is that a number of websites have the same information but do not provide a uniform way to access it, and when access mechanisms are provided they do not allow ad-hoc queries. So the first thing needed is a shared view (or schema) of what this information looks like, which is the shared information model Adam talks about. There are two routes you can take with this: one is to define a shared data model with the transfer syntax being secondary (i.e. use RDF), while another is to define a shared data model and transfer syntax together (i.e. use XML). In most cases, people have tended to pick the latter.

Once an XML representation of the relevant information users are interested in has been designed (i.e. the XML schema for books, reviews and wishlists that could be exposed by sites like Amazon or Barnes & Noble), the next technical problem to be solved is uniform access mechanisms: the eternal REST vs. SOAP vs. XML-RPC debate that has plagued a number of online discussions. Then there's deployment, adoption and evangelism.

Besides the fact that I've glossed over the significant political and business reasons that may or may not make such an endeavor fruitful we still haven't gotten to Adam's Nirvana. We still need a way to process the data exposed by these web services in arbitrary ways. How does one express a query such as "Find all the CDs released between 1990 and 1999 that Dare Obasanjo rated higher than 3 stars"? Given the size of the databases hosted by such sites would it make more sense to ship the documents to the client or some mid-tier which then performs the post-processing of the raw data instead of pushing such queries down to the database? What are the performance ramifications of exposing your database to anyone with a web browser and allowing them to run ad-hoc queries instead of specially optimized, canned queries?

At this point, if you are like me, you might suspect that having the web service endpoints return the results of canned queries which can then be post-processed by the client may be more practical than expecting to be able to ship arbitrary SQL/XML, XQuery or XPath queries to web service endpoints.
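The canned-query-plus-client-side-post-processing idea can be sketched in a few lines. This is purely illustrative: the endpoint response shape, element names and rating scale below are assumptions, not any real site's API.

```python
# Hypothetical sketch: instead of shipping an arbitrary XQuery to the server,
# call a canned query endpoint (e.g. "all CDs rated by this user") and do the
# ad-hoc filtering on the client over the raw XML results.
import xml.etree.ElementTree as ET

# Stand-in for what a canned query endpoint might return.
CANNED_RESPONSE = """
<cds>
  <cd year="1994" rating="5"><title>Illmatic</title></cd>
  <cd year="2001" rating="4"><title>The Blueprint</title></cd>
  <cd year="1998" rating="2"><title>Filler</title></cd>
</cds>
"""

def cds_rated_above(xml_text, start_year, end_year, min_rating):
    """Client-side post-processing of the canned query's results."""
    root = ET.fromstring(xml_text)
    return [
        cd.findtext("title")
        for cd in root.findall("cd")
        if start_year <= int(cd.get("year")) <= end_year
        and int(cd.get("rating")) > min_rating
    ]

# "CDs released between 1990 and 1999 rated higher than 3 stars"
matches = cds_rated_above(CANNED_RESPONSE, 1990, 1999, 3)
```

The server only ever runs a fixed, optimizable query; the expressive part of the question stays on the client.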

The main problem with what I've described is that it takes a lot of effort. Coming up with standardized schema(s) and distributed computing architecture for a particular industry then driving adoption is hard even when there's lots of cooperation let alone in highly competitive markets.

In an ideal world, this degree of bootstrapping would be unnecessary. After all, people can already build the kinds of applications Adam described today by screen scraping [X]HTML, although such applications tend to be brittle. What the software industry should strive for is a way to build such applications in a similarly loosely connected manner in the XML Web Services world without requiring the heavy investment of human organizational effort that is currently needed. This was the initial promise of XML Web Services which, like Adam, I am ashamed has not come to pass. Instead many seem to be satisfied with reinventing DCOM/CORBA/RMI with angle brackets (then replacing it with "binary infosets"). Unfortunate...

The first day of work yesterday, after most people had been gone for 2 or 3 weeks, was refreshing. I attended a few meetings about design issues we've been having and the people involved seemed to be coalescing around decisions that make sense. A good start to the new year. The main problem seems not to be making changes but not breaking folks as we make the changes. Processing XML in the next version of the .NET Framework is going to kick the llama's azz.

I already have lots of follow-up meetings scheduled for tomorrow (not today, since we're supposed to have a record snowfall for this area today and most people here don't do snow driving). I'll probably be working from home; the streets look dangerous from the vantage point of my bedroom window.

As far as gzipped feeds go, about 10% of the feeds in my NNW (about 900) are gzipped. That's a lot worse than I expected. I understand that this can be tough -- the easiest way to implement gzipping is to do what Brent suggested, shove it off to Apache. That means that people who are being hosted somewhere need to know enough Apache config to turn gzip on. Not likely. Or have enlightened hosting admins that automatically turn it on, but that doesn't appear to be the case. So blogging software vendors could help a lot by turning gzip support on in the software.

What's even more depressing is that for HTTP conditional get, the figure is only about 33% of feeds. And this is something that the blogging software folks should do. We are doing it in pyblosxom.

This is quite unfortunate. Every few months I read some "sky is falling" post about the bandwidth costs of polling RSS feeds, yet many blogging tools don't support existing technologies that could reduce the bandwidth cost of RSS polling by an order of magnitude. I am guilty of this as well.
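For the curious, the two techniques in question are standard HTTP/1.1: conditional GET (the server answers 304 Not Modified when the feed hasn't changed) and gzip content encoding. A minimal sketch of the client side, with the cache-entry dict shape being my own assumption for illustration:

```python
# Client-side sketch of bandwidth-friendly feed polling: conditional GET
# headers plus gzip decoding. Header names are standard HTTP/1.1.
import gzip

def build_poll_headers(cache_entry):
    """Build request headers from what we remembered about the last fetch,
    so the server can reply 304 Not Modified instead of resending the feed."""
    headers = {"Accept-Encoding": "gzip"}
    if cache_entry.get("etag"):
        headers["If-None-Match"] = cache_entry["etag"]
    if cache_entry.get("last_modified"):
        headers["If-Modified-Since"] = cache_entry["last_modified"]
    return headers

def decode_body(raw_bytes, content_encoding):
    """Transparently decompress a gzip-encoded feed body."""
    if content_encoding == "gzip":
        return gzip.decompress(raw_bytes)
    return raw_bytes

headers = build_poll_headers({"etag": '"abc123"',
                              "last_modified": "Mon, 05 Jan 2004 10:00:00 GMT"})
body = decode_body(gzip.compress(b"<rss/>"), "gzip")
```

An aggregator that remembers the `ETag` and `Last-Modified` values from each fetch, and a server that honors them, together cut most polls down to a headers-only exchange.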

I'll investigate how difficult it'll be to practice what I preach in the next few weeks. It looks like an upgrade to my version of dasBlog is in the works.

While the name of this site is BAD FADS, please note that this is neither an indictment nor an endorsement of any of the fads mentioned. As you know, during the '70s the word "bad" could alternately mean "good!" Thus, this site was created to take a fun and nostalgic look at fashions, collectibles, activities and events which are cherished by some and ridiculed by others.

The "Locate Feed" button that uses Mark Pilgrim's Ultra-liberal RSS autodiscovery algorithm to locate a feed for a site shows a weird error message if it couldn't make sense of the HTML because it was too malformed (i.e tag soup) tag soup. There are a bunch of things I could change here from using a better error message to falling back to using Syndic8 to find the feed.

I'll fix both of these bugs before heading out to work today. Hopefully this should take care of the problems various people have had and probably never mentioned with adding feeds to RSS Bandit.

I've played with RSS Bandit and there were some recent laudatory posts about the latest versions, so this morning I downloaded a copy (after doing Windows update for Win2K on the Thinkpad, rebooting, installing a newer version of the .NET framework, and rebooting...) and installed it. Things seemed fine, at least until I started adding feeds. The first two feeds I added were hers and mine. RSS Bandit choked on both. Now we have an internal/external network setup, complete with split DNS and a whole bunch of other stuff. I figured that might be a problem, and started tweaking. The deeper I got, the more I realized it wasn't going to work. I foresaw many pleas for technical support followed by frustration -- I mean, *I* was frustrated. So I dumped that and went for Plan B, as it were.

What's truly weird about his post is that I was reading it in RSS Bandit which means reading his feed works fine for me on my machine but for some reason didn't work with his. In fact, I just checked his wife's blog and again no problems reading it in RSS Bandit. *sigh*

Many people who use pirated products justify it by claiming they're only stealing from rich mega-corporations that screw their customers, but this conveniently overlooks the fact that the people who are hurt the most by piracy are people like me.

Shareware developers are losing enormous amounts of money to piracy, and we're mostly helpless to do anything about it. We can't afford to sue everyone who steals from us, let alone track down people in countries such as Russia who host web sites offering pirated versions of our work...Some would argue that we should just accept piracy as part of the job, but chances are the people who say this aren't aware of how widespread piracy really is. A quick look at my web server logs would be enough to startle most people, since the top referrers are invariably warez sites that link to my site (yes, not only do they steal my software, but they also suck my bandwidth).

A couple of years ago I wanted to get an idea of how many people were using pirated versions of TopStyle, so I signed up for an anonymous email account (using a "kewl" nickname, of course) and started hanging out in cracker forums. After proving my cracker creds, I created a supposedly cracked version of TopStyle and arranged to have it listed on a popular warez site....This cracked version pinged home the first time it was run, providing a way for me to find out how many people were using it. To my dismay, in just a few weeks more people had used this cracked version than had ever purchased it. I knew piracy was rampant, but I didn't realize how widespread it was until this test.

Nick has no innate right to have people pay for his software, just as I have no right to ask people to pay for use of my name.

Even if he did, most people who pirate his software probably would never use it anyway, so they aren't costing him any money and they're providing him with free advertising.

And of course it makes sense that lots of people who see some interesting new program available for free from a site they're already at will download it and try it out once, just as more people will read an article I wrote in the New York Times than on my weblog.

...

Yes, piracy probably does take some sales away from Nick, but I doubt it's very many. If Nick wants to sell more software, maybe he should start by not screaming at his potential customers. What's next? Yelling at people who use his software on friends' computers? Or at the library?

Aaron's arguments are so silly they boggle the mind but let's take them one at a time. Human beings have no innate rights. Concepts such as "unalienable rights" and documents such as the Bill of Rights have been agreed upon by some societies as the law but this doesn't mean they are universal or would mean anything if not backed up by the law and its enforcers. Using Aaron's argument, Aaron has no innate right to live in a house he paid for, eat food he bought or use his computer if some physically superior person or armed thug decides he covets his possessions. The primary thing preventing this from being the normal state of affairs is the law, the same law that states that software piracy is illegal. Western society has decided that Capitalism is the way to go (i.e. a party provides goods or services for sale and consumers of said goods and services pay for them). So for whatever definition of "rights" Aaron is using Nick has a right to not have his software pirated.

Secondly, Aaron claims that if people illegally utilizing your software can't afford it then it's OK for them to do so. This argument is basically, "It's OK to steal if what you want is beyond your purchasing power". Truly, why work hard and save for what you want when you can just steal it? Note that this handy rule of Aaron's also applies to all sorts of real life situations. Why not shoplift? After all, big department store chains can afford it and in fact factor it into their prices. Why not steal cars or rob jewellery stores if you can't afford them? After all, it's all insured anyway, right? The instant gratification generation truly is running amok.

The best part of Aaron's post is that even though Nick states that more people are using pirated versions of his software than have paid for it, Aaron dismisses this with his personal opinion that piracy hasn't cost many sales, then devolves into a slippery slope argument about whether people should pay for using Nick's software on a friend's computer or at the library. Of course, the simple answer to this question is that by purchasing the software the friend or the library can let anyone use it, the same way that I can carry anyone in my car after purchasing it.

My personal opinion is that if you think software is too expensive then (a) use cheaper alternatives (b) write your own or (c) do without it after all no one needs software. Don't steal it then try and justify your position with inane arguments that sound like the childish "information wants to be free" rants that used to litter Slashdot during the dotbomb era.

A common problem for users of desktop information aggregators is that there is currently no way to synchronize the state of information aggregators used on different machines in the same way that can be done with email clients today. The most common occurrence of this is a user who uses an information aggregator at home and at work or at school, and who'd like to keep the state of each aggregator synchronized independent of whether the same aggregator is used on both machines.

The purpose of this specification is to define an XML format that can be used to describe the state of an information aggregator, which can then be used to synchronize another information aggregator instance to the same state. The "state" of an information aggregator includes information such as which feeds are currently subscribed to by the user and which news items have been read by the user.

This specification assumes that an information aggregator is software that consumes an XML syndication feed in one of the following formats: ATOM, [RSS0.91], [RSS1.0] or [RSS2.0]. If more syndication formats gain prominence then this specification will be updated to take them into account.
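To make the idea concrete, here is a hypothetical sketch of what serializing aggregator state might look like. The element and attribute names below are my own invention for illustration, not the ones the specification defines.

```python
# Illustrative sketch of serializing aggregator state (subscribed feeds plus
# which item ids have been read) to XML so another instance can sync to it.
import xml.etree.ElementTree as ET

def serialize_state(subscriptions):
    """subscriptions: list of (feed_url, set of read item ids)."""
    root = ET.Element("aggregatorState")
    for url, read_ids in subscriptions:
        feed = ET.SubElement(root, "feed", url=url)
        for item_id in sorted(read_ids):
            ET.SubElement(feed, "readItem", id=item_id)
    return ET.tostring(root, encoding="unicode")

state = serialize_state([
    ("http://example.org/rss.xml", {"item-1", "item-2"}),
])
```

The second half of the problem, merging two such documents when both machines have changed state, is where the real design work in such a spec lies.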

I've written what should be the final draft of the specification for the "feed" URI scheme. From the abstract

This document specifies the "feed" URI (Uniform Resource Identifier) scheme for identifying data feeds used for syndicating news or other content from an information source such as a weblog or news website. In practice, such data feeds will most likely be XML documents containing a series of news items representing updated information from a particular news source.

The primary change from the previous version was to incorporate feedback from Graham Parks about compliance with RFC 2396. The current grammar for the "feed" URI scheme is
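The grammar itself is in the spec, but as a sketch of the behavior the scheme implies, here is how a client might resolve "feed" URIs to the HTTP URLs they identify. This reflects my reading of the draft (both spellings resolving to the same URL) and is an illustration, not normative:

```python
# Sketch of resolving "feed" URIs: both "feed://example.org/rss.xml" and
# "feed:http://example.org/rss.xml" identify the same HTTP-addressable feed.
def resolve_feed_uri(uri):
    """Return the HTTP URL a 'feed' URI points at."""
    if not uri.startswith("feed:"):
        raise ValueError("not a feed URI: %r" % uri)
    rest = uri[len("feed:"):]
    if rest.startswith("//"):     # feed://example.org/rss.xml form
        return "http:" + rest
    return rest                   # feed:http://example.org/rss.xml form
```

The point of the scheme is the dispatch step this function skips: a browser hands any `feed:` URI to the registered aggregator instead of rendering the XML itself.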

One of the trying parts of being a tester for me was that I'd file a bug and the dev would resolve it as "Not Repro", which loosely translates to "it worked on my machine". Half the time the devs would be right, and half the time I'd be. It was always a pain trying to figure out whose machine was at fault, which was the most recent build, the dependencies and the whole shebang.

I didn't expect for this to happen. RSS Bandit just gave me a huge disappointment. I really thought this was it, that it works OK now, but I was wrong. Out of about 150 feeds, RSS Bandit can't parse 22 of them. And not esoteric ones either:

Jeremy, HackNot, Feedster, Dino, Brain.Save() and a good few more. These are all feeds that work very well in SharpReader. Now, the only thing I can think of is that these feeds do not conform to all the rules an RSS feed needs to conform to. That means that RSS Bandit is somehow too "strict" in enforcing those rules (just a guess). If I can still get these feeds some other place you can be sure that's the path I'll take.

All of the aforementioned feeds work on my machine. Screenshot below. Granted I'm using the current snapshot from SourceForge but the RSS handling code hasn't changed much besides a change I just made to fix the fact that in some cases the cache loses items if you hit [Refresh Feeds] on startup.

Roy Osherove has a recent blog entry entitled Moving to RSS Bandit: A simple review where he talks about why he's switched his primary news aggregator from SharpReader to RSS Bandit. In his post he asks a couple of questions, most of which are feature requests. His questions and my answers are below.

The Feed tree can only be widened to a certain extent. Why is that?

I'm not sure about the answer to this one. Torsten writes the UI code and he's currently on vacation. I assume this is done so that you can't completely cover one panel with another.

Posting to my blog from it

You can post to your blog from RSS Bandit using the w.bloggar plugin developed by Luke Hutteman. I've assigned a feature request bug to myself to ensure that this plugin is installed along with RSS Bandit.

a "blog about this item" feature which automatically asks you what parts of the item you'd like to be inserted into the new blog post (title,author name, quoted text...)

Once the ATOM effort produces decent specs around a SOAP API for posting to blogs and the various blogging tools start to support it then this will be native functionality of RSS Bandit. No ETA for this feature since it is dependent on a lot of external issues.

Pressing space while reading a long blog post does not scroll the explorer pane of the post(unless it is focused), but automatically takes you to the next unread post. I wish that would behave like SR where it would scroll the post until it ends and only then take you to the next one

I'll mention this to Torsten when he gets back, although I'm sure he'll read this entry before I get the chance.

I wish there was an ability to choose whether you can order the feed tree alphabetically or by a distinct order the user wants (like SR)

I've always thought this was a weird feature request. I remember Torsten didn't like having to implement it, and the main justification for the feature I've heard from a user is satisfied by Search Folders.

For some reason, some of the posts are blue and some not. What does that mean?

Blue means they contain a link to your webpage [as specified by you in the preferences dialog]. It's a handy visual way to determine when posts link to your blog. Again, this functionality is probably superseded by Search Folders.

I'd like to know how far down the feed list is the updating process when I press the "update all feeds" (a simple XX feeds left to update should do)

Another feature request for Torsten. I do like the fact that we now (in current builds not yet publicly released) provide visual indication of when items are being downloaded from a feed and when an error occurs during the downloading process.

Why is there a whole panel just for search when all there is is just a small text box? Why not simply put that box on the main tool bar?

The UI work for the Search feature isn't done yet. We will use up all that space once everything is done.

While we're at it, entering text in the search box and pressing enter should automatically run search( i.e the Search Button should be the default button when the text box is active)

Agreed. This will be fixed.

I'd like to be able to set the default update rate for a category(which will impact all feeds in it) and not just for the whole feeds globally using the main options dialog

This makes sense. However there is some complexity in that categories can nest and so on. I'll think about it.

NO RSS aggregator I've seen yet has been able to do this simple task: in the main .Net weblogs feed, show the name of the post author\Blog name next to the post title. Is this information simply missing from the feed? If not, how hard would it be to implement this?

This information is shown in the Reading Pane. Would you like to see this in the list view? For most blogs this would be empty (since the dc:author & author elements are rarely used) or redundant since most feeds are produced by a single blog.
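As a side note, a minimal sketch of how an aggregator might dig an author name out of an RSS item, checking the rarely used elements mentioned above. The precedence order (and checking dc:creator as well, which is the Dublin Core element most feeds actually use) is my own assumption:

```python
# Sketch: extract an item's author from dc:author / dc:creator / author,
# falling back to None when the feed omits them (as most do).
import xml.etree.ElementTree as ET

DC = "{http://purl.org/dc/elements/1.1/}"

def item_author(item_xml):
    item = ET.fromstring(item_xml)
    for tag in (DC + "author", DC + "creator", "author"):
        text = item.findtext(tag)
        if text:
            return text
    return None

ITEM = """<item xmlns:dc="http://purl.org/dc/elements/1.1/">
  <title>Hello</title><dc:creator>Dare Obasanjo</dc:creator>
</item>"""
```

For an aggregate feed like the main .NET weblogs one, a list-view column driven by this would only be as good as the feed's use of these elements.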

I'd like to be able to setup the viewer pane to the right, and the posts pane to the bottom left (like in outlook's 2003 default view or like FeedDemon)

This is in the current builds although the feature is hidden. You have to right-click on the 'Feed Details' tab. I plan to talk to Torsten about making this a toolbar button like in Outlook/Outlook Express.

It snowed yesterday in the Seattle area. It was nice watching the snowflakes fall and afterwards I had the first snowball fight of my life. Then I got in my car, turned on the heat and drove a few blocks to the video store. By the time I got out of the store there was a crack almost a foot long on the driver's side of the windshield.

I want this also. I want a theory that unifies objects and data. We're not there yet.

With a relational database, you have data and relationships, but no objects. If you want objects, that's your problem, and the problem isn't insignificant. There's been a parade of tools and technologies, and all of them have fallen short on the promise of bridging the gap. There's the DataSet, which seeks to be one bucket for all data. It's an object, but it doesn't give you an object view of the actual data. It leaves you doing things like ds.Tables["Customer"].Rows[0]["FirstName"].ToString(). Yuck. Then there are Typed DataSets. These give you a pseudo-object view of the data, letting you do: ds.Customer[0].FirstName. Better, but still not what I really want. And it's just code-gen on top of the DataSet. There's no real "Customer" object here.

Then, there are ObjectSpaces that let you do the XSD three-step to map classes to relational data in the database. With ObjectSpaces you get real, bona fide objects. However, this is just a bunch of goo piled on top of ADO.NET, and I question the scalability of this approach.

Then there are UDTs. In this case, you've got objects all the way into the database itself, with the object serialized as one big blob into a single column. To find specific objects, you have to index the properties that you care about, otherwise you're looking at not only a table scan, but rehydrating every row into an object to see if it's the object you're looking for.

There's always straight XML, but at this point you're essentially saying, "There are no objects". You have data, and you have schema. If you're seeing objects, it's just an optical illusion on top of the angle brackets. In fact, with Web services, it's emphatically stated that you're not transporting objects, you're transporting data. If that data happens to be the serialization of some object, that's nice, but don't assume for one second that that object will exist on the other end of the wire.

And speaking of XML, Yukon can store XML as XML. Which is to say you have semi-structured data, as XML, stored relationally, which you could probably map to an XML property of an object with ObjectSpaces.

What happens when worlds collide? Will ObjectSpaces work with Yukon UDTs and XML?

Oh, and don't forget XML Views, which let you view your relational data as XML on the client, even though it's really relational.

<snip />

So for a given scenario, do all of you know which technology to pick? I'm not too proud to admit that honestly I don't. In fact, I honestly don't know if I'll have time to stress test every one of these against a number of real problem domains and real data. And something tells me that if you pick the wrong tool for the job, and it doesn't pan out, you could be pretty hosed.

Today we have a different theory for everything. I want the Theory of Everything.

The team I work for deals with data access technologies (relational, object, XML aka ROX) so this impedance mismatch is something that we have to rationalize all the time.

Up until quite recently the primary impedance mismatch application developers had to deal with was the Object<->Relational impedance mismatch. Usually data was stored in a relational database but primarily accessed, manipulated and transmitted over the network as objects via some object oriented programming language. Many felt (and still feel) that this impedance mismatch is a significant problem. Attempts to reduce it have led to technologies such as object oriented databases and various object relational mapping tools. These solutions take the point of view that the problem of having developers deal with two domains, or having two sets of developers (DB developers and application coders), is solved by making everything look like a single domain: objects. One could also argue that the flip side of this is to push as much data manipulation as you can to the database via technologies like stored procedures, while mainly manipulating and transmitting the data on the wire in objects that closely model the relational database, such as the .NET Framework's DataSet class.
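The contrast between the loosely-typed DataSet style of access and a mapped object boils down to something like the following toy sketch (the Customer class and row layout are invented for the example):

```python
# Toy illustration of the two camps: string-keyed access to a relational row
# (DataSet-style) versus access through a class mapped onto that row.
class Customer:
    """A class an O/R mapper might generate or populate from a row."""
    def __init__(self, first_name, last_name):
        self.first_name = first_name
        self.last_name = last_name

# DataSet-style: untyped, stringly-keyed, typos caught only at runtime.
row = {"FirstName": "Dare", "LastName": "Obasanjo"}
name_untyped = row["FirstName"]

# Mapped-object style: typed attribute access after a mapping step.
customer = Customer(row["FirstName"], row["LastName"])
name_typed = customer.first_name
```

The mapping step in the middle is exactly where O/R tools live, and where the mismatch (identity, relationships, constraints that classes can't express) leaks through.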

Recently a third player has appeared on the scene: XML. It is becoming more common for data to be stored in a relational database, mainly manipulated as objects, but transmitted on the wire as XML. One would then think that, given the previously stated impedance mismatch and the fact that XML is mainly just a syntactic device, the XML representations of the data being transmitted would be serialized versions of objects, relational data or some subset of both. However, what seems to be happening is slightly more complicated. The software world seems to be moving more towards using XML Web Services built on standard technologies such as HTTP, XML, SOAP and WSDL to transmit data between applications. As stated in the WSDL 1.1 W3C Note:

WSDL recognizes the need for rich type systems for describing message formats, and supports the XML Schemas specification (XSD) [11] as its canonical type system

So this introduces a third type system into the mix: W3C XML Schema structures and datatypes. W3C XML Schema has a number of concepts that do not map to concepts in either the object oriented or relational models. To properly access and manipulate XML typed using W3C XML Schema you need new data access mechanisms such as XQuery. Now application developers have to deal with 3 domains, or we need 3 sets of developers. The first instinct is to continue with the meme where you make everything look like objects, which is what a number of XML Web Services toolkits do today, including Microsoft's .NET Framework via the XML Serialization technology. This tends to be particularly lossy because traditionally object oriented systems do not have the richness to describe the constraints that are possible to create with a typical relational database, let alone the even richer constraints that are possible with W3C XML Schema. Thus such object oriented systems must evolve to not only capture the semantics of the relational model but those of the W3C XML Schema model as well. Another approach could be to make everything look like XML and use that as the primary data access mechanism. Technologies already exist to make relational databases look like XML and to make objects look like XML. Unsurprisingly to those who know me, this is the approach I favor. The relational model could also be viewed as a universal data access mechanism if one figures out how to map the constraints of the W3C XML Schema model onto it. The .NET Framework's DataSet already does some translation of an XML structure defined in a W3C XML Schema to a relational structure.
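The "make everything look like XML" approach can be sketched quickly: expose both an in-memory object and a relational row as XML, then run one query over either. The element names and classes below are invented for illustration:

```python
# Sketch of XML as the unifying view: a relational row and an object are both
# projected into XML, and the same path expression works against either.
import xml.etree.ElementTree as ET

def row_to_xml(table, row):
    """Project a relational row (dict of column -> value) into an XML element."""
    elem = ET.Element(table)
    for col, val in row.items():
        ET.SubElement(elem, col).text = str(val)
    return elem

class Book:
    """An ordinary object that can also present itself as XML."""
    def __init__(self, title, rating):
        self.title, self.rating = title, rating

    def to_xml(self):
        return row_to_xml("book", {"title": self.title, "rating": self.rating})

row_view = row_to_xml("book", {"title": "Database Book", "rating": 4})
obj_view = Book("Object Book", 5).to_xml()

# One query shape over both worlds.
titles = [view.findtext("title") for view in (row_view, obj_view)]
```

What this sketch necessarily glosses over is the lossiness the next paragraph mentions: object graphs with cycles, and schema features like derivation by restriction, don't survive a naive projection like this.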

The problem with all three approaches I just described is that they are somewhat lossy or involve hacking one model into becoming the uber-model. XML trees don't handle the graph structures of objects well, objects can't handle concepts like W3C XML Schema's derivation by restriction and so on. There is also a fourth approach which is endorsed by Erik Meijer in his paper Unifying Tables, Objects, and Documents where one creates a new unified model which is a superset of the pertinent features of the 3 existing models. Of course, this involves introducing a fourth model.

The fourth model mentioned above is the unified theory of everything that Scott or Sean is asking for. Since the last time I made this post, my friend Erik Meijer has been busy and produced another paper that shows what such a unification of the ROX triangle would look like if practically implemented as a programming language in his paper Programming with Circles, Triangles and Rectangles. In this paper Erik describes the research language Xen which seems to be the nirvana Scott or Sean is looking for. However this is a research project and not something Sean or Scott will be likely to use in production in the next year.

The main problem is that Microsoft has provided .NET developers with too much choice when it comes to building apps that retrieve data from a relational store, manipulate the data in memory, then either push the updated information back to the store or send it over the wire. The one thing I have learned working as a PM on core platform technologies is that our customers HATE choice. It means having to learn multiple technologies and make decisions on which is the best, sometimes risking making the wrong choice. This is exactly the problem Scott or Sean is having with the technologies we announced at the recent Microsoft Professional Developers Conference (PDC), which should be shipping this year. What technology should I use and when should I use it?

This is something the folks on my team (WebData, the data access technology team) know we have to deal with when all this stuff ships later this year, and we will do so to the best of our ability. Our users want architectural guidance and best practices, which we'll endeavor to make available as soon as possible.