XML: Schema Breeds Intolerance

I was at an offsite a while ago where a bunch of us got together to talk about some technology thing, some potential product or whatnot, that may not see the light of day for a while. I had a chance to give a presentation on some of my forays into data query languages. Of course, there were other presentations too, and mine was not the first. So as I sat there admiring the hotel’s wireless net connection, googling for random words, thinking about work-related things, and paying full attention to the current talk by making use of my many co-processors, out of the corner of my eye I noticed a slip of paper with tiny square stickers. They were our parking validation stickers, left by the hotel management, which are an important thing to have if you don’t want to get stuck paying to get your car out at the end of the day.

Now I have a whimsical mind that favors pedantic puns, and so it is often spinning extra cycles in the background on odd word associations. One of the reasons my wife often asks what’s on my mind is that she knows there’s usually something brewing, and of course she is right, because quite often, for apparently no reason, I will just start spouting some obscure joke or quip. She’s a good sport and tries to laugh at most of them, though it also makes her wonder how much time I actually spend trying to dream them up, but that’s not how it works at all. They just sort of pop up out of thin air. I’m telling you, it’s the co-processors. They are always churning out stuff.

Anyway, my bank of x86’s apparently slid the word ‘validate’ over to meet the word ‘data’ in a mental re-enactment of a bad episode of The Electric Company, and I got spun in the direction of thinking about XML Schemas. Now, just tell me the truth, wouldn’t you? Of course you would. That’s the first thing that comes to mind when thinking about data validation. You do validate your data, don’t you? I mean, you wouldn’t want just any old data to come slinking in off the street and set up shop in your application, would you?

To tell you the truth, I’m not sure anymore. I mean, there are plenty of good reasons to have schemas around. They describe your data layout. For things like WSDL, they give you a ‘crisp’ contract for describing how data is passed between interfaces, etc. You can feed them into tools that do code generation to build you proxies and data structures. But this is all really just static stuff you do with schemas. It’s not really runtime stuff.

You probably don’t use the schema information itself at runtime. If your app is sucking in data using the XmlReader or the DOM, then your app is probably structured around the data you expect anyway. It probably reads and recognizes particular tags in particular namespaces. It probably queries into documents looking for those very specific names. So what if there might be stuff inside that your app does not recognize? Isn’t that the whole point of XML in the first place? Doesn’t it allow you to build applications that are resilient to change and to that rare unexpected integer? What good does it do to actually pay the overhead of validating your data stream against the actual physical schema when your app is only going to grovel over what it understands anyway?
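To make that concrete, here is a rough sketch of what such a "grovel over what you understand" reader might look like. It is in Python with the standard library's ElementTree rather than the XmlReader or the DOM, and the namespace, element names, and the read_order helper are all made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace and element names, invented for this example.
NS = {"o": "urn:example:orders"}

DOC = """\
<o:order xmlns:o="urn:example:orders" xmlns:x="urn:example:extras">
  <o:id>42</o:id>
  <x:giftwrap>true</x:giftwrap>
  <o:item sku="A1" qty="2"/>
  <o:item sku="B7" qty="1" x:backordered="yes"/>
  <o:surprise>an element the app has never heard of</o:surprise>
</o:order>
"""

def read_order(xml_text):
    """Pull out only the names this app knows about; skip everything else."""
    root = ET.fromstring(xml_text)
    # Query for the specific tags in the specific namespace we expect.
    order_id = root.findtext("o:id", namespaces=NS)
    items = [
        (item.get("sku"), int(item.get("qty")))
        for item in root.findall("o:item", NS)
    ]
    # x:giftwrap, x:backordered and o:surprise are never looked at,
    # so their presence cannot break anything here.
    return {"id": order_id, "items": items}

order = read_order(DOC)
print(order)
```

Nothing in the reader mentions the extra elements or the extra attribute, so a sender can add them without the reader ever noticing.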

It seems that getting overly anal about schema just leads to making apps brittle, and that should be a bad thing. We should work on making apps more tolerant of each other. Live and let live, I say.

Sounds like you need a schema validator that validates any document which implements every required item in the schema, and ignores anything extra. Like a set of XPath statements…as long as every XPath retrieves at least one element, that’s a valid document.

Then you can publish that set of XPaths, saying "I need at least this." Sorta like interfaces in OOP…a class can have all kinds of extra junk, but as long as it implements the interface you want, it’s cool.
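The "set of XPaths as an interface" idea can be sketched in a few lines. This is only an illustration under some assumptions: it uses Python's ElementTree, which supports just a limited subset of XPath, and the namespace, the paths, and the missing_paths and is_acceptable helpers are hypothetical names chosen for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace, invented for this example.
NS = {"o": "urn:example:orders"}

# The published "interface": every path must match at least one element.
REQUIRED_PATHS = [
    "./o:id",            # the order must carry an id
    ".//o:item[@sku]",   # and at least one item with a sku attribute
]

def missing_paths(xml_text, paths=REQUIRED_PATHS, namespaces=NS):
    """Return the required paths that match nothing in the document."""
    root = ET.fromstring(xml_text)
    # Compare against None explicitly: an Element with no children is falsy.
    return [p for p in paths if root.find(p, namespaces) is None]

def is_acceptable(xml_text):
    """A document is acceptable if no required path is missing."""
    return not missing_paths(xml_text)

GOOD = (
    '<o:order xmlns:o="urn:example:orders" xmlns:x="urn:example:extras">'
    "<o:id>42</o:id><x:giftwrap>true</x:giftwrap><o:item sku='A1'/>"
    "</o:order>"
)
BAD = '<o:order xmlns:o="urn:example:orders"><o:item sku="A1"/></o:order>'

print(is_acceptable(GOOD), missing_paths(GOOD))
print(is_acceptable(BAD), missing_paths(BAD))
```

The first document passes even though it carries an element from another namespace; the second fails only because a required path comes back empty. Anything beyond ElementTree's XPath subset would need a fuller XPath engine.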