I’ve been playing with the January CTP of WCF, and I’ve encountered what seems like a pretty major setback. I’ve got an interface that takes a MessageContract and returns a MessageContract. All well and good. But then I want to use the AsyncPattern on the service side, so that my routine will get called on a different thread from the one that’s listening on the network. So I decorate the interface like so:
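The original listing didn’t survive, so here’s a sketch reconstructed from the error message. Only EndSignon and ThingyResponseMessage are real names; the interface name, ThingyRequestMessage, and the Begin signature are my assumptions about what the code looked like:

```csharp
// Reconstructed sketch -- names other than EndSignon and
// ThingyResponseMessage are assumptions, not the actual code.
[ServiceContract]
public interface ISignonService
{
    // AsyncPattern tells WCF to pair this Begin method with EndSignon.
    [OperationContract(AsyncPattern = true)]
    IAsyncResult BeginSignon(ThingyRequestMessage request,
                             AsyncCallback callback, object state);

    // Per the async pattern, End takes the IAsyncResult and returns
    // the MessageContract -- which is exactly what gets flagged.
    ThingyResponseMessage EndSignon(IAsyncResult result);
}
```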

Now I get an exception at runtime, which says that I can’t mix parameters and messages for the method “EndSignon”. What it means is that if I return a MessageContract instead of a primitive type, my method has to take a MessageContract and not one or more primitive types. OK, I get that. But my EndSignon method is getting flagged because it takes an IAsyncResult (as it must according to the AsyncPattern) and returns a ThingyResponseMessage.

Does this mean I can’t use MessageContracts with the AsyncPattern? If so, LAME. If not, then what am I missing?

I’ve been working on a project that required me to turn some CLR types into a set of XML Schema element definitions so that they can be included in another file. It stumped me for a while, and I envisioned having to reflect over all my types and build schema myself, which would be a total drag.

Then I remembered that this is exactly what xsd.exe does. Thank the heavens for Reflector! It turns out to be really simple, just undocumented…

XmlReflectionImporter importer1 = new XmlReflectionImporter();
XmlSchemas schemas = new XmlSchemas();
XmlSchemaExporter exporter1 = new XmlSchemaExporter(schemas);

Type type = typeof(MyTypeToConvert);
XmlTypeMapping map = importer1.ImportTypeMapping(type);
exporter1.ExportTypeMapping(map);

It’s that easy! The XmlSchemaExporter will do all the right things, and you can do this with a bunch of types in a loop, then check your XmlSchemas collection. It will contain one XmlSchema per namespace, with all the right types, just as if you’d run xsd.exe over your assembly.
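If you then want the schema text itself, each XmlSchema in the collection can write itself out. Something like this (a sketch; swap Console.Out for whatever stream you need):

```csharp
// Dump every generated schema -- one per target namespace.
foreach (XmlSchema schema in schemas)
{
    schema.Write(Console.Out);
}
```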

Even better, if there’s stuff in your CLR types that isn’t quite right, you can use XmlAttributeOverrides just like you can with the XmlSerializer. So if you want to exclude a property called “IgnoreMe” from your MyTypeToConvert type…

// Create the XmlAttributeOverrides and XmlAttributes objects.
XmlAttributeOverrides xOver = new XmlAttributeOverrides();
XmlAttributes attrs = new XmlAttributes();

// Use XmlIgnore to instruct the XmlSerializer to ignore the IgnoreMe prop.
attrs.XmlIgnore = true;
xOver.Add(typeof(MyTypeToConvert), "IgnoreMe", attrs);

XmlReflectionImporter importer1 = new XmlReflectionImporter(xOver);
XmlSchemas schemas = new XmlSchemas();
XmlSchemaExporter exporter1 = new XmlSchemaExporter(schemas);

Type type = typeof(MyTypeToConvert);
XmlTypeMapping map = importer1.ImportTypeMapping(type);
exporter1.ExportTypeMapping(map);

That’ll get rid of the IgnoreMe element in the final schema. It took a bit of Reflectoring, but this saves me a ton of time.

I’ll be teaching Introduction to Web Services (CST 407) at OIT (Portland) this Fall. Tell your friends! We’ll be covering the basics of Web Services, including theory, history, best practices, and a firm grounding in underlying technologies like XML and SOAP. Should be a good time. If you are interested you should be prepared to write code in C#.

I finally solved the namespace issue I was having, although I’ll probably burn for all eternity for the solution. In short, because of the behavior of XmlTextWriter, the only solution that could be implemented in a reasonable amount of time was to post-process the XML and strip out the extra namespace declarations.

So I started down the path of using XmlTextReader to spin through and collect up all the namespaces that I needed, then add those to the root node. After that I could use a regular expression to strip out all the unneeded ones. Turns out I had overlooked the fact that the input isn’t guaranteed to be well-formed XML. :-(

The “XML” is actually a template that our system uses to do some tag replacement. So the output of that process is well-formed, but the input can contain the “@” character inside element names. A no-no according to the XML spec.

So here it is, the all-regular-expression solution. I wouldn’t suggest you try this at home, but it does actually work, and seems to be quite fast (sub 1/4 second for a 1.5Mb input, and the typical input is more like 10K).

Note: this is made a little simpler because I know (since I just wrote out the “XML”) that all the namespace prefixes we care about start with ns, e.g. ns0, ns1, etc.

#region begin hairy namespace rectifying code here
// This is necessary because the XmlTextWriter puts in more namespace
// declarations than we want, which causes file bloat.
Regex strip = new Regex(@"xmlns:ns\d=""[^""]*""");
ArrayList names = new ArrayList();
MatchCollection matches = strip.Matches(result);
foreach (Match match in matches)
{
    // ns0 is already declared on the root; collect the rest, once each.
    if (!match.Value.StartsWith("xmlns:ns0") && !names.Contains(match.Value))
        names.Add(match.Value);
}

StringBuilder sb = new StringBuilder();
foreach (string name in names)
{
    sb.AppendFormat(" {0}", name);
}

string fixedNamespaces = result;
int pos = fixedNamespaces.IndexOf(">", 0);   // should be the end of the xml declaration
pos = fixedNamespaces.IndexOf(">", pos + 1); // should be the end of the root start tag
fixedNamespaces = fixedNamespaces.Insert(pos, sb.ToString());
#endregion

We’ve got some XML documents that are getting written out with way too many namespace declarations. That probably wouldn’t be too much of a problem, except we then use those XML documents as templates to generate other documents, many with repetitive elements. So we’re ending up with namespace bloat. Scott and I found an example that was coming across the network at about 1.5Mb. That’s a lot. A large part of that turned out to be namespace declarations. Because of the way XmlTextWriter does namespace scoping, it doesn’t write out a namespace declaration until it first sees it, which means for leaf nodes with a different namespace than their parent node, you end up with a namespace declaration on every element, like this…
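Something like this, with placeholder namespaces standing in for our real (much longer) ones:

```xml
<ns0:Order xmlns:ns0="urn:placeholder-zero">
  <ns0:Items>
    <ns1:Item xmlns:ns1="urn:placeholder-one">...</ns1:Item>
    <ns1:Item xmlns:ns1="urn:placeholder-one">...</ns1:Item>
    <ns1:Item xmlns:ns1="urn:placeholder-one">...</ns1:Item>
  </ns0:Items>
</ns0:Order>
```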

With our actual namespace strings, that’s like an additional 60 bytes per element that we don’t really need. What we’d like to see is the namespaces declared once at the top of the file, then referenced elsewhere, like this…
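With the same placeholder namespaces, the output we want would look like:

```xml
<ns0:Order xmlns:ns0="urn:placeholder-zero"
           xmlns:ns1="urn:placeholder-one">
  <ns0:Items>
    <ns1:Item>...</ns1:Item>
    <ns1:Item>...</ns1:Item>
    <ns1:Item>...</ns1:Item>
  </ns0:Items>
</ns0:Order>
```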

When we edited the templates manually to achieve this effect, the 1.5Mb document went to like 660Kb. Much better.

There doesn’t seem to be any way to get XmlTextWriter to do this, however. Even if you explicitly write out the extra namespaces on the root element, you still get them everywhere, since the writer sees those as just attributes you chose to write, and not namespace declarations.

Curses! I’ve spent all day on this and have no ideas. Anyone have any input?

Sigh. It’s a constant battle. I knew full well that XmlSerializer leaks temp assemblies if you don’t use the right constructor. (The one that takes only a type will cache internally, so it’s not a problem.) And then I went and wrote some code that called one of the other constructors without caching the resulting XmlSerializer instances.

The result: one process I looked at had over 1,500 instances of System.Reflection.Assembly on the heap. Not so good.

The fix? Not as simple as I would have hoped. The constructor that I’m using takes the Type to serialize and an XmlRootAttribute instance. It would be nice to be able to cache the serializers based on that XmlRootAttribute, since that’d be simple and all. Unfortunately, two instances of an XmlRootAttribute with the same parameters return different values from GetHashCode(), so it’s not that easy. I ended up using a string key compounded from the type’s full name and the parameters I’m using on the XmlRootAttribute. Not the most beautiful, but it’ll work. Better than having 1,500 temp assemblies hanging around.
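The cache ended up looking roughly like this (a sketch; the member and method names are made up, and locking is omitted for brevity):

```csharp
// Cache keyed on type full name plus the XmlRootAttribute settings,
// since two XmlRootAttributes with identical values don't compare equal.
static readonly Hashtable serializerCache = new Hashtable();

static XmlSerializer GetSerializer(Type type, string rootName, string rootNs)
{
    string key = type.FullName + ":" + rootName + ":" + rootNs;
    XmlSerializer serializer = (XmlSerializer)serializerCache[key];
    if (serializer == null)
    {
        XmlRootAttribute root = new XmlRootAttribute(rootName);
        root.Namespace = rootNs;
        // This constructor generates a new temp assembly every call,
        // which is exactly why we cache the result.
        serializer = new XmlSerializer(type, root);
        serializerCache[key] = serializer;
    }
    return serializer;
}
```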

Someone asked today how to get a list of all the namespace prefixes used in an XML document, along with their associated URIs, so that that information could be used to initialize an XmlNamespaceManager. This works…
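Something along these lines (a sketch of the approach; the file name is a placeholder):

```csharp
// Walk the document and record every prefix/URI pair we see.
Hashtable namespaces = new Hashtable();
XmlTextReader reader = new XmlTextReader("input.xml");
while (reader.Read())
{
    if (reader.NodeType == XmlNodeType.Element)
    {
        while (reader.MoveToNextAttribute())
        {
            if (reader.Prefix == "xmlns")        // xmlns:foo="uri"
                namespaces[reader.LocalName] = reader.Value;
            else if (reader.Name == "xmlns")     // default namespace
                namespaces[string.Empty] = reader.Value;
        }
        reader.MoveToElement();
    }
}
```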

You’ll end up with a hashtable with the prefixes as keys and the associated URIs as their values. You could probably do something even cooler with a unique set datastructure, but the hashtable works in a pinch.

Here we are in the year 2005. XML has been pretty ubiquitous for at least 5–6 years now. Namespaces have been in use for pretty much all of that time. And yet they remain possibly the least understood part of average, everyday XML processing.

The bottom line is that pretty much any XML parser worth its salt these days supports the namespaces spec. Which means that

<MyElement/>

is absolutely not the same thing as

<MyElement xmlns="urn:runforthehills"/>

Furthermore, in line with the XML Namespaces spec, an application which is expecting the latter, namespace qualified element should not and must not process the former, unqualified element.

The XmlSerializer that we all know and love in .NET is particularly sensitive to this issue (as well it should be). As far as the serializer is concerned, everything should be namespace qualified. The way this commonly bites people is thus: a customer/partner sends you a schema representing the XML documents they are going to be sending you. In the schema, the targetNamespace attribute is set with a value of “http://partner.com/schema”. When you actually go to debug the application, however, it turns out they are sending you totally unqualified XML. Nothing will work. There are a few pretty horrible things you can do with the XmlSerializer to try and convince it not to be such a stickler about things, most involving the XmlRootAttribute and XmlAttributeOverrides. I can share those ways if anyone really wants to see them. Probably best to keep them under cover. However, that’s only likely to work if your XML document is flat, meaning that the root element only has one level of child nodes under it. Otherwise, if you use Xsd.exe to generate your serialization class, each set of sub-elements gets put in its own object, which will also be namespace qualified. And you’re back to square one.

The right solution of course is to get your partner to send you XML that’s actually correct, but often that’s just not possible for a variety of reasons with which I’m sure we’re all familiar. As a last ditch effort, you can pre-process the XML text before passing it to the XmlSerializer, and inject the right namespace strings. Yucky, it’s true, but it does actually get the job done. You will of course, be paying some overhead costs of string processing and possibly parsing the XML twice. But what can you do?
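For the record, the injection can be as dumb as a single regex that stamps the expected default namespace onto the root start tag. A sketch, using the partner namespace from the example above; it assumes the root element carries no xmlns of its own and that nothing but the XML declaration precedes it:

```csharp
// A default xmlns on the root is inherited by every descendant, so one
// insertion qualifies the whole document. Crude, but it gets the job done.
Regex rootTag = new Regex(@"<(\w+)");
string patched = rootTag.Replace(
    rawXml,
    "<$1 xmlns=\"http://partner.com/schema\"",
    1); // replace the first start tag only
```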

The other thing to keep in mind is how namespaces play out in XSD schema files. You can only have one target namespace per schema, so anything you define in that schema file will be in that target namespace. You can import things from other namespaces, but not from the target namespace. You can, however, define two different schema files that use the same namespace, then import them both into another schema, as long as there are no name collisions. If you omit the targetNamespace attribute from your schema, the targetNamespace becomes “”, meaning you are defining the schema for an unqualified XML document.
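In schema terms, pulling definitions in from a different namespace looks like this (made-up namespaces, type, and file name):

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="urn:example-a"
           xmlns:b="urn:example-b">
  <!-- import is for foreign namespaces; include is for the target namespace -->
  <xs:import namespace="urn:example-b" schemaLocation="b.xsd"/>
  <xs:element name="Wrapper" type="b:SomeType"/>
</xs:schema>
```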

Confusing enough? Read the namespace spec (it’s really short), familiarize yourself with how namespaces work in schema, if you see errors coming back from the XmlSerializer that look like

The element <spam xmlns=""> was not expected.

check your namespaces! That means you are trying to deserialize an unqualified document, when a qualified one was expected.

I’ll be teaching again next term at OIT (at CAPITAL Center in Beaverton), this time “Enterprise Web Services”. We’ll be looking at what it takes to build a real-world enterprise application using web services, including such topics as asynchronous messaging, security, reliable messaging and a host of others. We’ll walk through all the stages of building an enterprise-level WS application, using .NET and WSE 2.0 to do the heavy lifting. Required is a firm grasp of programming in C#, and a basic understanding of Web Services fundamentals such as XML, SOAP, and WSDL.

As most of you probably have already heard, according to Dare, we won't be getting XQuery with Whidbey.

LAME!

One of the reasons given for this decision is that customers want something that is compliant with W3C standards. OK, that's true. I would disagree that people will only use something that is compliant with a full recommendation. Back in the day when MS first started putting out XML tools (early MSXML, SOAP Toolkit, etc.) many of those tools were built around working drafts, and we still managed to use them to get real work done. I would argue that even if the XQuery spec were to change substantively between now and its full recommendation-hood (which I doubt), there's plenty of opportunity to get real work done with XQuery starting right now.

The counter argument is that people don't want to make changes to their code when the real spec ships. Guess what! There have been breaking changes introduced in each new revision of the .NET framework. People have to change their code all the time. I had to unwind a huge number of problems due to the changes in remoting security between .NET 1.0 and 1.1. Somehow we manage. The excuse of "well, you still have XSLT" just doesn't cut it IMHO. XSLT is a much more difficult programming model than XQuery, and most people to this day don't get how the declarative model in XSLT is supposed to work. XPath 1.0 is very limiting, which is why there's an XPath 2/XSLT 2 (which also are not going to be supported in Whidbey!).

I have to wonder if performance issues aren't closer to the truth of why it's not shipping. Writing an engine that can do arbitrary XQuery against arbitrary documents certainly isn't an easy thing to do. Think SQL. SQL Server is a non-trivial implementation, and there's a reason for it. I'm guessing that the reality of trying to make XQuery perform the way people would expect is a pretty daunting task.

Either way, I think it's a drag that we won't get XQuery support, recommendation or no.

Steve Maine is in the midst of the perennial debate between SOAP and REST, and I feel compelled to add my two cents...

At the XML DevCon last week I noticed that it continues to be fashionable to bash the existing Web Services standards as being too complex and unwieldy (which in several notable cases is true, but it's what we have to work with at this point) but that doesn't change the fact that they solve real problems. I've always had a sneaking suspicion that people heavily into REST as a concept favored it mostly out of laziness, since it is undeniably a much simpler model than the SOAP/WS-* stack. On the other hand, it fails to solve a bunch of real problems that SOAP/WS-* does. WS-Addressing is a good example.

I spent two years developing an application that involved hardware devices attached to large power transformers and industrial battery systems that needed to communicate back to a central data collection system. We used SOAP to solve that particular problem, since it was easy to get the data where it needed to go, and we could use WS-Security to provide a high level of data encryption to our customers. (Utility companies like that.) However, we had one customer who would only allow us to get data from the monitors on their transformers through a store-and-forward mechanism, whereby the monitors would dump their data to a server inside their firewall, and we could pick up the data via FTP. This is a great place for WS-Addressing, since all the addressing information stayed inside the SOAP document, and it didn't matter if we stored it out to disk for a bit. There is no way that REST could have solved this particular problem. Or, at least, no way without coming up with some truly bizarre architecture that would never be anything but gross.

REST is great for solving very simple application scenarios, but that doesn't make it a replacement for SOAP. I agree that many of the WS-* standards are getting a bit out of hand, but I also agree with Don Box's assessment (in his "WS-Why?" talk last week) that given the constraints, WS-Addressing and WS-Security are the simplest solutions that solve the problem. There's a reason why these are non-trivial specs. They solve non-trivial problems.

So rather than focusing on REST vs. SOAP, it's more interesting and appropriate to look at the application scenarios and talk about which is the simplest solution that addresses all the requirements. I don't think they need to be mutually exclusive.

Sam Ruby is talking about problems with textual data out on the web, or more specifically in the context of RSS, having to do with bad assumptions about character encoding. As someone who once did a lot of work in localization, it's a subject near and dear to my heart.

I'm always amazed that still to this day people don't get that there is such a thing as character encoding.

Sam points out that an upper case A and the Greek Alpha show up in most fonts as the same glyph. However, they are different code points in Unicode.

He's moving this idea up the stack to show why there are so many conflicts between HTML/XML/RSS/etc. The rules for character encoding are different in all those systems and are enforced differently by different tools, which is what causes so many RSS feeds to be badly formed.

He started the presentation talking about how XML is an "attractive nuisance" with regard to the encoding issue, in that it leads people down the primrose path to thinking all their encoding issues are solved just because XML is supposed to take care of encoding.

All in all, the issues Sam is talking about are pretty obscure, and appeal mostly to XML wonks, but that doesn't make them any less valid. The reality is we've all learned to deal with it most of the time, just like we're used to IE fixing up our bad HTML tables.

Just two more days until the Dev Con. I had a great time both speaking at and attending last year, so I'm looking forward to another exciting time. Scott and I will be talking about some of the things we're doing at work around XML Schema and using "contract first" coding in a non-Web Services context.

I may just have a new favorite XML editor. I caught wind of Stylus Studio 6 [via Mike Gunderloy] so I downloaded a trial copy and checked it out. Wow. I'm pretty impressed. It's the same price as XMLSpy Pro, but includes support for XPath (v1 and v2), XQuery, Web Services testing, and a pretty good schema-to-schema mapping tool that creates XSLT files. Plus it has a schema editor which looks pretty good, lots of data conversion tools, support for custom extensions (if you have your own file types), etc. Lots of good stuff here.

What is even cooler is that they have a "Home" version for non-commercial use that has almost all of the features of the pro version (unlike the pretty well crippled XMLSpy Home) for only $50. I'll definitely turn my students on to this next week. That's a lot of functionality for very little money. The schema editor in the Home version isn't quite as cool, and there are a few other features it doesn't support, like web services testing, but it looks otherwise pretty highly functional.

If you don't care about the WSDL editor, there might be a lot to recommend in the Pro version over XMLSpy Enterprise, at about 1/3 of the price.

Looks like Scott and I will be speaking at Chris Sells' XML DevCon this year. Last year I spoke on XML on transformer monitors. This year Scott and I will be talking about the work we've been doing with online banking and XML Schema.

If it's anything like last year's, the conference should be pretty amazing. The speakers list includes some pretty serious luminaries. In fact, it's pretty much a bunch of famous guys... and me.

I'll be teaching at OIT (in Portland/Beaverton, not K-Falls) again Fall term. This time it's "Practical Web Services". If you're interested, sign up through OIT. The course number is 15048. Description follows:

Practical Web Services

Web Services sound like a great idea, but how do you actually go about using them? How do you go about actually writing your own Web Service to expose your data or functionality?

This class will cover all the details involved in using and building your own Web Services using the Microsoft .NET platform. The first half of the class will cover the building of a client application to consume a Web Service from the Internet. The second half will focus on building an equivalent Web Service using ASP.NET.

Students will leave this class with a firm understanding of how to use Web Services built by other people, and how to implement their own Web Services using the .NET platform.

Students should either have taken the previous "Web Services Theory" class, or have instructor approval. All work will be done in C#, so a firm understanding of C# is required.

This should be totally obvious to those with XML experience, but to those who don't fall into that category, keep in mind that it's of utmost importance to not mix data and meta-data when designing your XML. For example, when creating an XML document for a purchase order, I've often seen stuff like
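The sort of thing I mean (reconstructed with made-up items):

```xml
<PurchaseOrder>
  <item1>Widget</item1>
  <item2>Sprocket</item2>
</PurchaseOrder>
```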

This is what I mean by mixing data and meta-data. By naming elements "item1" and "item2" you've mixed data (ordinal values "1" and "2") with meta-data (the description "item"). Now when you go to write a schema to match this document, what do you do? Explicitly name elements item1 and item2? What happens when you get a PO with 3 items? You're screwed.

Again, to those who are used to working with XML, this is readily apparent, but I found out from the class I taught this summer that it isn't obvious to everyone. A much better solution would be something like
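Here the element name carries only the meta-data, and the ordinal rides along as plain data (items made up for illustration):

```xml
<PurchaseOrder>
  <item ordinal="1">Widget</item>
  <item ordinal="2">Sprocket</item>
</PurchaseOrder>
```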

I finished up my Web Services Theory class at OIT last night. Just the final to go on Monday (mwah ha ha).

We ended with some stuff on WS-* and all the various specs. I tried to spend minimal time on the actual syntax of WS-*, since some of them are pretty hairy, and spent more time on the business cases for WS-*. That seemed to go over pretty well. I think it's easier to understand the business case for why we need WS-Security than it is to understand the spec itself. Unfortunately, one of the underlying assumptions about all the GXA/WS-* specs is that eventually they will just fade into the background, and you'll never see the actual XML, since some framework piece (like WSE 2.0) will just "take care of it" for you. What that means is that the actual XML can be pretty complex. The unfortunate part is that we don't have all those framework bits yet, so we have to deal with all the complexity ourselves. Thankfully more tools like WSE 2 are available to hide some of that from the average developer. On the other hand, I'm a great believer in taking the red pill and understanding what really goes on underneath our framework implementations.

Dare Obasanjo posits that the usefulness of the W3C might be at an end, and I couldn't agree more. Yes, the W3C was largely behind the standards that "made" the Web, but they've become so bloated and slow that they can't get anything done.

There's no reason why XQuery, XInclude, and any number of other standards that people could be using today aren't finished, other than the fact that all the bureaucrats on the committee want their pet feature in the spec, and the W3C process is all about consensus. What that ends up meaning is that no one is willing to implement any of these specs seriously until they are full recommendations. Six years now, and still no XQuery. It's sufficiently complex that nobody is going to try to implement anything other than toy/test implementations until the spec is a full recommendation.

By contrast, the formerly-GXA, now WS-* specs have been coming along very quickly, and we're seeing real implementations because of it. The best thing that ever happened to Web Services was the day that IBM and Microsoft agreed to "agree on standards, compete on implementations". That's all it took. As soon as you get not one but two 800 lb. gorillas writing specs together, the reality is that the industry will fall in behind them. As a result, we have real implementations of WS-Security, WS-Addressing, etc. We in the business world are still working on "Internet time"; we can't wait around 6-7 years for a real spec just so every academic in the world gets his favorite thing into the spec. That's how you get XML Schema, and all the irrelevant junk that's in that spec.

The specs that have really taken off and gotten wide acceptance have largely been defacto, non-W3C blessed specs, like SAX, RSS, SOAP, etc. It's time for us to move on and start getting more work done with real standards based on the real world.

I started teaching a class at OIT this week on "Web Services Theory", in which I'm trying to capture not only reality, but the grand utopian vision that Web Services were meant to solve (more on that later). That got me thinking about the way the industry as a whole has approached file formats over the last 15 years or so.

There was a great contraction of file formats in the early 90s, which resulted in way more problems than anyone had anticipated I think, followed by a re-expansion in the late 90s when everyone figured out that the whole Internet thing was here to stay and not just a fad among USENET geeks.

Once upon a time, back when I was in college I worked as a lab monkey in a big room full of Macs as a "support technician". What that mostly meant was answering questions about how to format Word documents, and trying to recover the odd thesis paper from the 800k floppy that was the only copy of the 200 page paper and had somehow gotten beer spilled all over it. (This is back when I was pursuing my degree in East Asian Studies and couldn't imagine why people wanted to work with computers all day.)

Back then, Word documents were RTF. Which meant that Word docs written on Windows 2.0 running on PS/2 model 40s were easily translatable into Word docs running under System 7 on Mac SEs. Life was good. And when somebody backed over a floppy in their VW bug and just had to get their thesis back, we could scrape most of the text off the disc even if it had lost the odd sector here and there. Sure, the RTF was trashed and you had to sift out the now-useless formatting goo, but the text was recoverable in large part. In other sectors of the industry, files were happily being saved in CSV or fixed-length text files (EDI?) and it might have been a pain to write yet another CSV parser, but with a little effort people could get data from one place to another.

Then the industry suddenly decided that it could add lots more value to documents by making them completely inscrutable. In our microcosm example, Word moved from RTF to OLE Structured Storage. We support monkeys rued the day! Sure, it made it really easy to serialize OLE embedded objects, and all kinds of neat value added junk that most people didn't take advantage of anyway. On the other hand, we now had to treat our floppies as holy relics, because if so much as one byte went awry, forget ever recovering anything out of your document. Best to just consider it gone. We all learned to be completely paranoid about backing up important documents on 3-4 disks just to make sure. (Since the entire collection of all the papers I ever wrote in college fit on a couple of 1.4Mb floppies, not a big deal, but still a hassle.)

Apple and IBM were just as guilty. They were off inventing "OpenDoc" which was OLE Structured Storage only invented somewhere else. And OpenDoc failed horribly, but for lots of non-technical reasons. The point is, the industry in general was moving file formats towards mutually incomprehensible binary formats. In part to "add value" and in part to assure "lock in". If you could only move to another word processing platform by losing all your formatting, it might not be worth it.

When documents were only likely to be consumed within one office or school environment, this was less of an issue, since it was relatively easy to standardize on a single platform, etc. When the Internet entered the picture, it posed a real problem, since people now wanted to share information over a much broader range, and the fact that you couldn't possibly read a Word for Windows doc on the Mac just wasn't acceptable.

When XML first started to be everyone's buzzword of choice in the late 90s, there were lots of detractors who said things like "aren't we just going back to delimited text files? what a lame idea!". In some ways it was like going back to CSV text files. Documents became human readable (and machine readable) again. Sure, they got bigger, but compression got better too, and disks and networks became much more capable. It was hard to shake people loose from proprietary document formats, but it's mostly happened. Witness WordML. OLE structured storage out, XML in. Of course, WordML is functionally RTF, only way more verbose and bloated, but it's easy to parse and humans can understand it (given time).

So from a world of all text, we contracted down to binary silo-ed formats, then expanded out to text files again (only with meta-data this time). It's like a Big Bang of data compatibility. Let's hope it's a long while before we hit another contracting cycle. Now if we could just agree on schemas...

Scott has some comments about WSE 2.0 (which just in case you haven't heard yet has RTMed) and I wanted to comment on a few things...

Question: The Basic Profile is great, but are the other specs getting too complicated?

My Personal Answer (today): Kinda feels like it! WS-Security will be more useful when there is more support on the Java side. As far as WS-Policy, it seems that Dynamic Policy is where the money's at and it's a bummer WSE doesn't support it. [Scott]

It's the tools that are at issue here, rather than the specs I think. I spent some time writing WS-Security by hand about a year ago, and yes, it's complicated, but I don't think unnecessarily so. The problem is that we aren't supposed to be writing it by hand. We take SSL totally for granted, but writing an SSL implementation from scratch is non-trivial. We don't have to write them ourselves anymore, so we can take it for granted. The problem (in the specific case of WS-Security) is that we have taken it for granted as far as Web Services go. Unfortunately, that makes the assumption that Web Services are bound to HTTP. In order to break the dependence on HTTP (which opens up many new application scenarios) we have to replace all the stuff that HTTP gives us "for free" like encryption, addressing, authentication, etc. Because to fit with SOAP those things all have to be declarative rather than procedural, I think they feel harder than depending on the same thing from procedural code.

If we are to realize the full potential of Web Services and SO, then we have to have all this infrastructure in place, to the point where it becomes ubiquitous. Then we can take the WS-*s for granted just like we do SSL today. Unfortunately the tools haven't caught up yet. Three or four years ago we were writing an awful lot of SOAP and WSDL related code ourselves, and now the toolsets have caught up (mostly). Given enough time the tools should be able to encompass the rest of the standards we need to open up all the new application scenarios.

Steve Maine makes a good analogy to the corporate mailroom. There's a lot of complexity, and a lot of complex systems, involved in getting mail around the postal system, which we don't see on a daily basis. But they're out there nonetheless, and we couldn't get mail around without them. When we can take SO for granted like we do the postal system, then we'll see the full potential of what SO can do for business, etc. in the real world.

Now that I think about it some more, this is a problem that WinFS could really help to solve. The biggest reason that people don't use things like RDF is sheer laziness (you'll notice the rich RDF on my site), but if we can use the Longhorn interface to easily enter and organize metadata about content, it might be a cool way to generate RDF or other semantic information. Hmmmm... It would be fun to write a WinFS -> RDF widget. If it wasn't for that dang day job...

Scott mentions some difficulty he had lately in finding some information with Google, which brings to my mind the (long debated) issue of the semantic web. Scott's problem is exactly the kind of thing that RDF was meant to solve when it first came into being, lo these 6-7 years ago.

Has anyone taken advantage of it? Not really. The odd library and art gallery. Why? Two main reasons: 1) pure laziness. It's extra work to tag everything with metadata. 2) RDF is nearly impossible to understand. That's the biggest rub. RDF, like so many other standards to come out of the IETF/W3C, is almost incomprehensible to anyone who didn't write the standard. The whole notion of writing RDF triples in XML is something that most people just don't get. I don't really understand how it's supposed to work myself. And, as with WSDL and other examples, the people who came up with RDF assumed that people would use tools to write the triples, so they wouldn't have to understand the format. The problem with that (and with WSDL) is that since no one understands the standard, no one has written any usable tools either.

The closest that anyone has come to using RDF in any real way is RSS, which has turned out to be so successful because it is accessible. It's not hard to understand how RSS is supposed to work, which is why it's not really RDF. So attaching some metadata to blog content has turned out to be not so hard, mostly because most people don't go beyond a simple category, although RSS supports quite a bit more.

The drawback to RDF is that it was created by and for librarians, not web page authors (most of whom aren't librarians). Since most of us don't have librarians to mark up our content with RDF for us, it just doesn't get done. Part of the implicit assumption behind RDF and the semantic web is that authoritative information only comes from institutional sources, which have the resources to deal with semantic metadata. If blogging has taught us anything, it's that that particular assumption just isn't true. Most of the useful information on the internet comes from "non-authoritative" sources. When was the last time you got a useful answer to a tech support problem from a corporate web site? The tidbit you need to solve your tech support problem is nowadays more likely to come from a blog or a USENET post than from the company that made the product. And those people don't give a fig for the "semantic web".

As I've mentioned before, I'm doing some ongoing work with code generation from XmlSchema files. Developers mark up XmlSchema documents with some extra attributes in our namespace, and that influences how the code gets generated. Think of it as xsd.exe, only this works.

So today a new problem was brought to my attention. I read in schema files using the .NET XmlSchema classes. OK, that works well. For any element in the schema, you can ask its XmlSchemaDatatype what the corresponding .NET value type would be. E.g. if you ask an element of type "xs:int", you get back System.Int32. "xs:dateTime" maps to System.DateTime, etc.

When you want to serialize .NET objects using the XmlSerializer, you can add extra custom attributes to influence the behavior of the serializer. So, if you have a property that returns a DateTime, but the schema type is "date" you need some help, since there's no underlying .NET Date type.
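The lever for that case is the DataType property on XmlElementAttribute, which tells the serializer to write the value as a different schema type than the CLR type would suggest. A minimal sketch (the Order/ShipDate names are invented for illustration):

```csharp
using System;
using System.Xml.Serialization;

public class Order
{
    // DataType="date" makes the serializer emit just the date portion
    // (e.g. 2004-02-27) and map this member to xs:date in the schema,
    // even though the CLR type is still DateTime.
    [XmlElement(DataType = "date")]
    public DateTime ShipDate;
}
```

Serializing an Order with `new XmlSerializer(typeof(Order))` then produces `<ShipDate>2004-02-27</ShipDate>` rather than a full dateTime value.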

So now for the catch. The CLR types that the schema reading classes (System.Xml.Schema) map to schema types don't in all cases match the CLR types that the XmlSerializer maps to schema types. The schema reader says that "xs:integer" maps to System.Decimal. OK, that's consistent with the XmlSchema part 2 spec. Unfortunately, the XmlSerializer says that "xs:integer" must map to a System.String. So does "xs:gYear", and several others.

The end result is that I can't rely on the XmlSchemaDatatype class to tell me what type to make the resulting property in my generated code. Arrrrrggggghhhhh!!!!!!!

The two solutions are basically:

1) tell people not to use integer, gYear, etc. (possibly problematic)

2) have my code embody the differences in type mapping between XmlSchema and the XmlSerializer (lame)

I haven't delved, but I can only assume that xsd.exe uses the latter of the two.
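The second option can be sketched as a small patch table: keep asking XmlSchemaDatatype for its answer, and override only the types where the XmlSerializer disagrees. The override list below is illustrative, not exhaustive:

```csharp
using System;
using System.Collections.Generic;
using System.Xml.Schema;

public static class SerializerTypeMap
{
    // Schema types where the XmlSerializer's CLR mapping differs from
    // XmlSchemaDatatype.ValueType. A real table would need to cover
    // every divergent built-in type, not just these four.
    static readonly Dictionary<XmlTypeCode, Type> overrides =
        new Dictionary<XmlTypeCode, Type>
        {
            { XmlTypeCode.Integer,  typeof(string) }, // schema reader says Decimal
            { XmlTypeCode.GYear,    typeof(string) },
            { XmlTypeCode.GMonth,   typeof(string) },
            { XmlTypeCode.Duration, typeof(string) },
        };

    public static Type GetClrType(XmlSchemaDatatype dt)
    {
        Type t;
        if (overrides.TryGetValue(dt.TypeCode, out t))
            return t;
        return dt.ValueType; // trust the schema reader everywhere else
    }
}
```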

I just got asked a question at work that reminded me how many people still don't quite "get" XML. And I still find it surprising. We've now had a good 6-7 years of XML being fairly present, and 3-4 years of it being pretty ubiquitous. And yet...

Once upon a time I wrote (and taught) a class on XML for a variety of customers, and when I think about the experience now, I think the hardest thing to get across to people is how to visualize the InfoSet. It's not flat. XML looks flat when you see it on a page (not flat in an ISAM way, since there is hierarchy, but flat in a 2D kind of way), but the InfoSet isn't. As soon as you introduce things like namespaces and ID/IDREFS, the InfoSet itself is really this big n-dimensional thing that's hard to get your head around. If you look at the XPath spec, it should provide a big clue. It talks about navigating InfoSets in terms of axes. That's exactly what they are. Namespaces are a separate axis. They come right out of the screen at your face. It's not flat.
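You can see that axis in a few lines of code (a sketch using the .NET XPathNavigator, with a made-up two-element document): every element carries namespace nodes for everything in scope, even when its own text shows no declaration at all.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml.XPath;

public static class NamespaceAxisDemo
{
    // Returns the namespace declarations the InfoSet puts in scope on
    // <b/>, even though the text "<b/>" contains no declaration at all.
    public static List<string> InScopeOnB()
    {
        var nav = new XPathDocument(new StringReader(
            "<a xmlns:p='urn:x'><b/></a>")).CreateNavigator();
        nav.MoveToChild(XPathNodeType.Element); // <a>
        nav.MoveToChild(XPathNodeType.Element); // <b>

        var found = new List<string>();
        foreach (XPathNavigator ns in nav.Select("namespace::*"))
            found.Add(ns.Name + "=" + ns.Value);
        return found;
    }
}
```

The query on `<b/>` reports the p=urn:x declaration from its parent: the namespace axis cuts across the flat text of the document.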

And that's not even counting what happens when schema gets into the picture. The shape of a "Post Schema Validation InfoSet" may have very little to do with what the XML looks like on the page. That's why Binary XML shouldn't be scary to anyone. It's just another way of serializing the InfoSet, not the XML. Think about the thing that the XML becomes being serialized in a different manner, not the XML itself. "Binary XML" in and of itself sounds pretty silly, since the whole point is that XML is text. But "Binary Serialized PSVI" doesn't sound so silly, and may have some distinct advantages.

OK, in rereading that I realize it may make sense to nobody but me, but given the self-indulgent nature of blogging I don't really care. If I can help just one person see the InfoSet and not the flat XML, I'll sleep that much better at night.

I found myself in the position this week of having to rewrite a bunch of XML parsing code that was all written using the DOM (that I didn't write). It's not that I really have anything against the DOM model, but it seemed like overkill, since this particular code actually was organized into subroutines, each of which would take a string and load it into another XmlDocument instance. And in each case, all that happened with the DOM was a single XPath query using selectSingleNode. Pretty much a performance disaster.

What I found interesting is that when I changed it all to use XPathDocument/XPathNavigators instead, the performance didn't seem much better. Granted, I didn't do a very scientific investigation. I'm running NUnit tests inside VS.NET using the NUnit-Addin, and the before and after NUnit tests completed in around the same time.

I'm not suggesting I'm sorry I changed the code, since it's aesthetically more pleasing (at least to me) and has the potential for better performance over larger documents (and, I'm assuming, a lot less memory overhead). I was just surprised that it wasn't faster. I guess I really should profile both cases and see what's going on performance-wise. Maybe I'll get around to it eventually.

In a few places where the XPath wasn't really important I changed it to an XmlTextReader instead, and was gratified that the NUnit tests completed in about a quarter of the time that the DOM was taking. Every little bit counts.
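For the record, the two replacements look roughly like this (the `<order>` shape and its id attribute are invented for illustration):

```csharp
using System.IO;
using System.Xml;
using System.Xml.XPath;

public static class OrderIdReader
{
    // XPathDocument is read-only and lighter than a full XmlDocument
    // when all you need is one query.
    public static string ViaXPath(string xml)
    {
        XPathNavigator nav =
            new XPathDocument(new StringReader(xml)).CreateNavigator();
        XPathNavigator hit = nav.SelectSingleNode("/order/@id");
        return hit == null ? null : hit.Value;
    }

    // Where the XPath isn't doing anything clever, a forward-only
    // reader skips building any in-memory tree at all.
    public static string ViaReader(string xml)
    {
        using (XmlReader r = XmlReader.Create(new StringReader(xml)))
        {
            while (r.Read())
                if (r.NodeType == XmlNodeType.Element && r.Name == "order")
                    return r.GetAttribute("id");
        }
        return null;
    }
}
```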

I'm in the midst of doing some integration with a third party company, and their XML is driving me loony. The worst part is, it's not something you can quite put your finger on. Their XML is, in fact, well formed. It might be valid, except that the DTDs they provided us with aren't internally valid, at least according to XMLSpy.

I guess the biggest thing that bugs me is that their XML is, like, so 1997. They're still using DTDs. Every single element in the entire InfoSet is defined as a global element and then referenced. Not a single attribute appears anywhere, so everything is element-normalized to an unhealthy extreme. It just seems like XML that dates from a time when our ancestors didn't quite grasp the XML concept. Anybody remember DSSSL? It's like that. Only well formed.

And did I mention namespaces? No? Not a namespace declaration to be found anywhere.

Bah! I'm trying to make some sense of some "schemas" that I got from a third party. Whoever wrote them is stuck in the 1998 "isn't XML a neat idea" stage of their career. They are, of course, DTDs instead of W3 schemas. Better still, pretty much every single element in the entire corpus (the whole thing is defined in one file, although it contains multiple messages) is defined as a global element, which makes a complete mess.

As if that weren't enough, the DTDs don't actually validate.

Sigh.

So I'm struggling to rationalize them into some more useful (W3) form.

All I can say is that it's 2004 for cryin' out loud. XML isn't just a neat idea, and people should know better than this by now. If you're defining a group of atomic messages, do yourself a favor and define one per schema file. If you have repeating elements, import is your friend. It makes it so much easier for the schema consumer to deal with. Don't define every single element as global. If you have structures that are used in more than one place, great, but for simpleTypes, it doesn't make much sense to make them global, and it really clutters up the schema.
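As a sketch of that advice (all names invented): one message per schema file, shared structures imported from a common file, and simple elements kept local.

```xml
<!-- OrderMessage.xsd: one message per schema file -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:c="urn:example:common"
           targetNamespace="urn:example:order"
           elementFormDefault="qualified">

  <!-- shared structures live in their own file and get imported -->
  <xs:import namespace="urn:example:common" schemaLocation="Common.xsd"/>

  <!-- the one global element: the message root -->
  <xs:element name="OrderMessage">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="c:Address"/>               <!-- reused, so imported -->
        <xs:element name="Quantity" type="xs:int"/> <!-- simple, so local -->
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```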

As with any other design task, think about how your schema is going to be used, and by whom, instead of starting from the idea that it's just really neat. It's been long enough now that we should really be seeing better XML practices globally, but I fear that's not really the case.

I'm currently working on doing some instrumenting for the sake of unit testing (using NUnit) and was doing some thinking about Schematron. I haven't heard much about it lately, and I don't know if people are going forward with it, but it's a pretty compelling idea.

For those of you who haven't looked into it, Schematron allows you to define assertions based on XPath expressions that can be used to validate XML documents, over and above what XML Schema provides. For example, if your business logic specifies that on any given order a retail customer's purchases can total no more than $100, that's something you can't really specify in XSD, since it involves cross-element rules, but you can with Schematron.
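That kind of rule is just an XPath assertion at heart. Evaluated by hand it might look like this (the `<order>` document shape and the exact rule are invented):

```csharp
using System.IO;
using System.Xml.XPath;

public static class OrderRules
{
    // A Schematron-style assertion: across all the line items on a
    // retail order, the total must not exceed 100. XPath's sum() does
    // the cross-element arithmetic that XSD has no way to express.
    public static bool RetailTotalOk(string xml)
    {
        XPathNavigator nav =
            new XPathDocument(new StringReader(xml)).CreateNavigator();
        double total = (double)nav.Evaluate(
            "sum(/order[@type='retail']/item/@price)");
        return total <= 100.0;
    }
}
```

Schematron itself would express the same thing declaratively as an `assert` with that XPath in its `test` attribute; a tool like Schematron.NET just runs such assertions for you.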

Anyway, I happen to have XML serialized versions of the objects I'm interested in lying around, so I could create a shim that would work with NUnit to do Schematron validation (using Schematron.NET). However, I might not always have XML around. It would be pretty cool if you could do the same kind of declarative validation of objects. I wonder if ObjectSpaces will facilitate something like that??

Ah ha! Turns out the user context we were running under didn't have permissions to the c:\winnt\temp directory. Ouch. An easy fix, but not one that was intuitively obvious, at least to me. I knew that the XmlSerializer created dynamic assemblies, but I didn't know they had to be persisted to disk.

I’d have to agree with Clemens that one of the coolest parts of .Net is custom attributes. I’m constantly amazed at how much you can achieve through the judicious use of attributes. Best of all, since they are much like attributes in XML (or should be), you can carry them over from one to the other. For example, you can add some extra attributes to an XML Schema document in the namespace of your choice, then (if you want to write your own xsd.exe) carry those attributes forward into your .Net classes. Based on those custom attributes, you can apply aspects to your objects at runtime, and basically make the world a better place.

When all that work is finished, you can influence the behavior of your data objects at runtime just by tweaking the schema that defines them. At the same time, since you’re starting with schema, you get lots of fun things for free, like the XmlSerializer and other bits of .Netty goodness.

I’m a bit too excited to go into all the details right now, but suffice it to say the prospects for code generation, attributes and aspects are pretty amazing. Once we get the work out of the way, the rest is just business logic. More business, less work.

Say it as a mantra: “more business, less work, more business, less work…”.

Only days ago I mused that it would be nice to have more control over the way the XmlSerializer works. Sure enough, according to Doug Purdy via Christoph Schittko, we’ll get access to IXmlSerializable, and can write our own XML to our hearts’ content.

I’ve been doing a lot of work with the .NET XmlSerializer over the last couple of weeks, and once again I’m wishing I had just one more custom attribute to work with. What I’d love to be able to do is tell the XmlSerializer to let me worry about a specific part of the XML, since I know better than it does.

Something like this (to serialize a property called MyString):

public interface ICustomSerializer
{
    void DoSerialization(object theValue, XmlTextWriter w);

    void DoDeserialization(XmlNode x, object targetValue);
}

public class DataClass
{
    public DataClass()
    {
    }

    [XmlMyWay(typeof(MySerializer))]
    public string MyString
    {
        get { return "OK"; }
        set { }
    }
}

[AttributeUsage(AttributeTargets.Property)]
public class XmlMyWayAttribute : Attribute
{
    public XmlMyWayAttribute(Type serializer)
    {
    }
}

public class MySerializer : ICustomSerializer
{
    #region ICustomSerializer Members

    public void DoSerialization(object theValue, XmlTextWriter w)
    {
        // TODO: Add MySerializer.DoSerialization implementation
    }

    public void DoDeserialization(XmlNode x, object targetValue)
    {
        // TODO: Add MySerializer.DoDeserialization implementation
    }

    #endregion
}

Then you could do whatever funky serialization you needed to do. Or is this flawed in some fundamental way? It’d be pretty cool though…

I’m sure everyone knew about this but me, but I was impressed. I needed to attach some extra data to an XML Schema document so that it would be available to a code generator I’m writing. You can put whatever extra attributes you want in an XSD document (which is how MS does it with SQLXML, for example) and no one will be bothered.

However, I needed some full-on elements to express the extra data I needed to carry. Luckily for me, the clever people at the W3C thought of this, and gave us the annotation element. I knew you could put documentation inside an annotation element, but I’d never noticed the appinfo element before.

Inside an appinfo, you can put whatever you want, and attach it at just about any place in your schema file. Very cool.
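For example (the gen namespace and its contents are invented), code-generator data can ride along like this:

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:gen="urn:example:codegen">
  <xs:element name="Customer" type="xs:string">
    <xs:annotation>
      <xs:documentation>The customer's display name.</xs:documentation>
      <xs:appinfo>
        <!-- full-on elements, in any namespace; validators leave them alone -->
        <gen:className>CustomerName</gen:className>
      </xs:appinfo>
    </xs:annotation>
  </xs:element>
</xs:schema>
```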

On a completely different note, I’m amazed to see that WordML actually serializes the little red “you misspelled something again” bar into the XML. Just in case you want to style it into a mistake somewhere else?