XML Schemas provide a means for defining the structure, content and semantics of XML documents.

I'm using an XML Schema (XSD) to validate several large XML documents. While I'm finding plenty of support within XSD for checking the structure of my documents, there are no procedural if/else features that allow me to say, for instance,

If Country is USA, then Zipcode cannot be empty.

I'm comfortable using unit testing frameworks, and could quite happily use a framework to test content integrity.

Am I asking for trouble doing it this way rather than using an alternative approach? Has anybody tried this, with good or bad results?

--

Edit: I left this out originally to keep the question technology-agnostic, but I would be using C#, LINQ and xUnit for deserialization and testing.


I use C# to deserialize XML documents into classes and unit test from there. I do this in addition to verifying the XML document against the XSD file, and it works great. I highly recommend this method of validating XML.
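A minimal sketch of this approach, written in Python for brevity rather than C#. The `Address` class, the `Country`/`Zipcode` element names and the sample document are hypothetical. The document is parsed into plain objects, and the content rule is then checked with an ordinary unit test, separately from XSD validation.

```python
import unittest
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# Hypothetical document shape, standing in for the real (already XSD-validated) file.
SAMPLE = """
<Addresses>
  <Address><Country>USA</Country><Zipcode>94105</Zipcode></Address>
  <Address><Country>France</Country><Zipcode></Zipcode></Address>
</Addresses>
"""


@dataclass
class Address:
    country: str
    zipcode: str


def load_addresses(xml_text: str) -> list[Address]:
    """Deserialize each <Address> element into an Address object."""
    root = ET.fromstring(xml_text)
    return [
        Address(
            country=(a.findtext("Country") or "").strip(),
            zipcode=(a.findtext("Zipcode") or "").strip(),
        )
        for a in root.iter("Address")
    ]


class AddressContentTests(unittest.TestCase):
    def setUp(self):
        self.addresses = load_addresses(SAMPLE)

    def test_us_addresses_have_zipcode(self):
        # The content rule from the question: if Country is USA, Zipcode cannot be empty.
        for address in self.addresses:
            if address.country == "USA":
                self.assertTrue(address.zipcode, f"Empty Zipcode for US address: {address}")


if __name__ == "__main__":
    unittest.main()
```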

You should write some .NET classes to do your validation and then verify their expected behavior with unit tests.
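A rough sketch of that split, again in Python instead of .NET and with hypothetical names (`AddressValidator`, `validate`): the validator is ordinary code that returns messages, and the unit tests only check its behaviour against small hand-written fixtures, never against the production documents.

```python
import unittest
import xml.etree.ElementTree as ET


class AddressValidator:
    """Hypothetical validator for content rules that the XSD does not cover."""

    def validate(self, xml_text: str) -> list[str]:
        """Return one human-readable message per rule violation (empty list = valid)."""
        errors = []
        root = ET.fromstring(xml_text)
        for index, address in enumerate(root.iter("Address"), start=1):
            country = (address.findtext("Country") or "").strip()
            zipcode = (address.findtext("Zipcode") or "").strip()
            if country == "USA" and not zipcode:
                errors.append(f"Address #{index}: Zipcode cannot be empty when Country is USA")
        return errors


class AddressValidatorTests(unittest.TestCase):
    """Unit tests for the validator itself, using tiny inline fixtures."""

    def setUp(self):
        self.validator = AddressValidator()

    def test_us_address_without_zipcode_is_rejected(self):
        xml = "<Addresses><Address><Country>USA</Country><Zipcode/></Address></Addresses>"
        self.assertEqual(
            self.validator.validate(xml),
            ["Address #1: Zipcode cannot be empty when Country is USA"],
        )

    def test_us_address_with_zipcode_is_accepted(self):
        xml = "<Addresses><Address><Country>USA</Country><Zipcode>10001</Zipcode></Address></Addresses>"
        self.assertEqual(self.validator.validate(xml), [])

    def test_non_us_address_may_omit_zipcode(self):
        xml = "<Addresses><Address><Country>Ireland</Country></Address></Addresses>"
        self.assertEqual(self.validator.validate(xml), [])


if __name__ == "__main__":
    unittest.main()
```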

The process for actually finding the documents and validating them should be completely separate (perhaps another set of types that use a FileSystemWatcher on a drop folder). Pointing your unit tests at the XML documents you need validated is a poor idea; you should not need an IDE or a test runner to run your business.

Use a high-level exception handler to prevent an invalid file from crashing your process. There's nothing special about unit testing frameworks that makes them useful in regular applications (they're essentially exception handlers).

Also, how do you plan to log exceptions if you use a testing framework to run your code?
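A minimal sketch of such a separate process, in Python rather than .NET: it makes one pass over a drop folder instead of using a FileSystemWatcher, and the `process_drop_folder` name and the validator signature (XML text in, list of messages out) are assumptions for illustration. Each file is wrapped in a high-level exception handler, and problems go to a logger rather than to a test runner's report.

```python
import logging
import xml.etree.ElementTree as ET
from pathlib import Path
from typing import Callable

logger = logging.getLogger("xml_validation")

# Assumed validator signature: takes XML text, returns a list of error messages.
Validator = Callable[[str], list[str]]


def process_drop_folder(folder: Path, validate: Validator) -> dict[str, list[str]]:
    """Validate every *.xml file in `folder`, logging problems instead of crashing."""
    results: dict[str, list[str]] = {}
    for path in sorted(folder.glob("*.xml")):
        try:
            errors = validate(path.read_text(encoding="utf-8"))
        except ET.ParseError as exc:
            # Malformed XML: record it and carry on with the next file.
            errors = [f"Not well-formed XML: {exc}"]
        except Exception:
            # High-level handler: one bad file must not stop the whole run.
            logger.exception("Unexpected failure while validating %s", path.name)
            errors = ["Unexpected error during validation (see log)"]
        for message in errors:
            logger.warning("%s: %s", path.name, message)
        results[path.name] = errors
    return results
```

In a long-running service the same function could be called whenever a new file arrives; the point is that it runs as a normal application with normal logging.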

A unit testing framework mainly helps you:

- Select which tests to run based on command-line or other arguments.
- Simplify repeated initialization and deinitialization.
- Report the values used in failed conditions.

I don't see how any of those would help you. You have one long list of checks to apply to the document: you need to run all of them, you don't want to initialize and deinitialize separately for each one, and you need messages describing what is wrong with the document, not a dump of whatever objects were compared.

The test isolation in particular, and the fact that unit test frameworks are generally not designed to run on externally provided data sets, are likely to get in your way more than anything else in them would help you. You have to write the checks themselves in plain code anyway, and that is almost all the work there is.

So I'd say just pick a language you know and a good XML parsing library that gives you convenient access to the document tree, and write a simple application with that.
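A minimal sketch of such an application, using Python's standard `xml.etree.ElementTree`; both rules and the element names are hypothetical. The document is parsed once, every rule runs, and each rule yields a message describing what is wrong, which matches the requirements listed above (run everything, no per-test setup, readable messages).

```python
import sys
import xml.etree.ElementTree as ET
from typing import Callable, Iterator

# A rule inspects the parsed document and yields one message per problem found.
Rule = Callable[[ET.Element], Iterator[str]]


def us_zipcode_required(root: ET.Element) -> Iterator[str]:
    # Hypothetical element names; adjust to the real schema.
    for address in root.iter("Address"):
        country = (address.findtext("Country") or "").strip()
        zipcode = (address.findtext("Zipcode") or "").strip()
        if country == "USA" and not zipcode:
            yield "Zipcode cannot be empty when Country is USA"


def country_present(root: ET.Element) -> Iterator[str]:
    # Second hypothetical rule, just to show that rules are simply listed.
    for address in root.iter("Address"):
        if not (address.findtext("Country") or "").strip():
            yield "Address is missing a Country"


RULES: list[Rule] = [us_zipcode_required, country_present]


def check_document(xml_text: str, rules: list[Rule] = RULES) -> list[str]:
    """Parse once, run every rule, and collect all messages."""
    root = ET.fromstring(xml_text)
    return [message for rule in rules for message in rule(root)]


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8") as f:
        problems = check_document(f.read())
    for problem in problems:
        print(problem)
    sys.exit(1 if problems else 0)
```

Adding a check means writing one more small function and appending it to `RULES`; the exit code makes the tool easy to call from scripts.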

Regarding your edit: .NET has a tool, the XML Schema Definition tool (xsd.exe), that takes an XSD and generates class definitions from it (it can go the other way too). Deserializing the document into those generated classes is by far the most convenient way to access its elements, and it catches many mismatches with the XSD as a side effect, although it is not a full substitute for XSD validation. So I'd recommend using that for reading the document. That has nothing to do with unit testing, of course.

"There are no procedural if/else features that allow me to say, for instance, if Country is USA, then Zipcode cannot be empty."

For this, you'll want to use XQuery. At a previous employer we worked with government forms, and the newer forms were to be submitted via XML. The "edit tests" (which included rules such as "the zip code cannot be empty for US addresses") were expressed as XQueries and XPath expressions. Because many of them validated dates, it turned out that we could not use the built-in validation in .NET (date handling needs XPath 2, which .NET does not provide) and had to use an external processor (we chose Altova's).
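A small sketch of "edit tests" stored as data, each pairing an XPath-style selector with a condition. It uses Python's `xml.etree.ElementTree`, which supports only a limited XPath subset (including the `[child='text']` predicate) and none of the XPath 2 date functions mentioned above; the test table and element names are hypothetical. With a full XPath 1.0 engine the whole rule fits in one expression that selects the offending nodes, for example `//Address[Country='USA'][not(normalize-space(Zipcode))]`.

```python
import xml.etree.ElementTree as ET

# Hypothetical edit tests: (description, selector, condition every selected element must meet).
# ElementTree's XPath subset cannot express not(...), so the condition is written in Python.
EDIT_TESTS = [
    (
        "US addresses must have a Zipcode",
        ".//Address[Country='USA']",
        lambda el: bool((el.findtext("Zipcode") or "").strip()),
    ),
]


def run_edit_tests(xml_text: str) -> list[str]:
    """Return the description of each edit test failed by any selected element."""
    root = ET.fromstring(xml_text)
    failures = []
    for description, selector, condition in EDIT_TESTS:
        for element in root.findall(selector):
            if not condition(element):
                failures.append(description)
    return failures
```

Note that the `[Country='USA']` predicate in ElementTree compares the element's full text exactly, without trimming whitespace.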

The title of your question begins with "Should I use...". "Should" is a strong word. You should test, but whether you use a unit testing framework or something else depends on a lot of factors. You can use a unit testing framework if that is what you are comfortable with.

However, it sounds like you aren't testing units per se, and unit testing frameworks are heavily optimized for that particular niche. You might want to choose an acceptance testing framework such as Cucumber, Robot Framework or SpecFlow. These tools let you express your tests as business requirements rather than as a set of inputs and outputs.