Last November, I engaged Udi Dahan (the Software Simplist) on using DataSets in OLTP situations. It turns out that Udi uses a question I posed that kind of preceded this dialogue for his latest podcast at Dr. Dobbs Journal titled DataSets and Web Services. Yes, I am the Jacob he mentions.

You can tell that he's not a great fan of DataSets in general, though he takes pains to treat them fairly. Which puts him ahead of most of the developers I track, really.

Row State

One of the things that Udi doesn't like about DataSets is that they track row state. Unfortunately, he seems to be wrapping together two different consequences of doing so and it's a little bit confusing. His point actually makes more sense if you break it apart a bit.

The first thing he brings up is that because a DataSet tracks row state, many developers build that tracking into the web service as an unstated expectation. I've seen this, too, and it is hideous. In the most extreme cases, you'll get a web service for each of the CRUD operations for your objects—essentially making your web service a thin proxy for your data layer.

The real problem with using row state, however, is that it muddies the contract for your services. This is where Udi is talking about "passing along the intent." Services should have an explicit intent for the data that they receive (or return). Without an explicit intent, it can be hard to discern what any specific service is supposed to do and therefore what business logic that service will enforce. The narrower you can define your intent the better. If your service only knows that a customer record has changed, that service doesn't know if it's supposed to change the address, the name, the marital status, or what all. Which means that the service will not only have to handle all business logic associated with a customer record every time, it also has to try to reconstruct what has happened if it wants to have any kind of concurrency at all.

Incidentally (and more importantly, I think), this also means that those using that service don't know what all it is supposed to do or what criteria it uses to do it. Developers don't deal well with uncertainty. Before you know it, you'll see business logic creep into the calling code "just to be sure." Or worse, you'll see new services pop up duplicating pieces of the larger service.

The problem with row state isn't deterministically bad—just because it's easier to screw up by relying on that additional information doesn't mean you have to screw it up. In other words, just because DataSets track row state doesn't mean you have to give row state significance in your services.

Too Much Information

Row State is really just a specific example of a wider problem, though—namely that DataSets encode a lot of details beyond the data that your service needs to do its stuff. These details make DataSets unsuited for including as a part of an SOA contract because alterations in those implementation details will break a service contract even when the data structures remain the same.

Udi's conclusion is that DataSets should never be used in SOA. I'm forced to agree with him for this last reason. Versioning objects across an enterprise is a tricky business. Minimizing the number of things that break downstream stuff when changed is important to good architecture. The more that you can isolate, the happier you will be.

So the key question for me becomes: are you actually implementing SOA or just using web services? If you're simply using web services, DataSets have limitations to watch for but can be useful. If you're going SOA, keep DataSets to yourself.