December 7, 2009

Writing Unit Tests for Data Access Code

Had a question today from a developer from another software project within my company:

How do I begin and rollback database transactions from a unit test?

My short answer . . .

You don’t.

. . . led to a pointed follow-up question . . .

So how do I write unit tests for my data access code?

. . . which led to a somewhat lengthy discussion. It’s an interesting topic, and one that I have pretty strong opinions on, so I thought I would write a bit about it here.

Before I start ranting, I want to be clear that what follows is my opinion, and is intended as a discussion of general principles. Every problem is different, and providing the right solution absolutely requires understanding the full context of that problem. I don’t, and never would support the idea that there is only one correct approach. Although I do strongly advocate the use of certain methodologies, technologies and frameworks, I’m all about choosing the right tool for the job. For example, even though I vastly prefer using an ORM tool over inline SQL or stored procedures, there are times when the latter is actually a better choice.

Where Should Database Transaction Code Live?

So the question behind the original question goes something like:

Should I expose the control of database transactions (start/rollback/commit) to logical layers outside of the Data Access Layer (DAL)?

In a word, no. I don’t say this because it’s a difficult thing to do (it’s not), or because it will result in a big change to the Data Access API (it won’t). I say it because exposing fine grained database transaction control outside the DAL dramatically changes the conceptual meaning of the DAL as a logical layer. The DAL is a service/set of services whose sole purpose is to encapsulate all the functionality required to save, delete, and query data to and from the database. Exposing transaction control outside of this layer violates the encapsulation of your DAL in the worst possible way.

Why? Because database transactions are one of the foundations on which any database application is built. Transactions are fundamental in ensuring the integrity and consistency of your data.I don’t think I’m going out on a limb by saying that the data in your application is by far the most important element of the system; it’s likely the whole reason your application was created in the first place.

So although implementing database start/commit/rollback is generally not complex in terms of lines of code, I would highly recommend as a best practice:

Encapsulate database transaction logic into a well-understood class or set of classes that is accessible only from within the Data Access Layer, and not left open to the wide world and any team member (current or future) who decides to tinker with it.

Then How Do I Create Automated Tests for my Data Access Code?

The question itself contains something of a false premise from my perspective. Before even attempting to answer, I would want to be sure that the following questions have been answered to my satisfaction:

Do you really need to test your DAL discretely/directly? If so, why?

It sounds like a basic question, but you might be surprised by the puzzled looks you get when you ask it. There is a reason why you want to test your DAL, so what is it? What, exactly are you trying to protect against?

I can think of a few answers that would satisfy me that some form of automated, discrete Data Access testing would need to be done:

The DAL is going to be refactored and needs some basic tests to ensure it was done correctly, and a full regression test on the system is not possible for whatever reason.

The system has some incredibly complex edge cases for data that need to be tested, and would be extremely difficult to set up for manual/integration/QA testing. The testing would be limited to these cases.

The software is very data-centric, so our discrete DAL testing essentially is our integration/system testing.

Uncertainty in the parity between ORM representation of your database, and your actual database schema (whether it be LLBL generated code or NHibernate configs).

So assuming it turns out that, yes, you need to create discrete tests against your DAL, the question becomes:

How do I create automated tests for my Data Access code without exposing database transaction control outside of the DAL?

If you’re going to be testing the DAL discretely, it’s a given that you’ll be doing it against known data. If setting up predictable, known data in advance (e.g. via a restore of a reference database) is not an option, that means you will be creating the data from within your testing code, either as part of each discrete test or as a setup method that gets hit at the start of each test run. This data can clutter your dev/test database over time, and become huge and unwieldy. Dealing with this data really becomes the core of the question. I think the options are as follows, in order of preference:

Option 1: let the data build up, and don’t worry about it. It’s a test database, so filling it up with meaningless data should not have any real impact. If the data created by these tests really is problematic (e.g. getting in the way of manual unit testing, demoing, or QA testing), create a separate database specifically for your DAL tests. In the unlikely event that even this separate database gets out of hand over time, you can simply re-create it from scratch periodically.

Option 2: track the data you create and delete it in your teardown method. If you can create the data in your setup method, what’s stopping you from deleting it in your teardown method? This will only work of your Data Access API supports deleting the type of data you are working with; I would not recommend adding delete functionality to your API for the sole purpose of testing.

Option 3: mark the created data in such a way that it can be easily deleted. I have taken this approach in the past—basically when I created data that was to be used in discrete automated tests, I would prefix and/or suffix key areas with a known string of characters, like “***TESTDATA***“. This made it easy to find and purge the data later by deleting it directly via a specially-created stored proc or ad-hoc query. This cleanup code could be run manually when needed, via a scheduled process, or as part of the build.

Option 4: use your unit testing framework’s Rollback support to wrap your test in a COM+ transaction. This can be tricky to set up, depending on the database you are using and the security environment of your servers. But if you can get past the initial setup and get it working both locally and on your build server, this may actually be the best and easiest option. MbUnit supports this functionality out of the box (MbUnit RollbackAttribute), and I believe NUnit supports it via an add-on (Google "NUnit Rollback"). I’m sure other frameworks support it in one form or another as well.

Conclusion

Use of automated testing should be part of an overall testing/QA strategy. It is a great tool for your tool belt, and one that I recommend and use extensively on my own projects. But it is a means to an end, not an end in itself. You can have 100% coverage of your application code and still have software that doesn’t work, and you can have 0% coverage and have software that is completely bug-free.

Also note that when I say “coverage” here, I mean automated coverage (as measured by a tool such as NCover). Overall test coverage really includes a number of factors, including:

Automated unit testing

Manual unit testing

Integration testing (automated or manual)

QA/System/Regression testing (automated or manual)

Consider the possibility that your DAL is being tested implicitly by some other aspect of your overall testing strategy, and that discrete, automated testing is not necessary.

One thing to note is that having fragile unit tests are not wise. By adding explicit insert and update statements in your unit tests makes those test fragile. Someone will change the schema and break a lot of DAL unit tests due to the insert statements and not due to the fact that the DAL no longer works. If you do write SQL statements in your unit tests, I would suggest them being as simple as possible so whoever does break them can quickly understand and remedy the issue.

Great post, dude. I am knee deep in unit testing for a project I am working on right now and I’ve always wondered if option 3 is the best method. I use it frequently but I am going to have to dig a little deeper into option 4. I’ve never used mbunit so I am sure there’d be some learning curve. Anyway, keep up the good work. Now back to the grindstone.