A blog about one man's journey through code… and some pictures of the Peak District

There are many systems out there in the wild, and some new ones being written, that use database logic extensively. This article discusses how and why these pieces of logic should be tested, along with whether they should exist at all.

In general, for unit tests, it’s worth asking what, exactly, is being tested before starting. This is especially true of database tests; for example, consider a test where we update a field in a database and then assert that the field holds the value it was set to. Are you testing your trigger logic, or are you simply testing that Microsoft SQL Server works?

The second thing to consider is whether or not it makes any sense to use testable database logic in new code. That is, say we have a stored procedure that:
– Takes a product code
– Looks up what the VAT is for that product
– Calculates the total price
– Writes the result, along with the parameter and the price, to a new table

Does it make sense for all that logic to be in the stored procedure, or would it make more sense to retrieve the values needed via one stored procedure, do the calculation in a testable server-side function, and call a second procedure to write the data?
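As a rough illustration of the second option, the calculation step could live in a plain function that needs no database at all. The sketch below uses Python purely for brevity; the function name, the representation of the VAT rate as a fraction, and the rounding rule are all assumptions made for the example rather than part of any real system.

```python
from decimal import Decimal, ROUND_HALF_UP


def calculate_total_price(net_price: Decimal, vat_rate: Decimal) -> Decimal:
    """Return the price including VAT, rounded to two decimal places.

    Hypothetical helper: vat_rate is a fraction, so Decimal("0.20") means 20%.
    The inputs would come from the first stored procedure, and the result
    would be handed to the second one to be written.
    """
    if net_price < 0 or vat_rate < 0:
        raise ValueError("net_price and vat_rate must be non-negative")
    total = net_price * (Decimal("1") + vat_rate)
    return total.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


print(calculate_total_price(Decimal("10.00"), Decimal("0.20")))  # prints 12.00
```

A function like this can be covered by ordinary, fast unit tests, leaving the two stored procedures as thin data-access wrappers.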

FIRST

Unit testing a database is a tricky business. First of all, if you have business logic in the database then it, almost by definition, depends on the state of the data. You can, of course, simply run unit tests against the database and change the data, but let’s have a look at the FIRST principles (Fast, Isolated, Repeatable, Self-verifying, Timely) and see where database tests are inherently difficult.

Fast

It depends on exactly what is meant by fast, but in comparison to a unit test that asserts some logic in C# code, database tests are slow (although, in comparison to conducting the test manually, they are very fast). Realistically, they are probably going to be slow enough to warrant taking them out of your standard unit test suite. A sensible test project (that is, one that tests some actual code) may contain a good few hundred tests. If we assume that they each take 200ms, then 300 tests take a total of 60 seconds!

One thing that running DB tests does give you is an idea of how fast (or slow) the database operations themselves actually are.

Isolated

It’s incredibly difficult to produce a database unit test that is isolated because, by its nature, a database has dependencies. Certainly, if anything you’re testing is dependent on a particular data state (for example, in the case above, the product that we are looking for must exist in a table and have a VAT rate) then, unless this state is set up in the test itself, this rule is broken.

Repeatable

Again – this isn’t a small problem with databases. If I change Column A to test a trigger on the table, am I then able to change it again? What if the data is in a different state from the last time I ran the unit tests? I might get rogue fails or, worse, rogue passes. And if the test crashes halfway through, how do we revert?

Self-verifying

In my example before, I changed Column A in order to test a trigger, and I might then check something that is updated by the trigger. Provided that the assertion is inside the test, the test is self-verifying. This is easier to get wrong in a database context because, if I do nothing, the data is left in a state that can be externally verified, and it is tempting to rely on that instead.

Timely

This refers to when a test is written. There’s nothing inherent about database tests that prevents them from being written before, or very shortly after, the code is written. However, see the comment above as to whether new code written like this makes sense.

Problems With A Database Test Project

Given what we’ve put above, let’s look at the outstanding issues that realistically need to be solved in order to use database tests:

1. Deployment. Running a standard code test will run the code wherever you are; however, a database test, whichever way you look at it, needs a database before it runs.

2. Rollback. Each test needs to be isolated, and so there needs to be a way to revert to the database state before the tests began.

3. Set-up. Any dependencies that the tests have must be inside the test; therefore, if a table needs to have three rows in it, we need to add those rows within the test.

4. Assertion. What are we testing, and what makes sense to test; each test needs a defined purpose.
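To make the four points a little more concrete, here is a minimal sketch in Python using the standard-library `sqlite3` module as a stand-in for a real server. The `Product` table, its columns, and the `vat_for_product` function are hypothetical names invented for the example. Rollback is sidestepped here by throwing the in-memory database away after the test, which is not an option against a shared server.

```python
import sqlite3


def vat_for_product(conn: sqlite3.Connection, product_code: str) -> float:
    """Hypothetical logic under test: look up the VAT rate for a product."""
    row = conn.execute(
        "SELECT VatRate FROM Product WHERE ProductCode = ?", (product_code,)
    ).fetchone()
    if row is None:
        raise LookupError(f"unknown product: {product_code}")
    return row[0]


def test_vat_lookup_returns_rate_for_known_product() -> None:
    # Deployment: a throwaway in-memory database, built by the test itself.
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE Product (ProductCode TEXT PRIMARY KEY, VatRate REAL NOT NULL)"
        )
        # Set-up: the three rows this test depends on are added here.
        conn.executemany(
            "INSERT INTO Product (ProductCode, VatRate) VALUES (?, ?)",
            [("BOOK", 0.0), ("FUEL", 0.05), ("LAMP", 0.2)],
        )
        # Assertion: one defined purpose, checked inside the test.
        assert vat_for_product(conn, "FUEL") == 0.05
    finally:
        # Clean-up: closing the connection discards the in-memory database.
        conn.close()


test_vat_lookup_returns_rate_for_known_product()
```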

Example Project

In order to explore the various possibilities when setting up a database project, I’m going to use an example project:

Let’s start with some functionality to test. I’m going to do it this way around for two reasons: having code to test better illustrates the problems faced by database tests, and it is my belief that much of the database logic code is legacy and, therefore, already exists.

This is for the purpose of illustration, so obviously, there are things here that might not make sense in real life; however, the logic is very testable. Let’s deploy this to a database, and do a quick manual test:

Okay – there are a number of problems with this test, but let’s pretend for a minute that we don’t know what they are; the test passes:

Let’s run it again, just to be sure:

Oops.

Let’s firstly check this against the test principles that we discussed before.
1. Is it fast? 337ms means that we can run 3 of these per second. So that’s a ‘no’.
2. Is it Isolated? Does it have a single reason to fail – and can it live independently? If we accept that the engine itself is a reason to fail, but ignore that, then we can look specifically at the test, which asserts nothing. What’s more, it is doing two separate things to the DB, so either of them could realistically fail.
3. Is it Repeatable? Clearly not.
4. Is it self-verifying? No – it isn’t, because we have no assertions in it. Although we know that on the first run, both queries worked, we don’t know why.
5. Timely – well, we did write it directly after the code, so that’s probably a tick.

So, we know that the second run didn’t work. A quick look at the DB will tell us why:

Of course, the test committed a transaction to the database; as a result, any subsequent runs will fail.
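The same failure mode is easy to reproduce in miniature. The sketch below is not the test from this project; it is a Python and SQLite stand-in, and it assumes that the second run fails because of a unique key on a hypothetical `OrderRef` column.

```python
import sqlite3


def naive_test(conn: sqlite3.Connection) -> None:
    """A 'test' that inserts a fixed key and commits, with no clean-up."""
    conn.execute(
        "INSERT INTO SalesOrder (OrderRef, ProductCode) VALUES ('SO-1', 'LAMP')"
    )
    conn.commit()


# The connection stands in for a database that outlives each test run.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE SalesOrder (OrderRef TEXT PRIMARY KEY, ProductCode TEXT NOT NULL)"
)

naive_test(conn)  # first run: succeeds and leaves the row behind

try:
    naive_test(conn)  # second run: the committed row is still there
    second_run_failed = False
except sqlite3.IntegrityError:
    second_run_failed = True

print("second run failed:", second_run_failed)  # prints: second run failed: True
```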

The Solution

What follows is a suggested solution for this kind of problem, along with the beginnings of a framework for database testing. The tests here are using MSTest, but the exact same concept is easily achievable in NUnit and, I imagine, every other testing framework.

So, we now have a deployment task and a connection. The next step is to run the tests in a way in which they are repeatable, and the key here is to use transactions. Going back to the base class, we can wrap this functionality into a method that can simply be inherited by all unit tests.
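The shape of such a base class can be sketched outside MSTest as well. What follows is a minimal Python `unittest` version that uses SQLite as a stand-in for SQL Server; it is not the MSTest code from this project, and `SHARED_CONN`, `DatabaseTestBase`, and the `SalesOrder` schema are all hypothetical names chosen for the illustration.

```python
import sqlite3
import unittest

# Stand-in for a database server that outlives individual test runs.
# isolation_level=None puts sqlite3 in autocommit mode, so BEGIN and
# ROLLBACK are issued explicitly by the base class below.
SHARED_CONN = sqlite3.connect(":memory:", isolation_level=None)


class DatabaseTestBase(unittest.TestCase):
    """Hypothetical base class: deploy once, then one transaction per test."""

    @classmethod
    def setUpClass(cls) -> None:
        # "Deployment": a stand-in for publishing the real schema.
        cls.conn = SHARED_CONN
        cls.conn.execute(
            "CREATE TABLE IF NOT EXISTS SalesOrder "
            "(OrderRef TEXT PRIMARY KEY, ProductCode TEXT NOT NULL)"
        )

    def setUp(self) -> None:
        self.conn.execute("BEGIN")

    def tearDown(self) -> None:
        # Runs after the test body whether it passed or failed.
        self.conn.execute("ROLLBACK")


class SalesOrderTests(DatabaseTestBase):
    def test_insert_creates_one_order(self) -> None:
        self.conn.execute("INSERT INTO SalesOrder VALUES ('SO-1', 'LAMP')")
        count = self.conn.execute("SELECT COUNT(*) FROM SalesOrder").fetchone()[0]
        self.assertEqual(count, 1)


def run_suite() -> unittest.TestResult:
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(SalesOrderTests)
    return unittest.TextTestRunner(verbosity=0).run(suite)


first = run_suite()
second = run_suite()  # same database, same fixed key, still passes
print(first.wasSuccessful(), second.wasSuccessful())
```

Because the insert is rolled back in `tearDown`, the second run starts from the same empty table as the first, which is what makes the fixed key `'SO-1'` safe to reuse.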

We now have a base test class that will deploy the database, establish a new connection and transaction, and then, on completion of the test, roll back the transaction. Here’s what the above test now looks like:

The idea behind the framework described above is that the data is never committed to the database; as a consequence of this, the tests are repeatable, because nothing ever changes. The unfortunate side-effect is that debugging the test is made more difficult because, if it fails, it is not possible to see directly which changes have been made. There are a couple of ways around this. One is to simply debug the test, manually fire a commit, look at the data, and continue. However, a SQL expert recently introduced me to the concept of “Dirty Reads”.

Dirty Reads

Dirty reads are achieved by issuing the following command to SQL Server:

SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED

This allows you to see changes in the database which are still pending (that is, they have yet to be committed). What this means is that you can see the state of the data as it currently is; the read also doesn’t place a lock on the data. One of the big issues with using this methodology is that you can see half-committed transactions; of course, in this instance, that’s exactly what you want! Let’s debug our unit test:

Now let’s have a look at the SalesOrder table:

Not only does this not return anything, it doesn’t return at all. We’ve locked the table, and held it in a transaction. Let’s apply our dirty read and see what happens:

Instantly, we get the SalesOrder. If we now complete the test and run the query again, the data is gone: