Writing Tests for Data Access Code – Unit Tests Are Waste

A few years ago I was one of those developers who wrote unit tests for their data access code. I tested everything in isolation, and I was pretty pleased with myself. I honestly thought that I was doing a good job.

Unit Tests Answer the Wrong Question

We write tests for our data access code because we want to know that it works as expected. In other words, we want to find the answers to these questions:

Is the correct data stored in the database?

Does our database query return the correct data?

Can unit tests help us to find the answers we seek?

Well, one of the most fundamental rules of unit testing is that unit tests shouldn’t use external systems such as a database. This rule isn’t a good fit for the situation at hand because the responsibility for storing correct information and returning correct query results is divided between our data access code and the database.

For example, when our application executes a single database query, the responsibility is divided as follows:

The data access code is responsible for creating the database query.

The database is responsible for executing the database query and returning the query results to the data access code.

The thing is that if we isolate our data access code from the database, we can test that our data access code creates the “correct” query, but we cannot ensure that the created query returns the correct query results.

That is why unit tests cannot help us to find the answers we seek.
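The gap can be shown with a small sketch. It is written in Python with the standard library's sqlite3 module only so that it runs without any setup; the persons table, the build_search_query helper, and the deliberately planted bug are all assumptions made for illustration, not code from the project described in this post. The isolated check only inspects the query that was built, so it passes. The database check runs the same query against an in-memory database and shows that it finds nothing.

```python
import sqlite3


def build_search_query(term):
    """Hypothetical data access helper: returns (sql, params) for a last name search."""
    # Planted bug: the trailing wildcard is missing, so LIKE only matches exact names.
    return "SELECT last_name FROM persons WHERE last_name LIKE ?", (term,)


def isolated_check():
    """Isolated style: only inspects the query that was built."""
    sql, params = build_search_query("Smi")
    expected_sql = "SELECT last_name FROM persons WHERE last_name LIKE ?"
    return sql == expected_sql and params == ("Smi",)


def database_check():
    """Integration style: runs the query against a real in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE persons (last_name TEXT)")
    conn.executemany("INSERT INTO persons VALUES (?)", [("Smith",), ("Jones",)])
    sql, params = build_search_query("Smi")
    rows = [row[0] for row in conn.execute(sql, params)]
    conn.close()
    return rows
```

Here isolated_check() reports success because the query looks the way its author expected, while database_check() returns an empty list even though a search for "Smi" was meant to find "Smith". Only the second check asks the question we actually care about.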

A Cautionary Tale: Mocks Are Part of the Problem

There was a time when I wrote unit tests for my data access code. At the time I had two rules:

Every piece of code must be tested in isolation.

Let’s use mocks.

I was working on a project which used Spring Data JPA, and dynamic queries were built by using JPA criteria queries.

Anyway, I created a specification builder class which builds Specification<Person> objects. After I had created a Specification<Person> object, I passed it to my Spring Data JPA repository, which executed the query and returned the query results.

Let’s take a look at the test code which “verifies” that the specification builder class creates “the correct” query. Remember that I wrote this test class by following my own rules which means that the result should be great.

The source code of the PersonSpecificationsTest class looks as follows:

I have to admit that this test is a piece of shit which has no value to anyone, and it should be deleted as soon as possible. This test has three major problems:

It doesn’t help us to ensure that the database query returns the correct results.

It is hard to read, and to make matters worse, it describes how the query is built but it doesn’t describe what it should return.

Tests like this are hard to write and maintain.

The truth is that this unit test is a textbook example of a test that should have never been written. It has no value to us, but we still have to maintain it. Thus, it is waste!

And yet, this is what happens if we write unit tests for our data access code. We end up with a test suite which doesn’t test the right things.

Data Access Tests Done Right

I am a big fan of unit testing but there are situations when it is not the best tool for the job. This is one of those situations.

Data access code has a very strong relationship with the used data storage. That relationship is so strong that the data access code itself isn’t useful without the data storage. That is why it makes no sense to isolate our data access code from the used data storage.

The solution to this problem is simple.

If we want to write comprehensive tests for our data access code, we must test our data access code together with the used data storage. This means that we have to forget unit tests and start writing integration tests.

We must understand that only integration tests can verify that:

Our data access code creates the correct database queries.

Our database returns the correct query results.
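As a rough sketch of what such a test can look like, the example below exercises a repository together with a real database engine. It uses Python's built-in sqlite3 module as a dependency-free stand-in for the JUnit and H2 setup discussed in this post; the PersonRepository class, its method names, and the table layout are hypothetical. One test checks that the correct data is stored, the other that the query returns the correct data.

```python
import sqlite3


class PersonRepository:
    """Hypothetical repository; table and column names are assumptions."""

    def __init__(self, conn):
        self.conn = conn

    def save(self, first_name, last_name):
        cursor = self.conn.execute(
            "INSERT INTO persons (first_name, last_name) VALUES (?, ?)",
            (first_name, last_name),
        )
        return cursor.lastrowid

    def find_by_last_name_prefix(self, prefix):
        cursor = self.conn.execute(
            "SELECT first_name, last_name FROM persons "
            "WHERE last_name LIKE ? ORDER BY last_name, first_name",
            (prefix + "%",),
        )
        return cursor.fetchall()


def new_database():
    """Create an empty in-memory database with the schema under test."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE persons ("
        "id INTEGER PRIMARY KEY, "
        "first_name TEXT NOT NULL, "
        "last_name TEXT NOT NULL)"
    )
    return conn


def test_save_stores_the_correct_row():
    conn = new_database()
    person_id = PersonRepository(conn).save("Jane", "Smith")
    # Read the row back with plain SQL so the check does not rely on the repository.
    row = conn.execute(
        "SELECT first_name, last_name FROM persons WHERE id = ?", (person_id,)
    ).fetchone()
    assert row == ("Jane", "Smith")


def test_query_returns_only_matching_rows():
    conn = new_database()
    repository = PersonRepository(conn)
    repository.save("Jane", "Smith")
    repository.save("John", "Smithers")
    repository.save("Ann", "Jones")
    found = repository.find_by_last_name_prefix("Smi")
    assert found == [("Jane", "Smith"), ("John", "Smithers")]
```

Both tests describe what should end up in the database or come out of it, not how the SQL is assembled, so they keep passing if the query is rewritten and fail if its results change.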

If you want to know how you can write integration tests for Spring powered repositories, you should read my blog post titled Spring Data JPA Tutorial: Integration Testing. It describes how you can write integration tests for Spring Data JPA repositories.

Summary

This blog post has taught us two things:

We learned that unit tests cannot help us to verify that our data access code is working properly because we cannot ensure that the correct data is inserted into our data storage or that our queries return the correct results.

We learned that we should test our data access code by using integration tests because the relationship between our data access code and the used data storage is so tight that it makes no sense to separate them.

It also raises a question: is testing a DAO with JUnit and H2 actually an integration test?

I would say that it is an integration test because the H2 database is used to simulate the behavior of the production database. However, I must admit that I think it is fine if someone thinks that this is a unit test. Naming is mostly semantics anyway, and the most important thing is that someone wrote that test.

I think testing the data access code using JUnit and H2 is a great way to test complex queries and the Hibernate mapping. Unfortunately this is not an option if you use your database’s proprietary SQL syntax. It would be great if all databases provided an in-memory option for testing.

I agree with you. One of the biggest benefits of using the H2 database is that you get “fast” feedback. Of course it cannot guarantee that your application works when you deploy it to the production environment, but I haven’t missed any bugs because I used the H2 database in my integration tests.

Good post! I agree that it doesn’t make sense to unit test external system resources such as databases. How often do you add integration tests for the application database interactions in your projects?

My opinion is that in large applications, the integration tests often take longer and longer to run over time and eventually have less value because you don’t want to wait for the feedback. I think automated end-to-end tests which test the critical paths from the UI through the application to the database are the most useful, at least in large applications.

How often do you add integration tests for the application database interactions in your projects?

I do this all the time. I write integration tests for every repository method which I have created. Also, I write end-to-end tests for each feature as well.

My opinion is that in large applications, the integration tests often take longer and longer to run over time and eventually have less value because you don’t want to wait for the feedback.

If you use the H2 memory database when you run the integration tests in the development environment, integration tests aren’t really very slow. For example, I have an integration test suite which has about 1600 integration tests and it takes about 90 seconds to run the full test suite (these tests use JUnit, Spring MVC Test, DbUnit, and H2 in-memory database).

On the other hand, you should also run your test suite against the same database server which is used in production. Because running integration tests against a real database is often slower than running them against an in-memory database, you should let the CI server do the heavy lifting for you.

I think automated end-to-end tests which test the critical paths from the UI through the application to the database are the most useful, at least in large applications.

This is probably good enough if you only want to ensure that your application is working correctly. I want my tests to document the behavior of each class and to help me find the problem ASAP if a test fails. That is why I tend to write more tests than the average developer.

This raises an interesting question: are my tests waste?

At the moment I think that they aren’t waste, but the odds are that if you ask this question again two years from now, I will give you a different answer.

to make sure that the part of the program does what it is supposed to do

to not contain implementation details, but be written from a blackbox view

to test only a small portion of the program

which completely suit your post, since the first test was not written from a blackbox view at all. And even though it tested only a small portion of the program, it would not make sure that the part of the program did what it was supposed to do.

However one question:

How do you deal with transactions? For example, imagine you use Hibernate and open a transaction in the service layer, which talks to the DAO. Your DAO is tested and works as expected. But the transaction which was started outside the DAO could have a negative effect on what happens inside the DAO. Maybe there is a detached entity, or an unassigned persistent bag: anything that damages what the DAO does, but happens outside it. How do you deal with these in tests?

It is extremely important that we test transactions as well. Typically I do this by writing end-to-end tests with the Spring MVC Test framework. A single end-to-end test tests a specific business requirement and ensures that:

The correct changes are made to the database if the transaction is committed.

No changes are made to the database if the transaction is rolled back.

Of course some features only read information from the database, but if a feature writes something to the database, I will write end-to-end tests which ensure that the feature works correctly in “every” situation.

Sometimes I write integration tests which call the tested service method and ensure that the transactions are working, but typically I do this only if the entry point of the feature is hard to test.
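A minimal sketch of the two transaction checks described above, calling a service method directly. It uses Python's built-in sqlite3 module rather than Spring and Hibernate so that it runs as is; the add_persons service method and the persons table are hypothetical. The connection's context manager commits when the block finishes normally and rolls back when an exception escapes.

```python
import sqlite3


def new_database():
    """Create an empty in-memory database with a NOT NULL constraint to trip over."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE persons ("
        "id INTEGER PRIMARY KEY, "
        "first_name TEXT NOT NULL, "
        "last_name TEXT NOT NULL)"
    )
    return conn


def add_persons(conn, persons):
    """Hypothetical service method: store every person or none of them."""
    with conn:  # commits on success, rolls back if an exception escapes
        for first_name, last_name in persons:
            conn.execute(
                "INSERT INTO persons (first_name, last_name) VALUES (?, ?)",
                (first_name, last_name),
            )


def stored_names(conn):
    cursor = conn.execute("SELECT first_name, last_name FROM persons ORDER BY id")
    return cursor.fetchall()


def test_changes_are_stored_when_the_transaction_commits():
    conn = new_database()
    add_persons(conn, [("Jane", "Smith"), ("Ann", "Jones")])
    assert stored_names(conn) == [("Jane", "Smith"), ("Ann", "Jones")]


def test_nothing_is_stored_when_the_transaction_rolls_back():
    conn = new_database()
    failed = False
    try:
        # The second row violates NOT NULL, so the service fails midway.
        add_persons(conn, [("Jane", "Smith"), ("Ann", None)])
    except sqlite3.IntegrityError:
        failed = True
    assert failed
    # The first insert had already run; the rollback must have removed it.
    assert stored_names(conn) == []
```

The rollback test is the interesting one: the first row really is written before the failure, so the assertion only holds if the transaction boundary around the service method works.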

I hope that this answered your question. If not, please ask more questions! :)

I have to admit that this test is a piece of shit which has no value to anyone, and it should be deleted as soon as possible.

:)

One of the issues with unit vs integration testing seems to be that people often do not agree on what the terms actually mean. I’ve recently written a similar post: “Stop Unit-Testing Database Code”. It appears that this triggered a bit of confusion over on Reddit, where people were actually in line with my (and your) line of thought, but misread the “unit” part in “unit testing”.

I think the important aspect here is that “unit” testing is most often used to test algorithms that have no side-effects on external state, whereas integration testing is the only sane and reasonable thing to do when testing a system in the context of its surrounding systems with all state that is distributed across such systems. Databases being one example of such external state-providing systems.

Yeah. I cannot figure out what I was thinking when I wrote that test. Probably I was not thinking at all.

I think the important aspect here is that “unit” testing is most often used to test algorithms that have no side-effects on external state

I have actually spent a lot of time thinking about unit tests, and more specifically, when it makes sense to write them. At the moment I am writing a lot of unit tests for two reasons:

I want to document my code in a way which ensures that the documentation is always up-to-date.

I want to ensure that if a test fails, I will find the problem as soon as possible.

This means that I will write unit tests for components which alter the external state (e.g. application and domain services). But I agree that the “purest” form of unit testing takes place when you are writing tests for components which don’t have side-effects on external state.

whereas integration testing is the only sane and reasonable thing to do when testing a system in the context of its surrounding systems with all state that is distributed across such systems. Databases being one example of such external state-providing systems.

Yes, I wouldn’t write software without writing integration tests. Even if you have 100% unit test coverage (which you don’t have), you cannot be sure that the components of your software work together unless you also write integration tests.

This means that I will write unit tests for components which alter the external state (e.g. application and domain services). But I agree that the “purest” form of unit testing takes place when you are writing tests for components which don’t have side-effects on external state.

Yep, that’s probably where people’s confusion about the terms comes from. To me, you’re already integration testing stuff up there. But maybe that distinction is not very helpful as the border lines aren’t very clear.

Interesting (and not totally against my beliefs either). Typically I have three different setups for “unit tests” which test either an application or a domain service:

The component is tested in isolation (e.g. dependencies are replaced with mocks or stubs). I would say that this is a unit test.

The component is tested together with its dependencies. This setup is often useful if the dependencies do not touch external resources such as a database. I would say that this test is an integration test.

Some dependencies are mocks/stubs and some are actual objects. Again, I would say that this is an integration test.

This is indeed a bit confusing… ;) But typically I run all of these tests in my unit test suite because they don’t use resources or components which aren’t part of my application’s code base (e.g. database, a third party REST API, and so on).

But maybe that distinction is not very helpful as the border lines aren’t very clear.

I agree. I think that most flame wars are started because people use different definitions and never bother to find out what definition the other person uses (or they just ignore it as stupid).

Thanks for sharing this informative topic. I totally agree with you that mocking the database might prove unnecessary in several cases. I can think of one case where we might need to do so. Say we have a scenario in which we want to test our service layer, which interacts with the data layer, and we want to see how our service layer responds to exceptions thrown from the data layer. Say that in a concurrent system with Hibernate as our JPA provider we want to see how our service layer behaves when the data layer throws an OptimisticLockException or any other exception. Then we can simulate it by mocking the data layer to throw the exception (which I feel would be easier than using a real database and trying to get the exception thrown).

I would like to apologise if what I stated here does not make much sense, as my coding experience is limited and I haven’t had much chance to write extensive test cases, so I cannot comment with confidence.

I can think of one case where we might need to do so. Say we have a scenario in which we want to test our service layer, which interacts with the data layer, and we want to see how our service layer responds to exceptions thrown from the data layer.

I think that this makes sense, and I mock my repositories (or DAOs) when I write unit tests for service classes.

Say that in a concurrent system with Hibernate as our JPA provider we want to see how our service layer behaves when the data layer throws an OptimisticLockException or any other exception. Then we can simulate it by mocking the data layer to throw the exception (which I feel would be easier than using a real database and trying to get the exception thrown).

This is a very good example. If something makes your life easier, it is definitely the right thing to do.
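The scenario from the comment can be sketched as follows, in Python with the standard library's unittest.mock so that it runs without dependencies. OptimisticLockError is a stand-in for JPA's OptimisticLockException, and the rename_person service method with its retry policy is purely hypothetical; the point is only that a mocked repository can be told to throw on demand, which is hard to arrange with a real database.

```python
from unittest.mock import Mock


class OptimisticLockError(Exception):
    """Stand-in for JPA's OptimisticLockException (an assumption of this sketch)."""


def rename_person(repository, person_id, last_name, max_attempts=3):
    """Hypothetical service method: retry on lock conflicts, then give up."""
    for _ in range(max_attempts):
        try:
            repository.update_last_name(person_id, last_name)
            return True
        except OptimisticLockError:
            continue
    return False


def test_gives_up_after_repeated_conflicts():
    repository = Mock()
    # Every call to the mocked repository method raises the exception.
    repository.update_last_name.side_effect = OptimisticLockError("stale version")
    assert rename_person(repository, 1, "Smith") is False
    assert repository.update_last_name.call_count == 3


def test_succeeds_when_a_retry_goes_through():
    repository = Mock()
    # The first call raises, the second call returns normally.
    repository.update_last_name.side_effect = [
        OptimisticLockError("stale version"),
        None,
    ]
    assert rename_person(repository, 1, "Smith") is True
    assert repository.update_last_name.call_count == 2
    repository.update_last_name.assert_called_with(1, "Smith")
```

Note that these tests target the service class, not the data access code: the mock replaces the repository in order to test how its caller reacts, which is a different goal from testing the repository itself.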

I would like to apologise if what I stated here does not make much sense, as my coding experience is limited and I haven’t had much chance to write extensive test cases, so I cannot comment with confidence.

It makes perfect sense. Also, remember that even if someone has written a lot of tests, it doesn’t necessarily mean that this person knows everything. Open discussion and asking questions is the best way to learn, and often learning requires that the student questions so-called authorities. :)

Hi Petri, you have given good logic to an idea that I have always felt was right. When I started writing tests for a legacy application, there were a hell of a lot of tests which made no sense. Out of this frustration I wrote an article along similar lines about 3 years back: http://blog.manupk.com/2011/11/when-to-replace-unit-tests-with.html

Also, I read your blog post, and your examples reminded me that if we write unit tests for data access code, sometimes we have to mock so many things that the test cannot fail. And if a test cannot fail, does it really have any value?

Nope.

In other words, we end up with a test suite which contains tests that are very brittle and don’t really test anything relevant, or tests which cannot fail.

Hi Petri. It’s a great article and rightly said, but incomplete. Could you please post an update with examples of how you advise doing integration tests in place of unit tests? The biggest problem with writing integration tests is that we can’t use the real production database, so we have to rely on an in-memory database, right? Many DAO tests require inserting data into the database, and we don’t want to do that during build-time test runs. If we do, we should have a reliable way to do a total rollback after the tests are complete. I once wrote a lot of integration tests where I was testing DAOs and writing records to a real database, but rolling them back after the tests. The problem was that if a test broke, it could skip the rollback and the records could stay in the database. What is the best way to use a real database for integration tests with reliable rollback?

You are right. This article (or this tutorial) won’t help you if you are looking for instructions on how to write the actual test classes. The goal of this tutorial is simply to identify the techniques which you should use when you write tests for data access code.

Could you please post an update with examples of how you advise doing integration tests in place of unit tests?

Are you using Spring Framework or Spring Boot? If so, what version?

The biggest problem with writing integration tests is that we can’t use the real production database, so we have to rely on an in-memory database, right?

Actually, you can also use a real database (such as PostgreSQL) when you run your integration tests. I guess the only reason why people use an in-memory database is that it is fast and it doesn’t require any setup after it’s configured for the first time.

Many DAO tests require inserting data into the database, and we don’t want to do that during build-time test runs. If we do, we should have a reliable way to do a total rollback after the tests are complete.

All my integration tests initialize the used database into a known state before an integration test method is run. This way I will know what data is found from the database when my tests are run. And yes, I know that my tests are slower and I cannot use large data sets but I can live with it because I don’t need to roll back the transaction after each test.
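One way to picture this approach is the sketch below, written with Python's built-in unittest and sqlite3 modules so that it runs without any setup (the tests mentioned in this post use JUnit and DbUnit instead). The shared connection stands in for a long-lived database server, and the persons table and the KNOWN_ROWS data set are assumptions. Each test method starts from the same rows no matter what an earlier test left behind, so no rollback after the test is needed.

```python
import sqlite3
import unittest

# One shared connection stands in for a long-lived database server.
SHARED = sqlite3.connect(":memory:")

# Hypothetical data set that every test method starts from.
KNOWN_ROWS = [(1, "Jane", "Smith"), (2, "Ann", "Jones")]


def reset_database(conn):
    """Bring the database into a known state, whatever earlier tests left behind."""
    conn.execute("DROP TABLE IF EXISTS persons")
    conn.execute(
        "CREATE TABLE persons (id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT)"
    )
    conn.executemany("INSERT INTO persons VALUES (?, ?, ?)", KNOWN_ROWS)
    conn.commit()


class PersonTableTest(unittest.TestCase):
    def setUp(self):
        # Runs before every test method.
        reset_database(SHARED)

    def test_delete_removes_one_row(self):
        SHARED.execute("DELETE FROM persons WHERE id = 1")
        SHARED.commit()  # the change is really committed, not rolled back
        remaining = SHARED.execute("SELECT id FROM persons").fetchall()
        self.assertEqual(remaining, [(2,)])

    def test_known_rows_are_present(self):
        rows = SHARED.execute(
            "SELECT id, first_name, last_name FROM persons ORDER BY id"
        ).fetchall()
        self.assertEqual(rows, KNOWN_ROWS)


def run_suite():
    """Run both test methods and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(PersonTableTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

The delete test commits its change, yet the other test still sees both rows because the reset happens before each method rather than after it. That also covers the case raised in the question: a test that crashes halfway cannot leave stray records for the next test to trip over.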

What is the best way to use a real database for integration tests with reliable rollback?