Testing The Right Way

Let’s assume the code functions properly: doesn’t suffer from scaling issues, performance issues, security issues, memory leaks, etc.
If the code functions properly and you’re passing judgment based on syntax, some standard, different ways of doing the same thing, etc… I disagree.

When you judge somebody’s code, you should generally base it on ONE thing: Testability. Everything else is just an opinion.

What do I mean by “testability”? The ability to unit test your code with ease. Testability also implies loose coupling, maintainability, readability, and more. You’ve heard that term “TDD” being thrown around by different software engineers, right? If I ask a developer to explain Test-Driven Development, 9 times out of 10 he/she would explain that it’s the practice of writing tests first. I disagree. Writing tests before code certainly forces you to write your code in a more test-driven manner… but TDD has less to do with when you write your tests and more to do with being mindful of writing testable code. Whether you write your tests before or after your code, you can be very test-driven!

I noticed most developers also don’t truly understand the difference between an integration test and a unit test, so before we discuss unit tests, let’s briefly cover the 3 major types of tests: Unit tests, integration tests, and end-to-end tests.

End-to-end Tests:

E2E tests describe testing the application from end to end. This involves navigating through the workflow/UI, clicking buttons, etc. Although this should be performed by both developer and QA, comprehensive E2E testing should be performed by QA. These tests can be (and ideally, should be) expedited through automated testing tools such as selenium.
Cons:
– Failed E2E tests don’t tell you where the problem lies. It simply tells you something isn’t working.
– E2E tests take a long time to run
– No matter how comprehensive your E2E tests are, it can’t possibly test every edge case or piece of code

Integration Tests:

Integration tests (aka functional tests, API tests) describe tests against your service, API, or the integration of multiple components. These tests can be maintained by either developer or QA, but ideally the QA team should be equipped to perform comprehensive integration tests with tools such as SoapUI. Once a service contract is established, the QA team can start writing tests at the same time the developer starts writing code. Once complete, the integration test can be used to test the developer’s implementation of the same contract. Note: Intended unit tests that cover multiple layers of code/logic is also considered an integration test.
Cons:
– Failed integration tests don’t tell you exactly where the problem lies, although it narrows it down more than an E2E test.
– Integration tests take longer to run than unit tests
– Integration tests may or may not make remote calls, write entries to the DB, write to disk, etc.
– It takes many more integration tests to cover what unit tests cover (combinatoric)

Unit Tests:

Unit tests are laser focused to test a small unit of code. It should not test external dependencies as they should be mocked out. Properties of a unit test include:
– Able to be fully automated
– Has full control over all the pieces running (Use mocks or stubs to achieve this isolation when needed)
– Can be run in any order if part of many other tests
– Runs in memory (no DB, file access, remote calls, for example)
– Consistently returns the same result (You always run the same test, so no random numbers, for example. save those for integration or range tests)
– Runs fast
– Tests a single logical concept in the system (and mocks out external dependencies)
– Readable
– Maintainable
– Trustworthy (when you see its result, you don’t need to debug the code just to be sure)
– Should contain little to no logic in the test. (Avoid for loops, helper functions, etc)

Why unit tests are the most important types of tests:
– Acts as documentation for your code
– A high unit test code coverage means your code is well tested
– A failed test is easy to fix
– A failed test pinpoints where your code broke
– A developer can be confident he/she won’t introduce regressions when modifying a well tested codebase
– Can catch the most potential bugs out of all three test types
– Encourages good, loosely-coupled code

How to write unit tests

I’m going to demonstrate unit tests in Python, but please note that writing unit tests in Python is more forgiving than say, Java or C# because of its monkey patching capabilities, duck-typing, and multiple inheritance (no Interfaces). Please follow my example from https://github.com/cranklin/dog-breed-api which is my sample Python/Flask API.

Let’s take a look at a snippet of code I wish to test in dogbreed/actions.py. The method is Actions.get_all_breeds().

Notice this method makes a call to an external dependency… an object or class called “Breed”. Breed happens to be a SQLAlchemy Model (ORM). Many online sources will encourage you to utilize the setUp() and tearDown() methods to initiate and clear your database and allow the tests to make calls to your DB. I understand it’s difficult to mock out the ORM, but this is wrong. Mock it out! You don’t want to write to the DB or filesystem. It’s also not your responsibility to test anything outside the scope of Actions.get_all_breeds(). As long as your method does exactly what it’s supposed to do and honors its end of the contract, if something breaks, it’s not the method’s fault.

Here’s how I tested it in dogbreed/tests/test_actions.py. The test method is called ActionsTest.test_get_all_breeds().

I’m using the Mock library to create two different mock Breed models when initiating this test. Once I have that in place, I can mock out the Breed.query object and let its all() method return a list of the two mock breed models I set up earlier. In Python, we are fortunate enough to be able to patch objects/methods on the fly, and run the tests within the patched object context.Note: In Java, C#, or other strict OOP languages, this is not possible. Therefore, it is considered good practice in these languages to inject your dependencies and utilize the respective interface class to generate a mock object of its dependencies and inject the mock object in place of the dependencies in your tests. Yes. Python devs are spoiled.
Now that I’ve mocked/patched the dependencies out, we run the class method. The things you should remember to test for:
– how many times did you call its dependencies
– what arguments did you call its dependencies with
– is the response what you expected?

Now let’s look at how I tested the service layer of the API that calls this method. This can be found in dogbreed.routes.py:

This opens up the endpoint ‘/api/breeds’ which calls an external dependency which is the Actions class. The Actions.get_all_breeds() method is what we already tested above, so we can mock it out. The test for this endpoint can be found in dogbreed/tests/test_views.py.

Once again, I’ve patched the external dependency with a mock object that returns what it’s meant to return. With these service layer tests, what I’m mainly interested in, is that the dependency is called with the proper arguments. Notice the isolation of each test? That’s how unit tests should work!

But something is missing here. So far, I’ve only tested the happy path cases. Let’s test for an exception to be properly raised. In this particular method, we allow a user to vote on a dog as long as the user hasn’t cast a vote before. This is a snippet from dogbreed/actions.py

You’ll notice that if there is an entry found in the client database which matches the user agent of the voter, it raises an exception NotAllowed.Note, in many other languages, it is considered poor practice to raise exceptions for cases that fall within the confines of normal business logic. Exceptions should be saved for true exceptions. However, Pythonistas for some reason consider it to be standard practice to utilize exceptions to bubble up errors, so don’t judge me for doing so.
In order to test that piece of logic, we can simply mock out Client.query to return an entry and it should induce that exception. This is a snippet from dogbreed/tests/test_actions.py

Once again, we verify that the dependency was called with the correct argument. We also verify that the proper exception was raised when we make the call to the tested method.

How about the service layer portion of this error? We’ve set up Flask to catch that particular exception and interpret it into a HTTP status code 403 with a message. Here is the endpoint for that call, found in dogbreed/routes.py:

In order to verify that the particular exception is handled correctly and interpreted to a HTTP 403, we mock out our dependency once again, and allow the mock to raise that same exception. This test is found in dogbreed/tests/test_views.py

Notice the Mock() object allows you to raise an exception with side_effect. Now we can raise an exception just as the Actions class would have raised, except we don’t even have to touch it! Now we can assert that the response data from the POST call has a status code of 403 and the proper error message associated with it. We also verify that the dependency was called with the proper arguments.

Remember I mentioned that unit tests are harder to write in Java or C#? Well, if Python didn’t have the luxury of patch(), we’d have to write our code like this:

Notice, I’ve injected the dependency in the constructor.In Java or C#, there is such a thing as constructor injection as well as setter injection. Even this type of dependency injection in Python does not compare to Java or C# because an interface is not necessary for us to generate a Mock and pass in because Python is a duck-typed language.
In order to test this, we’d do something like this:

Now, we’d generate a mock Breed.query object, assign its method all() to return our mock data, inject it into Actions when instantiating an Actions object, then run the object method “get_all_breeds()”. Then we make assertions against the response as well as assert that the mock object’s methods were called with the proper arguments. This is how one would write testable code and corresponding tests in a more Java-esque fashion… but in Python.

Furthermore, I categorize unit tests into two types: Contract tests and Collaboration tests.

Collaboration tests insure that your code interacts with its collaborators correctly. These verify that the code sends correct messages and arguments to its collaborators. It also verifies that the output of the collaborators are handled correctly.

Contract tests insure that your code implements its contracts correctly. Of course, contract tests aren’t as easily distinguishable in Python because of the lack of interfaces. However, with the proper use of multiple inheritance, a good Python developer SHOULD distinguish mixin classes that provide a HAS-A versus an IS-A relationship.

Running the test

In python, nose is the standard test runner. In this example, we run nosetests to run all of the tests:

Why did I run nosetests with the option “–with-coverage”? Because that tells us what percent of each module’s code I have covered with my tests. Additionally, –cover-package=dogbreed limits that to the modules within my app (and not include code coverage for all the third party pip packages under my virtual environment).

Why is coverage important? Because that is one way of determining if you have sufficient tests. Be warned, however. Even if you reach 100% code coverage, it doesn’t mean you’ve covered each of the edge cases. Also be warned, that often times it is impossible to reach 100% code coverage. For example, if you look at my nosetest results, you’ll notice that lines 10-25 are not covered in dogbreed/base/models.py and only 45% of it is covered. You’ll also notice that dogbreed/models.py is only 80% covered. In my particular example, SQLalchemy is very difficult to test without actually writing to the DB. What’s most important, however, is that any code that contains business logic and the service layer is fully covered. As you can see, I have achieved that with my tests.In Java and/or C#, private constructors cannot be tested without using reflections… in which case, it’s just not worth it. Hereby presenting another good reason why 100% coverage may not be reached.

Ironically, as difficult as it may seem to write testable code in Java and/or C#, (and as simple as it is to write testable code in Python) I find that Java, C# developers tend to display more discipline when it comes to writing good, testable code.

Python is a great language, but the lack of discipline that is common amongst Python and Javascript developers is unsettling. People used to complain that PHP encouraged bad code, yet PHP at least encourages the use of Interfaces, DI, and IoC with most of the popular frameworks such as Laravel, Symphony2, Zend2, etc!

Perhaps it’s because new developers seem to crash course software engineering and skip important topics such as testing?
Perhaps it’s because lately, less developers learn strict typed languages like Java or C++ and jump straight into forgiving, less disciplined languages?
Perhaps it’s because software engineers are pushed to deliver fast code over good code?

It was not easy to find someone who mocks DB for unit tests of flask application.

I have a question:
Why when you test get_all_breeds (class method) you create mock_breed_one_data and mock_breed_two_data dicts with so much data inside (like real data)?
What I mean is most likely key names/amount will change in your app… of course test will still pass but your test data may start to be confusing as it will not represent real data anymore.

Would it be better to put just id there… or some other very simple data?

Anyway thank you very much for your post – I am trying to rewrite tests of my application as without unittests it is hard to extract parts of my app to separate repos.

Milosz, that’s an excellent question. To answer your question, I’ll typically preach London school (aka mockist) TDD practices as it requires more discipline vs the Detroit school (aka classicist) TDD practices. Neither approach is incorrect as both are just different denominations to testing. Personally, I think each provides advantages at different times… for example, when backfilling legacy code with tests before a refactor, Detroit (classic) will likely suit you better. Neither is wrong, but since most “testers” seem to make their tests too “integration-ish”, I’ll preach the mockist approach and leave it up to the reader’s discretion to employ the right testing practices.