testing equivalences

Given my history of posts on this blog you may be forgiven for thinking that what follows is yet another discourse on the inner workings of Equivalence in Atlas. Refreshingly for both of us, that’s not the case this time. Today I’m going to be taking a look at Guava’s Equivalence type and the convenient companion test class, EquivalenceTester, from Guava’s testlib.

equivawhatnow?

So what is an Equivalence and why do we need one? It turns out that, while having an equals method on all types in Java is useful, it’s quite often the case that whether two objects are interchangeable depends on the context. An Equivalence provides a way of defining alternative equivalence relations over a given type.

Here’s a more concrete example. Every piece of Content in Atlas is assigned an identification number. In most cases, when two pieces of Content have the same ID we can assume that they are really the same thing, so the equals method simply checks that the IDs of the two pieces of Content match. Updating Content is a slightly more complex affair. Adapters will often write Content to Atlas even when nothing has changed, because they tend to poll the source data more frequently than it changes. Ideally, we don’t want that write to happen, especially when using MongoDB, where large numbers of writes can lock up a database. Even when the database isn’t the motivation, it’s still nice to know when something actually changes, to prevent unnecessary updates propagating through the system.

For a while now, our strategy for determining at write-time whether a piece of Content has changed has been based on a hash of the Content’s serialized form for MongoDB, essentially a large number of HashMaps. Just before the write is performed, the hash of the JSON map is compared with a hash which was computed in the same way at read-time. Since Atlas is moving away from MongoDB it sure would be nice to replace this slightly backward mechanism with something that’s easier to debug, test and reason about. And that’s where an Equivalence comes in handy.

equivahownow?

Defining an Equivalence for a type is no more difficult than implementing the equals and hashCode methods for that type, and probably slightly easier. Any concrete subclass of Equivalence has to implement two methods:

boolean doEquivalent(T a, T b): returns true if and only if a and b are equivalent. Must be reflexive, symmetric, transitive and consistent. Both references are guaranteed to be non-null.

int doHash(T a): returns a consistent hash code for a. If doEquivalent(a, b) is true then doHash(a) must equal doHash(b).
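To make that contract concrete, here is a minimal JDK-only sketch. SimpleEquivalence is a hypothetical stand-in written for illustration, not Guava’s class: it mirrors the two methods above and takes care of null references before delegating, which is why doEquivalent never sees a null. CaseInsensitive is likewise a made-up example. With Guava on the classpath you would extend com.google.common.base.Equivalence instead.

```java
import java.util.Locale;

// Hypothetical stand-in for Guava's Equivalence, kept JDK-only for illustration.
abstract class SimpleEquivalence<T> {

    // Public entry point: deals with identical and null references up front,
    // so doEquivalent only ever receives two non-null arguments.
    final boolean equivalent(T a, T b) {
        if (a == b) {
            return true;
        }
        if (a == null || b == null) {
            return false;
        }
        return doEquivalent(a, b);
    }

    // Public entry point for hashing; null hashes to 0 in this sketch.
    final int hash(T t) {
        return t == null ? 0 : doHash(t);
    }

    protected abstract boolean doEquivalent(T a, T b);

    protected abstract int doHash(T t);
}

// Treats strings that differ only by case as interchangeable.
final class CaseInsensitive extends SimpleEquivalence<String> {

    @Override
    protected boolean doEquivalent(String a, String b) {
        // Compare the same normalised form that doHash uses, so the two stay consistent.
        return a.toLowerCase(Locale.ROOT).equals(b.toLowerCase(Locale.ROOT));
    }

    @Override
    protected int doHash(String s) {
        return s.toLowerCase(Locale.ROOT).hashCode();
    }
}
```

Deriving both methods from the same normalised form is a cheap way of keeping the “equivalent implies equal hashes” rule intact.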

As you can see, it’s pretty similar to the standard equals and hashCode methods and, just as with those methods, getting an Equivalence just right is sometimes a little tricky. Fortunately, along with EqualsTester, Guava’s testlib also has an EquivalenceTester. Its usage is straightforward and might look a little like this:
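With testlib on the classpath the call chain is EquivalenceTester.of(equivalence).addEquivalenceGroup(a1, a2).addEquivalenceGroup(b1, b2).test(), where a1, a2, b1 and b2 are placeholder objects. As a rough, JDK-only sketch of the core checks behind that chain, the hypothetical GroupChecker below (not Guava code) takes the equivalence as a BiPredicate plus a hash function and verifies two groups of a made-up Item type:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.BiPredicate;
import java.util.function.ToIntFunction;

// Made-up content type: lastUpdated is bookkeeping that the equivalence ignores.
final class Item {
    final String title;
    final long lastUpdated;

    Item(String title, long lastUpdated) {
        this.title = title;
        this.lastUpdated = lastUpdated;
    }

    @Override
    public String toString() {
        return title + "@" + lastUpdated;
    }
}

// Hypothetical stand-in for the kind of checks EquivalenceTester performs.
final class GroupChecker<T> {
    private final BiPredicate<T, T> equivalent;
    private final ToIntFunction<T> hash;
    private final List<List<T>> groups = new ArrayList<>();

    GroupChecker(BiPredicate<T, T> equivalent, ToIntFunction<T> hash) {
        this.equivalent = equivalent;
        this.hash = hash;
    }

    @SafeVarargs
    final GroupChecker<T> addEquivalenceGroup(T... items) {
        groups.add(Arrays.asList(items));
        return this;
    }

    // Same group: must be equivalent with equal hashes. Different groups: must not be equivalent.
    void test() {
        for (int g = 0; g < groups.size(); g++) {
            for (T a : groups.get(g)) {
                for (int h = 0; h < groups.size(); h++) {
                    for (T b : groups.get(h)) {
                        boolean expected = (g == h);
                        if (equivalent.test(a, b) != expected) {
                            throw new AssertionError(a + (expected ? " should be" : " should not be")
                                    + " equivalent to " + b);
                        }
                        if (expected && hash.applyAsInt(a) != hash.applyAsInt(b)) {
                            throw new AssertionError("hash mismatch for " + a + " and " + b);
                        }
                    }
                }
            }
        }
    }
}

final class GroupCheckerDemo {
    public static void main(String[] args) {
        new GroupChecker<Item>(
                (a, b) -> a.title.equals(b.title),
                item -> item.title.hashCode())
            .addEquivalenceGroup(new Item("Pilot", 1L), new Item("Pilot", 2L))
            .addEquivalenceGroup(new Item("Finale", 1L), new Item("Finale", 2L))
            .test();
        System.out.println("all groups hold");
    }
}
```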

The above will check that every piece of content in the first group is equivalent to every other piece in that group, and likewise for the content in the second group. It will also ensure that none of the first group is equivalent to any of the second.

So how can this be applied to our Content example from before? Well, ContentEquivalence’s doEquivalent needs to check all the relevant fields are equal (there are a few fields, like updated times, that we’re not interested in) while doHash can simply return the hash of the Content’s ID. As you might expect, the implementation of doEquivalent relies heavily on the null-safe equality provided by Objects.equal() but arguably the more interesting part is how it’s tested.
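A sketch of that shape, using a made-up Content with a handful of fields. The real class has many more, and the field names here are assumptions. java.util.Objects.equals stands in for Guava’s Objects.equal() to keep the example JDK-only, and in real code the two methods would live in a subclass of Equivalence rather than being static.

```java
import java.util.Objects;

// Hypothetical, heavily trimmed Content; field names are illustrative only.
final class Content {
    final long id;
    final String title;        // may be null
    final String description;  // may be null
    final long lastUpdated;    // bookkeeping, deliberately ignored below

    Content(long id, String title, String description, long lastUpdated) {
        this.id = id;
        this.title = title;
        this.description = description;
        this.lastUpdated = lastUpdated;
    }
}

final class ContentEquivalenceSketch {

    // Compares every field we care about; Objects.equals copes with null fields.
    static boolean doEquivalent(Content a, Content b) {
        return a.id == b.id
            && Objects.equals(a.title, b.title)
            && Objects.equals(a.description, b.description);
    }

    // Equivalent Content always shares an ID, so hashing the ID alone is consistent.
    static int doHash(Content content) {
        return Long.hashCode(content.id);
    }
}
```

Hashing only the ID means Content that differs in other fields collides, which is allowed: the contract only requires equivalent objects to share a hash, not the reverse.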

The unit tests for ContentEquivalence should break, without themselves needing modification, whenever Content-related classes are changed in a way that breaks the Equivalence. Because the tests run after the breaking modification, the failure automatically highlights that ContentEquivalence needs fixing. One way to achieve this is to add an equivalence group for each field in the Content class and give that field a value using reflection. For Episodes, one type of Content, it might look a little like this:
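What follows is a JDK-only sketch of the idea under stated assumptions, not the Atlas test itself. Episode is a trimmed-down stand-in with two fields, a plain Map plays the role of the ClassToInstanceMap behind valueFor(), and SetterCoverage and EpisodeEquivalenceSketch are hypothetical names. In the real test each generated instance would be handed to the tester as its own equivalence group; here a simple pairwise check does the same job.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.function.BiPredicate;

// Hypothetical, trimmed-down Episode with two settable fields.
class Episode {
    private String title;
    private Integer episodeNumber;

    public void setTitle(String title) { this.title = title; }
    public void setEpisodeNumber(Integer episodeNumber) { this.episodeNumber = episodeNumber; }
    public String getTitle() { return title; }
    public Integer getEpisodeNumber() { return episodeNumber; }
}

final class SetterCoverage {
    // One sample value per setter parameter type (stand-in for a ClassToInstanceMap).
    private static final Map<Class<?>, Object> SAMPLES = new HashMap<>();
    static {
        SAMPLES.put(String.class, "sample");
        SAMPLES.put(Integer.class, 42);
    }

    static Object valueFor(Class<?> type) {
        Object value = SAMPLES.get(type);
        if (value == null) {
            throw new IllegalStateException("no sample value for " + type);
        }
        return value;
    }

    // Builds one instance per group: instance i has had the first i setters called.
    static <T> List<T> cumulativeInstances(Class<T> type) {
        List<Method> setters = new ArrayList<>();
        for (Method method : type.getMethods()) {
            if (method.getName().startsWith("set") && method.getParameterCount() == 1) {
                setters.add(method);
            }
        }
        // getMethods() makes no ordering promise, so sort for repeatable groups.
        setters.sort(Comparator.comparing(Method::getName));
        List<T> instances = new ArrayList<>();
        try {
            for (int count = 0; count <= setters.size(); count++) {
                T instance = type.getDeclaredConstructor().newInstance();
                for (Method setter : setters.subList(0, count)) {
                    setter.invoke(instance, valueFor(setter.getParameterTypes()[0]));
                }
                instances.add(instance);
            }
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
        return instances;
    }

    // Every instance is its own group, so no two distinct instances may be equivalent.
    static <T> void assertAllDistinct(List<T> instances, BiPredicate<T, T> equivalent) {
        for (int i = 0; i < instances.size(); i++) {
            for (int j = i + 1; j < instances.size(); j++) {
                if (equivalent.test(instances.get(i), instances.get(j))) {
                    throw new AssertionError("groups " + i + " and " + j
                            + " are equivalent; is a field missing from the equivalence?");
                }
            }
        }
    }
}

final class EpisodeEquivalenceSketch {
    static boolean equivalent(Episode a, Episode b) {
        return Objects.equals(a.getTitle(), b.getTitle())
            && Objects.equals(a.getEpisodeNumber(), b.getEpisodeNumber());
    }
}
```

Calling SetterCoverage.assertAllDistinct(SetterCoverage.cumulativeInstances(Episode.class), EpisodeEquivalenceSketch::equivalent) passes as written. Add a third setter to Episode without touching the equivalence and two groups collapse into one, which is the failure we want.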

The above traverses all the “setter” methods of the Episode class and uses valueFor(), which relies on a ClassToInstanceMap, to call those methods on new Episodes for each equivalence group. Each additional equivalence group in the tester has one more method called on the Episodes in that group. When a new field, and therefore a new setter method, is added to the Episode class, an additional group will be added to the tester. Unless ContentEquivalence has also been updated, the new group’s members will be equivalent to those of an existing group, and so the test will fail. Using setter methods to configure each object is not ideal, but it does leave scope for switching to a Builder pattern and looking for methods starting with “with”.

Overall this solution is much more efficient, maintainable and hopefully future-proof than the previous approach of determining equivalence through the serialized form. It also means that this sort of “deep” equivalence check can be performed anywhere without having to allocate all the HashMaps.