Clojure: How To Prevent "Expected Map, Got Vector" And Similar Errors

What my Clojure code is doing most of the time is transforming data. Yet I cannot see the shape of the data being transformed - I have to know what the data looks like on input and hold a mental model of how it changes at each step. But I make mistakes. I make mistakes in my code so that the data no longer corresponds to the model it should follow. And I make mistakes in my mental model of what the data currently looks like, which likely leads to a code error later on. The end result is the same - a not very helpful exception at some later step about the wrong shape of the data. There are two problems here: the error typically provides too little useful information, and it usually manifests later than where the code/model mistake actually is. I therefore easily spend an hour or more troubleshooting these mistakes. In addition, such code is hard to read because a reader lacks the writer's mental model of the data and has to derive it herself - which is quite hard, especially if the shape of the input data is not clear in the first place.

I should mention that I of course write tests and experiment in the REPL, but I still hit these problems, so it is not enough for me. Tests cannot protect me from having a wrong model of the input data (since I write the [unit] tests based on the same assumptions as the code and only discover the error when I integrate all the bits), and even if they help to discover an error, it is still time-consuming to find the root cause.

Can I do better? I believe I can.

Hard-to-troubleshoot errors with delayed manifestation and hard-to-understand code that communicates only half of the story (the transformations but not the shape of the data being transformed) are the price we pay for the power of dynamic typing. But there are strategies to lower this price. I want to present three of them: small, focused functions with good names, destructuring as documentation, and judicious use of pre- and post-conditions.

The content of this post is based on what I learned from Ulises Cerviño Beresi during one of the free Clojure Office Hours he generously offers, similarly to Leif in the US.

So we need to make the shape of data more obvious and to fail fast, preferably with a helpful error message.

The main idea is:

Break transformations into small, simple functions with clear names

Use destructuring in function arguments to document what data is expected

Use pre- and post-conditions (and/or asserts) both as checks and documentation

(All the testing and interactive exploration in REPL that you already do.)
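The three ideas above can be sketched together in one small function. This is a minimal illustration, not the code from the original gist; the `:price` and `:discount-rate` keys and the name `discounted-price` are assumptions made for the example.

```clojure
;; Hypothetical car shape, assumed for this sketch: {:price 10000 :discount-rate 0.25}
(defn discounted-price
  "Returns the price of a car after applying its discount rate."
  [{:keys [price discount-rate] :as car}]   ; destructuring documents the expected keys
  {:pre  [(map? car)
          (number? price)
          (number? discount-rate)
          (<= 0 discount-rate 1)]
   :post [(number? %) (<= 0 % price)]}
  (* price (- 1 discount-rate)))

;; A well-formed car passes all checks:
(discounted-price {:price 10000 :discount-rate 0.25})

;; A vector instead of a map fails immediately with an AssertionError
;; that names the violated condition, here (map? car):
;; (discounted-price [10000 0.25])
```

The failure happens at the boundary of the function that received the wrong shape, not several steps later, and the signature tells a reader which keys the function relies on.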

A simplified example

We have a webshop that sells discounted cars. We also have occasional campaigns with increased discounts for selected cars. For each car we also have a number of keywords people can use to find it and categories it belongs to. Below is code that processes raw car + campaigns + search keywords data from a DB query, first the original and then the refactored one with checks:

A real-world example

We have a webshop that sells discounted cars. Each car we sell has a base discount (either an absolute amount or a percentage) and we also have occasional campaigns for selected cars. For each car we also have a number of keywords people can use to find it.

Original code

Below is code that processes raw car + campaigns + search keywords data from a DB query, selecting the best applicable campaign and computing the final discount:

Defects and me

I originally had two [discovered] errors in the code and both took me quite a while to fix - first I forgot to convert JSON from a string into a map (a wrong assumption about the input data) and then I ran merge-campaigns directly on the list of car+campaign lists instead of mapping it over that list (the sequential? precondition did not help to detect this error). So the transformations are clearly too error-prone.
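The second mistake can be reproduced in miniature. The sketch below is an illustration under assumptions, not the original code: `merge-campaigns-loose` and `merge-campaigns-strict` are hypothetical stand-ins for merge-campaigns, and the `:id` and `:campaign` keys are invented. It shows why `sequential?` cannot tell a list of car maps from a list of such lists, while a check on the elements can.

```clojure
(defn merge-campaigns-loose
  "Hypothetical: collapses the rows of one car (one row per campaign) into a single car."
  [car+campaigns]
  {:pre [(sequential? car+campaigns)]}          ; also true for a list of lists
  (assoc (first car+campaigns)
         :campaigns (mapv :campaign car+campaigns)))

(defn merge-campaigns-strict
  "Same as above, but the precondition also checks the shape of each element."
  [car+campaigns]
  {:pre [(sequential? car+campaigns)
         (every? map? car+campaigns)]}
  (merge-campaigns-loose car+campaigns))

(def rows-by-car                                 ; a list of car+campaign lists
  [[{:id 1 :campaign "summer"} {:id 1 :campaign "xmas"}]
   [{:id 2 :campaign "summer"}]])

;; Intended use: one call per car
(map merge-campaigns-strict rows-by-car)

;; Forgetting the `map`: the loose version passes its precondition and only
;; throws later from inside `assoc`; the strict version fails right away
;; on (every? map? car+campaigns).
;; (merge-campaigns-loose rows-by-car)
;; (merge-campaigns-strict rows-by-car)
```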

The stack traces did not contain enough helpful context info (though a more experienced Clojurist would have certainly found and fixed the root causes much faster):

(def count-price (with-valid-car (fn [car] (do-something car))))
;; or make & use a macro to make it nicer
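The text does not show how `with-valid-car` is defined, so the following is only one possible sketch: a higher-order function that wraps any car-processing function with a shared precondition. The `valid-car?` predicate, the `:price` key, and the body of `do-something` are assumptions for illustration.

```clojure
(defn valid-car?
  "Hypothetical check; a real one would test whatever a car must contain."
  [car]
  (and (map? car) (number? (:price car))))

(defn do-something
  "Hypothetical stand-in for the real processing step."
  [car]
  (:price car))

(defn with-valid-car
  "Wraps f so that its single argument is checked before f is called."
  [f]
  (fn [car]
    {:pre [(valid-car? car)]}   ; fn accepts the same condition map as defn
    (f car)))

(def count-price
  (with-valid-car (fn [car] (do-something car))))

(count-price {:price 100})      ; passes the shared check
;; (count-price [100])          ; AssertionError on (valid-car? car)
```

Keeping the check in one wrapper avoids repeating the same `:pre` vector in every function that takes a car.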

What about static types

This arguably looks like a good case for static types. And yes, I come from Java and miss them. On the other hand, even though static typing would solve the main category of problems, it creates new ones and has its limits.

A) I actually have quite a number of "types" here, so it would require a lot of classes to model them fully:

Raw data from the DB - car with campaign fields and keywords, category_ref as java.sql.Array

Car with keywords as a sequence

Car with category_ref as a sequence

Car with a nested :campaign "object"

Car with a nested :best-campaign object and with :rate (you could have :rate there from start, set initially to nil, but then you'd still need to ensure that the final function sets it to a value)

B) A key strength of Clojure is the use of generic data structures - maps, vectors, lazy sequences - and powerful, easy to combine generic functions operating on them. It makes integrating libraries very easy since everything is just a map (and not a custom type that needs to be converted) and you can always transform these with your good old friends, functions - whether it is a Korma SQL query definition, a result set, or an HTTP request. Static types take this away.

C) Types permit only a subset of checks that you might need (that is unless you use Haskell :)) - they can check that a thing is a car but not that a return value is in the range 7 ... 42.
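A range constraint of this kind is straightforward to express as a postcondition. The function below is a made-up example (the name `scaled-score` and its doubling logic are assumptions), shown only to illustrate checking a return value against the range 7 ... 42.

```clojure
(defn scaled-score
  "Hypothetical: doubles n; the result is required to lie in the range 7 ... 42."
  [n]
  {:pre  [(number? n)]
   :post [(<= 7 % 42)]}          ; % is the return value
  (* 2 n))

(scaled-score 10)                ; 20 is within the range, so it is returned
;; (scaled-score 30)             ; 60 violates the postcondition -> AssertionError
```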

D) Some functions do not care about the whole type, only a small part of it - e.g. jdbc-array-to-set only cares about the argument being a map, having the key, and, if set, the value being a java.sql.Array.
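Those three requirements map directly onto a precondition. The sketch below is a guess at what such a function could look like, not the author's jdbc-array-to-set: the name `jdbc-array->set`, its two-argument signature, and the `reify` stand-in used in place of a real database array are all assumptions.

```clojure
(defn jdbc-array->set
  "Replaces the java.sql.Array under key k in map m with a set of its elements.
   A sketch only; the real jdbc-array-to-set may differ."
  [m k]
  {:pre  [(map? m)
          (contains? m k)
          (or (nil? (get m k)) (instance? java.sql.Array (get m k)))]
   :post [(map? %) (or (nil? (get % k)) (set? (get % k)))]}
  (update m k (fn [^java.sql.Array arr]
                (when arr (set (.getArray arr))))))

;; Stand-in for an array coming from a JDBC driver (assumption for the demo):
(def fake-sql-array
  (reify java.sql.Array
    (getArray [_] (into-array String ["suv" "4x4"]))))

(jdbc-array->set {:id 1 :category_ref fake-sql-array} :category_ref)
(jdbc-array->set {:id 2 :category_ref nil} :category_ref)   ; an unset value is allowed
;; (jdbc-array->set {:id 3} :category_ref)                  ; missing key -> AssertionError
```

Nothing here requires the map to be a "car"; any map with the key qualifies, which is the point made above.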

What else is out there?

Conclusion

Using smaller functions and pre+post conditions, I can discover errors much earlier and also document the expected shape of the data better, even more so with destructuring in fn signatures. There is some duplication in the pre/post conditions and the error messages are still not very helpful, but it is much better than before. I guess that more complex cases may warrant the use of core.contracts or even core.typed / schema.

What strategies do you use? What would you improve? Other comments?

I encourage you to fork and improve the gist and share your take on it.

Updates

Lawrence Krubner recommends using dire to capture the arguments and return value to provide a useful error message

Alf Kristian recommends adding more tests and integration tests and if it is not enough, using core.typed rather than :pre and :post (example)