Tuesday, September 25, 2012

Replacing Common Code With clojure.set Function Calls

If you've written a fair amount of Clojure code and aren't familiar with clojure.set, then chances are you've reinvented a few functions that are already available in the standard library. In this blog post I'll give a few examples of commonly written code, and I'll show the clojure.set functions that already do everything you need.

Removing elements from a collection is a very common programming task. When the collection is a vector or a list, removing elements will typically look similar to the example below.

user=> (remove #{1 2} [1 2 3 4 3 2 1])
(3 4 3)

In cases where you're starting with a list or vector and you want a seq back, remove is a good solution. However, you may also find yourself starting with a set or needing to return a set.

If you're starting with sets, you'll probably get a performance gain by using clojure.set/difference. And if you need a set returned, clojure.set/difference is less code, and likely faster, than calling clojure.core/set on the result of clojure.core/remove.

clojure.set/difference is simple to use. From the docs:

Usage: (difference s1)
(difference s1 s2)
(difference s1 s2 & sets)
Return a set that is the first set without elements of the remaining sets

A simple example of using clojure.set/difference can be found below.

user=> (clojure.set/difference #{1 2 3 4 5} #{1 2} #{3})
#{4 5}
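
To make the remove-versus-difference comparison concrete, here's a small sketch with arbitrary example sets. Wrapping the seq returned by remove in set gives the same result as clojure.set/difference, which returns a set directly.

```clojure
(require 'clojure.set)

;; Arbitrary example sets
(def s1 #{1 2 3 4 5})
(def s2 #{1 2})

;; remove returns a seq, so it has to be wrapped in set to get a set back
(def via-remove (set (remove s2 s1)))

;; difference takes sets and returns a set directly
(def via-difference (clojure.set/difference s1 s2))

(= via-remove via-difference)
;; => true
```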

Transforming data in Clojure is something I do very often. On many occasions I've had a list of maps that I wanted indexed by one or more values. This is fairly easy to do with reduce and update-in.
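
Here's a sketch of the reduce + update-in approach. The people vector and its keys are hypothetical sample data. Each key in the result is a map containing only :age, and each value is the list of maps that share that age.

```clojure
;; Hypothetical sample data: a list of maps
(def people [{:name "jay"  :age 33}
             {:name "mike" :age 34}
             {:name "john" :age 33}])

;; Build a map from {:age n} to the maps with that age.
;; conj onto a missing (nil) value creates a list, so each value is a list
;; and later maps end up at the front.
(def by-age
  (reduce (fn [result person]
            (update-in result [(select-keys person [:age])] conj person))
          {}
          people))

by-age
;; => {{:age 33} ({:name "john", :age 33} {:name "jay", :age 33}),
;;     {:age 34} ({:name "mike", :age 34})}
```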

The reduce + update-in combo is a good one, but clojure.set/index is even better, since it's more concise and doesn't require you to define an anonymous function.

clojure.set/index is also very straightforward to use. From the docs:

Usage: (index xrel ks)
Returns a map of the distinct values of ks in the xrel mapped to a set of the maps in xrel with the corresponding values of ks.

You can get very similar results by using clojure.set/index.
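
Here's a sketch using a small hypothetical vector of maps, indexed by :age. The keys of the result are maps containing only the indexed keys, and the values are sets rather than lists.

```clojure
(require 'clojure.set)

;; Hypothetical sample data: a list of maps
(def people [{:name "jay"  :age 33}
             {:name "mike" :age 34}
             {:name "john" :age 33}])

;; index groups the maps by their values for the given keys;
;; each key is a map like {:age 33} and each value is a set of maps
(def by-age (clojure.set/index people [:age]))

(get by-age {:age 34})
;; => #{{:name "mike", :age 34}}
```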

It is worth noting that the reduce + update-in example has seqs as values and can contain duplicates, and the clojure.set/index example has sets as values and will not contain duplicates. In practice, this has never been an issue for me.

Another common case while working with collections is finding the elements that are in both collections. Since sets are functions (and can be used as predicates), finding common elements is as simple as the following Clojure.

user=> (filter (set [1 2 3]) [2 3 4])
(2 3)

Similar to the clojure.set/difference example, if you have lists or vectors in and you want a seq out, you may want to stick to using filter. However, if you are already working with sets or you can easily convert to sets, you'll probably want to take a look at clojure.set/intersection.

Usage: (intersection s1)
(intersection s1 s2)
(intersection s1 s2 & sets)
Return a set that is the intersection of the input sets

To get results similar to the above example, simply call clojure.set/intersection in a similar way to the example below.

user=> (clojure.set/intersection #{1 2 3} #{2 3 4})
#{2 3}
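
As a quick sanity check, converting the filter result to a set gives the same answer as intersection. The sketch below also shows that intersection accepts more than two sets, per its documented arities. The third set is arbitrary example data.

```clojure
(require 'clojure.set)

;; filter with a set as the predicate returns a seq...
(def via-filter (filter (set [1 2 3]) [2 3 4]))

;; ...while intersection takes sets and returns a set
(def via-intersection (clojure.set/intersection #{1 2 3} #{2 3 4}))

(= (set via-filter) via-intersection)
;; => true

;; intersection also accepts more than two sets
(clojure.set/intersection #{1 2 3} #{2 3 4} #{3 4 5})
;; => #{3}
```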

In a codebase I was once working on, I stumbled upon code that inverted a map by hand, a job that clojure.set/map-invert already does.
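
A hand-rolled inversion of that sort might look like the sketch below. It is an illustration, not necessarily the code in question, and invert is a hypothetical helper name. clojure.set/map-invert, documented as returning "the map with the vals mapped to the keys", does the same job in one call.

```clojure
(require 'clojure.set)

;; Hypothetical hand-rolled helper: swap each key/value pair with reduce.
;; If two keys share a value, the later pair wins.
(defn invert [m]
  (reduce (fn [result [k v]] (assoc result v k)) {} m))

(invert {:a 1 :b 2})
;; => {1 :a, 2 :b}

;; clojure.set/map-invert does the same thing
(clojure.set/map-invert {:a 1 :b 2})
;; => {1 :a, 2 :b}
```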

Similar to clojure.set/index, you'll want to take note of the result being a set and not a list, and just like clojure.set/index, this isn't something that ends up causing a problem in practice.

The rename and rename-keys functions of clojure.set are very similar, and both can be helpful when you're passing around similar data structures that simply need a few keys renamed to play nicely with existing code.

Without rename and rename-keys, you'd typically get the job done by hand, copying values to new keys and dissociating the old ones.
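
The sketch below shows a by-hand rename. The :fname and :first-name keys, the people data, and the rename-fname helper are hypothetical, chosen only for illustration.

```clojure
;; Hypothetical data whose :fname key needs to become :first-name
(def people [{:fname "Jay"  :age 33}
             {:fname "Mike" :age 34}])

;; Rename a key in a single map by hand: copy the value, drop the old key.
;; (Unlike rename-keys, this adds :first-name even when :fname is absent.)
(defn rename-fname [m]
  (-> m
      (assoc :first-name (:fname m))
      (dissoc :fname)))

(rename-fname (first people))
;; => {:age 33, :first-name "Jay"}

;; Apply the same rename to every map in the collection
(map rename-fname people)
;; => ({:age 33, :first-name "Jay"} {:age 34, :first-name "Mike"})
```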

The rename & rename-keys functions are very straightforward, and their documentation is below.

Usage: (rename xrel kmap)
Returns a rel of the maps in xrel with the keys in kmap renamed to the vals in kmap
Usage: (rename-keys map kmap)
Returns the map with the keys in kmap renamed to the vals in kmap
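
Here's a brief sketch of both functions, using hypothetical :fname/:first-name keys. rename-keys works on a single map. rename works on a collection of maps (a "rel") and returns a set.

```clojure
(require 'clojure.set)

;; rename-keys works on a single map
(clojure.set/rename-keys {:fname "Jay" :age 33} {:fname :first-name})
;; => {:age 33, :first-name "Jay"}

;; rename applies the same renaming to every map in a rel
(def renamed
  (clojure.set/rename [{:fname "Jay" :age 33} {:fname "Mike" :age 34}]
                      {:fname :first-name}))

;; the result is a set of maps, not a seq
(set? renamed)
;; => true
```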

If you've gotten this far, I'll assume you already understand how to use filter. The clojure.set namespace has a function that's very similar to filter, but it returns a set. If you don't need a set, you're better off sticking with filter; however, if you're working with sets, you might save yourself a few keystrokes and microseconds by using clojure.set/select instead.

Below are the documentation and an example.

Usage: (select pred xset)
Returns a set of the elements for which pred is true

user=> (clojure.set/select odd? #{1 2 3 4})
#{1 3}
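
For comparison, here's the filter route on the same set. filter returns a seq, so you'd wrap it in set to get a set back. select hands you the set directly.

```clojure
(require 'clojure.set)

;; filter on a set returns a seq, so wrap it in set to get a set back
(def via-filter (set (filter odd? #{1 2 3 4})))

;; select returns a set directly
(def via-select (clojure.set/select odd? #{1 2 3 4}))

(= via-filter via-select)
;; => true
```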

The clojure.set/subset? and clojure.set/superset? functions are also straightforward to use, and probably don't benefit from an example of how to create the same results on your own. Their docs are below.

Usage: (subset? set1 set2)
Is set1 a subset of set2?
Usage: (superset? set1 set2)
Is set1 a superset of set2?
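
For reference, here are a few brief usage examples with arbitrary sets. Note that both functions read as "is the first argument a subset/superset of the second?"

```clojure
(require 'clojure.set)

(clojure.set/subset? #{1 2} #{1 2 3})
;; => true

(clojure.set/superset? #{1 2} #{1 2 3})
;; => false

(clojure.set/superset? #{1 2 3} #{1 2})
;; => true
```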

The final function I will document is clojure.set/union. If you needed a list of the unique elements resulting from combining two or more lists, you could get the job done with a combination of concat, reduce, and/or set. You could even do it without using the set function or a set data structure at all. Note: using a set would likely be both more efficient and more readable; doing without sets is possible, but I do not recommend that you code in this way.
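
Here's a sketch of the no-sets approach. union-without-sets is a hypothetical helper name. It concatenates the inputs and uses reduce to keep only the first occurrence of each element. Every membership check scans the accumulated vector, which is exactly why a set is the better choice.

```clojure
;; Hypothetical helper: combine the lists with concat, then keep only the
;; first occurrence of each element. No set function or set literal is used,
;; so each membership check is a linear scan of the accumulated vector.
(defn union-without-sets [& colls]
  (reduce (fn [acc x]
            (if (some #(= x %) acc)
              acc
              (conj acc x)))
          []
          (apply concat colls)))

(union-without-sets [1 2] [2 4 3 1])
;; => [1 2 4 3]
```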

Truthfully, I don't tend to think about 'union' unless I'm already thinking about sets. In Clojure, clojure.set/union is defined to take multiple sets and return the union of each of those sets (as you'd expect).

Usage: (union)
(union s1)
(union s1 s2)
(union s1 s2 & sets)
Return a set that is the union of the input sets

Finally, the example below shows the union function in action.

user=> (clojure.set/union #{1 2} #{2 4 3 1})
#{1 2 3 4}

The clojure.set namespace does define one additional function, clojure.set/join. To be honest, I haven't used join in production and I don't believe that I'm writing my own inferior versions within my codebases. So, I don't have an example for you, but I do like the examples on clojuredocs.org and I would encourage you to go check them out: http://clojuredocs.org/clojure_core/1.2.0/clojure.set/join