Introduction

I was recently tasked with building a web service which facilitated the creation of Bleacher Report’s Emoji Is Life story: a high-level summary of the 2016 NBA playoffs, as told through emoji.

The service was used during the story’s editorial phase to determine which emoji were being used most frequently when people tweeted during games’ key moments.

A separate service, using Twitter’s Streaming API, was responsible for ingesting tweets sent out during each playoff game and storing them in a database. Only tweets that met a number of qualifications were kept: they had to @mention one or both teams, contain at least one emoji, etc.

My service was then responsible for making that data accessible and digestible to editors, who used it to surface interesting “moments” in each game.

Overview

The service was built using Clojure, Docker and PostgreSQL, and deployed to Heroku. The data was exposed via a REST API, built using Ring, which made it possible for editors to interactively explore the different resources: series, games, moments, etc. For example, 3700 people authored tweets using the 😳 emoji after Waiters elbowed Ginobili during the Thunder’s win over the Spurs in game 2 of the Western Conference semifinals.
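For readers who haven't used Ring: a handler is just a function from a request map to a response map. The sketch below is not the service's actual code. The `/moments/:id` route, the `moments` data and the EDN response format are all assumptions made for illustration, and it needs nothing beyond Clojure itself.

```clojure
(require '[clojure.string :as str])

;; Hypothetical in-memory stand-in for the PostgreSQL data.
(def moments
  {"1" {:id "1" :game-id "7" :description "Example moment"}})

(defn handler
  "Ring-style handler: takes a request map, returns a response map.
   Serves GET /moments/<id> and answers 404 for everything else."
  [{:keys [request-method uri]}]
  (let [[_ resource id] (str/split uri #"/")]
    (if (and (= :get request-method)
             (= "moments" resource)
             (contains? moments id))
      {:status  200
       :headers {"Content-Type" "application/edn"}
       :body    (pr-str (get moments id))}
      {:status  404
       :headers {"Content-Type" "text/plain"}
       :body    "Not found"})))

;; Handlers can be exercised at the REPL without starting a server:
(handler {:request-method :get :uri "/moments/1"})
```

Because handlers are plain functions, they are easy to test and to wrap in middleware; a real service would hand `handler` to a Ring adapter such as Jetty.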

Lessons Learned/Suspicions Confirmed

Docker

Docker and Docker Compose made standing up the application for development and testing trivial. Once Docker/Compose had been installed and the Dockerfile/docker-compose.yml config files had been authored, everything just worked. New contributors could jump in and have the project running on their machine within minutes.

PostgreSQL/Korma

We used the Korma library to interface with PostgreSQL and it was a pleasure to work with. Korma is a DSL that translates Clojure code into SQL statements. It also does useful things like preventing SQL injection by passing dynamic values to queries as parameters. Korma does require you to write more boilerplate than a more, ahem, active ORM would, but it provides more flexibility as a result.

Here are the entity definition and series query functions from emoji-api.db:

One Korma feature I wish I’d known about while working on this project is set-naming, which allows you to define a top-level strategy for translating non-standard table/column names. The framework used to scaffold the emoji/moment database used capital letters for table names and camel case for column names; instead of having to be cognizant of these quirks when defining entities, we could have defined conversion strategies once in the defdb declaration and used standard Clojure naming conventions throughout:

(defentity series
  (table :series)
  (has-many game {:fk :series-id}))
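As a rough sketch of what such conversion strategies could look like, here are two pure helper functions for the camel-case column names. The function names, the database name and the credentials are assumptions, and the Korma part is wrapped in a `comment` form because it needs Korma on the classpath. Korma's documented `:naming` option takes a `:keys` function (applied to result keys) and a `:fields` function (applied to names when SQL is generated). The capitalized table names are not handled here.

```clojure
(require '[clojure.string :as str])

(defn kebab->camel
  "Clojure-style name to DB-style name: \"series-id\" -> \"seriesId\"."
  [s]
  (let [[head & tail] (str/split s #"-")]
    (apply str head (map str/capitalize tail))))

(defn camel->kebab
  "DB-style name to Clojure-style name: \"seriesId\" -> \"series-id\"."
  [s]
  (str/lower-case (str/replace s #"([a-z0-9])([A-Z])" "$1-$2")))

(comment
  ;; Hypothetical connection details; requires the Korma dependency.
  (require '[korma.db :refer [defdb postgres]])
  (defdb db
    (postgres {:db       "emoji"
               :user     "user"
               :password "secret"
               :naming   {:keys   camel->kebab
                          :fields kebab->camel}})))
```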

Data Transformation

To make the data digestible for editors, the tweet result set for a given moment was run through a transformation function that turned a list of emoji IDs and team IDs into a map containing the top 10 emojis used in reference to each team and to both teams.
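To give a feel for the shape of that transformation, here is a minimal sketch using only clojure.core. It is not the service's implementation. The tweet shape (`:team-ids` as a set, `:emoji-ids` as a vector), the `:both` key for tweets that mention two teams, and all function names are assumptions.

```clojure
(defn audience
  "Bucket for a tweet: the team id when one team is mentioned,
   :both when two are."
  [{:keys [team-ids]}]
  (if (= 1 (count team-ids))
    (first team-ids)
    :both))

(defn top-emojis
  "Up to n [emoji-id count] pairs, most frequent first.
   Ties are broken by emoji id so the order is deterministic."
  [n emoji-ids]
  (->> (frequencies emoji-ids)
       (sort-by (juxt (comp - val) key))
       (take n)
       vec))

(defn summarize
  "Map of bucket -> top 10 emojis for the tweets in that bucket."
  [tweets]
  (reduce-kv (fn [acc bucket ts]
               (assoc acc bucket (top-emojis 10 (mapcat :emoji-ids ts))))
             {}
             (group-by audience tweets)))

(summarize
  [{:team-ids #{:okc}      :emoji-ids [:flushed :fire]}
   {:team-ids #{:okc}      :emoji-ids [:flushed]}
   {:team-ids #{:sas}      :emoji-ids [:angry]}
   {:team-ids #{:okc :sas} :emoji-ids [:flushed :angry :flushed]}])
;; => {:okc  [[:flushed 2] [:fire 1]]
;;     :sas  [[:angry 1]]
;;     :both [[:flushed 2] [:angry 1]]}
```

This version walks the data more than once (grouping, then counting, then sorting), which keeps each step easy to read at the cost of some efficiency.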

While this transformation function worked well enough, it’s certainly not as efficient as it could have been. (I’ve yet to revisit the implementation, but I believe the entire transformation could be achieved in a single pass.) I was also working under the (mistaken) impression that r/reduce was automatically parallelized, but that turns out not to be the case: r/reduce always runs sequentially. r/fold is the Reducers function that can run in parallel, and it only does so when that is likely to be efficient, i.e. when it is given a foldable collection (such as a vector or map) that is large enough to be split into partitions. This post provides a nice, high-level overview of the Reducers library. Also, be sure to check out the official Reducers docs.
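To illustrate the difference, here is a small sketch of counting emoji with r/fold. The tweet shape and the function name `count-emojis` are assumptions; the Reducers calls (`r/fold`, `r/monoid`, `r/mapcat`) ship with Clojure. r/fold needs a reducing function for each partition and an associative combining function to merge partial results.

```clojure
(require '[clojure.core.reducers :as r])

(defn count-emojis
  "Counts emoji ids across tweets. Partitions are counted independently
   and the per-partition maps are merged with merge-with +."
  [tweets]
  (r/fold
    ;; combining fn: called with no args for the identity ({}),
    ;; with two partial count maps to merge them
    (r/monoid (partial merge-with +) hash-map)
    ;; reducing fn: runs sequentially within one partition
    (fn [counts emoji-id] (update counts emoji-id (fnil inc 0)))
    ;; r/mapcat keeps the collection foldable, unlike clojure.core/mapcat
    (r/mapcat :emoji-ids tweets)))

(count-emojis [{:emoji-ids [:flushed :fire]}
               {:emoji-ids [:flushed :flushed]}])
;; => {:flushed 3, :fire 1}
```

For the work to be split across threads, `tweets` should be a vector holding more elements than the partition size (512 by default); smaller inputs, or lazy sequences, are simply reduced sequentially.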

Because this data set wasn’t big and was accessed infrequently, and because the data transformations were snappy, I didn’t invest any time in caching the transformation results. However, that would have been trivial using either clojure.core/memoize or a more robust solution like core.memoize, which supports pluggable cache strategies (for example, LRU or TTL eviction) rather than the unbounded in-memory cache that clojure.core/memoize keeps.
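As a sketch of how little code that caching would take, assume a hypothetical `slow-summary` function standing in for the real transformation. The `call-count` atom exists only to show that the second call is served from the cache. The core.memoize variant is wrapped in a `comment` form because it needs that library on the classpath.

```clojure
(def call-count (atom 0))

(defn slow-summary
  "Hypothetical stand-in for the real transformation.
   Increments call-count each time it actually runs."
  [moment-id]
  (swap! call-count inc)
  {:moment-id moment-id :top-emojis []})

;; clojure.core/memoize caches every distinct argument list forever.
(def cached-summary (memoize slow-summary))

(cached-summary 42)
(cached-summary 42)
@call-count
;; => 1

(comment
  ;; With org.clojure/core.memoize, the cache can be bounded instead:
  (require '[clojure.core.memoize :as memo])
  (def lru-summary (memo/lru slow-summary :lru/threshold 100)))
```

Either approach is only safe once a moment's tweets are final; while a game is still being ingested, a cached result would go stale.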

Testing

Midje states its aims as “to encourage readable tests, to support a balance between abstraction and concreteness, and to be gracious in its treatment of the people who use it,” and I think it does all of that quite well.

I found its documentation and examples to be well written and wide-reaching. (The Midje wiki on GitHub has 96 pages!) The fact(s) structure is ergonomic, self-documenting and provides for nice separation of domain concepts. Its error messages are comprehensible and its “checkers” are very expressive. For instance, here’s an example of the facts for the reduce-emojis function: