Problem starter. Party solver.

Nov 4th, 2014

ActiveRecord uniqueness validations cannot be used on their own – we must use database level constraints to guard against race conditions. Given that we are guarding against duplication at the database level, should we also introduce uniqueness constraints in ActiveRecord models, and if so, under what circumstances?

All of the examples use the locations table. The code column in this table has a unique index. Let’s start off by adding the following ActiveRecord validation to our model:

```ruby
class Location < ActiveRecord::Base
  validates_uniqueness_of :code
end
```

When we save a change to an existing record, we see something like the following:

Note the presence of Location Exists. This extra database call to ensure code is unique occurs before every update, even when code was not one of the columns updated! The performance penalty for this extra database call is significant. 1000 updates, averaged over 10 runs, produced:

3.104 seconds without ActiveRecord uniqueness validation

4.796 seconds with ActiveRecord uniqueness validation

For those keeping score, that’s 55% slower.

Let’s say we never change code in our application – we could change the validation to validates_uniqueness_of :code, on: :create so we only see the extra database call once during a record’s lifespan. We could also use ActiveModel::Dirty to run the validation only when code has changed with validates_uniqueness_of :code, if: :code_changed?.

We’ve demonstrated model level uniqueness validations can be expensive, so why bother when we can rescue from ActiveRecord::RecordNotUnique?

As Erik Michaels-Ober noted in his presentation at Baruco 2014, Writing Fast Ruby, using exceptions for control flow in Ruby is over 10x slower than if/else. Benchmarking 1000 failed creates (also an average of 10 runs) produced:

In addition to the 10% speed increase, the ActiveRecord validation also has the advantage of making it possible to see all validation errors at once in model.errors. A RecordNotUnique exception is only raised when we hit the database, so the user would never see a uniqueness-related error if their input failed any other validation, leaving them with the frustrating experience of making corrections in two steps instead of one.
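The exception-overhead claim is easy to reproduce in isolation. A minimal sketch (the iteration count is arbitrary):

```ruby
require 'benchmark'

N = 100_000

# Control flow via raise/rescue:
t_rescue = Benchmark.realtime do
  N.times do
    begin
      raise ArgumentError
    rescue ArgumentError
      :handled
    end
  end
end

# The same control flow via a plain conditional:
t_branch = Benchmark.realtime do
  N.times do
    :handled if true
  end
end

# t_rescue comes out many times larger than t_branch: every raise has to
# build an exception object and unwind the stack.
```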

Aug 20th, 2014

I recently reviewed a pull request wherein a coworker initialized a Hash to return an array as a default value and appended items to the arrays returned by hash keys as the program ran. When appending items to a Ruby array, using the shovel operator is usually preferred over plus-equals. The shovel operator is orders of magnitude faster because it appends a value to an existing array whereas plus-equals creates a new array every time (explored in more detail in this post by Alec Jacobson). However, using the shovel operator to populate our hash led to some “interesting” results:

What the heck is going on here? When our hash can’t find a value, it does not return just any array – it returns the array, the exact object we gave it when we called Hash.new with a default. When we use the shovel operator, we mutate that default object in place – hence reading any key that was never explicitly assigned reflects the mutation.
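The behavior is easy to reproduce, along with the fix:

```ruby
# A single array serves as the default for every missing key.
h = Hash.new([])
h[:a] << 1
h[:b] << 2

h[:a]   # => [1, 2]  (both keys see the same mutated default)
h.keys  # => []      (nothing was ever actually stored)

# The block form gives each missing key its own fresh array:
fixed = Hash.new { |hash, key| hash[key] = [] }
fixed[:a] << 1
fixed[:b] << 2
fixed[:a]   # => [1]
fixed.keys  # => [:a, :b]
```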

Jun 23rd, 2014

The Erdős number describes the collaborative distance between an author of mathematical papers and Paul Erdős. The Bacon number, which measures an actor’s degrees of separation from Kevin Bacon in cinema, is a well-known variation on this theme. I recently came across another interesting variation – what is the shortest possible path between any two articles on Wikipedia?

Finding the shortest path between two nodes suggests a breadth-first search. Given any starting page and a target page, our search should look something like this:

1. Extract all links on the page

2. Check if any of these is the target page. If so, return the path from the starting page to the target page

3. Return to step 1, visiting all of the links found on the page (yay recursion!)

This gives us a few problems to solve:

Extracting all of the links from a Wikipedia page that point to other Wikipedia pages

Returning the sequence of articles that makes up a shortest path. Many paths could tie for shortest; we are satisfied with returning one of them

Keeping track of which links we have visited to avoid visiting them a second time

The first problem is readily solved with yokogiri (Nokogiri for Clojure), a lovely HTML/XML parser library. A solution might look something like this, with the addition of error handling:

The second problem is more interesting – how do we ensure the shortest path is returned? Our goal is to visit all of the distance-1 links first, then all of the distance-2 links, etc. to ensure the shortest possible path is found. The distance-2 links are all created from distance-1 links, hence the only challenge here is to ensure we visit links in order of their depth. We can accomplish this with a queue, taking the next link to test from the front of the queue and adding new links discovered to the back of the queue. While you won’t find it on the cheatsheet, Clojure does include a queue that can be created with clojure.lang.PersistentQueue/EMPTY.

As links are extracted from the page, they are compared against the visited set to ensure we do not visit the same link twice. New links are added to the visited set and the queue as they are discovered.

Instead of enqueueing links, we enqueue lists of links to maintain knowledge of the path that got us to the next link to visit, which is the first link in each list. Hence, running (search "/wiki/Secondhand_Lions:_A_New_Musical" "/wiki/Kevin_Bacon") ultimately returns ("/wiki/Robert_Duvall" "/wiki/Secondhand_Lions" "/wiki/Secondhand_Lions:_A_New_Musical") giving Secondhand Lions: A New Musical a Wiki-Bacon number of 3.
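The queue-of-paths idea can be sketched in Ruby (extract_links is a hypothetical callable standing in for the page-scraping step, and the toy graph stands in for real Wikipedia pages):

```ruby
require 'set'

# Breadth-first search where the queue holds whole paths, newest link first,
# so the winning path can be returned directly.
def search(start, target, extract_links)
  visited = Set.new([start])
  queue   = [[start]]

  until queue.empty?
    path  = queue.shift                  # next path from the front
    links = extract_links.call(path.first)

    return [target] + path if links.include?(target)

    links.each do |link|
      next if visited.include?(link)     # skip links we have seen
      visited << link
      queue << [link] + path             # new paths go to the back
    end
  end
end

# A toy link graph in place of real Wikipedia pages:
graph = { "A" => ["B", "C"], "B" => ["D"], "C" => ["D"], "D" => [] }
search("A", "D", ->(page) { graph[page] })   # => ["D", "B", "A"]
```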

If Clojure isn’t quite your speed and you’d prefer to see a Ruby implementation you can find it here.

May 20th, 2014

I recently encountered a situation that called for a work queue. The system seemed simple enough at the outset – only producers and consumers, with jobs prioritized by when they are enqueued. There is, however, a curious twist – the consumers are people doing work in a single page app. There are enough concurrent users to warrant a datastore with atomic operations.

Jobs cannot simply be dequeued when people start them because many jobs go unfinished for one reason or another. The behavior we want is a timeout – the person has some amount of time to complete the job after it was dequeued, else the system considers the job abandoned and puts it back in the inbound queue.

Redis does not have primitives for this specific use case, but it turns out we don’t need them – we can run Lua scripts on our Redis server to get the exact behavior we need.

First, a script to reserve jobs:

```lua
-- arguments are passed in as strings; we need our score to be a number
local time = tonumber(ARGV[1])
local val = redis.call('zrange', KEYS[1], 0, 0)[1]
if val then
  redis.call('zadd', KEYS[2], time, val)
  redis.call('zremrangebyrank', KEYS[1], 0, 0)
end
return val
```

This takes jobs from our default queue and moves them to a reserved queue. Both queues are sorted sets – the time is passed in as a string (ARGV[1]), converted to a number and used as the score. Using a sorted set keeps the inbound queue ordered by the time when jobs were queued up and prevents duplicate jobs. We can also use the timestamp/score to determine which jobs in the reserved queue have timed out and move them back to the inbound queue:

This server needs a cron job or some other way to periodically call #clear to clear the reserved queue of timed-out jobs. A more fully baked implementation might also use SCRIPT LOAD to upload the script to our Redis server and then call it via EVALSHA instead of sending the entire script every time.

The tour of Lua on Learn X in Y Minutes and the examples on the Redis homepage are all you need to get started and extend the usefulness of Redis in a meaningful way within a couple of hours. I recommend giving this a shot the next time you encounter a unique problem where Redis seems close but not quite right for your specific use case.

May 4th, 2014

One of the first challenges I faced in building Knod was establishing a good workflow. Using the REPL as a development tool is one of the great joys of using Ruby and languages in the LISP family. I tend to experiment with different things on the REPL when the domain isn’t well known and I want to spike on a solution – the exact circumstance faced at the outset of this project. However, the REPL falls short when building a server because, while a server will certainly run in a REPL, it provides no useful output. We ultimately need a client to connect, send a request, and ensure it receives an appropriate response. Furthermore, since the server will block while it is waiting for a connection, we cannot run the server and test it in the same REPL. When I started building Knod, my workflow was as follows:

Repeatedly stopping and starting the server and jumping back and forth between terminal sessions became tedious almost immediately, so I started looking for ways to streamline my workflow. There were two problems to solve – reloading the server every time I made a change, then rerunning a series of requests against the server to see if the changes had the desired effect without breaking anything that was already working. What I ultimately needed was a faster, more repeatable way to test the server’s behavior.

Ruby’s standard library includes almost everything one needs for behavior-driven development in the form of MiniTest::Spec. Its syntax will look immediately familiar to anyone used to RSpec:

```ruby
it 'should add numbers' do
  (2 + 2).must_equal 4
end
```

MiniTest does not provide a way to send HTTP requests out of the box. I initially tried using Net::HTTP to make requests. It looked something like this:

This is awful. Tons of setup for just a few tests. At this rate I would end up spending more time writing tests than on the actual server! I received an excellent piece of advice from my friend Eno Compton – create a wrapper class for Net::HTTP to cut down on boilerplate. Dan Knox provides a great rundown of creating such a wrapper class in this blog post. With a small wrapper class suited to my purposes, tests became much easier to read and write:

We could even take this a step further and delegate all of the HTTP verbs to our connection to almost exactly mimic the functionality of rspec-rails or minitest-rails:

```ruby
describe Knod, 'a tiny http server' do
  extend Forwardable
  let(:connection) { Connection.new("http://0.0.0.0:4444") }
  def_delegators :connection, :get, :put, :post # ... all the verbs

  describe 'PUT' do
    # after setup:
    it 'writes to the local path' do
      put path, data
      File.file?(path).must_equal true
    end
  end
end
```

With the Net::HTTP wrapper class complete, I was halfway to a good BDD setup – I could quickly write useful tests, but still had to stop and start the server between each test run to pickup code changes. I revisited Jesse Storimer’s Working With Ruby Threads for inspiration:

I’ve been saying the GIL prevents parallel execution of Ruby code, but blocking IO is not Ruby code… MRI doesn’t let a thread hog the GIL when it hits blocking IO.

This means we can just start the server in its own thread before the test suite starts running. At the top of our test file:
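The Knod-specific setup isn’t reproduced here, but the principle can be sketched with a bare TCPServer standing in for the real server:

```ruby
require 'socket'

# Start a trivial server in a background thread. accept blocks on IO,
# so MRI releases the GIL and the main (test) thread keeps running.
server = TCPServer.new('0.0.0.0', 0)
port   = server.addr[1]

Thread.new do
  loop do
    client = server.accept
    client.write 'ok'
    client.close
  end
end

# The main thread is free to act as the client:
response = TCPSocket.open('0.0.0.0', port) { |s| s.read }
response   # => "ok"
```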

The only issue with this implementation is that we are in trouble if another server is already running on the port specified when we start the server. The solution lies in an interesting property of Ruby’s socket library: a new instance of TCPServer initialized with a port of 0 will automatically select an open port in the ephemeral range:

```ruby
server = TCPServer.new('0.0.0.0', 0)
port = server.addr[1]
```

Setting up tests this way took a couple of hours that I could have spent on the server, but I think it was time well spent. Subsequent features became much easier to build with the improved workflow a test suite provided and I gained insight into how DSLs like rspec-rails work in the process.

Next up in this series: A post on higher-level learnings that came out of the gem-building process.

Apr 25th, 2014

While the lighting in the photo could use some work, you can still observe that I am:

In front of a room giving a presentation

Enjoying myself

I gave this presentation as part of an ongoing lunch and learn series at thredUP and it went better than I could have hoped. The crowd was very much engaged and I was still fielding follow up questions the next day. A 15 minute presentation in front of ~40 people is barely a blip on the radar for some of my peers, but for me, it is a milestone at least a decade in the making.

Until I was well into my 20’s, I stuttered quite severely. It was difficult, and at times impossible, to speak on the phone, order in a restaurant, or even say my own name – let alone engage in public speaking. Stress is a trigger for stuttering, so it tends to self-perpetuate. Stuttering lends itself to traumatic experiences, which in turn can lead to a lot of fear and shame associated with speaking, which drives still more stuttering.

This cycle started pretty early in my case – when I was five, a nun at St. James Catholic Church in Pittsburgh decided physical violence was the most appropriate remedy for my speech impediment. I spoke only with great difficulty from that point forward. I spent my youth being bullied by peers, ridiculed by teachers, and frequently feeling unwelcome in my own family. Fortunately for my current self, I managed to persist despite often thinking my life was not one worth living.

Things took a turn for the better during my graduate studies at Georgia Tech when I finally took charge of my life. I cut out every negative influence, found a great speech therapist based on my own research, and most important of all, convinced myself the people who had treated me like shit for my entire life, not me, had the problem. Tim Mackesey planted the idea in my head that I could counteract the negative feedback cycle of stuttering with a positive one – I could take any good experience speaking, no matter how small, use it to gain confidence, then use that confidence to drive a series of ever-improving experiences. My first victory was ordering a drink in Starbucks without stuttering – I still remember it as one of the happiest moments I’ve ever experienced, and at age 23, one of the first times I ever felt really good about myself. A cup of coffee gave me the beachhead I needed to retake my life.

While life became easier after that breakthrough, two decades of harm were not simply undone overnight, nor did I expect them to be. In the intervening years I continued to improve my speech and grow as a person by viewing my situation as a perpetual journey, pacing myself accordingly, and limiting my traveling companions to people who believed in me as much as I believed in them. The pattern of discipline and consistent improvement I established when I took on stuttering carried over into every other part of my life, becoming the core of my identity. Now, when I reflect on the person I used to be I find I hardly know him. Yet I take no shame in him. He and I simply mark different points on a path whose destination is unknown, but whose direction I will continue to follow.

Apr 16th, 2014

While working through Facebook’s excellent React tutorial, I came across a common problem when developing and testing AJAX applications: having some sort of rudimentary backend is necessary, or at the very least makes things easier. While options for static HTTP servers are many, these only respond to GET requests and hence only get you part of the way there. The ideal solution would also respond to PUT, POST, and DELETE and would work with almost zero configuration.

People build quick prototypes often enough that I assumed a tool must exist for this very purpose. When I could not find such a tool, I built it. Knod, a tiny HTTP server for front-end development, was released on April 12 on RubyGems. The code is available here on GitHub. I limited myself to using the Ruby standard library and learned a lot in doing so. Subsequent posts in this series will detail what I learned about the request/response cycle, writing server tests, and turning a library into a useful command line tool.

Dec 5th, 2013

4Clojure recently introduced me to Clojure’s curious juxt function. From the official docs:

Takes a set of functions and returns a function that is the juxtaposition
of those functions.

Makes a bit more sense once you see it in practice:

```clojure
((juxt + max min) 2 3 5 1 6 4) ; => [21 6 1]
```

This alone is pretty neat, but it gets better. Jay Fields observed that juxt has other interesting applications:

```clojure
((juxt filter remove) even? [1 2 4 3 5 6]) ; => [(2 4 6) (1 3 5)]
```

Implementing juxtapose in Ruby sounds like my idea of a good time. How might we go about this? We immediately run into a problem if we want to mimic the interface of Clojure’s juxt: The first example takes advantage of variadic functions that lack Ruby equivalents. Examining one of these functions, however, gives us a pretty strong hint:

```clojure
(+ 1 2 3 4) ; => 10
```

This looks like a reduce function! Reduce is the most universal of functional programming’s three cornerstone functions (map, reduce and filter), so some sort of reduce-based solution should provide the desired results. In our first example, we pass three functions and get three results back, a pretty strong hint we should map our list of functions over the reduce operation. Our first implementation:
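The original listing isn’t reproduced above; a sketch of a variadic juxt built this way might look like the following (the lambdas stand in for Clojure’s variadic +, max, and min):

```ruby
# juxt returns a lambda that applies every function to the same arguments
# and collects the results in order.
def juxt(*fns)
  ->(*args) { fns.map { |fn| fn.call(*args) } }
end

add = ->(*nums) { nums.reduce(:+) }
max = ->(*nums) { nums.max }
min = ->(*nums) { nums.min }

juxt(add, max, min).call(2, 3, 5, 1, 6, 4)   # => [21, 6, 1]
```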

Jackpot! This solution works and we even maintained the variadic interface from Clojure. The downside is that this solution requires us to recreate most of the functionality provided by Enumerable and Array. While I rarely turn down the opportunity to write a stabby lambda, let’s assume we want to avoid this for the sake of argument. We can send symbols to our list of arguments if we are willing to adapt the interface. One way to go about it:
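One possible shape for the symbol-based interface (again a sketch, not the original code):

```ruby
# Adapted interface: take method names as symbols and send each one to the
# receiver, avoiding hand-rolled lambdas entirely.
def juxt(*syms)
  ->(receiver) { syms.map { |sym| receiver.public_send(sym) } }
end

juxt(:min, :max, :sum).call([2, 3, 5, 1, 6, 4])   # => [1, 6, 21]
```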

I find this diversion so interesting because I only considered doing things like this in Ruby after I discovered Clojure. There is much to be learned about Ruby by exploring functional programming languages.

Nov 17th, 2013

Generating methods based on the content of arrays, hashes, and other Enumerable things is a powerful metaprogramming technique in Ruby. To keep things relatively simple, let’s use an example problem from Katrina Owen’s fantastic site Exercism:

Write a program that, given an age in seconds, calculates how old someone is in terms of a given planet’s solar years.

We know the length of an Earth year in seconds and the length of every other planet’s orbital period in terms of earth years. Here is an implementation in Ruby:

We use metaprogramming in lines 22 through 26 to generate methods for ages on every planet other than Earth based on our ORBITAL_PERIODS hash. This will make it super easy to change this class when we are done with Earth and want to define everything in terms of Martian years.
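The full listing isn’t reproduced here, but the generation step can be sketched like so (the class shape is illustrative; the periods and the 31,557,600-second Earth year come from the Exercism problem, trimmed to three planets):

```ruby
class SpaceAge
  EARTH_YEAR_SECONDS = 31_557_600.0

  # Each planet's orbital period in Earth years.
  ORBITAL_PERIODS = {
    mercury: 0.2408467,
    venus:   0.61519726,
    mars:    1.8808158
  }.freeze

  def initialize(seconds)
    @seconds = seconds
  end

  def on_earth
    @seconds / EARTH_YEAR_SECONDS
  end

  # Generate one method per planet from the hash:
  ORBITAL_PERIODS.each do |planet, period|
    define_method("on_#{planet}") { on_earth / period }
  end
end

SpaceAge.new(2_134_835_688).on_mercury.round(2)   # => 280.88
```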

Writing the Clojure equivalent of this implementation proved a bit more difficult than expected. Let’s set up the Clojure equivalent and work through the metaprogramming piece:

How might we generate functions from our orbital-periods hashmap? A list comprehension with for feels pretty close to the mark, but this cannot work because it yields a lazy sequence. We need to execute the contents of this sequence to get the functions we are creating into our namespace. Clojure’s doseq macro is purpose-built for this use case. Now we have the start of our solution:

```clojure
(doseq [[planet period] orbital-periods]
  ;somehow make functions
  )
```

I had a bit of trouble wrapping my mind around this part of the problem because I taught myself Clojure with resources placing a heavy emphasis on lazy evaluation and side-effect-free functions. This case runs totally counter to that, executing a sequence specifically for its side effects, which happen to be producing pure functions.

Now that we know something will execute, we must determine what to execute to generate a function from a key in the orbital-periods hashmap. My first instinct was to try something like this:

```clojure
(defn (str "on-" planet) [seconds]
  ;do stuff
  )
```

This fails because the first argument to def or defn must be a symbol at read time. intern solves this problem by finding or creating a var with the supplied symbol at runtime. From there, it is as easy as building the function we want bound to that var:

Oct 5th, 2013

Duplication causes all of the same maintainability issues in test suites it does in production code, but I often see the DRY principle violated in the name of comprehensive test coverage and fidelity.

Let’s say we have a Rectangle class that normalizes a range of inputs and we want to test all of them. How can we accomplish this without:

Writing an individual test for each input (duplication)?

Writing one test with multiple assertions (lost fidelity)?

Consider this approach:

```ruby
subject { Rectangle }

formats = {
  comma_delimited_string: '100, 100, 500, 500',
  array_with_strings: ["100", "100", "500", "500"],
  array_with_ints: [100, 100, 500, 500]
}

formats.each do |(format_name, data)|
  it "accepts and normalizes data passed in as a #{format_name}" do
    subject.create(data).coordinates.should == formats[:array_with_ints]
  end
end
```

String interpolation in the test description provides nearly the fidelity of breaking out assertions into individual tests. Neat!