From ad hoc script to object-oriented program

Apr 10, 2013 • Gregory Brown

NOTE: This issue of Practicing Ruby was one of several content experiments
that were run in Volume 6. It uses a cookbook format (i.e. problem -> solution -> discussion)
instead of the traditional long-form article format we use in most Practicing Ruby articles.

Problem: An ad hoc script has devolved into an unmaintainable mess

Imagine that you’re working on a shipping cost estimation program for a small
business that uses a courier service for regional deliveries. Part of the task
for building that tool would involve importing pricing information from
some data source, such as this CSV file:

06770,$12.00
06512,$14.00
06510,$15.30
06701,$12.15

A real dataset would be more complex, but this minimal example exposes the
information we’re interested in: what it costs to ship something from our
facility to somewhere else, based on the destination’s zip code.

Now suppose that we want to build a simple data store which will be updated
daily with the latest pricing information. We then could easily write a script
using a few of Ruby’s standard libraries (PStore, BigDecimal, and CSV),
which would normalize the data in a way that could be used by the user-facing
cost estimation program. If we could assume the source CSV data was validated
before we processed it, the program could be as simple as what you see below:
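
A minimal sketch of such a script might look like the following. It relies only on the three libraries named above; the import_prices method name, its parameters, and any file names you pass to it are hypothetical, chosen only for illustration.

```ruby
require "csv"
require "pstore"
require "bigdecimal"

# Hypothetical helper: copies zip code => price pairs from a CSV file
# into a PStore file. Assumes every row is already valid.
def import_prices(csv_path, store_path)
  store = PStore.new(store_path)

  # All writes happen in one transaction, so the store is only
  # updated if the whole file is processed without an exception.
  store.transaction do
    CSV.foreach(csv_path) do |zip, price|
      # Strip the dollar sign before converting: "$12.00" -> 12.00
      store[zip] = BigDecimal(price.delete("$"))
    end
  end

  store
end
```

At the top level of a script you would call this as import_prices("prices.csv", "shipping_rates.store"), where both file names are placeholders.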

But in reality, most business environments do not make things this easy
for you. You’d probably quickly discover that the source data could have
any number of problems, ranging from duplicate entries to inconsistently
formatted fields. Because this kind of data often originates from people who are
entering information into Excel by hand, it can even be littered with typos!

To help mitigate these issues somewhat, you need a combination of
sanity-checking validations and basic logging so that when something goes wrong
you know why it happened. After adding those features, your simple script might
collapse into the mess you see below:
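
To make that concrete, here is a sketch of what such a script might turn into. The specific validation rules (five-digit zip codes, prices written as a dollar sign followed by dollars and cents), the log messages, and the method and parameter names are all assumptions made for illustration. Notice how parsing, validation, duplicate detection, logging, and storage end up tangled together in a single loop:

```ruby
require "csv"
require "pstore"
require "bigdecimal"
require "logger"

# Hypothetical importer with validations and logging bolted on.
def import_prices(csv_path, store_path, log_io = $stderr)
  log   = Logger.new(log_io)
  store = PStore.new(store_path)
  seen  = {}

  store.transaction do
    CSV.foreach(csv_path) do |r|
      raise "Wrong number of fields: #{r.inspect}" unless r.size == 2

      zip    = r[0].to_s.strip
      amount = r[1].to_s.strip

      # Assumed formats: "06770" and "$12.00"
      raise "Malformed zip code: #{zip}"  unless zip =~ /\A\d{5}\z/
      raise "Malformed amount: #{amount}" unless amount =~ /\A\$\d+\.\d{2}\z/
      raise "Duplicate entry for #{zip}"  if seen[zip]
      seen[zip] = true

      value = BigDecimal(amount[1..-1])
      old   = store[zip] # nil when the key is not in the store yet

      if old.nil?
        log.info("Adding #{zip}: #{value.to_s('F')}")
      elsif old != value
        log.info("Updating #{zip}: #{old.to_s('F')} -> #{value.to_s('F')}")
      end

      store[zip] = value
    end
  end
end
```

Because every check raises from inside the PStore transaction, a single bad row leaves the store untouched, but the only way to exercise any one of these rules is to run the whole import against a file.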

Once your code ends up like this, it becomes increasingly difficult to
add new features or make any sort of change without breaking
something. Because this style of program is also fairly difficult to test,
the maintenance problems get even worse: bugs may go undiscovered
until long after they’re introduced.

Procedural scripts are great when you can throw away the code once you’ve
completed your task, or when you are solving simple problems whose
requirements you are reasonably sure will never change. For everything else,
more structure pays off in the long run. It’s clear that this program
falls into the “everything else” category, so how do we fix it?

Solution: Redesign the script as an object-oriented program

The thing that makes ad hoc scripts complicated to reason about
as they grow is that they blend all their concerns together – both
logically and conceptually. For that reason, it is worthwhile to
start thinking in terms of functions and objects as soon as your
program grows beyond a paragraph or two of code.

Imagine that the script portion of your importer tool was reduced
to the following code:
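
One way that reduced script might read is sketched below. So that the snippet runs on its own, it begins with deliberately bare stand-ins for the two custom classes; they do none of the validation, duplicate detection, or logging that this article attributes to the real ones. The import_prices method name and the keyword arguments are assumptions as well.

```ruby
require "csv"
require "pstore"
require "bigdecimal"

# --- Bare stand-ins (assumptions) so this snippet runs on its own ---
class PriceInformation
  attr_reader :zipcode, :amount

  def initialize(zipcode:, amount:)
    @zipcode = zipcode
    @amount  = BigDecimal(amount.delete("$"))
  end
end

class Importer
  def self.update(filename)
    store = PStore.new(filename)
    store.transaction { yield new(store) }
  end

  def initialize(store)
    @store = store
  end

  def []=(key, value)
    @store[key] = value
  end
end
# --- End of stand-ins ---

# The script portion: read each row, build a record, store it.
def import_prices(csv_path, store_path)
  Importer.update(store_path) do |store|
    CSV.foreach(csv_path) do |zipcode, amount|
      info = PriceInformation.new(zipcode: zipcode, amount: amount)

      store[info.zipcode] = info.amount
    end
  end
end
```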

This brings us back to about the same level of detail expressed in the
naïve implementation of the importer script, albeit with a few custom classes
thrown into the mix. It hides a lot of detail
from the reader, but its core purpose is obvious: it iterates over a CSV file
to create a mapping of zip codes to shipping rates in a datastore.

To see where the real work is being done, we need to look at the
PriceInformation and Importer class definitions. We’ll start by taking a
look at the former, because it has fewer moving parts to consider:
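
A sketch of what PriceInformation could look like follows. The exact formats it accepts (five-digit zip codes, prices such as "$12.00"), the keyword arguments, the attribute names, and the choice of ArgumentError are assumptions; the point is only that validation and conversion both live in the constructor.

```ruby
require "bigdecimal"

class PriceInformation
  # Assumed formats: five-digit zip codes, prices such as "$12.00".
  ZIPCODE_FORMAT = /\A\d{5}\z/
  AMOUNT_FORMAT  = /\A\$\d+\.\d{2}\z/

  attr_reader :zipcode, :amount

  # Raises ArgumentError rather than building an invalid object.
  def initialize(zipcode:, amount:)
    validate!("zip code", zipcode, ZIPCODE_FORMAT)
    validate!("amount", amount, AMOUNT_FORMAT)

    @zipcode = zipcode
    @amount  = BigDecimal(amount.delete("$")) # "$12.00" -> 12.00
  end

  private

  def validate!(label, value, format)
    return if format.match?(value.to_s)

    raise ArgumentError, "malformed #{label}: #{value.inspect}"
  end
end
```

With this in place, PriceInformation.new(zipcode: "06770", amount: "$12.00") returns an object whose amount is a BigDecimal, while a mistyped zip code such as "6770" raises before any object exists.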

Here we see that PriceInformation applies the same validations and
transformations as shown in the script version of this program, but
encapsulates them in its constructor. This ensures that a PriceInformation
object will either represent valid data or not be instantiated at all,
so the main script does not need to concern itself
with these issues. Even if these validations or transformations become
more complex over time, the calling code should not need to change.

In a similar vein, the Importer class attempts to encapsulate the details
about some lower-level concepts at a higher level of abstraction. Its
functionality is a bit more involved than that of the PriceInformation class,
so take a few minutes to study it before moving on:
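
The following is one possible shape for Importer, not a definitive implementation. Several details are assumptions: the DuplicateKeyError class, the optional log: keyword, the new_record and updated_record messages sent to the log object, and the no-op SilentLog default that lets the class run without a real change log.

```ruby
require "pstore"
require "set"

class Importer
  DuplicateKeyError = Class.new(StandardError)

  # No-op collaborator (an assumption) used when no change log is given.
  module SilentLog
    def self.new_record(key, value); end
    def self.updated_record(key, old_value, new_value); end
  end

  # Opens the store, starts a transaction, and yields a wrapped store.
  # If the block raises, PStore does not commit the transaction.
  def self.update(filename, log: SilentLog)
    store = PStore.new(filename)
    store.transaction { yield new(store, log) }
  end

  # Instances are only ever created through Importer.update.
  private_class_method :new

  def initialize(store, log)
    @store = store
    @log   = log
    @seen  = Set.new
  end

  def []=(key, new_value)
    # Set#add? returns nil when the key was already present.
    raise DuplicateKeyError, "duplicate key: #{key}" unless @seen.add?(key)

    old_value   = @store[key] # nil when the key is new to the store
    @store[key] = new_value

    if old_value.nil?
      @log.new_record(key, new_value)
    elsif old_value != new_value
      @log.updated_record(key, old_value, new_value)
    end
  end
end
```

Tracking seen keys per instance, rather than asking the store, is what allows a key to be updated across separate runs while still rejecting a key that appears twice within one import.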

Despite the complexity of its implementation, this class presents a minimal
public interface, consisting of only Importer.update and Importer#[]=. The
Importer.update method is responsible for instantiating a PStore object,
initiating a transaction, and then wrapping it in an Importer instance to
limit access to its internals. From there, the only method available to the user
is Importer#[]=, which wraps PStore#[]= with two important features:

Single-assignment semantics: once a key has been set to a particular value, it
cannot be reset from within the same Importer instance. This is because we
want to raise an exception whenever we encounter duplicate keys in the data
we’re importing.

Update notifications: For debugging purposes, we want to know whether a
record is introducing a new key, or updating the value associated with
an old one. Rather than cluttering up this class with the particular log
messages associated with those events, we delegate to a ChangeLog helper
object, which is shown below:
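
A small sketch of such a helper appears below, built on Ruby's Logger. The method names new_record and updated_record, the wording of the messages, and the default of logging to standard error are assumptions rather than details taken from the original program.

```ruby
require "logger"

# Hypothetical change log: owns the wording of each message so that
# the importer only has to announce what happened.
class ChangeLog
  def initialize(io = $stderr)
    @logger = Logger.new(io)
  end

  def new_record(key, value)
    @logger.info("Adding #{key} with value #{value}")
  end

  def updated_record(key, old_value, new_value)
    @logger.info("Updating #{key}: was #{old_value}, now #{new_value}")
  end
end
```

Because the helper accepts any IO-like object, you could hand it a StringIO in a test or an open file in production without touching the code that sends it messages.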

With this last detail exposed, you’ve walked through the complete
object-oriented solution to this problem. It is much longer than the
script version, but also much more organized. Before we wrap things up,
let’s talk a bit more about the costs and benefits involved in introducing
more structure into your programs.

Discussion

The best thing about unstructured code is that nothing is hidden from view.
To understand a script, you start at the top of the file and read downwards,
mentally evaluating the state changes and iterators you encounter along the way.

Object-oriented programs are much more logically complex, because they
represent a network of collaborators rather than a linear set of instructions.
For example, whenever we make a call to Importer#[]=, messages are sent to the
ChangeLog helper object as well as to an instance of PStore, but these
details are not at all visible when you read the caller code. The more objects
that exist within a system, the more complex their interactions get, and so
it is not uncommon to end up with call graphs that are both wide and deep.

But when it comes to visibility, the strength of scripted solutions is also their
weakness, and the weakness of object-oriented programs is also their strength:

In an ad hoc script, you cannot make simple decisions about your code
without considering the entire program. Even something as straightforward
as renaming a variable used for temporary storage must be carefully considered,
because everything exists within a single namespace; anything more involved
than that is simply inviting trouble unless you can keep the entire program
in your head at once.

In an object-oriented program, the walls erected between different objects give
you freedom to make sweeping changes to internal structures, as long as their
interfaces are preserved. You can even rewire entire subnetworks of functionality
within your programs, as long as you know what features depend on them. When
done well, the fact that you cannot keep an entire object-oriented program
in your head is not much of a concern, because the layered abstractions
make it so you don’t have to.

The real challenge involved in writing object-oriented programs is that they’ll
only be as useful as the mental model they represent. This is why it can
actually be helpful to start off with less structure (even none at all!), and
gradually work your way towards something more organized. After all,
there is nothing worse than an abstract solution in search of a concrete problem!
