Losing the plot

Have you ever read a “choose your own adventure” book? Nearly every page ends with a question like this:

If you fight the angry troll with your bare hands, turn to page 137.

If you try to reason with the troll, turn to page 29.

If you don your invisibility cloak, turn to page 6.

You’d pick one option, turn to the indicated page, and the story would
continue.

Did you ever try to read one of those books from front to back? It’s a
surreal experience. The story jumps forward and back in
time. Characters appear out of nowhere. One page you’re crushed by the
fist of an angry troll, and on the next you’re just entering the
troll’s realm for the first time.

What if each individual page was this kind of mish-mash? What if
every page read like this:

You exit the passageway into a large cavern. Unless you came from
page 59, in which case you fall down the sinkhole into a large
cavern. A huge troll, or possibly a badger (if you already visited
Queen Pelican), blocks your path. Unless you threw a button down the
wishing well on page 8, in which case there nothing blocking your
way. The [troll or badger or nothing at all] does not look happy to
see you.

If you came here from chapter 7 (the Pool of Time), go back to the
top of the page and read it again, only imagine you are watching the
events happen to someone else.

If you already received the invisibility cloak from the aged
lighthouse-keeper, and you want to use it now, go to page 67. Otherwise, forget you read anything about an invisibility cloak.

If you are facing a badger (see above), and you choose to run away,
turn to page 93…

Not the most compelling narrative, is it? The story asks you to carry
so much mental baggage for it that just getting through a page is
exhausting.

Code as narrative

What does this have to do with software? Well, code can tell a story
as well. It might not be a tale of high adventure and intrigue. But
it’s a story nonetheless; one about a problem that needed to be
solved, and the path the developer(s) chose to accomplish that task.

A single method is like a page in that story. And unfortunately, a lot
of methods are just as convoluted, equivical, and confusing as that
made-up page above.

In the following sections, we’ll take a look at some examples of code
that unnecessarily obscures the storyline of a method. We’ll also
explore some techniques for minimizing distractions and writing
methods that straightforwardly convey their intent.

Secure the borders

Here is some code that’s having some trouble sticking to the plot:

require'date'classEmployeeattr_accessor:nameattr_accessor:hire_datedefinitialize(name,hire_date)@name=name@hire_date=hire_dateenddefdue_for_tie_pin?raise"Missing hire date!"unlesshire_date((Date.today-hire_date)/365).to_i>=10enddefcovered_by_pension_plan?# TODO Someone in HR should probably check this logic((hire_date&&hire_date.year)||2000)<2000enddefbioifhire_date"#{name} has been a Yoyodyne employee since #{hire_date.year}"else"#{name} is a proud Yoyodyne employee"endendend

We can speculate about the history of this class. It looks like over
the course of development, three different developers discovered that
#hire_date might sometimes be nil. They each chose to handle this
fact in a slightly different way. The one who wrote
#due_for_tie_pin? added a check that raises an exception if the hire
date is missing. The developer responsible for
#covered_by_pension_plan substituted a (seemingly arbitrary) default
value for nil. And the writer of #bio went with an if statement
switching on the presence of #hire_date.

This class has serious problems with second-guessing itself. And the
root of all this insecurity is the fact that the #hire_date
attribute is unreliable—even though it’s clearly important to the operation of the class!

One of the purposes of a constructor is to establish an object’s
invariant: a set of properties which should always hold true for that
object. In this case, it really seems like one of those invariants should be: employee hire date is a Date.

But the constructor, whose job it is to stand guard against initial
values which are not compatible with the class invariant, has fallen
asleep on the job. As a result, every other method dealing with hire
dates is burdened with the additional responsibility of checking
whether the value is present.

This is an example of a class which needs to set some
boundaries. Since there is no obvious “right” way to handle a missing
hire date, it probably needs to simply insist on having a valid hire
date, thereby forcing the cause of these spurious nil values to be
discovered and sorted out. To do this, it should guard its own
integrity by checking the value wherever it is set, either in the
constructor or elsewhere:

require'date'classEmployeeattr_accessor:nameattr_reader:hire_datedefinitialize(name,hire_date)@name=nameself.hire_date=hire_dateenddefhire_date=(new_hire_date)raiseTypeError,"Invalid hire date"unlessnew_hire_date.is_a?(Date)@hire_date=new_hire_dateenddefdue_for_tie_pin?((Date.today-hire_date)/365).to_i>=10enddefcovered_by_pension_plan?hire_date.year<2000enddefbio"#{name} has been a Yoyodyne employee since #{hire_date.year}"endend

In this version, the hire_date attribute is protected by type check
in the setter method. Since the constructor now delegates to this
setter method to initialize the attribute, it is no longer possible to
construct new Employee objects without a valid hire date. Now that
the “borders” of the object are guarded, all the other methods can
focus on telling their own stories, without being distracted by
a potentially missing hire_date.

Be assertive

In the last section we saw how assertions in a class’ constructor or
setter methods can help keep the other methods focused. But
uncertainty and convoluted code can come from sources
other than input parameters.

Let’s say you’re working on some budget management software. The next
user story requires the application to pull in transaction data from a
third-party electronic banking API. According to the meager
documentation you can find, you need to use the
Bank#read_transactions method in order to load bank
transactions. The first thing you decide to do is to stash the loaded
transactions into a local data store.

Unfortunately the documentation doesn’t say what the
#read_transactions method returns. An Array seems likely. But what
if there are no transactions found? What if the account is not found?
Will it raise an exception, or perhaps return nil? Given enough time
you might be able to work it out by reading the API library’s source
code, but it’s pretty convoluted and you might still miss some edge
cases.

You decide to make an assumption… but as insurance, you document your assumption with an assertion.

classAccountdefrefresh_transactionstransactions=bank.read_transactions(account_number)transactions.is_a?(Array)orraiseTypeError,"transactions is not an Array"transactions.eachdo|transaction|# ...endendend

You manually test the code against a test account and it doesn’t blow
up, so it seems your suspicion was correct. Next, you move on to
pulling amount information out of the individual transactions.

You ask your teammate, who has had some experience with this API, what
format transactions are in. She says she thinks they are hashes with
string keys. You decide to tentatively try looking at the “amount”
key.

transactions.eachdo|transaction|amount=transaction["amount"]end

You look at this for a few seconds, and realize that if there is no
“amount” key, you’ll just get a nil back. Then you’d have to check
for the presence of nil everywhere the amount is used. You’d prefer
to document your assumption more explicitly. So instead, you make an
assertion by using the Hash#fetch method:

transactions.eachdo|transaction|amount=transaction.fetch("amount")end

Hash#fetch will raise a KeyError if the given key is not found,
signaling that one of your assumptions about the Bank API was
incorrect.

You make another trial run and you don’t get any exceptions, so you
proceed onward. Before you can store the value locally, you want to
make sure the transaction amount is in a format that your local
transaction store can understand. Nobody in the office seems to know
what format the amounts come in as. You know that many financial
system store dollar amounts as an integer number of cents, so you
decide to proceed with the assumption that it’s the same with this
system. In order to once again document your assumption, you make
another assertion:

transactions.eachdo|transaction|amount=transaction["amount"]amount.is_a?(Integer)orraiseTypeError,"amount not an Integer"end

You put the code through it’s paces and… BOOM. You get an error.

TypeError: amount not an Integer

You decide to drop into the debugger on the next round, and take a
look at the transaction values coming back from the API. You see this:

[{"amount"=>"1.23"},{"amount"=>"4.75"},{"amount"=>"8.97"}]

Well that’s… interesting. The amounts are reported as decimal strings.

You decide to convert them to integers, since that’s what
your internal Transaction class uses.

Once again, you find yourself questioning this code as soon as you
write it. You remember something about #to_f being really forgiving
in how it parses numbers. A little experimentation proves this to be
true.

"1.23".to_f# => 1.23"$1.23".to_f# => 0.0"a hojillion".to_f# => 0.0

Only having a small sample of demonstration values to go on, you’re
not confident that the amounts this API might return will always be in
a format that #to_f understands. What about negative numbers? Will
they be formatted as “-4.56”? Or as “(4.56)”? Having an unrecognized
amount format silently converted to zero could lead to nasty bugs down
the road.

Yet again, you want a way to state in no uncertain terms what kind of
values the code is prepared to deal with. This time, you use
Kernel#Float to assert that the amount is in a format Ruby can parse
unambiguously as a floating point number:

classAccountdefrefresh_transactionstransactions=bank.read_transactions(account_number)transactions.is_a?(Array)orraiseTypeError,"transactions is not an Array"transactions.eachdo|transaction|amount=transaction.fetch("amount")amount_cents=(Float(amount)*100).to_icache_transaction(:amount=>amount_cents)endendend

This code clearly states what it expects. It communicates a great deal
of information about your understanding of the external API at the
time you wrote it. It explicitly establishes the parameters within
which it can operate confidently; as soon as any of its expectations
are violated it fails quickly, with a meaningful exception message.

Unfortunately, as a result it is really telling two stories: one about
refreshing transactions (remember, that’s nominally what this method is
about) and one about the format of an external data source. This is
quickly mended, however:

classAccountdefrefresh_transactionsfetch_transactionsdo|transaction_attributes|cache_transaction(transaction_attributes)endend# Yields a hash of cleaned-up transaction attributes for each transactiondeffetch_transactionstransactions=bank.read_transactions(account_number)transactions.is_a?(Array)orraiseTypeError,"transactions is not an Array"transactions.eachdo|transaction|amount=transaction.fetch("amount")amount_cents=(Float(amount)*100).to_iyield(:amount=>amount_cents)endendend

By failing early rather than allowing misunderstood inputs to
contaminate the system, it reduces the need for type-checking and
coercion in other methods. And not only does this code document your
assumptions now, it also sets up an early-warning system should the
third-party API ever change unexpectedly in the future.

Represent special cases with objects

The most common causes of code that tells a confusing story are
special cases. Let’s look at an example of a special case, in the
context of our budgeting application.

You’ve implemented transaction import and it’s working great. Except
for one little problem: users have been reporting bugs about the
reported balances being off. And not just the balances; in fact, all
of the reports seem to have incorrect numbers for some accounts.

You do some investigation into the system logs, and eventually
discover the culprit. It turns out that some banks, when they receive
a authorization for a credit card charge, immediately report it as a
pending transaction in the transaction list. The data looks something
like this:

{"amount"=>"55.08","type"=>"pending","id"=>"98765"}

Then, when the charge is completed or “captured”, another transaction
is recorded:

{"amount"=>"55.08","type"=>"charge","id"=>"98765"}

Aside: if you’ve ever written code to deal with actual banking APIs,
you’ve probably figured out by now that I have not. I’m making this up for the
sake of example. I expect real banking APIs are just as idiosyncratic, though,
in their own ways.

The result of these “double entries” is that your calculations get
thrown off. Your application has routines for summing transactions,
averaging them, breaking them down by month and quarter, and many
more. And every one of these calculations uses the amount field to
arrive at its results.

You briefly consider simply throwing out pending transactions. But
after a quick consultation with your team you realize this would only
introduce more problems. There is sanity-checking code in place which
checks that the bank servers and the local cache have the same
transaction count and contain the same transaction IDs. And not only
that, you might actually want to use the pending transaction
information for upcoming features.

Your second option is to handle the special case… specially,
everywhere that the amount field is referenced. For example:

This switches on type as well, but it only does it once. After that,
the transaction can be used as-is in all of your existing algorithms.

You run some tests, and discover this fixes the problem! You consider
simply leaving the code as it is, since it’s working now. But on
reflection you decide that the concept of a pending transaction would
be best represented by a proper class. That way you have a place to
put documentation about this special case, as well as any more special
logic you realize you need down the road.

# A pending credit-card transactionclassPendingTransactionattr_reader:id,:pending_amountdefinitialize(attributes)@id=attributes.fetch(:id)@pending_amount=attributes.fetch(:amount)enddefamount# Pending transactions duplicate finished transactions, thus# throwing off calculations. For the purpose of calculations and# reports, a pending transaction always has a zero amount. The# real amount is available from #pending_amount.0endend

This code solves the immediate problem of a special type of
transaction, without duplicating logic for that special case all
throughout the codebase. But not only that, it is exemplary: it sets
a good example for code that follows. When, inevitably, another
special case transaction type turns up, whoever is tasked with dealing
with it will see this class and be guided towards representing the new
case as a distinct type of object.

Conclusion

Ruby is a language which values expressiveness over just about
everything else. It is optimized to help us programmers say exactly
what we mean, without any extraneous fluff, to both the computer and
to future readers of our code. This is what makes it so much fun to
code in.

When we allow our methods to become cluttered up with ifs and maybes
and provisos and digressions, we let go of that expressiveness. We
start to lose the clear, confident narrative voice. We force the
future maintainers of the code to navigate through a twisty path full
of logical forks in the road in order to understand the purpose of a
method. Reading and updating the code stops being fun.

My challenge to you is this: when you are writing a new method, keep a
clear idea in mind of the story you are trying to tell. When detours
and diversions start to show up along the way, figure out what you
need to do to restore the narrative, and do it. You might get rid of
repetitive data integrity checks by introducing preconditions in the
initializer of a method. Maybe you can surround an external API in
assertions that document your beliefs about it, rather than trying to
handle anything it throws at you. Or perhaps you can eliminate a
family of often-repeated conditionals by representing a special case
as a class in its own right.

However you do it, keep your focus on telling a straightforward
tale. Not only will the future readers of your code thank you for it,
but I think you’ll find that it makes your code more robust and easier
to maintain as well.

Practicing Ruby is proudly independent, open source, and advertising-free.This is a 100% reader-funded, reader-focused project that needs your support.