Leveraging Ruby's Standard Library: Appendix B - Ruby Best Practices

Most of this book has emphasized that understanding how to use tools
effectively is just as important as having a nice collection of tools.
However, that’s not to say that knowing where to find the right tool for the
job isn’t an invaluable skill to have. In this appendix, we’ll take a look
at a small sampling of Ruby’s vast standard library. What you will find is
that it is essentially a treasure chest of goodies designed to make your
Ruby programs more enjoyable to write.

This excerpt is from Ruby Best Practices.
Written by the developer of the Ruby project Prawn (prawn.majesticseacreature.com), this concise book explains how to design beautiful APIs and domain-specific languages, work with functional programming ideas and techniques that can simplify your code and make you more productive, write code that's readable and expressive, and much more. It's the perfect companion to The Ruby Programming Language.

Why Do We Need a Standard Library?

Because of RubyGems, we tend to leverage a lot of third-party
software. For this reason, we are often more likely to resort to a Google
search instead of a search of Ruby’s API documentation when we want to
solve a problem that isn’t immediately handled in core Ruby. This isn’t
necessarily a bad thing, but it is important not to overlook the benefits
that come with using a standard library when it is available. When all
else is equal, the gains you’ll get from using standard Ruby are easy to
enumerate:

Ruby standard libraries are typically distributed with Ruby
itself, which means that no extra software needs to be installed to
make them work.

Standard libraries don’t change rapidly. Their APIs tend to be
stable and mature, and will likely outlast your application’s
development cycle. This removes the need for frequent compatibility
updates that you might experience with third-party software.

With a few obvious exceptions, Ruby standard libraries are
guaranteed to run anywhere Ruby runs, avoiding platform-specific
issues.

Using standard libraries improves the understandability of your
code, as they are available to everyone who uses Ruby. For open source
projects, this might make contributions to your project easier, for
the same reason.

These reasons are compelling enough to encourage us to check Ruby’s
standard library before doing a Google search for third-party libraries.
However, it might be more convincing
if you have some practical examples of what can be accomplished without
resorting to dependency code.

I’ve handpicked 10 of the libraries I use day in and day out. This
isn’t necessarily meant to be a “best of” sampling, nor is it meant to
point out the top 10 libraries you need to know about. We’ve implicitly
and explicitly covered many standard libraries throughout the book, and
some of those may be more essential than what you’ll see here. However,
I’m fairly certain that after reading this appendix, you’ll find at least
a few useful tricks in it, and you’ll also get a clear picture of how
diverse Ruby’s standard library is.

Be sure to keep in mind that while we’re looking at 10 examples
here, there are more than 100 standard libraries packaged with Ruby, about
half of which are considered mature. These vary in complexity from simple
tools to solve a single task to full-fledged frameworks. Even though you
certainly won’t need to be familiar with every last package and what it
does, it’s important to be aware of the fact that what we’re about to
discuss is just the tip of the iceberg.

Now, on to the fun. I’ve included this appendix because, to me, it
embodies a big part of the joy of Ruby programming. I hope that when
reading through the examples, you feel the same way.

Pretty-Printer for Ruby Objects (pp)

As I mentioned before, Ruby standard libraries run the gamut from
extreme simplicity to deep complexity. I figured we’d kick things off with
something in the former category.

If you’re reading this book, you’ve certainly made use of Kernel#p during debugging. This handy method,
which calls #inspect on an object and
then prints out its result, is invaluable for basic debugging needs.
However, reading its output for even relatively modest objects can be
daunting:

We don’t typically write our code this way, because the structure of
our objects actually means something to us. Luckily, the
pp standard library understands this and provides
much nicer human-readable output. The changes to use pp instead of p are fairly simple:

As you can see here, pretty_print
takes an argument, which is an instance of the current pp object. Because pp inherits from PrettyPrint, a class provided by Ruby’s
prettyprint standard library, it
provides a whole host of formatting helpers for indenting, grouping, and wrapping structured
data output. We’ve stuck with the raw text() call here, but it’s worth mentioning that
there is a lot more available to you if you need it.
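To make the hook concrete, here is a minimal sketch of a custom pretty_print. The Person class, its fields, and the chosen layout are assumptions made for illustration, not the listing from the book; only text, group, breakable, and comma_breakable come from pp itself.

```ruby
require "pp"

# Hypothetical class used only to illustrate the pretty_print hook.
class Person
  def initialize(first_name, last_name, friends = [])
    @first_name = first_name
    @last_name  = last_name
    @friends    = friends
  end

  # pp calls this hook and passes in the printer object doing the output.
  def pretty_print(printer)
    printer.group(2, "#<Person", ">") do
      printer.breakable
      printer.text "name: #{@first_name} #{@last_name}"
      printer.comma_breakable
      printer.text "friends: #{@friends.length}"
    end
  end
end

person = Person.new("Ann", "Example")
puts PP.pp(person, "".dup)   # render into a String instead of $stdout
```

Because the output goes through group and breakable rather than a hardcoded string, pp is free to wrap and indent the fields if the line ever grows past the available width.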

A benefit of indirectly displaying your output through a printer
object is that it allows pp to give you an
inspect-like method that returns a string. Try person.pretty_print_inspect
to see how this works. The string represents exactly what would be printed
to the console, just like obj.inspect
would. If you wish to use pretty_print_inspect as your default inspect
method (and therefore make p and pp
work the same), you can do so easily with an alias:
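A small sketch of that alias, assuming a hypothetical Point class that already defines its own pretty_print hook:

```ruby
require "pp"

# Hypothetical class; the alias is the only part taken from the text.
class Point
  def initialize(x, y)
    @x = x
    @y = y
  end

  def pretty_print(printer)
    printer.text "Point(#{@x}, #{@y})"
  end

  # Make inspect (and therefore p) reuse the pretty-printed form.
  alias inspect pretty_print_inspect
end

point = Point.new(1, 2)
puts point.pretty_print_inspect
p point
```

Note that pretty_print_inspect only works on classes that override pretty_print, so the alias belongs after the hook is in place.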

Generally speaking, pp does a
pretty good job of rendering the debugging output for even relatively
complex objects, so you may not need to customize its behavior often.
However, if you do have a need for specialized output, you’ll find that
the pretty_print hook provides
something that is actually quite a bit more powerful than Ruby’s default
inspect hook, and that can really come
in handy for certain needs.

Working with HTTP and FTP (open-uri)

Like most other modern programming languages, Ruby ships with
libraries for working with some of the most common network protocols,
including FTP and HTTP. However, the Net::FTP and Net::HTTP libraries are designed primarily for
heavy lifting at the low level. They are great for this purpose, but they
leave something to be desired for when all that is needed is to grab a
remote file or do some basic web scraping. This is where open-uri shines.

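The way open-uri works is by
patching Kernel#open to accept URIs (on Ruby 3.0 and later, Kernel#open no
longer accepts URIs, and URI.open is the equivalent entry point).
This means we can directly open remote files and work with them. For
example, here’s how we’d print out Ruby’s license using open-uri: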

require "open-uri"
puts open("http://www.ruby-lang.org/en/LICENSE.txt").read #=>
"Ruby is copyrighted free software by Yukihiro Matsumoto <matz@netlab.co.jp>.
You can redistribute it and/or modify it under either the terms of the GPL
(see COPYING.txt file), or the conditions below: ..."

If we encounter an HTTP error, an OpenURI::HTTPError will be raised, including the
relevant error code:

>> open("http://majesticseacreature.com/a_totally_missing_document")
OpenURI::HTTPError: 404 Not Found
from /usr/local/lib/ruby/1.8/open-uri.rb:287:in 'open_http'
from /usr/local/lib/ruby/1.8/open-uri.rb:626:in 'buffer_open'
from /usr/local/lib/ruby/1.8/open-uri.rb:164:in 'open_loop'
from /usr/local/lib/ruby/1.8/open-uri.rb:162:in 'catch'
from /usr/local/lib/ruby/1.8/open-uri.rb:162:in 'open_loop'
from /usr/local/lib/ruby/1.8/open-uri.rb:132:in 'open_uri'
from /usr/local/lib/ruby/1.8/open-uri.rb:528:in 'open'
from /usr/local/lib/ruby/1.8/open-uri.rb:30:in 'open'
from (irb):10
from /usr/local/lib/ruby/1.8/uri/generic.rb:250
>> open("http://prism.library.cornell.edu/control/authBasic/authTest/")
OpenURI::HTTPError: 401 Authorization Required
from /usr/local/lib/ruby/1.8/open-uri.rb:287:in 'open_http'
from /usr/local/lib/ruby/1.8/open-uri.rb:626:in 'buffer_open'
from /usr/local/lib/ruby/1.8/open-uri.rb:164:in 'open_loop'
from /usr/local/lib/ruby/1.8/open-uri.rb:162:in 'catch'
from /usr/local/lib/ruby/1.8/open-uri.rb:162:in 'open_loop'
from /usr/local/lib/ruby/1.8/open-uri.rb:132:in 'open_uri'
from /usr/local/lib/ruby/1.8/open-uri.rb:528:in 'open'
from /usr/local/lib/ruby/1.8/open-uri.rb:30:in 'open'
from (irb):7
from /usr/local/lib/ruby/1.8/uri/generic.rb:250

The previous example was a small hint about another feature of
open-uri, HTTP basic authentication. Notice what
happens when we provide a username and password accessing the same URI:

Success! You can see here that open-uri
represents the returned file as a StringIO object (larger responses come
back as a Tempfile instead), which is why we can call read to get
its contents. Of course, we can use most other I/O operations as well, but I won’t get into
that here.
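Both behaviors, the HTTPError on failure and the basic authentication option, can be wrapped in one small helper. This is a sketch under stated assumptions: fetch_body and its opener parameter are hypothetical names introduced here, while URI.open, the :http_basic_authentication option, and OpenURI::HTTPError are part of open-uri on current Ruby versions.

```ruby
require "open-uri"
require "stringio"

# Hypothetical helper: returns the response body as a String, or nil when
# the server answers with an HTTP error status such as 404 or 401.
# `opener` defaults to URI.open; it is a parameter only so that the sketch
# can be exercised without network access.
def fetch_body(url, user: nil, password: nil, opener: URI.method(:open))
  options = {}
  options[:http_basic_authentication] = [user, password] if user
  opener.call(url, **options).read
rescue OpenURI::HTTPError => e
  warn "Request for #{url} failed: #{e.message}"
  nil
end
```

A call such as fetch_body("http://example.com/protected/", user: "alice", password: "secret") (a made-up URL and credentials) would then return either the page contents or nil, without the caller having to know about open-uri at all.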

As I mentioned before, open-uri also wraps
Net::FTP, so you could even do
something like download Ruby with it:

Here we see that even though the object returned by open() is a StringIO object, it includes some extra
metadata, such as the base_uri of your
request. These helpers are provided by the OpenURI::Meta module, and are worth looking over
if you need to get more than just the contents of a file back.

Although there are some advanced features to open-uri, it is most useful for the simple cases
shown here. Because it returns a StringIO object, any fairly
flexible interface can be extended to support remote file downloads. For a
practical example, we can take a look at Prawn’s image embedding, which
assumes only that an object you pass to it must respond to #read:

Prawn::Document.generate("remote_images.pdf") do
  image open("http://prawn.majesticseacreature.com/media/prawn_logo.png")
end

This feature was accidentally enabled when we allowed the image() method to accept
Tempfile objects. Because open-uri
smoothly integrates with the rest of Ruby, you might find situations where
it can come in handy in a similar way in your own applications.

Working with Dates and Times (date)

Core Ruby has a Time class, but
we will encounter a lot of situations where we also need to work with
dates, or combinations of dates and times. Ruby’s
date standard library gives us Date and DateTime, and extends Time with conversion methods for each of them.
This library comes packed with a powerful parser that can handle all sorts
of date formats, and a solid date formatting engine to output data based
on a template. Here are just a couple of trivial examples to give you a
sense of its flexibility:
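As a brief illustration of the parser, the formatter, and the Time conversions, consider the sketch below. The sample date and the long_form helper are assumptions chosen for the example.

```ruby
require "date"

# Date.parse guesses the format; Date.strptime follows an explicit template.
guessed  = Date.parse("2009-03-17")
explicit = Date.strptime("17/03/2009", "%d/%m/%Y")

# Hypothetical helper showing template-driven output via strftime.
def long_form(date)
  date.strftime("%B %d, %Y")
end

puts guessed == explicit        # true: both are March 17, 2009
puts long_form(explicit)        # March 17, 2009
puts long_form(explicit + 30)   # Date arithmetic counts in days
puts Time.now.to_date.class     # Date, via the conversion added to Time
```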

Here we’ve just scratched the surface, but in the interest of
keeping a quick pace, we’ll dive right into an example. So far, we’ve been
looking at Date, but now we’re going to
work with DateTime. The two are
basically the same, except that the latter can hold time values as
well:[19]

If we look a little later in the day, we can see that in this
particular example, although eating snow is a short-lived
experience, the passion for wearing a special suit carries on:

As it turns out, implementing the Scheduler class is pretty straightforward,
because DateTime objects can be used as
endpoints in a Ruby range object. So when we look at these two events,
what we’re really doing is something similar to this:

Here we see that each event is simply a tuple consisting of two
elements: a datetime Range, and a
message. We parse the strings on the fly using DateTime.parse. This method should typically be used with
caution, as it is much more reliable to use Date.strptime, and much faster to construct
a DateTime manually than it is to
attempt to guess the date format. That having been said, there is no
substitute when you cannot rely on a standardized date format, and it does
a good job of providing a flexible interface when one is needed.

As this fairly pedestrian code completely covers storing events,
what remains to be shown is how they are selectively retrieved and
displayed. We’ll start with the helper method that looks up what events
are going on at a particular time:

def events_at(datetime)
  @events.each_with_object([]) do |event, matched|
    matched << event if event.first.cover?(datetime)
  end
end

Here, we build up our list of matching events by simply iterating
over the event list and including only those events in which the
datetime Range covers the time in
question. This, along with the
self-explanatory time_abbrev code, is
used to keep display_events_at nice and clean:

Here, we’re doing little more than parsing the date and time passed
in as a string to get us a DateTime
object, and then displaying the results of events_at. We take advantage of strftime for that, and recover the endpoints of
the range to include in our output, to show exactly when an event starts
and stops. There’s really not much more to it.
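Pulling these pieces together, a scheduler along these lines might look roughly like the following. This is a hypothetical reconstruction, not the book's listing: the method names follow the text, but the bodies, the output format, and the sample dates are assumptions.

```ruby
require "date"

class Scheduler
  def initialize
    @events = []
  end

  # Store each event as a [Range of DateTime, message] pair.
  def event(from, to, message)
    @events << [DateTime.parse(from)..DateTime.parse(to), message]
  end

  # Keep only the events whose range covers the given moment.
  def events_at(datetime)
    @events.select { |range, _message| range.cover?(datetime) }
  end

  def time_abbrev(datetime)
    datetime.strftime("%H:%M (%m/%d)")
  end

  # Builds the report as a String so the caller decides where to print it.
  def display_events_at(datetime_string)
    datetime = DateTime.parse(datetime_string)
    lines = events_at(datetime).map do |range, message|
      "#{time_abbrev(range.first)} - #{time_abbrev(range.last)}: #{message}"
    end
    ["Events occurring around #{time_abbrev(datetime)}", *lines].join("\n")
  end
end

sched = Scheduler.new
sched.event "2009-02-04 11:00", "2009-02-04 11:30", "Eat Snow"
sched.event "2009-02-03 14:00", "2009-02-04 14:00", "Wear Special Suit"
puts sched.display_events_at("2009-02-04 11:20")
```

Range#first and Range#last, called without arguments, simply hand back the endpoints, which is what lets the report show when each event starts and stops.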

Although this example is obviously a bit oversimplified, you’ll find
that similar problems crop up again and again. The key thing to remember
is to take advantage of the ability of DateTime objects to be used within ranges, and
whenever possible, to avoid parsing dates yourself. If you need finer
granularity, use strptime(), but for
many needs parse() will do the trick
while providing a more flexible interface to your users.

We’ve covered some of the most common uses of Ruby’s standard
date library here, but there are of course plenty of
other features for the edge case. As with the other topics in this
appendix, hit up the API documentation if you need to know more.

Lexical Parsing with Regular Expressions (strscan)

Although Ruby’s String object provides many
powerful features that rely on regular expressions, it can be cumbersome
to build any sort of parser with them. Most operations that you can do
directly on strings work on the whole string at once, providing MatchData that can be used
to index into the original content. This is great when a single pattern
fits the bill, but when you want to consume some text in chunks, switching
up strategies as needed along the way, things get a little more hairy.
This is where the strscan library comes in.

When you require strscan, it provides a class
called StringScanner. The underlying
purpose of using this object is that it keeps track of where you are in
the string as you consume parts of it via regex patterns. Just to clear up
what this means, we can take a look at the example used in the
RDoc:

From this simple example, it’s clear to see that the index is
advanced only when a match is made. Once the end of the string is reached,
there is nothing left to match. Although this may seem a little simplistic
at first, it forms the essence of what StringScanner does for us. We can see that by
looking at how it is used in the context of something a little more
real.
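A short sketch of that behavior, using the same sample sentence as the StringScanner documentation. The words_in helper is a hypothetical addition that shows one scanner being consumed chunk by chunk.

```ruby
require "strscan"

scanner = StringScanner.new("This is an example string")
p scanner.scan(/\w+/)   # "This": a match advances the position
p scanner.scan(/\w+/)   # nil: a space comes next, so nothing is consumed
p scanner.pos           # 4
p scanner.scan(/\s+/)   # " "
p scanner.scan(/\w+/)   # "is"

# Hypothetical helper: collect every word, alternating between two patterns.
def words_in(text)
  scanner = StringScanner.new(text)
  words = []
  until scanner.eos?
    word = scanner.scan(/\w+/)
    words << word if word
    scanner.scan(/\W+/)   # skip separators; a failed scan leaves the position alone
  end
  words
end

p words_in("This is an example string")
```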

We’re about to look at how to parse JSON (JavaScript Object
Notation), but the example we’ll use is primarily for educational
purposes, as it demonstrates an elegant use of
StringScanner. If you have a real need for this
functionality, be sure to look at the json standard
library that ships with Ruby, as that is designed to provide the kind of
speed and robustness you’ll need in production.

In Ruby Quiz #155, James Gray builds up a JSON parser by
hand-rolling a recursive descent parser using StringScanner. He actually covers the full
solution in depth on the Ruby Quiz website, but this
abridged version focuses specifically on his use of StringScanner. To keep things simple, we’ll
discuss roughly how he manages to get this small set of assertions to
pass:

Essentially, a StringScanner
object is built up using the original JSON string. Then, the parser
recursively walks down through the structure and parses the data types it
encounters. Once the parsing completes, we expect that we’ll be at the end
of the string, otherwise some data was left unparsed, indicating
corruption.

Looking at the way parse_value is
implemented, we see the benefit of using StringScanner. Before an actual value is
parsed, whitespace is trimmed on both ends using the trim_space helper. This is exactly as simple as
you might expect it to be:

def trim_space
  @input.scan(/\s+/)
end

Of course, to make things a little more interesting, and to continue
our job, we need to peel back the covers on parse_array:

The beauty of JSON (and this particular parsing solution) is that
it’s very easy to see what’s going on. On a successful parse, this code
takes three simple steps. First, it detects the opening [, indicating the start of a JSON array. If it
finds that, it creates a Ruby array to populate. Then, the second step is
to parse out each value, separated by commas and optional whitespace. To
do this, the parser simply calls parse_value again, taking advantage of recursion
as we mentioned before. Finally, the third step is to seek a closing
], which, when found, ends this stage
of parsing and returns a Ruby array wrapped in the AST struct this parser
uses.

Going back to our three assertions, we can trace them one by one.
The first one was meant to test parsing an empty array:

assert_equal(Array.new, @parser.parse(%Q{[]}))

This one is the simplest to trace, predictably. When parse_value is called to capture the contents in
the array, it will error out, because no JSON value starts with ]. James is using a clever trick that banks on a
failed parse, because that allows him to short-circuit processing the
contents. This error is swallowed, leaving the contents empty. The string
is then scanned for the closing ],
which is found, and an AST-wrapped empty Ruby array is returned.

Here, we need to rely on parse_value’s ability to parse strings, numbers,
and booleans. All three of these are done using techniques similar to
those shown so far, but a string is a little hairy due to some tricky edge
cases. However, to give you a few extra samples, we can take a look at the
other two:

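def parse_number
  # The matched text is also a valid Ruby numeric literal, so it can be eval'd.
  @input.scan(/-?(?:0|[1-9]\d*)(?:\.\d+)?(?:[eE][+-]?\d+)?\b/) and
    AST.new(eval(@input.matched))
end

def parse_keyword
  # true and false evaluate as-is; null is renamed to Ruby's nil first.
  @input.scan(/\b(?:true|false|null)\b/) and
    AST.new(eval(@input.matched.sub("null", "nil")))
end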

In both cases, James takes advantage of the similarities between
Ruby and JSON when it comes to numbers and keywords, and essentially just
evals the results after a bit of massaging. The numeric
pattern is a little hairy, and you don’t necessarily need to understand
it. Instead, the interesting thing to note about these two examples is
their use of StringScanner#matched. As
the name suggests, this method returns the actual string that was just
matched by the pattern. This is a common way to extract values while
conditionally scanning for matches.

This pretty much wraps up the interesting bits about getting the
second assertion to pass. Here, the parser just keeps attempting to pull
off new values if it can, while the array code wipes out any intermediate
commas. Once the values are exhausted, the ] is then searched for, as before.

The third and final case for array parsing may initially seem
complicated:

assert_equal([1, [2, [3]]], @parser.parse(%Q{[1, [2, [3]]]}))

However, if you recall that the way parse_array works is to repeatedly call parse_value until all its elements are consumed,
it’s clear what is going on. Because an array can be parsed by parse_value just the same as any other value, the
nested arrays have no trouble repeating the same process to find their
elements, which can also be arrays. At some point, this process bottoms
out, and the whole structure is built up. That means that we actually get
to pass this third assertion for free, as the
implementation already uses recursive calls through parse_value.
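To see the recursion end to end, here is a deliberately tiny parser written in the same spirit. It is not James's code: the class name, the error class, and the choice to peek for ] instead of rescuing a failed parse are assumptions made for this sketch, and it converts values with Integer and Float rather than eval. It handles only arrays, numbers, and keywords.

```ruby
require "strscan"

class TinyArrayParser
  AST = Struct.new(:value)   # wrapper so nil and false still count as a parse
  ParseError = Class.new(StandardError)
  KEYWORDS = { "true" => true, "false" => false, "null" => nil }.freeze
  NUMBER = /-?(?:0|[1-9]\d*)(?:\.\d+)?(?:[eE][+-]?\d+)?\b/

  def parse(text)
    @input = StringScanner.new(text)
    node = parse_value
    raise ParseError, "unexpected trailing data" unless @input.eos?
    node.value
  end

  private

  def parse_value
    trim_space
    node = parse_array || parse_number || parse_keyword
    raise ParseError, "no value at offset #{@input.pos}" unless node
    trim_space
    node
  end

  def parse_array
    return nil unless @input.scan(/\[\s*/)
    items = []
    unless @input.check(/\]/)            # an empty array has no values to read
      loop do
        items << parse_value.value       # recursion handles nested arrays
        break unless @input.scan(/,/)
      end
    end
    @input.scan(/\]/) or raise ParseError, "unclosed array"
    AST.new(items)
  end

  def parse_number
    text = @input.scan(NUMBER)
    text && AST.new(text.match?(/[.eE]/) ? Float(text) : Integer(text))
  end

  def parse_keyword
    word = @input.scan(/\b(?:true|false|null)\b/)
    word && AST.new(KEYWORDS.fetch(word))
  end

  def trim_space
    @input.scan(/\s+/)
  end
end

p TinyArrayParser.new.parse("[1, [2, [3]]]")
```

Every method shares the single @input scanner, so each one only has to describe what it consumes; none of them needs to track offsets by hand.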

Although this doesn’t cover 100% of how James’s parser works, it
gives you a good sense of when StringScanner might be a good tool to have
around. You can see how powerful it is to keep a single reference to a
StringScanner and use it in a number of
different methods to consume a string part by part. This allows better
decomposition of your program, and simplifies the code by removing some of
the low-level plumbing from the equation. So next time you want to do
regular expression processing on a string chunk by chunk rather than all
at once, you might want to give StringScanner a try.

Cryptographic Hash Functions (digest)

Though it might not be something we do every day, having easy access
to the common cryptographic hash functions can be handy for all sorts of
things. The digest standard library provides several
options, including MD5, SHA1, and SHA2. We’ll cover three simple use cases
here: calculating the checksum of a file, uniquely hashing files based on
their content, and secure password storage.

I won’t get into the details about the differences between various
hashing algorithms or their limitations. Though they all carry a potential
risk of what is known as a collision, where two
distinct inputs are hashed to the same value, this is rare enough that
you need not worry about it in most practical scenarios. Of course, if you’re
new to cryptography in general, you will want to read up on these techniques
elsewhere before attempting to use them for anything nontrivial. Assuming
that you accept this responsibility, we can move on to see how these
hashing functions can be used in your Ruby applications.

We’ll start with checksums, because these are pretty easy to find in
the wild. If you’ve downloaded open source software before, you’ve
probably seen MD5 or SHA256 hashes before. I’ll be honest: most of the
time I just ignore these, but they do come in handy when you want to
verify that an automated download completed correctly. They’re also useful
if you have a tendency toward paranoia and want to be sure that the file
you are receiving is really what you think it is. Using the Ruby 1.9.1
release notes themselves as an example, we can see what a download
published with checksums looks like:

As both of these match the release notes, we can be reasonably sure
that nothing nefarious is going on, and also that our file integrity has
been preserved. That’s the most common use of this form of hashing.
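A minimal sketch of such a check follows. The checksum_matches? helper is a hypothetical name, and the file here is a temporary one containing the word hello rather than a real Ruby release.

```ruby
require "digest"
require "tempfile"

# Returns true when the file at `path` hashes to the published checksum.
def checksum_matches?(path, expected_hex, algorithm: Digest::SHA256)
  algorithm.file(path).hexdigest == expected_hex.downcase
end

Tempfile.create("download") do |file|
  file.write("hello")
  file.flush

  sha256 = "2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824"
  md5    = "5d41402abc4b2a76b9719d911017c592"

  puts checksum_matches?(file.path, sha256)
  puts checksum_matches?(file.path, md5, algorithm: Digest::MD5)
end
```

Digest's file method reads the file in chunks, so the same helper works for large downloads without loading them into memory.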

Of course, in addition to identifying a particular file uniquely,
cryptographic hashes allow us to identify the uniqueness of a file’s
content. If you’ve used the revision control system git, you may have noticed that the revisions are
actually identified by SHA1 hashes that describe the changesets. We can do
similar things in our Ruby applications.

For example, in Prawn, we support embedding images in PDF documents.
Because these images can be from any number of sources ranging from a temp
file to a directly downloaded image from the Web, we cannot rely on unique
filenames mapping to unique images. Processing images can be pretty
costly, especially when we do things like split out alpha channels for
PNGs, so we want to avoid reprocessing images when we can avoid it. The
solution to this problem is simple: we use SHA1 to generate a
hexdigest for the image content and then use that as a
key into a hash. A rough approximation of what we’re doing looks like
this:
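The idea can be sketched as follows. ImageRegistry, its process method, and the stand-in result it returns are all hypothetical and are not Prawn's actual internals; only the use of Digest::SHA1.hexdigest as a hash key comes from the text.

```ruby
require "digest/sha1"

class ImageRegistry
  attr_reader :processed_count

  def initialize
    @images = {}
    @processed_count = 0
  end

  # Identical content always produces the same key, whatever its source.
  def fetch(raw_bytes)
    key = Digest::SHA1.hexdigest(raw_bytes)
    @images[key] ||= process(raw_bytes)
  end

  private

  # Stand-in for the expensive work, such as splitting out alpha channels.
  def process(raw_bytes)
    @processed_count += 1
    { size: raw_bytes.bytesize }
  end
end

registry = ImageRegistry.new
registry.fetch("fake image bytes")
registry.fetch("fake image bytes")   # served from the hash, not reprocessed
puts registry.processed_count
```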

This technique clearly isn’t limited to PDF generation. To name just
a couple of other use cases, I use a similar hashing technique to make
sure the content of my blog has changed before reuploading the static
files it generates, so it uploads only the files it needs. I’ve also seen
this used in the context of web applications to prevent identical content
from being copied again and again to new files. Fundamentally, these ideas
are nearly identical to the previous code sample, so I won’t illustrate
them explicitly.

However, while we’re on the topic of web applications, we can work
our way into our last example: secure password storage.

It should be pretty obvious that even if we restrict access to our
databases, we should not store passwords in clear text. We have a
responsibility to offer users a reasonable amount of privacy, and through
cryptographic hashing, even administrators can be kept in the dark about
what individual users’ passwords actually are. Using the techniques
already shown, we get most of the way to a solution.

The following example is from an ActiveRecord model as part of a
Rails application, but it is fairly easily adaptable to any system in
which the user information is remotely stored outside of the application
itself. Regardless of whether you are familiar with ActiveRecord, the code
should be fairly straightforward to follow with a little explanation.
Everything except the relevant authentication code has been omitted, to
keep things well focused:

Here we see two functions: one for setting an individual user’s
password, and another for authenticating and looking up a user by username
and password. We’ll start with setting the password, as this is the most
crucial part:

Here we see that the password is hashed using Digest::SHA256, in a similar fashion to our
earlier examples. However, this password isn’t directly hashed, but
instead, is combined with a salt to make it more
difficult to guess. This technique has been shown in many Ruby cookbooks
and tutorials, so you may have encountered it before. Essentially, what
you are seeing here is that for each user in our database, we generate a
random six-byte sequence and then pack it into a base64-encoded string,
which gets appended to the password before it is hashed. This makes
several common attacks much harder to execute, at a minimal complexity
cost to us.

An important thing to notice is that what we store is the
fingerprint of the salted password rather than the
password itself. So although we can tell whether a given password
matches this fingerprint, the original password cannot be recovered from
the data we are storing.

If this more or less makes sense to you, the authenticate method will be easy to follow
now:

Here, we first retrieve the user from the database. Assuming that
the username is valid, we then look up the salt and add
it to our bare password string. Because we never stored the actual
password, but only its salted hash, we call hexdigest again and compare the hash to the one
stored in the database. If they match, we return our user object and all
is well; if they don’t, an error is raised. This completes the cycle of
secure password storage and authentication and demonstrates the role that
cryptographic hashes play in it.
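The whole cycle can be sketched without ActiveRecord. In the version below, the User class, the in-memory REGISTRY that stands in for the database, and the error class are assumptions made for illustration; the salting scheme follows the description above.

```ruby
require "digest"
require "securerandom"

class User
  AuthenticationError = Class.new(StandardError)
  REGISTRY = {}   # hypothetical stand-in for the users table

  attr_reader :username, :salt, :password_hash

  def initialize(username)
    @username = username
    REGISTRY[username] = self
  end

  # Store only the salt and the salted fingerprint, never the password.
  def password=(plain_text)
    @salt = [SecureRandom.random_bytes(6)].pack("m0")   # six bytes, base64-encoded
    @password_hash = Digest::SHA256.hexdigest(plain_text + @salt)
  end

  def self.authenticate(username, plain_text)
    user = REGISTRY[username]
    raise AuthenticationError, "unknown user" unless user
    attempt = Digest::SHA256.hexdigest(plain_text + user.salt)
    raise AuthenticationError, "wrong password" unless attempt == user.password_hash
    user
  end
end

user = User.new("ann")
user.password = "correct horse"
puts User.authenticate("ann", "correct horse").username
```

The design choice worth noting is that the salt is stored in the clear next to the hash; its job is to make precomputed tables useless, not to be secret. Applications with stricter requirements often reach for a purpose-built password hashing function instead of a single SHA-256 pass.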

With that, we’ve probably talked enough about digest for now. There are some more advanced
features available, but as long as you know that Digest::MD5, Digest::SHA1 and Digest::SHA256 exist and how to call hexdigest() on each of them, you have all you’ll
need to know for most occasions. Hopefully, the examples here have
illustrated some of the common use cases, and helped you think of your own
in the process.

Mathematical Ruby Scripts (mathn)

The mathn standard library, when combined with
the core Math module, serves to make
mathematical operations more pleasant in Ruby. The main purpose of
mathn is to pull in other standard libraries and
integrate them with the rest of Ruby’s numeric system. You’ll notice this
right away when doing basic arithmetic:

As you can see, integer division gives way when
mathn is loaded, in favor of returning Rational objects. These behave like the
fractions you learned in grade school, and keep values in exact terms
rather than expressing them as floats where possible. Numbers also
gracefully extend into the Complex field, without
error. Although this sort of behavior might seem unnecessary for
day-to-day programming needs, it can be very helpful for mathematical
applications.

In addition to changing the way basic arithmetic works,
mathn pulls in a few of the higher-level mathematical
constructs. For those interested in enumerating prime numbers (for
whatever fun reason you might have in mind), a class is provided. To give
you a peek at how it works, we can do things like ask for the first 10
primes or how many primes exist up to certain numbers:
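On current Ruby versions, the class-level interface of the prime library covers both questions. The prime_count_up_to helper is a hypothetical name used to keep the sketch tidy.

```ruby
require "prime"

# Hypothetical helper: how many primes are there up to and including `limit`?
def prime_count_up_to(limit)
  Prime.each(limit).count
end

p Prime.first(10)           # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
p prime_count_up_to(100)    # 25
p prime_count_up_to(1000)   # 168
```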

The Matrix and Vector classes that mathn pulls in can do all sorts of useful
linear algebra functions, but I don’t want to overwhelm the casual Rubyist with mathematical
details. Instead, we’ll look at a practical use of them and leave the
theory as a homework assignment. Consider the simple drawing in Figure B.1, “A pair of triangles with a mathematical relationship”.

Figure B.1. A pair of triangles with a mathematical relationship

We see two rather exciting triangles, nuzzling up against each other
at a single point. As it turns out, the smaller triangle is nothing more
than a clone of the larger one, reflected, rotated, and scaled to fit.
Here’s the code that makes all that happen:[20]

You don’t need to worry about the graphics-drawing code. Instead,
focus on the use of Matrix manipulations here, and
watch what happens to the points in each step. We start off with our
initial triangle’s coordinates, as such:

Notice here that the x values are negated,
while the y values are left untouched. This is what
reflects our points from the right side to the left. Our next task uses
a bit of trigonometry to rotate our triangle to lie flat along the
x-axis. Notice here that we use the arctan of 1/2,
because the bottom edge of the triangle on the right rises halfway toward the
upper boundary before terminating. If you aren’t familiar with how this
calculation works, don’t worry; just observe its results:

The numbers got a bit ugly after this calculation, but there is a
key observation to make here. The triangle’s dimensions were preserved,
but two of the points now lie on the x-axis. This
means our rotation was successful.

Finally, we do scalar multiplication to drop the whole triangle down
to half its original size:
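The three steps can be sketched with the matrix library alone. The triangle coordinates below are hypothetical values chosen so that the bottom edge has a slope of 1/2; they are not the ones from the figure, and the helper names are assumptions.

```ruby
require "matrix"

# Negating x reflects every point across the y-axis.
REFLECT_X = Matrix[[-1, 0],
                   [ 0, 1]]

# Standard counterclockwise rotation matrix for an angle in radians.
def rotation(theta)
  Matrix[[Math.cos(theta), -Math.sin(theta)],
         [Math.sin(theta),  Math.cos(theta)]]
end

# Points are stored one per column: x values on top, y values below.
def little_triangle(points)
  reflected = REFLECT_X * points
  rotated   = rotation(Math.atan(0.5)) * reflected
  rotated * 0.5                       # scalar multiplication halves the size
end

big = Matrix[[0, 100, 100],
             [0,  50, 100]]

little_triangle(big).column_vectors.each { |point| p point.to_a }
```

After the rotation, the first two points both have a y value of zero (up to floating-point rounding), which is the sign that the bottom edge now lies along the x-axis.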

This completes the transformation and shows how the little triangle
was developed simply by manipulating the larger one. Although this is
certainly a bit of an abstract example, it hopefully serves as sufficient
motivation for learning a bit more about matrices. They can certainly be
used for more hardcore calculations, but simple linear transformations such as
the ones shown in this example come cheap and easy and demonstrate an
effective way to do some interesting graphics work.

Although truly hardcore math might be better suited for a more
special-purpose language, Ruby is full-featured enough to
write interesting math programs with. As this particular topic can run far
deeper than I have time to discuss, I will leave further investigation to
the interested reader. The key thing to remember is that
mathn puts Ruby in a sort of “math mode” by including
some of the most helpful standard libraries and modifying the way that
Ruby does its basic arithmetic. This feature is so useful that
irb includes a special switch -m, which essentially requires
mathn and then includes the Math module in at the top level.

A small caveat to keep in mind when working with
mathn is that it is fairly aggressive about the
changes it makes. If you are building a Ruby library, you may want to be a
bit more conservative and use the individual packages it enables one by
one rather than having to deal with the consequences of potentially
breaking code that relies on behaviors such as integer division.
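The conservative route might look like the sketch below, which asks for exact values explicitly and leaves integer division untouched. It also has the advantage of working on recent Ruby releases, where mathn is no longer bundled with the standard library; Rational and Complex need no require at all.

```ruby
# Integer division keeps its usual meaning...
p 1 / 2                                # 0

# ...while exact fractions are requested explicitly.
p 1.quo(2)                             # (1/2)
p Rational(1, 3) + Rational(1, 6)      # (1/2)

# Complex numbers are likewise opt-in.
p Complex(0, 1) ** 2                   # (-1+0i)
```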

All that having been said, if you’re working on your math homework,
or building a specialized mathematical application in Ruby, feel free to
go wild with all that mathn has to offer.

Working with Tabular Data (csv)

If you need to represent a data table in plain-text format, CSV
(comma-separated value) files are about as simple as you can get. These
files can easily be processed by almost any programming language, and Ruby
is no exception. The csv standard library is fast for
pure Ruby, internationalized, and downright pleasant to work
with.

In the most simple cases, it’d be hard to make things easier. For
example, say you had a CSV file (payments.csv) that
looked like this:

CSV::Row is a sort of hash/array
hybrid. The primary feature that distinguishes it from a hash is that it
allows for duplicate field names. Here’s an example of how that works.
Given a simple file with nonunique column names like this
(phone_numbers.csv):

We see that CSV::Row#[] takes an optional second
argument that is an offset from which to begin looking for a field name.
For this particular data, r["phone_number",0] and r["phone_number",1] would resolve as the first
phone number field; an index of 2 or 3 would look up the second phone
number. If we know the names of the columns near each phone number, we can
do this in a bit of a smarter way:

Although this still depends on ordinal positioning to some extent,
it allows us to do a relative index lookup. If we know that “phone number”
is always going to be next to “applicant” and “spouse,” it doesn’t matter
which column they start at. Whenever you can take advantage of this sort
of flexibility, it’s a good idea to do so.
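The two lookup styles described above can be sketched as follows. The column layout (applicant, phone_number, spouse, phone_number) follows the discussion, but the names and numbers in the data are invented for illustration, and the row is parsed from a string rather than from phone_numbers.csv.

```ruby
require "csv"

# Hypothetical data with a duplicated "phone_number" column.
data = <<~DATA
  applicant,phone_number,spouse,phone_number
  Alice Example,555-0100,Bob Example,555-0101
DATA

row = CSV.parse(data, headers: true)[0]

# Absolute offsets: the search for the header starts at the given index.
first_phone  = row["phone_number", 0]
second_phone = row["phone_number", 2]

# Relative lookup: start searching from wherever a neighboring column sits.
applicant_phone = row["phone_number", row.index("applicant")]
spouse_phone    = row["phone_number", row.index("spouse")]

puts applicant_phone
puts spouse_phone
```

The relative form keeps working if extra columns are later added in front of "applicant", as long as each phone number still follows the person it belongs to.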

So far, we’ve talked about reading files, but the
csv library handles writing as well. Rather than
continue with our irb-based exploration, I’ll just
combine the features that we’ve already gone over with a couple new ones,
so that we can look at a tiny but fully functional script.

Our task is to convert the previously mentioned
payments.csv file into a summary report
(payment_summary.csv), which will look like
this:

The core mechanism for doing the grouping-and-summing operation is
just a hash with default values of zero for unassigned keys. If you
haven’t seen this technique before, be sure to make a note of it, because
you’ll see it all over Ruby scripts. The rest of the work is easy once we
exploit this little trick.

As far as processing the initial CSV file goes, the only new trick
we’ve added is to specify the option :converters
=> :numeric. This tells csv to hit each
cell with a check to see whether it contains a valid Ruby number. It then
does the right thing and converts it to a Fixnum or Float if there’s a match. This lets us normalize
our data as soon as it’s loaded, rather than litter our code with to_i and to_f
calls.

For this reason, the foreach loop
is nothing more than simple addition, keying the name to a running total
of payments.

Finally, we get to the writing side of things. We use CSV.open in a similar way to how we might use
File.open, and we populate a CSV object by shoving arrays that represent rows
into it. This code is a little prettier than it might be in the general
case, as we’re working with only two columns, but you should be able to
see that the process is relatively straightforward nonetheless.
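Pulling those pieces together, a minimal sketch of such a script might look like the following. The column names ("name" and "payment"), the sample rows, the output header, and the summarize_payments helper are all assumptions made for illustration; the files are written to a temporary directory so the example is self-contained.

```ruby
require "csv"
require "tmpdir"

# Hypothetical helper: sums payments per name and writes a summary file.
def summarize_payments(input, output)
  totals = Hash.new(0) # unassigned keys start at zero

  # :converters => :numeric turns numeric-looking cells into numbers on load.
  CSV.foreach(input, headers: true, converters: :numeric) do |row|
    totals[row["name"]] += row["payment"]
  end

  # Each array pushed into the CSV object becomes one row of output.
  CSV.open(output, "w") do |csv|
    csv << ["name", "total"]
    totals.each { |name, total| csv << [name, total] }
  end

  totals
end

totals  = nil
summary = nil

Dir.mktmpdir do |dir|
  input  = File.join(dir, "payments.csv")
  output = File.join(dir, "payment_summary.csv")

  # Invented sample data standing in for payments.csv.
  File.write(input, "name,payment\nalice,10.5\nbob,15\nalice,20\n")

  totals  = summarize_payments(input, output)
  summary = File.read(output)
end

puts summary
```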

Here we see a useful little script based on the
csv library weighing in at around 10 lines of code.
As impressive as that might be, we haven’t even scratched the surface on
this one, so be sure to dig deeper if you have a need for processing
tabular datafiles. One thing that I didn’t show at all is that
csv handles reading and writing from strings just as
well as it does files, which may be useful in web applications and other
places where there is a need to stream files rather than work directly
with the filesystem. Another is dealing with different column and row
record separators. Luckily, the csv library is
comparatively well documented, so all of these things are just a quick API
documentation search away.
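As a brief sketch of those two features, the example below reads from a string with a nonstandard column separator and writes to a string with another. The sample data is invented; CSV.parse, CSV.generate, and the :col_sep option are part of the csv library.

```ruby
require "csv"

# Parse semicolon-separated text straight from a string.
rows = CSV.parse("name;total\nalice;30.5\n", col_sep: ";")

# Build tab-separated output in memory instead of writing to a file.
text = CSV.generate(col_sep: "\t") do |csv|
  rows.each { |row| csv << row }
end

p rows
print text
```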

A great thing about the newly revamped csv
standard library is that you don’t necessarily need to upgrade to Ruby 1.9
to use it. It started as a third-party alternative to Ruby 1.8’s CSV
standard library, under the name FasterCSV, and this project is still supported
under Ruby 1.8.6. So if you like what you see here, and you want to use it
in some of your legacy Ruby code, you can always install the
fastercsv gem and be up and running.

Transactional Filesystem-Based Data Storage (pstore)

PStore provides a simple,
transactional database for storing Ruby objects within a file. This gives
you a persistence layer without relying on any external resources, which
can be very handy. Using PStore is so
simple that I can forgo most of the details and jump right into some code
that I use for a Sinatra microapp at work. What follows is a very simple
backend for storing replies to an anonymous survey:

In our application, the usage of this object for storing responses
is quite simple. Given that box is just
a SuggestionBox in the following code,
we just have a single call to add_reply
that looks something like this:

So that covers the usage, but let’s go back in a little more detail
to the implementation. You can see that our store is initialized by just constructing a new
PStore object and passing it a filename:

def store
  @store ||= PStore.new(@filename)
end

Then, when it comes to using our PStore, it
basically looks like we’re dealing with a hash-like object, but all of our
interactions with it are through these transaction blocks:

def add_reply(reply)
  store.transaction do
    store[:replies] ||= []
    store[:replies] << reply
  end
end

def replies
  # Passing true opens a read-only transaction.
  store.transaction(true) do
    store[:replies]
  end
end

def clear_replies
  store.transaction do
    store[:replies] = []
  end
end

So the real question here is what do we gain? The answer is,
predictably, a whole lot.[21]

By using PStore, we can be sure
that only one write-mode transaction is open at a time, preventing issues
with partial reads/writes in multiprocessed applications. This means that
if we attempt to produce a report against SuggestionBox while a new suggestion is being
written, our report will wait until the write operation completes before
it processes. However, when all transactions are read-only, they will not
block each other, allowing them to run concurrently.

Every transaction reloads the file at its start, keeping things
up-to-date and synchronized. Every write that is done checks the MD5 sum
of the contents to avoid unnecessary writes for unchanging data. If
something goes wrong during a write, all the write operations in a
transaction are rolled back and an exception is raised. In short, PStore provides a fairly robust persistence
framework that is suitable for use across multiple applications or
threads.
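The rollback and read-only behavior can be seen in a small experiment. This is only a sketch: the file name, the reply strings, and the simulated failure are invented, and the store lives in a temporary directory so nothing is left behind.

```ruby
require "pstore"
require "tmpdir"

after_rollback = nil
readonly_error = nil

Dir.mktmpdir do |dir|
  store = PStore.new(File.join(dir, "replies.pstore"))

  # A successful write transaction is saved to disk.
  store.transaction { store[:replies] = ["first reply"] }

  # An exception inside a transaction discards its changes and is re-raised.
  begin
    store.transaction do
      store[:replies] << "second reply"
      raise "simulated failure"
    end
  rescue RuntimeError
    # The store on disk still holds only the first reply.
  end

  after_rollback = store.transaction(true) { store[:replies] }

  # Writing inside a read-only transaction is refused.
  begin
    store.transaction(true) { store[:replies] = [] }
  rescue PStore::Error => error
    readonly_error = error.class
  end
end

p after_rollback
p readonly_error
```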

Of course, though it is great for what it does, PStore has notable limitations. Because it loads
the entire dataset on every read, and writes the whole dataset on every
write, it is very I/O-intensive. Therefore, it’s not meant to handle very
high load or large datasets. Additionally, as it is essentially nothing
more than a file-based Hash object, it
cannot serve as a substitute for some sort of SQL server when dealing with
relational data that needs to be efficiently queried. Finally, because it
uses the core utility Marshal to serialize objects to
disk,[22] PStore cannot be used to
store certain objects. These include anonymous classes and Proc objects, among other things.
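Because the restriction comes from Marshal itself, it can be checked without involving PStore at all. The helper name marshalable? below is made up for this sketch.

```ruby
# Hypothetical helper: reports whether Marshal can serialize an object.
def marshalable?(object)
  Marshal.dump(object)
  true
rescue TypeError
  false
end

plain_data      = marshalable?([1, "two", { three: 3.0 }])
proc_object     = marshalable?(proc { 42 })
anonymous_class = marshalable?(Class.new)
anonymous_inst  = marshalable?(Class.new.new)

p [plain_data, proc_object, anonymous_class, anonymous_inst]
```

Anything that fails this check would also fail when a PStore transaction tries to save it.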

Despite these limitations, PStore has a very wide
sweet spot in which it is the right way to go. Whenever you need to
persist a modest amount of nonrelational data and possibly share it across
processes or threads, it is usually the proper tool for the job. Although
it is possible to code up your own persistence solutions on top of Ruby’s
raw serialization support, PStore
solves most of the common needs in a rather elegant way.

Human-Readable Data Serialization (json)

JavaScript Object Notation
(JSON) is an object serialization format that has been gaining a
ton of steam lately. With the rise of a service-oriented Web, the need for
a simple, language-independent data serialization format has become more
and more apparent.

Historically, XML has been used for interoperable data
serialization. However, using XML for this is a bit like going bird
hunting with a bazooka: it’s just way more firepower than the job
requires. JSON aims to do one thing and do it well, and these constraints
give rise to a human-readable, human-editable, easy-to-parse, and
easy-to-produce data interchange format.

The primitive constructs provided by JSON are limited to hashes
(called objects in JSON), arrays, strings, numbers,
and the conditional triumvirate of true, false,
and nil (called null in JSON). It’s easy to see that these
concepts trivially map to core Ruby objects. Let’s take a moment to see
how each of these data types is represented in JSON:

There isn’t really much to it. In fact, JSON is somewhat
syntactically similar to Ruby. Though the similarity is only superficial,
it is nice to be able to read and write structures in a format that
doesn’t feel completely alien.
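To make the mapping concrete, here is a small sketch that converts a Ruby structure containing each primitive type into JSON. The hash contents are invented for illustration; to_json is added to core objects when the json library is required.

```ruby
require "json"

# One value of each primitive kind, wrapped in a hash (a JSON object).
record = {
  "name"     => "widget",
  "tags"     => ["a", "b"],
  "price"    => 9.5,
  "in_stock" => true,
  "discount" => nil
}

encoded = record.to_json
puts encoded
```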

If we go the other direction, from JSON into Ruby, you’ll see that
the transformation is just as easy:
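A minimal sketch of the reverse direction, using an invented JSON document, shows that JSON.parse hands back ordinary hashes, arrays, numbers, booleans, and nil:

```ruby
require "json"

document = '{"numbers":[1,2.5],"flag":false,"nothing":null,"label":"ok"}'

decoded = JSON.parse(document)

p decoded
p decoded["numbers"].first.class
```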

Without knowing much more about Ruby’s json
standard library, you can move on to building useful things. As long as
you know how to navigate the nested hash and array structures, you can
work with pretty much any service that exposes a JSON interface as if it
were responding to you with Ruby structures. As an example, we can look at
a fairly simple interface that does a web search and processes the JSON
dataset it returns:

Here we’re using json and the
open-uri library, which was discussed earlier in this
appendix, to wrap a simple
Google web search. When run, this code will print out a few page titles
and their URLs for any query you enter. Here’s a sample of GSearch in action:

Maybe by the time this book comes out, we’ll have nabbed all four of
the top spots, but that’s beside the point. You’ll want to notice that the
GSearch module interacts with the
json library in a sum total of one line, in order to
convert the dataset into Ruby. After that, the rest is business as
usual.

The interesting thing about this particular example is that I didn’t
read any of the documentation for the search API. Instead, I tried a
sample query, converted the JSON to Ruby, and then used Hash#keys to tell me what attributes were
available. From there I continued to use ordinary Ruby reflection and
inspection techniques straight from irb to figure out
which fields were needed to complete this example. By thinking of JSON
datasets as nothing more than the common primitive Ruby objects, you can
accomplish a lot using the skills you’re already familiar with.
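That exploratory workflow might look something like the sketch below. The payload is a made-up stand-in for a service response, not the output of any real search API, and its field names are assumptions.

```ruby
require "json"

# Hypothetical response body from some JSON service.
payload = <<~JSON
  {"count": 2,
   "results": [
     {"title": "Example One", "url": "http://example.com/1"},
     {"title": "Example Two", "url": "http://example.com/2"}]}
JSON

data = JSON.parse(payload)

# Ordinary reflection is enough to discover the structure.
top_level_keys = data.keys
result_keys    = data["results"].first.keys

# Once the shape is known, plain Enumerable methods do the rest.
titles = data["results"].map { |result| result["title"] }

p top_level_keys
p result_keys
p titles
```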

After seeing how easy it is to consume JSON, you might be wondering
how you’d go about producing it using your own custom objects. As it turns
out, there really isn’t that much to it. Say, for example, you had a
Point class that was responsible for
doing some calculations but was, at its essence, just an
ordered pair representing an
[x,y] coordinate. Producing the
JSON to match this is easy:

Here, we have simply represented our core data in primitives and
then wrapped our object model around it. In many cases, this is the most
simple, implementation-independent
way to represent one of our objects.
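One way this primitive-based approach could be sketched is shown below. The Point class body and the from_json helper are assumptions made for illustration; only the idea of representing the point as an [x,y] array comes from the discussion above.

```ruby
require "json"

# Hypothetical Point class that serializes itself as a plain [x, y] array.
class Point
  attr_reader :x, :y

  def initialize(x, y)
    @x, @y = x, y
  end

  # Delegate to the array representation so the output is pure primitives.
  def to_json(*args)
    [x, y].to_json(*args)
  end

  # Rebuild a Point by hand from the parsed primitive data.
  def self.from_json(text)
    new(*JSON.parse(text))
  end
end

encoded  = Point.new(1, 2).to_json
restored = Point.from_json(encoded)

puts encoded
p [restored.x, restored.y]
```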

However, in some cases you may wish to let the object internally
interpret the structure of a JSON document and do the wrapping for you.
The Ruby json library provides a simple hook that
depends on a bit of metadata to convert our parsed JSON into a customized
higher-level Ruby object. If we rework our example, we can see how it
works:

Although a little more work needs to be done here, we can see that
the underlying mechanism for a direct Ruby→JSON→Ruby round trip is simple.
The JSON library depends on the attribute json_class, which points to a string that
represents a Ruby class name. If this class has a method called json_create, the parsed JSON data is passed to
this method and its return value is returned by JSON.parse. Although this approach involves
rolling up our sleeves a bit, it is nice to see that there is not much
magic to it.
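A minimal sketch of this hook follows. The "data" key and the class body are assumptions chosen for illustration. Note also that the json versions I am aware of in current Ruby releases only honor json_class when JSON.parse is given create_additions: true, so the sketch passes that option explicitly.

```ruby
require "json"

# Hypothetical Point class using the json_class / json_create convention.
class Point
  attr_reader :x, :y

  def initialize(x, y)
    @x, @y = x, y
  end

  # Emit the class name alongside the primitive data.
  def to_json(*args)
    { "json_class" => self.class.name, "data" => [x, y] }.to_json(*args)
  end

  # Called by the parser with the parsed hash; returns the rebuilt object.
  def self.json_create(object)
    new(*object["data"])
  end
end

encoded = Point.new(1, 2).to_json
decoded = JSON.parse(encoded, create_additions: true)
raw     = JSON.parse(encoded)

puts encoded
p decoded.class
p raw
```

Without the option, the same document parses to a plain hash, which is what non-Ruby clients would see as well.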

Although this adds some extra noise to our JSON output, it does not
add any new constructs, so there is not an issue with other programming
languages being able to parse it and use the underlying data. It simply
comes with the added benefit of Ruby applications being able to map
raw primitive data directly to the classes that wrap them.

Depending on your needs, you may prefer one technique over the
other. The benefit of providing a simple to_json hook that produces raw primitive values
is that it keeps your serialized data completely implementation-agnostic.
This will come in handy if you need to support a wide range of clients.
The benefit of using the "json_class"
attribute is that you do not have to think about manually building up
high-level objects from your object data. This is most beneficial when you
are serving up data to primarily Ruby clients, or when your data is
complex enough that manually constructing objects would be painful.

No matter what your individual needs are, it’s safe to say that JSON
is something to keep an eye on moving forward. Ruby’s implementation is
fast and easy to work with. If you need to work with or write your own web
services, this is definitely a tool you will want to familiarize yourself
with.

Embedded Ruby for Code Generation (erb)

Code generation can be useful for dynamically generating static
files based on a template. When we need this sort of functionality, we can
turn to the erb standard library. ERB stands for
Embedded Ruby, which is ultimately exactly what the
library facilitates.

In the most basic case, a simple ERB template[23] might look like this:

If you’ve not worked with ERB before, you may be wondering how this
differs from ordinary string interpolation, such as this:

x = 42
puts "The value of x is: #{x}"

The key difference to recognize here is the way the two strings are
evaluated. When we use string interpolation, our values are substituted
immediately. When we evaluate an ERB template, we do not actually evaluate
the expression inside the <%= ...
%> until we call ERB#result. That means that although an interpolated
string fails outright if it references a variable that has not been
defined yet, an ERB template can be created first and rendered once the
variable exists.

This is the main reason why ERB can be useful to us. We can write
templates ahead of time, referencing variables and methods that may not
exist yet, and then bind them just before rendering time using binding.
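The contrast between eager interpolation and late binding can be sketched as follows. The variable names are chosen for illustration; ERB.new, ERB#result, and binding are the calls discussed above.

```ruby
require "erb"

# The template is created before x exists; nothing is evaluated yet.
template = ERB.new("The value of x is: <%= x %>")

# Interpolation is evaluated on the spot, so referencing x here fails.
early_failure =
  begin
    "The value of x is: #{x}"
  rescue NameError => error
    error.class
  end

# Once x is defined, the same template renders against the current binding.
x = 42
rendered = template.result(binding)

p early_failure
puts rendered
```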

We can also include some logic in our files, to determine what
should be printed:

Here, we run the same template against three different objects,
evaluating it within the context of each of their bindings. The more
complex ERB.new call here sets the
safe_level the template is executed in to 0 (the
default), but this is just because we want to provide the third argument,
trim_mode. When trim_mode is set to
"<>", the newlines are omitted
for lines starting with <% and
ending with %>. As this is useful
for embedding logic, we need to turn it on to keep the generated string
from having ugly stray newlines. The final output of the script looks like
this:

The value of x is 10
The value of x is 21
You have stumbled across the Answer to the Life, the Universe, and Everything

As you can see, ERB does not emit text when it is within a
conditional block that is not satisfied. This is a big help when
building up dynamic output, as you can use all your normal control
structures to determine what text should and should not be
rendered.
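A sketch of a script that could produce output like this is shown below. The Holder class, its context method, and the template text are assumptions made for illustration. It also passes trim_mode as a keyword argument, which is the form newer Ruby versions expect, rather than the positional safe_level and trim_mode arguments described above.

```ruby
require "erb"

# Hypothetical class whose instances supply the binding for the template.
class Holder
  attr_reader :x

  def initialize(x)
    @x = x
  end

  # Expose this object's binding so the template can call x on it.
  def context
    binding
  end
end

TEMPLATE = <<~TEXT
  <% if x == 42 %>
  You have stumbled across the Answer to the Life, the Universe, and Everything
  <% else %>
  The value of x is <%= x %>
  <% end %>
TEXT

# "<>" drops the newline after lines that start with <% and end with %>.
template = ERB.new(TEMPLATE, trim_mode: "<>")

output = [10, 21, 42].map { |value| template.result(Holder.new(value).context) }.join

print output
```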

The documentation for the erb library is quite
good, so I won’t attempt to dig much deeper here. Of course, no mention of
ERB would be complete without an example of HTML templating. Although the
API documentation goes into much more complicated examples, I can’t resist
showing the template that I use for rendering entries in my blog
engine.[24] This doesn’t include the site layout, but just the code that
gets run for each entry:

This code is evaluated in the context of a Blaag::Entry object, which, as you can clearly
see, does most of the heavy lifting through helper methods. Although this
example might be boring and a bit trivial, it shows something worth
keeping in mind. Just because you can do all sorts of logic in your ERB
templates doesn’t mean that you should. The fact that these templates can
be evaluated in a custom binding means that you are able to keep your code
where it should be while avoiding messy string interpolation. If your ERB
templates start to look more like Ruby scripts than templates, you’ll want
to clean things up before you start pulling your hair out.

Using ERB for templating can
really come in handy. Whether you need to generate a form letter or just
plug some values into a complicated data format, the ability to late-bind
data to a template and generate dynamic content on the fly from static
templates is powerful indeed. Just be sure to keep in mind that ERB is
meant to supplement your ordinary code rather than replace it, and you’ll
be able to take advantage of this useful library without creating a big
mess.

Conclusions

Hopefully these examples have shown the diversity you can come to
expect from Ruby’s standard library. What you have probably noticed by now
is that there really is a lot there—so much so that it might be a little
overwhelming at first. Another observation you may have made is that there
seems to be little consistency in interface between the libraries. This is
because many were written by different people at different stages in
Ruby’s evolution.

However, assuming that you can tolerate the occasional wart, a solid
working knowledge of what is available in Ruby’s standard libraries is a
key part of becoming a masterful Rubyist. It goes beyond simply knowing
about and using these tools, though. Many of the libraries discussed here
are written in pure Ruby, which means that you can actually learn a lot by
grabbing a copy of Ruby’s source and reading through their
implementations. I know I’ve learned a lot in this manner, so I
wholeheartedly recommend it as a way to test and polish your Ruby
chops.