The Lost Chapter: A Macro System For Monkey (June 28, 2017)

If you don’t care about the Who, Where, When, Why, How and the Why Is It A Lost
Chapter? and want to skip to the What: I wrote a new chapter for Writing An
Interpreter In Go and you can read it for free at
interpreterbook.com/lost. Otherwise, read on…

The pages you are about to read were found amidst the rubble of a collapsed
ruin. Wedged between the scratched and battered cases of old machines once
called “computers”. Bearing, in faint white and barely readable, the title
“Writing An Interpreter In Go. Chapter 5: A Macro System For Monkey.” …

Alright, I’ll admit it: that was a lie. What I want to show you is not really a
lost chapter, preserved through the eons, found in the ruins of a long-gone
civilization. I just needed a good intro.

You see, I couldn’t sit still. In the first couple of months after publishing
Writing An Interpreter In Go I took some time off from
Monkey, the programming language we built in the book. “The book is
done. Take a breath and play around with something else. After working on it for
a year you deserve it”, I told myself, only to grow more anxious by the week
about all the features, optimizations and tweaks I could try and add to Monkey.
In the end, the temptation of everything Monkey could still be won out. I gave
in and restarted work on Monkey.

This resulted in two things: a project I’m not ready to talk about
yet and a new, additional chapter for Writing An Interpreter
In Go called The Lost Chapter: A Macro System For Monkey, which I want to
tell you all about.

It started with me getting sidetracked while working on said secret project by
discovering how elegant and beautiful macros in Racket are. I
guess I just can’t stop myself from uttering an impressed “nice” when hearing
about “code that writes code”. Next thing I know I was digging through various
implementations of macros in different languages and getting more and more
fascinated. It’s code that writes code! It’s a hand that draws
itself! How could I not be fascinated by that?

A few “Huh, interesting…” followed by a few more “Well, I guess it wouldn’t be
too hard to just…” later, I successfully added macros to Monkey. Macros that
are able to modify and generate Monkey source code and are evaluated in their
own macro expansion phase. A real, Lisp-style macro system. I was elated.

In fact, the whole journey from learning about how macros are implemented and
why they’re so powerful to implementing them myself was so mind-blowing and fun
that I had to write about it.

At first I thought I was writing a blog post or a tiny addition to the book and
gave it the working title “The Lost Appendix”, thinking of a few pages hidden at
the end of a book.

It ended up with the title The Lost Chapter: A Macro System For Monkey,
because what we have here is not a small addition. It’s a complete chapter,
close to 50 pages in PDF format, that shows you how to implement a fully-working
macro system for Monkey - step by step, all code shown, fully tested, just like
the book. You can think of it as the fifth chapter of Writing An Interpreter In
Go, since it seamlessly continues the previous four. It’s just being delivered
a few months later than the rest of the book.

But why “The Lost Chapter”? Because a text about macros deserves a touch of
mystery, don’t you think? It’s code that writes code, come on! It’s snakes
eating their own tail and surgeons operating on themselves! If that isn’t worthy
of a title that’s a little bit out there, I don’t know what is.

I also didn’t want to make it an addition to the book itself. On the practical
side there’s the hurdle of extending a paperback edition by around 50 pages and
not being able to send the update to readers who already bought the paperback.
But then there were also, let’s say, “conceptual” considerations.

While I consider learning to build your own programming language a worthwhile
endeavor that can teach you a lot of valuable things about programming, I’ll
concede that it looks pretty disconnected from the realities of one’s day job.
But adding a macro system? Writing code that lets you write code that writes
code? That doesn’t just look unrealistic, but rather … Let me put it this way:
totally and completely nuts and, oh, incredible fun!

I wanted this chapter to be exactly that: a fun addition to Writing An
Interpreter In Go, not quite Monkey canon, but a bizarro expansion pack; a
curious and accidental supernova in the same universe.

Oh, and did I mention it’s available for free? Well, it’s available for free.
Read it online or download it as PDF/HTML/Mobi/ePub here:

The downloadable version also includes all the runnable, tested code shown in
the chapter and the complete Monkey interpreter from Writing An Interpreter In
Go.

I hope it’ll get you to utter a “nice”, too.

Writing An Interpreter In Go: The Paperback Edition (February 22, 2017)

If you’d asked me only a few months ago if there’d ever be a printed version of
Writing An Interpreter In Go, I’d have responded with a “Huh, uummm, well, I
don’t know. Maybe. Maybe if I’ll find the time and if there’s any interest.”

As it turned out, to my surprise, quite a few people told me that they’d love
to hold a copy of the book in their hands. And I also had some free time on my
hands. Alright, let’s do it then, I thought.

But even though I said that time and interest were the only limiting factors, I
knew that there couldn’t be a printed version without Monkey - the programming
language that we build in the book - having a logo. Yes, I know, I know, that’s
not a real requirement, but a little indulgence I wouldn’t deny myself. So I
created a 99designs contest and Hazel Anne submitted
the winning entry. I love the logo Monkey has now.

A paperback version of a book also needs a full cover, front and back, and so I
wrestled with vector images and PDFs and print dimensions and page bleed and
spine widths for quite a while. But, in the end, using
createspace to print and distribute my book turned out to be much
easier than one might think. I was lucky enough to already have had a working
Pandoc setup in place and only needed to add one more LaTeX template,
the one for the print version.

If you appreciate holding a physical copy of a book in your hands more than
having a PDF on your hard drive, I hope you enjoy this paperback edition.

Higher Value Tools (February 8, 2017)

There are certain tools that provide incredibly high value. Much more so than
others. They provide so much value by acting as a multiplier of power and
leverage. And I think there’s something they all have in common.

I’m talking about interpreters, compilers and transpilers. Programming languages
are the ultimate, universal tools and sit at the bottom of the stack on which a
bazillion other tools are built. Some programming languages offer so much power
that their creation was the big bang for whole categories of other tools.

But I’m also talking about DSLs, code generators and templating engines. And
databases with query languages. And database drivers that make these databases
available to programming languages. jQuery and its $('exactly what I want')
interface. jq and its query language. Webservers. Editors, IDEs, code
analysers and generators.

It seems to me what they all have in common, what is close to their center of
power, is parsing. Parsing user input, parsing source code, parsing query
expressions, parsing configuration files, parsing network responses. Maybe it’s
parsing itself that makes these tools so powerful. I’m not sure.

What I know and what I’m sure about is that without knowledge of parsing you
won’t be able to build tools like these. Knowing how to write a parser is like a
secret power and once you have it, you realize that you’re now able to solve a
whole range of problems you haven’t even considered before. Now you can create
higher value tools.

What I Didn’t Do To Write A Book (January 16, 2017)

I wrote my book “Writing An Interpreter In Go” over the course of 11
months. The first four months were spent on building the Monkey programming
language and its interpreter. In the following seven months I wrote the book
itself and at times it felt like I’d never finish. But I did and now I want to
answer a question a few people have asked me: “How?”

What follows is much more of a confession than a precise description of a
refined workflow or a secret productivity technique.

I didn’t have a TODO list I didn’t abandon after three weeks. Did I get things
done? I did, but I never read Allen’s book. I also didn’t organize my time
according to the four quadrants. I didn’t use a bullet
journal to keep on top of ideas and tasks, didn’t use a
pomodoro timer and didn’t keep a work journal.
org-mode? I wish. Unplug, turn off notifications and just use pen
and paper? That’s ridiculous, I have a keyboard.

Some tasks and ideas I put in Wunderlist, some in a Trello board and others in a
file called “TODO.md”. Occasionally I even came back to each one and moved some
things around.

Taking notes wasn’t much more organized. There’s a shell script I built. It’s
based on the sound principles of popsicle-sticks-and-duct-tape-engineering and
helped me to quickly create text files in a “notes” folder. Other times I used
Notes.app. I also had iA Writer on my phone to access my Dropbox
folder and directly write random ideas into the book. When I felt like it, I
also did this on my computer: write ideas and outlines directly into the files
that make up the book.

All of this changed from week to week and month to month. Sometimes from one day
to the other.

The only constant in these 11 months was this: I was determined to finish the
book, to keep chipping away at it until it’s done. I got up every day at 5:45am
and tried to take another step forward, using whatever it took.

But don’t take this for something it isn’t. To say that every morning I sat down
at the computer and got a solid hour of writing done before heading to work
would be lying.

Sometimes I got up, drank two, three cups of coffee and just browsed the
internet for an hour, breaking the chain. Other times I
wrote for ten minutes at home and for 30 more on the train. And on my best days,
I wrote for an hour at home and for the whole train ride. And some days I only
wrote down one sentence, more often than not starting with “FIXME:”.

Is there a moral to the story? I’m not sure, maybe it’s this one: productivity
tools and techniques can only help, they won’t ever do the work for you.

It’s easy to fall into this trap and think that once the TODO lists are tidy and
organized and the best notebook money can buy is sitting on the table, half of
the work is already done. Of course, that’s not the case. Just like an expensive
guitar won’t make you a great guitar player and the best running shoes won’t get
you out of the door every day, productivity techniques won’t finish your
project. They might help, but you have to put the work in. You have to keep
showing up and keep chipping away at it. No tool will ever do that for you.

A Virtual Brainfuck Machine In Go (January 4, 2017)

You’re a programmer and your product manager walks up to your desk, taps you on
the shoulder and asks if you have a couple of minutes to spare. She needs to
talk to you about something. You sit down together and she has a serious look on
her face. Oh boy. Something’s up. “Do you have anything important on your
plate right now? I need you to do something for me.” Here it comes… “I
need you to write a Brainfuck interpreter for me. A fast one.”

Some people might say that this conversation will never, ever happen. Well,
“better be prepared” is what I say.

Brainfuck

Brainfuck is a weird looking programming language and keeps every promise its
name makes. Here is one commonly circulated version of “Hello, World!” in
Brainfuck:
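++++++++[>++++[>++>+++>+++>+<<<<-]>+>+>->>+[<]<-]>>.>---.+++++++..+++.>>.<-.<.+++.------.--------.>>+.>++.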

Urban Müller designed Brainfuck with the goal of creating a language with the
smallest possible compiler. I think he reached that goal. Implementing
Brainfuck is an eye-opening experience. Even though it’s a tiny language, it’s
perfectly well-equipped to illustrate a number of concepts behind programming
language implementations.

But before we can build Brainfuck, we need to understand how Brainfuck thinks.

Views Of The World

One thing in which programming languages differ is their model of the world and
how they make it accessible to their users.

Take C, for example. Leaving aside the multitude of abstractions that hide in
the depth of the kernel and the hardware, when working with C you can peek
behind the curtain and see the inner workings of your computer. You are pretty
close to the hardware-supported stack and you can allocate and free memory on
the heap. If you’re experienced and stare intently enough, you can see the
actual machine code when looking at your C code. The same goes for C++.

In Forth you mainly work with a stack. You push, you pop, you swap and
drop. Nearly everything you do happens on a stack. In Forth, the stack is the
world.

In other languages, these underlying assumptions about the mechanics of the
world are abstracted away. Even though the current version of the Ruby Virtual
Machine has a stack, you won’t notice. You don’t push and pop, but send messages
to objects. The same goes for Java. You have classes that inherit from each
other and memory allocation only concerns you in so far as the garbage collector
shows up on time.

Then there are some languages that explicitly tell you what their world looks
like. Especially intermediate languages, which are not meant to be written by
hand, but are representation of end-user languages and easier for computers to
understand and optimize. WebAssembly,
for example, represents the commands of a stack-based machine, that gets then
emulated by a runtime (which will be a browser, most of the time). Java
bytecode is a representation of
Java code in the world of a stack machine.

Brainfuck Machines

And then there’s Brainfuck. Brainfuck doesn’t just tell you what its view of
the world is, no, it smacks you over the head with it.

Brainfuck is based on the assumption that Brainfuck code will be executed by a
Brainfuck machine. Just like the PUSH and POP operations in Java bytecode
assume that the JVM manages a stack, the + and - in Brainfuck assume that
there’s a Brainfuck machine which supports these two instructions.

So what does this Brainfuck machine look like? Not too complicated! It only
has a few parts:

Memory: The machine has 30000 memory cells that can each hold an integer
value from 0 to 255 and are initialized to 0 by default. Each cell is
addressable by a zero-based index, giving us a range of 0 to 29999 as possible
indexes.

Data pointer: It “points” to a memory cell, by holding the value of
the cell’s index. E.g.: if the value of the data pointer is 3, it points to
the fourth memory cell.

Code: The program that’s executed by the machine. It’s made up
of single instructions, which we’ll get to in a short while.

Instruction pointer: It points to the instruction in the code that’s to
be executed next. E.g.: if the code is ++-++ and the instruction
pointer has the value 2 then the next instruction to be executed is -.

Input and output streams: Just like STDIN and STDOUT in Unix systems,
these are normally connected to the keyboard and the screen and are used for
printing and reading characters.

CPU: It fetches the next instruction from the code and executes it,
manipulating the data pointer, instruction pointer, a memory cell or the
input/output streams accordingly.

That’s it. Those are all the parts of a complete, working Brainfuck machine that
can execute Brainfuck code. So let’s take a closer look at Brainfuck code.

The Instructions

Brainfuck is tiny. It consists of eight different instructions. These
instructions can be used to manipulate the state of the Brainfuck machine:

> - Increment the data pointer by 1.

< - Decrement the data pointer by 1.

+ - Increment the value in the current cell (the cell the data pointer is pointing to).

- - Decrement the value in the current cell.

. - Take the integer in the current cell, treat it as an ASCII char and
print it on the output stream.

, - Read a character from the input stream, convert it to an integer and
save it to the current cell.

[ - This always needs to come with a matching ]. If the current cell
contains a zero, set the instruction pointer to the index of the instruction
after the matching ].

] - If the current cell does not contain a zero, set the instruction
pointer to the index of the instruction after the matching [.

That’s all of it, the complete Brainfuck language.

Even though these instructions look archaic, they’re just identifiers. Replace
+ with PLUS, - with SUB, . with PRINT and [ with LOOP and
suddenly Brainfuck starts to look more like
Brain-oh-wow-wait-a-second-I-can-actually-read-that.

Now that we know what the machine should look like and what it has to do, let’s
get started with building it.

Building The Machine

The basic structure will be called - you guessed it - Machine and looks like
this:
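Roughly like this - a sketch that matches the names used in the following
paragraphs (m.code, m.ip, m.memory, m.dp), not necessarily the exact original
listing, and assuming io is imported:

type Machine struct {
	code string
	ip   int

	memory [30000]int
	dp     int

	input  io.Reader
	output io.Writer
}

func NewMachine(code string, in io.Reader, out io.Writer) *Machine {
	return &Machine{code: code, input: in, output: out}
}

func (m *Machine) Execute() {
	for m.ip < len(m.code) {
		instruction := m.code[m.ip]

		switch instruction {
		case '+':
			m.memory[m.dp]++
		case '-':
			m.memory[m.dp]--
		case '>':
			m.dp++
		case '<':
			m.dp--
		}

		m.ip++
	}
}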

Here we step through every instruction in m.code until we reach its end. In
order to execute each instruction individually, we have a switch statement
that “decodes” the current instruction and manipulates the machine according to
which instruction it is.

In the case of + and - we manipulate the current memory cell, incrementing
and decrementing its value respectively. The current memory cell is pointed to
by the data pointer, m.dp, and we can get to it with m.memory[m.dp]. And in
order to change the data pointer itself, we have two case branches for > and
<.

So far, so good. But we’re missing printing and reading, the . and ,
instructions. In order to implement support for those, we need to make a slight
modification: we need to give our Machine a one-byte buffer slice.
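A sketch of the modified Machine and the two methods built on top of the new
buffer:

type Machine struct {
	code string
	ip   int

	memory [30000]int
	dp     int

	input  io.Reader
	output io.Writer

	buf []byte // new: a one-byte buffer for reading and writing
}

func NewMachine(code string, in io.Reader, out io.Writer) *Machine {
	return &Machine{
		code:   code,
		input:  in,
		output: out,
		buf:    make([]byte, 1),
	}
}

func (m *Machine) readChar() {
	n, err := m.input.Read(m.buf)
	if err != nil {
		panic(err)
	}
	if n != 1 {
		panic("wrong number of bytes read")
	}

	m.memory[m.dp] = int(m.buf[0])
}

func (m *Machine) putChar() {
	m.buf[0] = byte(m.memory[m.dp])

	n, err := m.output.Write(m.buf)
	if err != nil {
		panic(err)
	}
	if n != 1 {
		panic("wrong number of bytes written")
	}
}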

readChar reads one byte from the input, which will be os.Stdin, and then
transfers this byte to the current memory cell, m.memory[m.dp]. putChar does
the opposite and writes the content of the current memory cell to the output
stream, which will be os.Stdout.

It has to be said that instead of doing proper error handling here, we just let
the machine blow up by calling panic. That shouldn’t happen, of course, when we
plan to use it in production (I dare you), so keep that in mind.

Using these two methods means adding new case branches to the switch statement
in Execute:
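Two more case branches, sketched:

// inside Execute's switch statement:
case ',':
	m.readChar()
case '.':
	m.putChar()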

And with that, our Brainfuck machine can read and print characters! It’s time to
move on to the hairiest part of the implementation.

Looping

Brainfuck’s two control flow instructions are [ and ]. And they’re not
quite like loops or other control flow mechanisms in “normal” languages.
Expressed in some Go-like dialect of pseudo-code, what they do is this:
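Something like this, with hypothetical helper functions:

// '[':
if memory[dp] == 0 {
	jumpToInstructionAfterMatching(']')
}

// ']':
if memory[dp] != 0 {
	jumpToInstructionAfterMatching('[')
}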

Note the two different conditions of the if-statements. They are the most
important bits here, because they give both instructions separate meaning.
Here’s an example to see how [ and ] can be used:

+++++ -- Increment current cell to 5
[ -- Execute the following code, if the current cell is not zero
-> -- Decrement current cell, move data pointer to next cell
+< -- Increment current cell, move data pointer to previous cell
] -- Repeat loop if current cell is non-zero

This snippet increments the current cell to 5 and then uses [ and ] to add
the cell’s value to the next cell, by decrementing one cell and incrementing the
other in a loop. The body of the loop will be executed 5 times until the first cell
contains zero.

Of course, implementing the “does the current memory cell hold zero or not?”
check is not the problem. Finding the matching brackets is what’s hairy about
this, because brackets can be nested. It’s not enough to find the next ] when
we encounter a [, no, we need to keep track of every pair of brackets we find.
How are we going to do that? With a simple counter! Here is the pseudo-code from
above turned into real Go code:
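A sketch of the case branch for '[':

// inside Execute's switch statement:
case '[':
	if m.memory[m.dp] == 0 {
		depth := 1
		for depth != 0 {
			m.ip++
			switch m.code[m.ip] {
			case '[':
				depth++
			case ']':
				depth--
			}
		}
	}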

Here we check whether the current memory cell’s value is zero and if it is, we
try to set the instruction pointer, ip, to the position of the matching ].
In order to do that correctly in the face of nested bracket pairs, we use
depth as a counter. With each [ we pass, we increment the counter, and with
each ] we decrement it. Since it’s set to 1 initially, we know that we are
sitting on our matching ] when depth is 0. And that means that m.ip is
set to the correct position. The m.ip++ at the end of the for-loop does the
rest and sets the instruction pointer to the instruction right after the
matching bracket.

The case ']' branch is the mirrored version, where we walk backwards in the
instructions, trying to find the matching [.

It’s time to flip the power switch on this machine.

Hello World

Here is a small driver that reads in a file and passes it to our Brainfuck
machine:
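A sketch of it, using ioutil.ReadFile:

package main

import (
	"fmt"
	"io/ioutil"
	"os"
)

func main() {
	if len(os.Args) != 2 {
		fmt.Println("usage: brainfuck <filename>")
		os.Exit(1)
	}

	code, err := ioutil.ReadFile(os.Args[1])
	if err != nil {
		fmt.Printf("error reading %s: %s\n", os.Args[1], err)
		os.Exit(1)
	}

	m := NewMachine(string(code), os.Stdin, os.Stdout)
	m.Execute()
}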

So slow!

I have some good and some bad news. Our product manager said that the Brainfuck
interpreter needs to be fast and, sadly, ours isn’t. That’s the bad news.

On my computer, our machine currently takes around 70 seconds to execute
mandelbrot.b, a Mandelbrot set fractal viewer written in Brainfuck
by Erik Bosman, that’s often used as a benchmark for Brainfuck interpreters.
That’s slow.

The good news is that there are a few things we can do to make it faster.

Take a look at the hello_world.b example from above or the mandelbrot.b
program. See all those runs of + and -? There are a lot of instructions of
the same type right behind each other in Brainfuck programs. And we have to read
each one, check which one it is and then execute it.

The overhead of doing this is high. Consider this Brainfuck snippet: +++++. In
order to execute it, we need five cycles of “fetch the next instruction”, “what
instruction do we have here?” and “execute this!”. That turns into us
incrementing the value of the current memory cell by one five times. It would
give us a huge performance boost if we could just increase the current cell’s
value by five directly.

The other thing that’s slowing us down is the way we handle [ and ]. Every
time we stumble upon such a bracket, we go looking for its matching counterpart
again. Scan the program, keep track of all the other brackets we pass and then
modify the instruction pointer. The longer the program, the longer this will
take. If we could do that just once for each bracket and remember the position
of its matching counterpart, we wouldn’t need to rescan the program again and
again.

And here’s the best of news: we can! We can do all of this before we even start
up our Brainfuck machine. We can turn +++++ into something that says “increase
by 5”. We can also do the same for -, >, <, ., and ,. And we can find
and remember the positions of matching bracket pairs beforehand. All we need to
do is create another representation of the original Brainfuck code that can
include these optimizations and have our machine execute this instead.

A New Instruction Set

Up until now we’ve used a string to represent the code that’s to be executed by
the Machine. But in order to make optimizations, we need a new instruction
set. Here is the Instruction type that makes up the new set:
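A sketch of it (the constant names are illustrative):

type InsType byte

const (
	Plus InsType = iota
	Minus
	Right
	Left
	ReadChar
	PutChar
	JumpIfZero
	JumpIfNotZero
)

type Instruction struct {
	Type     InsType
	Argument int
}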

Each Instruction has a Type and an Argument. The Type can be one of the
predefined constants defined at the top, where each constant has a corresponding
Brainfuck instruction. The interesting part here is the Argument field. This
field allows us to make our instruction set much more dense than the original
Brainfuck code. We can put more information in fewer instructions. We’ll use
Argument in two ways:

In the case of +, -, ., ,, >, and < the Argument field will
contain the number of original Brainfuck instructions this Instruction
represents. E.g.: +++++ will be turned into Instruction{Type: Plus,
Argument: 5}

In the case of [ and ] the Argument field will contain the position of
the instruction of the matching bracket. E.g.: the Brainfuck snippet [] will
be turned into two Instructions: Instruction{Type: JumpIfZero, Argument:
1} and Instruction{Type: JumpIfNotZero, Argument: 0}.

Now that we have our new Instruction type and know how this new instruction
set is to be interpreted, we can modify our Machine to do exactly that. The
first thing we need to do is to change its definition, so it doesn’t work with a
string anymore, but with a slice of *Instruction:
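Sketched out, the new definition and the updated Execute method might look
like this:

type Machine struct {
	code []*Instruction
	ip   int

	memory [30000]int
	dp     int

	input  io.Reader
	output io.Writer

	buf []byte
}

func (m *Machine) Execute() {
	for m.ip < len(m.code) {
		ins := m.code[m.ip]

		switch ins.Type {
		case Plus:
			m.memory[m.dp] += ins.Argument
		case Minus:
			m.memory[m.dp] -= ins.Argument
		case Right:
			m.dp += ins.Argument
		case Left:
			m.dp -= ins.Argument
		case ReadChar:
			for i := 0; i < ins.Argument; i++ {
				m.readChar()
			}
		case PutChar:
			for i := 0; i < ins.Argument; i++ {
				m.putChar()
			}
		case JumpIfZero:
			// setting ip to the matching bracket's position and then
			// incrementing it below lands us right after the bracket
			if m.memory[m.dp] == 0 {
				m.ip = ins.Argument
			}
		case JumpIfNotZero:
			if m.memory[m.dp] != 0 {
				m.ip = ins.Argument
			}
		}

		m.ip++
	}
}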

That’s a lot cleaner than what we had before, right? And it’s faster, too! Well,
I can’t prove it yet, because there’s still a piece missing: something that
turns Brainfuck code into a slice of *Instructions.
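Enter the Compiler. A sketch, leaving out [ and ] for now:

type Compiler struct {
	code       string
	codeLength int
	position   int

	instructions []*Instruction
}

func NewCompiler(code string) *Compiler {
	return &Compiler{
		code:         code,
		codeLength:   len(code),
		instructions: []*Instruction{},
	}
}

func (c *Compiler) Compile() []*Instruction {
	for c.position < c.codeLength {
		switch c.code[c.position] {
		case '+':
			c.CompileFoldableInstruction('+', Plus)
		case '-':
			c.CompileFoldableInstruction('-', Minus)
		case '>':
			c.CompileFoldableInstruction('>', Right)
		case '<':
			c.CompileFoldableInstruction('<', Left)
		case ',':
			c.CompileFoldableInstruction(',', ReadChar)
		case '.':
			c.CompileFoldableInstruction('.', PutChar)
		}

		c.position++
	}

	return c.instructions
}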

That looks remarkably close to the Execute method of the current and previous
versions of our Machine. But there’s a huge difference: whereas the Machine
executed the Brainfuck instructions directly, our Compiler now turns them into
*Instructions, so they can be executed later. Here is what the
CompileFoldableInstruction method does:
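A sketch of both methods:

func (c *Compiler) CompileFoldableInstruction(char byte, insType InsType) {
	count := 1

	// scan ahead as long as the next character is of the same type
	for c.position < c.codeLength-1 && c.code[c.position+1] == char {
		count++
		c.position++
	}

	c.EmitWithArg(insType, count)
}

func (c *Compiler) EmitWithArg(insType InsType, arg int) int {
	ins := &Instruction{Type: insType, Argument: arg}
	c.instructions = append(c.instructions, ins)
	return len(c.instructions) - 1
}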

Together with EmitWithArg the CompileFoldableInstruction method scans through the
input code (the Brainfuck string code) to see if the current instruction is
followed by other instructions of the same type. If that’s the case, it folds
those Brainfuck instructions into one Instruction.

EmitWithArg is a helper method that creates a new *Instruction, adds it to
the c.instructions slice of the Compiler and returns the position of this
newly created instruction in c.instructions.

Returning the position of the newest instruction is an important detail, because
we’re going to need it now. As you may have noticed, we didn’t add support for
[ and ] to our Compiler yet. That’s because these are not foldable
instructions (e.g.: we cannot turn [[[ into a single instruction), but need to
do something more elaborate.

Compiling Loops

We have two loop instructions: [ and ]. And we want to turn them into
JumpIfZero and JumpIfNotZero instructions, where the Argument field
contains the position of the matching bracket. That is: the position of the
matching counterpart Instruction in the final instructions slice.

That’s easier said than done, though. The problem is that when we encounter a
[ we don’t know where in the final instructions slice the matching ]
instruction will end up. Counting the instructions in between doesn’t work,
because it’s possible that those will be folded together in the next compilation
step and thus invalidate the position we got through counting.

Then there’s also the problem of remembering the position of the last
JumpIfZero instruction, so it can be used as Argument when constructing the
matching JumpIfNotZero instruction.

But here’s what we’re going to do, here’s how we’re going to solve these
problems. First, we will emit a JumpIfZero instruction for each [ we
encounter, with the placeholder value 0 in the Argument field. Later, when
we have constructed the matching JumpIfNotZero instruction, we’re going to
come back to this instruction and change its Argument to the real value.

In order to later be able to change them, we need to keep track of JumpIfZero
instructions. And we’re going to use a stack to do that, implemented with a
simple Go slice:
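A sketch of how Compile changes:

func (c *Compiler) Compile() []*Instruction {
	loopStack := []int{}

	for c.position < c.codeLength {
		switch c.code[c.position] {
		// [...] the foldable instructions from before

		case '[':
			// emit a JumpIfZero with a placeholder argument and
			// remember its position on the stack
			insPos := c.EmitWithArg(JumpIfZero, 0)
			loopStack = append(loopStack, insPos)
		}

		c.position++
	}

	return c.instructions
}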

loopStack, which acts as a stack onto which we can push elements and later pop
them off, is just an empty slice. There’s not much to it. Interesting here is
the case branch for the [ instructions. Just like we discussed, we emit a
new JumpIfZero instruction with a placeholder Argument. Then comes the
important part: we push the position of the new JumpIfZero position onto our
loopStack.

That, in turn, allows us to correctly handle ] instructions:

// compiler.go

func (c *Compiler) Compile() []*Instruction {
	// [...]
		case ']':
			// Pop position of last JumpIfZero ("[") instruction off stack
			openInstruction := loopStack[len(loopStack)-1]
			loopStack = loopStack[:len(loopStack)-1]

			// Emit the new JumpIfNotZero ("]") instruction,
			// with correct position as argument
			closeInstructionPos := c.EmitWithArg(JumpIfNotZero, openInstruction)

			// Patch the old JumpIfZero ("[") instruction with new position
			c.instructions[openInstruction].Argument = closeInstructionPos
	// [...]
}

We pop the position of the last JumpIfZero instruction, the opening [, which
still holds a placeholder 0 as Argument, off the stack, and use it as the
correct Argument for a new JumpIfNotZero instruction.

And since we now have the position of the JumpIfZero instruction, we can
access it in c.instructions and change its Argument from 0 to the
correct position of the new JumpIfNotZero instruction!
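The updated driver, sketched (note that NewMachine now takes the compiled
[]*Instruction instead of a string):

func main() {
	if len(os.Args) != 2 {
		fmt.Println("usage: brainfuck <filename>")
		os.Exit(1)
	}

	code, err := ioutil.ReadFile(os.Args[1])
	if err != nil {
		fmt.Printf("error reading %s: %s\n", os.Args[1], err)
		os.Exit(1)
	}

	compiler := NewCompiler(string(code))
	instructions := compiler.Compile()

	m := NewMachine(instructions, os.Stdin, os.Stdout)
	m.Execute()
}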

That looks a lot like our old driver. But instead of reading in a file and
passing its content to our Brainfuck machine, we first compile the original
Brainfuck code in the file to our new Instruction set. And these
Instructions will then be executed by our Machine.

If we now run this with the mandelbrot.b benchmark we can see
that our work paid off: what took 70s before now only takes 13s!

Taking A Closer Look

Yes, we’ve only implemented Brainfuck, a language with no syntax to speak
of and only eight different instructions. You might be tempted to call our two
Brainfuck machines toys. But let’s take a look at what we actually did.

The first thing we built is an interpreter that acts as a Brainfuck machine. It
has all the necessary parts: memory cells, data and instruction pointers, input
and output streams. The interpreter effectively tokenizes its input by
processing it byte by byte. It then evaluates each token on the fly. It’s not
much longer than 100 lines, but has all the essential parts of a fully-grown
interpreter.

And then we’ve built a compiler! Sure, it doesn’t output native machine code and
it’s really simple, but it’s a compiler nonetheless! It takes Brainfuck code as
input and outputs instructions for a machine - our Brainfuck machine. That’s the
basic idea behind compilers. We could also change the way our Instructions are
stored and passed around, and then we’d realize that our Machine is now a virtual
machine and is executing bytecode.

Now, that doesn’t sound like toys, does it? What we built is using the same
blueprints a lot of other, mature and production-ready programming languages
use. Once you’ve understood how and why they work, you start to recognize them
in other languages, too, and in turn understand these languages better.

And that’s why I think implementing Brainfuck can be a rewarding and eye-opening
experience.

You can find the complete code, including tests, for both versions of the
Brainfuck machine here on GitHub.

Why I Wrote A Book About Interpreters (November 30, 2016)

Last week I self-published my first book, called “Writing An Interpreter In Go”,
which you can get at interpreterbook.com. I want to tell you
a little bit about why I chose to write this particular book.

Sometimes I jokingly call the summer of 2015 my “Summer Of Lisp”. But, honestly,
I’m only half joking when I say this. It really was a great and Lispy summer
programming-wise: I was working through the final chapters of Structure And
Interpretation Of Computer Programs (SICP), which I began studying at the
beginning of that year, was totally fascinated by Lisp, enamored by Scheme and
also starting to learn Clojure by working through the fantastic The Joy Of
Clojure.

SICP had an immense impact on me. It’s a wonderful book, full of elegant code
and ideas; it hearkens “to a programming life that if true, would be an
absolute blast to live in”. Especially the fourth chapter made a
lasting impression. In this chapter, Abelson and Sussman show the reader how to
implement the so-called “meta-circular evaluator” - a Lisp interpreter in Lisp.
“Mesmerized” is probably the word I’d use to describe myself while reading this
chapter.

The code for the meta-circular evaluator is elegant and simple. Around 400 lines of
Scheme, stripped down to the essentials and doing exactly what they are supposed
to. It’s a beautiful piece of software. I asked a friend to design a poster for
me, containing only the source code for the meta-circular interpreter,
beautifully formatted. That poster hung next to my office desk for over a year.

But soon I discovered why it’s only 400 lines. The code presented in the book
skips the implementation of an entire component - the parser. Huh. But how does
a parser work then? I was stumped. I really wanted to know how that parser
works. And I almost never want to skip anything I don’t know yet. I really
want to know how things work, at least in a rough sense. Black boxes and
skipping things always leave me wanting to dig deeper.

In that same summer I also read Steve Yegge’s “Rich Programmer
Food”, in which he argues what a worthwhile goal it is to
learn about and to understand compilers. Let me quote my favorite passage:

You might even stop bragging about how smart your tools are, how amazing it is that they can understand your code […]

You’ll be able to jump in and help fix all those problems with your favorite language.

That blog post flipped a switch. Determined as if there was some kind of weird
challenge I said to a friend of mine: “I’m going to write a compiler”. I
believe, I was gazing into the distance while saying this. “Alright”, he said
rather unimpressed, “do it.”

Without having taken a compiler course in college or even having a computer
science degree I set out to write a compiler. The first goal, I determined, is
to get a foot in the door and write an interpreter. Interpreters are closely
related to compilers, but easier to understand and to build for beginners. But
most importantly, this time there would be no skipping of anything. This
interpreter will be built from scratch!

What I found was that a lot of resources for interpreters or compilers are
either incredibly heavy on theory or barely scratching the surface. It’s either
the dragon book or a blog post about a 50 line Lisp interpreter.
The complete theory with code in the appendix or an introduction and overview
with black boxes.

Every piece of writing helped though. Slowly but surely I was completing work on
my interpreter. The tiny tutorials, the slightly longer blog posts and the heavy
compiler books - I could find something useful in all of them.

Nevertheless I was getting frustrated. There needs to be a book that … One
day, I said to the same friend, who earlier so enthusiastically encouraged me
to write a compiler:

“You know what… I’d love to write a book about interpreters. A book that shows
you everything you need to know to build an interpreter from scratch, including
your own lexer, your own parser and your own evaluation step. No skipping of
anything!”

Somehow this turned into me giving myself a motivational speech.

“And with tests too!”, I continued, “Yeah! Code and tests front and center! Not
like in these other books, where the code is an unreadable mess that you can’t
get to compile or run on your system. And you don’t need to be well versed in
mathematical notation either! It should be a book any programmer can read and
understand.”

It’s entirely possible that I was banging my fist on the table at this point.
Calmly, my friend said: “Sounds like a good idea. Do it.”

And here we are, 11 months later, and “Writing An Interpreter In Go” is available
to the public. It has around 200 pages and presents the complete and working
interpreter for the Monkey programming language, including the lexer, the
parser, the evaluator and also including tests. No black boxes, no 3rd party
tools and no skipping of anything. Nearly every page contains a piece of code.
I’m really proud of this book.

Putting Eval In Go (November 16, 2016)

Over the past year I’ve spent a significant amount of time reading through Go’s
go/* packages, the packages used by the Go compiler
and other Go tools. But only recently did it occur to me that these are real,
public packages. I can actually import and use them! So then I started to
wonder what I could do with them when it suddenly struck me: “I can… I can
put Eval in Go! Using Go!”

Let me explain. There’s the scanner package, which contains the lexer
(or scanner, or tokenizer, …) that turns Go source code into tokens. These
tokens are defined in their own package, token. And then there’s the
parser, which takes the tokens and builds an AST. The definitions of
the AST nodes can be found in the perfectly named AST package. And then
there’s also a printer package to print these AST nodes.

In other words: we have all the necessary pieces here to build an Eval
function that evaluates Go code. In fact, with these packages we could build a
complete Go interpreter in Go. If you’re really interested in doing that,
check out the go-interpreter project, which aims to do just
that. Instead, let’s start small and write an Eval function that evaluates
mathematical Go expressions.
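The first step is a tiny REPL that reads in lines and, for now, simply prints
them back. A minimal sketch:

package main

import (
	"bufio"
	"fmt"
	"os"
)

func main() {
	scanner := bufio.NewScanner(os.Stdin)

	fmt.Print(">> ")
	for scanner.Scan() {
		line := scanner.Text()

		// for now: print the input right back
		fmt.Println(line)

		fmt.Print(">> ")
	}
}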

The next step would be to initialize Go’s scanner with these input lines and
turn them into tokens. Luckily, the parser package has a
ParseExpr function that does exactly that. It initializes the
scanner and reads in the tokens for us. It then
parses the tokens and builds an AST. We can use it to parse the input in our
REPL:
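Something like this, inside the loop, with go/parser imported:

	exp, err := parser.ParseExpr(line)
	if err != nil {
		fmt.Printf("parsing failed: %s\n", err)
		continue
	}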

The result of our call to ParseExpr, exp, is an AST that represents the
entered Go expression, without such details as comments, whitespace or
semicolons. We can use the printer package to print it. We just have
to use token.NewFileSet() to make the printer believe that we got our Go
source code from a file:
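A sketch, with go/printer and go/token imported:

	fset := token.NewFileSet()
	printer.Fprint(os.Stdout, fset, exp)
	fmt.Println()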

Okay, yes, you’re right. That looks exactly like our “printing back the input”
mechanism we had before. But there’s more to it. What we’re actually doing here
is parsing the input and pretty-printing the AST produced by the parser. See
for yourself:
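A hypothetical session:

>> 1+1
1 + 1
>> 5   *   5
5 * 5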

But we want more than just pretty-printing the AST. We want an Eval function
that evaluates mathematical Go expressions. What Eval has to do is to
traverse each node in the AST and evaluate it. Granted, this definition is
kinda recursive, but that’s perfect, because Eval itself is a recursive
function:
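A sketch of Eval, handling integer literals and the four basic arithmetic
operators (go/ast, go/token and strconv imported):

func Eval(exp ast.Expr) int {
	switch exp := exp.(type) {
	case *ast.BasicLit:
		// integer literals evaluate to themselves
		if exp.Kind == token.INT {
			value, err := strconv.Atoi(exp.Value)
			if err != nil {
				panic(err)
			}
			return value
		}
	case *ast.BinaryExpr:
		// evaluate both operands first...
		left := Eval(exp.X)
		right := Eval(exp.Y)

		// ...then apply the operator
		switch exp.Op {
		case token.ADD:
			return left + right
		case token.SUB:
			return left - right
		case token.MUL:
			return left * right
		case token.QUO:
			return left / right
		}
	}

	return 0
}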

As you can see, Eval takes an ast.Expr as argument, which is what we get
back from parser.ParseExpr. It then traverses this part of the AST but only
stops at *ast.BinaryExpr and *ast.BasicLit nodes. The former is an AST node
that represents binary expressions (expressions with one operator and two
operands) and the latter represents literals, like the integer literals we used
in our REPL.

What Eval has to do in the case of an integer literal is easy. Integer
literals evaluate to themselves. If I type 5 into the REPL then 5 is what
should come out. Eval only needs to convert the parsed integer literal to a
Go int and return it.

The case of *ast.BinaryExpr is more complex. Here Eval has to call itself
two times to evaluate the operands of the binary expression. Each operand can
be another binary expression or an integer literal. And in order to evaluate
the current expression, both operands need to be fully evaluated. Only then,
depending on the operator of the expression, is the correct evaluating result
returned.

We’ve successfully put a working Eval function in Go! And it only took us
around 70 lines of code, because we used Go’s internal compiler tools.

Write Stupid Code (October 22, 2015)

This post has been translated to Chinese.

In the last couple of months I developed a certain approach to writing code.
Whenever I write a new function, class or method I ask myself: “Is this code
stupid enough?” If it’s not, it’s not done and I try to make it stupid.

Now, stupid code does not mean “code that doesn’t work”. Stupid code should
work exactly like it’s supposed to, but in the most simple, straightforward,
“stupid” way possible.

Anyone could write it and anyone reading it should be able to understand it. It
shouldn’t make the reader think about the code itself, but about the problem at
hand. It shouldn’t be long, it shouldn’t be complex and, most importantly, it
shouldn’t try to be clever. It should get the job done and nothing more.

What does stupid code look like? It depends on the problem it’s trying to solve.
Take meta-programming, for example, which is often considered complex and “black
magic”. Does asking myself “is this code stupid enough?” mean “no
meta-programming allowed”? Not necessarily, no. There are certain cases, in
which the problem can be solved in the simplest way through meta-programming.
But there are a lot more cases in which meta-programming is unnecessary and
additional baggage on top of the solution, which gets in the way of understanding
what the code is supposed to do.

The goal is to get rid of the baggage, to chip away at it until the most stupid,
still working, tests-passing code emerges.

Keep in mind the “stupid” here: “it works” is not good enough. A lot of complex,
“look at this clever trick”, overly-abstracted, unreadable code works and makes
the tests pass. That’s not what I’m after. It has to be stupid: not clever, not
complex, not hard to understand.

Besides “stupid” the resulting code might also be described as “elegant”, “clean” and
“simple”. But the “write stupid code” mantra is not as elusive as “write elegant
code”, for example, and seems far more achievable, which makes the approach much
more valuable to me. And besides that: I find it much more likely to start out
with “write stupid code” and end up with an elegant solution than the other way
around.

Not every elegant solution is straightforward, but “stupid” ones are, by
definition, and can also be elegant.

Unicorn Unix Magic Tricks (November 20, 2014)

This post is based on the talk of the same name I gave at the
Arrrrcamp conference in Ghent, Belgium on October 2nd, 2014. You can find the
slides here and the video recording here.

Unicorn is a webserver written in Ruby for Rails
and Rack applications. When I first used it I was amazed. This is magic, I
thought. It had to be. Why?

Well, first of all: the master-worker architecture. Unicorn uses one master
process to manage a lot of worker processes. When you tell Unicorn to use 16
worker processes it does so, just like that. And now you’re looking at 17
processes when you run ps aux | grep unicorn — each with a different name,
showing whether it’s the master process or one of the worker processes, which
even have their own number in their process names.

And then there’s a feature called “hot reload”, which means that you can tell
Unicorn, while it’s running, to spin up a new version of your application. As
soon as you do, Unicorn starts a new master process, which is going to serve
the new version of your application. All the while the old master process is
still running, responding to requests with your old application. Of course, the
old master now has “old” in its name. Now, as soon as the new master process is
fully booted up, you can send a QUIT signal to the old master process, which
will in turn shut down and let the new one take over. And just like that you’ve
switched to a new version of your application — without any downtime at all.

Oh, and Unicorn uses a lot more than the QUIT signal! There are tons of
signals you can send to it: TTIN to increase the number of workers, TTOU to
decrease it, USR1 to rotate the log files, USR2 to perform hot reloading,
HUP to re-evaluate the configuration file. I didn’t know half of these signal
names and there were even more in Unicorn’s own SIGNALS
file.

And then there’s “preloading”: a feature of Unicorn that allows you to spin up
new worker processes in less than a second, a fraction of the time it takes to
boot up my Rails application. Somehow Unicorn is able to preload my
application in memory and make use of that when creating new worker processes.
And I had no idea how that works! Not a clue! And as if that wasn’t enough I
discovered that Unicorn even has a file called PHILOSOPHY in its repository.
Who else has that?! I was sure that there was some black magic going on.
Because: how could Unicorn work like it does without magic?

Unix

After my first encounter with Unicorn I learned quite a bit about Unix systems
and after a while I came back to Unicorn — still in amazement. But this time I
read through the source code and it turns out that, well, the secret
ingredient to Unicorn is not magic but plain, old Unix.

Now, most people know Unix from a “user’s perspective”: the command line,
shells, pipes, redirection, the kill command, scripting, text files and so
on. But there’s this whole other side of Unix, too, which we could call the
“developer’s perspective” now. From this side of Unix you can see signal
handling, inter-process communication, usage of pipes without the
|-character, system calls and whole lot more.

In what follows we’re going to have a look at Unicorn. We’ll take it apart and
see that it’s just using some basic Unix tricks, the ones you can use as a
developer, to do its work. The way we’re going to do that is by going through
some of these Unix tricks, basic building blocks of every Unix system, and see
how they work and how Unicorn uses them.

At the end we’ll go back to the “magic” of the beginning: hot reload,
preloading, master-worker architecture. And we will see how these features work
and how they are just Unix and not magic.

So let’s get started.

fork(2)

fork is how processes are created. Every process after the first one (with PID 1)
was created with fork. So what is it, what is fork?

fork is a system call. Most of the time we can recognize system calls by the
2 behind their name (e.g. fork(2)) which means that we can find
documentation about them in section 2 of the Unix manual, nowadays known as
“man pages”. So in order to see the documentation for fork(2) you can run
man 2 fork on your command line.

But what’s a system call? A way to communicate with the kernel of our operating
system. System calls are the API of the kernel, if you will. We tell the kernel
to do something for us with system calls: reading, writing, allocating memory,
networking, device management.

And fork is the system call that tells the kernel to create a new process. When
one process asks the kernel for a new process with fork(2) the kernel splits
the process making the call into two. That’s probably where the name comes
from: calling fork(2) is a “fork in the road” in the lifetime of a process.
As soon as the kernel returns control to the process after handling the system
call there now is a parent process and a child process. A parent can have a lot of
child processes, but a child process only one parent process.

And both processes, parent and child, are pretty much the same, right after the
creation of the child. That’s because child processes in a Unix system inherit
a lot of stuff from their parent processes: the data (the code it’s executing),
the stack, the heap, the user id, the working directory, open file descriptors,
the connected terminal and a lot more. This can be a burden (which is why
copy-on-write is a thing) but
also has some neat advantages — as we’ll see later.

So how do we use fork? Since (deep down) making a system call involves putting
parameters and the unique identifier of the call in CPU registers (which ones
may change depending on the architecture we’re working with) and firing a
software interrupt, most programming languages provide wrappers that do all the
work and allow us to not worry about which system call is identified by which
number.

Ruby is no exception here and allows us to use fork(2) with a method called,
well, fork:
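A sketch of it:

puts "parent: my PID is #{Process.pid}"

child_pid = fork do
  puts "child: my PID is #{Process.pid}, my parent's PID is #{Process.ppid}"
  puts "child: child_pid is #{child_pid.inspect}"
end

puts "parent: child_pid is #{child_pid.inspect}"

Process.wait(child_pid)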

What we’re doing here is calling fork in Ruby and passing it a block. This will
create a new process, a child process, and run everything inside the block in
the new process and then exit. In the parent process we call Process.wait and
pass it the return value of fork, which is the ID of the child process. We
also need to wait for child processes to exit because otherwise they’d turn
into zombie processes. Yep, that’s a valid Unix rule right there: parent
processes need to wait for their children to die so they don’t turn into
zombies.

As we can see, the child process has a new process ID and its parent process ID
matches the process ID printed in the parent process. And most interestingly
child_pid is nil inside the child process but contains a value in the
parent process. This is how we can check whether we are in the parent process
or the child process. Since the child inherits the data from the parent process,
both processes are running the same code right after fork and we can decide
which process does what depending on the return value of fork.

If we put a sleep somewhere inside the block, run it again and use a tool like
ps or pstree we’d see something like this:
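$ ps -o pid,ppid,command | grep ruby
63578     1 ruby fork_example.rb
63579 63578 ruby fork_example.rb

(The PIDs and the file name fork_example.rb are made up; the shape is what
matters: one parent process and one child whose PPID is the parent’s PID.)

And that’s more or less what Unicorn does to create its workers. Its worker
spawning happens in a method called spawn_missing_workers which, paraphrased
and heavily shortened (this is not the verbatim Unicorn source), is shaped
roughly like this:

def spawn_missing_workers
  worker_nr = -1
  until (worker_nr += 1) == @worker_processes
    worker = Worker.new(worker_nr)
    before_fork.call(self, worker)

    pid = fork

    unless pid # nil means: we are in the child process
      after_fork.call(self, worker)
      worker_loop(worker)
      exit
    end

    # parent: remember the worker under the child's PID
    WORKERS[pid] = worker
  end
end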

So, what happens here? Unicorn calls this method with @worker_processes set
to the number of workers we told it to boot up. It then goes into a loop and
calls fork that many times. But instead of passing a block to fork, Unicorn
instead checks the return value of fork to see if it’s now executing in the
parent or in the child process. Remember: a forked process inherits the data
of the parent process! A child process executes the same code as the parent,
and we have to check for that in order to have the child do something else.

Passing a block to fork does the same thing under the hood, but explicitly checking
the return-value of fork is quite a common idiom in many Unix programs, since
the C API doesn’t allow passing blocks around.

If fork returned in the parent process, Unicorn saves the newly created
worker object with the PID of the newly created child process in the WORKERS
hash constant, calls a callback and starts the loop again.

In the child process another callback is called and then the child goes into its
main loop, the worker_loop. If the worker loop should somehow return the child
process exits and is done.

And boom! We’ve now got 16 worker processes humming along, waiting for work in
their worker_loop, just by going into a loop, doing some cleanup and calling
fork 16 times.

That’s not too hard, is it? So let’s go from fork to another basic Unix
feature…

Pipes!

My guess is that most people even vaguely familiar with Unix systems know about
pipes and have probably done something like this at one point or another in
their lives:

$ grep 'wat' journal.txt | wc -l
84

Pipes are amazing. Pipes are a really simple abstraction that allows us to take
the output of one program and pass it as input to another program. Everybody
loves pipes and I personally think the pipe character is one of the best
features Unix shells have to offer.

But did you know that you can use pipes outside of the shell?

pipe(2)

pipe(2) is a system call with which we can ask the kernel to create a pipe
for us. This is exactly what shells are using. And we can use it too, without a
shell!

Remember the saying that under Unix “everything is a file”? Well, pipes are
files too. One pipe is nothing more than two file descriptors. A file
descriptor is a number that points to an entry in the file table maintained by
the kernel for each running process. In the case of pipes the two file
table entries do not point to files on a disk, but rather to a memory buffer to
which you can write and from which you can read with both ends of the pipe.

One of the file descriptors returned by pipe(2) is the read-end and the other
one is the write-end. That’s because pipes are half duplex – the data only flows
in one direction.

Outside of the shell pipes are heavily used for inter-process
communication. One
process writes to one end, and another process reads from the other end. How?
Remember that a child process inherits a lot of stuff from its parent process?
That includes file descriptors! And since pipes are just file descriptors,
child processes inherit them. If we open a pipe with pipe(2) in a parent
process and then call fork(2), both the parent and the child process have
access to the same file descriptors of the pipe.

In Ruby we can use IO.pipe, which is a wrapper around the pipe(2) system call,
just like fork is a wrapper around fork(2), to create a pipe.
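A sketch of two processes talking to each other through a pipe:

read_end, write_end = IO.pipe

child_pid = fork do
  # the child only writes, so it closes the read-end
  read_end.close

  write_end.write("Hello from your child!")
  write_end.close
end

# the parent only reads, so it closes the write-end
write_end.close

Process.wait(child_pid)

message = read_end.read
puts "Received from child: '#{message}'"
read_end.close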

And in this example we create a pipe with IO.pipe and then create the child
process with fork. Since just after the call to fork both processes have
both pipe file descriptors we need to close the end of the pipe we’re not going
to need. In the child process that’s the read-end and in the parent it’s the
write-end.

We then write something to the pipe in the child, close the write-end and exit.
The parent closes the write-end, waits for the child to exit and then reads the
message the child wrote to the pipe. To clean up it closes the read-end. If we
run this we get exactly what we expected:

$ ruby pipe.rb
Received from child: 'Hello from your child!'

That’s pretty amazing, isn’t it? Just a few lines of code and we created two
processes that talk to each other! By the way, this is the exact same concept a
shell uses to make the pipe-character work. It creates a pipe, it forks (once
for each process on one side of the pipe) then uses another system call
(dup2) to turn the write-end of the pipe into STDOUT and the read-end into
STDIN respectively and then executes different programs which are now connected
through a pipe.

So how does Unicorn make use of pipes?

Unicorn and pipe(2)

Unicorn uses pipes a lot.

First of all, there is a pipe between each worker process and the master
process, with which they communicate. The master process writes commands to the
pipe (something like QUIT) and the child process then reads the commands and
acts upon them. Communication between the master and its worker processes
through pipes.

Then there’s another pipe the master process only uses internally and not for
IPC, but for signal handling. It’s called the “self-pipe” and we’ll have a
closer look at that one later.

And then there’s the ready_pipe Unicorn uses, which is actually quite an
amazing trick. See, if you want to daemonize a process under Unix, you need to
call fork(2) two times (and do some other things) so the process is
completely detached from the controlling terminal and the shell thinks the
process is done and gives you a new prompt.

What Unicorn does when you tell it to run as a daemon is to create a pipe,
called the ready_pipe. It then calls fork(2) two times, creating a grand
child process. The grand child process inherited the pipe, of course, and as
soon as its fully booted up and everything looks good, it writes to this pipe
that it’s okay for the grand parent to quit. The grand parent, which waited for
a message from the grand-child, reads this and then exits.

This allows Unicorn to wait for the grand child to boot up while still having a
controlling terminal to which it can write error messages should something go
wrong between the first call to fork(2) and booting up the HTTP server in the
grand child. Only if everything worked does the grand child turn into a real
daemon process. Process synchronization through pipes.

That does come pretty close to being magic, yep, but this is just a really
clever use of fork(2) and pipe(2).

sockets & select(2)

At the heart of everything that has to do with networking under Unix are
sockets. You want to read a website? You need to open a socket first. Send something
to the logserver? Open a socket. Wait for incoming connections? Open a socket.
Sockets are, simply put, endpoints between computers (or processes!) talking to
each other.

There are a ton of different sockets: TCP sockets, UDP sockets, SCTP sockets,
Unix domain sockets, raw sockets, datagram sockets, and so on. But there is one
thing they all have in common: they are files. Yes, “everything is file” and
that includes sockets. Just like a pipe, a socket is a file descriptor, from
which you can read and write to just like with a file. The sockets API for
reading and writing is deep down the same as the file API.

So, let’s say we are writing a server. How do we use sockets for that? The
basic lifecycle of a server socket looks like this:
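socket(2) -> bind(2) -> listen(2) -> accept(2) -> read(2)/write(2) -> close(2)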

First we ask the kernel for a socket with the socket(2) system call. We
specify the family of the socket (IPv4, IPv6, local), the type (stream,
datagram) and the protocol (TCP, UDP, …). The kernel then returns a file
descriptor, a number, which represents our socket.

Then we need to call bind(2) to bind our socket to a network address and a
port. After that we need to tell the kernel that our socket is a server socket
that will accept new connections, by calling listen(2). So now the kernel
forwards incoming connections to us. (This is the main difference between
the lifecycles of a server and a client socket).

Now that our socket is a real server socket and waiting for new incoming
connections we can call accept(2), which accepts connections and returns a new
socket. This new socket represents the connection. We can read from
it and write to it.

But here’s the thing: accept(2) is a blocking call. It only returns if the
kernel has a new connection for us. A server that doesn’t have too many
incoming connections will be blocking for a long time on accept(2). This
makes it really difficult to work with multiple sockets. How are you going to
accept a connection on one socket if you’re still blocking on another socket
that nobody wants to connect to?

This is where select(2) comes into play.

select(2) is a pretty old and famous (maybe infamous) Unix system call for working
with file descriptors. It allows us to do multiplexing: we can monitor
several file descriptors with select(2) and let the kernel notify us as soon
as one of them has changed its state. And since sockets are file descriptors too,
we can use select(2) to work with multiple sockets. Like this:
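
# A sketch of the pre-forking server described below: two listening
# sockets, five worker processes, load balancing via select(2)
require "socket"

socket1 = Socket.new(:INET, :STREAM)
socket1.bind(Addrinfo.tcp("127.0.0.1", 8888))
socket1.listen(10)

socket2 = Socket.new(:INET, :STREAM)
socket2.bind(Addrinfo.tcp("127.0.0.1", 9999))
socket2.listen(10)

5.times do
  fork do
    loop do
      readable, _, _ = IO.select([socket1, socket2])
      connection, _ = readable.first.accept
      puts "#{Process.pid}: #{connection.read}"
      connection.close
    end
  end
end

Process.waitall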

No, but seriously, this actually does a lot of stuff in just a few lines with
the help of system calls.

We create two sockets with Socket.new, which somewhere deep down in Ruby
calls socket(2). Then we bind the sockets to two different ports, 8888 and
9999 respectively, on the local interface. Afterwards we call listen(2)
(hidden by the #listen method) and tell the kernel to queue up 10 connections
at maximum for us to handle.

With our sockets ready to go we call fork 5 times, which in turn creates 5
child processes that all run the code in the block. So every child calls
IO.select (which is the wrapper around select(2)) with the two sockets as
argument. IO.select is going to block and only return if one of the two sockets
is readable (on a listening socket that means that there are new connections).
And this is exactly why we use select(2) here: with accept(2) we would block
on one socket and miss out if the other socket had a new connection.

IO.select returns the readable sockets in an array. We take the first one and
call accept(2) on it, which is now going to return immediately. Then we just
read from the connection, close the connection socket and start our worker loop
again.

If we run this and send some messages to our server with netcat like this:
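
$ echo "Hello" | nc localhost 8888
$ echo "World" | nc localhost 9999

(Depending on your netcat flavor you might need to pass -N or -q 0 so it
closes the connection.) The server then prints something like this, with the
PIDs of course being different on every run:

83600: Hello
83602: World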

Each connection handled by a different child process. Load balancing done by the
kernel for us, thanks to select(2).

Unicorn, sockets and select

Before the master process calls fork to create the worker processes, it calls socket,
bind and listen to create one or more listening sockets (yes, you can configure
Unicorn to listen on multiple ports!). It also creates the pipes that will be
used to communicate with the worker processes.

After forking, the workers, of course, have inherited both the pipe and the
listening sockets. Because, after all, sockets and pipes are file descriptors.

The workers then call select(2) as part of their worker_loop with both
the pipe and the sockets as arguments. Now, whenever a connection comes in,
one of the workers’ call to select(2) returns and this worker handles the
connection by reading the request and passing it to the Rack/Rails application.

And here’s the thing: since the workers call select(2) not only with the sockets,
but also with the master-to-worker pipe, they’ll never miss a message from the
master while waiting for a new connection. And if there is a new connection,
they handle it, close it and then read the message from the master process.

That’s a really neat way to do load balancing through the kernel and to
guarantee that messages to workers are not lost or delayed too long while the
worker process is doing its work.

Signals

Let’s talk about signals. Signals are another way to do IPC under Unix. We can
send signals to processes and we can receive them.

$ kill -9 8433

This sends the signal 9, which is the KILL signal, to process 8433. That’s
pretty well-known and a lot of people have used this before (probably with
sweat running down their face). But did you know that pressing Ctrl-C and
Ctrl-Z in your shell sends signals too?

So what are signals? Most often they are described as software interrupts. If
we send a signal to the process, the kernel delivers it for us and makes the
process jump to the code that deals with receiving this signal, effectively
interrupting the current code flow of the process. Signals are asynchronous —
we don’t have to block somewhere to send or receive a signal. And there are a
lot of them: the current Linux kernel for example supports around 30 different
signals.

Sending signals is pretty good, and I’d bet we’ve all done it a bunch of times,
but what’s really cool is this: we can tell the kernel how we want our process
to react to certain signals. That’s called “signal handling”.

We have a few options when it comes to signal handling. We can ignore
signals: we can tell the kernel we don’t care about a signal and when the
kernel delivers an ignored signal to our process it doesn’t jump to any
specific code, but instead does nothing. Ignoring signals has one limitation
though: we can’t ignore SIGKILL and SIGSTOP, since there has to be a way
for an administrator to kill and stop a process, no matter what the developer
of that process wants it to do.

The second option is to catch a signal, effectively defining a signal
handler. If ignoring a signal means “Nope, kernel, don’t care about QUIT.”
then defining a signal action is telling the kernel “Hey, if I receive this
signal, please execute this piece of code here”. For example: a lot of Unix
programs do some clean-up work (remove temp files, write to a log, kill child
processes) when receiving SIGQUIT. That’s done by catching the signal and
defining an appropriate signal handler that does the clean-up work. Catching
signals has the same limitation as ignoring them: we can’t catch SIGKILL
and SIGSTOP.

We can also let the defaults apply. Each signal has a default action associated
with it. E.g. the default action for SIGQUIT is to terminate the process and
make a core dump. We can leave it as it is, or
redefine the signal action by catching it. See man 3 signal
on OS X or man 7 signal on Linux for a list of the
default actions associated with each signal.
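
Let's try this out in Ruby with a small script (a sketch, with illustrative
messages):

trap(:USR1) { puts "Received USR1" }
trap(:QUIT) { puts "Received QUIT" }

begin
  trap(:KILL) { puts "Received KILL" }  # futile: KILL can't be caught
rescue ArgumentError => e
  puts "Could not trap KILL: #{e.message}"
end

sleep 100  # gives us time to send signals to the process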

We use trap to catch a signal and pass it a block to define a signal action
that will be executed as soon as our process receives the signal. In this
example, we try to redefine the signal handler for SIGUSR1, SIGQUIT and
SIGKILL. The sleep statement gives us time to send the signals to our
process.

If we run this and then send signals to our process with the kill command like
this:
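
$ kill -USR1 58882
$ kill -QUIT 58882
$ kill -KILL 58882

(Where 58882 stands in for the PID of our process.)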

As we can see, the kernel delivered all of the signals to our process. On
receiving SIGUSR1 and SIGQUIT it executed the signal handlers, but, as I
said before, catching SIGKILL proved useless and the kernel killed the process.

You can probably imagine what we can do with signal handlers. One of the most
common things to do with custom signal handlers, for example, is to catch
SIGQUIT to do some clean-up work before exiting. But there are a lot more
signals and defining appropriate signal handlers can distinguish well-behaving
processes from rude ones. Example: if a child process dies the kernel notifies
the parent process by sending a SIGCHLD. The default action is to ignore the
signal and do nothing, but a well-behaving application would probably wait
for the child, clean up after it and write something to a log file.

Unicorn and signals

Unicorn sets up a lot of different signal handlers
in the master process, before it calls fork and spawns the worker processes.
These signal handlers do a lot of things. Here are a few examples:

QUIT — Graceful shutdown. The master process waits for the workers to finish their work (the current request), cleans up and only then exits.

TERM and INT — Immediate shutdown. Workers don’t finish their work.

USR1 — Reopen the log files. This is mostly used and sent by a log rotation daemon.

USR2 — Hot-Reload. Start up a new master process with a new version of the application and keep the old master running.

These signal handlers are like a separate API through which you tell the master
and worker processes what to do. And it’s pretty reliable too, considering the
fact that signals are essentially asynchronous events and can be sent multiple
times. This just screams for race conditions and locks. So how does Unicorn do
it?

Unicorn uses a self-pipe to manage its
signal actions. The pipe the master process sets up is this self-pipe, which it will
only use internally and not to talk to other processes. It also sets up a queue data structure.
After that come the signal handlers. Unicorn catches a lot of signals, as we
saw, but each signal handler doesn’t do much. It only pushes the signal’s name
into the queue and sends one byte through the
self-pipe.

After setting up the signal handlers, spawning worker processes, and so on, the
master process goes into its main loop, in which it checks up on the workers
regularly and sleeps in between. But it doesn’t just sleep, no, the master
process actually goes to sleep by calling select(2) on the self-pipe, with a
timeout as argument. This way it can go to sleep but will be woken up as soon
as a signal arrives, since the signal handler sends a byte through the
pipe, which makes the pipe readable (from the master’s perspective) and
causes select(2) to return. After waking up, the master just has to pop off a
signal from the queue it set up in the
beginning and handle the signals one after another. This is of tremendous value
if you consider again that signals are asynchronous and you’ll never know what
you’re currently executing when a signal arrives, and that they can be sent
multiple times — even if you’re currently executing your signal handler code.
Using a queue and a self-pipe in this combination makes handling signals a lot
saner and easier.
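
Stripped down to its essence, the pattern looks something like this in Ruby,
where handle_signal and check_on_workers are made-up stand-ins for the real
work:

SELF_PIPE = IO.pipe
SIG_QUEUE = []

[:QUIT, :TERM, :INT, :USR1, :USR2, :TTIN, :TTOU].each do |signal|
  trap(signal) do
    SIG_QUEUE << signal               # remember which signal arrived...
    SELF_PIPE[1].write_nonblock(".")  # ...and wake up the sleeping master
  end
end

loop do
  # sleep for at most 5 seconds, but wake up as soon as a signal arrives
  ready, = IO.select([SELF_PIPE[0]], nil, nil, 5)
  SELF_PIPE[0].read_nonblock(64) if ready  # drain the pipe
  handle_signal(SIG_QUEUE.shift) until SIG_QUEUE.empty?
  check_on_workers
end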

Worker processes, on the other hand, inherit the master’s signal handlers –
again: child processes inherit a lot from their parents. But instead of leaving
them as they are, the workers redefine (most of) the signal handlers to be
no-ops. They get their signals through the pipe which connects them to the
master process. If the master process, for example, receives SIGQUIT it
writes the name of the signal to each pipe connected to a worker process to
gracefully shut them down. The worker processes call select(2) on this
master-worker pipe and the listening sockets, which means that as soon as they
finish their work (or don’t have anything to do) they will read the signal name
from the pipe and act upon it. This “signal delivery from master to worker via
pipe”-mechanism avoids the many problems that can occur if a worker process
should receive a signal while working on a request.

Magic?

By now we have looked at fork(2) and how easy it is to spawn a new process.
We saw that we can use pipes pretty easily outside a shell and without any use
of the pipe character by calling pipe(2) and just working with the two file
descriptors as if they were files. We also created sockets, worked with
select(2), looked at a pre-forking TCP server in 23 lines of Ruby and had
the kernel of our operating system do our load balancing for us. Then we saw
that Unicorn has its own API composed of signals and that it’s not that hard to
work with signals.

These were just some basic Unix concepts. Trivial on their own, powerful when
combined.

So, let’s have a closer look at these features of Unicorn that amazed me so
much, that I was sure were created by some wizards with long robes and tall
hats, in a basement far, far away, on old rusty PDP-11s.

Let’s see how this “magic” is just Unix.

Preloading

If we put preload = true in the configuration file, Unicorn will “preload”
our Rack/Rails application in the master process to spare the worker processes
from doing it themselves. As soon as the application is preloaded, spawning off
a new worker process is really, really fast, since the workers don’t have to
load it anymore.

The question is: how does this work exactly? Let me explain.

Right after Unicorn has evaluated command line options, it
builds a lambda called app.
This lambda contains the instructions needed to load our Rack/Rails application
into memory. It loads the config.ru file (or uses default settings) and then
creates a Rack application with Rack::Builder, on which it calls #to_app.

So what should come out of the lambda is a Rack application on which we just
need to call #call to pass it a request and get a response. But since a
lambda’s body is evaluated only when the lambda is called, none of this
happens when the lambda is defined.

Unicorn passes this app lambda on to the Unicorn::HttpServer, which
eventually calls fork(2) to spawn the worker processes. But before it creates
a new process, the HttpServer checks if we told Unicorn to use preloading. If
we did, only then does it call the lambda. If we
didn’t, each worker calls the lambda itself after the call to fork(2).

Calling the lambda, which hasn’t been called before, now loads our application
into memory. Files are being read, objects are created, connections established
– everything is somehow getting stored in memory.

And here comes the real trick: since the master loaded the application into
memory, which can take some time if we’re working with a large Rails
application, the worker processes inherit it. Yep, the worker processes inherit
our application. How neat is that? Since workers are created with fork(2)
they already have the whole application in memory as soon as they are created.
Preloading is just deciding if the Unicorn calls a lambda before or after the
call to fork(2). And if Unicorn called it before, creating new worker
processes is really fast, since they are basically ready to go right after
creation, except for some callbacks and setup work.
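
The mechanics, reduced to a sketch (preload_app, worker_processes and
worker_loop are stand-ins here, not Unicorn's exact code):

loader = lambda do
  # nothing is loaded until this lambda is called
  Rack::Builder.new_from_string(File.read("config.ru"))
end

app = loader.call if preload_app  # preload: load the app once, in the master

worker_processes.times do
  fork do
    app ||= loader.call           # no preload: every worker loads it itself
    worker_loop(app)              # serve requests with the loaded app
  end
end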

With copy-on-write, which the Ruby VM has been friendly to since 2.0, this is even faster.
The reason is that “inheriting” involves copying from the parent’s to the
child’s memory address space. It’s probably not as slow as you imagine, but
with copy-on-write only the memory regions which the child process wants to
modify are copied.

And the best part of it is this: the kernel is doing all the work for us. The
kernel answers the call to fork(2) and the kernel copies the memory. We just need
to decide when to create our objects: before or after the call to fork(2).

This comes in really handy when we now look at another great feature of Unicorn.

Scaling workers with signals

Unicorn allows us to increase and decrease the number of its worker processes by
sending two signals to the master process:

$ kill -TTIN 93821
$ kill -TTOU 93821

These two lines add and then remove a new worker process. The signals used,
SIGTTIN and SIGTTOU, are normally sent by our terminal driver to notify a
process running in the background when it’s trying to read from (SIGTTIN) or
write to (SIGTTOU) the controlling terminal. Since Unicorn requires a logfile
when running as a daemon, it never touches the controlling terminal, which
means that Unicorn is free to redefine the signal actions (the default for both
signals is to stop the process).

It does so by defining signal handlers for SIGTTIN and SIGTTOU that, as we
saw, only add the name of the signal to the signal queue and write a byte to
the self-pipe to wake up the master process.

The master process, as soon as it wakes up from its main-loop sleep, sees the
signals and increases or decreases the internal variable worker_processes,
which is just an integer. And right before it goes back to sleep, it calls
#maintain_worker_count, which either spawns a new worker or writes SIGQUIT to
the pipe connected to the now superfluous worker process to gracefully shut it down.

So let’s say we send SIGTTIN to Unicorn to increase the number of workers.
What will happen is that the master wakes up (triggered by the write to the
self-pipe), increases worker_processes and calls #maintain_worker_count,
which in turn will call another method called #spawn_missing_workers. Yes,
that’s right. We looked at this method before, it’s the same one that’s used to
spawn the worker processes when booting up. In essence it looks like this:
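
# condensed: the real method also runs hooks and handles errors
def spawn_missing_workers
  worker_nr = -1
  until (worker_nr += 1) == worker_processes
    # skip worker numbers that are already taken by a running worker
    next if WORKERS.values.any? { |worker| worker.nr == worker_nr }
    worker = Worker.new(worker_nr)
    pid = fork { worker_loop(worker) }
    WORKERS[pid] = worker
  end
end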

Again, this is just a loop that calls fork(2) N times. Now that N is
increased by one, a new worker process will be created. The other calls to
fork are skipped by checking whether WORKERS already contains an instance
of Worker with the same worker_nr.

Take note of worker_nr here, it is important. All worker processes have a
worker_nr by which they are easily identified in the row of spawned
processes.

If we now send SIGTTOU to the master process, the following is going to
happen. First of all, the master is woken up by a fresh byte on the self-pipe.
Instead of increasing worker_processes now, it decreases it. And again, it
calls #maintain_worker_count, which doesn’t jump straight to
#spawn_missing_workers. Since no worker process is missing,
#maintain_worker_count now takes care of reducing the number of workers:
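
def maintain_worker_count
  (off = WORKERS.size - worker_processes) == 0 and return
  off < 0 and return spawn_missing_workers
  WORKERS.each_value { |w| w.nr >= worker_processes and w.soft_kill(:QUIT) }
end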

It may not be idiomatic Ruby, but these 3 lines are still fairly easy to
understand. The first line calculates the difference between the number of
currently running workers and the configured worker_processes and returns if
it’s zero. If the difference
is negative, a new worker will be spawned (which is where the path of SIGTTIN
ends in this method). But since the difference is positive after decreasing
worker_processes, the master process now takes the workers with a worker_nr
that’s too high and calls soft_kill(:QUIT) on the worker instance.

This in turn sends the signal name through the pipe to the corresponding worker
process, which will catch that signal through select(2) and gracefully shut
down.

After this, the master process calls Process.waitpid (which in turn calls
waitpid(2)), which returns the PID of dead children (and doesn’t leave them
hanging as zombies). The worker process with this PID now just needs to be
removed from the WORKERS hash and Unicorn is ready to go again.

All of this is pretty simple: fork(2) in a loop, pipes, signal handlers and
keeping track of numbers. Again: it’s the combination that makes these Unix
idioms so powerful.

The same can be said for my favorite Unicorn feature.

Hot Reload

This fantastic feature has many names: hot reload, zero downtime deployment,
hot swapping and hot deployment. It allows us to deploy a new version of our
application, while the old one is still running.

With Unicorn, “hot reload” means that we can spin up a new master process, with
new worker processes serving a new version of our application, while the old
master process is still running and still handling requests with the old
version.

It’s all triggered by sending a simple SIGUSR2 to the master process. But how?

Let’s take a step back and say that our Unicorn master and worker processes are
just humming along. The master process is sleeping, waking up, checking up on
the workers and going back to sleep. The worker processes are handling requests
without a care in the world. Suddenly a SIGUSR2 is sent to the master
process.

Again, the signal handler catches the signal, pushes the signal onto the signal
queue, writes a byte to the self-pipe and returns. The master wakes up from its
main-loop-slumber and sees that it received SIGUSR2. Straight away it calls
the #reexec method. It’s a fairly long method
and you don’t have to read through it now. But most of “hot reload” is
contained in it, so let’s walk through it.

The first thing the method does is to check if the master process is already
reexecuting (reexecuting means that a new master process is started by an old
one). If it is, it returns and its job is done. But if not, it writes the
current PID to /path/to/pidfile.pid.oldbin. .oldbin stands for “old
binary”. With the PID saved to a file, the master process now calls fork(2),
saves the returned PID of the newly created child process (to later check if
it’s already reexecuting…) and returns. The old master process adds “(old)”
to its process name (by changing $0 in Ruby) and is now done with #reexec.
But since a process created with fork(2) is executing exactly the same code,
the new child process goes ahead with #reexec.

Right after the call to fork(2) the child writes the numbers of the sockets
it’s listening on (remember: sockets are files, files are represented as file
descriptors, which are just numbers) to an environment variable called
UNICORN_FD as one string, in which the numbers are separated by commas. (Yes,
it keeps track of listening sockets by writing to an environment variable. Take
a deep breath. It’ll make sense in a second.)

Afterwards it modifies the listening sockets so they stay open by setting the
FD_CLOEXEC flag on them to false.

It then closes all the other file descriptors it doesn’t need (e.g.: sockets
and files opened by the Rack/Rails application).

With all preparations and cleaning done, the child process now calls execve(2).

The execve(2) system call turns the calling process into a completely
different program. Which program it’s turned into is determined by the
arguments passed to execve(2): the path of the program, the arguments and
environment variables. This is not a new process we’re talking about: the new
program has the same process ID, but its complete heap, stack, text and data
segments are replaced by the kernel.

This is how we can spawn new programs on a Unix system and what every Unix
shell does when we try to launch Vim: it calls fork(2) to create
a child process and then it calls execve(2) with the path to the Vim
executable. Without the call to execve(2) we’d end up with a lot of copies of
the original shell process when trying to start programs.

That’s also why Unicorn needs to set the FD_CLOEXEC flag to false on the
sockets before it calls execve(2). Otherwise the sockets would get closed
when the image of the process is being replaced.

Unicorn calls execve(2) with the original command line arguments it was
started with (it keeps track of them), in effect spawning a fresh Unicorn
master process that’s going to serve a new version of our application. Except
that it’s not completely fresh: the environment variables the old master
process set (UNICORN_FD) are still accessible by the new master process.
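
Condensed into a sketch (with simplified names; the real method does a lot
more bookkeeping):

def reexec
  return if @reexec_pid > 0 && alive?(@reexec_pid)  # already re-executing
  File.write("#{pid_path}.oldbin", Process.pid.to_s)
  @reexec_pid = fork do
    # hand the listening sockets to the next master via the environment
    ENV["UNICORN_FD"] = LISTENERS.map(&:fileno).join(",")
    LISTENERS.each { |socket| socket.close_on_exec = false }  # survive execve(2)
    exec(*start_command)  # execve(2): same PID, completely new program
  end
  $0 = "unicorn master (old)"
end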

So the new master process boots up and loads the new application code into
memory (preloading!). But before it creates worker processes with fork(2), it
checks the UNICORN_FD environment variable. And it finds the numbers of our
listening sockets! And since file descriptors are just numbers, it can work
with them. It turns them into Ruby IO objects by calling IO.new with each
number as an argument and has thereby recovered its listening sockets.
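
In a sketch:

# in the new master process, during boot:
if ENV["UNICORN_FD"]
  listeners = ENV["UNICORN_FD"].split(",").map { |fd| IO.new(Integer(fd)) }
end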

And now it calls fork(2) and creates worker processes which inherit these
listening sockets again and can start their select(2) and accept(2) dance
again, now handling requests with the new version of our application.

There is no “address already in use” error bubbling up. The new master process
inherited these sockets, they are already bound to an address and transformed
into listening sockets by the old master process. The new master process and
its workers can work with them in the same way the worker processes of the old
master process do.

Now there are two sets of master and worker processes running. Both are
handling incoming connections on the same sockets.

We can now send SIGQUIT to the old master process to shut it down and as soon
as it exits the new master process takes over and only our new application
version is being served. And all of this happened without the old worker
processes stopping their work once.

It’s just Unix

All of this is just Unix. The master-worker architecture, the signal handling,
the communication through pipes, the preloading, the scaling of workers with
signals and the hot reloading of Unicorn. There is no magic involved.

I think that’s the most amazing part about all of this. The combination of
concepts like fork, pipe and signals, that are easy to understand on their
own, and leveraging the operating system is where the perceived magic and
ultimately the power of great Unix software like Unicorn comes from.

Why?

You might be thinking: “Why? Why should I care about this low-level stuff? I
build web applications, why should I care about fork and select?”

I think there are some really compelling reasons.

The first one is debugging. Have you ever wondered why you shouldn’t open a
database connection (a socket!) before Unicorn calls fork(2)? Or why you get
a “too many open files” error when you try to make an HTTP request (sockets!)?
Now you know why.

Knowing how your system works on each layer of the stack is immensely helpful
when trying to find and eliminate bugs.

The next reason is what I call the design and architecture reason. It boils down to
having answers to questions like these: should we use threads or processes? How
could these processes talk to each other? What are the limitations? What are
the benefits? Will this perform? What’s the alternative?

With some understanding of your operating system and the APIs it offers, it’s
far easier to make architectural decisions and design choices when building a
system or single components of it.

One more level of abstraction. Someone somewhere at some time said that “it’s
always good to know one more level of abstraction beneath the one you’re
currently working on” and I totally agree.

I like to think that learning C made me a better Ruby programmer. I suddenly
knew what was happening behind the curtains of the Ruby VM. And if I didn’t
know, I could make a good guess.

And I think that knowing deeply about the system to which I deploy my (web)
application makes me a better developer, for the same reasons.

But the most important reason for me, which is a personal one, is the
realization that everything Unicorn does is not magic! No, it’s just Unix and
there is no secret ingredient. Which, in turn, means that I could write
software like this. I could write a webserver like this! Realizing
this is worth a lot.

Why Threads Can’t Fork

There is an interesting thread on the Go issue tracker about
daemonizing processes. Most of the thread is not about daemonizing
processes though, but more about why Go has no Fork() function which you can call
directly in your code. The first time I read through it I was wondering and
saying to myself: “Yeah, why is there no Fork()? It surely can’t be that hard
to implement.” After all you can already call system calls with the
syscall package. As I read more and more I realized that the
problem is not implementing Fork() per se, but rather implementing Fork()
to work safely in a multi-threaded environment, which most Go programs are. So
I tried to find out why.

And it turns out that the problem stems from the behaviour of fork(2) itself.
Whenever a new child process is created with fork(2) the new process gets a
new memory address space but everything in memory is copied from the old process
(with copy-on-write that’s not 100% true, but the semantics are the same).

If we call fork(2) in a multi-threaded environment the thread doing the call
is now the main-thread in the new process and all the other threads, which ran
in the parent process, are dead. And everything they did was left exactly as it
was just before the call to fork(2).

Now imagine that these other threads were happily doing their work before the
call to fork(2) and a couple of milliseconds later they are dead. What if
something these now-dead threads did was not meant to be left exactly as it
was?

Let me give you an example. Let’s say our main thread (the one which is going
to call fork(2)) was sleeping while we had lots of other threads happily
doing some work. Allocating memory, writing to it, copying from it, writing to
files, writing to a database and so on. They were probably allocating
memory with something like malloc(3). Well, it turns out that malloc(3)
uses a mutex internally to guarantee thread-safety. And exactly this is the problem.

What if one of these threads was using malloc(3) and has acquired the lock of
the mutex in the exact same moment that the main-thread called fork(2)? In
the new child process the lock is still held - by a now-dead thread, who will
never return it.

The new child process will have no idea if it’s safe to use malloc(3) or not. In
the worst case it will call malloc(3) and block until it acquires the lock,
which will never happen, since the thread who’s supposed to return it is dead.
And this is just malloc(3). Think about all the other possible mutexes and
locks in database drivers, file handling libraries, networking libraries and so
on.

In order to call fork(2) in a safe way the calling thread would need to be
absolutely sure that all the other threads are in a safe state to fork too. And this is hard,
especially if you’re going to implement a wrapper around fork(2) in a library
and have no idea what’s going to be happening all around you.

If the new child process is going to be turned into a different process with
execve(2) the problem is not that big, since the heap, stack and data will be
replaced. That’s why there is an os.StartProcess() in Go,
which uses fork(2) under the hood (see line 65
here). There is still the problem of open file
descriptors, which the new child process will inherit but which were intended to be
used only by a now-dead thread. But it’s still possible to close them, since
the new child process has direct access to them.

Now you might realize that the title of this post is a lie, since threads
can fork. But in practice it’s really hard to pull off, which explains why
the Go issue mentioned at the beginning is nearly 5 years old.

There are of course a couple of attempts to provide a solution.
pthread_atfork(3) allows users
to register handlers in threads to be called right before and after fork. But
as you can imagine, this can be cumbersome too. Solaris has forkall(2), which
does not kill the non-forking threads but keeps them alive and doing exactly
what they did before. This behaviour comes with its own share of problems:

if a thread calls forkall(), the parent thread performing I/O to a file is
replicated in the child process. Both copies of the thread will continue
performing I/O to the same file, one in the parent and one in the child,
leading to malfunctions or file corruption.

To conclude: yes, the title is a lie, and yes, you can fork(2) in a
multi-threaded environment, but it is really, really difficult to pull off
safely. So let’s just say that threads can’t fork and leave it at that.

Where Did fork Go?

A couple of days ago I was playing around with strace and bash on a
Linux box. My goal was to get a better understanding of Unix shells and how
they operate at a systems level. Happily launching commands inside the bash
process and watching the output of strace it suddenly struck me: “Wait! Where
is fork? It’s a system call! strace is supposed to show this! Where is it?”
Nowhere in the strace output was a call to fork(2) to be found.

I was really confused and my curiosity was piqued. So I spent the next couple
of hours searching for an explanation — and I found one, which I think is worth
sharing. But first of all let me explain why I was confused.

fork and execve in the Unix environment

Processes in Unix environments are based on a pretty simple idea: the
combination of fork(2) and execve(2).

Every process running on a Unix system started out as a call to fork(2)
followed by a call to execve(2). Well, not every process, since the first
process, the init process, the one that starts up the rest of the operating
system, didn’t. But every process that came after.

The idea is rather simple: fork(2) creates a new process and execve(2) turns
the new process into the kind of process you want it to be.

Let’s say you’re a shell and your user wants to start his productivity utility
vimwonderhorse. Now, the first thing you’ve got to do is to start a new
process. The reason for that is simple: when the user quits vimwonderhorse
you should still be there and wait for the user’s input again. If you, as the
shell, had turned into vimwonderhorse and the user quit, well, then
you would be gone, too. So you start a new process with fork(2).

A new process started with fork(2) is a replica (ignoring some details here)
of the process making the call: the same instructions, the same open file
descriptors, the same working directory and so on. Only the PID (and the
parent PID) and the memory address space have changed. But since the user
wanted to start another program it’s not that useful to now have two shell
processes. And that’s what execve(2) is for.

A call to execve(2) transforms the calling process into the executable
specified in the arguments to execve(2). execve(2) never returns, unless
something goes wrong. That means that once you execve(2) into something else,
you can’t go back.

So, breaking it down, it’s just this: fork(2) to create a new process and
then execve(2) to turn the newly-created process into the process you want it
to be. Normally you would close open and unneeded file descriptors and clean up
other things between fork(2) and execve(2).
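
Here is what that looks like in C (a sketch, with names and error handling
kept minimal):

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>

int main(void) {
    char *argv[] = {"ls", "-l", ".", NULL};
    char *envp[] = {NULL};
    int status;
    pid_t pid;

    pid = fork();
    if (pid == 0) {
        /* child process: turn ourselves into ls */
        execve("/bin/ls", argv, envp);
        perror("execve"); /* only reached if execve(2) fails */
        exit(1);
    }

    /* parent process: wait for the child and report its exit status */
    wait(&status);
    printf("child exited with status %d\n", WEXITSTATUS(status));
    return 0;
}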

All this snippet really does is create a new process with fork(2) and use
execve(2) to turn it into ls. We set up a few variables to help us and then
start right away: we call fork(2), which returns the PID of the newly created
process (in the parent process) and 0 in the created process itself, the child
process. That sounds funny, but actually makes a lot of sense when you think
about the fact that fork(2) does little more than duplicate the current
process.

After the call to fork(2) the parent and the child process run the same code.
To find out which process we are in, we need to check the return value of
fork(2). In the child process we call execve(2) to turn the process into
ls with some arguments (effectively running ls -l .). In the parent process
we call wait(2) to check the exit code of the child and to not leave a zombie
process.

I left out some lines that are not relevant here (and which basically show
memory allocation, loading of shared libraries and the internals of ls).

But we can see what’s happening. A new process is created, the parent process
calls wait for the child, the child execves into /bin/ls, which writes the
contents of the working directory to STDOUT and then exits. After the child
exits the call to wait in the parent returns and the parent writes the status
message to STDOUT.

So, where is fork? We explicitly called fork() in the code, which
is supposed to be a system call but it’s nowhere to be seen. Come to think of
it, where is wait(2)? wait4 shows up, yes, but that’s not what we called.

fork, clone, library and system calls

It turns out that when we call fork() in our code, we don’t actually call
the system call fork(). Instead, we call a library function in the C standard
library (yes, called fork()) that is a small wrapper around the system call.

The top answer to this post on Stack Overflow explains in
detail and with links to the relevant parts of the glibc source that the
fork() we use in our code is actually a wrapper in glibc that
calls the clone(2) system call. (The same goes for wait(2) — see code
here)

Even the man page for fork(2) explains this:

Since version 2.3.3, rather than invoking the kernel's fork() system
call, the glibc fork() wrapper that is provided as part of the NPTL threading
implementation invokes clone(2) with flags that provide the same effect as the
traditional system call. (A call to fork() is equivalent to a call to clone(2)
specifying flags as just SIGCHLD.) The glibc wrapper invokes any fork
handlers that have been established using pthread_atfork(3).

If we use ltrace instead of strace (ltrace traces library calls, not system
calls), we can see this happening:

We could stop here and conclude by saying that reading man pages is a wise and
noble thing to do and nobody should speak about anything without checking the
man page for it. But, here’s the thing: digging deeper provides some really
interesting information about processes in the Linux environment. So let’s do
that and get back to the topic at hand.

Why does glibc do that? Why call clone(2) instead of fork(2)? And why
does it wrap system calls in library functions?

After digging around a bit I found out that making a system call is
actually harder than just calling fork() somewhere in my code. I’d need to
know the unique number of the system call I was about to make, set up registers,
call a special instruction (which varies on different machine architectures) to
switch to kernel mode and then handle the results when I’m back in user space.

By providing a wrapper around certain system calls glibc makes it a lot easier
and more portable for developers to use system calls. There is still the possibility
to use syscall(2) to call system calls somewhat more directly.

So why does fork() in glibc call clone(2) instead of just being a wrapper
for the fork system call? The reason for that is the implementation of
threads and processes in Linux. Processes are just “fat” threads. Under the
hood they don’t differ too much, at least from the kernel’s point of view. The
main difference is that instead of sharing a memory address space with other
processes, each process gets its own. Of course, this is a simplified idea of
what is actually going on, but what it boils down to is this: threads are
lightweight processes that can be created with clone(2).

In contrast to fork(2), which takes no arguments, we can call clone(2) with
different arguments to change what kind of process will be created. Does it need to
share its execution context? Memory? File descriptors? Signal handlers?
clone(2) allows us to change these attributes of newly created processes.
This is clearly much more flexible and powerful than fork(2), which creates
the “fat processes” we can see when we run ps.

The functionality fork(2) provides is covered by clone(2). So the Linux
kernel uses clone(2) to implement fork(2) to not break the API and to
centralize the creation of processes in a single system call.

And that is the reason why strace won’t show fork(2): calling fork(2)
uses the wrapper provided by glibc, which uses clone(2) to create a process.

Watching and Understanding the Ruby 2.1 Garbage Collector

The most common way to check up on Ruby’s Garbage Collector (GC) is probably
calling GC.stat, which returns a hash of information about the current
state of the GC. Since version 2.1 Ruby comes with a generational GC and the
output now contains a lot more information than in previous versions. Let’s have
a look at its output:
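
From a fresh irb session on Ruby 2.1.1 it looks something like this (the
values here are illustrative and will differ on your machine):

GC.stat
# => {
#   :count=>9,
#   :heap_used=>81,
#   :heap_length=>81,
#   :heap_increment=>0,
#   :heap_live_slot=>28722,
#   :heap_free_slot=>4326,
#   :heap_final_slot=>0,
#   :heap_swept_slot=>9126,
#   :heap_eden_page_length=>81,
#   :heap_tomb_page_length=>0,
#   :total_allocated_object=>133622,
#   :total_freed_object=>104900,
#   :malloc_increase=>462784,
#   :malloc_limit=>16777216,
#   :minor_gc_count=>6,
#   :major_gc_count=>3,
#   :remembered_shady_object=>183,
#   :remembered_shady_object_limit=>366,
#   :old_object=>9837,
#   :old_object_limit=>19674,
#   :oldmalloc_increase=>430240,
#   :oldmalloc_limit=>16777216
# }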

As we can see, the output of GC.stat in Ruby 2.1.1 contains a lot more
information than in previous versions. But, to be honest, it had little to no
meaning to me until I had a better grasp on how Ruby manages memory and how the
GC in Ruby 2.1 works. So let’s have a look at these topics and let me show you
what I found out about them before coming back to the specifics of GC.stat.

Ruby’s Memory Management and Garbage Collector

Disclaimer: I am not an expert on Garbage Collection nor on Ruby’s internals.
What follows is what I learned by reading about Ruby’s memory management, its
garbage collector, garbage collection in general and by digging through the
Ruby source code (especially gc.c). Please correct me if anything is wrong.
I’m happy about feedback and eager to learn more.

The first thing we need to know about Ruby’s memory management is that it
exists. Ruby manages the memory and not the programmer. There is no way, at
least none that I know of, to manually allocate memory as in C with
malloc(3). Ruby does that for us. It asks the kernel for memory, it
initializes the memory and we, as programmers, do not have to care about it.

The memory Ruby manages and to which we have access is organized on the Ruby
heap, which itself is split up into slots. Each slot is big enough to hold
one Ruby object. In the context of Ruby’s memory management and GC an object
is represented as a simple struct called RVALUE. Each slot on the Ruby heap
has the size of one RVALUE, which is 40 bytes.

GC::INTERNAL_CONSTANTS[:RVALUE_SIZE] # => 40

Now, whenever we create a new object Ruby doesn’t immediately ask the kernel
for 40 bytes of memory. Instead Ruby already has a large enough heap to hold a
lot of objects. Whenever we do something like this:

foobar = MyFoobar.new

Ruby doesn’t need to allocate more memory, at least in the majority of cases.
Chances are Ruby still has free slots on its own heap to hold our newly created
object. Ruby manages its memory in such a way that it doesn’t need to ask the
kernel for new memory every time a new instance of an object is created.

But eventually, after creating enough objects, there are no free slots left and
this is the point where the GC kicks in. The GC checks which of the
slots/objects are not being referenced by other objects anymore and frees them.
Freed slots are then ready to be used again as slots for newly initialized
objects. If there are still not enough slots for new objects, even after
freeing unreferenced slots and making them available again, only then does
Ruby ask the kernel for more memory.

That’s the basic and super simplified model of how Ruby manages memory and we
need to keep that in the back of our heads when we now take a closer look at
the GC in Ruby 2.1.

Unlike in previous versions the GC in Ruby 2.1 is a Generational
GC
that uses a mark-and-sweep strategy to maintain the Ruby heap. The
implementation of the current GC in Ruby is called RGenGC and was developed by
Koichi Sasada as part of the Ruby Core team at Heroku.

In a simple world without implementation details and edge cases the basic
strategy behind a mark-and-sweep garbage collector is to traverse the object
graph, the Ruby heap in our case, and check which objects are still in use and
which ones are not. The unused objects are being freed, making their memory
available to us again, and the used objects are kept where they are.

Starting at an object that is known to be referenced, the GC traverses along
every reference to the other objects on the heap. Every time the GC comes across
an object (which means that the object is still being referenced, since the GC
only travels along references) it marks the object (by switching a bit on its
underlying structure, for example) and moves on until it can find no more
references.

After the mark-phase comes the sweep-phase: the GC goes through the whole
heap again and “sweeps” away every object that is not marked and frees it.
Marked objects are being unmarked, enabling the next cycle of garbage
collection.

A generational GC which uses mark-and-sweep (a generational GC does not
necessarily have to use a mark-and-sweep algorithm) works basically in the same
way, but implements some other ideas in order to speed up the traversal of
objects.

The main assumption behind a generational GC is this one: most objects die
young. In more words: it is much more likely that young objects (created since
the last GC run) are referencing old objects (which survived the last GC run)
than the other way around. Only a small number of new objects need to be marked
and not swept away: the ones that are being referenced by old
objects.

Based on that the GC can save itself a lot of time and useless work when it
concentrates on the young objects for the majority of its time, since these are
the ones that most likely need to be collected, thus yielding the greatest
benefit for a GC run. Traversing old objects would not result in the same
amount of freed memory.

So, a generational GC needs two modes. In Ruby’s case they are called minor and
major GC runs. In a minor GC run the GC only traverses the young objects and in
a major GC run it traverses the whole object graph, including the old
generation. A minor GC should be faster than a major GC and is typically run
more often.

In order to classify objects as new or old the GC does the following: whenever
it marks an object in a mark-phase (which means that the object will survive
this GC run) it promotes it to the old generation. Unmarked objects are swept
away.

In the next minor GC run the GC can now ignore the old generation and only
traverse the young generation.

But there is one problem: Imagine an old object starts referencing a new object
in between minor GC runs. In its next run the GC (since it traverses from
object to object along their references) will not “mark” this newly created
object, since it ignored the old generation and their references and never came
across this new object.

The fix to this problem involves a “remember set” and a “write-barrier”. And
here is how it works: a write-barrier is put up around every object, through
which every write operation on the object has to pass. And the important part
is this: adding a reference to an object is also a write operation.

This snippet of C code should clarify how adding a reference is a write-operation:

old_array[0] = &new_object;

&new_object is the memory address of new_object and that line simply says:
“Save (write) the memory address of new_object as the first element of
old_array”.

With a write-barrier around objects the GC can now detect references from old
objects to newly created objects. Whenever it does that, it adds a reference to the
old object to the remember set.

Without the remember set the GC previously skipped the old generation in minor
GC runs and failed to “mark” new objects that are being referenced by old
objects. But with the write-barrier and the remember set the GC can now
traverse the young generation AND the old objects in the remembered set and not
miss references from old to new objects.

That’s the theory. Now let’s switch back to the real world, you know, the one with
implementation details and legacy code.

When trying to implement a generational GC for Ruby the developers came across
a problem with the write-barrier. It is a huge undertaking to put up effective
write-barriers around objects while Ruby C extensions have low-level access to
their memory addresses. A possible solution are write-barriers on the C level
(e.g. in the form of macros for pointer access) but that not only entails
rewriting the internal C API used by Ruby itself but also means that a lot of C
extensions would need to be rewritten to make use of the new API or be
deprecated. So they came up with a better solution: shady objects.

At the time of creation an object in Ruby 2.1 is either classified as sunny
or shady. Sunny are objects protected by write-barriers and shady objects are
not.

With the distinction between shady and sunny objects at hand the minor GC run
of RGenGC now works slightly differently compared to the “simple” theory described
above. Whenever the GC comes across a shady object while traversing it “marks”
but does not promote it to the old generation. The reason for this new behavior
is the missing write-barrier. Promoting a shady object to the old generation
would result in missing references from old to new objects. And the whole
purpose of shady objects is to not miss those references in the first place.
Instead the GC checks if the shady object is referenced by an old object and if
that is the case it adds the shady object to the remember set.

An object that was created as sunny doesn’t have to stay sunny. The GC can also
“shade” objects. Shading an object means turning it from sunny to shady,
demoting it from the old generation, and adding it to the remember
set. When does that happen? Whenever low-level access to the memory address of
the object is gained through the C API, which the Ruby Virtual Machine can
detect. After the user of the C API has the pointer to the object an effective
write-barrier is not possible anymore, which would result in missing references
from old to new objects, so the object gets “shaded”.

Instead of solely relying on write-barriers around all objects to detect
references, RGenGC adds shady objects to the remember set when it can’t tell
whether that object is referencing a new object.

In its next run the GC can now traverse the young generation AND objects in the
remember set. The set now contains old objects referencing new objects and
shady objects referenced by old objects. Traversing through this set the GC
should not fail to mark newly created objects.

And that is basically how RGenGC in Ruby 2.1 works. Now, that was a lot to take
in, even though this is a really simplified description of the implementation.
But it helps us tremendously when we now go back to the output of GC.stat.
You’ll see, it will make more sense now.

Analyzing GC.stat

Here is the output of GC.stat again, straight from a fresh irb process, so
we don’t have to scroll back up again. I also reordered it, making it easier to
explain what each key and value means.
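
GC.stat
# => {  # (same illustrative values as above, reordered)
#   :count=>9,
#   :minor_gc_count=>6,
#   :major_gc_count=>3,
#   :total_allocated_object=>133622,
#   :total_freed_object=>104900,
#   :heap_length=>81,
#   :heap_used=>81,
#   :heap_eden_page_length=>81,
#   :heap_tomb_page_length=>0,
#   :heap_live_slot=>28722,
#   :heap_free_slot=>4326,
#   :heap_final_slot=>0,
#   :heap_swept_slot=>9126,
#   :heap_increment=>0,
#   :remembered_shady_object=>183,
#   :remembered_shady_object_limit=>366,
#   :old_object=>9837,
#   :old_object_limit=>19674,
#   :malloc_increase=>462784,
#   :malloc_limit=>16777216,
#   :oldmalloc_increase=>430240,
#   :oldmalloc_limit=>16777216
# }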

Now, with our knowledge about Ruby’s generational Garbage Collector and its
memory management, let’s go through the output and see what each line means.

GC.stat[:count]

This one is pretty much self-explanatory. :count is the number of GC runs,
major and minor combined.

GC.stat[:minor_gc_count]

The number of GC runs that only traversed the young generation of objects and
the objects in the remember set.

GC.stat[:major_gc_count]

The number of GC runs that traversed the whole Ruby heap, including old,
young and remembered objects.

GC.stat[:total_allocated_object]

The total number of objects Ruby has allocated in the lifetime of the
current process.

GC.stat[:total_freed_object]

Number of freed objects in the lifetime of the current process.

GC.stat[:heap_length]

There’s something I didn’t mention before: Ruby’s heap is not only organized
in slots, where each slot holds an object, but also into pages. A Ruby heap page
holds a specific number of slots.

GC::INTERNAL_CONSTANTS[:HEAP_OBJ_LIMIT] # => 408

Each Ruby heap page holds 408 slots (further up we saw that the size of one
slot is 40 bytes, which means that the page size is around 16kb).
GC.stat[:heap_length] returns the number of pages the current Ruby process has
allocated. Remember: allocated memory here does not mean that it is in use,
since Ruby manages the memory for us and may hold the allocated memory back for
when times are tough and we run out of memory and so on, which leads us to…

GC.stat[:heap_used]

This is the number of heap pages that are currently in use. Either filled
with live objects or free slots.

GC.stat[:heap_eden_page_length]

Ruby separates its heap into “Eden” and “Tomb”. Eden is the part of the heap
where pages reside that contain (at least one) live objects. Tomb are only
pages that contain no live objects but are there to be used when Eden runs out
of space. :heap_eden_page_length is the number of pages in the “Eden” part of
the Ruby heap.

GC.stat[:heap_tomb_page_length]

The counterpart of :heap_eden_page_length. This is the number of pages that
do not contain live objects. Since the Ruby heap is divided into “Eden” and
“Tomb”, together they make up the heap: the sum of :heap_tomb_page_length and
:heap_eden_page_length equals :heap_used.

GC.stat[:heap_live_slot]

The number of objects that survived all the GC runs in the past and are still
alive. We can calculate this number ourselves:
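
GC.stat[:total_allocated_object] - GC.stat[:total_freed_object] - GC.stat[:heap_final_slot]
# => 28722, which is exactly GC.stat[:heap_live_slot]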

GC.stat[:heap_free_slot]

Number of allocated but unused/free slots on the Ruby heap.

GC.stat[:heap_final_slot]

I am not too sure about this number but after reading through Ruby’s gc.c
my best guess is that it is the number of slots that have a finalizer that
still needs to be run, which makes the Ruby VM consider the slot as still
being used. The relevant piece of code in gc.c that gave me this idea is
these 5 lines:

When the object is swept away by the GC the finalizer is run. And I’ve played
around with that a lot but I couldn’t get GC.stat[:heap_final_slot] to return
something other than 0. I’m happy about feedback and suggestions here.

GC.stat[:heap_swept_slot]

This gets reset to zero before every page sweep (Ruby sweeps heap pages one by
one). After the sweep it gets incremented by the number of freed and already empty
slots in the swept page.

GC.stat[:heap_increment]

The number of pages that get added to the Ruby heap if it needs expanding.
This number is dynamically adjusted whenever the Ruby heap needs to grow with
this formula:
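
heap_increment = heap_used * (factor - 1)

The factor here is RUBY_GC_HEAP_GROWTH_FACTOR (see below), 1.8 by default. At
least that’s my reading of gc.c.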

GC.stat[:remembered_shady_object]

Number of shady objects in the remember set. This gets reset with every
major GC run.

GC.stat[:remembered_shady_object_limit]

If remembered_shady_object crosses this limit a major GC run is triggered.
The number is dynamically adjusted after each major GC run with this formula:

remembered_shady_object_limit = factor * remembered_shady_object

The factor is 2.0 by default.

GC.stat[:old_object]

The number of old generation objects.

GC.stat[:old_object_limit]

If old_object crosses this limit a major GC run is triggered. It is
dynamically adjusted in the same way that remembered_shady_object_limit is,
using the same factor and formula.

GC.stat[:malloc_increase]

Not every object fits into a 40-byte Ruby slot and some need more memory (e.g.
long strings). Objects that need more memory can get it with Ruby’s internal
wrapper of malloc(3) and use it as their own buffer. Whenever the wrapper
is called, :malloc_increase gets incremented by the size of the newly allocated
memory. And whenever that memory is freed the size of :malloc_increase is
reduced by the size of the freed memory. So :malloc_increase basically
reflects the current size of memory allocated besides Ruby heap slots.

GC.stat[:malloc_limit]

If :malloc_increase crosses this limit a minor GC is triggered. It is
dynamically adjusted before every sweep with this formula:

malloc_limit = factor * malloc_increase

The factor is 1.4 by default.

GC.stat[:oldmalloc_increase]

This is the old generation counterpart of :malloc_increase: the size of
currently additional memory allocated by old objects.

GC.stat[:oldmalloc_limit]

If :oldmalloc_increase crosses this limit a major GC is triggered. It is
dynamically adjusted before every sweep with this formula:

oldmalloc_limit = factor * oldmalloc_increase

The factor is 1.2 by default.

It does make more sense now, doesn’t it? But watching and understanding the
Ruby GC normally goes hand in hand with some tuning experiments which might
make this whole matter easier to grasp. So let’s have a look at some
environment variables.

Environment variables

Ruby allows users to tweak its GC via environment variables. And since
2.1 there are a lot more variables to use than in previous versions. Ruby 2.1.1
even has one more than 2.1. The following is a list of GC tuning variables
straight from Ruby 2.1.1’s
gc.c:
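
RUBY_GC_HEAP_INIT_SLOTS
RUBY_GC_HEAP_FREE_SLOTS
RUBY_GC_HEAP_GROWTH_FACTOR
RUBY_GC_HEAP_GROWTH_MAX_SLOTS
RUBY_GC_HEAP_OLDOBJECT_LIMIT_FACTOR
RUBY_GC_MALLOC_LIMIT
RUBY_GC_MALLOC_LIMIT_MAX
RUBY_GC_MALLOC_LIMIT_GROWTH_FACTOR
RUBY_GC_OLDMALLOC_LIMIT
RUBY_GC_OLDMALLOC_LIMIT_MAX
RUBY_GC_OLDMALLOC_LIMIT_GROWTH_FACTOR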

Again, let’s go through each one and see how setting it affects the GC.

RUBY_GC_HEAP_INIT_SLOTS

The initial number of slots Ruby allocates on its heap. Default value is
10000. Increasing this number to cover the live objects after a process is
fully booted can reduce the number of GC runs when booting and thus the boot
time:
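
$ RUBY_GC_HEAP_INIT_SLOTS=600000 ruby app.rb

(The 600000 and app.rb are illustrative, of course.)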

RUBY_GC_HEAP_FREE_SLOTS

The minimum number of free slots that should be available after a GC run.
Default value is 4096.

RUBY_GC_HEAP_GROWTH_FACTOR

The factor by which the size of the heap grows when it needs to be expanded.
Default value is 1.8. This has a direct influence on
GC.stat[:heap_increment] since it is the relevant factor for heap resizing.

But it does make sense to lower this number when RUBY_GC_HEAP_INIT_SLOTS is
already really high, since the memory consumption might go through the roof.

RUBY_GC_HEAP_GROWTH_MAX_SLOTS

The maximum number of slots on the Ruby heap. Default value is 0, which
means the feature is disabled. But if the number is higher than 0 it sets
the maximum number of slots Ruby is allowed to add to its heap at once.

The higher the number, the fewer GC runs Ruby needs for its heap to grow to the
needed size:
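
$ RUBY_GC_HEAP_GROWTH_MAX_SLOTS=300000 ruby app.rb

(Again, the number is illustrative.)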

RUBY_GC_OLDMALLOC_LIMIT_MAX

RUBY_GC_OLDMALLOC_LIMIT_GROWTH_FACTOR

The growth factor by which GC.stat[:oldmalloc_limit] is resized before
every sweep. Default value is 1.2.

Words of Caution

And that’s it. Please keep in mind that the output of GC.stat and the GC
environment variables are subject to change while the Ruby core team
is working on the GC. Ruby 2.2 is supposed to have a three-generation GC, which
means that at least GC.stat will probably change.

Advice, recommendations, corrections or thanks are welcome! Leave a comment,
send an email to me@thorstenball.com or ping me on Twitter
@thorstenball.

Essential Resources

If you want to learn more about Ruby’s GC and memory management here is a list
of resources I found invaluable while researching for this blog post:

]]>2013-08-11T16:41:00+00:00http://thorstenball.com/blog/2013/08/11/named-pipesA lot of people know and love Unix pipes, myself included, since they let you do
stuff like this: cat access.log | awk '{print $9}' | sort | uniq -c. What a
lot of people don’t know are Named Pipes,
which are pretty interesting and worth knowing about.

In contrast to “unnamed pipes” (|) a named pipe has a file name on your file
system and can be accessed by independent processes that were not spawned by the
same parent process. To create a named pipe use mkfifo(1):
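
(The example is missing in this copy; a minimal reconstruction, with the ls -l
output only illustrative:)

$ mkfifo my_named_pipe
$ ls -l my_named_pipe
prw-r--r--  1 mrnugget  staff  0 Aug 11 16:41 my_named_pipe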

The p in the left column of the ls -l output indicates that my_named_pipe
is a pipe. You can change the permission bits as with any other file.

Using it is straightforward, just write something to it in one process and read
from it in another. In the first shell:

$ cat access.log | awk '{print $9}' > my_named_pipe

As you can see, this will block until another process reads from the pipe. So
open up another shell and read from it:

$ sort my_named_pipe | uniq -c

The first process will exit when it’s done writing and sends EOF. The second
process stops when it sees the EOF and exits.

Even if you have never used a named pipe that’s been created with mkfifo(1),
you might have used one without knowing about it, since shells (at least Bash
and ZSH) use named pipes whenever they encounter process substitution:

$ diff <(ls ./old/) <(ls ./new/)

The shell will spawn two subshells here, running ls ./old/ and ls ./new/
respectively, and redirect their output to two named pipes it creates and names.
It then passes the names of the pipes to diff(1), which expects filenames as
arguments.

Communication Between Processes

What else can we do with named pipes? Since they can be read from and written to
by independent processes, we can use them to communicate between those
processes. Of course, we can write text to them and read it back, yes. But since
writing to or reading from a named pipe blocks until the other end does
something, we can use that blocking behaviour itself as a means of communication.

Imagine one process waiting for another process to finish. Set up a named pipe
and have the waiting process read from it:
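
(The snippet is gone here; a sketch of the idea, pipe name made up:)

$ mkfifo jobdone
$ cat jobdone && echo 'the other process is done'    # blocks

# meanwhile, in another shell, once the long-running job finishes:
$ ./long_running_job; echo 'done' > jobdone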

This reminds me of using dedicated quit channels in Golang to signal other
goroutines when to quit. The cool thing about this is that multiple
processes can wait on one process. Or multiple processes can write to the
named pipe and just one is reading.

Sharing Terminal Output With script(1), netcat And Named Pipes

Let’s use a named pipe to do a “terminal screen sharing” session, by using
script(1). script(1) allows us to record a shell session by writing a
typescript to a file. We will use a named pipe instead of a file. So open up the
first shell and type in the following:

$ mkfifo screenshare
$ script -t 0 screenshare

Again, this will block. So let’s read from the pipe in another shell:

$ cat screenshare

The first shell should now stop blocking and drop you into a new script(1)
session. Switch to it and type something in, e.g. ls -l. You should now see your
first shell session mirrored in the second one, by the power of script(1) and
mkfifo(1).

Let’s take it one step further. Let’s use netcat (nc(1)) to stream the shell
session over the network! The first thing we need is a listener on one computer:

$ nc -l -p 9999

This computer will now wait for connections and data on port 9999. Now we need
a named pipe and script(1) in one shell session just like before:

$ mkfifo screenshare
$ script -t 0 screenshare

But instead of using cat(1), we use nc(1) to send the typescript to our
listener in another shell:

$ nc <ip-of-listener> 9999 < screenshare

If you now start working in the shell that ran script(1), a typescript will be
written to the named pipe, which netcat reads in order to send it to the
listener computer. Of course, this is totally insecure, but it’s really, really
cool nonetheless, right?

Another cool and more practical thing to do is to use named pipes to asynchronously
run tests, by writing test commands into one end and running those commands on
the other end. Gary Bernhardt demonstrated that in a great
screencast.

I have to admit, learning about named pipes is by no means a world-changing
experience, mostly because an “unnamed pipe” is more often than not
the better and easier way to go. Still, I think it’s useful to know about and to
understand them. Having named pipes in your tool belt is certainly not something
you will regret.

]]>2013-04-07T14:24:00+00:00http://thorstenball.com/blog/2013/04/07/watchgopherIn the last couple of months I’ve been playing a lot with
Go and did what I always do when learning a new
language: use it for a small project. That project has been open to the public
on GitHub for a few weeks now and is now in a usable state:
Watchgopher.

Watchgopher allows you to watch certain directories and run custom commands
whenever a file in a specified directory changes. It is supposed to be
simple and to give you control over what happens to your files: it only notifies
one of your commands whenever something happens that it should know about.

Let me guide you through a simple example to show you what Watchgopher can do.
But first, make sure you have Go installed and then run
the following command to install Watchgopher on your system:

go get -u github.com/mrnugget/watchgopher

Now the watchgopher command should be available on your system and it’s time
to tell Watchgopher which directories to watch and what to do about any file
events occurring in them. So let’s create a simple configuration file in JSON
format:
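
(The config file didn’t survive in this copy; a sketch of what it plausibly
looked like, based on the description below. The key names are guesses; check
the Watchgopher README for the real schema.)

{
  "/Users/mrnugget/Downloads": [
    {"match": "*.zip", "run": "/Users/mrnugget/.watchgophers/unzip.rb"}
  ]
}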

This file tells Watchgopher to watch the Downloads directory in my home
directory, and whenever something happens to a file whose name matches *.zip,
Watchgopher will run the specified command. If the command is in your $PATH
you won’t need to put the absolute path in the configuration file.

When running the command it will pass two arguments to it:

1. The type of the file event. This can be CREATE, MODIFY, DELETE,
or RENAME.

2. The absolute path to the file triggering the event.

After saving the config file we need to create the command which will be run. As
you may have guessed by reading the file names, I want to create a small command
that unzips every newly created zip file in my Downloads directory. The code
to do this is pretty simple:
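
(The script itself is missing here; a minimal sketch reconstructed from the
description that follows, using the two arguments documented above:)

#!/usr/bin/env ruby

event, path = ARGV

# 1. Only newly created zip files are interesting
exit 0 unless event == 'CREATE'

# 2. Unzip the file into the directory it was created in
success = system('unzip', path, '-d', File.dirname(path))

# 3. Propagate unzip's success or failure
exit(success ? 0 : 1)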

1. Exit with exit code 0 if the file event is not CREATE, since I’m not
interested in DELETE events here, because unzipping deleted files is a
pretty difficult thing to do.

2. Run the unzip command with the filename as the first argument and the directory
the file is in as the -d option, which means it will unzip to the same
directory the zip file was created in, no matter from where you run Watchgopher.

3. Check whether the unzip command exited successfully. If it did, exit
successfully too, and if it didn’t, exit with error code 1. (You can
achieve a similar thing with Ruby’s Kernel#exec, but explaining what exec
does is not part of this post.)

Let’s save the file at the specified place
(/Users/mrnugget/.watchgophers/unzip.rb) and give it the right permissions:

$ chmod +x ~/.watchgophers/unzip.rb

Now is the time to launch Watchgopher and point it towards the configuration
file:
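
(The command is missing; assuming the config was saved as
~/.watchgophers/config.json, the invocation presumably looked like this, after
which something like Octocat.zip was downloaded into ~/Downloads:)

$ watchgopher ~/.watchgophers/config.json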

And if we take a look inside the Downloads directory we will find a newly
extracted Octocat directory, which means everything went as planned! Great!

As you can see, this concept of handing the handling of certain file events
over to specified commands gives you a lot of freedom when dealing with file
changes on your system. Since Watchgopher passes the two arguments to every
specified command, the command can decide what to do: delete newly
created files, move them to another directory, upload them to a server, or
ignore the file events altogether and just have a look around the directory for
old files and delete them whenever something happens in there. To make it
short: it’s up to you how you react to file changes, Watchgopher just tells
you about them.

There are still a lot of features that are currently missing and that I want to
implement: including the output of the specified commands in Watchgopher’s log
output, allowing users to specify the current working directory of a command,
allowing path expansion, and many more features and tweaks.

I’d be happy if you give Watchgopher a try and I’m also thankful for every
comment, question, issue opened and pull request sent, so don’t hesitate!

]]>2013-01-15T09:04:00+00:00http://thorstenball.com/blog/2013/01/15/disciplineBeing disciplined seems to be hard, something you either are or are not, no
matter what. People certainly talk that way. What I found is that being
disciplined is actually quite easy. It just takes some tricks and practice.

From time to time someone tells me I’m disciplined, mostly because of something
I did or said about something I’m doing. And after being called disciplined,
more often than not, I hear: “I could never do that, I wish I could, but I’m too
lazy.” But here is the thing, you see, I’m lazy too, but I don’t want to be. So
I keep myself from being lazy.

“I don’t have the time” — That’s what I thought. Then I got up a little bit
earlier every day. At first, just 30 minutes, which is enough to squeeze in 20
minutes of guitar practice every day. That’s not a lot, but it’s a lot more than
nothing. Repetition is key here. Then 30 minutes became one hour, then two.
Suddenly I could get a lot more done than before, by minimizing the time in the
evening I spent doing nothing and maximizing my morning, where I feel fresh
and productive.

“I could never get up that early” — If I had my alarm clock right next to my
bed, I too wouldn’t. I’d turn the alarm off, go back to sleep and wonder what
the hell happened two hours later. But instead I put my phone, serving as the
alarm clock, on the other side of the room before going to bed. So when the
alarm goes off I have to get up, walk over there and turn it off, by which time
I’m already too awake to just go back to bed. Putting the phone on the other
side of the room also keeps me from lying in bed at night checking Twitter and
instead makes me pick up that book I wanted to finish.

“With all that free time I’d just do useless stuff” — I would too. But what I
found is this: setting the right goals, as small as they may be, is one of the
biggest steps away from being unproductive and lazy. Spending an hour working
through that new programming book, practicing scales on the guitar or running in
the park seems like an awful lot. “I’ll do that tomorrow then.” Scratch that
hour, make it 20 minutes. When those 20 minutes are up, the task for today is
done, see you tomorrow. I’m not saying having huge goals is wrong, no, but I
found it a lot easier to get closer to achieving them by splitting them up into
several small steps and goals. The best thing about this is that achieving
something feels good, however small the goal is, and certainly helps grow the
motivation to do more, by keeping the fun in it and getting rid of that “I’m
finally done, I’m glad it’s over” feeling.

I’m lazy by my own definition but I keep myself from giving in and instead try
to do things I’m proud of. Things that don’t make me feel “I wish I had done
something productive instead” the next day. The key to all this is my wish to
get better, to be more productive, to create more, to learn more and to spend my
time without regretting it later as wasted. When I feel that wish, that urge to
do something about my laziness, is when I set my goals low and put the alarm
clock on the other side of the room. That helps me tremendously when I feel like
making excuses and reading another trivial article online.

All my discipline is born out of the recognition of my laziness and my will to
do something about it. The results are those tricks I play on myself to help
me get something done, to do something I’m proud of. With a little practice it
gets a lot easier, and I certainly look forward to my daily morning routine,
where I can achieve yet another goal.

]]>2012-10-24T09:38:00+00:00http://thorstenball.com/blog/2012/10/24/command-line-rideWhen your harddrive is running out of space, chances are good that there are
files you can safely delete in order to free up some of that space.
A safe bet are archive files, like ZIP and RAR files, that you’ve already
decompressed.

I went on a little ride to put together a command that shows me how much space a
certain type of file takes up. The first thing I did was to find those
files…

find / -type f -name '*.rar'

What does find do here? It looks up all the files on the harddrive that are
really files (-type f) and not directories and whose names end in .rar. That
got me pretty far. But still not far enough: sometimes RAR-archives are split
up into several files called archive.rar, archive.r01, archive.r02 and so on.
They should be listed as well:

find / -type f -regex '.*r[0-9][0-9]' -o -name "*.rar"

That’s better! Here find lists the filenames matching the provided regex or
(-o) those ending in .rar. Running this command in a directory containing
such an array of RAR-files outputs the following:
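
(The output didn’t survive; for a directory with a split archive it would look
something like this, filenames invented:)

./archive.rar
./archive.r01
./archive.r02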

But I still didn’t know how big those files are. So the first thought was: use
ls -l on every file since that gives me the size of every file. But ls takes
filenames as command line arguments and doesn’t read from the standard input
stream. So I couldn’t just pipe the list to ls, since a Unix pipe connects the
standard output stream of one program to the standard input of another program.
Just try it:

find . -name '*.txt' | ls -l

That shouldn’t give you the desired output. What happens here is that ls
doesn’t get an argument and lists the contents of the current directory. So how
does one call ls -l on every file in the list above? Pipe the list of files to
xargs. xargs’s job is to construct argument lists for the provided command.
It does so by splitting up the data it receives on the standard input and using
each chunk as an argument. By default xargs splits the incoming data by
newlines or blanks, which is normally fine but could lead to problems when
find outputs a filename containing whitespace. In that case, be sure to use
man find and man xargs: you can specify a delimiter other than blank or
newline. So far the output should look like this:
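
(The intermediate command and its output are missing; presumably roughly this,
with sizes and names invented:)

$ find . -type f -regex '.*r[0-9][0-9]' -o -name '*.rar' | xargs ls -l
-rw-r--r--  1 mrnugget  staff  2048000 Oct 24 09:38 ./archive.r01
-rw-r--r--  1 mrnugget  staff  2048000 Oct 24 09:38 ./archive.r02
-rw-r--r--  1 mrnugget  staff  2048000 Oct 24 09:38 ./archive.rar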

Great, I thought, now I just need to get all the different filesizes, add them
together and print the total sum! Shouldn’t be too hard, right? Well, it isn’t if
you’ve got awk. awk has too many capabilities to explain in this blog post. So
let me make it short: awk reads its input either from STDIN or from files
passed in as arguments and then performs actions on matching lines. To make it
even shorter: awk is awesome. There is a lot of free information available on
the internet about awk, but a single man awk goes a long way.
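
(The command itself is missing from this copy; based on the explanation that
follows, it was presumably something like:)

$ find . -type f -regex '.*r[0-9][0-9]' -o -name '*.rar' \
    | xargs ls -l \
    | awk '{ sum += $5 } END { print sum }'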

Here awk takes the fifth field (the fields are by default separated by blanks)
and increments the variable sum by it. At the end of the awk program (after
awk has run it over each line) it prints out sum, which gives us the total of
the filesizes. But that’s not really readable, since the output of ls -l
contains filesizes in bytes and I think it’s safe to say that megabytes would be
far more handy in this case. So I had to divide the sum by 1024 to get
kilobytes and then again by 1024 to get megabytes, and I did this with the help
of xargs and bc:
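
(Another lost snippet; judging from the explanation below, it was roughly:)

$ find . -type f -regex '.*r[0-9][0-9]' -o -name '*.rar' \
    | xargs ls -l \
    | awk '{ sum += $5 } END { print sum }' \
    | xargs -I sum echo "sum/1024/1024" \
    | bc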

That looks great! So what happens here? xargs uses a name for the data it
reads from standard input (sum, given via its -I option) and then uses the
echo command to output the calculation that needs to be fed to bc. Without
bc the command above would just output 6144000/1024/1024. bc then takes this
as input and gives us the result. Be sure to do this: man bc. This example here
doesn’t even scratch the surface of what bc is capable of.

So now the job is done, right? The command line above now outputs the total size
of all the RAR-files on the harddrive or in the current directory. Well,
technically yes, it’s done. But as you can see, that was pretty heavy lifting:
nobody will remember the command above, and when first looking at it nobody will
know what exactly it does.

And here’s the kicker: it’s useless. That command above is obsolete. As soon as
I finished hacking up that command line I remembered a tool that I basically use
every day but totally forgot about while hacking together the right find-regex,
looking up awk formulas and how bc works. There is one tool that does
exactly what that long line above does and it’s called du. du is built for
the job. It’s a simple tool that does one thing very well (and I quote the man
page here) and that is to “display disk usage statistics”. I can’t for the life
of me explain how I forgot it. With du in hand, the line shrinks down to this:
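
(The shortened command is missing; following the two steps described below, it
was presumably this, with du’s -c printing a grand total and -h making it
human readable:)

$ find . -type f -regex '.*r[0-9][0-9]' -o -name '*.rar' | xargs du -ch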

That looks a lot better than that monster I hacked together. And it’s easier to
understand too: 1) find all the files matching a certain pattern 2) then pass
them to du to display how much space they take up. After remembering du I
thought: “Well, maybe I can shrink this down even further”. And I did.

When you use find and the -regex option you’re in for a challenge. find
allows you to use many different types of Regular Expressions
and the differences sometimes make it really difficult and frustrating to get
the regex you want to work. Just have a look here.

Most of the time it’s probably easier to use the globbing functionality of
your shell rather than find and a regex. ZSH especially is pretty good at globbing,
and Bash also does its job very well. Since I’m using ZSH I tried to get rid of find and use my shell’s
built-in globbing functionality. And what I came up with is so much better than

That line recursively lists all files ending in either rar or r01 up to
r99 and then passes them over to du. Easy to read, easy to understand
and more importantly: easy to reuse.

There is a lot I learned on this ride and most importantly it was seeing the
Unix philosophy in action:

“This is the Unix philosophy: Write programs that do one thing and do it well.
Write programs to work together. Write programs to handle text streams,
because that is a universal interface.” - Doug McIlroy

All of those programs work very well together, they are combinable, they are
reusable and they do one thing well. And in this case du was that program that
did the one thing I wanted to achieve very well and could be used to replace
another complex “program”, if you want to call that line above a program.

That is not to say that complex command lines are always wrong to use, no. Sometimes
you need that many programs working together in order to get the desired output.
And when that happens it’s great to see how good every tool is at doing its own
job and which problems can be solved by combining them. And Unix pipes make it
dead easy to combine them by offering a clean and easy to understand interface.

Seeing that philosophy in action shows extremely well how much code and programs
can profit from following it.

]]>2012-09-02T22:43:00+00:00http://thorstenball.com/blog/2012/09/02/software-covers-and-programming-licksThe next time you’re trying to think of something you could put into code and
you shrug off ideas because they’ve already been implemented, try this: Write a
cover version of a piece of
software.

Try to imitate software. A project you like, a project you use, a project that
implemented the idea you just shrugged off. Imagine yourself in a software cover
band and try your best to replicate it.

Why? It’s surely a waste of time, I hear you say. And if we’re talking about
paid software development I fully agree. But suppose you have a couple
of hours to spare and you’re eager to write some code: covering a piece
of software proves to be a great way to gain knowledge and become a better
programmer.

I’ve been playing guitar for nearly eight years now and spent a great deal of
that time learning and playing other people’s songs. Creatively speaking this is
not as fulfilling as writing your own material, and coming up with new, fresh
ideas is certainly better than playing The Thrill Is Gone at a bar gig. But by
doing covers you learn new chord sequences, new licks, new
tricks and probably discover something new you haven’t thought of yet. And once
you’ve got a song down and can play it note-perfect, when your muscles remember
how to play a difficult part without struggling, then comes the best part: you
can have fun and play around with it. You can improvise over it, you can change
it, you can disassemble and reassemble it and (this is the most important bit) you
can use those little tricks and ideas and incorporate them in your own material.

I’m not talking about copying here, I’m talking about learning solutions to
problems and applying them when the need arises. Learning to play a Led Zeppelin
song on guitar might sound dull since every guitar player on the planet knows
how to play at least one of them. There is nothing left to prove. But there is
still something left to be learned: when you’ve got the solo of Since I’ve Been
Loving You down to the note, the next time your solo spot comes up and you
want an intense dynamic build-up, try to think of Jimmy Page and how he’d
do it, which tricks he’d use.

You can do this with software and programming too. I played around with
statsd a couple of weeks ago and
thought about implementing a stats collector in Ruby until I found out that it’s
already been done. So I was about to shrug it
off and do something else when I decided to do it anyway. Just for fun. Just to
find out how to do it.

I wouldn’t have learned as much as I did in the last week about buffering,
concurrency, streaming and UNIX sockets if I hadn’t tried it. And the best part?
Whenever I got stuck, I read through the statsd or the batsd code to see how
they solve a certain problem and I learned something new. Surely you learn
something new too when trying to implement a piece of software that hasn’t been
written before. But this is different: you already have solutions to problems to
look at and learn from. And when facing the same problem you can learn about the
problem and the solutions at the same time. This is a great way of learning software
development, since you can’t fully understand and judge a solution if you don’t
fully understand the problem you’re facing.

And once you know and understand different solutions to different problems you
can then apply them in other situations. By trying to write software covers you
get to know the problem and how a particular piece of software solved it. Or, as
in my case with statsd and batsd, you start to understand different
solutions to one problem. Doing this, you can learn a whole lot of new
programming tricks and licks and use them whenever they fit.

]]>2012-07-09T11:35:00+00:00http://thorstenball.com/blog/2012/07/09/vim-learning-resourcesI’ve been a Vim user for a long time but only in the last year did I actually
start to really use it. No, I don’t mean to say that I spent years of my
life with an open editor on my workspace and just stared at it. What I really
mean is that those early years of my Vim usage and knowledge mostly consisted
of h, j, k, l, :wq and :%s/replace/this/g. That and syntax highlighting was
pretty much Vim for me.

One of the things that makes Vim so great for me is that it doesn’t force you to
use or understand something. I was able to get work done with just the basic
knowledge I had. But then came that moment when I stumbled upon a
blog post about Vim’s text objects and my jaw dropped onto my keyboard: “So
that’s how it works?!”

Vim has a steep learning curve and there are bumps on the road where you can get
stuck. And I know a lot of people do get stuck. You may like it there, being
stuck, but keep in mind: sometimes it’s just a little tip that can get you
forward. A small push that gives you an AHA! moment. Thankfully there are a
lot of these available everywhere, though sometimes they are not that
easy to find, which is the reason I’m writing this post.

The following is a list consisting of screencasts, video tutorials, blog posts,
tips and tricks concerning Vim and how to master it. The items on the list are
the ones that got me forward when I was stuck on my way to Vim wizardry (mind
you, I’m not there yet, not even close but I’m wearing the nice hat anyway). I
bet someone can find some use in this.

Video Tutorials

vimcasts.org:
Everybody who’s starting to use Vim nowadays knows about this site and rightfully so.
Vimcasts is a great resource when learning Vim. Drew Neil
teaches you Vim in 35 screencasts of top notch quality. Also, I
hear his book is
great too, as are his workshops.

Derek Wyatt’s Vim videos:
Bookmark this, now! I mean it! This collection of video tutorials is one of the
most hidden gems in the Vim galaxy and I don’t know why. Derek Wyatt has an
immense knowledge of Vim, his videos are fun to watch and packed with Vim
knowledge, explaining topics interesting to beginners and experts alike. Watch
them!

PeepCode - Smash Into Vim:
PeepCode is known for excellent screencasts and the two videos about Vim are no
exception to the rule. If you want to invest some money in your mastery of Vim:
this is the way to go.

Blog Posts

Coming Home To Vim:
This post by Steve Losh pops up on Hacker News from time to time and always gets
lots of positive feedback. It’s one of those posts that made me ‘get’
what text objects are and what ‘speaking to the editor’ is all about. Totally
worth the read, especially if you have the same background as Steve, coming from
a different editor but determined to dig into Vim.

Vim: revisited:
In this post called Vim: revisited Mislav explains how and why he finally
“got” Vim, how his approach to text editing changed through Vim and what to keep
in mind when trying to do the same thing as he did. The ‘Essential
plugins’ section at the end is a good starting point if you feel you’re missing
out on some features when using Vim.

Why, oh WHY, do those #?@! nutheads use vi?:
“Learning an editor, master it, put work into it — are you crazy? Why would
someone do that? We’re talking about a text editor, for god’s sake!” Well,
read this post, learn the answers to your questions and gain some knowledge
concerning vi and Vim.

Vim Text Objects: The Definitive Guide:
If you have used ciw, di”, dat in Vim before but never really found out
what those commands mean and how someone can actually memorize them: read this
post by Jared Carroll on Vim’s text objects.

Stack Overflow

Another way of getting great insight into the usage of Vim is reading
through the most upvoted answers tagged with ‘Vim’
on Stack Overflow. The questions themselves might not sound interesting to you at
first sight, but digging through the answers is worth it most of the time. Check out the
top two of those questions and read the first answers to get a taste of what’s
waiting for you:

What is your most productive shortcut with Vim?:
The first answer to this question is something you should keep in your bookmarks
toolbar: lots and lots of information about Vim and the editing philosophy
behind it. Also: further down on the page are cool mini-screencast-gifs.

Vim Tips

Vim Tips Wiki:
This is one of the all time greats. What was formerly known as the ‘Tips’ section
on vim.org is now a wiki. That means there are
thousands of tips concerning Vim, vimrc files, plugins and everything else
connected with Vim in any way plus comments and updates.

Best of Vim Tips:
This might not be the prettiest of all websites, but if you keep digging through
this one you might find one or two treasures waiting for you. This was
one of the most upvoted tips in the old Vim.org tips section. Probably because
it contains the knowledge of five hundred tips combined.

Tools & Tutorials

Vivify: Vim looks ugly? Well, it
shouldn’t. Check out Vivify. This site lets you preview thousands of Vim
colorschemes without the need to download them.

Vim Cheatsheet for Programmers:
There are a lot of Vim cheatsheets floating around
on the web and I don’t know if there is another one better than this one. All I
know is that this one was pinned to the wall next to my screen for years.

Vim Recipes: A huge collection of
Vim recipes ranging from ‘Quitting Vim’ to ‘Extending Vim with Scripts and
Plugins’. Always worth a visit.

Help Yourself!

Open Vim and type in :help. This may be the most important command in Vim
you’ll come across, it certainly is the most helpful. Especially when looking
for specific information: e.g. :help global. And if you’re just starting out
and don’t know how to type in commands into Vim you should type this command
into your shell: vimtutor

]]>2012-06-20T18:58:00+00:00http://thorstenball.com/blog/2012/06/20/how-i-used-98840-commands-less-and-saved-4-secondsIn my last post I explained how to implement a search autocompletion backend
using Redis. This week I used the described implementation with its add_movie method to put thousands of movies into my Redis database
in order to use them for the autocompletion on
anygood.heroku.com. It’s the same method as described in the last post, but with
one important change I didn’t think about too much: I did not overwrite the score of the members in the sorted
sets. The method I used for adding movies looked like this:

Adding movies took suspiciously long, longer than I thought it would.
Looking at it now it’s pretty easy to tell why: it was exactly that small change
that lowered the speed of my importing process: I cached the member score
in every prefix set and then used it in the REDIS.zadd line. Every time a movie
was added, for each of its prefixes the score of the corresponding sorted set
member was saved and used again. That’s not bad at all, no, it didn’t slip in
there either! Heck, I wrote tests for this. But looking at the code I concluded
that it was not a feature I needed. Using the code above, every movie would have
a different rank in the autocomplete output according to the user input. One
movie could be ranked higher when the user typed in the and ranked lower when
the input was dark. That is pretty cool, but I wanted a different behaviour:
the sorted set members associated with one movie should have one score, the same
one in every set. And after running some benchmarks I found out how much
implementing the wrong behaviour slowed me down.

Testing this was simple: I wrote a small script that allowed me to add ten
thousand movies with the aforementioned method. The rest is just a shell and
time. The script looks like this:

#!/usr/bin/env ruby -wKU

require 'redis'
require 'digest'

REDIS = Redis.new

# We need to generate the prefixes for every movie name
def prefixes_for(string)
  prefixes = []
  words = string.downcase.split(' ')

  words.each do |word|
    (1..word.length).each { |i| prefixes << word[0...i] unless i == 1 }
  end

  prefixes
end

# Adding 10000 movies
(1..10000).each do |i|
  movie_name  = "The Number#{i}"
  hashed_name = Digest::MD5.hexdigest(movie_name)
  prefixes    = prefixes_for(movie_name)

  prefixes.each do |prefix|
    score = REDIS.zscore("testing:redis:index:#{prefix}", movie_name).to_i || 0
    REDIS.zadd("testing:redis:index:#{prefix}", score, movie_name)
  end

  REDIS.hset("testing:redis:data:Number#{i}", movie_name, "This is Number #{i}")
end

It’s a good idea to delete all keys in our instance (using FLUSHALL) before
running this script, and monitoring is always a great idea: Redis’ handy
MONITOR command outputs all the commands our instance receives, complete
with timestamps. So, monitoring our instance in the background, it’s time to see
how often ZSCORE gets called when we add those ten thousand movies:
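
(The MONITOR output, counting 108840 ZSCORE calls, and the changed code are
missing from this copy; judging from the pipelined version further down, the
fix looked roughly like this: look the score up once per movie, via its first
prefix, instead of once per prefix.)

# [...]
score = REDIS.zscore("testing:redis:index:#{prefixes.first}", movie_name).to_i || 0

prefixes.each do |prefix|
  REDIS.zadd("testing:redis:index:#{prefix}", score, movie_name)
end
# [...]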

With this in place, ZSCORE gets called exactly one time for each movie and
not once for every prefix of every movie. Instead of using ZSCORE 108840 times,
after that small change it only gets called ten thousand times. And 108840
minus 10000 is 98840. That’s 98840 commands less!

But let’s not stop here. On my MacBook Air it took roughly 7 seconds to insert
those ten thousand movies using ZSCORE on every prefix. With 98840
commands less it only takes around 4:

Redis and the Pipeline

When sending requests to Redis using pipelining, the server doesn’t wait
for the client (our code) to process its responses; it just accepts requests and
does as it’s told. And the client doesn’t read any responses, it just fires
the next request until all commands in the pipeline are sent.
redis-rb fully supports
pipelining to Redis and using it is pretty straightforward: Just put the part of your code
that’s sending requests to Redis into a REDIS.pipelined block and everything
in the block will be going through the pipeline. That should save us some
seconds, right? Time to make a use of it:

# [...]
score = REDIS.zscore("testing:redis:index:#{prefixes.first}", movie_name).to_i || 0

REDIS.pipelined do
  prefixes.each do |prefix|
    REDIS.zadd("testing:redis:index:#{prefix}", score, movie_name)
  end

  REDIS.hset("testing:redis:data:Number#{i}", movie_name, "This is Number #{i}")
end
# [...]

What we’re doing here is precisely what I just described: for each movie we open
a pipeline to Redis and send all the ZADD commands and finally the HSET command
straight to Redis without blinking an eye. The responses won’t get processed
until the block is finished. With that change our script runs noticeably
faster. But since we all love benchmarking, hard numbers and speed, let’s see
exactly how fast:

4 seconds less! That is more than 50% faster! Wow! This is great, this is
magnificent! Let’s pipeline all the commands!

Not so fast, buddy!

Woah, easy, buddy, hold it right there! Let me tell you something: It would be
great if we could put all our Redis interactions within a REDIS.pipelined
block, but there is one small problem. As I said, the server doesn’t wait for
the client to process its responses and the client waits until the pipeline is
empty. That means you can’t pipeline blocks of code in which you work
with the server’s responses. The code at the beginning of this article did use
the server’s responses, remember? We used ZSCORE for every prefix to get the score
of the member in that particular set, saved it and then updated the member of the
set with exactly that score. That’s not possible when sending our orders through
the pipeline.

I guess an example should clear things up so let’s add a couple of movies with
the following code, resulting in every member of every index set having a score
of 10:
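
(The snippet is missing here; a sketch of what it plausibly did, reusing the
prefixes_for helper from the import script above:)

(1..10).each do |i|
  movie_name = "The Number#{i}"

  prefixes_for(movie_name).each do |prefix|
    REDIS.zadd("testing:redis:index:#{prefix}", 10, movie_name)
  end
end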

That looks fine. Now, if we were to add those movies again (with the keys still in our Redis
instance) and try to cache the scores inside a pipelined block, it won’t
work, since the client doesn’t process the server’s responses until after the
pipelining block is finished. The following change in the code will show when
the response is there and when it isn’t:
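
(Another lost snippet; a sketch of the idea: collect the ZSCORE replies in an
array and inspect it both inside and outside the pipelined block. With the
redis-rb version used back then, the replies show up as nil inside the block.)

scores = []

REDIS.pipelined do
  prefixes.each do |prefix|
    scores << REDIS.zscore("testing:redis:index:#{prefix}", movie_name)
  end
  puts scores.inspect  # => [nil, nil, nil, ...]
end

puts scores.inspect    # the real scores are only here, after the block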

The server’s responses are only there (as the last elements in the scores array)
after the block is finished; inside the block the scores array is filled with
nils: no response has been read yet. Using nearly the same code but
without pipelining we get totally different results:

We can see now that pipelining is a great way to gain speed when processing huge lists
of commands, which is exactly what I was after when adding thousands and
thousands of movies. But when you need to work with the server’s response, say
when getting a value and using that value in another request, the value just
won’t be there inside a pipelining block, only after it has finished executing.
Pipelining is a great thing to know, but only when the requirements are right
and it’s applicable.

That means I used 98840 commands less by adjusting the code for its use case and
saved 4 seconds using the right tool for the job, which is always great, isn’t
it?

]]>2012-06-08T16:42:00+00:00http://thorstenball.com/blog/2012/06/08/search-autocompletion-with-redisI spent the last weekend building a caching system for AnyGood
using Redis to save API responses for a certain
amount of time. Since the next item on my TODO list was autocompletion for my search
form I started googling around looking for solutions involving Redis. I
stumbled upon two great posts examining this particular topic: the first one,
written by Salvatore Sanfilippo, is a great explanation of how to use
Redis for autocompletion and goes into great detail when explaining the algorithm.
The second one,
by Pat Shaughnessy, is a great help in explaining Sanfilippo’s algorithm and comparing
it to Soulmate, “a tool to
help solve the common problem of developing a fast autocomplete feature. It uses
Redis’s sorted sets to build an index of partially completed words and the
corresponding top matching items, and provides a simple sinatra app to query
them.”

So I spent a good amount of time studying those articles and reading through the
source code of Soulmate before I decided to write my own solution. Why? Because
it was a rainy sunday afternoon, playing around with Redis is fun and I wanted
to learn more about it. Plus: I wanted multiple phrase matching and
search result ordering. Soulmate did all that and a bit more, but just setting
it up wouldn’t have the same learning effect. And since I was out to play, I
might as well have fun. So let’s see how to do this…

What it should do

Let’s suppose we want autocompletion for our search form that not only returns a
single value for each proposed item but also comes with some data that we can
present. Also, let’s go one step further and say that the search form
should present the items available to the user in an ordered way, e.g. by
popularity.

So when a user types into the search form, we want to show him all the
possible movies matching his search phrases. That means the first thing we’ve
got to do is dump the data we want to present into Redis. Like Soulmate, we use
a Redis hash here,
where each movie has its own unique key. The key can be anything as long as it’s
unique. If you don’t have unique IDs for your data, you could use MD5 to
generate some. But let’s suppose we do have a unique ID; dumping the data into
Redis is pretty simple:
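
(The snippet is missing in this copy; a sketch matching the description below.
The movie data is invented and the ID 9 is chosen to line up with the The Dark
Knight example later on.)

require 'redis'
require 'json'

REDIS = Redis.new

movie = {name: 'The Dark Knight', year: 2008}
REDIS.hset('moviesearch:data', 9, movie.to_json)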

HSET is the Redis command to save something into a hash. The key for this
hash is moviesearch:data. The keys for the single movies in this hash are the
unique IDs: 1, 2, 3, …. And the data we’re saving are JSON strings, which
are a pretty convenient way of saving objects to Redis. Also, it’s easy enough
in Ruby to convert a Ruby hash to JSON and, after retrieving it from Redis, back
to a hash:
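
(Roughly like this; a sketch, not the original snippet:)

require 'json'

json = {name: 'The Dark Knight'}.to_json # => "{\"name\":\"The Dark Knight\"}"
JSON.parse(json)                         # => {"name"=>"The Dark Knight"}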

Oh and btw, I’m using redis-rb
as the Ruby client to connect to Redis. It not only supports Redis
transactions and pipelining; the best thing about this client is that its
methods have the same names and take the same arguments as the Redis commands
(well, most of the time), which is especially
great when getting to know Redis and looking up commands in the documentation.
So now we’ve got the movies in Redis, how do we find them again?

Prefixes everywhere!

People will most likely try to search for a movie by starting to type its name
into the input field. And we want to show them the matching movies before
they’re even finished typing the whole name, right? That’s why we’re talking
about autocompletion here. That means we need an association between word
prefixes and the movies. If someone were to type in The Dar we want to
show The Dark Knight and The Dark Knight Rises as possible search terms.
Long story short: we need the prefixes of every movie we just dumped into our
moviesearch:data hash. For that I’m using a simple method, which is heavily based
on the one Sanfilippo uses in his example script:
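
(The method is missing here; the import script in the pipelining post above
contains what is almost certainly the same code:)

def prefixes_for(string)
  prefixes = []
  words = string.downcase.split(' ')

  words.each do |word|
    (1..word.length).each { |i| prefixes << word[0...i] unless i == 1 }
  end

  prefixes
end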

We need to generate those prefixes for every movie we want to use in our search
autocompletion. I use a minimum prefix length of two characters here,
because one-character prefixes are a lot of overhead for a search completion
where most people are going to type in more than one character.

(Of course it’s entirely possible to not only use prefixes, but use every range
of characters from any position in the word. Instead of using fi, fis, fish
for Fish, we could use fi, fis, fish, is, ish, sh. But people are more
likely to type the beginning of a word, I guess.)

Save those prefixes to sorted sets!

A Redis sorted set is a list of unique strings that can be ranked by a score. For now, we
will ignore the score and just use it as a set where every entry is unique and
trying to add one with the same name won’t result in a new entry. So let’s
create a sorted set for every prefix. This set will include the key of the
movies in which the prefix occurs and the pattern for those keys is the one
Soulmate uses: moviesearch:index:$PREFIX.

In order to associate the movie The Dark Knight with its prefixes we need
to do the following for every prefix:

ZADD moviesearch:index:dar 0 9

As you might have guessed, the prefix here is dar. The next number in this
command is the score the member of this sorted set will have, but as I said,
ignore this for now and keep in mind that the 9 is the key of our moviesearch:data
hash pointing to The Dark Knight. So we need to associate all prefixes of
every movie name with its key in the moviesearch:data hash. Written in Ruby, a
method doing exactly that would look like this:
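
(Another snippet lost in this copy; a sketch matching the description, with the
method name taken from the pipelining post above and score 0 as in the ZADD
example:)

def add_movie(name, data_key)
  prefixes_for(name).each do |prefix|
    REDIS.zadd("moviesearch:index:#{prefix}", 0, data_key)
  end
end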

That’s pretty simple, isn’t it? The method takes two arguments: the name of the
movie and the key of the moviesearch:data hash pointing to its data. After
using that method for all the movies we added to our data hash, we have a lot of
sorted sets whose members are the keys into our data hash. That means, after
adding The Dark Knight and The Dark Knight Rises, the
moviesearch:index:dark set has two members: 9 and 10. So, what does that
give us?

Let’s imagine a user is visiting our website and typing dar into the input
field of the search form.

We now get all the entries in moviesearch:index:dar, which are the keys of our
moviesearch:data hash. The sorted set with the key moviesearch:index:dar
contains 9 and 10 as members. With those numbers we can now just fetch all
the hash entries with these as key and present them to the user. But
let’s see how that works.

Matching more than one word using Redis’ ZINTERSTORE

If we want to get all the entries for moviesearch:index:dar we use the ZRANGE
command provided by Redis:

ZRANGE moviesearch:index:dar 0 -1

Starting from the first element (0 since Redis starts indexing with 0) and going
to the last (-1) we get the hash keys for movies whose names contain the prefix ‘dar’:

$ redis-cli ZRANGE moviesearch:index:dar 0 -1
1) "9"
2) "10"

And now, let’s fetch all the entries from our moviesearch:data with those
keys:
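
(Presumably along these lines, output abbreviated and invented:)

$ redis-cli HMGET moviesearch:data 9 10
1) "{\"name\":\"The Dark Knight\",\"year\":2008}"
2) "{\"name\":\"The Dark Knight Rises\",\"year\":2012}"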

Instead of using HGET we use HMGET to fetch multiple hash entries. That’s
quite neat! Now we have all the movies containing a word that starts with
‘dar’. And we could present those to the user, who is typing and waiting for
suggestions. Let’s go one step further though:

Let’s suppose a user has typed ki bi into our search form. We now want to
present him Kill Bill, Kill Bill 2 and Kilts for Bill as suggestions, but not
Killer Elite and not King Kong. That means we need to find a movie
containing both those prefixes in its name and not only one of them. And this is
exactly where Redis’ ZINTERSTORE
comes out to play.

ZINTERSTORE creates a new sorted set with a given key, containing all the members
occurring in all of the sets passed to it. Let’s use it to create a temporary set
containing the hash keys of the movies having ‘ki bi’ in their names:
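
(The command is missing here; with ZINTERSTORE’s destination numkeys key
[key ...] syntax it would be:)

ZINTERSTORE moviesearch:index:ki|bi 2 moviesearch:index:ki moviesearch:index:bi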

What happens here? ZINTERSTORE looks up which members are in both
moviesearch:index:ki and moviesearch:index:bi and creates a new sorted set
with the key moviesearch:index:ki|bi containing those members. The pattern for
this key is also from Soulmate: dig through the code as there are lots of great
ideas! Now we can use the ZRANGE command again and see which movies contain
those prefixes:

$ redis-cli ZRANGE 'moviesearch:index:ki|bi' 0 -1
1) "1"
2) "4"
3) "5"

And those are exactly the keys pointing to Kill Bill, Kill Bill 2 and Kilts
For Bill in the moviesearch:data hash! Great! All we have to do now is use
HMGET to get the data for those keys from the hash and present them to the
user.

Score & Popularity

Until now we ignored the score of our sorted sets. But let’s say all our users
are looking up Kill Bill by typing in Ki Bi and hitting enter. Let’s also
assume there are a lot more users looking up Kill Bill than there are people
interested in Kindergarten Cop (as weird as this assumption may sound, bear
with me here). Remember when we associated the movies with the prefixes? We did
this, using 0 as the score:

ZADD moviesearch:index:ki 0 1

Now, every time a person looks up Kill Bill instead of Kindergarten Cop we
can increment the score of the 1 entry (pointing to Kill Bill) in the
moviesearch:index:ki set (and in all the other sets containing 1) using
ZINCRBY. ZINCRBY
increments the score of a given member in a given set by a given value. In order
to sort our results by popularity we could increment the score of a given movie
every time a user looks that movie up. A simple method for doing this would
probably look like this:
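
(The method is lost in this copy; a sketch matching the description below, with
a made-up method name:)

def increment_score(movie_name, data_key)
  prefixes_for(movie_name).each do |prefix|
    REDIS.zincrby("moviesearch:index:#{prefix}", 1, data_key)
  end
end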

We take every prefix occurring in the movie name and increment the score of the
member pointing to the movie’s data in the moviesearch:data hash.

Before we thought about scores of set members we used ZRANGE to get the
members of a given set. Working with scores now, the next time we try to
match a movie with the given prefixes we’ll use
ZREVRANGE to
return the matching hash keys ordered by their respective scores. The following
should then give us Kill Bill at the top, after we’ve incremented the score for
this movie every time a user looked it up.

ZREVRANGE moviesearch:index:ki 0 -1

Using ZREVRANGE, ZINTERSTORE and HMGET combined we can write a Ruby method
to look up movies matching the passed prefixes and order them by score:
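
(The method is missing from this copy; a sketch based on the walkthrough below,
with the helper methods inlined. The 60-second expiry and the method name are
made up; see the MovieMatcher class in Anygood for the real thing.)

require 'json'

def movies_for(prefixes)
  index_keys = prefixes.map { |prefix| "moviesearch:index:#{prefix}" }
  cache_key  = "moviesearch:index:#{prefixes.join('|')}"

  # Build the temporary intersection set and let Redis clean it up later
  REDIS.zinterstore(cache_key, index_keys)
  REDIS.expire(cache_key, 60)

  # Highest score first, then fetch the actual movie data
  data_keys = REDIS.zrevrange(cache_key, 0, -1)
  return [] if data_keys.empty?

  REDIS.hmget('moviesearch:data', *data_keys).map { |json| JSON.parse(json) }
end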

This method takes an array of prefixes as its argument and creates the index keys for
them; then it creates a temporary sorted set using ZINTERSTORE (EXPIRE tells Redis to delete a
given key after the specified time in seconds) containing the data hash keys
pointing to the movies. After that, ZREVRANGE gives us all the members of this
set ordered by their score, and finally HMGET is used to get the data for all
the matching keys from the moviesearch:data hash. There are a couple of helper
methods involved, but it should still be pretty clear what happens. If not, look
at the code of the whole MovieMatcher class I use in Anygood here.

So, everything combined: we use a Redis hash to save our data, sorted sets for
every prefix whose members point at our movies in the hash, and in order to find
movies containing multiple prefixes we use ZINTERSTORE as a cache that points us
to the movies containing all of them. And now we’ve got search autocompletion
presenting ordered results to the user, matching multiple phrases!

If you want to dig deeper, please read through the source code of Soulmate and
study those two articles mentioned in the first paragraph. They both do a great
job at explaining what exactly is going on here and why to use sorted sets and
the other data types as we do.