Hi, I'm Mathias Meyer, nice to meet you!

I always forget what kinds of crazy things you can do with Ruby's blocks and
their parameters, so here's a little write-up on them. I regularly forget things
I've learned (must be an age thing), and I found that not even books on the Ruby
language fully cover all the gory details on block (and method) parameters. So
consider this my personal reference of crazy Ruby block syntax features for future use.

The Basics

In its simplest form, a block parameter is a list of names, to which values
passed into the block are assigned. The following iterates over all elements in
a hash, printing each key and its corresponding value.
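A minimal version of that example (the hash contents are made up):

```ruby
# each hands the block a key and its value; the parameter list
# assigns them to key and value in turn
attributes = { name: "Mathias", city: "Berlin" }
attributes.each do |key, value|
  puts "#{key}: #{value}"
end
```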

Splat Parameters

Notice that crazy syntax for calling the block too, pretty wild. The fun starts
when you combine a splat parameter with one that's fixed.

blk = ->(first, *tail) {puts first}
blk.(1, 2, 3)
# => 1

Why not put another fixed parameter at the end? That'll assign the first element of
the arguments to the variable first, the last element to last, and
everything in between to middle.

blk = ->(first, *middle, last) {puts last}
blk.(1, 2, 3)
# => 3

This can grow to arbitrary complexity, adding more fixed parameters before
and after a splat. With enough fixed parameters around it, middle will just be
an empty array, as the fixed parameters are greedy and steal all the values
they can match.
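A quick sketch of that greedy matching:

```ruby
# with three arguments and three fixed parameters, nothing is
# left over for the splat in the middle
blk = ->(first, second, *middle, last) { p middle }
blk.(1, 2, 3)
# => []
```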

Don't do that, though. But good to know you can. Parameter defaults can also
reference other parameters, though only ones listed before the one you're
assigning to. You can also shorten the example above by quite a bit.

blk = ->(list = [1, 2, 3], sum = list.inject(:+)) {
  puts sum
}

Block-local parameters

To add more character variety, you can declare variables local to the block by
adding a semicolon and another list of parameters. Helps when you want to make
sure variables used in the block don't accidentally overwrite or reference
variables outside the block's scope. Blocks are closures, so they reference
their environment, including variables declared outside the block.

You'll be pleased to hear that there's no craziness you can do with block local
parameters, like assigning defaults.
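A little example; note the semicolon in the parameter list:

```ruby
result = "untouched"
# result after the semicolon is local to the block and shadows
# the outer variable instead of overwriting it
[1, 2, 3].each do |element; result|
  result = element * 2
end
puts result
# => untouched
```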

Ignoring arguments

This one may look familiar to folks knowledgeable in Erlang. You can ignore
specific arguments using _. Combine that with the splat parameter and you can
extract the tail of a list while ignoring the first element. Then you can
recursively iterate through the tail, ignoring the first element.

blk = ->(_, *tail) { blk.(*tail) if tail.size > 0 }

When is this useful? Ruby is not a pattern-matching language, after all. For
instance, imagine an API that expects blocks handed to a method call to take a
certain number of arguments. Ruby gives you a warning if the block's arity doesn't
match the number of arguments it was called with. This way you can silently drop
parameters you're not interested in while still conforming to the API.

Okay, I lied to you, this is actually not an operator of sorts, it's a simple
assignment to a variable called _. It's a neat little trick though to
make it obvious that you're not interested in a certain parameter. Also note
that _ in irb references the value returned by the last expression executed.
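Here's a sketch of the idea, with a made-up notify API that always hands its callback three arguments:

```ruby
# notify stands in for a library method that calls back with
# three arguments; we only care about the middle one
notify = ->(callback) { callback.(:status, "payload", 42) }

# the block takes all three arguments to match the expected arity,
# but makes it obvious it ignores the first and last
notify.(->(_, payload, _) { puts payload })
# => payload
```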

Tuple arguments

This one blew my mind when I found it somewhere in the depths of Rack's source
(or somewhere else I don't remember). Think of a hash where each key points to
an array of things. Wouldn't it be nice if you could extract them all in one go
while iterating over them without having to first iterate over the hash and then
over the embedded arrays?

Turns out, the tuple operator is just what we need for this. This is an example
from a Chef cookbook I built a while back, specifying some thresholds for an
Apache configuration for Monit.
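Boiled down, the cookbook snippet looked something like this (the exact resources and numbers are made up from memory):

```ruby
# each value is an [operator, limit] pair; the nested parameter
# list pulls both out while iterating over the hash
thresholds = { "cpu" => [">", 94], "memory" => [">", 84] }
thresholds.each do |resource, (operator, limit)|
  puts "check #{resource} #{operator} #{limit}%"
end
```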

Notice the definition of (operator, limit). That little beauty nicely extracts
the array with an operator and a percentage in it into two parameters. Here's
another thing that blew my mind: chaining enumerators, collecting values and index
from a hash, for example. Note that hashes are ordered by insertion in Ruby 1.9, so
this is a perfectly valid thing to do.
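Chaining looks like this, with the tuple nested inside the block's parameter list:

```ruby
# each_with_index hands the block a [key, value] pair plus an index;
# the nested parens destructure the pair in one go
{ a: 1, b: 2 }.each_with_index do |(key, value), index|
  puts "#{index}: #{key} => #{value}"
end
```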

In preparation for the talk I got curious about EventMachine's innards. So I thought it'd be nice to share my findings
with you. Node.js kids, pay attention, this concerns you as well. It may be JavaScript, but in the end Node.js works in
a similar fashion, though it builds on libev, which does most of the
plumbing for the different operating system implementations of non-blocking I/O.

Most of the magic happens inside the C++ part of EventMachine, so now's as good a time as any to dig into it and find
out how it works. There'll be code in here, not assembler, but I'll be throwing constants, standard library functions
and TCP networking bits (from C, not from Ruby) at you. There's no magic however, and when in doubt, consult the man
pages. You do know about man pages, right? They're awesome.

while(true): The Event Loop

EventMachine is based on the idea of an event loop, which is basically nothing more than an endless loop. The standard
snippet of code you wrap all your evented code in is this:

EM.run do
# go forth and handle events
end

You can look at the details of what the method does in its full
glory.
Other than initializing some things, it dives down into the C++ layer immediately, and it's where most of the magic
happens from now on.

Three C/C++ extension files are of importance.
ext/rubymain.cpp
is the bridge between Ruby and the C code layer. It uses Ruby's C functions, mostly to convert datatypes for the layers
below. It then calls into code defined in
ext/cmain.cpp,
which in turn bridges the C and the C++ code.

When you call EM.run to start the event loop, it calls down into the C layer to t_run_machine_without_threads, which
is called as run_machine, and which in turn calls
EventMachine_t::Run(),
whose interesting bits are shown below.

All the timers specified through either add_timer or add_periodic_timer are run here. When you add a timer,
EventMachine stores it in a map indexed with the time it's supposed to fire. This makes checking the list for the ones
that should be fired in the current iteration a cheap operation.

_RunTimers()
iterates over the list of timers until it reaches one entry whose key (i.e. the time it's supposed to fire) is higher
than the current time. Easy and efficient.
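In Ruby terms, the idea looks roughly like this (the real thing is a C++ map, so consider this a sketch):

```ruby
now = Time.now
# timers keyed by the time they're supposed to fire
timers = {
  now - 1  => -> { "overdue timer" },
  now + 60 => -> { "later timer" }
}

fired = []
# walk the timers in firing order, stop at the first one
# that's still in the future
timers.sort_by { |fire_at, _| fire_at }.each do |fire_at, callback|
  break if fire_at > now
  fired << callback.call
end
fired # => ["overdue timer"]
```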

On a side note, _RunTimers always returns true, so it's a bit weird that the return value is checked.

Add new descriptors (line 6)

Whenever you open a new server connection, EventMachine adds an object representing the connection and the associated
callbacks to this list. All connections and descriptors created in the last iteration are handled, which basically
includes setting additional options if applicable and adding them to the list of active connections.

On the operating system level a descriptor represents a file handle or a socket connection. When you open a file,
create a connection to another machine or create a server to listen for incoming connections, all of them are
represented by descriptors, which are basically integers pointing into a list maintained by the operating system.

Modify descriptors (line 7)

Modify existing descriptors, if applicable. This only has any effect when you're using epoll, which we'll get to
later.

Run the event (line 9)

Check all open file descriptors for new input. Read whatever's available, run the associated event callbacks. The
heart of the event loop, worth taking a closer look below.

The event loop really is just an endless loop after all.

Open a Socket

When you call EM.connect to open a connection to a remote server, the connection is created immediately, but it
may not finish connecting until later. The resulting connection will have a bunch of properties:

The descriptor is configured to not block on input and output by setting the socket option O_NONBLOCK. This way
reads will immediately return when there's no data instead of waiting for some to arrive, and writes don't necessarily
write all the data they're given. It also means that a call to
connect() to create a new connection returns
before it's fully created.

The Nagle algorithm is disabled to prevent the TCP stack from delaying sending packets by setting TCP_NODELAY on the
socket. The operating system wants to buffer output to send fewer packets. Disabling Nagle causes any writes to be
sent immediately. As EventMachine does internal buffering, it's preferable for the data to be really sent when it's
eventually written to a socket.

Reuse connections in TIME_WAIT state before they're fully removed from the networking stack. TCP keeps connections
around for a while, even after they're closed, to ensure that all data from the other side really, really made it to
your end. Nice and all, but in environments with a high fluctuation of connections, in the range of hundreds to
thousands per second, you'll run out of file descriptors in no time.
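Translated to Ruby, setting up a socket with those properties looks roughly like this (EventMachine does the equivalent in C++):

```ruby
require 'socket'
require 'fcntl'

socket = Socket.new(Socket::AF_INET, Socket::SOCK_STREAM, 0)

# don't block on reads and writes
socket.fcntl(Fcntl::F_SETFL, socket.fcntl(Fcntl::F_GETFL) | Fcntl::O_NONBLOCK)

# disable the Nagle algorithm, writes go out immediately
socket.setsockopt(Socket::IPPROTO_TCP, Socket::TCP_NODELAY, 1)

# reuse addresses still hanging around in TIME_WAIT
socket.setsockopt(Socket::SOL_SOCKET, Socket::SO_REUSEADDR, 1)
```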

Opening a socket is an immediate event, it happens as soon as you create a new connection. Running any callbacks on it
won't happen until the next iteration of the event loop. That's why it's safe to e.g. fire up a new HTTP request and
then attach callbacks to it. Even if that wouldn't be the case, EventMachine's
Deferrables (not to be confused with
EM.defer) ensure that callbacks are fired even after the original event fired, when they're added at a later time.

What is immediately called, though, is the post_init method on the connection object.

Opening a network connection is just one thing you can do with EventMachine, but as it's the one thing you're most
likely to do when you're using it, let's leave it at that.

Don't call us, we'll call you

Working with asynchronous code in EventMachine usually involves callbacks, unless you work with your own connection
class. Libraries like em-http-request rely on deferrables to
communicate with your application. They're fired when an HTTP request finishes or fails. But how does a library keep
track of data that only comes in bit by bit?

The answer is simply buffering. Which brings us to the core of the event loop, checking sockets for input, which is done
from the ominous _RunOnce() method in the code snippet above. EventMachine can utilize three mechanisms to check
descriptors for new input.

select()

The default is using select(), a standard
system call to check a collection of file descriptors for input, by way of Ruby's implementation rb_thread_select(),
which wraps the call to select() with a bunch of code ensuring thread safety.

Using select() pretty much works everywhere, and is perfectly fine up to a certain number of file descriptors. If
you're simply serving an asynchronous web application or API using EventMachine, this may be totally acceptable.

Implementing this way of handling I/O is rather straightforward, if you look at the
implementation. Collect all file
descriptors that may be of interest, feed them into select, read and/or write data when possible.

What makes using select() a bit cumbersome is that you always have to assemble a list of all file descriptors for
every call to _RunOnce(), so EventMachine iterates over all registered descriptors with every loop. After select has
run, it loops over all file descriptors again, checking to see if select marked them as ready for reads and/or writes.

When select() marks a descriptor as ready for read or write operations that means the socket will not block when data
is read from or written to it. In the case of reading that usually means the operating system has some data buffered
somewhere, and it's safe to read that data without having to wait for it to arrive, which in turn would block the call.
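Ruby exposes the same call as IO.select, so the pattern is easy to try with a pipe:

```ruby
reader, writer = IO.pipe
writer.write("hello")

# wait up to 0.1s for any of the given descriptors to become readable;
# returns arrays of descriptors ready for reading and writing
readable, _writable, = IO.select([reader], [], [], 0.1)

# the read won't block, the data is already buffered
data = readable.first.read_nonblock(1024)
puts data
# => hello
```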

Instead of using select(), EventMachine could also use
poll(), which just handles a bit nicer in
general, but is not available in the Ruby VM.

epoll

epoll is Linux' implementation for multiplexing
I/O across a large number of file descriptors.

The basic steps of using epoll are simple:

Set up an epoll instance using
epoll_create, done initially when the
event loop is created. This creates a virtual file descriptor pointing to a data structure that keeps track of all
real file descriptors associated with it in the next step.

You only need to reference this single file descriptor later, so there's no need to collect a list of all file
descriptors, as is the case when select() is used.

Register interest for events on a file descriptor using
epoll_ctl on the epoll instance created
above.

This is used in _AddNewDescriptors and _ModifyDescriptors to register and update EventMachine's file descriptors
with epoll. In fact, both methods only do anything noteworthy when epoll is used. Otherwise they just iterate over a
list of descriptors, pretty much doing nothing with them.

Wait for input with epoll_wait for a
specified duration. You can wait forever, return immediately if nothing happened, or wait for a specific amount of
time.

EventMachine seems to have chosen to return immediately if there's no activity. There's an alternative implementation
calculating the time to wait based on the likelihood of a specific event (e.g. a timer firing) to fire on the next
event loop iteration, but it doesn't seem to ever be used. Seems to be a relic from the time it could also be used as
a C++ library.

epoll events are registered for both reads and writes, with epoll_wait returning the number of file descriptors that
are ready for both events.

Using epoll has a big advantage, aside from being much more efficient than select in general for larger sets of file
descriptors. It spares code using it the burden of constantly iterating over a list of file descriptors. Instead you
just register them once, and then only iterate over the ones affected by the last call to epoll_wait.

So epoll requires a bit more work when you add or modify connections, but is a bit nicer on the eyes when it comes to
actually polling them for I/O availability.

Note that epoll support must be explicitly enabled using EM.epoll.

kqueue

kqueue is the BSD equivalent of epoll, and is available on e.g. FreeBSD and Mac OS X. It works very similarly to epoll.
If you want to know more details, I'd suggest reading the paper on it by Jonathan
Lemon.

You can enable kqueue support using EM.kqueue, which is, just like EM.epoll, a noop on systems that don't support
it. Hopefully future EM versions will use whatever's available on a particular system by default.

Call me already!

All three mechanisms used have one thing in common: when data is read, receive_data is called immediately, which
brings us back to the question of how a connection object collects data coming in.

Whenever data is ready to be consumed from a socket, EventMachine calls EventDescriptor::Read(), which reads a bunch
of data from the socket, in turn calling read()
on the file descriptor, and then immediately executes the callback associated with the descriptor, which usually ends up
calling receive_data with the data that was just read. Note that the callback here refers to something defined on the
C++ level, not yet a Ruby block you'd normally use in an asynchronous programming model.

receive_data is where you will usually either buffer data or run some action immediately. em-http-request feeds the
data coming in directly into an HTTP parser. Whatever you do in here, make it quick, don't process the data for too
long. A common pattern in libraries built on EventMachine is to use a Deferrable object to keep track of a request's
state, firing callbacks when it either succeeded or failed.
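The buffering part of receive_data usually looks something like this sketch (a plain class here, standing in for an EM::Connection subclass):

```ruby
# collects raw chunks until complete lines are available,
# the way a connection class typically buffers partial input
class LineBuffer
  attr_reader :lines

  def initialize
    @buffer = ""
    @lines = []
  end

  # what receive_data would do: append the chunk, then extract
  # whatever complete units are in the buffer
  def receive_data(data)
    @buffer << data
    while line = @buffer.slice!(/\A[^\n]*\n/)
      @lines << line.chomp
    end
  end
end

buffer = LineBuffer.new
buffer.receive_data("GET / HTT")
buffer.receive_data("P/1.1\n")
buffer.lines # => ["GET / HTTP/1.1"]
```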

Which brings me to the golden rule of programming with libraries like EventMachine and Node.js: DON'T BLOCK THE EVENT
LOOP!! Defer whatever work you can to a later run of the loop when it makes sense, or push it to another asynchronous
processing facility, e.g. a message queue like RabbitMQ or Redis' Pub/Sub.

In a similar fashion, whenever you write data to a connection using send_data, it's first buffered, and not actually
sent until the socket is ready for a non-blocking call to
write(). Hence all three implementations check
for both read and write availability of a descriptor.

Fibers vs. Spaghetti

Where do Ruby's Fibers come in here? Callbacks can easily lead to spaghetti code, especially when you have to nest them
to run multiple asynchronous actions in succession.

Fibers can stop execution of a process flow at any time and yield control to some other, controlling entity or another
Fiber. You could, for example, wrap a single HTTP request into a fiber and yield back control when all the callbacks
have been assigned.

In the callbacks you then resume the Fiber again, so that processing flow turns into a synchronous, procedural style
again.

Fiber.yield returns whatever object it was handed in Fiber.resume. Wrap this in a method and boom, there's your
synchronous workflow. Now all you need to do is call get('http://paperplanes.de') and assign the return
value. Many props to Xavier Shay for digging into the Goliath
source to find out how that stuff works; it helped me a lot to understand it.
If you never had a proper use case for Fibers in real life, you do
now.
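Here's a stripped-down sketch of that trick. async_get stands in for any callback-taking EventMachine call, and the event loop is simulated with a plain array:

```ruby
require 'fiber'

callbacks = []
# stand-in for an evented call that fires its callback later
async_get = ->(url, callback) { callbacks << [url, callback] }

# looks synchronous from the caller's point of view
get = ->(url) do
  fiber = Fiber.current
  async_get.(url, ->(response) { fiber.resume(response) })
  Fiber.yield # returns whatever resume is handed
end

responses = []
Fiber.new do
  responses << get.("http://paperplanes.de")
end.resume

# the simulated event loop fires the callbacks,
# resuming the suspended fiber with the response
callbacks.each { |url, callback| callback.("response for #{url}") }
responses # => ["response for http://paperplanes.de"]
```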

em-synchrony is a library doing just that for a lot of existing EventMachine libraries, and
Goliath is an evented web server, wrapping a Rack-style API using Fibers.

Things you should be reading

Here's a bunch of free reading tips for ya. These books are pretty old, but have gone through some revisions and
updates, and they're still the classics when it comes to lower level Unix (network) programming and understanding
TCP/IP, which I consider very important. TCP/IP Illustrated is one of the best books I've read so far, and I consider it
essential knowledge to be aware of what happens under the networking hood.

Also, read the fine man pages. There's a whole bunch of good documentation installed on every Unix-style system, and I
linked to a couple of pages relevant to this post already. Read them.

yield

This concludes today's whirlwind tour through some of EventMachine's internals. There's actually not too much magic
happening under the covers, it's just wrapped into a bit too much code layering for my taste. But you be the judge.

Play with EventMachine and/or Node.js if you haven't already, try to wrap your head around the asynchronous programming
model. But for the love of scaling, don't look at evented and asynchronous I/O as the sole means of scaling, because
it's not.

The method instance\_methods now returns an array of symbols instead of an array of strings. So do all the other methods that return methods, e.g. singleton\_methods, public\_instance\_methods, etc.

There's now a method define\_singleton\_method that will remove the need of using instance\_eval when you want to define a singleton method. This is both true for classes and objects. Though if you're really picky, those actually are the same.
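For example (object and class names made up):

```ruby
matz = Object.new
matz.define_singleton_method(:speak) { "hello" }
matz.speak # => "hello"

# works on classes too, since classes are objects
class Wallet; end
Wallet.define_singleton_method(:empty) { new }
Wallet.empty.class # => Wallet
```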

public\_send is already an oldie, but goldie, but it can't hurt to mention it. It will fail when you're trying to dynamically call a non-public method. Good old send still works as advertised.

Enumerable#grep can work on arrays of symbols. Add that to an array of methods, and you have a way of searching for methods that's still compatible with Ruby 1.8. As a matter of fact, and thanks to David Black for pointing that out, symbols seem to be more string-like in Ruby 1.9, so you can do :symbol.match(/ymbol/).

instance\_exec is a nicer way of calling a block in the context of an object when you need to access variables outside of the block. You can give the parameters you need in the block as parameters to instance\_exec which will in turn hand it to the block.
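A small sketch with a made-up Person class:

```ruby
class Person
  def initialize(name)
    @name = name
  end
end

greeting = "hello"
# the block runs in the object's context, so @name is visible,
# while greeting is handed in as a block parameter
line = Person.new("Matt").instance_exec(greeting) { |g| "#{g}, #{@name}" }
line # => "hello, Matt"
```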

Now, this is a terrible example, I know, but honestly I'm not too sure how useful this is in practice. instance\_exec was also backported to Ruby 1.8.7, if you're up for that kind of thing.

Blocks are now handled very similarly to methods, at least when they're lambdas. Don't ask me why the good old proc method is still in there. You get ArgumentErrors when your argument list doesn't match the list of parameters specified for the block. So checking the arity is probably a good idea when you're working inside a library dealing with blocks handed to you.
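Lambdas are the strict ones:

```ruby
strict = lambda { |a, b| a + b }
strict.call(1, 2) # => 3

begin
  strict.call(1) # wrong number of arguments raises
rescue ArgumentError => e
  puts e.message
end

# plain procs still quietly pad missing arguments with nil
loose = proc { |a, b| [a, b] }
loose.call(1) # => [1, nil]
```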

Don't get me started on the new way parameters are handled in both methods and blocks. You can have optional parameters, splat parameters, and another mandatory parameter afterwards. It's crazy, but true. Ruby will match things from the outside in. To fully understand it, I can only recommend playing around with it.
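For instance:

```ruby
# mandatory parameters at both ends are filled first (outside in),
# then the optional, then whatever's left goes to the splat
blk = ->(first, second = :default, *rest, last) { [first, second, rest, last] }

blk.(1, 2)       # => [1, :default, [], 2]
blk.(1, 2, 3, 4) # => [1, 2, [3], 4]
```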

Fibers! Man, this stuff is neat. It's not really the place to explain everything around them, I've written a long-ish article on them for the RailsWay Magazin to fully understand what they actually do. Play with them, but not with the semi-coroutines version that's the default. require 'fiber' is where it's at. That gives you the full power of this neat little gem.

The Enumerable methods in Hash will now return hashes where appropriate. This is kind of a big deal, because it can break compatibility when you're solely relying on it. When you're talking to code on the outside, it's probably a good idea to still convert any results to an array using Hash#to\_a.

Even though it's supposedly still the same version, there are some differences in the code of WEBrick. It will simply fail on requests with request URIs longer than 1024 characters. That was a bit surprising to me, and since there was no reasonable way around it, I had to patch it to work with SimplerDB.

String now has start\_with? and end\_with?, they're also in Ruby 1.8.7.

In Ruby 1.9.2 there's now Method#parameters, which gives you a neat way to inspect parameters of a method (duh!):
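Something along these lines (the method is made up):

```ruby
def transfer(amount, currency = :eur, *notes, account)
end

# each entry is a [kind, name] pair
p method(:transfer).parameters
# => [[:req, :amount], [:opt, :currency], [:rest, :notes], [:req, :account]]
```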

As much fun as Ruby 1.9 is, having to deal with unmaintained code that is not compatible yet is a real pain in the ass. But still, it's totally worth checking out, and if you have a vanilla project coming up on the horizon, consider doing it with Ruby 1.9.

Next week I'll attend and speak at the RailsWayCon conference in Berlin. You can see me speak at the tutorial "Deploying Rails Applications", together with Jonathan Weiss, and talk about asynchronous processing in Rails applications. The line-up looks pretty good. See you there!

When we were at Scotland on Rails (excellent conference by the way, you should definitely go), sitting in Dave Thomas' keynote about the "Ruby Object Model", funnily enough we ran across a meta-programming wonder in that very session. It has been keeping me busy for a couple of hours, and I'd like to share my revelations. With some of them I'm not completely in the clear, but maybe they'll be a start for discussion. It's more a tale of my adventures through the Ruby object model than a fancy tutorial, so if you're up for that kind of thing, keep going. If not, just keep on reading.

It all started with a simple piece of code which a friend borrowed from the CouchFoo plugin. The code is from a longer file, but the basic gist of the original code looks like this:

The method named_scope is available as a singleton method on the including class. But that's the boring part, line 13 is where the action is. It's obvious what the code does: it fetches the singleton class of the current object, and defines a new singleton method on it using define_method, all that whilst being called on the class. Pretty straightforward so far, the context is pretty clear. But I was wondering, why on earth do you need to get the singleton class? Why not use define_method directly? After all, you're in the class object already. When that didn't work, I tried using instance_eval directly, calling it on the implicit self, since that must be the class, right?

We played around with the statement a little bit, and since it only worked in the way used above, it piqued my curiosity. I tried all different combinations of declaring methods on the class that includes the module, and I also tried to find out where the different methods were defined in the hierarchy of modules, classes and singleton classes. Now, in terms of a normal object you're using in your code, that hierarchy is straightforward, okay not always, but you can always turn to Dave Thomas' rule of going to the right and then up. But here we're talking about classes of classes, and that boggled my mind. It didn't even always help to go down the route Dave Thomas suggests, to ask yourself what self is in the context you're in.

There are different ways to get a singleton method onto the including class, not all of them suitable for dynamically creating methods, but it's still good to know where they go in the hierarchy.

Let me just go ahead and share the piece of code I ended up with to work through the problem:

As you can see, I'm defining the method speak on the including class Matt in five different ways. I turned to my good friend super as one way to find out where methods go into the hierarchy, and which ways of declaring methods overwrite others. The other way is simply to comment out method definitions and see how the resulting code behaves. I've also added some basic tracing to see where the method is called from. You'd think the puts would be enough to follow the trace, but either it was already too late or I'm too stupid; this way it was just easier for me to follow the method chain. It's been fun playing with this, and it still hurts my brain, so let's go through it while the pain's still there.

The principle is simple. The class Matt itself defines a singleton method called speak. Then it goes ahead and includes the module Speech. Don't worry about the two different modules Speech and Speech::Support, they're just for conciseness and could easily be removed; I just wanted to reproduce the code we were originally banging our heads on as closely as possible. By extending the class with the module ClassMethods it puts that same module into the class hierarchy of the class Matt. That means that when the method speak is called, Ruby will first look in the included module for the method. Since it finds it there, it is simply executed. By calling super we can still reference the original method defined on the class, which also means that the method was not overwritten by this code. Simple enough.

When the code reaches can_speak the magic is about to unfold, and we're going through three different kinds of method definitions. Before we look at what happens in those, let's run this code and see what the output is:

Okay, so only three of our five methods were actually called. Not too bad, but it still leaves questions unanswered. Commenting out some of the declarations will show you that all the definitions work on their own. That means that some declarations must overwrite others. Let's try and comment out the code in lines 24 to 29. You can believe me or try it yourself, but the output is still the same. What if we comment out lines 17 to 21:

Aha, we're starting to see some progress. You can play around with all of them, I sure did for a while. Change the order of the definitions, but the gist will be the same in the end, believe me.

Okay, enough playing, let's look at the details. In can_speak we define three different methods, one using class_eval, one using instance_eval on the singleton class, and one using instance_eval directly on the class object. As you can see in the first output, only three methods were being called, the method defined on the module, the one using the singleton class and the last one using the class object. At this point it dawned on me what's happening here.

When the module is inserted into the hierarchy, it's put before the original class (which was Matt, in case you've forgotten), the method speak in that class is not overwritten, but the module method will be found first during a lookup. But since we're defining a method on the singleton class of the class Matt, this one will be found before the module. The hierarchy order is singleton class, modules and then the original class, then the lookup continues through the superclasses. Fair enough, but where does that last method come in? It's the last in the chain which means that it's above the module's method.
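A condensed sketch of that lookup order (pulling the module in with extend for brevity, so all names here are illustrative):

```ruby
module Speech
  def speak
    "module"
  end
end

class Matt
  # defined on the singleton class of Matt
  def self.speak
    "class"
  end

  # puts Speech into the singleton class's ancestor chain
  extend Speech

  # also defined on the singleton class, overwriting self.speak;
  # super goes to the module next in the chain
  class << self
    def speak
      "singleton: " + super
    end
  end
end

Matt.speak # => "singleton: module"
```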

I'm not exactly sure what happens here. I'm guessing that the code is being run directly on the class object, and not on its singleton class. If that's not the case, that would basically mean that Ruby inserts another singleton class between the module and the class, but I don't think that's the case. If you have any pointers to clear things up, please add them as a comment.

The method definition using class << self, on the other hand, definitely works its magic on the singleton class. It overwrites a method that might've been declared before, which includes the method defined using class_eval in line 24. Both work below the level of the module, which means that they must be working on the singleton class of class Matt. This is all based on a mixture of conjecture and output evaluation, so feel free to call bullshit on me, and correct me if I'm wrong.

Now, to get back to the original question, why doesn't just using define_method work? define_method is a tricky fella, it always works on the implicit self, which is the class Matt, even if you use instance_eval or class_eval on the implicit self. The result will be the same in both cases: an instance method for objects of the class Matt. Thanks to Jay Fields for writing about this (two years ago already, but still).

Also, why couldn't we just use instance_eval from within the context of the method can_speak? Did you just read the last paragraph? It answered the question already. But to go into a bit more detail, you need to call instance_eval on an explicit class object or its singleton class, and not within the context of the class itself. It's mind-boggling, but true.

The irony in all this? I banged my head on this for quite a while, but in the end the simple Dave Thomas rule of looking at method lookups applies: go one to the right, and then go up. It's harder to imagine this with just classes, but in the end it's the same, because everything in Ruby is an object, you didn't forget that, did you? You just need to figure out where the methods go when you go up. All this started when he was talking about the Ruby object model, and it ends with the very same. Funny like that.

Please, do yourself a favor and watch Dave Thomas' screencasts on the Ruby Object Model. I tend to avoid meta-programming magic as much as I can, but it's still an excellent way to learn more about Ruby's internal workings. The screencasts sure helped me a lot.

Actually, not so much woes, as general musings. I just finished upgrading a project I've been maintaining for the last 15 months or so to Ruby 1.9, and I thought I'd share some of my experiences with the process. Looking back it wasn't so hard after all, but there were some pitfalls.

General

In general the code base is not too big, and there were only some minor issues that needed to be taken care of, some syntactical changes were required, and that was pretty much it.

The biggest problems you're likely to run across are outdated libraries or gems you're using. A while ago mocha didn't fully support the new mini-test included in Ruby 1.9, but since version 0.9.5 it runs fine. I also had to upgrade Machinist, but these are just minor issues.

The site runs on Passenger, so the most recent version was in order, and it works like a charm.

I have yet to delve into potential encoding issues, since Ruby 1.9 complained about some characters in several strings, but the new encoding header should solve these no problem.

MySQL

The biggest headscratcher, but just for a second. The official MySQL gem has received several updates, and the most recent version (2.8.1) runs just fine on Ruby 1.9, but unfortunately it's not available as a gem for convenient installation using the gem command. Thankfully Makoto Kuwata has stepped up and provides a nice gem on GitHub. Install using gem install kwatch-mysql-ruby -s http://gems.github.com, and you're good to go.

RSpec

The specs didn't run at all for starters, but that was due to the wrong test-unit gem being installed. Make sure you have version 1.2.3 installed, then the specs run no problem. Be sure to use the latest versions of the rspec and rspec-rails gems. RSpec has a small wiki page dedicated to Ruby 1.9.1, so be sure to keep an eye on that.

Cucumber

Cucumber had similar problems: it requires a class that only exists in the test-unit gem, so you definitely need to install that anyway. The features ran almost from the get-go, but there was one problem with date selects and Webrat. When I didn't explicitly select a date, it would hand over weird array constructs to the controller, which in turn resulted in assignment errors from within ActiveRecord. The solution (for now) was to explicitly specify the date I wanted, but I'd much prefer being able to leave the defaults as they are.

NewRelic

The RPM plugin references a method in Ruby's Thread class that is no longer available in Ruby 1.9. I had to manually remove the calls in the plugin to get it to work. All the changes are in lib/new_relic/agent/instrumentation/dispatcher_instrumentation. Look for Thread.critical and remove its usages. I have yet to find out if that in any way affects the plugin, but for now it'll have to do.

And yes, that was it for me. At least on that specific project. To sum up, I spent about two hours fixing the issues, and now, on Ruby 1.9, my full test suite is running about 30 seconds faster than on Ruby 1.8. Totally worth it, if you ask me.

Yehuda Katz recently wrote a post about good old super, probably one of the most underused keywords in Ruby, sadly enough. What can I say, it hit right home. It pretty much nailed what's wrong with alias_method_chain, and pretty much put in words how I felt about it too. It helped to explain why I get a weird feeling in my stomach when I see how plugins like authlogic implement some of their functionality. To sum up: it just feels wrong.

If you don't remember, super is what will call a previously overwritten method in the class chain. The cool thing is that this chain also includes any module that got included in another class or module.

So what does that mean? It means that e.g. ActiveRecord can throw out its current approach of hooking things in using alias_method_chain and make life a lot easier by just using super. Neat, huh?

When I wrote the Capistrano extension to extend it to support parallel execution of arbitrary tasks, I started out re-opening the existing classes and modules of Capistrano, and with aliasing some of the existing methods. That was nice as long as I ran e.g. my tests from inside the Capistrano source. When I wanted to move it into a separate project, things got ugly, depending on the order in which my and Capistrano's source files were loaded. They just overwrote each other's methods. No surprise here, but it made me rethink the strategy.

I ended up moving the extension for each class into a separate module, using a different namespace called Extensions, and finally just included the modules in Capistrano's class Configuration, where all the magic happens. Where I referenced overwritten methods I just used super. The code in question now looks like this:
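The code in question didn't survive into this copy, so here's a self-contained sketch of the pattern instead (module and method names are made up, not Capistrano's actual internals):

```ruby
module Base
  # Stands in for one of the modules Capistrano's Configuration is built from.
  def execute(task)
    "executing #{task}"
  end
end

module Extensions
  module Parallel
    def execute(task)
      @parallel = true   # hypothetical extension behavior
      super              # reaches Base#execute through the ancestor chain
    end
  end
end

class Configuration      # stands in for Capistrano's Configuration
  include Base
  include Extensions::Parallel  # included later, so it sits in front of Base
end

puts Configuration.new.execute("deploy")  # => executing deploy
```

Because the module included last comes first in the lookup chain, its execute wins, and super falls through to the original implementation, no aliasing required.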

In my opinion a lot nicer to read than the previous version using alias_method_chain. But that's just me. Some libraries prefer to go overboard instead of using what's obvious. I would mention inherited_resources again, but that would be two times in a row.

It's weird that after some time you pretty much rediscover what is still so natural in other object-oriented languages. Using aliases is cool, but I prefer to avoid them, especially when the other option is a simple inheritance mechanism. After all, Ruby is an object-oriented language.

One of the cool things about Ruby is the possibility to make your method's intent more expressive using the question mark or the bang. There's no need to argue about the use of question mark methods; their intent is to ask something, whether it's the status of an object or if the current weather is suitable for getting out your longboard. Their result will usually be true or false, including nil or not nil.

The other punctuation mark method on the other hand, the bang method, has led to some confusion as to what its intent really should be. I'm guilty as charged here; for a long time I was confused about what using a bang at the end of a method name really means. I guess I should thank David Black for making me (and everyone else) aware of what the difference between a method with a bang and a method without one really is.

And here is where the confusion already starts: there being a difference implies that there need to be two different methods. Like in Rails, you have save and save!, create and create!, and so on. They usually differ in that the bang version will raise an error, while the normal version returns a value telling the caller whether the call succeeded or failed.

A weird notion arose from that, and I have found it in lots of projects. The notion is that when a method calls save it changes the object, and therefore can have a bang at the end, because it's doing something potentially dangerous. Hold on, saving an object is dangerous? If you're thinking about it this way, you might as well start banging your methods (pun intended) throughout your project.

A simple example:

def publish!
self.published_at = Time.now
save
end

Now, to use the method in your code, we could have something like this:

if !publish!
# ...do whatever you do in this case
end

I don't know about you but that just looks confusing to me. You're abusing a method whose intent is to signal that you're doing something potentially dangerous to simply make it "obvious" that your method also saves the object. If you're going down this road, then why not write the name using all uppercase?

def PUBLISH!
self.published_at = Time.now
save
end

There, that'll show 'em. If you really want, you can use define_method and give the method a name like "PUBLISH!!!".

The Rails extension inherited_resources pushes this a little bit too far, and thank goodness you don't have to use the following way of implementing your RESTful actions:
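The snippet being criticized isn't reproduced in this copy, but the underlying trick looks roughly like this self-contained sketch (class names are made up, and this mirrors the aliasing pattern rather than the gem's real code):

```ruby
class BaseController
  def destroy
    "destroyed"
  end
end

class ProjectsController < BaseController
  # Bind the bang name to the superclass implementation before overriding it.
  alias_method :destroy!, :destroy

  def destroy
    destroy!   # reads like a DSL, but it's really just super in disguise
  end
end
```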

Here destroy! is an alias for the method destroy in the superclass. The reasoning is that calling super is not readable, and using destroy! gives it a more DSL-like look. I just find this style of using bang methods extremely confusing, and the intention is far from clear. You'd expect destroy! to do something "dangerous", but it's just an ugly way to call the destroy method in the superclass. But super is a totally different story, and material for another blog post.

What you should be doing instead is something along the lines of this:

A bang method should exist together with a non-banged version; the point is to have a dangerous and a non-dangerous variant of the same method. Whatever dangerous means depends on the context of your method, but you get the idea.

No need for an if: when someone fancies the bang version, they can just go ahead and use it anywhere. This is how several state machine plugins implement their state-changing methods, heck, this is how the Ruby standard library uses it, and this is how you should build your own methods. The bang is not a way to just express that the call will change something in your object; that's what methods on objects usually do, big surprise.

I recommend reading over David Black's post on the issue, it sure gave me a clearer picture. I've written no new bang method since then, because if you think about it, you don't have the case very often where you actually need two versions of the same method. In a library sure, but in your application? Meh. Using non-banged methods in my opinion makes code a lot clearer, especially when you accept the notion that the bang method should only exist in the context of a non-banged version.

The guys over at Pivotal Labs wrote a small piece on a neat tool called XRay. It hooks into your Ruby code to provide Java-like signal handlers which dump the current stack trace into whatever log file seems fit. Using Passenger that'll be your Apache's error log file.

Say what you want about Java and its enterprisey bloatedness, but some features of it come in quite handy, especially when they allow looking into your processes without immediately having to turn to tools like NewRelic or FiveRuns.

Just add the following line to your application.

require "xray/thread_dump_signal_handler"

From then on you can use kill -QUIT <pid> to get a trace dump in your log file. Neat! The website says you need to patch Ruby, but for me it worked with the Ruby that comes with Leopard, and Ruby Enterprise Edition.

Something I see in lots of projects is an overuse of self. Sure, it looks a lot nicer than this, but its overuse can clutter code quite easily. Here's a rather simple example.

def published?
!self.deleted_at? && self.published_at?
end

Others use self even for just calling a method from another method. Why's that again? I just don't get it, it feels unnecessary, and is just five characters too much for every usage.

Sometimes I think that the programmer who wrote the code either doesn't understand the concept of the current scope of self and when using it is necessary, or is coming from Python (no offense Python, I still like you, but I'm not so fond of your usage of self). Let's remedy that, shall we?

Some also claim that it improves clarity, for the courtesy of other programmers working on the code, and to make it clearer in what context the method is called. If you need to make that clear, your method is simply too long.

The biggest confusion surrounding the usage of self from within a method stems from the different handling of local variables and method lookup. Consider the following code:

def publish
self.published_at = Time.now
save
end

It's a totally valid piece of code, and one I can totally get on board with, because it does what it's supposed to do. In my early days with Ruby, I used to write the code like this:

def publish
published_at = Time.now
save
end

Then I wondered why the instance variable published_at wasn't saved. The answer is simple. When you have something that looks like local variable, and you assign to it, Ruby will obey and create a new local variable called published_at, no matter if you have a write accessor defined that looks exactly like it. It will go out of scope as soon as the program's flow leaves the method.

But what about save? Ruby will first look for a local variable. Now, I'm not arguing that you shouldn't have a local variable called save anywhere in your program, and if you do you might want to rethink that. But since, for now, there is no local variable with that name, Ruby will turn to its method lookup mechanism to resolve the identifier in question. If it can't find anything you'll get the much loved error NoMethodError.

If clarity is what you're longing for, learn Ruby's rules of resolving local variables and methods. Will make your life much easier.

So what is Ruby doing in the former version of the method body? You're pretty much forcing it to go directly to the method lookup. With the accessor magic it will find a method published_at=(published_at) and call it. Easy.

Some take this even further and call every method on an explicit self: self.published_at = Time.now, then self.save. What's up with that? Pretty self-involved if you ask me. It's like using self just for the sake of it: you need it for the attribute assignment, so you might as well use it everywhere in the method for consistency, right?

Now, imagine that piece of code also having a call to a private or protected method in it. Of course you can't call those directly on an object, only on an implicit self:
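The example didn't survive into this copy; a reconstruction of the gist (class and method names are made up), where the explicit-self style forces you into send for the private call:

```ruby
class Post
  attr_accessor :published_at

  def save
    true  # stand-in for ActiveRecord's save
  end

  def publish
    self.published_at = Time.now
    self.save
    self.send(:notify_subscribers)  # private method, so explicit self needs send
  end

  private

  def notify_subscribers
    @notified = true
  end
end
```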

Gross! The code looks more and more confusing, and I don't appreciate confusing-looking code.

Of course, if you're using ActiveRecord, why not save a full line?

def publish
update_attribute(:published_at, Time.now)
end

So let's review the initial example. We're only doing method calls, and Ruby will figure out our intention all by itself. So how about the simple version:

def published?
!deleted_at? && published_at?
end

Wow, so simple. Much easier on the eyes, and the intention is clear right from the start. My rule is simple: when assigning to an instance variable, use self; calling a method, on the other hand, should stand all by itself. Now you could argue that assigning to an instance variable using its accessor is also a method call, but if you really want to argue about that, you should read this blog entry again.

After being annoyed with running multiple versions of Ruby just by using MacPorts I finally gave in and tried out rvm, the Ruby Version Manager. That stuff got even more annoying when I tried to make Bundler behave well with multiple Ruby versions, because it just doesn't by default. It's not really a problem with normal gems, but Bundler falls apart with its defaults when you're trying to run gems with native extensions. Hint: Set bundle_path to include RUBY_VERSION and make some links from one cache directory to another to not have every gem cached for every Ruby version.

The promise of being able to easily switch between different versions and still having just one ruby binary and not one called ruby1.9 with MacPorts is just neat. While installing them is straight forward, using them from e.g. TextMate is not great. The common solutions of just launching it from the command line or modifying the TextMate Ruby bundle (these changes will have to be made again with the next TextMate update) are not fully acceptable for me, because it still doesn't allow me to switch Ruby versions while TextMate is running. That's one "flaw" rvm has, at least for me. It switches the paths for the Ruby versions for the current shell, it doesn't offer anything to set links in ~/.rvm/bin to the currently active Ruby version, at least as far as I know. No big deal, if it's by design I can live with that, I do think it'd be a nice addition though.

Anyway, I wanted to switch Ruby versions from my shell and have it affect the version I'm using to run my tests from TextMate too. The way to go seems to be rvm <version> --default which will set the default for all other shells. Be aware that it will do what it says, but I could live with that. It's more important to me to be able to make that switch than just having several shells with different versions in each. First step was to shorten that command, because let's face it, that's a lot of text. I added a function to my .zshrc. It should work just as well with bash, but really, you're still using bash?

rvmd() { rvm use $1 --default; }

Now you can just rvmd 1.9.1 in your shell prompt and be done with it. Much better.

The other part was telling TextMate what Ruby binary to use. The problem outlined above made that a bit of a pain, so I broke out my shell scripting fu and cranked out this amazing wrapper script, using what rvm already dumps in your rc files:

Impressive, eh? It just sources the rvm script and then calls the ruby binary that is currently set as default. Make it executable and set a shell variable in TextMate called TM_RUBY and make it point to that script, and you're good to go.

I wanted to play the field of continuous integration tools a little bit more, so I finally gave Integrity a go. Its promise of being lightweight sure was tempting.

It's a small Sinatra app, and therefore should've been easy to set up using Passenger. The Integrity team recommends using nginx and Thin. Though I'm rather fond of nginx, I don't see any point using a different setup just for my CI system.

Getting it up and running is rather straight-forward. You create your local Integrity template directory using integrity install /path. For this to work with Passenger you also need a directory public in that directory, so if you create that you can just throw the usual Passenger virtual host block into your Apache configuration, the document root pointing to the freshly created public directory, and you're good to go. In the current master, there's already a fix for this issue, and running integrity install will create the public directory for you.
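Assuming a standard Passenger setup, the virtual host block amounts to something like this (hostname and path are placeholders):

```apache
<VirtualHost *:80>
  ServerName ci.example.com
  # Passenger detects the Sinatra app from the directory above public
  DocumentRoot /var/www/integrity/public
</VirtualHost>
```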

I have some gripes with Integrity though, one of them being that configuring notifiers for projects currently seems to be broken. It's sort of a big deal to me, because continuous integration lives from the team receiving notifications.

But otherwise, do give it a go, it's pretty slim and pretty slick too, though it doesn't yet have asynchronous building. It needs some sort of hook, e.g. a GitHub one, to run automatically. There's also a demo available for your viewing pleasure.

Update: The issue with notification configuration not being saved seems to be resolved in the current master. It's not perfectly fixed, but at least now I can have Integrity notify me through Twitter. So if you need to, fetch the master, and build your own gem. Remember to rename it to 'foca-integrity' in the Rakefile's Jeweler configuration, otherwise it won't really catch on.

For a new project I wanted to try some new things, the moment just seemed right, so let me just give you a quick round-up.

Machinist - Now, I liked factory_girl, but after looking into Machinist, factory_girl still seemed too tedious. It took me a while to replace the fixtures with Machinist, but it was totally worth it.

resource_controller - Way to DRY out your controllers. It abstracts away a lot of the tasks you repeat in RESTful controllers, but in a way that doesn't feel like it's totally out of your hand. Just the right amount of abstraction.

Cucumber - When I first saw the PeepCode on RSpec user stories I was a little bummed, but that was mainly because the PeepCode itself didn't really show the power of stories for integration tests. Quite the opposite, it used the stories to directly work with the model, and to test validations. Not really what I fancied, I already had a tool for that.

But Cucumber, where have you been all my life? I started working with it today, and just after a few hours it already felt so natural to put the things you expect from your application on the user level into sentences, and to write or reuse the according steps. If you haven't already, do give it a go. It's been the missing tool for integration testing in my toolbox, and I'm in love with it already.

It integrates nicely with a lot of things, for me right now, Webrat is sufficient, but if you fancy it, use Selenium, Celerity, Mechanize or whatnot.

In other news, I gave acts_as_solr a new home, it's not fancy yet, but at least there's an up-to-date RDoc available.

Personally, I'm a big fan of Webistrano, a neat web app that sits on top of Capistrano, adds some nice features, and generally makes the deployment process a little bit easier.

But I didn't want to have to be in the web application all the time to monitor the deployment's progress. Plus, I wanted to have a project to play around with RubyCocoa. Webistrano comes with an XML-based REST-API. So why not throw all these together, and build a nice application around it?

I originally started working on it last April (I think), using plain Ruby APIs, but quickly discovered that they just don't match with Cocoa's view of the world, especially when it comes to asynchronous things. It's still not a perfect match, there are some glitches, but the current state works out pretty well.

Enough blabber, in the spirit of Getting Things Done, I'm officially announcing the first public release of Macistrano. It allows you to run and monitor deployments from the comfort of your desktop. That's pretty much all it does, but the goal is to make it do that perfectly of course.

If you're using Webistrano (which you should, go and install it asap), give Macistrano a whirl, and let me know how you like it. Head over to the project's page for more information and download. Check the GitHub project page if you're interested in the source code.

It's no secret that I totally dig Webistrano. It's superior to just using Capistrano in so many ways. Although I'm still working on Macistrano (it's bound to be released soon as well, I promise), some people I told Webistrano about asked me if they still could use a simple cap deploy to fire off a deployment.

Obviously I told them "No", but I thought about the problem, and it turns out it shouldn't be that hard. Basically you need to hook into Capistrano and hijack all the defined tasks and callbacks. Turns out that this is not too hard, so I gave it a shot, and the result is a simple gem called cap-ext-webistrano.

It's a drop-in replacement which hooks into Capistrano and replaces all tasks with one that sends the task to Webistrano. Neat, huh? So if you want to take advantage of Webistrano, but are still attached to your command line, this is the tool for you.

You'll still need Capistrano's default configuration files, and you'll have to add two lines to your Capfile. You can also use Capistrano's way of setting variables to configure the plugin. Patch your deploy.rb file according to the example below, and you're all set.
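The example itself is missing from this copy. From the description, the shape is roughly this; the require reflects the gem's name, but the setting names below are illustrative placeholders, so check the gem's README for the real ones:

```ruby
# Capfile
require 'rubygems'
require 'cap_ext_webistrano'

# deploy.rb -- these variable names are placeholders, not the gem's documented API
set :webistrano_host,     "https://webistrano.example.com"
set :webistrano_user,     "deploy"
set :webistrano_password, "secret"
```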

To get rid of a weird RSpec mock error, I looked for a solution to dynamically add and remove methods on each spec run, due to some odd ends in the current RSpec edge version. Sounds weird, I know, but what are you gonna do. I went for a different solution in the end, but still this was good to know.

You need to get the class' singleton class to remove the method again. Everything else will fail miserably. But this works like a charm:
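In code, that looks something like this (class and method names are made up):

```ruby
class Widget; end

# Dynamically add a method to the class object itself.
def Widget.helper
  "help"
end

# Widget.send(:remove_method, :helper) would raise a NameError, since
# helper isn't an instance method of Widget. It lives on the singleton class:
singleton = class << Widget; self; end
singleton.send(:remove_method, :helper)
```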

RailsConf Europe 2008 is over (and has been for a few days now, I know), so it's time for a recap. In all it was much better than I expected, last year's conference was a bit of a disappointment, so my expectations were low enough to be positively surprised.

Bratwurst on Rails was, if not a huge success, a great event nonetheless. Probably because of the rain not as many people showed up. There were around 200 guests, but I sure hope they had a good time. In the end most of the cupcakes were gone, and all the bratwursts. So it's all good.

Day 1 started with the tutorial sessions, including one I hosted with Jonathan Weiss on deploying and monitoring Rails applications. Over the course of four hours we gave an introduction on deployment setups, doing the dirty-work with Capistrano, and doing basic monitoring. The slides are up on Slideshare, enjoy!

The day ended with a panel consisting of David Heinemeier Hansson, Jeremy Kemper and Michael Koziarski. They talked pretty much with themselves for the lack of questions from the audience. Someone came up with the question (probably for the millionth time), if Rails is gonna switch to RSpec for testing.

RejectConf sure was a highlight in itself. Good fun, entertaining people and drinks in a pleasant environment. What more could you ask for?

Second day started off with David's keynote, and I gotta say, it was most excellent. He talked about legacy code; still working on the oldest piece of Rails software around, he knows his fair share about it. Lots of programmers coming to Rails tend to forget that eventually every new project they work on will turn into legacy code, and needs to be maintained. So David's talk took the more inspirational approach, compared to his previous keynotes.

Quick summing up of the day's sessions:

Jon Dahl's talk gave a good introduction on MapReduce, not specific to a certain framework, but to how MapReduce works.

Yehuda Katz's talk on jQuery sure filled up the seats quickly. And it actually was a pretty good introduction. Favorite quote (more or less accurately): "Valid markup is when every browser can display it, not when the validator says it's valid."

Intellectual Scalability presented an approach on how to scale applications by separating them into different micro-applications running on different servers.

Wilson Bilkovich's talk on Rubinius gave some updates on the state of Rubinius, but was probably more interesting for people interested in building compilers (LLVM, if you must know).

Jonathan's talk on Security on Rails attracted a big crowd too, and rightly so. Hadn't seen it before, so I can safely say I learned some things as well.

The day ended with Jeremy Kemper's keynote, though I think that it'd have fit better into a normal session. It was a good talk on performance of Rails applications, but it wasn't exactly keynote material.

I attended the BoF sessions on MagLev and Merb, and both were pretty good. One thing I didn't understand is the heat some people gave the guy from GemStone. MagLev is still in its early stages, but I'm so looking forward to giving it a spin once it's ready for the public.

On to day two

It started off with Matt Wood's talk on Genomes on Rails. He's working on an impressive project, involving an even more impressive amount of data.

Jay Fields talked about lessons learned from functional testing. Pretty good talk. I can't say it was all news to me, but still a good inspiration for people not yet into testing.

Justin Gehtland's talk on Small Things, Loosely Joined and Written Fast sure was one of the best of show. He's an excellent presenter, and introduced using ActiveMessaging and RubyCAS (welcome to my list of things to try out) to keep things loosely coupled.

For lack of anything more interesting I attended the talk on "Treading Rails with Ruby Shoes". Let's just say it was a different approach on presenting. And that's that.

Tammo Freese flashed the crowd with some serious Ruby meta magic afterwards. Tough stuff, but it still matters to people writing and/or using plugins.

I finished off the day with Adam Keys' talk on handling complexity in Rails applications. While nothing new to me (I'd used the things he mentioned in several projects already) it gave a pretty good overview on how to handle some of the complexity in Rails applications.

In all it was a pretty good conference, I met a lot of nice people and had a pretty good time. Sadly it won't be in Berlin next year, but let some other European city be the center of the Europe Rails community for a change.

"In fact, stop worrying so much about other people. Every time I've
worked on a project I thought other people would really love, it was a
massive flop. Every time I've worked on a project I loved, it worked.
If you're sitting in this room, your taste is not as far off from
those around you as you'd think. Build something you love and others
will love it, too. (Not everyone, of course.)"

That's what it's been. And who needs to work when there's so many nice projects to work on, eh? Well actually, I did work, but in my free time I also worked on some other things, a new one, and existing projects.

I started wrapping up Macistrano for a first release. It's looking good so far, but I still need some new icons, so that it won't look like you have two instances of CCMenu running in your menu bar. If you're wondering what it is: It's a small desktop frontend written in RubyCocoa (so yes, it's Mac only) to monitor deployments running on Webistrano. Webistrano in turn is an enhanced web frontend for Capistrano. So as you can see, there's lots of meta involved here. I basically built a tool to access a tool that runs a tool. Neat stuff, eh? If your deployment is finished in Webistrano, Macistrano will notify you through the convenience of your desktop. You can also fire off deployments with it.

Speaking of Webistrano, I had good fun working on it too. Mainly some stuff that I wanted to see in it, like code preview for recipes, syntax check, versioning. But something that really scratched my itch was import for existing Capfiles, so I just had to try implementing it. As Jonathan will no doubt confirm, it was one of the first questions that pop up when you tell people about Webistrano: Can it import my Capfile? Fret no more, it might just do that in the near future. Nice experience, because you definitely have to have a look at the Capistrano source to find out how it actually stores the configuration internally.

Then there's ActiveMessaging, a small library I have a certain knack for. I wanted to see support for storing messages in case the message broker is down. JMS has something similar, so why can't ActiveMessaging? I built the initial code in February, but didn't get around to actually finishing it until recently. What it does is save your message through ActiveRecord if there's an error that indicates your broker is down. That way your application most likely won't be harmed if your messaging infrastructure has a problem. To recover them you can simply run a daemon that will regularly poll the storage and try to deliver the messages. The code didn't make it to the trunk of ActiveMessaging yet, but you can have a go at it on GitHub.

I also read "The Ruby Programming Language", a highly recommended book. A detailed review will follow. But first I'm off to my well-deserved honeymoon.

After more than two days of removing deprecation warnings, adding plugins, fixing some custom additions, going through the whole application, it's finally done. We're running Rails 2.0. Nothing more gratifying than seeing this, well except for the application running without problems:

There were some minor annoyances, but in all it was straight-forward work. One thing was that acts_as_ferret 0.4.0 does not work with Rails 2.0.2, but the upgrade to 0.4.3 doesn't go without any pain either. In 0.4.1 index versioning was introduced, which will happily start indexing your data when you first access the index.

Static typing considered harmful. While the post itself is spot on, it was more the quote from Stuart Halloway that caught my eye: "In 5 years, we'll view compilation as the weakest form of unit testing." I have nothing to add.

Several of my friends are picking up Ruby these days. Just like me, they're coming mostly from the Java world. Good thing about that is that they're asking me questions about Ruby. Always a great opportunity to dig more into the language, and to write down some tidbits that came up.

When you're coming from Java you're used to private being private. When you declare a method as being such it's off limits as soon as you leave the class' scope. No way to reach it from subclasses or, god forbid, call it from another object. Well, you could use that really awkward reflection code and really ram it in, and eventually call it, but that's just tedious.

Different story in Ruby. Using the keyword private is more of a marker than a mechanism to enforce access restrictions to an object's methods. You can even call private methods from subclasses, no problem. There's just one caveat: a private method can't be called on an object. It must be a lonely method call. No self for you today.
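A quick demonstration (class names are made up):

```ruby
class Restricted
  private

  def secret_method
    "secret"
  end
end

class Unrestricted < Restricted
  def public_knowledge
    secret_method         # fine: implicit receiver, even across the subclass boundary
  end

  def gossip(other)
    other.secret_method   # NoMethodError: private methods refuse an explicit receiver
  end
end

puts Unrestricted.new.public_knowledge  # => secret
```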

Huh? Did we just call a private method from a subclass? We sure did. Again, private restricts you in a way that you can only call the method from within the context of a class (subclass or not), but without calling it on an object. Of course there's a different way, you can just un-private the method, but that's just mean.

class Freedom < Restricted
public :secret_method
def public_domain
self.secret_method # free as a bird
end
end

In a world where you can change almost everything about a class, private is merely something to remind a developer using a class that calling this method out of context or from outside the class is probably not the way to go. But if he still wants to go ahead, it's his responsibility, including all the risks, possible internal changes in a future version and the like.

The title totally overemphasizes the topic, but here we go. By default ActiveMessaging will process all your queues in one thread. All messages from all queues will be processed sequentially. This isn't always what you want, especially in scenarios where you have both long-running tasks being kicked off through messages, and rather short-lived tasks that you just fire and forget.

ActiveMessaging has a rather simple and not-so-obvious way of dealing with that: processor groups. There's some documentation on them, but it doesn't bring out the real beauty of them.

Basically you split your processors in groups, how finely grained is up to you. A simple way would be to just separate long-running from short-lived tasks. You just have to define these in config/messaging.rb:
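Reconstructed from memory, such a config might look roughly like this. The queue and processor names are invented, and the exact directives may differ between ActiveMessaging versions:

```ruby
# config/messaging.rb (sketch, names made up)
ActiveMessaging::Gateway.define do |s|
  s.destination :long_tasks,  '/queue/long_tasks'
  s.destination :short_tasks, '/queue/short_tasks'

  # one group for the long-running stuff, one for fire-and-forget tasks
  s.processor_group :long_running, :long_task_processor
  s.processor_group :fire_and_forget, :short_task_processor
end
```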

Now that you have these, how do you get them to run in different threads? If you just use script/poller start, it will continue to work through all messages from all queues. You need to start each processor group individually:
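If memory serves, you hand the group name to the poller like so; the group names are assumed from your config, and the extra -- separates the daemon options from ActiveMessaging's own:

```shell
script/poller start -- process-group=long_running
script/poller start -- process-group=fire_and_forget
```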

Keep in mind though that you can't stop just one poller for one particular processor group. Running script/poller stop will tear them all down. Which comes in handy during deployments: you only have to ensure that all your processor groups are started during the deployment, and don't have to worry about stopping every one of them.

ActiveMessaging will run each group in a separate thread, all of which are monitored by the poller_monitor. The latter will only be started once, courtesy of the daemons package.

I've been playing around with ActiveMessaging recently. Well, actually more than that. I integrated it into a project for asynchronous processing. It's a pretty neat plugin. We're using StompServer as a message broker, and therefore the Stomp protocol to publish and poll the messages.

Now Stomp is a pretty simple protocol and breaks down when you're trying to deliver "complex" data structures like hashes, arrays or *gasp* objects. That's not a bad thing per se, since we can serialize them with YAML. Of course you could just always do that by hand before publishing a message, but let's face it, that's just tedious.

The author of ActiveMessaging recently added support for filters. They can be run after publishing a message and/or before processing it on the polling side. I hear it clicking on your end, why not use filters to do the serializing work for us? Right on!
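The core of such a filter pair is nothing more than a YAML round-trip. The filter wiring itself depends on ActiveMessaging's filter API, but the payload handling boils down to this:

```ruby
require 'yaml'

# What a publish-side filter would do to the outgoing message body...
payload = { 'order_id' => 23, 'items' => ['apples', 'pears'] }
message = YAML.dump(payload)

# ...and what the processing-side filter would undo before the
# processor ever sees the message
restored = YAML.load(message)
restored == payload # => true
```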

There's a lot of complaining, especially from people coming to Ruby from the Java world, about the lack of a language specification. And while a lot of effort is put into the RubySpec project to at least have a test-driven specification, the written word has been silently ignored for a long time. At least in terms of information technology.

There used to be a book called "Ruby in a Nutshell", written by Matz himself, but it mainly dealt with Ruby 1.6, and is therefore seriously outdated.

David Flanagan (author of several great books on JavaScript and Java) set out to fix that problem. With the help of Matz he dove deep into Ruby and wrote what I can only describe as its only valid written language specification. The result is "The Ruby Programming Language".

The book doesn't take any unusual path when it comes to its structure. It deals with the basic structure of a Ruby program, datatypes and objects, expressions, operators, control flow, methods, procs, lambdas, classes, modules and finally, reflection and metaprogramming.

Whatever you consider part of the Ruby language, you'll be sure to find it in one of those chapters. David manages to fit the whole of the language into a mere 300 pages. Both a testament to the compactness of Ruby and to David's skill at explaining each part as simply as possible.

You'd expect something like a language specification to be boring (if you don't, you obviously haven't read the Java Language Specification or The C++ Programming Language), but I'm happy to report that is not the case. While you shouldn't expect an entertaining read, you can expect to learn all the little details you somehow have not yet grokked about Ruby. The book finally opened my eyes to the difference between proc and lambda. It ends with a discussion of the core classes of Ruby, including API changes between 1.8 and 1.9.

It is up-to-date with Ruby 1.8, and includes most of the features of Ruby 1.9, including, of course, fibers.

My verdict is simple: If you work with Ruby, buy this book. It's pretty much the most complete book on Ruby you'll find. While "The Ruby Way" is an excellent reference, "The Ruby Programming Language" is meant as a guide through the language. You can read it once, and get back to it when you need to. And seriously: You should read it. There are very likely some parts of Ruby you haven't worked with in all detail yet. This book does a good job of helping you uncover them.

Disclaimer: O'Reilly provided me with a copy of the book for reviewing purposes. My excitement about it is real.

It does feel a little shaky still, I couldn't get debugging to work, but it's a good start. I'd love having something like IntelliJ at hand, not necessarily for every day work, but especially for stuff like a good debugging session.

Keep it coming, JetBrains.

If you want to try it out, make sure you have a lot of screen real-estate at hand.

This post has been lying in my drafts folder for a while now, and since I'm trying out new approaches to shrink oversized controllers, it's about time to put this out, and get ready for describing alternatives.

One of the basic memes of Rails is "Skinny controller, fat model." And even though most examples, especially the ones using the newer REST features in Rails advertise it, there are still heaps of controllers out there that have grown too big, in both the sense of lines of code and the number of actions. I've had my fair share of super-sized controllers over the last months, and it's never a pleasant experience to work with them.

If you don’t believe me, look at the Redmine code. Whether it's laziness or lack of knowing better doesn't really matter. Fact is, if it keeps growing you'll have a hard time adding new features or fixing bugs. Error handling will become more and more cumbersome the more code you stuff into a single action.

And if it's pretty big already, chances are that people will throw in more and more code over time. Why not? There's so much code in there already, what difference does it make? Broken windows are in full effect here. If the controller is full of garbled and untested code already, people will add more code like that. Why should they bother writing tests for it or refactoring it anyway? Sounds stupid, but that's just the way it is.

On a current project I refactored several (and by that, I mean a lot) of these actions down to merely four to ten lines of code. The project is not using RESTful Rails, but it wouldn't make much of a difference anyway. I've found some approaches that worked out pretty well for me, and that would very likely help to make a controller RESTful. But that wasn't really the main objective on my current project. Whether they're still up to par when fully using REST I'll leave up to you to decide or, even better, update.

I’m not going to put code into this article, since most of the stuff is graspable without looking at code. If you see the sort of code I’m talking about you’ll understand.

It's actually just a few simple steps, but they can be both frustrating and exhausting, even when you take one step at a time (which you should, really).

Understand what it does

It's too obvious, isn't it? But still, a beast consisting of 160 lines of code should be approached carefully. Understand what each line does, and more importantly, why it does it. In an ideal world you could just read the code and understand what it does, but we both know that's wishful thinking. If you're lucky you can just ask someone who knows their way around. But oftentimes you're out of luck, and just have to gather as much information from the code as you can, or from playing around with the application itself.

Don't just look at the code, run it from the application, look at the parameters coming in from the views. Annotate the code if it helps you, it might help others as well.

Look at the view code too. This also isn't always a pleasant experience, but you will find hidden fields, parameters set and handed through in the weirdest ways for no apparent reason.

Test the hell out of it

Most likely the action at hand does not have any tests available to ensure your refactoring will work out well, otherwise you very likely wouldn't be in your current position. If it does they might've been abandoned a long time ago, and it's not even safe to say if the tests are still testing the right thing. If you have a big controller with a good test suite in place, even better. Check if they're testing all aspects of the code about to be refactored.

If not, take this opportunity to write as many tests for it as possible. Test even the simplest features, with as much detail as you can or as the time available allows for. You don't want even those features to break, do you?

I easily end up with 50 new test cases for a bigger action during such a run. Resist the temptation to refactor while you write tests. Mark parts of the code if you get ideas what to do with them, and get back to them in the refactoring phase.

Avoid testing too much functionality at once in a single test case. Keep your tests small and focused on a single aspect of the code in question. That doesn't mean only one assertion per test, but each test should deal with one specific aspect of the method in question.

Basically it's now or never. This is the chance to improve test coverage, so do it. You'll be happy you invested the time, it will give you a better understanding of the code, and will ensure that the code still works.

It’s a painful process, but it also helps you to really understand what the code does.

Avoid complex fixtures

I don't use a lot of fixtures anymore in my functional tests. They're not just a pain in the ass, they're hard to set up, especially for complex actions, and they kill test performance. Try to use mocks and stubs instead. If you test your action line by line you can easily identify the methods that need to be stubbed or mocked. If you prefer it the scenario way, use something like factory_girl to setup objects for your tests. I'm a fan of stubbing and mocking, but too much of it will clutter your test code. I've been using it heavily for a while, but it tends to become a sign for bad design when you're using too much of it. So I've returned to using scenarios based on the task at hand, even if they hit the database.
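With factory_girl's classic API, such a scenario setup might look like this; the factory name and attributes are made up:

```ruby
# test/factories.rb (sketch)
Factory.define :user do |u|
  u.login 'average_joe'
  u.email 'joe@example.com'
end

# In a functional test, instead of wiring up fixture files:
# @user = Factory(:user)
```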

If you turn to mocking/stubbing initially, make sure you untangle the potential mess afterwards. Even though the database can make your tests slower, in the end you want to test the whole thing.

You also want to stub out external collaborators, like web services, Amazon's S3 and the like. They don't belong in your controllers anyway, but moving them somewhere else might just open another can of worms (think asynchronous messaging), and introducing that level of complexity is just not what you need right now. Though you might want to consider it eventually.

Move blocks into methods

I'm not speaking of a block in terms of proc and lambda, but in the sense of conditional branching and the like. Longer if/else clauses usually are good candidates for getting code out of a long method into a new one, and you usually can go ahead and do just that. Once you've moved stuff out into methods, it's a lot easier to move them into modules or the model, but only if the blocks depend on parameters you can't or don't want to reach in your model.
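A contrived before-and-after sketch, with all names made up for illustration:

```ruby
# Before: a conditional branch buried inline in a long method
def access_for(params)
  if params[:admin]
    level = 'all access'
  else
    level = 'read only'
  end
  "granted #{level}"
end

# After: the branch lives in its own, well-named method,
# which is then easy to move into a module or the model
def access_level(params)
  params[:admin] ? 'all access' : 'read only'
end

def access_for_refactored(params)
  "granted #{access_level(params)}"
end

access_for(:admin => true)            # => "granted all access"
access_for_refactored(:admin => true) # => "granted all access"
```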

Try to avoid the temptation to look for similar code to consolidate in the same or other controllers just yet. Make a note, and wait until you have tests in place for all the affected code. Then start to make the code DRYer by moving it into a place more common for all the classes that require it.

Break out new controllers and new actions

The same rule that applies to adding new actions to controllers also applies to working on existing ones: Having a lot of actions in one controller usually means that it's doing more than it's supposed to. More and more actions usually mean there's code that's not really the controller's responsibility, solely speaking in terms of concerns. If the controller responsible for logins also takes care of a user's messages, it breaks the separation of concerns. Move that stuff out into a new controller.

But if you can, it's also feasible to break out new actions. That's a good option when you have an action that responds differently based on input parameters or depending on the HTTP method, an old Rails classic. It will have the advantage that stuff like error handling will get a lot simpler. Big actions that do different things all at once tend to have a complex setup for catching errors. Several variables are assigned along the process, and at the end there's a long statement that checks if somewhere along the way an error occurred. If you separate the different tasks into smaller actions, you'll end up with much simpler error handling code, since it can focus on one thing, and one thing only.

The same goes for all classes really. Although with a model it's not always easy to break out a new class. But what you can do is break out code into modules and just include them.

Extract filters

Filters are a nice fit for parts of the code where objects are fetched and redirects are sent in case something isn't right, especially since the latter always require that you step out of your action as soon as possible, before doing any further logic. Moving that code out into methods, checking their return code and returning based upon that will make your code look pretty awkward. Filters are also nice to set pre-conditions for an action, pre-fetch data and the like. Whatever will help your controller actions do their thing with less code, and doesn’t fit into the model, try fitting it into a filter.
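A sketch of the idea, with an entirely made-up controller:

```ruby
class MessagesController < ApplicationController
  before_filter :find_message, :only => [:show, :edit, :update]

  private

  # Fetch the object up front and bail out early if it's missing,
  # so the actions themselves don't need guard clauses
  def find_message
    @message = Message.find_by_id(params[:id])
    redirect_to messages_path unless @message
  end
end
```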

Try to keep them small though. It's too easy to just break out filters instead of moving code into the model, and it will slightly improve the code for sure. But what you really want is a small and focussed controller with a couple of lines of code in each action, and a few supporting filters around them.

Move code into the model

This is where it gets tricky, but now you can get all that business logic where it belongs. To get code out of the controller and into the model, you have to make sure it doesn't rely on things that are only available in the controller: params, session, cookies and the flash are the usual candidates here.

But there's always a way to work around that. Oftentimes you'll find code that assigns an error message or something similar to the flash. That kind of stuff is sometimes easier to handle in validations in the model, if it's dealing with error messages. I've seen that a lot, and it's just not the controller's responsibility to do all that work.

If your controller code is heavily dealing with stuff from the params hash, you can usually just hand that over to the model. Given of course that you properly cleaned it up first into a before_filter, or ensured that proper validations are in place.

You’ll usually find lots of constructed finders in controllers. Go after those too. Either use named scopes if you can, or create new finders in the model. It's already a lot easier on the eye when all that hand-crafted finder code is out of the way, and tucked neatly into a model.
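In Rails 2.1 and later that might look like this; the model and scope names are invented:

```ruby
class Message < ActiveRecord::Base
  # named scopes instead of hand-crafted finder code in the controller
  named_scope :unread, :conditions => { :read => false }
  named_scope :recent, :order => 'created_at DESC', :limit => 10
end

# The controller then reads like a sentence:
# @messages = current_user.messages.unread.recent
```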

Code that checks model objects for errors, validity, etc. belongs in validations or callbacks in your model. Just like any other code that's none of the controller's business, which basically is to mediate between the view and the model, and to do everything required to get that task done fast and without hassle. A lot of times you'll find controllers setting arbitrary instance variables based on the state of the model. Rings a bell? Sure, why should the controller store the state of the model? It just should not. That's what the model is for, right?

When you’re done moving code into the model, move the according test code from functional to unit tests. Tests that used to test the business logic from the controllers perspective can now do so from a unit test. That way your functional tests can solely focus on what your web layer does.

Over time you will get an eye for code that just belongs into the model, and code that could be moved into the view, or that needs to stay in the controller. It takes practice, but the more the better. Working with legacy code is oftentimes an opportunity, not a punishment.

Know when to stop

Now that you have a skinny and tested controller, why not just keep going? It’s easy to fall into the refactoring trap. It’s just such a nice feeling of accomplishment. If you look at it that way, you could just keep refactoring your application’s code forever. But who will build the new features? Who will fix the bugs?

Avoid refactoring for the sake of it. Refactoring is an important part of the development life-cycle, but you need to find a balance between your role as a programmer, where you add new features and write tests for them, and the refactoring part.

So I could say “rinse and repeat”, but when you’re done with the controller at hand, leave it be. Get a coffee, and bask in the glorious feeling of having done your code, and the developers to come, a big favor. Unless of course, you have a lot of time. In that case, for the love of code, keep going. But that’s usually a luxury. What you can do instead is plan in time for more refactorings when you’re adding features to a controller that’s another mess. Clean it up first, then get going with the new code. When you see code that could use a refactoring while working on something different, make a note (and by note I don't mean TODO, FIXME, and the like, they will get lost in the code and never be looked at again), and get cracking on it later.

This was just a beginning though. There's still things that surprise me when I work with legacy Rails code, and they want to be dealt with in their own specific ways. As I mentioned earlier, I'm still trying out new things, and I'm going to keep posting about them.

Please, share your experiences on the matter. I'm pretty sure I'm not alone with the pain of oversized controllers.

At RailsConf Europe, Yehuda Katz showed off a small yet totally useful feature of Merb. A method called run_later that does nothing more than queue the block it receives as an argument to be executed after the request is done.

After the announcement that Rails 2.2 would be thread-safe, and after seeing Nick Sieger's work on a connection pool for ActiveRecord I thought that this feature should be usable with Rails as well.

So without further ado, run_later is now available as a plugin for Rails. It simply fires off a separate thread when your application starts and works through whatever you put in its queue.

The worker thread will be on a per-process basis due to Ruby's in-process threading. So each Mongrel you start will have its own worker thread. It's not the place to put long running tasks that need some kind of separation or need to run in the order they arrive in the queue. For that, turn to tools like Starling/Workling or ActiveMessaging. Also, it doesn't use a persistent queue, so if you have important messages to deliver, again, turn to more advanced tools.

But for smaller tasks that don't need to run within the context of the request or would potentially block it, run_later is pretty much all you need.

Integrating it in your controller is simple:
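A sketch of what that might look like; the controller and mailer are made up, only run_later itself is the plugin's API:

```ruby
class UsersController < ApplicationController
  def create
    @user = User.create!(params[:user])
    # Deliver the welcome mail after the request has finished,
    # so a slow mail server doesn't block the response
    run_later do
      Notifier.deliver_signup_notification(@user)
    end
    redirect_to @user
  end
end
```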

The only requirement is Rails 2.2; the current edge version works fine. With any version lower than that, the behavior of ActiveRecord especially will be unpredictable. But if you're on Rails 2.2, give it a go, and let me know about your results. Mind you, I haven't put this into production myself, so some stability testing is still on my list.

I'm planning to throw in some kind of scheduling so that you can say something like run_later :in => 10.minutes.

Credit where credit is due: to Ezra for adding the code to Merb, and to Yehuda for mentioning the feature at the BoF session and giving me the idea to port it. Not to forget the effort to make Rails thread-safe in the first place. I merely adapted their code to work with Rails, and threw in some tests as well.

PeepCode on RSpec User Stories. The user stories look awesome as a replacement for Rails integration tests. The PeepCode is a good introduction to the topic, but falls awfully short on that issue. Using basic steps like saving an object, checking if it's valid and checking whether it was actually stored in the database is a little bit too simple in my book, and something you shouldn't be testing all that much anyway.

Don't be tempted to overwrite method_missing in an ActiveRecord-based model class. It will open a can of worms that's hard to close without removing your custom version again.

A lot of stuff in ActiveRecord is based on it, for example all the beautiful finder methods like find_all_by_this_field_and_that_field or simply setting and getting of attribute values. While associations get their own methods, the plain attributes are routed through method_missing. So @user.name and @user.name = "Average Joe" both go through it. You can try that out by overwriting method_missing, strictly for educational purposes of course, and only if you promise to remove it afterwards.
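The kind of innocent-looking pass-through override in question might look like this; the model name is made up:

```ruby
class User < ActiveRecord::Base
  def method_missing(method, *args, &block)
    # do nothing special, just hand everything through
    super
  end
end
```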

You'd think that this code shouldn't break anything. I tried it, and the validations stopped working. Since there's a lot going on in ActiveRecord, I haven't dug in yet to find out why that's the case, again strictly for educational purposes. But I'm curious for sure.

If you want to bring dynamic code into your classes, for example generated methods, you're better off generating the code at runtime, just like ActiveRecord does it for associations.

As I wrote yesterday, Marcel Molina and Michael Koziarski did a little Best Practices session for starters. Other than that, day two was only okay.

Ola Bini repeated in his JRuby talk pretty much what Charles Nutter and Tom Enebo had said on the first day, plus a shameless ThoughtWorks plug.

I did enjoy the talk on Selenium by Till Vollmer. It's been on my list for a while, and it does look quite interesting. The question that pops up in my head as a friend of continuous integration is of course how to automate this. But I'll just have to read up on that.

Ben Nolan (creator of Behaviour.js) showed some neat tricks using functional programming with JavaScript. He brought up some ideas and showed code, which I very much appreciated. Nothing new for the JavaScript cracks really, but still interesting.

Jay Fields talked about the presenter pattern in Rails. I bet a lot of people thought after the talk: wtf? To sum up, his findings on the presenter pattern in Rails were rather negative and probably not what a lot of people expected. I found his talk to be a change from the others. It's not always the success stories that make you push hard, but also the downfalls, even if they're small ones. He put up all the details on his blog. Definitely worth checking out.

In all I would've wished for more detail in the presentations. A lot of the presenters spent too much time introducing things, presenting theory, and so on. More code please, people! When people come to the RailsConf I take it for granted they know Rails enough to get down and dirty immediately.

As DHH wrote on his blog I too was quite impressed by the engagement of Sun in Ruby and Rails. Craig McClanahan (of Struts fame) talked about it and said he can't imagine going back to Java for web development after having worked with Rails. Amen to that.

I got some nice ideas and things to look into out of it, but I had hoped for more. Still, I'm looking forward to next year.

This morning, on day two, Marcel Molina and Michael Koziarski did a little Best Practices session, a welcome change to the keynotes and sessions. It was very code-oriented. I did even take something out of it I didn't know before. Though I wish it would've gone into a little bit more detail (which I actually wish for a lot of the other sessions as well, but more on this in a later post), it was something that you could relate to on a practical level.

I took some notes, without code though, and here they are:

Keep the controllers skinny, keep logic that’s on the model's low level out of the controller

All that logic in the model makes it the fat model

The controller should not deal with that logic, because it’s a different layer of abstraction

Rough guide: 6 to 7 actions per controller, 6 to 7 lines per action

Use association proxy methods. Add custom finders for associations to keep the association logic in the model and to represent the business logic more clearly

Use explicit and short validation callbacks (e.g. validate :make_sure_something_is_as_it_should_be) instead of just long validate methods. It’s easier to read and understand

with_scope can make code harder to read and is (apparently) used in situations where it isn’t necessary. It can be used to fake associations through proxies, e.g. to find objects that aren’t associated with an object through the database, but through some conditions, e.g. a smart group or a smart folder
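The note on validation callbacks translates to code roughly like this; the model and method names are invented:

```ruby
class Account < ActiveRecord::Base
  # several short, named callbacks instead of one long validate method
  validate :balance_not_negative, :owner_present

  private

  def balance_not_negative
    errors.add(:balance, "can't be negative") if balance && balance < 0
  end

  def owner_present
    errors.add(:owner, "must be set") if owner.blank?
  end
end
```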

Day one of the RailsConf Europe is over (for me anyway), and so here's my summary of what I've seen and heard today.

It all really started yesterday with Dave Thomas' keynote on "The Art of Rails". The talk was inspiring. It wasn't really any new stuff, basically a nice speech with visuals about what the Pragmatic Programmers have already written about. The comparison to art sounds far-fetched to a lot of people, and it might even be. Still, there's a lot to learn from art that can be applied to software development. Casper Fabricius published a nice summary.

The morning keynote by David Heinemeier Hansson was okay. It wasn't great. It pretty much summed up all the neat new features of Rails 2.0. There's another nice summary over at the blog of Casper Fabricius.

My sessions schedule started out with Benjamin Krause's "Caching in Multi-Language Environments." He actually patched the REST-routing in Rails to support the content language as a parameter for a resource URI, e.g. /movies/231.html.de. Neat stuff. He also implemented a language-based fragment cache using memcached. Both will be available later this week on his blog.

Next up was Dr. Nic's talk on meta-programming with Ruby and Rails. My word, I love his Australian accent. His talk was highly entertaining, I was laughing a lot. But it was also inspiring. He's very encouraging about trying out the meta-programming features of Ruby and doing some weird, funny and useful stuff with it. He already put up his slides for your viewing pleasure.

Roy Fielding's talk on REST was something I was really looking forward to, but it turned out to be more of a summary of his dissertation. The part on REST was good, but he spent an awful lot of time telling the history and the theories behind REST.

The smaller diamond-sponsor keynotes by Jonathan Siegel of ELC Tech and Craig McClanahan were short, but pretty good I'd say.

I don't know about you, but this looks highly appalling to me. Imagine these declarations cluttered across your Rails application. Throw in more compiler directives and you got yourself some nicely unreadable code. What immediately popped into my head was annotated Java code:

I know, I know, it's not the same, but that's what came to my mind. I'm not a big fan of Java annotations for purposes like this. In my opinion the code gets very unreadable through the overuse of annotations, but maybe that's just me.

Another thing about the typing directives is the loss of dynamic typing. Dynamic typing is one of the great things about scripting languages. I had several discussions about that matter, and I'm well aware that Java fans love static typing, even though it's not even used to the fullest extent in Java.

With the above code annotation I'd restrict my methods to accept only a Fixnum, an Array and a Hash. Why bother? To make it easier for the compiler and in return jeopardize readability? To answer Ola's question: No, I don't think we need directives, and please don't try to make Ruby more like Lisp. There are a lot of people who wouldn't see that as a step forward, I bet. I'm all up for change and for a language to improve evolutionarily, but it's not a good idea to extend a language with features it wasn't built for. That already happened to Java.

After playing around with AP4R, the new kid on the asynchronous Ruby block, for a little while, I held a small presentation at last week's meeting of the Ruby User Group Berlin. While it is more of an introduction than a field report I put up the slides as PDF anyway.

One note on the load balancing though. That's the issue that brought up some questions I couldn't answer, so I had another look at the documentation (which is still rather sparse, by the way).

AP4R knows two kinds of balancing. The first one is the distribution of dispatching tasks over a number of threads. That way the messages in the server's queue can be handled in parallel. Since the threads only scale on one host this doesn't allow for load balancing on several hosts.

For this AP4R has an experimental feature called carriers. These allow for the AP4R server to redistribute the messages to a bunch of servers exclusively dealing with the messages. It should be added though that these carriers use polling to fetch new messages from the master server's queue. This has the clear advantage that new carriers can be added without changing any configuration on the other servers. Carriers aren't considered stable yet, but they point in the right direction.

As for the client, let's say a Rails application, it can only handle exactly one AP4R server to transmit its messages to. So if you're balancing your application's load over several servers, you can either send all the messages to one server or have each application deliver them to its own AP4R server. The downside of this is that if one server fails, one or all of your servers can't deliver asynchronous messages. So it's probably best to always rely on store-and-forward to ensure that your messages won't get lost.

For some further information I'd recommend checking out the website, the "Getting Started" document, where they show how to set up the example application, and a large set of slides from a presentation at this year's RubyKaigi.

I'm looking forward to seeing what's next for AP4R. It's a promising start.

It's been a week full of Rails joy, and a little pain as well, but the pain is not to be blamed on Rails itself, just on some code.

Been working with attachment_fu this week. Basically tried out its S3 storage capabilities when I switched from a custom implementation. Pretty neato. I'm starting to dig S3 itself more and more. Mike Clark wrote a nice tutorial on the subject.

What I find neat about it are the multi-stage capabilities and the multi-user support. It's easy to manage several projects and hosts. As you'd expect, it's a Ruby on Rails application and comes pre-packed with all the required goodies.

Chad Fowler and Marcel Molina are holding a full-day Rails testing tutorial on the day before the sessions of RailsConf Europe. That would be September 17th. You're expected to fork over $75 for the entry, but fear not, the money is for a good cause and the donation is tax-deductible.

A year after the initial listing on their website, The Pragmatic Programmers announced "Deploying Rails Applications". Written by Ezra Zygmuntowicz (of BackgrounDRb fame) and Bruce Tate it aims to offer all kinds of wisdom about running Rails applications in the wild. It's still in beta, but I know I'm gonna grab a copy.

Due to some developers not so keen about running the tests I got back to my trusty friend Continuous Integration for a project I'm currently working on. Being a big fan of CruiseControl I looked for similar solutions for the Rails and Ruby market. There are several tools you can use, and they have several ups and downs.

The Slim Solution: Continuous Builder
A small application disguised as a Rails plugin which you can hook into your Subversion post-commit. It's part of the Rails repository and, though it hasn't been cared for in a while, still fulfils its purpose. The commit hook checks out the source and runs the tests. Should one or (gasp) several of them fail, it will bug you with a nice email pointing out the problems. It's the simplest solution, but it needs a separate environment in your Rails application. Ryan Daigle suggests using a separate build environment to keep the build away from the rest of your application. This sort of repelled me from Continuous Builder, since it also needs a database to work this way.

The Hell-Hound: Cerberus
It's simple and easy to use: it comes as a RubyGem and is up and running in no time. Once installed, projects are added through the cerberus command:

cerberus add http://some.server/with/my/svn/trunk

By default, it uses the last part of your Subversion URL to create a name for your project, so you might want to add the option APPLICATION_NAME=<my_project> to specify a nicer name than just trunk.

When you're done you can easily build your project with cerberus build <my_project>. Cerberus will check for changes in your repository and only do its magic when there are changes. It supports notification through email (duh), Jabber, RSS, CampFire and IRC. It can be run as a commit hook or as a cron job.

The Classic: CruiseControl.rb
Of course I didn't discover it until Cerberus was up and running. CruiseControl.rb is the natural evolution of CruiseControl, its bigger brother from the Java world. Both are products of ThoughtWorks.

CruiseControl.rb is, of course, a Rails application. It doesn't need a database though. It's pretty easy to use: you just unpack it, add a project and start the required processes.

That's about it. After that you can point your browser to the dashboard application and enjoy the magic of freshly made builds.

One downside is the narrow support for version control systems. For now, only Subversion is on the list.

All of the above are pretty neat and depending on your continuous integration needs, can be recommended. If you need a fancy web application, there's no way around CruiseControl.rb. Its clear advantage is also that it polls the repository without needing cron or a commit hook. Otherwise I'd recommend Cerberus over Continuous Builder because it's a little bit more flexible and offers more notification options. What I realised is that even in a Rails project where the tests are mainly the proof that code hasn't been broken with a change, it's necessary to check this automatically. No more "But it works on my machine" and that warm, fuzzy feeling of not getting email after you checked in.

I held a small presentation about BackgrounDRb yesterday at the meeting of the Ruby User Group Berlin. It wasn't a very big presentation, since actually, there's not a lot to say about BackgrounDRb. But if you want to check it out, you can get the slides and the little sample application.

It's nice to see the user group grow. Yesterday there were about 35 guests, the month before almost 50. If you live in Berlin or in the area, it's definitely worth getting to know the people there. And there's a good chance that you can get new gigs there as well.

The Ruby User Group Berlin is having a party the evening before the conference, on September 16th. So if you're in town for the conference, why not drop by, say hello, meet fellow Rails friends and get free food. The event's name (and theme) is Bratwurst On Rails, and the registration is open as well. The location is yet to be announced, but that shouldn't prevent you from signing up, right?

The layout of that application has been done by lovely Joerdis Anderson, whose work you can witness over at her site Be A Good Girl.

If that's not exciting, then I don't know what is. The original Ruby finally has competition. I tip my hat to the team that developed JRuby in such a short time frame to the point where it now fully conforms with Ruby 1.8.x. Though I'm not keen on running JRuby in Glassfish, it's nice to have the option to integrate a Rails application with J2EE services. Think of that EAI buzzword that came up a few years ago.

Being able to integrate EJB, JMS, or just every kind of Java backend with a (I'm tempted to put some buzzwords in here, think lightweight, agile, full-stack, etc., but I grew pretty tired of these) web framework like Rails is always a nice option to have. Now there's no excuse when Rails is put on the table in a Java/J2EE shop.

A chroot environment seems to be rare these days. Everything is virtualized, load-balanced and what have you. I recently found myself trying to deploy into a chroot'ed Lighttpd environment with Capistrano and immediately ran into several pitfalls. The biggest problem is that Capistrano uses absolute links for directories like current and the links to the log directory.

Say the base directory for your app is /var/www/rails and your Lighttpd runs in a chroot environment in /srv/lighty. In the application directory Capistrano creates the directories releases and shared. After a successful deployment it also creates a symbolic link named current pointing to the latest release, and several links to the shared directories.

In this scenario the link current would point to the directory
/srv/lighty/var/www/rails/releases/20070506135654. Now, since Lighttpd doesn't know the directory structure above /srv/lighty, that's a bit of a problem: it won't find the path when you point it to the dispatcher in the directory current. This is true if you launch your Rails application through FastCGI; a Mongrel scenario would result in pretty much the same problems. Additionally, your Rails application won't find its log and other directories (in case you're wondering, these are public/system and tmp/pids).

Apparently not many people seem to use an environment like this. It’s pretty old-fashioned in this highly virtualized world, but you run across it from time to time. So what can you do?

Hacking the Symbolic Links

This isn't going to be pretty. To get the thing to work somehow, I created a filter for after_update_code and removed the links created by Capistrano, replacing them with new ones; only this time they wouldn't use absolute paths, but relative ones.

I'm not proud of this solution, but I had to come up with something pretty quickly, and it works for the moment, which is all it was supposed to do. It will be replaced with a production-like deployment soon. I've replaced the task :symlink with my own.
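The original task didn't make it into this post, so here's a hedged sketch of what such a replacement could look like. The variables deploy_to and release_path are standard Capistrano of that era; the exact set of links to recreate depends on your setup, so treat this as an illustration, not the original code:

```ruby
# Sketch of a :symlink override that creates relative links instead
# of Capistrano's absolute ones. The link names are assumptions.
desc "Link the current release using relative paths"
task :symlink, :except => { :no_release => true } do
  run <<-CMD
    cd #{deploy_to} &&
    rm -f current &&
    ln -nfs releases/#{File.basename(release_path)} current &&
    cd current/public &&
    rm -f system &&
    ln -nfs ../../shared/system system
  CMD
end
```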

One remark: I remove some symlinks in the code, because Capistrano creates them in the update_code task.

As I said, it's not pretty but it works. Here's what it does: it removes the symbolic links created by Capistrano and replaces them with new ones using relative paths.

Putting the user in a cage

The better solution is to deploy directly into your chroot environment. Create a user that's sent straight into it by SSH or by its login shell.

This scenario requires that all the tools needed for a deployment must be installed in the chroot environment, including Subversion. If that’s not a problem, this might be the ideal situation.

One thing you can’t do is restart your web server from here. A scenario like this would mean that you can only restart your FastCGI or Mongrel processes. This is a scenario I’ll use in the long run.

So is this hassle worth it? I'm not sure myself. It's questionable whether the little added security is worth the effort you have to put in to get your applications working. In the end it depends on what your client's environment looks like, and whether you have any control over it. If guidelines require chroot'ed services, then there's not much you can do. Otherwise I'd consider breaking the service out of the chroot and trying to find a better way of securing it. Xen and the like come to mind.

At first I was rather disappointed by what Sun announced with JavaFX, the newest competitor in the RIA market. For my taste F3 looks rather ugly and has a way too verbose syntax compared to, say, SVG. I know the two don't have the same target, but comparing the code you need to draw something, SVG is the winner. Anyway, that's not really the point.

An announcement at RailsConf shed new light on JavaFX. A new framework called RubyFX is on a quest to close the gap between Ruby on the DLR and Sun's rather desperate attempt to get into the RIA market. The code looks very expressive and, to say the least, is more like what I'm talking about. It's still early alpha, but it looks promising.

In an earlier post I wrote about namespacing your Rails model. There's an additional issue that must be thought of when doing so. Rails (of course) has conventions dealing with namespaces.

Normally, when you put classes in namespaces, like Admin::User, Rails expects this class to be in a folder admin in your app/models directory. Responsible for this is the method load_missing_constant in the module Dependencies, part of ActiveSupport. It also uses the namespace to build the table name, so Admin::User would result in admin_users. This isn't always a desirable outcome. Of course you could set a table name explicitly:

class Admin::User < ActiveRecord::Base
  set_table_name "users"
end

If you need to avoid namespace clashes, that's an acceptable option. But what if you only want to bring some order to your model directory? You want to create some subfolders to separate your model physically, if only to avoid having dozens of files in the model folder.

This is where load_missing_constant kicks in again. If you don't load the files in the subdirectories explicitly, it will assume that the files are in their according namespace. So having a class User in a file user.rb lying in the folder app/models/admin/ will lead it to assume User exists in the module Admin. To avoid that you'll have to add your model subfolders as load paths. To do that you can add the following line to your environment.rb:

config.load_paths += Dir["#{RAILS_ROOT}/app/models/[a-z]*"]

This will tell Rails to look for the files in all subfolders of app/models on startup. This won't solve the issue yet; you'll still need to explicitly load the classes. So you put the following lines at the end of your environment.rb:

[ "app/models" ].each do |path|
  Dir["#{RAILS_ROOT}/#{path}/**/*.rb"].each do |file|
    load file
  end
end

That way you'll avoid the implicit call to load_missing_constant. You can add directories to the list, e.g. a subdirectory in lib. You could also explicitly require the classes you need, but who wants to do that, really?

Now that's something. Microsoft announced IronRuby at their MIX07 conference. IronPython developer Jim Hugunin provides some details about extensions to the .NET CLR made to improve support for scripting languages. The extension is called the Dynamic Language Runtime (DLR).

The name IronRuby is of course an homage to IronPython, the Python implementation running on .NET. Considering that IronPython is famous for its speed, I'm excited to see what Microsoft's developers are able to squeeze out of Ruby on the CLR. If IronRuby turns out well, it might become the way to run Rails on the .NET (and hence Windows) platform.

I'm always up for competition, even coming from Redmond. With other Ruby implementations (including, of course JRuby, fully backed by Sun in case you missed it) gaining a wider audience there's more pressure on the original version. With YARV coming to Ruby, it too will run on a virtual machine. The speed improvements look quite impressive so far.

Here's to an interesting race.

Of course the DLR shines a new light on Silverlight, whose Technology Preview for version 1.1 includes support for scripting rich browser applications with Python, Ruby, VB or C#.

I had a nice revelation earlier this week, when I finally tested some code I wrote in the wild, the wild being BackgrounDRb. The code used some pretty common class and module names. I didn't expect any trouble using names along the lines of Sync or Synchronizer, and when running the code in test cases there wasn't any.

At first I got an error saying that I couldn't include Sync (my module) into a class with the include keyword. I wondered why Ruby was telling me that my module was a class. I quickly found the answer: Ruby's standard library contains a class named Sync. As long as my code ran in a test case that class didn't come into play, but BackgrounDRb apparently uses it, so it got loaded before my module. That was the first name clash.
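The clash is easy to reproduce in plain Ruby; here Foo stands in for Sync:

```ruby
# Once a constant names a class, reopening it as a module fails.
class Foo; end

begin
  module Foo; end
rescue TypeError => e
  puts e.message  # => "Foo is not a module"
end

# A namespace sidesteps the clash entirely:
module MyApp
  module Foo; end   # no conflict with ::Foo
end
```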

The second involved the class Synchronizer, which is included in the Slave package used by BackgrounDRb. Slave's class even comes with the same method name I used, so you can imagine the look on my face when I discovered that the code included from there immediately started running when my BackgrounDRb worker ran.

That it took me some hours to find these issues because of a bug in BackgrounDRb is a totally different issue.

The moral of the story: namespace your model classes, and the code in lib/. The more classes you have, the bigger the chances that it clashes with a class in one of your installed gems. The best example is a class like User. The chances are pretty good that it's being used somewhere else. To avoid a clash choose a namespace corresponding to your business case or application (the project name would do).

Keep in mind that Rails expects a directory structure corresponding to your namespaces. So if you have a class Bacon in the module Chunky, be sure to put the class file bacon.rb into a subfolder named chunky in the app/models directory. The same goes for code in lib/.
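In file-layout terms the convention looks like this minimal sketch (the ActiveRecord::Base superclass is left out so the snippet stands on its own):

```ruby
# app/models/chunky/bacon.rb
module Chunky
  class Bacon
    # model code goes here; in a real app this class would
    # inherit from ActiveRecord::Base
  end
end
```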

For smaller projects this might not be necessary, but I learned the hard way that it can be a pretty good practice.

While setting up a new project with Rails 1.2.3 I ran across a weird issue in my tests. Somewhere in the CGI class (belonging to Ruby's core lib and extended by Rails) a nil object was accessed, throwing an error. Turns out this was introduced in 1.2.2 with a fix that was supposed to improve CGI handling on Windows.

The issue has been fixed in the stable branch of Rails 1.2, but if you're stuck with 1.2.3 or 1.2.2 you can do what I do: use a plugin. I took this opportunity to write my first, so here it is.

Sorry I don't have a fancy SVN set up for your plugin-script-using pleasure, but this is a tiny fix anyway.

Though this was a rather annoying bug, the fix emphasizes the beauty of Rails and Ruby: it's easy to extend and even to fix.

I worked on a similar problem two weeks ago and came up with a pretty similar solution, though without the fanciness of just defining new databases in database.yml and having the classes in corresponding modules automatically.

To sum up: no jealousy, I'm just glad others have taken a very similar approach. I just wish Alex Payne of Twitter had given this interview a little earlier.

The Great Ruby Shoot-Out
Antonio Cangiano compared the current Ruby implementations. JRuby, Ruby 1.9 and Rubinius look awesome, at least by the numbers.

Upcoming Changes to the JavaScript Language
I'm still not sure if I'm gonna like what's coming. JavaScript is on its way to turning into a full-blown, statically typed object-oriented language, with all the pros and cons. It looks a lot like C++, and if that's not a little bit scary, I don't know what is.

I ran across a weird bug the other day that seems to have been fixed in Ruby 1.8.5. It's nonetheless quite an interesting one. It bites when you use a hash as a method parameter, that hash happens to contain the key :do, and you call the method without parentheses, like so:

def my_method(opts)
end
my_method :do => "commit"

It works when you put parentheses around the parameter:

my_method(:do => "commit")

Putting it in front of other entries doesn't work though; Ruby seems to think I want to start a block where it's not allowed. Putting the do into a string works just fine, of course.

Funny stuff. No mention in the Ruby changelogs, but it does work in later versions.

The book market is being swamped with new books. It seems like every day I discover a new announcement for an upcoming book on Ruby or Rails. Let's see what's currently in stock, and what's waiting for us next year.

Having been postponed, the next one due to be released is Deploying Rails Applications, written by Ezra Zygmuntowicz and Bruce Tate. It's been in beta for a while, but progress has been quite slow. It's supposed to be released in January 2008.

Agile Testing with Rails. I'm fond of agile principles and testing, and while I'm rather sick of books containing "agile" in their titles, the contents do look interesting. Written by two guys working for ThoughtWorks who obviously know their stuff.

O'Reilly already put out their share of Ruby and Rails books, but right now the queue looks rather slim.

The Ruby Programming Language. Matz himself is writing this one together with David Flanagan (who wrote a lot of books on JavaScript). I'm not sure if it can replace the PickAxe as the most valuable Ruby book for me, but that might just be for strictly melancholic reasons, what with it being my first Ruby book.

Manning has only put out one book on Rails so far, the excellent "Ruby for Rails" by David Black. But they have some interesting stuff in the queue for 2008.

Ruby in Practice. This could become the mother of books on advanced Ruby topics. The topics include testing, integration, indexing, asynchronous messaging and much more.

Flexible Rails. I can't say much on that topic, but maybe this book will change that. Flex and Rails. Nuff said.

Addison-Wesley recently released a kind of companion book to "The Ruby Way", a most excellent book.

The Rails Way. Just been released, covers a lot of Rails grounds, from basic to advanced. Should make for a nice desktop reference.

Design Patterns in Ruby. Though I have mixed feelings about design patterns and their overuse in the Java world, it should be interesting to read about how they can be useful in the Ruby world.

There, lots of new stuff to read for the new year. I'm sure there's more to come, as it naturally happens with a technology on the rise. The downside is that the quality of the books becomes more and more average over time, but there's a lot of talented people writing on these, so hopefully the Ruby/Rails book market will see more and more great books coming out in the near future.

Mocking is a great part of RSpec, and from the documentation it looks insanely easy. What had me frustrated on a current project is the fact that the mocks and stubs wouldn't always do what I'd expect them to do. No errors when methods weren't invoked, and, the worst part, mocks wouldn't be cleaned up between examples which resulted in rather weird errors. They only occurred when run as a whole with rake spec but not when I ran the specs through TextMate.

I was going insane, because no one on the mailing list seemed to have any problems, same for friends working with RSpec. Then I had another look at the RSpec configuration.

Turns out the reason for all of this was that Mocha was used for mocking. After switching the configuration back to RSpec's internal mocking implementation, everything worked like a charm.

So what you want to have in your SpecHelper isn't this:

Spec::Runner.configure do |config|
  config.mock_with :mocha
end

but rather

Spec::Runner.configure do |config|
  config.mock_with :rspec
end

or no mention of mock_with at all, which will result in the default implementation being used, which is, you guessed it, RSpec's own.

When using CruiseControl.rb for continuous integration, and RSpec for testing, the defaults of CruiseControl.rb don't play that nice with RSpec. However, that can be remedied pretty simply.

By default CruiseControl.rb runs its own set of Rake tasks, which invoke a couple of Rails' tasks, including db:test:purge, db:migrate and test. You do want the first ones, but you'd rather have CruiseControl.rb run your specs instead of the (most likely non-existent) Test::Unit tests.

Before running its default tasks it checks whether you have a specific task configured in your project settings, and whether you have a task called cruise in your project's Rakefile. You can use both, but I just declared a task cruise in the Rakefile, and I was good to go.

That task can do pretty much the same as the original CruiseControl.rb implementation, and can even be shorter since it can be tailored to your project, but it invokes spec instead of test. One caveat is to set the correct environment before running db:migrate. I split out a tiny prepare task which does just that, and could do a couple of other things if necessary, like copying a testing version of your database.yml.

desc "Task to do some preparations for CruiseControl"
task :prepare do
  RAILS_ENV = 'test'
end

desc "Task for CruiseControl.rb"
task :cruise => [:prepare, "db:migrate", "spec"] do
end

Simple as that. The task spec will automatically invoke db:test:purge, so you don't need that.

Over at InfoQ, there's a nice talk by Dave Thomas from last year's QCon. He talks about metaprogramming and how it's used in Rails to achieve all the magic that makes it what it is. He doesn't go into the tiniest details, but he explains the Ruby mechanisms that Rails uses to achieve that magic.

A nice talk, and an hour well spent. Once again Dave proves that he's an excellent speaker.

QuickLook and TextMate, sitting in a tree. Makes QuickLook even better. It integrates the TextMate syntax highlighting into QuickLook, and integrates QuickLook into TextMate. Looks pretty neat.

Ruby Tool Survey
It's official (as far as this survey goes): TextMate is the number one development tool for Ruby and Rails. That's what I tried to tell the students during the lecture I gave last weekend, but the die-hard Linux fans wouldn't believe me ;)

Nginx and Mongrel for Rails
Nice tutorial on how to set up Nginx for Rails deployment. I'm using it on one of my current projects and it's pretty neat. RailsJitsu is well worth keeping an eye on.

Scripting the Leopard Terminal
Tired of having to open four terminal sessions each time you open Terminal to work on your Rails applications? Get some nice hints on how to script Terminal with AppleScript to ease the pain on your fingers.

For a recent project I had the pleasure to work with Paypal, especially with the Instant Payment Notification API. I hadn't heard a lot about it before I tried to marry it with Rails, but what I had heard made me assume it wouldn't be a piece of cake. And I was right.

I'd love to share some code with you, but Vasillis Dimos beat me to it. He wrote two posts on Paypal IPN and Rails, one dealing with the basics and the other about mocking IPN, which you really need to do to test your code. Really.

Personally I did the testing a little differently, since all my payment handling logic was in the model. I didn't use ActiveMerchant either, but just the Paypal gem. But in general things are similar. Outside of the US and the UK you're pretty much out of choices for payments, since Website Payments Pro isn't available here, so IPN is (sadly) the way to go. It's a real PITA, and here's why:

Paypal needs to reach your development machine from the outside. For unit testing this is not an issue of course, but when you need to test against the Paypal sandbox (which is painfully slow) and, god forbid, the real Paypal, there's no way around it.

The payment flow is unnatural. You have to handle the payment outside of the user's page flow, relying solely on the data you get from Paypal: no session, no cookie, no user. It takes a lot of care to handle all that, and there still might be a hole in your code that could be exploited.

IPNs might come in late, sometimes only after the user has already returned to your site. You'd like to present him with a nice success message, but that's not gonna happen then. It's a rare case though; the IPNs come in slower from the sandbox, that's for sure. It's up to you how to handle that: you can act in favor of the user, or you can just make him wait until everything has fallen into place.

In rare cases you won't get an IPN from Paypal, for whatever reason. I've seen this happen. Be prepared to create the successful payment by hand or have something like a small GUI at hand to do it.

For subscriptions, six different notification types need to be handled. And they're even spread out over two different fields in the notification.
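As far as I recall Paypal's IPN documentation, the subscription events arrive in txn_type while a payment's outcome lives in payment_status. The exact values below are quoted from memory, so verify them against the docs before relying on them; a dispatch sketch:

```ruby
# The six subscription notification types (values assumed from
# Paypal's IPN docs; double-check before using in earnest).
SUBSCRIPTION_EVENTS = %w(
  subscr_signup subscr_modify subscr_payment
  subscr_failed subscr_cancel subscr_eot
).freeze

def subscription_event?(params)
  SUBSCRIPTION_EVENTS.include?(params["txn_type"])
end

def payment_completed?(params)
  params["payment_status"] == "Completed"
end
```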

Some advice on how to get it right:
* Log everything. Store the IPNs in the database, in the log files, wherever. Just log them. They're your proof of things that happened. Just storing them with their raw post data should do, while keeping the most important fields in separate columns.
* Use mocks. It's not hard, but it's totally worth it. When you want to test all the events that Paypal might send you, which is a lot for subscriptions, it's a painful development cycle, and some events aren't even fully testable by hand.
* Decide on a strategy to handle fraud. While your IPN URL is not really public (nothing should link here, and it's hopefully transmitted to Paypal encrypted), it's not exactly safe to just accept everything.
* Don't return errors in your IPN handler. Paypal will try it again.
* Store a pending payment and make it a full one when the corresponding IPN arrives.
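The last point can be sketched in plain Ruby; the class, states and field names here are made up for illustration:

```ruby
# Pending-payment sketch: record every IPN, and only flip the
# payment to complete when the right notification arrives.
class Payment
  attr_reader :status, :ipn_log

  def initialize
    @status  = :pending
    @ipn_log = []
  end

  # Log the notification before acting on it, so there's always
  # a record of what Paypal actually sent.
  def handle_ipn(params)
    @ipn_log << params
    @status = :complete if params["payment_status"] == "Completed"
  end
end

payment = Payment.new
payment.handle_ipn("payment_status" => "Pending")
payment.handle_ipn("payment_status" => "Completed")
payment.status  # => :complete
```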

All that said, it was an experience, and while not always pleasant, at least I learned something. But Paypal is far from being a pleasant way to handle payments if you want to make it secure, protect the integrity of your application, and prevent fraudulent users from abusing your services, all of which should be among your primary interests.

David A. Black of Ruby Power and Light will be giving a four-day training in Berlin from November 19th to 22nd, 2007. The training targets intermediate Rails developers who have already worked with Rails and are familiar with the basics. Topics include ActiveRecord (attribute handling, associations, optimisations), routing, REST, testing and deployment.

You can find the full details (in German) on the official website. Seating is limited, so hurry!