Common Trip-ups for New Rubyists, Part I

Once internalized, Ruby is a fairly straightforward language. Until that happens, however, many potential converts are turned off by some of its more unusual aspects. This series hopes to clear up some of the confusion that newcomers face when learning Ruby.

This series tries to follow RDoc notation when describing methods:

Array#length refers to an instance method length on class Array.

Array::new refers to a class method new on class Array.

Math::PI refers to a constant PI in module Math.


Ruby 1.9+ is assumed.

Instance Variables and Class Variables

In many programming languages, such as Java and C++, instance variables must be declared before they can be assigned. Ruby is the diametric opposite: instance variables cannot be declared at all. Instead, an instance variable in Ruby comes into existence the first time it is assigned.

Creating an instance variable is as easy as taking a local variable and slapping a “@” on the beginning.

Under the hood, an instance variable is just a variable stored in self, the current instance. It might be tempting to assume that instance variables can be assigned with self.foo = like in Python. In fact, self.foo = sends the foo= message to self, and Ruby tries to call the corresponding method. This only works if a foo= method has been defined.
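To see the difference, here is a minimal sketch using a hypothetical Counter class. Assigning @count directly always works, while self.count = 0 only works once a count= method exists:

```ruby
class Counter
  def reset
    @count = 0        # direct assignment: creates or updates the instance variable
  end

  def reset_via_setter
    self.count = 0    # sends the count= message to self
  end
end

counter = Counter.new
counter.reset                 # works
begin
  counter.reset_via_setter    # no count= method has been defined yet
rescue NoMethodError => e
  puts "NoMethodError: #{e.message}"
end

# Reopen the class and define the setter; now self.count = works
class Counter
  def count=(value)
    @count = value
  end
end
counter.reset_via_setter
```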

Note that the #initialize constructor is optional; a class can use instance variables without defining one. Also, reading an instance variable that has not been assigned does not raise an error: an unassigned instance variable such as @title simply evaluates to nil. Part II will demonstrate how this can be used for lazy initialization.
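Here is a minimal sketch of that kind of class, using a hypothetical Book class with a hand-written getter and setter for @title and no #initialize at all:

```ruby
class Book
  # No #initialize: Ruby uses the default one inherited from Object

  # Getter: returns nil until @title has been assigned
  def title
    @title
  end

  # Setter: the first call brings @title into existence
  def title=(new_title)
    @title = new_title
  end
end

book = Book.new
p book.title                 # prints nil (no error)
p book.instance_variables    # prints [] because nothing has been assigned yet
book.title = 'Programming Ruby'
p book.title                 # prints "Programming Ruby"
p book.instance_variables    # prints [:@title]
```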

Defining getters and setters over and over would get old pretty quickly. Fortunately, Ruby provides a trio of helper methods: Module#attr_reader (getters), Module#attr_writer (setters), and Module#attr_accessor (both).
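A short sketch of all three helpers, using a hypothetical Album class. Each call generates the corresponding methods for the named instance variable:

```ruby
class Album
  attr_accessor :title           # defines #title and #title=
  attr_reader   :catalog_number  # defines #catalog_number only
  attr_writer   :notes           # defines #notes= only

  def initialize(catalog_number)
    @catalog_number = catalog_number
  end
end

album = Album.new('placeholder-001')
album.title = 'Kind of Blue'
album.title                          # => "Kind of Blue"
album.catalog_number                 # => "placeholder-001"
album.respond_to?(:catalog_number=)  # => false (read-only)
album.respond_to?(:notes)            # => false (write-only)
```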

Because instance variables are just variables stored in self, they also work at the class level. After all, classes are just instances of Class, and inside a class body self is the class object itself. Instance variables assigned there belong to the class object and are commonly referred to as class instance variables.
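A minimal sketch with a hypothetical Widget class that counts its instances in a class instance variable. The accessor is defined inside class << self so that it belongs to the class object rather than to instances:

```ruby
class Widget
  @count = 0   # class instance variable: self here is the Widget class object

  class << self
    attr_accessor :count   # Widget.count and Widget.count=
  end

  def initialize
    self.class.count += 1
  end
end

Widget.new
Widget.new
Widget.count          # => 2

# A subclass is a different object, so it gets its own (unassigned) @count
class SpecialWidget < Widget; end
SpecialWidget.count   # => nil
```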

In addition to instance variables, Ruby also has so-called “class variables”, which use @@ instead of @. Unfortunately, these are frowned upon because a single class variable is shared by a class and all of its subclasses: assigning it in a subclass overwrites the value its ancestors see, and vice versa. For this reason, it’s better to think of them as class-hierarchy variables.
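A minimal sketch with hypothetical Parent and Child classes shows the sharing. The subclass does not get its own @@value; its assignment changes the one variable the whole hierarchy sees:

```ruby
class Parent
  @@value = 1

  def self.value
    @@value
  end
end

class Child < Parent
  @@value = 2   # reassigns Parent's @@value instead of creating a new variable
end

Parent.value  # => 2
Child.value   # => 2
```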

Actually, it’s a lot worse than that. Think about what happens when a class-hierarchy variable is assigned at the top level, where self is main – an instance of Object.

@@value = 3 # Ruby 1.9 to 2.7 only; Ruby 3.0+ raises a RuntimeError here
puts Parent.value # prints 3

Since nearly every Ruby class inherits from Object, every class has access to the same class-hierarchy variables as Object. So, class variables are potentially global variables. This makes them highly unpredictable and prone to misuse.

Modules

In Ruby, a module is a container for methods. A class, on the other hand, is a special kind of module that can also create instances and inherit from a superclass.
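A few one-liners make the relationship visible. Greeting and Person are hypothetical names; the point is that Class is a subclass of Module, and only classes respond to new:

```ruby
module Greeting; end
class Person; end

Class.superclass             # => Module (a class is a kind of module)
Person.is_a?(Module)         # => true
Person.superclass            # => Object
Person.new                   # classes can create instances
Greeting.respond_to?(:new)   # => false (modules cannot)
```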

Methods in a module can be instance methods or class methods, at least semantically. Ruby class methods are just instance methods on class objects. It’s a bit counter-intuitive, but modules are instances of class Module, even though classes are kinds of modules.

module MyModule
  # Method defined on the MyModule object itself (an instance of Module)
  def MyModule.hello
    puts 'hello from module'
  end
  # Same as "def MyModule.hello" because self is MyModule here
  # (so this simply redefines the method above)
  def self.hello
    puts 'hello from module'
  end
  # Instance method for instances of a class that mixes in MyModule
  def hello
    puts 'hello from instance'
  end
end
MyModule.hello # prints "hello from module"

Just because modules don’t have instance factories doesn’t mean they can’t have data. Since they are instances of Module, they can have instance variables.
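A minimal sketch, assuming a hypothetical Config module that keeps its settings in an instance variable of the module object:

```ruby
module Config
  @settings = {}   # instance variable on the Config module object

  def self.set(key, value)
    @settings[key] = value
  end

  def self.get(key)
    @settings[key]
  end
end

Config.set(:verbose, true)
Config.get(:verbose)        # => true
Config.instance_variables   # => [:@settings]
```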

A seeming contradiction at this point is that modules can have instance methods, even though modules can’t create instances. The resolution is that a primary use of modules is as mixins: when a class includes a module, the module’s instance methods become available to the class’s instances.
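Here is a minimal mixin sketch with a hypothetical Describable module. The module can never be instantiated, but its instance method runs on instances of any class that includes it:

```ruby
module Describable
  # Instance method: self will be an instance of the including class
  def describe
    "I am a #{self.class.name}"
  end
end

class Car
  include Describable
end

Car.new.describe          # => "I am a Car"
Car.include?(Describable) # => true
```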

Sometimes it’s desirable to mix both class and instance methods into a class. The obvious, if redundant, way to do this is to put the class methods in one module and the instance methods in another. Then, the instance module can be included and the class module can be extended.
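The two-module approach looks like this (module and class names are hypothetical). Object#extend adds a module's methods to a single object, here the Greeter class object, which is why they become class methods:

```ruby
module GreetingInstanceMethods
  def greet
    'hello from instance'
  end
end

module GreetingClassMethods
  def greet
    'hello from class'
  end
end

class Greeter
  include GreetingInstanceMethods  # methods for Greeter.new
  extend  GreetingClassMethods     # methods for Greeter itself
end

Greeter.greet      # => "hello from class"
Greeter.new.greet  # => "hello from instance"
```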

Instead of this, it’s better to use a bit of metaprogramming magic. Ruby calls the Module#included hook method on a module whenever that module is included in a class, passing the class as an argument. Inside the hook, the class can then extend an inner module (often called ClassMethods) that contains the class-level methods.

module Helloable
  # Gets called on 'include Helloable'
  def self.included(klass)
    # 'base' often used instead of 'klass'
    klass.extend(ClassMethods)
  end
  # Method that will become an instance method
  def hello
    puts "hello from instance"
  end
  # Methods that will become class methods
  module ClassMethods
    def hello
      puts "hello from class"
    end
  end
end
class HelloClass
  include Helloable
end
HelloClass.hello # prints "hello from class"
HelloClass.new.hello # prints "hello from instance"

A quick way to demonstrate the utility of mixins is with the Comparable module. It defines comparison operator methods based on the return value of the combined comparison operator <=> method. Let’s create a class StringFraction that provides proper comparison between fractions in strings.
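A minimal sketch of StringFraction, assuming fractions are written like "3/4". It converts the string with Kernel#Rational so the comparison is exact rather than character-by-character, and lets Comparable derive every other operator from <=>:

```ruby
class StringFraction
  include Comparable   # provides <, <=, ==, >, >= and between? based on <=>

  attr_reader :string

  def initialize(string)
    @string = string
  end

  # Kernel#Rational parses a string such as "3/4" into an exact Rational
  def to_r
    Rational(@string)
  end

  # The only comparison method we write ourselves
  def <=>(other)
    to_r <=> other.to_r
  end
end

'1/3' < '1/2'                                           # => false (string comparison)
StringFraction.new('1/3') < StringFraction.new('1/2')   # => true
StringFraction.new('2/4') == StringFraction.new('1/2')  # => true
%w[3/4 1/3 1/2].map { |s| StringFraction.new(s) }.sort.map(&:string)
# => ["1/3", "1/2", "3/4"]
```

Because sort also relies on <=>, the same single method makes collections of StringFraction objects sortable.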

Module Gotchas

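1. If a module is included twice, the second inclusion is ignored. Here HelloModule is already among MyClass’s ancestors, so including it again does not let it override GoodbyeModule, which was included after it.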

module HelloModule
  def say
    "hello from module"
  end
end
module GoodbyeModule
  def say
    "goodbye from module"
  end
end
class MyClass
  include HelloModule
  include GoodbyeModule
  include HelloModule
end
MyClass.new.say # => "goodbye from module"

2. A method defined in the class itself takes precedence over a method of the same name from an included module, even if the module is included after the method is defined.

module HelloModule
  def hello
    'hello from module'
  end
end
class HelloClass
  def hello
    'hello from class'
  end
  include HelloModule
end
HelloClass.new.hello # => 'hello from class'
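Both gotchas follow from how include works: it inserts the module into the ancestor chain directly above the class, method lookup walks that chain starting from the class itself, and a module already in the chain is not inserted again. Module#ancestors makes this visible for the first example:

```ruby
module HelloModule
  def say
    "hello from module"
  end
end

module GoodbyeModule
  def say
    "goodbye from module"
  end
end

class MyClass
  include HelloModule
  include GoodbyeModule
  include HelloModule   # already in the chain, so nothing changes
end

# Lookup starts at MyClass and stops at the first class or module defining #say
MyClass.ancestors.take(3)  # => [MyClass, GoodbyeModule, HelloModule]
MyClass.new.say            # => "goodbye from module"
```

Since the class always comes first in its own chain, its own methods win over any included module, which is the second gotcha.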

Symbols

In other languages (and possibly Ruby), you may have seen constants used as names like this:

NORTH = 0
SOUTH = 1
EAST = 2
WEST = 3

Outside of Ruby, constants used as names rather than for their values are known as enumerators (not to be confused with Ruby’s Enumerator). Many languages offer a cleaner way to do this with an enumerated type, as in this Java example:

public enum Direction {
    NORTH, SOUTH, EAST, WEST
}

Ruby has something even better than enumerated types: symbols. A symbol literal is a name preceded by a colon, such as :name, :@foo, or :+. This flexibility is important because symbols are used to represent things like method names, instance variables, and constants.
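A few reflective calls show symbols standing in for each kind of name. Point is a hypothetical class; the methods called are from Ruby's core library:

```ruby
class Point
  def initialize(x)
    @x = x
  end
end

point = Point.new(3)
point.instance_variable_get(:@x)  # => 3  (:@x names an instance variable)
1.public_send(:+, 2)              # => 3  (:+ names a method)
Math.const_get(:PI) == Math::PI   # => true (:PI names a constant)
```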

From a practical standpoint, symbols are basically fast, immutable strings: every occurrence of a given symbol refers to the same object, so comparing two symbols is cheap. However, unlike enumerators, symbols do not need to be created manually. If you need a symbol to exist, just use it as if it already does. Here is how you would create a hash that has some symbols as keys:
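A short sketch of symbol keys using the Ruby 1.9 hash shorthand, plus a check that a symbol is always the same object:

```ruby
# Ruby 1.9+ shorthand; equivalent to { :name => 'Ada', :language => 'Ruby' }
person = { name: 'Ada', language: 'Ruby' }
person[:name]           # => "Ada"

# No declaration needed: using :north is enough to create it
heading = :north
heading == :north       # => true

# Every occurrence of :north refers to one and the same object
:north.equal?(:north)   # => true
```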

A potential pitfall is creating symbols dynamically, especially based on user input (for example, with String#to_sym). In Ruby versions before 2.2, this is a bad idea because symbols are never garbage collected: once created, they exist until the program exits. That becomes a security issue when users are indirectly responsible for creating symbols, since a malicious user could consume a lot of memory that is never reclaimed, potentially crashing the VM. Ruby 2.2 added garbage collection for dynamically created symbols, but it is still wise to avoid turning untrusted input into symbols.
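One common way to avoid the problem, sketched here with a hypothetical DIRECTIONS table and parse_direction helper, is to map input onto a fixed set of symbols instead of calling to_sym on it:

```ruby
# Fixed lookup table: the only symbols involved are these four literals
DIRECTIONS = {
  'north' => :north,
  'south' => :south,
  'east'  => :east,
  'west'  => :west
}.freeze

# Returns a known symbol, or nil for anything else; never calls to_sym on input
def parse_direction(input)
  DIRECTIONS[input.to_s.downcase]
end

parse_direction('North')   # => :north
parse_direction('upward')  # => nil
```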

Blocks, Procs, and Lambdas

Ruby method calls can be followed by either a { ... } or a do ... end pair enclosing arbitrary code. This code can receive arguments listed between pipes (e.g. |i, j|) and refer to them.

[1,2,3].each { |i| print i } # prints 123

The code between these enclosing tokens is known as a block. A block is a chunk of code attached to a method call. Here, for #each item in the array, the block is executed. In this case, it just prints the item to standard output.

What isn’t obvious to new Rubyists is how each item in the array gets inside the block. A great way to understand this is to write our own #each.
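Here is a minimal sketch of a Colors class with its own class-level #each. It deliberately avoids Array#each and hands each color to the block with yield:

```ruby
class Colors
  @colors = ['Red', 'Green', 'Blue']   # class instance variable

  def self.each
    index = 0
    while index < @colors.length
      yield @colors[index]   # runs the attached block with one color
      index += 1
    end
    @colors
  end
end

Colors.each { |color| puts color }   # prints Red, Green and Blue on separate lines
```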

The yield keyword executes the block with the arguments passed to it. In other words, the arguments in a block (e.g. |i|) come from calls to yield inside the method the block is attached to. If you want to iterate through a collection, you just need to yield each item in it.

But what happens when the method is called without a block?

Colors.each #=> ...no block given (yield) (LocalJumpError)...

It turns out that yield will try to execute the block regardless of whether there really is one. If we want a flexible method that yields to a block only if one is provided, Ruby provides Kernel#block_given?.

class Colors
  @colors = ['Red', 'Green', 'Blue']
  def self.each
    if block_given?
      # send each color to the block if there is one
      # #each returns the array
      @colors.each { |color| yield color }
    else
      # otherwise just return the array
      @colors
    end
  end
end
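Ruby's own core methods usually take a different approach when no block is given: Array#each, for example, returns an Enumerator. A sketch of the same idea for Colors, using Object#to_enum:

```ruby
class Colors
  @colors = ['Red', 'Green', 'Blue']

  def self.each
    # Without a block, return an Enumerator that will call this method later
    return to_enum(:each) unless block_given?

    @colors.each { |color| yield color }
  end
end

enum = Colors.each   # an Enumerator, not an error
enum.next            # => "Red"
enum.map(&:upcase)   # => ["RED", "GREEN", "BLUE"]
```

Returning an Enumerator lets callers chain any Enumerable method onto Colors.each without the class implementing them.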

Storing Blocks

For someone coming from a language like JavaScript, blocks are a bit like anonymous functions defined inside function calls (with the caveat that there can only be one per call). JavaScript developers will also be used to storing functions in variables and passing them to other functions; these are known as first-class functions. Strictly speaking, Ruby does not have first-class functions, but its blocks can be stored in callable objects.
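One way to store a block is to prefix a method's last parameter with &, which converts the attached block into a Proc object that can be kept and called later. EventButton is a hypothetical class for illustration:

```ruby
class EventButton
  def initialize
    @handlers = []
  end

  # &handler turns the attached block into a Proc object
  def on_click(&handler)
    @handlers << handler
  end

  # Call every stored Proc and collect their results
  def click
    @handlers.map { |handler| handler.call('clicked') }
  end
end

button = EventButton.new
button.on_click { |event| "first handler saw #{event}" }
button.on_click { |event| "second handler saw #{event}" }
button.click  # => ["first handler saw clicked", "second handler saw clicked"]
```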

Ruby provides two containers for storing blocks: procs and lambdas. These containers can be created in a number of ways:
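The common creation forms are sketched below. All four produce instances of Proc; Proc#lambda? tells the two flavors apart:

```ruby
add_proc   = Proc.new { |a, b| a + b }  # a proc
add_proc2  = proc { |a, b| a + b }      # also a proc in Ruby 1.9+
add_lambda = lambda { |a, b| a + b }    # a lambda
add_stabby = ->(a, b) { a + b }         # a lambda, "stabby" syntax from Ruby 1.9

all = [add_proc, add_proc2, add_lambda, add_stabby]
all.map { |f| f.call(1, 2) }  # => [3, 3, 3, 3]
all.map(&:lambda?)            # => [false, false, true, true]
add_lambda.class              # => Proc

# Other ways to invoke them
add_lambda.(1, 2)             # => 3
add_lambda[1, 2]              # => 3
```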

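A lambda containing an explicit return executes without any problems, because return exits only the lambda. A proc is different: return inside a proc tries to return from the method in which the proc was defined. When the proc is defined at the top level there is no such method, so calling it gives us a LocalJumpError. The easiest way to get around the proc/block return issue is to avoid explicit returns. Instead, take advantage of Ruby’s implicit returns.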
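The return behavior is easiest to see inside methods. In this sketch (method names are hypothetical), the lambda's return exits only the lambda, the proc's return exits the surrounding method, and a proc whose defining method has already finished raises LocalJumpError:

```ruby
def run_lambda
  inner = lambda { return :from_lambda }
  inner.call      # return exits only the lambda
  :from_method
end

def run_proc
  inner = Proc.new { return :from_proc }
  inner.call      # return exits run_proc itself
  :from_method    # never reached
end

def make_proc
  Proc.new { return :too_late }
end

run_lambda  # => :from_method
run_proc    # => :from_proc

begin
  make_proc.call  # make_proc has already returned, so there is nowhere to return to
rescue LocalJumpError => e
  puts "LocalJumpError: #{e.message}"
end
```

Writing Proc.new { :too_late } instead, with an implicit return, makes the last case work.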

Note: In Ruby 1.8, proc created procs that checked the number of arguments like lambda, while Proc.new created what are now called procs. In Ruby 1.9+, proc was fixed to behave identically to Proc.new. Keep this in mind when using code that was written for 1.8.
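The argument-checking difference in Ruby 1.9+ looks like this: procs are lenient about the number of arguments, while lambdas check it the way methods do:

```ruby
pair_proc   = proc { |a, b| [a, b] }
pair_lambda = lambda { |a, b| [a, b] }

pair_proc.call(1)        # => [1, nil]  missing arguments become nil
pair_proc.call(1, 2, 3)  # => [1, 2]    extra arguments are dropped

begin
  pair_lambda.call(1)    # wrong number of arguments for a lambda
rescue ArgumentError => e
  puts "ArgumentError: #{e.message}"
end
```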

Local Variable Scope

Ruby has two local scope barriers – points where local variables and arguments cannot pass:

Module definitions

Method definitions

Since classes are modules, there are three keywords to look for: module, class, and def.
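A short sketch of both barriers. The local variable greeting is visible at the top level, but not inside a def or a class body (shout and Announcer are hypothetical names):

```ruby
greeting = 'hello'   # local variable at the top level

def shout
  greeting.upcase    # def opens a new scope, so greeting is undefined here
end

begin
  shout
rescue NameError => e
  puts "NameError from def: #{e.message}"
end

begin
  class Announcer
    greeting         # class opens a new scope too
  end
rescue NameError => e
  puts "NameError from class: #{e.message}"
end
```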

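So how do you work around scope barriers? It turns out that blocks inherit the scope that they are defined in. We can take advantage of this to carry local variables past these barriers by using block-taking alternatives to the three keywords: Module.new instead of module, Class.new instead of class, and Module#define_method instead of def.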

Since each nested block can see the scopes of the blocks enclosing it, even the innermost block can see the local variable. Using nested blocks to provide access to local variables in this way is often referred to as flattening the scope.
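A minimal flat-scope sketch (Speaker is a hypothetical class): Class.new replaces the class keyword and define_method replaces def, so the nested blocks can still see the top-level local variable:

```ruby
greeting = 'hello'

# Class.new takes a block instead of opening a new scope like `class`
Speaker = Class.new do
  # define_method takes a block instead of opening a new scope like `def`
  define_method(:speak) do
    greeting.upcase   # the innermost block still sees the top-level local
  end
end

Speaker.new.speak   # => "HELLO"

# The blocks captured the variable itself, not a copy of its value
greeting = 'goodbye'
Speaker.new.speak   # => "GOODBYE"
```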

Summary

Instance variables are references prefixed with ‘@’ and belong to self – the current instance.

Instance variables can be created anywhere, including modules.

The value of an unassigned instance variable is nil.

Ruby also has class variables, but they behave almost like global variables and are best avoided.

Modules are containers for methods.

Classes are modules that can also create instances and inherit from a superclass.

Ruby does not have multiple inheritance, but multiple modules can be “mixed-in” to classes.

All methods are instance methods in Ruby. Class methods are instance methods on class objects.

A block is a chunk of code attached to a method call that inherits the scope that it is defined in.

The arguments in a block (e.g. |i|) come from calls to yield inside the method.

Blocks can be stored in procs or lambdas.

Procs are like portable blocks. Lambdas are like portable methods.

Symbols are Ruby’s answer to enumerated types.

Before Ruby 2.2, symbols were never garbage collected, so generating them dynamically from user input can be a security issue.

Local variables can’t cross the scope barriers of module and method definitions.

Blocks can be used to carry local variables past scope barriers.

Next, in Part II we’ll explore more of the inner workings of the Ruby language.