Avoiding strings of SQL

Now, when you want to write a scope that matches by equality, Active Record offers a nicer syntax. For example, you could find all blog posts whose title is an exact match to a query using the Hash-based syntax. The two following lines are mostly equivalent.

BlogPost.where("title = ?", query)
BlogPost.where(title: query)

I prefer the Hash syntax because there are no string literals involved. Also, as an added bonus, Active Record uses a fully-qualified column name such as "blog_posts"."title" instead of just title, which makes it easier to use in multi-table expressions that use joins.

Also, it’s still possible to generated invalid SQL when using the two-parameter version.

BlogPost.where("title = '?'", query) # extra ' surrounds query

Luckily this example always generates invalid SQL, so it’s easy to catch. But it’s still frustrating that the most common syntax doesn’t prevent bugs from leaking all the way into the database.

Symbolic expressions

That’s where Arel steps in. Arel is a Relational Algebra gem that allows you to generate SQL queries directly from an abstract syntax tree (AST) of nodes. It’s what Active Record uses internally to build up expressions symbolically from the Hash syntax.

The #where method in Active Record can accept an Arel node. For the title equality example, it would look like this:

However, digging into the source of the method gives some clues as to how it works.

# File 'lib/arel/predications.rb', line 15
def eq other
Nodes::Equality.new self, other
end

It turns out that #eq, when called on one Arel node and passed another, creates an Arel::Nodes::Equality instance. Looking at the docs for this Equality class shows that it is a subclass of Arel::Nodes::Binary.

It turns out that an Arel::Nodes::Binary instance is a symbolic representation of a operation between two Arel nodes. In this case, you should think of “binary” as meaning “having two arguments”, as opposed to the Boolean concept of true and false.

In our example, the nodes are a database column, BlogPost.arel_table[:title], and a Ruby string, query. And Equality is the particular Binary relationship that deals with the equality operator, =.

It turns out there are other subclasses, such as Arel::Nodes::GreaterThan, for other binary operations.

Putting it all together

Anyway, coming back to our original example using trigrams, we want to find a way to create a Binary expression that Active Record will accept and turn into a SQL query that uses the % operator.

After hunting around with my pair JT Archie, we discovered a class named Arel::Nodes::InfixOperation. An “infix” operator is a binary operator such as = or + that is written between its arguments. For example, in SQL notation, you would write 1 + 2, not + 1 2 or 1 2 +.

It turns out that % is an infix operator in PostgreSQL. So we can create an InfixOperation instance directly. Looking at its constructor, we see that it takes three arguments named “operator”, “left”, and “right”.

Looking forward

As of now, my pg_search gem builds most of its SQL through creative use of Active Record SQL-escaping methods and Ruby string interpolations. As far as I know, I have done everything correctly. Thus, I feel mostly safe from SQL injection attacks and other bugs. But it’s hard to know whether or not I’ve missed a spot.

But one day soon, I hope to convert it all into Arel objects. I’ve already started, and I hope it won’t be long before a fully string-interpolation-free version is released. Then I can rest easier, knowing that a typo somewhere deep in my code won’t end up going directly into the database unescaped.