Thursday, January 10, 2008

With the recent release of Sequel, I took some time to interview Sharon Rosner about it. Sequel is a really cool project, and uses some Ruby tools in novel ways. Read on and find out more.

There are already several ORMs out there. Why write another one?

Sharon I wrote Sequel mainly because I tried ActiveRecord and it really didn’t fit what I wanted to do. The first thing that frustrated me was the lack of support for multi-threading. I was messing with Mongrel and writing my own web controller framework, and couldn’t get ActiveRecord to work properly with multiple threads. It leaked memory like a sieve, and it just felt wrong. There was also no support for connection pooling for example.

The other, even more important, issue was that ActiveRecord is great for dealing with individual records, but if you need to work with multiple records or large datasets, it kinda feels awkward. If you need to filter then you have to write raw SQL expressions, and it’s a bit hard to do GROUP BY and than sort of stuff. Also, ActiveRecord loads the entire result set into memory before you can iterate over it. Not very nice if you work with millions of records.

So I started from there and tried to design a Ruby interface for databases that would feel like Ruby code, where I didn’t have to switch between SQL and Ruby. So the basic idea was that you can express an SQL query using Ruby constructs, and then iterate over the results just like you iterate over an array or any Enumerable, fetching each record as a Ruby hash with symbols for keys:

DB[:posts].filter(:category => 'ruby').each {|row| puts row[:title]}

I actually wrote Sequel to be used in this project I was working on (and still am), and added more features as I needed them. The development of Sequel is still very much feature-request driven. People ask for stuff and if it’s a good idea we add it the library.

Apart from that, I like the fact that there are several different ORM’s for Ruby. Each has a different mindset, each has its pros and cons. It’s very much in the Ruby spirit – one of the things I like most about Ruby is that you can write the same stuff a million different ways. Some people dislike it, but I think it’s brilliant!

What are some of the cooler things that you’ve added to Sequel based on other people’s requests?

Sharon Some people were asking for a way to change table schemas easily, so I came up with a DSL for doing that, so you can do stuff like:

What spaces do you think Sequel plays well in? Which spaces should be left to a different approach?

Sharon One of the things that separates Sequel from other ORM’s is that you don’t really have to define model classes. In fact one of the recent changes (in version 0.5) was to divide the code into two separate gems—sequel_core which takes care of connecting to databases and fetching records, and sequel_model which implements an ORM layer on top.

With sequel_core you can fetch records as naked Ruby hashes, which you can also use to insert and update records. So that gives you a lot of freedom in querying virtually any database, without making assumptions on how the schema looks. So Sequel can be used with legacy databases and also as a general purpose tool for writing short scripts that process records. In fact, Sequel also lets you fetch records with arbitrary SQL and iterate over the results using the same interface.

Sequel can be used as a general low level database access library. It is built so you can stick any ORM model implementation on top. I know for example that there was an effort to graft Og on top of Sequel, so this kind of stuff can be done. Sequel already has adapters for ADO, ODBC, PostgreSQL, MySQL, SQLite, and DBI. There are also experimental adapters for Oracle, Informix, DB2, OpenBase and JDBC (on JRuby). In that respect, I believe Sequel can a good replacement for DBI.

The flip side, however, is that Sequel is not really made for Ruby beginners. The API is very terse and very powerful if you know how to use it, but newcomers might be bewildered by how short everything looks. :-) They better stick with ActiveRecord, Og or DataMapper.

What other DB tools have you looked at, learned from as you’ve worked on Sequel?

Sharon Actually a lot of my inspiration came from two Python frameworks: web.py, which is sort of a micro web-framework, and sqlalchemy, which is in my opinion a brilliant piece of work. I haven’t used, but I read a lot of the documentation and grabbed some ideas from there, like for example using a URL to specify a database connection, or the separation between a layer that deals with fetching records and a modelling layer.

What have you learned about Ruby while you’ve been working on Sequel?

Sharon I learned a lot about meta-programming. I got a lot from reading the stuff that _why wrote. He’s probably the brightest most genius Ruby programmer in existence today! A lot of parts of Sequel are written using meta-programming techniques. Once you know how to use those, you can do nuclear stuff!

The biggest discovery I made though was ParseTree. It’s a library that takes your code and gives you back a parse tree made from arrays with symbols in them. It’s the ultimate meta-programming tool. I believe it’s really the next step for Ruby programming and should become part of the Ruby core. Unfortunately, it doesn’t work with Ruby 1.9, at least for now.

Sequel uses ParseTree to translate Ruby code into SQL. So for example, the following code:

DB[:items].filter {:score > 100 and :category.in('ruby', 'perl')}

Is translated into the SQL statement “SELECT * FROM items WHERE (score > 100) AND (category IN (‘ruby’, ‘perl’))”

The Ruby to SQL translator (I call it “The Sequelizer”) is also smart enough to evaluate Ruby expressions, so you can also use constants, variables and ivars, and make calculations.

Another important thing I learned was the value of properly written specs and good code coverage. The thing about rspec is that unlike unit testing, when you write specs you tend to repeat a lot of expectations. I found that with specs the code gets a much more thorough work-out than with unit tests. A lot of stuff gets tested multiple times under different scenarios.

RSpec has also proved to be an indispensable tool for debugging, but this really requires a change in how you work. When you work on a bug, instead of just hammering out a solution, you first write a spec that will fail, setting expectations that expose the bug. Only then do you fix your code so the spec will pass. This seems trivial but it’s important when you develop a library used by hundreds of people. When you work this way you can also ensure that the bug doesn’t return when you make changes to your code.

I use RCov in conjunction with RSpec to make sure every line of code is covered by specs and as of version 1.0 we have 100% code coverage!

Are you using other coverage tools (like dcov or heckle)? Why or why not?

Sharon Both dcov and heckle are things I haven’t looked into in depth. Sequel is not very strong on documentation, and we should put a lot more effort into it. As regards heckle, I tried playing with it a few times, but eventually it would always make the specs hang. Some of the Sequel specs really wreak havoc with Ruby threads and loading/unloading of classes and methods, so I don’t know if heckle can really be beneficial for us. As many other people have already observed, 100% code coverage doesn’t mean there are no bugs or that the code is perfect, but it’s still, together with specs, a good indication of code quality.

Often, people ask if you need to know Ruby to use Rails. How much SQL do you need to get started with Sequel and how much SQL do you need to know to really use Sequel well?

Sharon That depends. The abstraction provided by ORM’s is never perfect, and if you’re going to deal with a relational database, you pretty much need to understand SQL. But with Sequel, you don’t need to know how columns, strings and other literal values should be quoted, or worry about SQL injection, or what’s the correct order of SQL clauses.

There’s a #filter method for specifying record filters, an #order method for specifying the order, a #limit method for limiting the size of the result set and so on. Sequel has idioms for column references (using symbols), SQL functions ( e.g. :max[:price]), array subscripts (as shown above), qualified column names, column aliasing, etc, so the mapping of Ruby to SQL is pretty much complete. There’s also substantial support for sub-queries, which is a believe a unique feature of Sequel. And you can always see what the SQL looks like by calling #sql. Sequel also provides a DSL for defining and changing table schemas. In that respect, you can forget how SQL statements look and manipulate your database using Ruby code.

So if you’re just writing a blog app and need to put some stuff in a database, you don’t really need to know SQL, but if you’re dealing with large data sets or complex relationships then you better have a good understanding of how relational databases work and how records are fetched, and that involves at least some knowledge of SQL.

What books, websites, etc. would you recommend for people wanting to learn more about SQL?

Sharon I’m really not the type that reads books on programming, I just dive right in. There are though tons of articles on the web about the subject. That’s especially useful if you want to look at how to do more complex stuff like multiple joins, sub-queries, grouped queries etc.

Sharon Not that I know of, and I’m not really the person to take on that kind of project. But if somebody wants to go ahead and do it, that would be like super cool!

What’s the coolest thing that someone’s done with Sequel?

Sharon I’d have to say Hackety Hack, which is a project by _why to make programming accessible to kids. He wraps Sequel a bit to make it more friendly but you can see a bit of Sequel code right there on the front page!