For many developers, Rails starts with the database. One of the most compelling features of Rails is ActiveRecord, the object-relational mapping (ORM) layer. ActiveRecord does such a good job of hiding the gory details of SQL from the programmer that it almost seems like magic.

However, as Joel Spolsky says, all abstractions are leaky. There is no perfectly transparent ORM system, and there never will be, due to the fundamentally different nature of the object-oriented and relational models. Ignore the underlying database at your own peril.

Database Management Systems

The Rails community has been built around the MySQL database management system (DBMS*) for years. However, there are still a lot of misconceptions surrounding DBMSs, especially when used with Rails. While MySQL has its place, it is certainly not the only option. In the past few years, support for other databases has vastly grown. I encourage you to keep an open mind throughout this chapter, and weigh all criteria before making a decision on a DBMS.

Rails supports many DBMSs; at the time of this writing, DB2, Firebird, FrontBase, MySQL, OpenBase, Oracle, PostgreSQL, SQLite, Microsoft SQL Server, and Sybase are supported. You will probably know if you need to use a DBMS other than the ones mentioned here. Check the RDoc for the connection adapter for any caveats specific to your DBMS; some features such as migrations are only supported on a handful of connection adapters.

PostgreSQL

I list PostgreSQL* first because it is my platform of choice. It is one of the most advanced open source databases available today. It has a long history, dating back to the University of California at Berkeleys Ingres project from the early 1980s. In contrast to MySQL, Postgres has supported advanced features such as triggers, stored procedures, custom data types, and transactions for much longer.

PostgreSQLs support for concurrency is more mature than MySQLs. Postgres supports multiversion concurrency control (MVCC), which is even more advanced than row-level locking. MVCC can isolate transactions, using timestamps to give each concurrent transaction its own snapshot of the data set. Under the Serializable isolation level, this prevents such problems as dirty reads, nonrepeatable reads, and phantom reads. See the upcoming sidebar, Multiversion Concurrency Control, for more information about MVCC.

One advantage that PostgreSQL may have in the enterprise is its similarity to commercial enterprise databases such as Oracle, MS SQL Server, or DB2. Although Postgres is not by any means a clone or emulation of any commercial database, it will nevertheless be familiar to programmers and DBAs who have experience with one of the commercial databases. It will also likely be easier to migrate an application from Postgres to (say) Oracle than from MySQL to Oracle.

PostgreSQL has an unfortunate reputation for being slow. It got this reputation because the default configuration is optimized for performance on a tiny machine. Therefore, it will perform fairly consistently out of the box on a server with as little as 64 MB of RAM or as much as 64 GB. Like any database, Postgres must be tuned for any serious use. The official documentation at http://www.postgresql.org/docs/ has lots of great information on performance tuning.

One disadvantage of using PostgreSQL is that it has a smaller community around it. There are more developers, especially in the Rails world, working with MySQL. There are more tested solutions built around MySQL than PostgreSQL. The company behind MySQL, MySQL AB, provides commercial support for its product. There is no such centralized support structure for Postgres, as there is no single company behind PostgreSQL; however, there are several companies that specialize in Postgres consulting and provide support contracts.

Multiversion Concurrency Control

Multiversion concurrency control (MVCC) is one of the most powerful ways to achieve isolation between concurrent database transactions. MVCC gives each transaction a snapshot of the data it accesses, as the data existed at the start of the transaction. The transaction performs actions on the data, which are logged with timestamps. When the transaction commits, the DBMS checks the logs to ensure there are no conflicts with other transactions; if the transaction can be performed successfully, it is applied to the database at once, atomically.

The alternative to MVCC is row-level locking, which is used by MySQLs InnoDB storage engine. Row-level locking locks only those rows affected by an update during a transaction (as opposed to page- or table-level locking, which are more coarse). The primary advantage that MVCC has over locking is that MVCC does not block readers. Since all update transactions are applied atomically, the database is always in a consistent state. Pending transactions are stored as logs alongside the database to be written upon commit, rather than being applied to the database in the middle of the transaction. The most significant consequence of this is that reads never block, since they are read from the database, which is always consistent.

It is important to realize that isolation for concurrent transactions usually trades off against performance. MVCC uses more storage space than locking because it has to store a snapshot for each in-progress transaction. And though MVCC never blocks reads, the DBMS may roll back update transactions if they cause a conflict.