Tips for Optimizing Rails on Oracle

by Luca Mearelli

Learn tips and techniques to build more efficient Rails applications on Oracle Database.

Published May 2007

Using the Ruby on Rails framework it is relatively easy to develop complex systems that are backed by an Oracle database, but this ease doesn't exempt the developer from the need to optimize the data model and the code that manipulates it.

This article is for developers working with Ruby on Rails applications on an Oracle database, with special attention to those having a live application that needs to be optimized. You will learn which options tweak the parameters that handle the connection to the database for the Oracle adapter to boost the application performance, as well as get a useful plugin to track down the query execution plans for live applications. You'll build a simple set of models and look at various typical queries on them, exploring how you can improve their efficiency.

Setup

The prerequisite to follow this article is having Ruby on Rails installed and configured properly to connect with an Oracle database. "The Ruby on Rails with Oracle FAQ" by Casimir Saternos explains the process in detail. (It's quite easy.)

For the remainder of this article I will make examples based on a Rails "version" of the HR schema that has been modified to directly comply with the common Rail conventions (as opposed to using the standard HR schema from a Rails application, explained in Saternos' "HR Schema On Rails"). This choice has been dictated by the wish to show the behavior of a standard Rails application on an Oracle database without being distracted by the specifics of handling a separate schema to map the HR schema into the Rails conventions. (Which is indeed a very clever way to handle legacy schemas!)

In the sample code file you will find the complete migrations for the schema and a (yml format) dump of the data which can be used to load them using the Active Record fixtures plugin, which is also included. (The db/db_loading.rb file does the data import and can be executed using script/runner).

The six tables are populated by the very same data that the regular HR schema holds, while they have some columns modified. (E.g., all of them now use the common Rails convention of having one id column as their primary key.)

The models also define some relations among them; you can see the full code for the models in the sample code file:

Part 1: Of configuration, Cursors, and Rows

Basic Configuration

So, you have already configured all the required prerequisites for running Rails on an Oracle database, and bootstrapping your Rails application is just a matter of opening a prompt window (terminal, xterm, or whatever) and typing:

rails myapp

This will reward you with a skeleton application that is ready to hold your code (models, views, controller and tests). You will need to customize the config/database.yml file telling the framework basic information for each one of the databases that Rails uses.

Each entry is identified with a key that tells the system if the specific information is to be used while developing the application, while running the tests or when running the application in production mode. You could even define other database connections beyond the standard three and use them from your Rails application, but this subject goes beyond the scope of this article.

Having separate database settings means that you may modify the values in the development or testing database and conduct your tests and measures without touching the production database.

The adapter parameter specify that we are connecting with an Oracle database and selects the OracleAdapter (see: activerecord/lib/active_record/connection_adapters/oracle_adapter.rb ). Older Rails releases used the 'oci' value to identify the oracle adapter, and this value can still be used.

With the host parameter you specify the database to which you want to connect. If you have configured TNS entry for your database:

you may use its name for the host parameter; otherwise, you can use the Easy Connect naming convention to address the Oracle instance (host: //10.211.55.7:1521/XE).

The username and password are the credentials you want to use for your application. All the objects (tables, sequences) needed by the application should be accessible from this user and anything built thorough the database migrations will be built into this user's schema (at least this is the default behavior).

Models and SQL Queries

Now we will look at some of the queries that are generated by ActiveRecord, and this is made simple by the fact that while in development mode Rails writes all the queries it is performing to the log file. In this manner you can familiarize yourself with the behavior of the framework and gain greater confidence in what the tool will do.

I find that having a deeper understanding of how the framework code is going to translate your high level requests into actual SQL is a required step when working with powerful frameworks that make complex interactions with the data. You will gain a better understanding of the Rails idioms that should be used and how to write more efficient code.

One of the most common ways to get the model data is looking up an object by its id:

emp = Employee.find(202)

will get you the employee record with id = 202. Looking at the development.log we will see the queries that are getting executed:

The first query is executed once the first time a model class is loaded to pull the columns from the employee table allowing the system to dynamically generate the required methods for the model based on the database structure. (This is one of those features that make Rails development really quick!) Be warned that while doing your activity and running the server in development mode you will see this query repeated many times as the models get reloaded on each browser request. This makes the server slower but allows you to get the updates to the database structures without restarting the server.

The second query gets the data from the employees table for the employee whose id is 202.

Other examples of the models may be the following (all presented with the actual query executed on the database):

This is really handy and efficient when working with objects behaving as containers for sets of other items as it avoids going to the database with a query for each and every row you want to manipulate.

One of the points of this article is to show you the new configuration parameters introduced with the release 1.2 of Rails that may help in tuning the database-related performance of your application, but before going deeper into it let me explain how currently Rails builds the actual SQL used.

Bind Variables and Cursor Sharing

ActiveRecord uses its (dynamically built) knowledge of the models and database to create queries on the fly. It even builds many methods of the models themselves in this way allowing the developer to write things like:

Employee.find_by_first_name_and_last_name('Steven','King')

but it does so by building the SQL query as a string and doing parameter interpolation before sending the query to the adapter for execution (without using bind variables). SQL injection and security problems are attacked at the adapter level by quoting the values on SQL interpolation but the performance hit remains, and it also impacts the scalability of any Rails application running on Oracle. To understand how this impacts performance and scalability, let me explain what happens when a query is executed.

When a query is first executed Oracle performs a hard parse step which verifies that it is syntactically correct and that all the objects (tables, columns, ...) being accessed really exist. If the checks pass, a new entry is created in the library cache with the parse results that may be reused for subsequent executions. The key to the library cache is the SQL text for the query itself, therefore two queries that have any literal difference are considered different for the purposes of parsing.

When a new query is executed, a hash lookup into the library cache is done to see if the query had been already parsed. If the query is found then the cached parse result is used and the hard parse step is avoided, moreover no new entries are added to the library cache. A way to minimize hard parses is to always use bind parameters.

Within the context of a pre-1.2 Rails application, without any specific database tuning this means that each time a different query is executed (even if it differs just for the parameter values), the statement is hard parsed by the database engine and a new query entry is inserted in the library cache.

You can verify what happens by looking at the v$sql view where the sql_text column holds the actual SQL used for the queries: each row in v$sql corresponds to an entry in the library cache. We may use the following query to get all the queries generated by our application (limiting ourselves to the queries that hit one of the 6 tables):

select sql_text from v$sql
where ( lower(sql_text) like '%employee%'
or lower(sql_text) like '%countries%'
or lower(sql_text) like '%departments%'
or lower(sql_text) like '%jobs%'
or lower(sql_text) like '%locations%'
or lower(sql_text) like '%regions%'
)
and not lower(sql_text) like '%v$sql%'
order by sql_text
If we do the following:
(1..200).each do |id|
Employee.find(id) rescue nil
end

which tries to get each Employee with id between 1 and 200. You'd end up with the following in v$sql:

This is not an advisable behavior from an application expected to scale, especially for the common patterns of use where you may expect many queries using the primary key (e.g. accessing to the user table for login purposes or for configuration data).

Work is under way to provide Rails with proper handling of bind variables, or at least to let ActiveRecord adapters themselves do the binding of parameters within the query. In the meantime the best option is to use the ability to handle query rewriting built into the database.

A specific configuration offered by Oracle comes to the rescue here. The CURSOR_SHARING database parameter changes the way the database behaves doing hard and soft parses in presence of queries that are not using bind parameters. The parameter can be set to exact, similar, or force (and the latter can be set databasewide or on a specific session).

exact is the default value where the database behaves as outlined above.

force tells the database to rewrite all the queries substituting the literals in the SQL text with bind parameters (e.g. :"SYS_B_0") resulting in one parsed entry for each query.

similar rewrites the query but also looks at the effect of such rewrite and generates different entries in library cache for queries that would end up using different execution plans—that is, if substituting the literal with a bind parameter would end up in a different plan being executed then the replacement is not done.

Before Rails 1.2, to change the cursor sharing setting you'd need to change it for the entire database instance or patch the ActiveRecord adapter. From the last major release an adapter configuration parameter has been added to do just that.

The cursor_sharing setting can be used in database.yml to choose the preferred value without messing with the global database parameters (thus being better citizens in an heterogeneous environment where other applications may require different settings).

Moreover the default value has been chosen to be similar, which means that without needing to add cursor_sharing to the parameters you get a reasonable behavior: the queries are changed according to the plan that would be used depending on the values.

If you redo the query above you'll see this in v$sql:

SELECT * FROM employees WHERE (employees.id = :"SYS_B_0")
just one library cache entry for any Employee.find(id) call.

To see the impact of similar vs. force, update the table to have some skewed data:

This is because the two queries generated would use two different plans depending on the salary value. (In the first case the index would be used while in the second case a full table scan would result being the best plan.) Using force cursor sharing would tell the database to use to the same parsed entry for both queries which is not what we want in this case, even if it meant having only one cache entry.

Having a configuration available for this from Rails, let's tune its value on a per-application case, depending also on which kind of data you have and on which patterns of use you are seeing for your application.

A second parameter added to Rails 1.2, which may help in tuning the application, is the prefetch_rows configuration. This allows setting the OCI_ATTR_PREFETCH_ROWS connection parameter, which specifies the amount of rows to be pre-fetched from the database on each round trip. It can give a great performance boost in all cases where you want to fetch many rows.

Typical patterns are iterations over a set of entities like in:

Employee.find(:all, :conditions=>"salary < 5000")

In these cases rows pre-fetching means fewer hits to the database, while the cost is more memory used by the adapter—which is not a problem anyway if you want to process all the rows returned by the query.

The default value is set to 100 which has been identified by the Oracle adapter contributors to be ideal in a range of test cases, but more than for cursor_sharing the prefetch value needs to be tweaked by benchmarking with the real data-sets and queries used in the application. For custom tuning this parameter is set in database.yml as usual:

Part 2: Execution plans and Indexes

What Rails Cannot Do

While Rails goes great lengths toward reducing the burden of development through a clever use of conventions and design pattern implementation, there are situations where it cannot have the required knowledge to make the decisions for the developer. Thus, the Rails developer should not forget that he or she is building a database-backed application, nor which database he or she is developing for.

While most of the coding is doable in a fairly database independent way a good understanding of how a database works is required if you want to get the best performance out of your applications.

Knowing how your database works is useful while developing the data model and while building the queries, but it is much more important when deciding which indexes you need on your tables. This is something that the framework cannot do and does not even attempt to do, as it would depend on:

the specific database that is used (as it might have or not have some capability when it comes to creating and using indexes)

the data that is present in the tables (as they influence the query execution plan and which indexes are used)

the specific application (as the actual queries used may access the data in unpredictable ways based on usage patterns)

Moreover these factors influence each other and any optimization will depend on all of them collectively. In his blog post "Indexing for DB Performance", Jamis Buck gives some tips on the kind of indexes that might be needed:

foreign keys (has_one, belongs_to, has_many, ... relations)

non-index fields that are frequently queried upon (e.g. login names)

columns where you sort of do calculations frequently

sets of columns that are used as conditions together

Each index adds some overhead as it needs to be updated on data insert and update, so you should not add indexes for every column (or combination of them). Rather you should make informed choices starting with some reasonable set of indexes, modifying them as the application evolves over time, taking into account the actual data and usage patterns.

Explain Plan. In an Oracle context, this means using the explain plan command to understand how queries are executed. It gives the developer a way to optimize the query itself and to eventually add indexes.

You should always try to optimize the query first and only afterward go on to add the indexes.

Rails makes it rather easy to use custom queries instead of a regular one with the handy find_by_sql method of ActiveRecord. Note however that in the find_by_sql call you could simply pass the text of the SQL query, but you should really confirm that you use the parameter substitution form, passing an array with the query and the list of parameters. This way you can use the parameter quoting that is in the database adapter and you make sure that you will use real bind variables in future when ActiveRecord supports them without the need to modify your code.

Query_Analyzer Plugin. While looking for bottlenecks in your application it is important that you get to know which queries are executed. The Rails log files can be really useful here, as the application while running in development mode logs all the actual queries. They might be collected and examined one by one to find the queries and generate the execution plans. This can become tedious work. The query_analyzer plugin helps relieve the burden by dumping the explain plan output in the log file. You can get an Oracle compatible release here; it was originally developed by Bob Silva for MySQL. To install it just unzip the archive contents in the vendor/plugins directory on your Rails project.

To allow the plugin to work in Oracle the database user will need to be given enough privileges on the data dictionary tables that are queried on loading the model. The following statement will do it:

GRANT SELECT ANY DICTIONARY TO < database user >;

The plugin monkey patches the database adapter to add the explain plan for each query that is executed by the adapter while running at a logging level below INFO. The plan is formatted and printed at the debug level to the default logger. This default behavior makes sure that the plan is not executed while running on the production environment.

Two parameters are available to modify the default behavior; both can be modified within the environment setup files: The plan_table_name parameter can be used to specify the name of the plan table and its default is the standard PLAN_TABLE; the plan_details parameter can be used to specify which details should be printed by the explain plan statement. Possible values are:

This is the result of querying all the employees that have first name 'Stephen': Employee.find(:all, :conditions=>["first_name = ?", "Stephen"]).

Note that this query needs to scan the full EMPLOYEES table to get just one row when using a filter on the first_name column. If this kind of query is seen many times through the application, you should consider adding an index. The predicate information for the more costly operations gives you a good hint at what indexes may benefit the query execution—in this case, obviously an index on the first_name column.

Rails Migrations. You can also use Rails Migrations to manage indexes. (See the API docs for more information on what Migrations are and how to use them to manage a database schema.) To add an index use:

add_index table_name, column_names, options = {}

this adds an index on "table_name" for "column_names" where the latter can be a single column or a list of columns:

As you have seen it is really easy to use the plugin to poke at the execution plans used by a running application and you may even leave it installed on the production server and disable/enable it when needed by changing the logging level.

Conclusion

In this article you dug a bit into how connections to an Oracle database are configured in a Rails application, and you have seen how the framework has been updated in the 1.2 release to have better performances thorough the use of the cursor_sharing and prefetch_rows parameters (while waiting for a real bind variable implementation).

You have also examined the Rails migration commands that are relevant to tuning the database (specially creating/removing indexes).

Finally, as you've learned, a good understanding of both how Rails builds the SQL from the code and how the database executes them is needed to get the best performance. The provided plugins should help here. Luca Mearelli [ http://spazidigitali.com] is a specialist in Oracle and Web technologies based in Città di Castello, Italy. Send us your comments