Implementing a new backend

Hadley Wickham

2017-06-18

The goal of this document is to help you implement a new backend for DBI.

If you are writing a package that connects a database to R, I highly recommend that you make it DBI compatible because it makes your life easier by spelling out exactly what you need to do. The consistent interface provided by DBI makes it easier for you to implement the package (because you have fewer arbitrary choices to make), and easier for your users (because it follows a familiar pattern). In addition, the DBItest package provides test cases which you can easily incorporate in your package.

I’ll illustrate the process using a fictional database called Kazam.

Getting started

Start by creating a package. It’s up to you what to call the package, but following the existing pattern of RSQLite, RMySQL, RPostgres and ROracle will make it easier for people to find it. For this example, I’ll call my package RKazam.

In your DESCRIPTION, make sure to include:

Imports:
DBI (>= 0.3.0),
methods
Suggests:
DBItest, testthat

Importing DBI is fine, because your users are not supposed to attach your package anyway; the preferred method is to attach DBI and use explicit qualification via :: to access the driver in your package (which needs to be done only once).

Testing

Why testing at this early stage? Because testing should be an integral part of the software development cycle. Test right from the start, add automated tests as you go, finish faster (because tests are automated) while maintaining superb code quality (because tests also check corner cases that you might not be aware of). Don’t worry: if some test cases are difficult or impossible to satisfy, or take too long to run, you can just turn them off.

Take the time now to head over to the DBItest vignette. You will find a vast amount of ready-to-use test cases that will help you in the process of implementing your new DBI backend.

vignette("test", package ="DBItest")

Add custom tests that are not covered by DBItest at your discretion, or enhance DBItest and file a pull request if the test is generic enough to be useful for many DBI backends.

Driver

Start by making a driver class which inherits from DBIDriver. This class doesn’t need to do anything, it’s just used to dispatch other generics to the right method. Users don’t need to know about this, so you can remove it from the default help listing with @keywords internal:

Connection

Next create a connection class that inherits from DBIConnection. This should store all the information needed to connect to the database. If you’re talking to a C api, this will include a slot that holds an external pointer.

Now you have some of the boilerplate out of the way, you can start work on the connection. The most important method here is dbConnect() which allows you to connect to a specified instance of the database. Note the use of @rdname Kazam. This ensures that Kazam() and the connect method are documented together.

Replace ... with the arguments needed to connect to your database. You’ll always need to include ... in the arguments, even if you don’t use it, for compatibility with the generic.

This is likely to be where people first come for help, so the examples should show how to connect to the database, and how to query it. (Obviously these examples won’t work yet.) Ideally, include examples that can be run right away (perhaps relying on a publicly hosted database), but failing that surround in \dontrun{} so people can at least see the code.

Next, implement show() and dbDisconnect() methods.

Results

Finally, you’re ready to implement the meat of the system: fetching results of a query into a data frame. First define a results class:

Then write a dbSendQuery() method. This takes a connection and SQL string as arguments, and returns a result object. Again ... is needed for compatibility with the generic, but you can add other arguments if you need them.

The hardest part of every DBI package is writing the dbFetch() method. This needs to take a result set and (optionally) number of records to return, and create a dataframe. Mapping R’s data types to those of your database may require a custom implementation of the dbDataType() method for your connection class:

With these four methods in place, you can now use the default dbGetQuery() to send a query to the database, retrieve results if available and then clean up. Spend some time now making sure this works with an existing database, or relax and let the DBItest package do the work for you.

SQL methods

You’re now on the home stretch, and can make your wrapper substantially more useful by implementing methods that wrap around variations in SQL across databases:

dbQuoteString() and dbQuoteIdentifer() are used to safely quote strings and identifiers to avoid SQL injection attacks. Note that the former must be vectorized, but not the latter.

dbWriteTable() creates a database table given an R dataframe. I’d recommend using the functions prefixed with sql in this package to generate the SQL. These functions are still a work in progress so please let me know if you have problems.

dbReadTable(): a simple wrapper around SELECT * FROM table. Use dbQuoteIdentifer() to safely quote the table name and prevent mismatches between the names allowed by R and the database.

dbListTables() and dbExistsTable() let you determine what tables are available. If not provided by your database’s API, you may need to generate sql that inspects the system tables.

Full DBI compliance

By now, your package should implement all methods defined in the DBI specification. If you want to walk the extra mile, offer a read-only mode that allows your users to be sure that their valuable data doesn’t get destroyed inadvertently.