Welcome to the SQL-based analysis for large surveys project!

Analysing large, complex surveys, such as the American Community Survey, using R to generate SQL code for MonetDB.

This project used to have two packages. It now has just one: sqlsurvey, for the analysis of large surveys. You will also need MonetDB.R, which replaces my earlier JDBC-based interface. Other MonetDB-to-R interfaces exist, but they will not work with the sqlsurvey package, because we had to extend the R DBI interface to handle concurrency problems arising from garbage collection.
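As a minimal sketch of how MonetDB.R is used, the code below opens a DBI connection to a running MonetDB server and issues a query. The database name, credentials, and table name are illustrative, not part of any actual ACS setup.

```r
## Minimal sketch: connect to MonetDB via the MonetDB.R DBI driver.
## Assumes a MonetDB server is already running locally; the database
## name, credentials, and table name below are hypothetical.
library(MonetDB.R)

con <- dbConnect(MonetDB.R(), host = "localhost", dbname = "acs",
                 user = "monetdb", password = "monetdb")

## Ordinary DBI calls work; sqlsurvey builds its SQL on top of this.
dbGetQuery(con, "SELECT COUNT(*) FROM acs3yr_person")

dbDisconnect(con)
```

The sqlsurvey package takes a connection like this and a database table and generates the SQL for survey estimation itself, so the data never need to fit in R's memory.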

Both packages require MonetDB, so installation is more complicated than just installing an R package.
Under Windows it is important to use a 64-bit version of MonetDB to allow the creation of large databases.

While it is possible to read data into R and then save it into MonetDB using dbWriteTable, this is very inefficient for large files; it is better to construct the database table first and load the data directly using the MonetDB console client. Here is a script that reads the whole-US ACS 3yr person data, which comes in four CSV files, into a table in MonetDB.
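The general shape of such a load script is sketched below, to be run in the MonetDB console client (mclient). The table name, column list, and file path are illustrative; the real ACS person file has several hundred columns, and you would repeat the COPY statement once per CSV file.

```sql
-- Sketch of a direct bulk load, run in mclient.
-- Table and column names are illustrative, not the actual ACS schema.
CREATE TABLE acs3yr_person (
    serialno BIGINT,
    pwgtp    INTEGER,
    agep     INTEGER
    -- ...one column per field in the CSV files...
);

-- COPY INTO bulk-loads a CSV file server-side, far faster than
-- round-tripping the data through R with dbWriteTable.
-- OFFSET 2 skips the header row; NULL AS '' maps empty fields to NULL.
COPY OFFSET 2 INTO acs3yr_person
  FROM '/data/ss_pus_a.csv'
  USING DELIMITERS ',', '\n', '"'
  NULL AS '';
```

Loading server-side this way also means the CSV parsing happens inside MonetDB rather than in R, which is where most of the speed difference for multi-gigabyte files comes from.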

None of the analyses will modify any existing database table, and the R survey design objects behave as if they are passed by value, like ordinary R objects. Temporary tables are automatically dropped when the R objects referring to them are garbage-collected.

The basic design ideas for the package were described in a presentation at UseR 2007, but were not developed further because of lack of demand. The American Community Survey and some medical-record surveys, such as the Nationwide Inpatient Sample, do represent a real need, so the project has been restarted. MonetDB turns out to be much faster than SQLite for this sort of analysis, and interactive analysis of millions of records on an ordinary desktop is quite feasible.