Musings on computers and software as I continue learning and sharing software development knowledge.

Saturday, June 25, 2011

The curious case of multithreading Sqlite

In a recent project, I've embarked upon the process of saving information in a quick and lightweight manner for which the serverless database, sqlite was ideal. This is not only a flexible database supporting dynamic data types, but also an extendible one used from embedded web browsers and mobile platforms to rails based websites(good riddiance, mysql) .The database is so lightweight that you'll find yourself exclaiming it as db. It is less than 100 kb in some cases and can be used as a memory resident database for the performance conscious folks and applications.
However, the support of transactions and scalability leave a lot to desire as the documentation cleary states the goals of this database is not these.
In my case, getting started was just the matter of finding the jdbc driver and it s documentation from its website. However, upon customization, there was a need for normalization which in turn required a sequential read/write operations. As soon as this was done, the multi-threaded application that was accessing started experiencing concurrency issues around the database.
This was resolved via using a little used jni wrapper that I discovered, sqlite4java. This not only solved the concurrency issues, but also enabled transactions. The beauty of this approach is also the fact that it uses a single connection as opposed to a pooled/fresh connection per every request, which tends to be a real performance bottleneck as we scale up our application.

For such a nice framework, it is a surprise that these changes have not permeated to the java/native jdbc drivers so far and the users have to cope up with its api at the moment. This could be probably because of its tight integration with the underlying C-based api that is quite performant as compared to its
alternatives in python and java, effectively making it non compliant with the jdbc specification.

As the single connection served entire applications, exposing it as a singleton was the easiest approach :

For providing a transaction-like support, there was a need to wrap the existing operations in this queue :

public static void databaseInsert(final List list) {
queue.execute(new SQLiteJob() {@Overrideprotected Integer job(SQLiteConnection arg0) throws Throwable {insert(bidders, arg0);return null;}});}
The insert method performs the actual reads and writes in a row in a non-blocking manner which earlier used to cause locking problems with sqlite in the past.
Integration tests verified this performance gains (recollections):
DB insert (Read + Multiple writes in the operation)
1Record : ~200 ms2 s
100Records : 5s10s
10000Records : Fail~100s
As an afterthought, you can have a plain vanilla solution via the scraperwiki.com that not only gives you free cloud access for your scripts, but also its data via sqlite. This aptly highlights the difference in the mindsets of java as well as scripting developers. Here, I was able to run a rudimentary scraper in a matter of minutes but was of no use for my client due to the gathering restrictions of the running jobs there. Hopefully, we get to see some mixing of both the worlds in the near future as scripting for jvm has been making a lot of noise for the past 3 years, but not much efforts have been made by the two sides so far.
Oh, and I am really enjoying the rainy weather as I write this post!