ChEMBL Resources

Sunday, 27 October 2013

Tastypie & Chempi

One of the immediate consequences of refactoring our web services using Django, Tastypie and related approaches (as described here) is that we can run them on almost any database backend. Django abstracts communication with the database, and by using custom QueryManagers we were able to implement chemistry-specific operations, such as substructure and similarity search, in a database-agnostic manner.
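The post doesn't show the QueryManager code, so what follows is only a minimal sketch of the general idea, assuming Postgres with the RDKit cartridge. Each database vendor registers its own SQL for a named chemistry operation, and callers ask for the operation by name without knowing which backend is underneath. All function names are hypothetical. Other backends (for example a commercial cartridge on Oracle) would register their own builders, which we don't attempt to guess here.

```python
from typing import Callable, Dict, List, Tuple

# A SQL fragment plus its bound parameters.
SqlFragment = Tuple[str, List[object]]

# Hypothetical registry keyed by (database vendor, chemistry operation).
_BUILDERS: Dict[Tuple[str, str], Callable[..., SqlFragment]] = {}


def register(vendor: str, operation: str):
    """Decorator that registers a SQL builder for one backend and one operation."""
    def decorator(func: Callable[..., SqlFragment]) -> Callable[..., SqlFragment]:
        _BUILDERS[(vendor, operation)] = func
        return func
    return decorator


@register("postgresql", "substructure")
def _pg_substructure(column: str, smiles: str) -> SqlFragment:
    # RDKit cartridge: '@>' is true when `column` contains the query as a substructure.
    return f"{column} @> %s::mol", [smiles]


@register("postgresql", "similarity")
def _pg_similarity(column: str, smiles: str, threshold: float = 0.7) -> SqlFragment:
    # RDKit cartridge: Tanimoto similarity between Morgan bit-vector fingerprints.
    return (
        f"tanimoto_sml(morganbv_fp({column}), morganbv_fp(%s::mol)) >= %s",
        [smiles, threshold],
    )


def chem_filter(vendor: str, operation: str, column: str, *args, **kwargs) -> SqlFragment:
    """Return backend-specific SQL for a chemistry operation.

    `column` is interpolated directly, so it must be a trusted identifier,
    never user input; the SMILES and threshold travel as bound parameters.
    """
    try:
        builder = _BUILDERS[(vendor, operation)]
    except KeyError:
        raise NotImplementedError(
            f"{operation!r} is not implemented for backend {vendor!r}"
        ) from None
    return builder(column, *args, **kwargs)
```

Inside a custom Django Manager, one could pass `django.db.connection.vendor` as the vendor and hand the result to `QuerySet.extra(where=[sql], params=params)`; that wiring is an assumption for illustration, not the ChEMBL implementation.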

This means that, if we want, we can use only Open Source components (such as Postgres and RDKit), or elect to use optimised commercially sourced software as appropriate. However, what if we go one step further and try to use Open Hardware as well? This is exactly what we've just done: we managed to install the full ChEMBL 17 on a Raspberry Pi.

Some frequently asked questions (at least those that have been asked internally) and technical details are below:

1. How much space does it take?
12 GB, including the OS, data and all relevant software. Unfortunately, we used a 32 GB SD card, so our cloned disk image is 32 GB too, and that is the card size you will need if you would like to use it.

2. How fast is it?
We haven't run any benchmarks yet. Obviously it is slower than our online web services, but then it is a lot cheaper. That said, from some sample requests we can say that performance is certainly acceptable, and there is plenty of room for improvement. The Raspberry Pi can easily be overclocked from 700 MHz to 1 GHz, and according to some benchmarks this can roughly double application speed in some cases. The SD card we used is not the fastest either. Finally, all caching is disabled because we wanted to save disk space, but using database caching from Django's caching framework should give further major improvements, so the spare space on the 32 GB image may be worth using after all.
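For anyone who wants to try the database cache mentioned above, a minimal `settings.py` fragment might look like the following. The backend path and the `TIMEOUT`/`MAX_ENTRIES` keys are standard Django cache settings; the table name and the specific values are assumptions chosen for illustration. `MAX_ENTRIES` matters here because it bounds how much of the SD card the cache table can consume.

```python
# settings.py fragment (sketch): Django's built-in database cache backend.
# Create the cache table once with: python manage.py createcachetable
CACHES = {
    "default": {
        "BACKEND": "django.core.cache.backends.db.DatabaseCache",
        "LOCATION": "chempi_cache",  # hypothetical table name
        "TIMEOUT": 60 * 60 * 24,  # keep entries for one day (value in seconds)
        "OPTIONS": {
            # Assumed cap on cached entries, so the cache cannot fill the SD card.
            "MAX_ENTRIES": 50000,
        },
    }
}
```

With the cache configured, responses could be cached per view with Django's `cache_page` decorator, or per Tastypie resource via Tastypie's `SimpleCache`, which sits on top of Django's cache framework; which of these suits chempi best is untested.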

3. Which requests are slower?
The types of request on which chempi can be slower are:

- Image generation. If we replace the image with JSON from which the image can be drawn on the client side using an HTML5 canvas (the way we generated images in our game), it can be much faster. More on this topic in a future blog post.
- Queries using aggregate functions such as COUNT (it seems that we need to optimise our Postgres database by adding some more indexes).
- Substructure and similarity search. Again, caching, overclocking and some database and cartridge optimisation (such as choosing faster fingerprints) should address these problems. "Premature optimization is the root of all evil", so we first wanted a proof of concept that just works, not necessarily one that works super fast.
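The client-side rendering idea from the first item can be sketched as a server that sends atom coordinates and bonds instead of a PNG, leaving the drawing to the browser's canvas. The JSON schema below is hypothetical, not the format used in our game. In practice the 2D coordinates would come from a toolkit (for example RDKit's `AllChem.Compute2DCoords`); here they are made up for illustration.

```python
import json
from typing import List, Tuple

Atom = Tuple[str, float, float]  # (element symbol, x, y) 2D depiction coordinates
Bond = Tuple[int, int, int]      # (begin atom index, end atom index, bond order)


def depiction_json(atoms: List[Atom], bonds: List[Bond]) -> str:
    """Serialise a 2D depiction (hypothetical schema) for drawing on an HTML5 canvas."""
    for begin, end, _order in bonds:
        if not (0 <= begin < len(atoms) and 0 <= end < len(atoms)):
            raise ValueError(f"bond ({begin}, {end}) refers to a missing atom")
    payload = {
        "atoms": [{"el": el, "x": round(x, 3), "y": round(y, 3)} for el, x, y in atoms],
        "bonds": [{"a": b, "b": e, "order": o} for b, e, o in bonds],
    }
    # Compact separators keep the payload small.
    return json.dumps(payload, separators=(",", ":"))


# Ethanol (CCO) with made-up 2D coordinates, for illustration only.
doc = depiction_json(
    atoms=[("C", 0.0, 0.0), ("C", 1.299, 0.75), ("O", 2.598, 0.0)],
    bonds=[(0, 1, 1), (1, 2, 1)],
)
```

The browser would then parse this and draw lines and labels on a canvas, so the Raspberry Pi only has to serialise a few numbers rather than rasterise an image.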
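To illustrate why an extra index can help a filtered COUNT, here is a small self-contained demonstration. It deliberately swaps in SQLite (from Python's standard library) for Postgres so it runs anywhere. The toy `activities` table is only loosely modelled on ChEMBL's, and which indexes chempi actually needs is not known. On Postgres the equivalent steps would be `CREATE INDEX` followed by `EXPLAIN ANALYZE` on the query.

```python
import sqlite3


def query_plan(conn: sqlite3.Connection, sql: str, params=()) -> str:
    """Return the detail column of SQLite's EXPLAIN QUERY PLAN as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(str(row[-1]) for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE activities ("
    "activity_id INTEGER PRIMARY KEY, standard_type TEXT, standard_value REAL)"
)
# 1000 toy rows: every third row is a Ki measurement, the rest are IC50.
conn.executemany(
    "INSERT INTO activities (standard_type, standard_value) VALUES (?, ?)",
    [("Ki" if i % 3 == 0 else "IC50", float(i)) for i in range(1000)],
)

count_sql = "SELECT COUNT(*) FROM activities WHERE standard_type = ?"
plan_before = query_plan(conn, count_sql, ("IC50",))  # full table scan

conn.execute("CREATE INDEX idx_activities_standard_type ON activities (standard_type)")
plan_after = query_plan(conn, count_sql, ("IC50",))  # now searches the index instead

ic50_count = conn.execute(count_sql, ("IC50",)).fetchone()[0]
print(ic50_count)  # 666
```

Printing `plan_before` and `plan_after` shows the switch from a table scan to an index search; the exact wording of the plan varies between SQLite versions.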
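On the similarity side, the cost of each comparison grows with fingerprint length, which is why the choice of fingerprint matters. The following is a minimal pure-Python sketch of Tanimoto similarity search over fingerprints stored as integer bit sets. It is not how the RDKit cartridge works internally. The bit-count bound used to skip candidates is a generic trick, shown here as one example of the kind of optimisation meant above, and the fingerprints in the usage note are toy values.

```python
from typing import Dict, List, Tuple


def popcount(x: int) -> int:
    """Number of set bits (int.bit_count() does the same on Python 3.10+)."""
    return bin(x).count("1")


def tanimoto(a: int, b: int) -> float:
    """Tanimoto coefficient: |A and B| / |A or B| for fingerprints as bit sets."""
    union = popcount(a | b)
    # Convention chosen here: two empty fingerprints have similarity 0.
    return popcount(a & b) / union if union else 0.0


def similarity_search(
    query: int, library: Dict[str, int], threshold: float
) -> List[Tuple[str, float]]:
    """Return (id, similarity) pairs with Tanimoto >= threshold, best first."""
    q_bits = popcount(query)
    hits = []
    for mol_id, fp in library.items():
        # In a real index these bit counts would be precomputed and stored.
        fp_bits = popcount(fp)
        hi_bits, lo_bits = max(q_bits, fp_bits), min(q_bits, fp_bits)
        # Tanimoto can never exceed lo_bits / hi_bits, so skip hopeless candidates.
        if hi_bits and lo_bits / hi_bits < threshold:
            continue
        score = tanimoto(query, fp)
        if score >= threshold:
            hits.append((mol_id, score))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)
```

For example, `similarity_search(0b1110, {"A": 0b1111, "B": 0b1100, "C": 0b0001}, 0.6)` returns A (0.75) then B (about 0.67), and C is rejected by the bound without computing its intersection.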

4. Can I make my own chempi?
Yes. We are planning to share our SD card image, probably via the BitTorrent protocol, given the image size and some issues we faced when distributing myChEMBL. We do remember that not everyone has mega-fast broadband!

5. Is chempi useful at all?
Beyond its interest as a proof of concept (a chemical database on such small and open source hardware), we think it may have some interesting real-world applications:

- Plugging chempi into a local network makes it immediately accessible to other computers, so it is a zero-configuration demonstration of ChEMBL.
- Analogous to the thesis included in this paper, it could encourage cheminformatics education on low-cost ARM hardware.
- The Raspberry Pi can easily be fitted with a camera to perform image recognition. Combined with software like OSRA, this would make it possible to scan images of compounds and search for them in the database.
- Adding an e-ink display (for example, a jailbroken Kindle?) could produce a very interesting small machine...

6. What are some of the technical details?

To deploy our web services (which are just another Django application) we used Gunicorn as the application server, sitting behind NGINX, which talks to it over a Unix domain socket. To run it as a daemon and launch it at machine startup, we used Supervisor. We believe this is an ideal way to deploy Django not only on the Raspberry Pi but on any production machine, so if you would like to run the ChEMBL web services locally in your company or academic group, we suggest doing it this way.
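Gunicorn reads its configuration from a plain Python file, so the Gunicorn part of this setup can be sketched as below. The socket path and the timeout value are assumptions, not our actual settings. The worker formula is the starting point suggested in Gunicorn's documentation.

```python
# gunicorn.conf.py: a sketch of a Gunicorn configuration for a Raspberry Pi.
import multiprocessing

# Listen on a Unix domain socket; NGINX forwards HTTP requests to it.
bind = "unix:/tmp/chembl_ws.sock"  # hypothetical socket path

# Gunicorn's documentation suggests (2 x CPU cores) + 1 workers as a starting point;
# on a single-core Raspberry Pi this gives 3.
workers = multiprocessing.cpu_count() * 2 + 1

# Assumption: give slow substructure and similarity searches more time than the
# 30-second default before a worker is killed and restarted.
timeout = 120
```

Supervisor would then run something like `gunicorn -c gunicorn.conf.py yourproject.wsgi:application` (the project module name is hypothetical) as its `command=`, and NGINX would forward requests with a directive such as `proxy_pass http://unix:/tmp/chembl_ws.sock;`.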