TechWeekend LiveBlog: NoSQL + Database in the Cloud #tw6

This is a quick-and-dirty live-blog of TechWeekend 6 on NoSQL and Databases in the Cloud.

First, I (Navin Kabra) gave an overview of NoSQL systems. Since I was talking, I wasn’t able to live-blog it.

When not to use NoSQL

Next, Dhananjay Nene talked about when not to use NoSQL. Main points:

People know SQL. They can leverage it much faster than they could one of the non-standard interfaces of these new-fangled systems.

When reporting is very important, having SQL is much better. Reporting systems support SQL. Re-doing that with NoSQL will be more difficult.

Consistency and transactions are often important. Going to NoSQL usually involves giving them up. Unless you are really, really sure you don’t need them, this issue might come back to bite you.

If you’re considering using NoSQL, you had better know what the CAP theorem is, and you had better really understand what the C, A, and P in it mean (consistency, availability, and partition tolerance). Don’t even consider NoSQL until you’re very well versed in these concepts.

RDBMS can really scale quite a lot – especially if you optimize them well. So 90% of the time, it is very likely that the RDBMS is good enough for your situation and you don’t need NoSQL. So don’t go for NoSQL unless you are really sure that your RDBMS won’t scale.
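To make the reporting and transaction points above concrete, here is a minimal sketch using Python's built-in sqlite3 module. The orders table, its columns, and the sample rows are all hypothetical and invented for illustration; nothing here was shown in the talk. The report is a single SQL statement, and the failed two-row insert is rolled back as a unit.

```python
import sqlite3

# Hypothetical schema and data, invented purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "customer TEXT, city TEXT, amount REAL CHECK (amount >= 0))"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        ("asha", "Pune", 120.0),
        ("ravi", "Pune", 80.0),
        ("meera", "Mumbai", 200.0),
        ("asha", "Pune", 50.0),
    ],
)
conn.commit()


def revenue_by_city(connection):
    """Ad hoc report: total order amount per city, largest first."""
    cursor = connection.execute(
        "SELECT city, SUM(amount) AS total "
        "FROM orders GROUP BY city ORDER BY total DESC"
    )
    return cursor.fetchall()


def add_order_pair(connection, first, second):
    """Insert two orders together; if either fails, neither is kept."""
    try:
        # Used as a context manager, the connection commits on success
        # and rolls back if an exception is raised inside the block.
        with connection:
            connection.execute("INSERT INTO orders VALUES (?, ?, ?)", first)
            connection.execute("INSERT INTO orders VALUES (?, ?, ?)", second)
        return True
    except sqlite3.IntegrityError:
        return False


report = revenue_by_city(conn)

# The second order violates the CHECK constraint, so the first one
# is rolled back along with it.
pair_saved = add_order_pair(
    conn, ("ravi", "Pune", 10.0), ("ravi", "Pune", -5.0)
)
order_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
```

In a store without a query language or multi-statement transactions, both the grouping logic and the all-or-nothing behaviour would have to be written by hand in the application.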

MongoDB the Infinitely Scalable

Next up is BG, talking about MongoDB, the Infinitely Scalable. They are using MongoDB in production for http://paisa.com (Infinitely Beta). The main points he made:

MongoDB is based on the idea that JSON is a well-understood format for data, and that it is possible to build a database with JSON as the primary data structuring format.

The data is stored on disk using BSON, a binary format for storing JSON.

Obviously, JavaScript is the natural language for working with MongoDB. So you can use JavaScript to query the database, and also for “stored procedures”.

MongoDB does not really allow joins; but with proper structuring of your data, you will not need joins.

You can do very rich querying, deeply nested, in MongoDB.

MongoDB has native support for ‘sharding’ (i.e. breaking up your data into chunks to be spread across multiple servers). Sharding is really difficult to implement yourself.

MongoDB is screaming fast.

It is free and open source, but it is also backed by a commercial company, so you can get paid support if you want. There are hosting solutions (including free plans) where you can host your MongoDB instances (e.g. http://mongohq.com)

You store “documents” in MongoDB. Since you can’t really do joins, the solution is to de-normalize your data. Everything you need should be in the one document, so you don’t need joins to fetch related data. e.g. if you were storing a blog post in MongoDB, you’ll store the post, all its meta-data, and all the comments in a single document.

FourSquare recently had a major unplanned downtime – because they did not understand how to really use MongoDB. That underscores the importance of understanding the guarantees given by your NoSQL system – otherwise you could run into major problems including downtime, or even data loss. See this blog post for more on the FourSquare outage.

Some stats about use of MongoDB at paisa.com. 54 million documents. 80GB of data. 6GB of indexes. All of this on 2 nodes (master-slave setup).
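The de-normalized blog post described above can be sketched as a single nested document. The example below uses plain Python dictionaries rather than a MongoDB driver, and every field name and value in it is hypothetical. The `get_path` and `matches` helpers are illustrative stand-ins that loosely imitate dot-notation lookups into nested data; they are not MongoDB's query engine.

```python
# A single de-normalized "document": the post, its meta-data, and all
# of its comments live together, so no join is needed to display it.
# All field names and values here are hypothetical.
blog_post = {
    "title": "TechWeekend 6",
    "author": {"name": "navin"},
    "tags": ["nosql", "mongodb"],
    "comments": [
        {"author": "dhananjay", "text": "Know your CAP theorem."},
        {"author": "bg", "text": "De-normalize your data."},
    ],
}


def get_path(document, dotted_path):
    """Collect every value reachable via a dotted path such as
    'comments.author', descending into lists along the way."""
    values = [document]
    for key in dotted_path.split("."):
        next_values = []
        for value in values:
            items = value if isinstance(value, list) else [value]
            for item in items:
                if isinstance(item, dict) and key in item:
                    next_values.append(item[key])
        values = next_values
    return values


def matches(document, dotted_path, expected):
    """True if any value at the path equals `expected`, or is a list
    that contains `expected`."""
    for value in get_path(document, dotted_path):
        if value == expected:
            return True
        if isinstance(value, list) and expected in value:
            return True
    return False


comment_authors = get_path(blog_post, "comments.author")
```

The trade-off of this layout is that data shared between documents (for example, an author's profile) is duplicated and has to be kept in sync by the application.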

Redis

Gautam Rege now talking about his experiences with Redis. Main points made:

Redis is a key-value database with an attitude. Nothing more.

Important feature: in (key, value), the value can be a list, hash, or set.

1 million key lookups in 40ms, because it keeps data in memory.

Persistence is lazy – save to disk every x seconds. So you can lose data in case of a crash. So you need to be sure that your app can handle this.

Redis is a “main memory database” (which can handle virtual memory – so your database does not really have to fit in memory)

All get and set operations on Redis are atomic. A lot of concurrency problems and race conditions disappear because of atomicity.

Sorted sets combine hashes and arrays. You can look up by key, but can also scan sequentially.

Redis allows real-time publish-subscribe.

Redis is simple. Redis is for specific small applications. Not intended for being the general purpose database for your app. Use where it makes sense. For example:

Lots of small updates

Vote up, vote down

Counters

Tagging. Implementing a tagging solution is a pain – becomes easy with Redis

Cross-referencing small data

Don’t use Redis for ORM (object-relational mapping)

Don’t use Redis if memory is limited

Sites like digg use Redis for tagging
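The counter, voting, tagging, and sorted-set use cases above can be sketched with a toy in-memory model. The `TinyStore` class below is a hypothetical single-process stand-in written in plain Python, not Redis and not a Redis client; its methods are only loosely modelled on the Redis commands named in the comments, and it does not reproduce Redis's atomicity or persistence. Key names such as `votes:post:1` are invented for illustration.

```python
from collections import defaultdict


class TinyStore:
    """Toy stand-in for a few Redis data types (illustration only)."""

    def __init__(self):
        self.counters = defaultdict(int)
        self.sets = defaultdict(set)
        self.scores = defaultdict(dict)

    def incr(self, key, amount=1):
        # Loosely comparable to the Redis INCR / INCRBY commands.
        self.counters[key] += amount
        return self.counters[key]

    def sadd(self, key, member):
        # Loosely comparable to SADD: add a member to a set.
        self.sets[key].add(member)

    def sinter(self, *keys):
        # Loosely comparable to SINTER: members common to all sets.
        groups = [self.sets[key] for key in keys]
        return set.intersection(*groups) if groups else set()

    def zincrby(self, key, amount, member):
        # Loosely comparable to ZINCRBY: bump a member's score.
        board = self.scores[key]
        board[member] = board.get(member, 0) + amount
        return board[member]

    def top(self, key, count):
        # Highest scores first; ties are broken alphabetically here,
        # which is a choice of this sketch.
        board = self.scores[key]
        ranked = sorted(board, key=lambda member: (-board[member], member))
        return ranked[:count]


store = TinyStore()

# Vote up twice, vote down once.
store.incr("votes:post:1")
store.incr("votes:post:1")
net_votes = store.incr("votes:post:1", -1)

# Tagging: one set of post ids per tag, intersected to find posts
# carrying both tags.
store.sadd("tag:nosql", "post:1")
store.sadd("tag:nosql", "post:2")
store.sadd("tag:redis", "post:2")
both_tags = store.sinter("tag:nosql", "tag:redis")

# Sorted set: look up a score by member, or scan in score order.
store.zincrby("leaderboard", 3, "post:1")
store.zincrby("leaderboard", 5, "post:2")
post1_score = store.zincrby("leaderboard", 1, "post:1")
leader = store.top("leaderboard", 1)
```

Each of these operations is a single call against one key, which is why small, frequent updates of this kind suit a key-value store well.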

SQL Azure

Saranya Sriram talking about SQL Azure and data in the cloud. SQL Azure is pretty much SQL Server in the cloud, retrofitted for the cloud:

Exposes a RESTful interface

Has language bindings for Python, Rails, Java, etc.

Gives full SQL / Relational database in the cloud

The standard tools used to access SQL Server locally can also be used to access SQL Azure from the cloud

For Azure you get a cloud simulation on your local machine to develop and test your application. For SQL Azure, you simply test with your local SQL Server edition. If you don’t have a SQL Server license, you can download SQL Server Express, which is free.

You can develop applications in Microsoft Visual Studio. You can also incorporate PHP in this.

You can also use Eclipse for developing applications.

SQL Azure has a maximum size limit of 50GB. (Started with 1 GB last year)

There is no free plan for Azure. You have to pay. “Enthusiasts” can use it free for 180 days. If you sign up for the Bizspark program (for small startups, for the first 3 years) it is free. Similarly students can use it for free by signing up for the DreamSpark program. (Actually, the Bizspark and DreamSpark programs give you free access to lots of Microsoft software.)

a) People know SQL : This is very important when your software is being shipped to other companies and their operations team are well versed with

b) When ad hoc reporting is very important, having SQL is much better. NoSQL requires writing map/join tasks which can be far more time consuming, especially for ad hoc reporting

c) RDBMS can really scale quite a lot – especially if you optimize them well. So 90% of the time, it is very likely that the RDBMS is good enough for your situation and you don’t need NoSQL. So don’t go for NoSQL unless you are really sure that your RDBMS won’t scale primarily for scalability reasons