Jon asks about Oren’s comments on a recent .NET Rocks podcast in which he said that document databases allow us to be more correct than relational databases.

Oren gives a real-life example of how an update to a customer’s financial information changed her historical record, which caused some real problems.

Jon talks about some of the hoops we jump through in an attempt to maintain historical data in a relational database, e.g. soft deletes.

(08:42) Disk space concerns

Scott K says he hears DBAs worry about disk space due to data repetition between documents, and asks what other concerns people bring up.

Oren says there can be more computation and indexing, but on the other hand temporal data is orders of magnitude easier.

Data design principles were established back when space was expensive; that’s all changed now.

Oren says he hears people claim that space isn’t cheap in the enterprise, but when he runs some numbers he concludes they’re either very inefficient or someone’s got their hand in the till. Scott K says that enterprise data storage is often expensive because companies aren’t tiering their data correctly to put low-priority data on cheaper storage.
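Oren’s point is easy to sanity-check with some back-of-the-envelope math. The figures below are illustrative assumptions, not the numbers from the show:

```javascript
// Rough, illustrative storage math (assumed figures, not the ones from the show).
const docSizeKB = 2;           // assumed average document size
const docCount = 10_000_000;   // assumed corpus: ten million documents
const duplicationFactor = 3;   // assumed data repetition across documents

const rawGB = (docSizeKB * docCount) / (1024 * 1024);
const duplicatedGB = rawGB * duplicationFactor;

// Even at an assumed (expensive) $1 per GB per month of enterprise storage:
const extraCostPerMonth = duplicatedGB - rawGB;

console.log(rawGB.toFixed(1));             // raw data in GB
console.log(extraCostPerMonth.toFixed(0)); // monthly dollars of duplication overhead
```

Under those assumptions, tripling the data through duplication costs tens of dollars a month — small next to the developer time that denormalized documents can save.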

Oren says enterprises drive up storage costs through foolish backup strategies.

(14:42) Query and performance benefits

Scott K says that people often view document databases as a giant blob of text rather than structured data which can be searched, indexed, etc.

Oren says that you get full text search for free in RavenDB.

In relational databases, you’re always working with the very latest data, so you have locks, readers waiting for writers, etc.

RavenDB does a lot of precomputation in the background, so it can give you aggregate information immediately.
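The idea behind those precomputed aggregates can be sketched as a map/reduce pass that runs in the background as documents change, so a query just reads the stored result. This is a conceptual sketch, not RavenDB’s actual index syntax:

```javascript
// Conceptual sketch of a background map/reduce index (not RavenDB's syntax).
// Map: emit one entry per order. Reduce: sum the totals per customer.
const orders = [
  { customer: "customers/1", total: 10 },
  { customer: "customers/1", total: 25 },
  { customer: "customers/2", total: 7 },
];

function rebuildIndex(docs) {
  const mapped = docs.map(d => ({ key: d.customer, total: d.total }));
  const reduced = new Map();
  for (const m of mapped) {
    reduced.set(m.key, (reduced.get(m.key) || 0) + m.total);
  }
  return reduced; // stored result, maintained in the background as docs change
}

const index = rebuildIndex(orders);
// A query for a customer's total just reads the precomputed value:
console.log(index.get("customers/1")); // 35
```

Because the reduce result is already stored, the query never scans the orders themselves — which is why the aggregate comes back immediately.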

(17:27) RavenDB 2.0 release overview

Big improvements to performance on some key codepaths, in some cases over 1000%.

Support for JavaScript scripts on the server, which enables scenarios like mass migrations and server-side batching.

Fixed some rough spots in the 2.0 release – things that beta testers didn’t mind, but that could be a little smoother.

They added a new feature that improves support for replicating to a relational database.

(22:05) Sharding improvements and migrations

Sharding’s been around since the beginning, but required you to specify a lot of things – lots of options, too much complexity, too many important decisions early in the development process.

Sharding support has been revamped – provide the endpoints, defaults take care of the rest.

Oren gives an example of sharding customer data. By default, documents are sharded together based on transaction id. You can specify a shard when you save, based on a user-specified id.

Some people have problems with the default approach because the document id includes the shard id. That’s necessary to avoid having to query all shards.
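Embedding the shard id in the document id is what lets the client route a load to a single shard. A minimal sketch of that routing — the id layout and server names here are illustrative assumptions:

```javascript
// Sketch: route an id like "shard2/customers/42" to a single shard instead
// of asking every shard. The id layout here is an illustrative assumption.
const shards = {
  shard1: "http://server-a:8080",
  shard2: "http://server-b:8080",
};

function resolveShard(docId) {
  const shardId = docId.split("/")[0]; // shard id is the id's first segment
  const url = shards[shardId];
  if (!url) throw new Error(`unknown shard in id: ${docId}`);
  return url;
}

console.log(resolveShard("shard2/customers/42")); // http://server-b:8080
```

Without the shard id in the document id, a load by id would have to fan out to every shard just to find the document.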

Jon asks how this works over time if you need to add shards, migrate data, etc. Oren says you can rebalance by biasing new data towards a newly added shard.

If you need to move data to a new server – for instance, a customer becomes large enough that you want all of their documents on a dedicated shard – you’ve got two options for handling the ids. Oren says some users migrate the data, rewriting ids in the process, but he doesn’t recommend that. Instead, he recommends a sharding function that remaps document ids to the new shard without changing the ids themselves.
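The remapping approach Oren describes might look like a sharding function with an override table: the big customer’s existing ids keep working, but now resolve to the new server. All names below are hypothetical:

```javascript
// Sketch of a remappable sharding function (all names are hypothetical).
// Documents keep their ids; only the id -> server mapping changes.
const defaultShards = {
  shard1: "http://server-a:8080",
  shard2: "http://server-b:8080",
};

// A customer grew large, so their documents now route to a dedicated
// server - without rewriting any document ids.
const overrides = [
  { prefix: "shard1/customers/big-corp", server: "http://server-c:8080" },
];

function shardFor(docId) {
  const hit = overrides.find(o => docId.startsWith(o.prefix));
  if (hit) return hit.server;
  return defaultShards[docId.split("/")[0]];
}

console.log(shardFor("shard1/customers/big-corp")); // http://server-c:8080
console.log(shardFor("shard1/customers/7"));        // http://server-a:8080
```

The appeal of this design is that no stored references break: every document that pointed at `shard1/customers/big-corp` still resolves, just to a different machine.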

Jon obviously doesn’t get it and asks the same question again, also asking how you handle data modifications over time. Oren explains that you can just write a JavaScript function to update your existing documents if needed.
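Such a patch script runs on the server with `this` bound to each matching document. The snippet below is a rough local simulation of that behavior, with hypothetical document shapes:

```javascript
// Rough simulation of a server-side JavaScript patch: the script is
// evaluated once per matching document, with `this` bound to the document.
function applyPatch(doc, script) {
  new Function(script).call(doc); // simulate the server evaluating the script
  return doc;
}

const docs = [
  { Name: "Northwind", Region: "EU" }, // hypothetical documents
  { Name: "Contoso" },                 // missing the Region field
];

// Backfill a field across existing documents:
for (const doc of docs) {
  applyPatch(doc, "this.Region = this.Region || 'Unknown';");
}

console.log(docs[1].Region); // Unknown
```

Because the script is just data, it can be sent to the server and applied to every document matching a query, which is what makes mass migrations practical.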

Kevin asks how long a data migration takes. Oren types one up on the fly and explains the parsing and execution time.

(34:43) Time for some random questions!

Scott K notes that there’s a client that runs on Mono and asks if there are plans to get the server running on Mono. Oren talks about the general plan to handle that, but says it’s not high on the priority list.

(35:48) Scott K asks about compact scenarios, including clients that run on mobile and embedded instances that run locally. Oren notes that clients are easy, because anything that can make a REST call can be a client. They had an embedded version, but it saw very little interest.
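The “anything that can make a REST call” point is easy to picture: loading or storing a document is a plain HTTP request against the server. The URL layout below is an illustrative assumption, not the documented endpoint scheme:

```javascript
// Sketch: any HTTP client can talk to a document database over REST.
// The URL layout here is an illustrative assumption, not the real scheme.
function docUrl(serverUrl, database, docId) {
  return `${serverUrl}/databases/${encodeURIComponent(database)}/docs/${docId}`;
}

const url = docUrl("http://localhost:8080", "Northwind", "customers/1");
// A GET on `url` would fetch the document as JSON; a PUT would store one.
console.log(url);
```

That protocol-level simplicity is why porting a client to a new platform is much easier than porting the server.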

(41:25) Jeremy Miller (@jeremydmiller) asks when Oren is going to fix Lucene.net’s flow-control-by-exceptions madness. Oren says it’s not planned, and that Jeremy should ignore those exceptions.

(42:25) Philip (@autosnak) asks why RavenDB doesn’t do more for startups and small businesses pricing-wise. Oren explains the offers they make available – the open source version is free, RavenDB basic edition is $5 / month, they donate a lot of licenses for a lot of other cases, and even the full versions are incredibly cheap compared with any other database. Shoot him an e-mail.

(44:44) Chris Whellams (@chriswillems) asks how to sell NoSQL and RavenDB to IT management and bosses that are addicted to SQL Server. Oren outlines a strategy – start with a persistent viewmodel cache on a slow page to get a quick win, then use it for simple storage of ancillary application data (e.g. preferences), then use it in a spike on a new project. This is exactly what the MSNBC team did – they started with a non-operational RavenDB node in production, then slowly moved some things in without taking on any unnecessary risk.

(42:50) Jon asks for any closing thoughts. Oren says they’re starting some weekly webinars for RavenDB users – or for anyone who’s just curious about it. There’s a RavenDB course in the US in May.

Very interesting topic. I’m from the RDB crowd. For a long time as a DBA I worked with OLTP/OLAP environments using MS SQL, Oracle, and MySQL systems. Recently I tried to use NoSQL to create a proof of concept for one of my projects, which faces typical database issues – combining OLTP with OLAP (at least the basics of it).
I tested MongoDB and RavenDB. Since I’m using the .NET stack, RavenDB seemed ideal to me. The speed of inserts is fantastic. But hey – it’s no surprise once you understand you’re just writing into a flat file. Unfortunately, immediately after that came disappointment. The idea of an index running in the background is fine as long as the index is capable of doing what I need. That’s not the case here. If you need at least some basic analytics (group by, count(), sum()), either forget about a NoSQL solution or be prepared to develop an extra layer/ETL for pushing data into some other data-warehouse engine. So far I have not been able to find one working example proving I’m wrong. And I would be glad to be wrong, because the major characteristics of NoSQL – scalability and super-fast data inserts – are very sweet bait for me.