I have been blogging for a long time now, and I am quite comfortable expressing myself, but I was still blown away by this post to the RavenDB mailing list. Mostly because this thread sums up a lot of the core points that led me to design RavenDB the way it is today.

Rasmus Schultz has been able to put a lot of the thought processes behind the RavenDB design into words.

Back when I took my education in systems development, basically, I was taught to build aggregates as large, as complete, and as connected as possible. But that was 14 years ago, and I'm starting to think that what they taught me back then was based on the kind of thinking that works for single-user, typically desktop applications, where the entire model was assumed to be in memory, and therefore had to be traversable, since there was no "engine" you could go back to and ask for another piece of the model.

I can see now why that doesn't make sense for concurrent applications with large models persisted in the background. It just never occurred to me, and looked extremely wrong to me, because that's not how I was taught to think.

Yes. That is the exact problem that I see people run into over and over. They create highly connected object models without regard to how they are persisted, and then they run into problems using them. And the assumption that everything is equally cheap to access, as if it were all already in memory, turns out to be hugely expensive.

Furthermore, I'm starting to see why NHibernate doesn't really work well for me. So here's the main thing that's starting to dawn on me, and please confirm or correct me on this:

It seems that the idea behind NH is to configure the expected data-access strategies for the model itself. You write configuration files that define the expected data-access strategies, but potentially you're doing this based on assumptions about how you might access the data in this or that scenario.

The problem I'm starting to see is that you're defining these assumptions statically - and while it is possible to deviate from these defined patterns, it's easy to think that once you've defined your access strategies, you're "done", the model "just works", and you can focus on writing business logic - which too frequently turns out to be untrue in practice.
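To make the "static configuration" point concrete, here is a sketch of an NHibernate hbm.xml mapping file for a hypothetical Order/Customer model (the entity names, columns, and assembly are illustrative, not from any real project). Note how the fetch and batching strategies are chosen once, on the model, regardless of which scenario will later touch the data:

```xml
<?xml version="1.0" encoding="utf-8"?>
<hibernate-mapping xmlns="urn:nhibernate-mapping-2.2"
                   assembly="Shop" namespace="Shop">
  <class name="Order" table="Orders">
    <id name="Id" column="Id">
      <generator class="native" />
    </id>
    <!-- Access strategy fixed here, for every scenario: always join the customer -->
    <many-to-one name="Customer" column="CustomerId" fetch="join" />
    <!-- And always lazy-load lines in batches of 25, whether the scenario
         needs one line or thousands -->
    <bag name="Lines" lazy="true" batch-size="25" cascade="all">
      <key column="OrderId" />
      <one-to-many class="OrderLine" />
    </bag>
  </class>
</hibernate-mapping>
```

Every screen that loads an Order inherits these choices unless the developer remembers to override them per query - which is exactly the "you're done, the model just works" trap described above.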

To be fair, you can specify those things in place, with full context. And I have been recommending doing just that for years, but yeah, that is a very common issue.

This contrasts with RavenDB, where you formally define the access strategies for specific scenarios - rather than for the model itself. And of course the same access strategy may work in different scenarios, but you're not tempted to assume that a single access strategy is going to work for all scenarios.

You're encouraged to think and make choices about what you're accessing and updating in each scenario, rather than just defining one overriding strategy and charging ahead blindly on the assumption that it'll always just work, or always perform well, or always make updates that are sufficiently small to not cause concurrency problems.

Comments

And from the other side, that's really a problem.
You design a Raven "schema" for specific scenarios. When answering questions on the Google group, you always ask "what exactly are you trying to do?", and that's the right question, because the answer differs depending on it.
But well, you've designed something, and then one day the requirements change, and your existing schema is no longer "the best solution". Doesn't that happen?

The relational world is good because of its simple design rules: it's basically table per entity, and that's it. And it suits nearly all your needs in all scenarios (but yes, full-text search is way simpler in Raven :))

And domain model usage is simpler with NH. It's just .Load(id) with batch-size enabled, and it covers 70% of use cases without additional queries/classes.
Working with relations in Raven always makes me create new "ViewModel" classes to hold entities and their relations.. that's adding noise to the code :)

Shaddix,
Um... no. Take a look at modeling concerns in relational databases, and they are full of the same scenarios. What you are doing impacts how you store the data.
The major problem is that in relational databases, you often have no choice in how to model the data, because of the schema limitations.

And as for "just do a Load" in NHibernate, that is just not true. I have been doing NHibernate for close to a decade now, and I have been involved in dozens if not hundreds of projects.
It is much more complex than that.

As for creating ViewModel classes, you are likely still trying to model things in a relational way, which is why you are running into this issue.

"just do a Load" in NHibernate, that is just not true.
Why isn't it? Maybe the things I'm dealing with are too simple, but it works pretty well. Sure, there are complex parts where "only one load" is not enough at all, but I'm talking about another 50% of cases :) It's not optimal, but if it's not a high-load service, then the savings in developer time are worth it.

you are likely still trying to model things in a relational way
let's take a simple example: a User entity. It should be a separate entity, and I'd like to "join" it in all the places. Sure, I can denormalize "just login" and live with it, but it's not as convenient as joining, where it's easy for me to output birthdate, lastvisit, etc. anywhere I or the UI designer want.
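The trade-off being described can be sketched in plain Python (no RavenDB client involved; the "documents" are just dicts, and all names and data are illustrative). The common screens read a small denormalized reference with no extra lookup; the rare screens follow the stored id to load the full user document:

```python
# Hypothetical documents: a full User document, and an Order that stores
# a denormalized reference (id + login) to it.
users = {
    "users/1": {"login": "shaddix", "birthdate": "1985-03-02",
                "lastvisit": "2012-05-01"},
}

order = {
    "id": "orders/1",
    "user": {"id": "users/1", "login": "shaddix"},  # denormalized reference
    "total": 42,
}

def render_order_summary(order):
    # Common case: the denormalized fields are enough - no extra load at all.
    return f"{order['user']['login']}: {order['total']}"

def render_order_details(order, users):
    # Rare case: follow the stored id and load the full user document.
    user = users[order["user"]["id"]]
    return f"{user['login']} (born {user['birthdate']}): {order['total']}"

print(render_order_summary(order))         # shaddix: 42
print(render_order_details(order, users))  # shaddix (born 1985-03-02): 42
```

The design choice is exactly the one under debate: you decide per scenario whether the denormalized slice is enough, instead of one join strategy serving every screen.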

One thing I don't understand with RavenDB is that, from what I understand, you have to create indexes in order to be able to search. But what if one day a client asks for a report you didn't expect and did not create an index for? It happens every month to us. But maybe that's my misunderstanding of RavenDB?

@Karep,
Indexes are created on the fly; once they are used multiple times, they become persistent. If your reports need any sophisticated aggregations, you can always export your data to a relational db and use good old-fashioned SQL.

The most striking difference between relational and document databases for me was the discovery of multi-valued (array) fields and how efficiently you can query them with Lucene compared to SQL (in SQL you have to use a many-to-many relationship and a nested query one or two levels deep just to do a simple 'in' on a multi-valued field).
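A plain-Python sketch of that contrast (the dicts stand in for documents; this is not RavenDB or Lucene API, and the data is illustrative). On the document side, the query is just a membership test against the array field:

```python
# Hypothetical documents with a multi-valued "tags" field.
docs = [
    {"id": "posts/1", "tags": ["raven", "nosql"]},
    {"id": "posts/2", "tags": ["sql", "orm"]},
    {"id": "posts/3", "tags": ["nosql", "lucene"]},
]

# Document side: a simple 'in' on the array field - no junction table,
# no nested query.
nosql_posts = [d["id"] for d in docs if "nosql" in d["tags"]]
print(nosql_posts)  # ['posts/1', 'posts/3']

# The relational equivalent needs a junction table and a nested query:
#   SELECT p.Id FROM Posts p
#   WHERE p.Id IN (SELECT pt.PostId
#                  FROM PostTags pt JOIN Tags t ON t.Id = pt.TagId
#                  WHERE t.Name = 'nosql')
```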

Yeah, in my experience, NH is only simple so long as what you're doing is simple. For really complex applications, it's the illusion of simplicity.

Too often, it turns out your O/R mapping is based on the structure of the data itself, rather than on expected data-access patterns. "Just do a load" covers maybe about 20% of the cases on the project I've been working on for the past 2 years - the other 80% is a painstaking uphill battle with HQL or the Criteria API to try to force it to write the exact query I want.

For comparison, I would say, NH is ultimately flexible in terms of query building and data-access patterns - that flexibility is great for a data-analyst writing one-off SQL queries, mining data for information that the client has asked for. But it is very rarely necessary in software, and it comes at the price of very high complexity.

In contrast, with RavenDB, you define the data-access patterns required by your software, in the form of indexes.

Of course, if you're using NH and doing a good job, you're probably also defining the data-access patterns required by your software, perhaps in the form of services or factory-classes. The difference is, those factory-classes are usually much more expensive to build, to maintain, as well as to run...

Not to be misunderstood: I'm not arguing against Raven at all; on the contrary, I wish I had a chance to try it on some real project :)
But well said - technology should make simple things simple, and hard things possible. Simple things (and it seems everything I'm doing now is simple) are a bit harder with Raven than with NH (and not only because of the learning curve).

It's hard to discuss complex scenarios, because everyone has his own story in mind, but map-reduce looks a lot like stored procedures to me, with the same disadvantage: I get some sort of DTO/ViewModel from it, and not a clean, reusable Domain class. Yes, there's no other way in really complex stories with a lot of data, but in my experience with Raven so far, Map-Reduces are very common even in relatively simple cases.
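The "map-reduce yields a DTO, not a domain class" point can be sketched in plain Python (this is not the RavenDB index DSL; names and data are illustrative). Map projects each document down to the fields the query needs; reduce groups by a key and aggregates - and what comes out is a small result record, not an entity:

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical order documents.
orders = [
    {"user": "users/1", "total": 10},
    {"user": "users/2", "total": 5},
    {"user": "users/1", "total": 7},
]

# Map: emit one small record per document.
mapped = [{"user": o["user"], "total": o["total"]} for o in orders]

# Reduce: group by key and aggregate. The output is a DTO shape,
# not a domain entity - the "stored procedure" feel described above.
mapped.sort(key=itemgetter("user"))
results = [
    {"user": user, "total": sum(m["total"] for m in group)}
    for user, group in groupby(mapped, key=itemgetter("user"))
]
print(results)
# [{'user': 'users/1', 'total': 17}, {'user': 'users/2', 'total': 5}]
```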

Karep,
Indexes are very easy to create, and RavenDB can build them in the background so even on very busy sites, there isn't any downside (online index building).
In short, you just deploy your index along with your new code, and you are pretty much done.