This is part of the series of posts about the Whiteboard Chat project. In this post, I am going to explore the data access methodology and why is it so painful to work with. In particular, I am going to focus on the problems arising in this method:

You can clearly see that there was a lot of though dedicated to making sure that the architecture was done right. Here is the overall data access approach:

So are using the Specification & Repository patterns, which seems okay, at first glance. Except that the whole system is seriously over engineered, and it doesn’t really provide you with any benefits. The whole purpose of a repository is to provide an in memory collection interface to a data source, but that pattern was established at a time where the newest thing on the block was raw SQL calls. Consider what NHibernate is doing, and you can see that here, one implementation of an in memory collection interface on top of a data store is wrapped in another, and it does nothing for us except add additional code and complexity.

As an aside, even ignoring the uselessness of the repository pattern here, I really don’t like that the FindOne() method returns an IQueryable<T>, and I don’t really see a reason for that there.

Then there is the issue of the Specification itself. It is not quite pulling its own weight. To be frank, it is actually sinking down the entire ship, rather than helping any. For example, let us look at the GetPostLastestForBoardById implementation:

Are you kidding me, all of this infrastructure, all of those nested generics, for something like this? Is this really something that deserves its own class? I don’t think so.

Worse than that, this sort of design implies that there is a value in sharing queries among different parts of the application. And the problem with that sort of thinking is that this premise is usually false. Except for querying by id, most queries in an application tend to be fairly unique to their source, and even when they are the actual physical query, they usually have different logical meaning. For example, “select all the late rentals” is a query that we might use as part of the rental system home screen, and also as part of the business logic to charge people for late rentals. The problem is that while on the surface, this query is the same, the meaning of what we are doing is usually drastically different. And because of that, trying to reuse that query in some manner is actually going to cause us problems. We have a single piece of code that now have two masters, and two reasons for change. That is not a good place to be. You’ll probably have to touch that piece of code too often, and have to make it more complicated than it would have been because it has dual purposes.

You can see an example of how this is problematic in the method above. In my last post, I pointed out that there is a SELECT N+1 issue in the code. For each of the loaded posts, we also load the Owner’s name. NHibernate provides easy ways to fix this issue, by simply specifying what is the fetch paths that we want to use.

Except… this architecture make is drastically harder to make something like that. The repository abstraction layer hides the NHibernate API that allows you to specify the fetch paths. So you are left with a few options. The most common approach for this issue is usually to re-architect specifications so that they can also provide the fetch path for the query. It looks something like this:

Note that this design allows you to specify multiple fetch paths, which seems to fix the issue that we had in our API. Expect that it doesn’t really work, even identical queries near one another (say, in different actions of the same controller) usually have different fetching requirements.

Oh, and we haven’t even talked about the need to handle projections.

Okay, so we can do it differently, we can let developer the option to configure the fetch paths for each query at the location where that query is made. Of course, you then get the following architecture:

If you think that this is ridiculous, I agree. If you don’t think so… well, never mind, I was told I can’t actually revoke someone’s Touch-The-Keyboard license.

But let us say that you have solved even that problem somehow, probably by adding yet another abstraction layer on specifying fetch paths. At that point, your code is so abstract, it deserve being put in the Museum for Modern Art, but I’ll let you get away with it.

The problem is that fetch paths are actually only part of the solution for things like Select N+1. One very powerful option that we have with NHibernate is the use of Future Queries, and we can utilize that feature to significantly reduce both the number of time that we have to go to the database and the size of the data that we have to read from it.

Except… this design means that I am pretty much unable to implement this sort of behavior. I will have to drastically change the system design before I can actually do a meaningful performance improvement.

I see this sort of problem all the time when I am doing code reviews. And the problem with this is that it is usually a stop ship issue because we literally cannot fix things properly without doing a lot of very painful changes.

Now, allow me to explain the underlying logic behind this post. Reading from the database is a common operation, and it should be treated as such. There should be very few abstractions involved, because there isn’t much to abstract. For the vast majority of cases, there is no logic involved in read operations, and even when there is, it is most often cross cutting logic. The answer that I give for things like "well, so how can I apply security filtering” is that you throw that into an extension method and do something like this:

This make it very easy to review such queries and see that the appropriate actions where taken, and this type of action is only required when you can’t handle this as truly a part of the infrastructure.

It is also important that I am talking specifically about read operations here. For write operations, the situation is (slightly) different. Write operations usually have business logic, and you want to keep that separate. I would still want to avoid doing anything too abstract, and I would certainly not introduce a repository for writes.

What I would do is either create a service layer for writes that handle the actual business of doing writes, but (and this is important), I would only do so if there is actually logic to be executed. If this is merely a simple CUD operation that we are executing, there is really very little point to going with a service. Yes, that means using NHibernate directly from the controller method to save whatever it is that you are editing, assuming that there is nothing that requires business logic there.

To be frank, for the pieces that actually require business logic, I would probably prefer to avoid an explicit service implementation and move to using an internal publish / subscribe (event publisher) in order to have a nicer internal architecture. But at that point, it is more of a preference than anything else.

To summarize, do try to avoid needless complexity. Getting data from the database is a common operation, and should be treated as such. Adding additional layers of abstractions usually only make it hard. Worse, it creates, in your own application code, the sort of procedural operations that made it so hard to work with systems using Stored Procedures.

You want to avoid that, it was a bad time, and we don’t want to go back there. But code such as the above is literally something that you would write for those sort of systems, where you have only the pre-defined paths for accessing the data store.

It is no longer the case, and it is time that our code started acknowledging this fact.

Comments

One reason for the repository/specification pattern is that I have an interface or two I can isolate for testing.

How would you unit test with this style (directly using NH's ISession whenever you need to talk to the DB)?

For the trivial example above, an integration test with an in-memory DB would do the trick, given a fair bit of setup, but what happens when your controller is more complex, with multiple steps and db reads and writes?

I always says, "why use Repositories with ORM?" I never saw any advantage and always had problems with projections and eager loading you mentioned. Each Action/View need a projection, generalize the query is a waste of resources.

Always used this standard because everyone said it was right and would help me in the future.

To me, your opinion is certainly one of the most important among all developers, Thanks!

Is the publish/subscribe approach you mention a synchronous mechanism, similar to domain events? Also, is it scoped by user action (eventually using nested containers), so as to enable, for example, error handling?

I really like the idea of using NHibernate session directly instead of IDao or IRepository implementations. The top most argument against doing so is, that there is no easy way to switch the persistence mechanism anymore. But I can't see a reason why I would want to do that. IMHO this kind of decision should be made up front and should be stuck to. Not after I started writing production code.

Could you please explain in more detail what the part with the event publisher is about. And how do you manage transactions? I read a lot of times that you should always use transactions with NHibernate; even if you're only reading data.

I agree with all points made here. Repository discourages encapsulation and results in a wide DAL-like facade to the database. In practice the queries are rarely reused indeed. NHibernate / EF already implement repository pattern, why wrap it in yet another one? Specification pattern is already implemented in LINQ, Disconnected Criteria, and QueryOver, why wrap them in yet another one?

For those who question having too much logic in the Controller I suggest encapsulating that in ViewModel (or Presenter) that would be an abstract model of the screen, and leave the HTTP/JSON-related stuff to the controller. The ViewModel will then be a more cohesive unit of functionality that can be unit-tested effectively.

For those concerned about unit-testing NHibernate directly: you can use TypeMock but if your application is data-access-heavy you're really fooling youself. Just get a live local DB and run each test within MSDTC transaction. Your questions are welcome (I have a contact me form on my website).

It looks like a lot of mind reading went into creating this post. That's not to say you don't have any valid arguments, but I see where you've made assumptions about why a certain approach was taken where I could make a different assumption. E.g., re-usability vs testability or re-usability vs structuring.

Perhaps you'd give more weight to your critique if you provided an alternate concrete implementation showing how you think it should be done? That might spawn a productive conversation.

You give vague generalities about what you would do differently, but those are "interview answers."

I Totally agree, but this is an hardest thing to be understood from the coding folk.. Everytime I say that this is (even if it's the TRUTH) a kind of "coding eaven", in the "real world" of day-by-day coding everyone take the "Tower Architecture" as the normal way of coding, they think this is the premium and coolest one! Talking to them about this is like speaking two different languages without translation!

Frankly I have to say to you ayende, that you in an oldest post, you talk about encapsulating linq queryes with some kind of "filtering and preparing logic" in specific classes, and my question is, this is not like using specifications? I hate specifications for the same reasons you said..

I agree to almost all you wrote - Repository is useless layer in most cases, especially when you have data source that supports IQueryable because Repository == IQueryable + CUD. So then we can just use ISession, DataContext or ObjectContext directly; or create thin layer over these "Repositories". And don't use Specification (use LINQ queries directly).

But if you want to use some data source that doesn't support IQueryable (i.e. some NoSql database, Cassandra in my case) then Repository layer (with Specifications) is very useful.

Btw. your DAO (~IQueryable) leaks into your Domain (Specification) and this stinks a little bit for me - but if we suppose IQueryable as Repository then it's ok (but over-engineered and useless as you wrote :)).

Anyway, haven't used NHibernate, but as i understand, it allows quite easy to mock things problem is, that not all orm frameworks do that - let's say Linq2Sql, if i'll use datacontext directly in code, my tests will become a nightmare, or won't they? Someone mentioned using database in memory, but I think there isn't any for Linq2Sql?

@Alex Simkin: I think Ayende is thinking of the DB querying and DB inserts/updates as two distinct things (correct me if I'm wrong) - the querying happens via the ISession directly, but the insert/updates happen via a service/repository/DAO class that could encapsulate retry logic. Or would you also have retry logic for queries?

While NHibernate is easy to mock because ISession/ISessionFactory and query objects are all interfaces, it might not be the case with other ORMs such as LINQ-to-SQL.

The other issue is repository and unit-of-work abstractions reduce the entry barrier for developers who are not familiar with NHibernate and are used to developing with MS ORMs. It made a difference on whether NHiberate was accepted by the management as an ORM choice for me at least once.

The specification pattern is often overused. I only use it in cases where the lambda expression for the where condition is difficult to understand. It makes it easier to for IT to work with ubiquitous language with, say, the accounting department. Obviously, there are other ways to create named queries but we are talking LINQ-to-datastore here.

I had a conversation with someone who was looking at a project of mine and said "Where is the data abstraction layer?" and I said "NHibernate is the data abstraction layer," which prompted a mildly horrified stare.

If something is really complicated, I put it in a named query and write a little in-memory-database integration test for it, or I make an IComplicatedQuery object that I can stub out. But for simple stuff, NHibernate already does it all.

Guys, Is there a blog/tutorial anywhere (pref mvc) that shows the techniques Oren is talking about. I have always thought that repositories gives me extra bloat esp when you have lots of FindByXXX, GetByXXX etc.

by using repository pattern you can simply replace the nhibernate with something else without changing other parts of the app.

using SQLite as an in memory unit testing tool is wrong! for example it does not support foreign keys. yeah those are there but those are not applied! you have to write triggers ... and it does not support lots of other features.

I don't want to memorize lots of internals of NH. so I warp them in a repository and I will use it on and on!

Yes over the top architecture is a fools job security, yes every developer should learn to keep things simple!

But a concern with your prescribed approach is that the ORM is baked into the entire application, while this may be fine for simple applications, for an enterprise application this is brittle.

I am sure a lot of people working on brownfield projects know the pain of refactoring out typed datasets, or custom legacy database code to make use of the much more efficient unit of work pattern or ORM tools. The way I see it we are shortsighted and irresponsible to not make use of the right abstractions and services so as the next generation doesn't feel the pain we felt by baking in the most efficient tool available at the day, because I assure you the better way of working, without ORM is just around the corner, and when it comes I do not want to be sifting through controllers updating code, I would rather be focusing on the data access layer alone.

I agree with your assertion completely. Having a repository abstraction layer when used in conjunction with ORMs like nhibernate is just needless and a sign of bad abstraction. They donot provide for any useful technical benefits. As such they just acts as noise making the system difficult to understand and thus maintain or refactor.

How do you manage the NHibernate session and transaction when security becomes involved in an MVC application? I want to load my user in the PostAuthenticate method and also check permissions in the view. Is it okay to have a single NHibernate session and transaction open from PostAuthenticate to after the view result has executed?

I must agree with you that sometimes it's best to access ISession rather then the repository, specially for queries. And after reading your previous post my question is what is your strategy for other repository actions like save for instance. For sure it's not desirable to let every developer write it's own save method, specially in large business applications, and sometimes it's not as straightforward as a single call to Session.Save for instance, so how do you encapsulate that logic outside a repository?

I do have one argument for the Repository pattern, however. One thing I like about it is that you can tailor it more to your domain as a layer of the data access model which can make it easier for users of your applications business logic layer easier to understand. For example, if you have a readonly table - don't expose a Save method. Only relevant access methods are exposed, and gives a better understanding of the underlying data.

Further, the prefetch paths are easy to overcome by exposing enumerations of allowed prefetch "queries". These can be passed as params and is not rocket science to implement.

Finally, the implementation of this layer takes away the underlying data source dependency - this has been invaluable to me in the past when switching an application over from a relational database to a document database - my application didn't have to change at all.

All in all, I agree with your sentiments but I'm not prepared to throw out the repository pattern quite yet!

I would like to offer a mild defense of repositories. First I want to wholeheartedly agree that the generic repository and the repository-per-class pattern are useless crap.

I have a medium-size Windows Forms application that is about 10 years old (was originally in Delphi) and does not use MVP/MVC or any UI pattern consistently. Extracting the data access methods into repository classes was the only practical way for me to introduce integration testing. I also find it helpful for code organization, and I don't care if each method is only used once. The ISession is used directly for Get/Load but anything more involved (HQL or Criteria API) is in a repository that requires an ISession in its constructor (transactions are controlled by the UI). This works great for us.

Thanks Ayende, what about opening the transaction? Can there be a single transaction from PostAuthenticate to the OnActionExecuted or should I just open multiple transactions?

For security I am evaluating Rhino.Security and trying to see how I can get it to work in the following situations from the view. i'm trying to bake some conventions into my security framework so that I don't need to keep specifying them throughout my project.

Is it just me that can't spot the unit tests in the Effectus and Alexandria code? I'm not sure I buy into the "unit testing" by using sqlite - there are enough differences between sqlite and "real" sql databases that you WILL have unexpected bugs.

And it's not really unit tests any more, but instead integration tests which is usually considerably slower. It might not matter in a small application, but it would quickly become too slow to run the tests all the time.

Do you have some examples that is developed without repositories in a TTD way?

In-memory sqlite testing may not be perfect, but what are you comparing it to? A mocked repository is even more likely to behave differently.

I agree with the sentiments of the article, but I also agree that it would be so much better to see real examples of how it could be done better, rather than yet another "I'm smart, because they did it wrong" post.

Why are you picking on a student's project? The student in question has done a lot of things right, and of course a project is "over engineered" in a university project - why wouldn't it be. At university you're supposed to try large scale techniques on small scale projects to experiment and learn.

The student has a promising career in front of him, but your public humiliation over a series of posts might hurt his career.

I advise you to use applications from seasoned professionals as examples. I've tried your NHibernate Profiler, and must say it's buggy (doesn't catch all queries), extremely slow and cpu intensive - why not write about your mistakes in creating that?

You act as if a free code review by one of the best developers on the planet (and associated discussion by a lot of professionals dealing with the same problems) is a bad thing! This is one of the best things that could happen to what already looks like a very promising student.

And I'm not sure what your code is doing with NHibernate but my code is pretty bad with hundreds of Select N+1 issues and NHProf handles it with ease. You must have same narly queries in there to be taxing NHProf!

@Paul Cox It IS a bad thing when the angle is way off base. A student project doesn't aim to be a commercial project, and shouldn't be judged or criticised as one. I'd like to know how you'd feel if one of your student projects were dissected as if it were an enterprise application.

My point is very clear and valid; fight boxers in your own weight class, or above. There is no honour in beating a lightweight to death.

As for your NHibernate problems, N+1 issues are normal if you don't know what you're doing. I use an SQL profiler to monitor what NHibernate really does. When using multi criterias/futures, filters and so on, NHProf isn't always up for the task.

I would have loved to have had my university projects reviewed by one of the top programmers in the industry. You can guarantee I would be writing better code now.

I disagree that this is "beating a lightweight to death". Ayende is just showing some common problems in MVC applications using NHibernate by using an open-source example. Commercial enterprise applications tend not to be open source but if you are willing to post your code that would be really helpful because as you can see from the comments, a lot of professionals struggle with the issues highlighted in this series of blog posts and you seem to know what you are doing.

Finally, I'm not sure if you've done this already, but if you save your NHProf session to a file and send it to Ayende, I'm sure he will fix any issues and that would help everybody.

Dunno what kind of apps you who all clap to Ayende write, but specification pattern for queries and service layer for complex updates has served me rather well over the time. Heck, even dao with app level interface ( the thing most of you call repository here ) Read operations are DB specific (fetchpaths for a start) and isolating this from the controller keeps latter sane plus makes query optimizing/avoiding duplication easier. Note: I usually work with apps that are maintained over several years

[Btw. your DAO (~IQueryable) leaks into your Domain (Specification) and this stinks a little bit for me - but if we suppose IQueryable as Repository then it's ok]

Augi, IQueryable actually isn't Data Acess, it's a form of specification pattern. e.g. Intention. Of course the specific IQueryProvider IS data access usually but as long as the interface abstraction is leveraged then it's not at all leaky.

I like idea of Code Symmetry:
aspnetresources.com/blog/achievingcodesymmetry. When the query is placed just in the controler then the symmetry is disturbed:). So aggree with Clayton. Repositories are not about reusability but about structuring and testability.