DDD: The Generic Repository

This discussion came up on the alt.net list last night. Since I had this post about halfway done, I agreed to push it to the top of my stack and finish it today, given how timely the topic is. I apologize for going back on what I said the next post would be; the follow-up to the last DDDD post will come next.

The intent of this code is to enumerate all of the customers in my repository that are 19 years old. It expresses that intent readably, even for someone with limited experience of the code, and it is highly factored, which allows for aggressive reuse.
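The calling code is not reproduced here. As a hypothetical sketch only, here is code in this style in Python. The names `Repository`, `find_all_matching`, and `customer_age_query` are invented for illustration, and a plain list stands in for the ORM:

```python
from dataclasses import dataclass
from typing import Callable, Generic, Iterable, List, TypeVar

T = TypeVar("T")


@dataclass
class Customer:
    name: str
    age: int


class Repository(Generic[T]):
    """Hypothetical generic repository; a list stands in for the ORM."""

    def __init__(self, items: Iterable[T]) -> None:
        self._items: List[T] = list(items)

    def find_all_matching(self, query: Callable[[T], bool]) -> List[T]:
        # Any predicate may be passed in, which is what makes this so reusable.
        return [item for item in self._items if query(item)]


def customer_age_query(age: int) -> Callable[[Customer], bool]:
    """Hypothetical query object: matches customers of exactly `age`."""
    return lambda customer: customer.age == age


customers = Repository([Customer("Ann", 19), Customer("Bob", 42), Customer("Cy", 19)])
for customer in customers.find_all_matching(customer_age_query(19)):
    print(customer.name)  # prints Ann, then Cy
```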

Largely because of that aggressive reuse, code like this is common in domain models. Developers are trained that reuse is good, so they tend toward designs that apply it. The reuse shows up in two ways. The first is the definition of a shared IRepository<T> interface that covers the usual create, read, update, and delete operations for every entity type.

The second is that people using Object-Relational Mappers such as Hibernate tend to write a generic implementation of this interface, e.g. Repository<T>, because the ORM does most of the heavy lifting for them.

Show me the polymorphism!

A main reason to favor a generic, possibly specialized contract such as IRepository<T> is that one could then write code that operates on IRepository<T> directly. An example would be a “generic object editor” that uses the various repositories polymorphically.
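Here is a hypothetical Python sketch of the generic contract, a generic implementation, and the polymorphic “generic object editor” described above. `Entity`, `IRepository`, `Repository`, and `generic_object_editor` are illustrative names, and a dict stands in for the ORM:

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Dict, Generic, List, TypeVar


@dataclass
class Entity:
    id: int


@dataclass
class Customer(Entity):
    name: str


@dataclass
class PurchaseOrder(Entity):
    total: float


T = TypeVar("T", bound=Entity)


class IRepository(ABC, Generic[T]):
    """Hypothetical generic contract shared by every aggregate's repository."""

    @abstractmethod
    def get_by_id(self, item_id: int) -> T: ...

    @abstractmethod
    def get_all(self) -> List[T]: ...

    @abstractmethod
    def save(self, item: T) -> None: ...

    @abstractmethod
    def delete(self, item: T) -> None: ...


class Repository(IRepository[T]):
    """Generic implementation; a dict stands in for the ORM doing the heavy lifting."""

    def __init__(self) -> None:
        self._store: Dict[int, T] = {}

    def get_by_id(self, item_id: int) -> T:
        return self._store[item_id]

    def get_all(self) -> List[T]:
        return list(self._store.values())

    def save(self, item: T) -> None:
        self._store[item.id] = item

    def delete(self, item: T) -> None:
        del self._store[item.id]


def generic_object_editor(repositories: List[IRepository]) -> int:
    """Uses every repository polymorphically, through the shared contract only."""
    edited = 0
    for repository in repositories:
        for item in repository.get_all():
            repository.save(item)  # stand-in for "edit, then save"
            edited += 1
    return edited


customers: Repository[Customer] = Repository()
customers.save(Customer(1, "Ann"))
orders: Repository[PurchaseOrder] = Repository()
orders.save(PurchaseOrder(7, 99.5))
print(generic_object_editor([customers, orders]))  # 2
```

The editor is the only code in this sketch that actually needs the shared interface. The next paragraphs question whether a domain ever contains such code.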

Quite simply: where, and more importantly why, would someone want to do this? Under domain-driven design it is extremely difficult to find a place or a reason to use this polymorphism. Perhaps in some sort of admin interface. Even that would fail the “forms over data” complexity test, though, and is probably better built with another approach such as Active Record.

As if the utter lack of need for a shared interface were not enough, introducing one actually causes further problems.

C(?)R(?)U(?)D(?)

Some objects have different requirements than others. A Customer may not be deleted, a PurchaseOrder cannot be updated, and a ShoppingCart can only be created. With a single generic IRepository<T> interface, this obviously causes problems in implementation.

Those implementing the anti-pattern will often implement the full interface and then throw exceptions from the methods they don’t support. Besides violating numerous OO principles (substitutability first among them), this breaks their hope of using the IRepository<T> abstraction effectively. The only way around that is to add more methods to the interface that report whether a given operation is supported, and then implement those too.
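Here is a minimal Python sketch of that anti-pattern, with hypothetical names throughout. Python's `NotImplementedError` plays the role that a "not supported" exception would play in C#. Note how the bolted-on `supports_update` query leaks into every caller:

```python
from abc import ABC, abstractmethod
from typing import Dict, Generic, TypeVar

T = TypeVar("T")


class IRepository(ABC, Generic[T]):
    """Hypothetical, trimmed-down generic contract (create and update only)."""

    @abstractmethod
    def add(self, item_id: int, item: T) -> None: ...

    @abstractmethod
    def update(self, item_id: int, item: T) -> None: ...

    def supports_update(self) -> bool:
        # The bolted-on capability query that callers must now remember to check.
        return True


class PurchaseOrderRepository(IRepository[str]):
    """Purchase orders cannot be updated, yet the contract demands update()."""

    def __init__(self) -> None:
        self._store: Dict[int, str] = {}

    def add(self, item_id: int, item: str) -> None:
        self._store[item_id] = item

    def update(self, item_id: int, item: str) -> None:
        # Satisfies the signature but not the promise: substitutability is broken.
        raise NotImplementedError("purchase orders cannot be updated")

    def supports_update(self) -> bool:
        return False


def update_if_supported(repository: IRepository[str], item_id: int, item: str) -> bool:
    """Generic code can no longer trust the abstraction and has to ask first."""
    if not repository.supports_update():
        return False
    repository.update(item_id, item)
    return True
```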

A common workaround is to move to more granular interfaces such as ICanDelete<T>, ICanUpdate<T>, ICanCreate<T>, and so on. This avoids many of the OO-principle problems described above, but it also greatly reduces code reuse, because most of the time one can no longer use the concrete Repository<T> as-is.
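Here is a hypothetical Python rendering of the granular-interface workaround, using ABCs in place of C# interfaces. Each repository opts into only the capabilities its aggregate allows. The storage code, however, ends up written again per repository:

```python
from abc import ABC, abstractmethod
from typing import Dict, Generic, TypeVar

T = TypeVar("T")


class ICanCreate(ABC, Generic[T]):
    @abstractmethod
    def create(self, item_id: int, item: T) -> None: ...


class ICanUpdate(ABC, Generic[T]):
    @abstractmethod
    def update(self, item_id: int, item: T) -> None: ...


class ICanDelete(ABC, Generic[T]):
    @abstractmethod
    def delete(self, item_id: int) -> None: ...


class ShoppingCartRepository(ICanCreate[str]):
    """Shopping carts can only be created."""

    def __init__(self) -> None:
        self._store: Dict[int, str] = {}

    def create(self, item_id: int, item: str) -> None:
        self._store[item_id] = item


class CustomerRepository(ICanCreate[str], ICanUpdate[str]):
    """Customers can be created and updated, but never deleted.

    The storage code is repeated here: one generic class implementing all
    three interfaces could not be reused as-is without re-exposing delete().
    """

    def __init__(self) -> None:
        self._store: Dict[int, str] = {}

    def create(self, item_id: int, item: str) -> None:
        self._store[item_id] = item

    def update(self, item_id: int, item: str) -> None:
        if item_id not in self._store:
            raise KeyError(item_id)
        self._store[item_id] = item
```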

Revisiting the intent

What exactly was the intent of the Repository pattern in the first place? Looking back at [DDD, Evans], one sees that it is to represent a set of objects as if they were an in-memory collection, so that the domain is freed of persistence concerns. In other words, the goal is to put collection semantics on the objects in persistence.

The key point is that, as a rule, there should be no persistence logic within the domain. This makes the domain easier to test, lets it be tested independently of persistence, and makes it easy to move from one persistence mechanism to another. That last benefit matters more as a long-term maintainability goal than as “we want to use XML files now”.
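Here is a minimal sketch of collection semantics. It assumes Python 3.8+ for `typing.Protocol`, and the names `CustomerCollection`, `InMemoryCustomerRepository`, and `celebrate_birthdays` are hypothetical. The domain function depends only on collection-like operations, so an in-memory stand-in can replace the real store in tests:

```python
from dataclasses import dataclass
from typing import Dict, Iterator, Protocol


@dataclass
class Customer:
    id: int
    name: str
    age: int


class CustomerCollection(Protocol):
    """What the domain sees: collection semantics, nothing about persistence."""

    def add(self, customer: Customer) -> None: ...

    def __iter__(self) -> Iterator[Customer]: ...


class InMemoryCustomerRepository:
    """Test double: a dict stands in for whatever store production uses."""

    def __init__(self) -> None:
        self._by_id: Dict[int, Customer] = {}

    def add(self, customer: Customer) -> None:
        self._by_id[customer.id] = customer

    def remove(self, customer: Customer) -> None:
        del self._by_id[customer.id]

    def __contains__(self, customer: object) -> bool:
        return isinstance(customer, Customer) and customer.id in self._by_id

    def __len__(self) -> int:
        return len(self._by_id)

    def __iter__(self) -> Iterator[Customer]:
        return iter(list(self._by_id.values()))


def celebrate_birthdays(customers: CustomerCollection) -> None:
    """Domain logic: iterates like any collection; no SQL, sessions, or files."""
    for customer in customers:
        customer.age += 1
```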

Simply put, the Repository’s contract is the domain’s contract with the persistence mechanism for the aggregate root that the repository supports. Realizing that the Repository is less an object than a contract with infrastructure is both subtle and important.

The importance of the “Contract”

The Repository represents the domain’s contract with a data store (another common word for this is the seam). This is extremely important, because one can tell every possible way the domain interacts with the data store just by looking at the contract. For example, when it comes time to tune the database for performance, one can look at the repositories and see what the domain requires of the data store.

Unfortunately, this is only useful if the contract is narrow and specific. Consider the conceptual difference between a repository method that accepts a QueryObject and one that exposes a single, specifically named query.

When it comes to figuring out what the contract with the data store actually is, the first gives us no information. It could be running literally any query that can be expressed as a QueryObject (read: any possible query). To find out what the contract actually entails, one would have to dig through the domain, and possibly the UI (ugh), depending on where the query objects originate.

Simply put: allowing query objects to be passed into the repository widens the contract to a point of uselessness.
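The following hypothetical Python sketch makes the point concrete by reading each contract's abstract methods (via `abc`'s `__abstractmethods__`). It shows what each contract tells you about the data store. The interface and method names are illustrative only:

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Customer:
    id: int
    age: int


class IWideCustomerRepository(ABC):
    """Hypothetical wide contract: any query at all may be passed in."""

    @abstractmethod
    def find_all_matching(self, query: Callable[[Customer], bool]) -> List[Customer]: ...


class ICustomerRepository(ABC):
    """Hypothetical narrow contract: every data-store interaction is named."""

    @abstractmethod
    def get_by_id(self, customer_id: int) -> Customer: ...

    @abstractmethod
    def find_customers_by_age(self, age: int) -> List[Customer]: ...

    @abstractmethod
    def save(self, customer: Customer) -> None: ...


def contract_of(interface: type) -> List[str]:
    """Lists what a contract lets the domain ask of the data store."""
    return sorted(interface.__abstractmethods__)


print(contract_of(IWideCustomerRepository))  # ['find_all_matching']
print(contract_of(ICustomerRepository))  # ['find_customers_by_age', 'get_by_id', 'save']
```

The first listing says nothing about which queries actually reach the store. The second is exactly the set of interactions the store must support.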

But reuse is good?

None of us likes writing the same code over and over. However, a repository contract is an architectural seam, and that makes it the wrong place to widen the contract in order to make it more generic.

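You will note that nothing so far has precluded the use of Repository<T>, only of IRepository<T>. So the answer is to still use a generic repository, but through composition rather than inheritance, and without exposing it to the domain as a contract. Consider a CustomerRepository that holds a Repository<Customer> privately and exposes to the domain only the specific, named operations it needs.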

The key here is that the seam exposed to the domain is a very specific architectural seam, as opposed to an open, generic seam that allows us to do anything. Composition, rather than inheritance, provides the reuse while keeping the repository contract within the domain narrow.

Hint: Minimize the complexity at the entry/exit seams of your domain by making the seams as explicit as possible.
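Here is a minimal Python sketch of the composition approach. The names are hypothetical, and a dict stands in for the ORM-backed generic repository. `CustomerRepository` implements only the narrow `ICustomerRepository` contract and delegates to a privately held `Repository[Customer]`:

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Callable, Dict, Generic, List, TypeVar

T = TypeVar("T")


@dataclass
class Customer:
    id: int
    name: str
    age: int


class Repository(Generic[T]):
    """Reusable generic infrastructure piece; never exposed to the domain."""

    def __init__(self, key: Callable[[T], int]) -> None:
        self._key = key
        self._store: Dict[int, T] = {}

    def get_by_id(self, item_id: int) -> T:
        return self._store[item_id]

    def find_all_matching(self, query: Callable[[T], bool]) -> List[T]:
        return [item for item in self._store.values() if query(item)]

    def save(self, item: T) -> None:
        self._store[self._key(item)] = item

    def delete(self, item: T) -> None:
        del self._store[self._key(item)]


class ICustomerRepository(ABC):
    """The domain's narrow contract: no delete, no open-ended query."""

    @abstractmethod
    def get_by_id(self, customer_id: int) -> Customer: ...

    @abstractmethod
    def find_customers_by_age(self, age: int) -> List[Customer]: ...

    @abstractmethod
    def save(self, customer: Customer) -> None: ...


class CustomerRepository(ICustomerRepository):
    """Gains reuse by holding a Repository[Customer] (composition), not inheriting it."""

    def __init__(self) -> None:
        self._inner: Repository[Customer] = Repository(key=lambda c: c.id)

    def get_by_id(self, customer_id: int) -> Customer:
        return self._inner.get_by_id(customer_id)

    def find_customers_by_age(self, age: int) -> List[Customer]:
        # The query is built here, inside the seam, never by the domain.
        return self._inner.find_all_matching(lambda c: c.age == age)

    def save(self, customer: Customer) -> None:
        self._inner.save(customer)
```

The cost is one small delegating method per operation. In exchange, `delete` and `find_all_matching` simply do not exist on anything the domain can see.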

47 Responses to DDD: The Generic Repository

I’ve stumbled across this problem before and solved it by narrowing the contract to specific cases. Remember, Repositories are not DBALs: they shouldn’t be handed queries. It’s the repositories themselves that, based on domain requirements, construct the abstract data queries and pass them to the DBAL (what you called the “generic repository” here).

The question is this: I saw an implementation in which an IList _repository is maintained. When constructing my Repository, I fill it with _repository = LoadAllItems() in the constructor. Then, in my Find(lambda expression) function, I do _repository.Where(lambdaEx).

My main headache actually comes with child objects, since my customers have orders. What is a convenient way of loading them? My aggregate root is Customer; should I keep a reference to each order as an object, or just its ID number? I know lazy loading is one way to achieve this, but I just want a plain solution. If I want to follow DDD and do everything through the root, then my Customer should hold all of its orders as a list of objects. But what if I have thousands of customers? Thank you in advance.

One question, maybe a bit off-topic or naive… I was just wondering whether I need to maintain an internal collection in my repository. When I have a function like getallcustomers(), should I keep the result in memory and refer to that list later, or should the function just return a list and that’s it? In the latter case, my repository acts just like a DAL function when loading data, doesn’t it? Could you please help me with this… I have tons of other questions, but this is one of the essential ones. Thanks.

Can’t we just define our queries using the Specification pattern and centralize them in the repository assembly, then have the Application Service assembly pass the specification to the Find method on the repository? That way we would have fewer permutations of Find** methods…

Thanks for writing this. I agree with most of the things you’ve pointed out.

I do have one question, however, about specifying the contract/seam between a domain model and a data store layer.

From what I understand, we do not want to have a “generic” contract (like the example you gave) because it becomes too “wide” and renders it useless as a contract. I agree. However, if we have methods such as “findStudentByFirstName”, “findStudentByLastName”, “findStudentByOrganization”, “findStudentBasedOnHairColor”, etc, etc, won’t we end up with a rather large contract? For a sufficiently large domain aggregate, it’s possible to have hundreds of those very specific “find*” methods.

My only “objection” is that it seems a bit much to keep all the custom repositories only to constrain the breadth of the generic one. Services do that for me, and clients only have direct access to them, never to the repositories. So instead of a CustomerRepository, I would have a CustomerService making the calls to the generic repository.

I kept on looking at my Repository classes thinking about how I could generalize them, but ended up fighting that urge to generalize because I like the fact that my UserRepository had some methods on it specific to User concerns.

The only difference is that I am using inheritance rather than composition (e.g. UserRepository : BaseRepository, IRepository).

I read the alt.net mails too, and I think people are confused when they see the word Repository in both CustomerRepository and Repository. As I understand it, Repository is a contract between the DataMapper and the Domain Model, while CustomerRepository is a contract between the Repository and the Presentation Layer (UI). The two Repositories differ in their purpose. Right?

This was a great write-up. I only recently came across the Repository pattern while looking through the S#arp Architecture project, and I definitely saw its strengths (and perhaps some overzealous use of it in places), but using it through delegation seems like a perfect fit.

I just wrote a post where I came to a similar conclusion, but using granular repository traits (ICanDelete, ICanSave etc) to build up an interface that exposes a subset of methods from a mainly-generic concrete implementation.

“A common workaround to this issue is to move to more granular interfaces such as ICanDelete, ICanUpdate, ICanCreate etc etc. This while working around many of the problems that have sprung up in terms of OO principles also greatly reduces the amount of code reuse that is being seen as most of the time one will not be able to use the Repository concrete instance any more.”

Not true – my concrete ProjectRepository subclasses Repository, so it uses all generic public methods. Common methods we do want like GetById() and Save() pass straight through to the generic base. Common methods we don’t want like Remove() aren’t exposed by IPersonRepository, so the domain doesn’t know about them. Heaps of re-use, and without having to write a whole load of little bridge methods from your CustomerRepository to the Repository inside.

I feel the same way in so much as internal repositories should never be exposed to the client (e.g. the one developing in the domain). However, I still see plenty of merit in an inheritance-based public repository model instead of composition.

I’m a firm believer in building for the rule and not the exception, and the examples you provided (non-updatable domain objects, the shopping cart) are the exception – it feels unnecessary to change the repository API for a few objects which may not conform to the rule.

For most data-driven applications, a concrete inheritance-based repository model with an internal generic repository is a perfectly suitable solution. It’s not too advanced for developers unaccustomed to S.O.L.I.D, and it’s flexible enough to support more advanced scenarios while giving developers their generic juice.

As an example, let’s consider a newsfeed-style service. Each story involves 1) an author, and 2) a target domain object that the story is about. We’ll need the target domain object of each story, and this is incredibly easy to get in a simple loop over any number of domain objects when your public repositories are typed as Repository.

Without the support of polymorphism, I might have to rely on reflection, or revert to an IReadRepository style of inheritance (which always feels like a smell to me).

Also you made a very important point when you said “[make] the seams as explicit as possible”. If you’re going to use query objects, don’t expose them in your contract! Use them the same way as you would an internal repository – the repository exists to say “this is how you can find instances of me”, and providing an arbitrary query mechanism can mean an arbitrary death to your persistent storage (especially if it’s a relational database).

At the end of the day, either solution is equally workable, and there is no one-size-fits-all pattern. My experience is generally against entirely mutable domain objects, with a few that are not (e.g. categories).

I completely agree with this. It has always bothered me when people implement generic repositories in this way because you are now shifting potentially database specific concerns back out of the repository. By exposing a query object and allowing them to be built anywhere within the domain it makes it impossible to control (and limit) what kind of queries are formed.

Your solution through composition is a nice middle ground that would still reduce a huge amount of repetitive code. Excellent post.

1) Reuse – Couldn’t agree more, I’ve had massive reuse using concrete repositories for years using exactly this sort of technique (though I consider the composed repository to really be just a persistence helper class, an artifact of the implementation).
2) Polymorphism – I agree it’s not necessarily that useful, and if you need it you can get it with concrete repositories (IDelete, ISave, etc.). I found the polymorphism made my persistence tests cleaner; for example, SaveTestHelper took an ISave. As you said on the forums, delegates might have worked too, though I find they can lead to less readable test code, and other devs have complained about my existing use of them in test code.
3) C(?)R(?)U(?)D(?) – ICanDelete etc. Agreed, you can’t use Repository, but you can still get lots of reuse. I think it’s a valid approach.
4) The importance of the “Contract” – Nicely put; I agree completely. When you say “every possible way that the domain interacts with the data store”, it’s also the way clients of the domain do, which is very useful.
5) Specifications – I think passing them in can be useful, but only if they are constrained (so not just allowing clients to pass in Specification or QueryObject).
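Here is one hypothetical way to read “constrained” specifications, sketched in Python. The repository accepts only a closed set of specification types, which it defines itself, rather than any predicate. That keeps the contract enumerable. `CustomersAged`, `CustomersNamed`, and `find` are illustrative names:

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class CustomersAged:
    age: int


@dataclass(frozen=True)
class CustomersNamed:
    last_name: str


# The contract stays enumerable: only these specifications exist.
CustomerSpecification = Union[CustomersAged, CustomersNamed]


@dataclass
class Customer:
    last_name: str
    age: int


class CustomerRepository:
    """Hypothetical repository; a list stands in for the data store."""

    def __init__(self, customers: List[Customer]) -> None:
        self._customers = list(customers)

    def find(self, spec: CustomerSpecification) -> List[Customer]:
        # Each known specification maps to one query the data store must support.
        if isinstance(spec, CustomersAged):
            return [c for c in self._customers if c.age == spec.age]
        if isinstance(spec, CustomersNamed):
            return [c for c in self._customers if c.last_name == spec.last_name]
        raise TypeError(f"unsupported specification: {spec!r}")
```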

On DDDD, it might be worth backtracking and explaining why we need it and where. Understanding the problems you see it fixing for the average DDD project would be very useful.

We have long-since used BOTH the Repository implementation to get our reuse AND an ICustomerRepository to get the best of both worlds.

ICustomerRepository would have the ‘GetCustomerByFirstName(…)’ method defined on it, and all consumers of customers would use only this repository interface.

CustomerRepository *derives* from Repository but *implements* ICustomerRepository. This way it gets the reuse of the generics-enabled repository base class, but since consumers depend only on the ICustomerRepository interface, they aren’t ‘aware’ of the .FindAll(…) etc. methods that are actually on the class.

This gives us both the benefit of the reusable generics-enabled Repository base class AND the intention-revealing methodnames hanging off the interface for the consumers of the repository to access.
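Here is a hypothetical Python sketch of this inheritance variant. In C#, the static type `ICustomerRepository` is what hides `FindAll`. Python enforces nothing like that, so here the interface only documents what consumers may call. All names are illustrative:

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Callable, Dict, Generic, List, TypeVar

T = TypeVar("T")


@dataclass
class Customer:
    id: int
    first_name: str


class Repository(Generic[T]):
    """Hypothetical generics-enabled base class; a dict stands in for the ORM."""

    def __init__(self, key: Callable[[T], int]) -> None:
        self._key = key
        self._store: Dict[int, T] = {}

    def find_all(self, predicate: Callable[[T], bool]) -> List[T]:
        return [item for item in self._store.values() if predicate(item)]

    def save(self, item: T) -> None:
        self._store[self._key(item)] = item


class ICustomerRepository(ABC):
    """The only thing consumers are meant to depend on."""

    @abstractmethod
    def get_customer_by_first_name(self, first_name: str) -> List[Customer]: ...

    @abstractmethod
    def save(self, customer: Customer) -> None: ...


class CustomerRepository(Repository[Customer], ICustomerRepository):
    """Derives from the generic base for reuse; implements the narrow interface.

    save() is satisfied by the inherited Repository.save, so no bridge method is needed.
    """

    def __init__(self) -> None:
        super().__init__(key=lambda c: c.id)

    def get_customer_by_first_name(self, first_name: str) -> List[Customer]:
        return self.find_all(lambda c: c.first_name == first_name)
```

Compared with the composition version, the pass-through methods disappear. The generic methods are still on the object, though, and only the declared type keeps consumers away from them.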

Are your domain repositories ignorant to the persistence/ORM technology? Do they just rely on your infrastructure repositories to handle the details or are the infrastructure repositories just for code re-use?

Shaun, a better question would be: why are you running reports in your domain? By “report” here I mean something that sounds like it should be hitting a read-only reporting model rather than your transactional model.

Ad hoc queries have no place in a domain that models transactional behavior.

I’m curious whether your domain repositories are ignorant of the persistence/ORM technology and just rely on your infrastructure repositories to handle the details. If so, how do you effectively abstract ad hoc queries to your infrastructure repositories?

This is how I’ve been doing things until now, for pretty much the reasons you’ve posted, but I have always had trouble figuring out how to optimize queries (which is why I started the thread on alt.net). E.g. I can have a CustomerRepository.GetCustomerByLastName used in many different use cases. Some use cases perform better if the customer’s orders are eager loaded, and others perform better if they are not. How do I handle this? The options I can think of are:

1. Udi’s solution of interfaces. (Sorry his site seems to be down so can’t get the link.)
2. One repo method per use case.
3. Having a parameter on the repo method indicating the fetch strategy.
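Here is a minimal Python sketch of option 3. `FetchStrategy`, `load_orders`, and the `round_trips` counter are hypothetical. Dicts simulate the customer and order tables, and a real ORM would express eager loading with its own join or fetch mechanism. The counter shows why the choice matters:

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Optional


class FetchStrategy(Enum):
    LAZY = "lazy"                  # orders are loaded later, on demand
    EAGER_ORDERS = "eager_orders"  # orders are loaded with the customers


@dataclass
class Customer:
    id: int
    last_name: str
    orders: Optional[List[str]] = None  # None means "not loaded yet"


class CustomerRepository:
    """Hypothetical sketch of option 3; dicts stand in for two tables."""

    def __init__(self, customers: Dict[int, str], orders: Dict[int, List[str]]) -> None:
        self._customers = customers
        self._orders = orders
        self.round_trips = 0  # simulated database queries issued so far

    def get_customers_by_last_name(
        self, last_name: str, fetch: FetchStrategy = FetchStrategy.LAZY
    ) -> List[Customer]:
        self.round_trips += 1  # one query (joined with orders when eager)
        found = [Customer(cid, name) for cid, name in self._customers.items() if name == last_name]
        if fetch is FetchStrategy.EAGER_ORDERS:
            for customer in found:
                customer.orders = list(self._orders.get(customer.id, []))
        return found

    def load_orders(self, customer: Customer) -> List[str]:
        # Lazy path: one more round trip per customer (the classic N+1 pattern).
        if customer.orders is None:
            self.round_trips += 1
            customer.orders = list(self._orders.get(customer.id, []))
        return customer.orders
```

The trade-off is that each caller now has to know which strategy suits its use case, which leaks a little of the persistence concern back into the caller.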