Monday, February 12, 2007

There have been some discussions in the Spring forum of late regarding the injection of repositories into domain objects. In the context of the data access layer, there also appears to be some confusion about the difference between DAOs and repositories. A data access object (DAO) is the contract between the relational database and the OO application. A DAO belongs to the data layer of the application and hides the internals of CRUD operations from the Java application that the user develops using OO paradigms. In the absence of an ORM framework, the DAO handles the impedance mismatch between the relational database and OO techniques.

A Look at the Domain Model

In the domain model, say, I have an entity named Organization, which needs to access the database to determine statistics regarding its employees ..
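The listing that belongs here did not survive in this copy. What follows is a minimal sketch of what such an entity might look like, not the original code. The names `EmployeeDao`, `getAllEmployees()`, `getEmployeesByCity()`, `InMemoryEmployeeDao` and the `Employee` fields are assumptions made for illustration, as is the reading of "outstation" as "not based in the organization's home city". In a Spring application the class would carry `@Configurable` and the DAO would be injected by the container; here it is passed through the constructor so that the sketch compiles without Spring on the classpath.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Hypothetical value class; its fields are assumptions for illustration.
final class Employee {
    private final String name;
    private final String city;

    Employee(String name, String city) {
        this.name = name;
        this.city = city;
    }

    String getName() { return name; }
    String getCity() { return city; }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof Employee)) return false;
        Employee other = (Employee) o;
        return name.equals(other.name) && city.equals(other.city);
    }

    @Override
    public int hashCode() { return Objects.hash(name, city); }
}

// Hypothetical fine-grained DAO contract, close to the employee table.
interface EmployeeDao {
    List<Employee> getAllEmployees();
    List<Employee> getEmployeesByCity(String city);
}

// In-memory stand-in for a database-backed DAO, so the sketch runs as is.
final class InMemoryEmployeeDao implements EmployeeDao {
    private final List<Employee> rows;

    InMemoryEmployeeDao(List<Employee> rows) { this.rows = new ArrayList<>(rows); }

    @Override
    public List<Employee> getAllEmployees() { return new ArrayList<>(rows); }

    @Override
    public List<Employee> getEmployeesByCity(String city) {
        List<Employee> result = new ArrayList<>();
        for (Employee e : rows) {
            if (e.getCity().equals(city)) result.add(e);
        }
        return result;
    }
}

// With Spring, this class would carry @Configurable and the DAO would be
// injected by the container; constructor injection is used here instead.
class Organization {
    private final String name;
    private final String homeCity;
    private final EmployeeDao employeeDao;

    Organization(String name, String homeCity, EmployeeDao employeeDao) {
        this.name = name;
        this.homeCity = homeCity;
        this.employeeDao = employeeDao;
    }

    String getName() { return name; }

    // Data fetching and list manipulation sit inside the domain entity.
    long getOutstationEmployeeCount() {
        List<Employee> all = new ArrayList<>(employeeDao.getAllEmployees());
        List<Employee> local = employeeDao.getEmployeesByCity(homeCity);
        all.removeAll(local); // everyone who is not based in the home city
        return all.size();
    }
}
```

The point of the sketch is the body of `getOutstationEmployeeCount()`: two DAO calls plus a list subtraction, all living inside the `Organization` entity.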

Note the usage of the Spring annotation @Configurable to ensure dependency injection of the employeeDao into the domain object during instantiation. But how clean is this model ?

Distilling the Domain Model

The main problem with the above model is that we have lots of unrelated concerns polluting the domain. In his book on Domain Driven Design, Eric Evans says in the context of managing the sanctity of the domain model :

An object should be distilled until nothing remains that does not relate to its meaning or support its role in interactions

In the above design, the code snippet that prepares the list of outstation employees contains lots of logic which deals with list manipulations and data fetching from the database, which do not appear to belong naturally to the domain abstraction for modeling an Organization. This detailed logic should be part of some other abstraction which is closer to the data layer.

This is the ideal candidate for being part of the EmployeeRepository, which is a separate abstraction that interacts with the data accessors (here, the DAOs) and provides "business interfaces" to the domain model. Here we will have one repository for the entire Employee aggregate. An Employee class may collaborate with other classes like Address, Office etc., forming the entire Aggregate, as suggested by Eric in DDD. And it is the responsibility of this single Repository to work with all necessary DAOs and provide all data access services to the domain model in the language which the domain understands. So the domain model remains decoupled from the details of preparing collections from the data layer.

The DAO is at a lower level of abstraction than the Repository and can contain plumbing code to pull data out of the database. We have one DAO per database table, but one repository per domain type or aggregate.

The contracts provided by the Repository are purely "domain centric" and speak the same domain language.
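As a rough sketch of that split (the original listing is missing from this copy, so this is an illustration rather than the original code), the repository below publishes a domain-level contract and composes the finer-grained DAO calls behind it. The names `EmployeeDao`, `getAllEmployees()`, `getEmployeesByCity()` and `DaoBackedEmployeeRepository`, and the reading of "outstation" as "not based in the home city", are assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Hypothetical value class; its fields are assumptions for illustration.
final class Employee {
    private final String name;
    private final String city;

    Employee(String name, String city) {
        this.name = name;
        this.city = city;
    }

    String getName() { return name; }
    String getCity() { return city; }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof Employee)) return false;
        Employee other = (Employee) o;
        return name.equals(other.name) && city.equals(other.city);
    }

    @Override
    public int hashCode() { return Objects.hash(name, city); }
}

// Hypothetical fine-grained DAO contract, close to the employee table.
interface EmployeeDao {
    List<Employee> getAllEmployees();
    List<Employee> getEmployeesByCity(String city);
}

// Domain-centric contract: it speaks of "outstation" employees,
// a term the DAO knows nothing about.
interface EmployeeRepository {
    List<Employee> getOutstationEmployees(String homeCity);
}

// The implementation holds the plumbing: DAO calls and list manipulation.
final class DaoBackedEmployeeRepository implements EmployeeRepository {
    private final EmployeeDao employeeDao;

    DaoBackedEmployeeRepository(EmployeeDao employeeDao) {
        this.employeeDao = employeeDao;
    }

    @Override
    public List<Employee> getOutstationEmployees(String homeCity) {
        List<Employee> outstation = new ArrayList<>(employeeDao.getAllEmployees());
        outstation.removeAll(employeeDao.getEmployeesByCity(homeCity));
        return outstation;
    }
}
```

Only `EmployeeRepository` needs to be visible to the domain model; the DAO and the implementation class can stay in the data layer.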

Repositories are Domain Artifacts

Repositories speak the Ubiquitous Language of the domain. Hence the contracts which the repositories publish must belong to the domain model. OTOH the implementation of a Repository will contain a lot of plumbing code for accessing DAOs and their table specific methods. Hence it is recommended that the pure domain model should depend *only* on the Repository interfaces. Martin Fowler recommends the Separated Interface pattern for this.

Injecting the Repository instead of the DAO results in a much cleaner domain model :
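The listing for this step is also missing from this copy, so here is a minimal sketch under the same assumptions as before: `EmployeeRepository`, `getOutstationEmployees()` and the `homeCity` field are illustrative names, and constructor injection stands in for Spring's `@Configurable` so that the code compiles on its own.

```java
import java.util.List;

// Hypothetical value class, reduced to what this sketch needs.
final class Employee {
    private final String name;

    Employee(String name) { this.name = name; }

    String getName() { return name; }
}

// The only data-access type the domain model sees: a domain-level contract.
interface EmployeeRepository {
    List<Employee> getOutstationEmployees(String homeCity);
}

// With Spring, this class would carry @Configurable and the repository
// would be injected by the container.
class Organization {
    private final String name;
    private final String homeCity;
    private final EmployeeRepository employeeRepository;

    Organization(String name, String homeCity, EmployeeRepository employeeRepository) {
        this.name = name;
        this.homeCity = homeCity;
        this.employeeRepository = employeeRepository;
    }

    String getName() { return name; }

    // No list manipulation and no DAO calls: the entity asks a domain question.
    long getOutstationEmployeeCount() {
        return employeeRepository.getOutstationEmployees(homeCity).size();
    }
}
```

The entity now depends on a single interface that speaks the domain language, and how the outstation employees are fetched can change without touching `Organization`.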

In the next part, I will have a look at the Repository implementations while using an ORM framework like Hibernate. We do not need DAOs there, since the object graph will have transparent persistence functionality backed up by the ORM engine. But that is the subject for another discussion, another day ..

28 comments:

Great post. I've been playing with the Apache Jackrabbit repository (JCR - JSR170) lately. I've heard some refer to it as an object oriented database (though that's not really correct). It does seem like a natural way to go with Spring. Of course, cases where you need to connect to an existing schema will probably still require Hibernate.

Here is how I look at it. Organization is an Aggregate (in Evans' terminology), which will have its own repository (OrganizationRepository). In domain services which query organizations, or do any other CRUD operations (e.g. create an Organization etc.), we inject OrganizationRepository. Most of the db plumbing code will be within the DAOs which are injected into the OrganizationRepository abstraction. But, as I mentioned in my post, OrganizationRepository provides domain centric operations to the user.

The DAO layer is closer to the database - the methods which the DAO exposes are supposed to be fine grained ones, which need to have higher reusability. OTOH, repositories are domain artifacts and speak the language of the domain. They provide much more coarse grained contracts to the Aggregates. In the example, getOutstationEmployees() is a domain method - the DAO should not contain any interpretation of the term *outstation*. This meaning has to remain closer to the domain layer - hence the necessity of a separate abstraction, called the Repository. The example that I posted was a trivial one, where the benefits of a separate abstraction for Repository may not be obvious. But for large applications, they provide a neat separation of concerns.

In summary, DAO methods are finer grained and closer to the database, while the Repository methods are more coarse grained and closer to the domain. Also one Repository may have multiple DAOs injected.

thanks for the interesting post ... it's always good to read of people talking about DDD.

However, I have to disagree with you about where you put the getOutstationEmployees() business method. I think it is not a domain object (Employee) responsibility, nor a Repository one: I'd say it's rather a domain service responsibility.

This is because statistics are rarely responsibilities of domain objects, and repositories should have collection-like interfaces that CONTAIN business logic but DO NOT EXPOSE business methods ... the difference may be subtle, but that's how I see it.

An Organization is a domain entity and it has a number of employees. Hence I feel that the number of outstation employees contained within an organization can be treated as belonging to the abstraction itself. In that sense, it is not statistics per se, but very much a characteristic of the domain entity itself. Hence I kept it in the Organization abstraction itself. Anyway, my main intent was to demonstrate that we should inject Repositories instead of DAOs in the domain model, since repositories are closer to the domain than DAOs.

You'll find my comments at http://blog.gdinwiddie.com/2007/02/13/interface-between-domain-layer-and-persistence-layer/ In short, I don't like the use of two queries for a single set of entities. I've offered an alternative based on my usual way of coding.

Although I've never called it a "repository", our Coherence software is used to do just this. The DAO is plugged in via the CacheStore interface, and Coherence becomes a virtual in-memory repository of the domain objects available across the grid.

I came (from google) looking to understand the repository pattern, and how it differs from having a data layer doing the look-ups. I was honestly shocked when I saw your implementation of first loading (!!) all employees into memory, then posing a second query (!!) loading even more employees (!!!) into memory and doing a subtract (!!) on these two sets, while saying that "oh well, this is just some abstraction that has to be made to separate db-plumbing (which it's surely not) from business logic".

Then you read articles like these: http://msdn2.microsoft.com/en-us/vcsharp/bb264519.aspx

and the only question that remains is, where did you take a wrong turn Debasish? Because, face it, you're not using pure SQL, but more likely an OR mapper, which already in itself is a very good separation between the data layer and the business layer. In NHibernate, for example, you can pose HQL queries that are written in an object oriented notation. That's so much quicker and smoother, and in my opinion an HQL query in a DAO object in the data layer should satisfy your aim to find all employees without an address just fine. Then you can cache that query if it's posed often enough.

The basic idea of the post is not to determine the best possible query strategy to fetch outstation employees. The idea is to establish the Repository as a valid pattern in the DDD stack. The point that I was trying to make is that DAOs (irrespective of how you implement queries within them) are usually 1:1 with database tables and contain methods which are much more fine grained and closer to the database, while the Repository is a coarse level abstraction, usually corresponding to Aggregate roots, which speaks the domain language. Hence the contracts exposed by the Repository are domain specific and may be implemented by composing multiple DAOs. I am sure getOutstationEmployees() can be implemented much more efficiently using more performant SQL.

@Kiran: The generic layering works like the following:

a) The Struts Action is the Controller layer of your web application. It uses the Domain Service layer for all its action logic.
b) The Domain Service layer is at a coarser level and uses domain entities and Repositories for implementation. The Domain Services have Repositories injected through some sort of DI.
c) The domain entities, again, can have repositories injected to perform domain logic.

As you have mentioned, the action layer has services (singletons) injected, which in turn have repositories injected. This is the normal scenario. The domain entities can also have repositories injected through mechanisms like @Configurable of Spring.

Who says a DAO is relational database specific? "Data access object" sounds pretty generic to me. I've always thought of it as something that could be backed by an XML store in memory, a CSV on disk, a relational database, a hashtable, anything.

Where did Martin Fowler recommend this approach? I'm reading Eric Evans's DDD and did not find a conversation about different interfaces between the model and the repositories, pros and cons. Your post was helpful, thanks.

As I see it, it appears to be more an issue of deciding whether a method resides in the business tier (I have not yet read DDD, so I will not risk using the term "domain model" incorrectly) or in a data access tier. The question I ask is: is the method CRUD oriented? If yes, then I put/refactor it in the DAO. In this example, it does appear to be one. But there are also CRUD methods that need to be exposed as business methods, this being one. So it is quite okay to have it in the business object too.

If we encounter such a method, I think we should have the plumbing code in the DAO, the DAO being the reusable object, used by other business objects, and the one to be injected.

But let us think of a case where a method is purely business in nature. It cannot be put in the DAO. In such a case, should we have an abstraction layer, such as a repository? I feel that is the main question asked in this article.

I am somehow not very convinced about its usage, since a DAO alone can provide all the benefits if designed correctly right from the start. The author has made some premises, which I don't know are right, or are as hard and fast as the author assumes: 1) DAOs are always relational in nature and provide CRUD for just one table, rather than for a domain model entity by accessing multiple tables if required. But if I am faced with a design that is in production with DAOs that are frozen for some reason (I also saw a tool which generated DAOs from a database schema, so we may want to leverage such tools and save time) and there is a need to add a method that actually belongs to the DAO tier, then I would add a Repository layer which is object oriented.

The difference between the first and the final code snippet is that we moved the implementation code, i.e. the data access and list diff stuff, to another layer of abstraction, but the repository is still part of the domain. Empirically speaking, if the justification behind creating a layer of abstraction is to allow 2 things to change independently, I see that possibility between DAOImpl and the rest of the domain model (i.e. we may plug in a different data layer etc). Within the domain model, the main attraction of the method seemed to be the data and the interaction logic encapsulated together. By moving the implementation code out, aren't we going against that theme and creating more code? Kindly explain.

One more issue I see here is in the recommended practice of having fine-grained methods within the DAO that are more aligned to the db. I see the merit of that argument, but the issue is that if we were putting a method like getOutstationEmployees() directly on the DAO, we could have leveraged the full power of SQL and got our result in 1 shot, as opposed to doing it over multiple steps, which looks rather stiff. Another related thought is that the DAO interface looks more like a set of canned SQL queries. With ORMs and their own query languages that guarantee portability (especially now with JPQL), can the whole idea of the DAO be rethought? For most of our requirements (say some kind of data query) the "business logic" basically consists of composing such a query and returning its result. The execution part could be taken care of by something like an injected EntityManager (I am thinking EJB3 here).

Regarding the first question of the additional abstraction in DAO and Repository, all I say is that everything depends upon the complexity of the application and the domain model. Many people favor the approach of keeping a 1:1 mapping between a table and the DAO (assuming we are talking relational systems, DAO is equally applicable for non relational stuff also) and use Repository as a higher level of abstraction. i.e. a Repository can abstract multiple DAOs and this makes sense when you can model a repository closer to a domain abstraction using DAOs for implementation. Another group of people tend to use the DAO itself as the higher level abstraction and use a 1:n mapping between the DAO and the tables. The main idea of this post is to have a higher level abstraction for the data container that you inject into your domain model and keep implementations abstracted within them. For example in one of my projects I used a NameRepository that had two DAOs - one consisting of a relational table Name and another abstracting an LDAP repo.
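A repository of that kind can be sketched roughly as follows. This is an illustration only, not the project's code: the interface names `NameTableDao` and `LdapNameDao`, their methods, and the policy of consulting the relational table first and falling back to LDAP are all assumptions.

```java
import java.util.Optional;

// Hypothetical DAO over the relational Name table.
interface NameTableDao {
    Optional<String> findNameById(String personId);
}

// Hypothetical DAO abstracting an LDAP directory.
interface LdapNameDao {
    Optional<String> lookupCommonName(String personId);
}

// Domain-facing contract: callers do not know which store answered.
interface NameRepository {
    Optional<String> findName(String personId);
}

// One repository composed of two DAOs.
final class CompositeNameRepository implements NameRepository {
    private final NameTableDao tableDao;
    private final LdapNameDao ldapDao;

    CompositeNameRepository(NameTableDao tableDao, LdapNameDao ldapDao) {
        this.tableDao = tableDao;
        this.ldapDao = ldapDao;
    }

    @Override
    public Optional<String> findName(String personId) {
        // Assumed policy: prefer the relational table, fall back to LDAP.
        Optional<String> fromTable = tableDao.findNameById(personId);
        return fromTable.isPresent() ? fromTable : ldapDao.lookupCommonName(personId);
    }
}
```

The domain model only sees `NameRepository`; swapping the lookup order or dropping one of the stores stays inside the implementation class.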

Regarding "having fine-grained methods within DAO", yes, I tend to use the DAO as a lower level abstraction and build methods of higher granularity (closer to the domain) at the Repository. You are correct .. this may lead to some inefficiency, since I could have used one performant SQL instead of multiple queries. This is all a compromise between modularity and performance. I start with a focus on the former, and then when some of the areas need blazing performance I drop down to optimize and club them into a single SQL. Remember not all parts of your application need blazing performance .. for the ones that need it, you can compromise on the modularity.

Thanks, Debasish Ghosh. Here is my view. The differences between a repository and a DAO are not only in the source code but also in the semantics. At first, I was always confused about these two concepts. If you have an interface called DAO, you can also write different implementations and inject them into your domain; that is not the point of difference. In my approach, I keep the DAO purely responsible for storing the domain objects. In contrast, the repository can take on more logic, such as caching or coordinating with another DAO. In my experience, having two interfaces here works fine. Some code like this: