The amazing adventures of Doug Hughes

One of the issues I have dealt with in the past, and continue to deal with in our object management system, is the tension between object-oriented purity and reality. For example, suppose that I have an Order object, or better yet, an Order ID. Now, my display requires that I show the Manufacturer details for a given Product that is contained in this Order.

In an ideal object-oriented world, servers would have unlimited memory and never need a reboot, so all of our objects and their data could live in memory. When I need to do something like getOrder().getProduct().getManufacturer().getName(), there is no problem, as all of the data is already there.

However, when you add a database to the picture for data persistence, you introduce a question. If I treat my low-level business objects as pure objects, an Order obviously has no idea what the Manufacturer name is for a given Product within that order. The same applies to the database design, if it is properly normalized. So, when I use the previously mentioned chain of objects, I am potentially making three separate queries against the database: one to get the Order details, one to get the Product details, and one to get the Manufacturer details.
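To make the cost concrete, here is a minimal Python sketch of that kind of lazily loaded object chain. The class names, table names, and canned data are all hypothetical, purely for illustration; the point is that each getter fires its own query:

```python
# Hypothetical sketch: each object lazily loads its own row, so the
# getOrder().getProduct().getManufacturer().getName() chain costs
# one query per hop.
QUERY_LOG = []

def run_query(sql):
    """Stand-in for a real database call; records each query issued."""
    QUERY_LOG.append(sql)
    # Canned rows keyed by table name, for illustration only.
    return {"orders": {"product_id": 7},
            "products": {"manufacturer_id": 3},
            "manufacturers": {"name": "Acme"}}[sql.split()[-1]]

class Manufacturer:
    def __init__(self, manufacturer_id):
        # The id is unused here only because the data is canned.
        self._row = run_query("SELECT * FROM manufacturers")
    def get_name(self):
        return self._row["name"]

class Product:
    def __init__(self, product_id):
        self._row = run_query("SELECT * FROM products")
    def get_manufacturer(self):
        return Manufacturer(self._row["manufacturer_id"])

class Order:
    def __init__(self, order_id):
        self._row = run_query("SELECT * FROM orders")
    def get_product(self):
        return Product(self._row["product_id"])

name = Order(42).get_product().get_manufacturer().get_name()
print(name, len(QUERY_LOG))  # three round trips for one display value
```

The chain reads naturally, but the query log shows the hidden price: three database hits to display a single name.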

In most cases, these are all very small, light queries, so it is not a big problem. However, it bugs me that we are making three separate queries where a nice simple join would have retrieved all of the necessary data with a single hit on the database. Then again, if we were to use that join, our data tier would become dependent on, or at least aware of, the requirements of our controller / view tier, which is not a good idea. If the data requirements were to ever change, instead of just changing a service object to return a slightly different set of data based on objects in the system, we would actually be changing database queries.
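For comparison, the single-join version of the same lookup might look like the following runnable sqlite3 sketch. The schema, table names, and data are hypothetical; the trade-off is exactly the one described above, since the query now encodes what the view needs:

```python
import sqlite3

# Hypothetical normalized schema, for illustration only.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE manufacturers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT,
                           manufacturer_id INTEGER REFERENCES manufacturers(id));
    CREATE TABLE orders (id INTEGER PRIMARY KEY,
                         product_id INTEGER REFERENCES products(id));
    INSERT INTO manufacturers VALUES (3, 'Acme');
    INSERT INTO products VALUES (7, 'Widget', 3);
    INSERT INTO orders VALUES (42, 7);
""")

# One query instead of three -- but the data tier now knows that this
# particular view wants the manufacturer name for an order.
row = db.execute("""
    SELECT m.name
      FROM orders o
      JOIN products p ON p.id = o.product_id
      JOIN manufacturers m ON m.id = p.manufacturer_id
     WHERE o.id = ?
""", (42,)).fetchone()
print(row[0])
```

One round trip, but the SQL is shaped by a display requirement, which is the coupling the post is worried about.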

So, at this point, I am still sticking with the purer object oriented route and taking the minor performance hit when doing things like this. How do other people handle this? Do you take the performance hit, break encapsulation, or is there some middle ground out there?

Comments on: "Object Oriented Purity vs. Reality" (10)

It’s a use case decision. If you’re making a general-purpose tool for everyone, then perhaps the decision would be easier. In general the pure approach has merit. You put it best in the title when you used the word Reality. If performance is an issue, perhaps that would be the time to consider refactoring. Peter Bell has a business object he uses for speed that wouldn’t be considered pure, but in some situations would be the better choice for the use case.

Summary of what I am saying: the concept of purity assumes there is an absolute ideal. They haven’t sold me on it being a universal truth yet. They are welcome to keep chipping away at my objects, though, since these conversations are not conclusive for me either.

It is a good question. Like you said, in an ideal world it would all be in memory. But in reality, and especially in the semi-stateless world of the web, the Order object may only exist for that one request. I tend to prefer the query with the joins, because a more realistic example will probably include much more, such as the shipping address for the order, prices charged, quantities, etc. How many DB calls would that actually end up being?

As for the data tier and view having to be dependent on / aware of each other, I think you can only take abstraction so far. You aren’t going to have some uber-view so abstracted that it can output employees, product galleries, or orders. You’re going to have a view that outputs Orders. The way it outputs them will depend on whether the data is in a query or in objects, but it will have to know something about the data either way.

This is, at heart, why almost every DAO layer I code ends up with a built-in FIFO cache mechanism. Because the reality of the situation is that you probably have the same manufacturer for several products on an order, so fetching it 5x in a row is just silly. In OO terms, all of my DAO CFCs end up extending a CachedDAO base CFC.
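The CachedDAO idea from this comment might be sketched in Python as follows. The class and method names are hypothetical stand-ins for the commenter's CFCs; a bounded FIFO dict keeps the repeated manufacturer fetches from hitting the database:

```python
from collections import OrderedDict

class CachedDAO:
    """Base DAO with a bounded FIFO cache keyed by record id."""
    def __init__(self, max_entries=128):
        self._cache = OrderedDict()
        self._max = max_entries
        self.db_hits = 0  # instrumentation for this example only

    def read(self, record_id):
        if record_id in self._cache:
            return self._cache[record_id]       # cache hit: no query
        if len(self._cache) >= self._max:
            self._cache.popitem(last=False)     # evict oldest entry (FIFO)
        row = self._fetch(record_id)
        self._cache[record_id] = row
        return row

    def _fetch(self, record_id):
        raise NotImplementedError               # subclasses run the real query

class ManufacturerDAO(CachedDAO):
    def _fetch(self, record_id):
        self.db_hits += 1                       # pretend this is a DB query
        return {"id": record_id, "name": "Acme"}

dao = ManufacturerDAO()
for _ in range(5):          # five products on the order, same manufacturer
    dao.read(3)
print(dao.db_hits)          # only the first read touches the "database"
```

Five reads, one query: exactly the "fetching it 5x in a row is just silly" case the comment describes.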

But, also recognize that there will never be a perfect solution. You are layering OO on top of an RDBMS. You can make the two talk to each other, but they are fundamentally very different things with very different goals.

(I’m not saying that you should go OODBMS or ORDBMS. Nor am I saying that you should crack into the OO interface that your high-end DBMSes have. I’m just pointing out the impedance mismatch.)

I will voice a solid dissenting opinion against the folks that think that going towards an OO-based query language is a good idea. If you’ve ever seen code that looks like “foo .select([a,b,c]) .from([d,e]) .whereEquals(a,b) .and() .whereEquals(f,0)” then you know what I mean. Down that path lies madness.

@John: It is good to hear some validation even if there is not a better solution out there.

@Matt: I will have to differ with you here. I still don’t believe there is enough reason for my application’s view layer to communicate with or depend directly on the data layer. There is too much that should be going on in between, and while I am not going to say you can’t ever bend an OO design pattern, I think this goes way beyond that.

@Rick: The caching layer is definitely a help, even though it adds another layer of complexity, and yes, I have no desire or recommendation to go with an OODBMS.

My first thought would be to say that you could use an ORM to maintain the power of OO, avoid the performance hit of 3 queries, and avoid having to write the SQL joins. But since Alagad produced Reactor, I’d assume you already tried that route, which leaves me wondering if I’m fully understanding the problem you’re describing.

The cache is very important. Beyond that, I think the middle ground is to use an ORM that provides configurable fetch strategies. If you’re pretty sure you’ll usually need the manufacturer’s names whenever you get an order, set the fetch strategy to load all that stuff in the one query.

With a bit more work you can even tune this for different parts of the app – use a different ORM configuration for the billing system than for the ordering system, say.
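As a sketch of what a configurable fetch strategy might look like (a hypothetical API, not modeled on any particular ORM), the same repository can be tuned per subsystem:

```python
# Hypothetical fetch-strategy toggle: "eager" joins the related rows up
# front in one query; "lazy" defers each association to its own query.
class OrderRepository:
    def __init__(self, fetch="lazy"):
        self.fetch = fetch
        self.queries = 0    # instrumentation for this example only

    def get_order(self, order_id):
        if self.fetch == "eager":
            # One joined query covering order + product + manufacturer.
            self.queries += 1
        else:
            # Order first, then product, then manufacturer on demand.
            self.queries += 3
        return {"id": order_id, "manufacturer_name": "Acme"}

# Tune per subsystem: billing always displays the manufacturer name,
# so it eats the wider query; admin rarely needs it, so it stays lazy.
billing = OrderRepository(fetch="eager")
admin = OrderRepository(fetch="lazy")
billing.get_order(42)
admin.get_order(42)
print(billing.queries, admin.queries)
```

The data returned is identical either way; only the number of round trips changes, which is what makes the strategy safe to configure per part of the app.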

Then I find it useful to draw a distinction between my core domain model and ad hoc reporting. Unless you can completely silo your database, you’re going to have people running ad hoc queries, ETL jobs, Crystal Reports, etc., none of which goes anywhere near your domain model. A lot of the bulk data extraction that causes performance problems for an OO domain model is actually in this category, and it’s *not* a given that this stuff has to sit within your object model, or even within the core data access layer that serves the object model. So break it out into a separate reporting sub-application, and hand-tune away!

I think this is a situation where a slight variation on the original question is called for: How do you walk the line between OO and data-driven designs while maintaining a practical view of performance (data-driven) and maintenance (OO’s strong point)? It requires a larger view of your application’s needs, and even of your overall service and data tier architecture… which is a strong point of an ORM library of some sort, whether homegrown or turnkey.

Even without a full-bore ORM, using metadata from either the cfdbinfo tag or JDBC, it’s possible for your persistence tier to be smart enough to predict your desire for it to prefetch related records and build the correct joins on its own. So if you build an architecture around persistence rather than discrete persistence routines, a big chunk of the problem disappears. Give it a lazy-load configuration option and you’ve got the makings of a solid solution to the problem.
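In the spirit of that metadata-driven approach, here is a Python/sqlite3 sketch that reads the database's own foreign-key metadata and assembles the join itself. The schema and data are hypothetical; sqlite's PRAGMA foreign_key_list plays the role cfdbinfo or JDBC metadata would play:

```python
import sqlite3

# Hypothetical schema with a declared foreign key, for illustration only.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE manufacturers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT,
                           manufacturer_id INTEGER REFERENCES manufacturers(id));
    INSERT INTO manufacturers VALUES (3, 'Acme');
    INSERT INTO products VALUES (7, 'Widget', 3);
""")

def build_join(table):
    """Use the database's FK metadata to prefetch related rows in one query."""
    joins = []
    for fk in db.execute(f"PRAGMA foreign_key_list({table})"):
        # Row layout: (id, seq, referenced_table, from_col, to_col, ...)
        ref, frm, to = fk[2], fk[3], fk[4]
        joins.append(f"JOIN {ref} ON {ref}.{to} = {table}.{frm}")
    return f"SELECT * FROM {table} " + " ".join(joins)

sql = build_join("products")
row = db.execute(sql).fetchone()
print(sql)
print(row)   # product columns followed by the joined manufacturer columns
```

No relationship is hard-coded in the persistence code; the join is derived entirely from the schema, which is the "build the correct joins on its own" idea.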

ORMs can do this because they require that the relationships be defined for them in advance, and they may offer a lazy-load toggle that lets them fetch full or partial recordsets independent of the service, controller, or view layer. Turn on lazy loading and you’ve got a potentially slower app and more trips to the DB, but less immediate interdependence between objects. Turn off lazy loading and your hole shot slows down a bit, but you might be running back to the DB far less often.

Toss in a nice, robust caching layer and you’ve nearly arrived at a solution to your question. With a decent cache, at some point during any given span of application uptime you could very well erase any runs to the DB at all save for changes to persistent data… lazy loading notwithstanding. I think the combination of “intelligent persistence” and caching is the only way to solve the problem, honestly.

As for real options, right now it’s home-grown, Reactor, or Transfer… or go big and use Hibernate. In terms of the ColdFusion tools, though, the caching layer is, dare I say it, one of the things that Transfer has over Reactor. Transfer’s cache is very robust and handles this in memory and completely behind the scenes (though it has an API to its cache for hands-on manipulation). While Reactor and Transfer go in entirely different directions to solve the problem, they both provide the same basic functionality (other than the caching issue) and both do it well.

In any case, it seems to me that a slightly more intelligent, and thus more abstracted, persistence layer could be a way to solve the problem. Then again, with today’s server and networking hardware, does it matter if you send 4 queries over a gigabit backplane between quad-processor dual-core servers? 😉

In a large-scale system, like we have with NaturalInsight, we have to compromise on the purity and deal with the reality. Joins are absolutely necessary in order to squeeze out the most database processing value for the money spent.

I, too, wish that I could be more pure and keep the separation perfect. But we found that we cannot realistically do it. Sending a few thousand requests will work, but a few hundred thousand requests will expose the weakness, and everything else is affected.