
Traditional Approach

Before writing any code, the key design question is how to maintain the multi-table derivations: the Customer Balance and the Order Total. We can

normalize the data (which omits redundant attributes) or

denormalize for performance.

Recognize that this choice, made early on in the project for each multi-table derivation, has significant effects on the code we write. So we must consider it carefully. The sections below explore both alternatives.

Normalized Data

This is the "pure" approach, often advocated by Database Administrators (DBAs), since it avoids data inconsistency due to programming error. That is an entirely logical position: it is terribly easy for a minor oversight to introduce exactly such an inconsistency.

Aggregate Performance Issues

While attractive for data integrity, this can lead to some serious performance issues. Worse, these issues will not show up until you test with large amounts of data. This may not occur until the end of the project, when many pieces of code have presumed this approach.

Aggregate Pruning

In many cases, your transaction alters child data in a way that does not affect a sum or count (e.g., Lineitem.notes in BusLogicDemo). Hibernate support does not recognize this important case: the parent data is recomputed regardless of whether your update actually affects the sum/count.
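The pruning idea can be sketched in plain Java. All names below are illustrative, not BusLogicDemo source: recompute or adjust the parent aggregate only when an attribute the aggregate actually depends on has changed.

```java
/**
 * Illustrative pruning check: a Customer.balance that sums unpaid
 * Purchaseorder.total values only needs adjustment when total, paid,
 * or the owning customer changed. Edits to other columns (e.g. notes)
 * are pruned. Sketch code; amounts are in cents.
 */
public class AggregatePruning {

    /** Returns true if the parent aggregate must be recomputed/adjusted. */
    public static boolean affectsBalance(long oldTotalCents, long newTotalCents,
                                         boolean oldPaid, boolean newPaid,
                                         long oldCustomerId, long newCustomerId) {
        return oldTotalCents != newTotalCents
                || oldPaid != newPaid
                || oldCustomerId != newCustomerId;
    }
}
```

A notes-only change leaves all three inputs unchanged, so the parent update is skipped entirely.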

Aggregate Cost: many child rows

The BusLogicDemo database, like most development databases, is tiny. You won't notice performance issues there.

But as you start load testing (or go live...), it can become apparent that the cost of accessing all the rows to add them up (see the alternatives below) can prove unacceptable.

Sadly, this can happen easily. Imagine an Analyst asking a Business User: "does a Customer have many orders?" Answer: no. True enough, but a few customers might have many orders, and that is the bulk of our transactions. Again, you might not discover this until you are going live.

Aggregate Cost: Chained Aggregates

The aggregate cost becomes particularly serious when aggregates depend on other aggregates. For example, in BusLogicDemo, the Customer balance sums Purchaseorder.total, which is itself an aggregate over Lineitems.

Without adjustment, the system would need to materialize each Purchaseorder.total: n aggregate queries, where a customer has n orders.

The implications are even more extreme when there is recursive nesting. Changing a Department Budget might literally require loading all the rows for that table into memory.

Aggregate Design Alternatives

Even this "simple" approach presents some non-trivial design choices. There are several approaches for deriving an aggregate, explored below.

Read the Data into Memory

This can be a good choice if there are multiple aggregates ranging over the same child data (e.g., a sum of Order.total, and a count of unpaid orders). The first will load up the Hibernate cache, and subsequent related aggregates will compute from the cached data.
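The "read into memory" strategy can be sketched in plain Java. The values below stand in for a cached Hibernate collection; names are illustrative:

```java
import java.util.List;

/**
 * Sketch of the "read into memory" strategy: once the child rows are
 * loaded (here as plain values standing in for a cached Hibernate
 * collection), several aggregates can be computed from the one read.
 * Illustrative names; amounts in cents.
 */
public class InMemoryAggregates {

    /** An order's total and its paid flag. */
    public record Order(long totalCents, boolean paid) {}

    /** Sum of unpaid order totals. */
    public static long balanceCents(List<Order> orders) {
        return orders.stream().filter(o -> !o.paid())
                     .mapToLong(Order::totalCents).sum();
    }

    /** Count of unpaid orders, computed from the same loaded data. */
    public static long unpaidCount(List<Order> orders) {
        return orders.stream().filter(o -> !o.paid()).count();
    }
}
```

The second aggregate costs nothing extra once the rows are in memory, which is exactly the case where this choice pays off.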

It can be a bad choice if there are many child rows.

Issue an aggregate SQL

The alternative approach is to issue an aggregate SQL SELECT statement. This deals better with large result sets, but requires considerable care to avoid sending multiple (unbatched) queries to the server.
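A minimal sketch of the aggregate-query style. The entity and property names (Purchaseorder, total, paid) are assumptions based on BusLogicDemo, and the commented usage is the typical Hibernate pattern rather than tested code:

```java
/**
 * Sketch: the aggregate is pushed to the database in a single query,
 * so the child rows never enter memory. Names are assumed from the
 * BusLogicDemo schema; adapt to your mapping.
 */
public class AggregateQuery {

    public static final String UNPAID_TOTAL_HQL =
            "select coalesce(sum(o.total), 0) "
          + "from Purchaseorder o "
          + "where o.customer = :customer and o.paid = false";

    // Typical Hibernate usage (requires a live Session, not shown here):
    //   Number balance = (Number) session.createQuery(UNPAID_TOTAL_HQL)
    //           .setParameter("customer", customer)
    //           .uniqueResult();
}
```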

Declare a Hibernate Formula

Hibernate provides attractive support for formulas: the approach is declarative, so there is practically no code. But formulas have a number of restrictions that make them inappropriate for business logic:

They only deal with transient data (so no adjustment possibilities)

They do not handle chained aggregates

They are only materialized on initial retrieval, so are not responsive to subsequent transaction activity

They can only refer to stored data (e.g., not a formula)
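For concreteness, a mapping fragment using Hibernate's @Formula annotation. The annotation is a real Hibernate feature, but the SQL below is an assumed sketch against the BusLogicDemo schema, not the demo's actual mapping:

```java
// Mapping fragment (not standalone): the column is computed by a SQL
// subquery at retrieval time only, so the restrictions listed above
// apply. Table/column names are assumptions.
@Formula("(select coalesce(sum(o.total), 0) "
       + " from purchaseorder o "
       + " where o.customer_id = customer_id and o.paid = 0)")
private BigDecimal balance;
```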

Denormalized Data enables Adjustment Logic

A very common solution to the performance dilemmas posed above is to denormalize for performance: store the Customer balance as a column. That way, our Business Logic can perform adjustment processing, described below.

Adjustment Processing

Instead of reading all the Purchaseorders (and their Lineitems!!) on each request, Adjustment Processing means: adjust the stored value by the change. In particular, we avoid bringing all the rows into memory, or issuing aggregate queries. Updates to related data become simple one-row updates.

This is where some of the code explosion really starts to occur. Remember, with Business Logic, we were done once we declared the Customer.balance rule.

Note that adjustment processing requires access to both the old and new values, so we can detect changes.
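The arithmetic of adjustment processing can be sketched in plain Java (the arithmetic only, not the persistence; all names are illustrative):

```java
/**
 * Sketch of adjustment processing: instead of re-aggregating the child
 * rows, the stored Customer balance is corrected by the delta between
 * the old and new contribution of one Purchaseorder. Illustrative
 * names; amounts in cents.
 */
public class AdjustmentProcessing {

    /** Contribution of one order to the unpaid balance. */
    static long contribution(long totalCents, boolean paid) {
        return paid ? 0 : totalCents;
    }

    /** One-row update: balance += newContribution - oldContribution. */
    public static long adjustBalance(long balanceCents,
                                     long oldTotalCents, boolean oldPaid,
                                     long newTotalCents, boolean newPaid) {
        return balanceCents
                + contribution(newTotalCents, newPaid)
                - contribution(oldTotalCents, oldPaid);
    }
}
```

Note that both the old and new total/paid values appear as parameters: without the old values, the delta cannot be computed.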

Design Alternatives

We're not done yet, not nearly. Even once we decide to employ adjustment processing, we have to decide where.

Data Access Objects

A clean and compelling alternative is to add insert, update and delete methods to our Domain Object POJOs. But this is not as simple as it seems, since our adjustment logic requires old and new values.

It is not so easy to obtain old values in Hibernate, and when you do, you need to address a series of restrictions and side-effects. For these reasons, most of the implementations we see steer clear of these issues by utilizing Service Objects.
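For reference, the hook Hibernate does provide for old values is the Interceptor callback. A fragment, with the caveat that the signature shown is the classic (pre-Hibernate 6) form, and wiring an interceptor in brings exactly the restrictions and side-effects mentioned above:

```java
// Fragment (not standalone): Hibernate's Interceptor exposes old values
// at flush time; previousState holds the values as loaded, currentState
// the pending ones, indexed in parallel with propertyNames.
public class AdjustmentInterceptor extends EmptyInterceptor {
    @Override
    public boolean onFlushDirty(Object entity, Serializable id,
                                Object[] currentState, Object[] previousState,
                                String[] propertyNames, Type[] types) {
        // Compare currentState vs previousState here to drive adjustments.
        return false; // this interceptor does not modify the state itself
    }
}
```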

Service Objects

Another common approach is to provide separate Service Objects for each case. So, for our Customer Balance, we might define:

payOrder (orderId)

reassignOrder (orderId, newCustomer, oldCustomer)

changeItemQuantity (itemId, newQuantity, oldQuantity)

changeItemProduct (itemId, newProduct, oldProduct)

And so forth. Observations:

We "solved" our old/new value problem by pushing it back to the Presentation Layer, which now must arrange to save old values so they can be submitted as arguments

Our APIs do not deal with multiple changes (recall changing the Product, the Quantity and the Purchaseorder). It is not trivial to ensure these are fully independent, so we might wind up intruding into the User Interface to disallow multiple types of changes.

Service Implementation

Our analysis suggested that the Service Objects design would be the most common, and representative. We provide a brief overview below.

User Interface: single change save

We made a significant simplifying assumption: each user change is submitted as its own transaction. Certainly Hibernate and Business Logic are specifically designed to handle multi-row transactions, but this approach lets you track the spreadsheet-like effects of isolated changes, and avoids a whole class of programming complexity in analyzing multiple change types (e.g., a change to both a Purchaseorder's Paid flag and its Customer).

Request Processor

This class is invoked by the JSP page. It analyzes the request (what object, what row, and what attribute was changed), and invokes the appropriate Service Class. In the diagram above, RequestProcessorManual identifies a change to a Purchaseorder's paid flag, and invokes order.setPaid (a Service method, shown in the right code window).
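The dispatch logic can be sketched as follows. This is purely illustrative and does not reproduce RequestProcessorManual; it returns the name of the service method that would be invoked rather than calling a real service layer:

```java
/**
 * Sketch of request dispatch: map (object, attribute) pairs to a
 * Service method. Illustrative only; a real processor would also
 * locate the row and invoke the service with old/new values.
 */
public class RequestDispatchSketch {

    public static String serviceFor(String object, String attribute) {
        switch (object + "." + attribute) {
            case "Purchaseorder.paid":     return "SPurchaseorder.setPaid";
            case "Purchaseorder.customer": return "SPurchaseorder.setCustomer";
            case "Lineitem.quantity":      return "SLineitem.setQuantity";
            case "Lineitem.product":       return "SLineitem.setProduct";
            default: throw new IllegalArgumentException(
                    "no service mapped for " + object + "." + attribute);
        }
    }
}
```

Note how every new derivation-affecting attribute requires a new case here and a new Service method behind it; this is one place the code explosion shows up.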

Service Classes

We created a Service Class for each Domain Object (Customer, Purchaseorder, Lineitem). Its methods process each type of change, making the appropriate changes to related Domain Objects and other Service Classes. In the screen shot, SPurchaseorder#setPaid adjusts the Customer's balance.
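The shape of such a Service method can be sketched with plain POJOs. This is a sketch under assumed names, not the actual SPurchaseorder source:

```java
/**
 * Sketch of a Service method: marking an order paid must also adjust
 * the Customer's stored balance. Plain POJOs stand in for the mapped
 * domain objects; names are assumptions, amounts in cents.
 */
public class SPurchaseorderSketch {

    public static class Customer { public long balanceCents; }

    public static class Purchaseorder {
        public Customer customer;
        public long totalCents;
        public boolean paid;
    }

    public static void setPaid(Purchaseorder order, boolean newPaid) {
        boolean oldPaid = order.paid;        // the old value must be at hand
        if (oldPaid == newPaid) return;      // pruning: no change, no adjustment
        order.paid = newPaid;
        // unpaid orders contribute to the balance; adjust by the delta
        order.customer.balanceCents += newPaid ? -order.totalCents
                                               :  order.totalCents;
    }
}
```

Multiply this by every attribute that can affect the balance (setAmount, setCustomer, ...), and by every Service Class, to see where the roughly 500 lines come from.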

Total Code

Focusing just on the code replaced by the 5 rules, we wrote approximately 500 lines of code. This is in addition to the design time spent working through the issues discussed above.

Class                    Lines   Notes

RequestProcessorManual   (135)   Not counted, since a similar amount is required
                                 for the Business Logic version. Note that this
                                 might be eliminated by a good UI framework that
                                 automated read/write of domain objects.
SCustomer                  139
SPurchaseorder             159
SLineitem                  155
SProduct                    65
Total (Service Classes)    518

Code Analysis: not pretty

Not only is there quite a lot of code for what ought to be a very simple problem, but this design results in code that most would judge rather brittle. First, the logic of a given object (maintain the balance) is spread over multiple Service methods (setPaid, setAmount, setCustomer, and so on). The overall logic is therefore tough to grasp, so oversights (bugs) are easy to introduce, and the code will not be particularly easy to maintain.

Our second issue is documentation. Given the "exploded" nature of the logic, it is important that the design intent be clarified and traced to the multiple pieces of code that implement it. It is well known that documentation suffers under time pressure and writing aversion.

Final Thoughts

Basically, what a mess. We have performance considerations that might not become clear until we go live. Just scanning the table of contents, we have multiple design alternatives, none of which share any code if we change our mind. And all but one of the alternatives involve many pages of code.

And all we wanted was to make the balance be the sum of the unpaid order totals. One of the most common patterns of data processing, so we get to confront this hundreds of times per system.

Makes you think, hmm... there's got to be an approach that is simple (fast!), guarantees database integrity, resolves Design Alternatives, and enables you to change the access strategy as performance considerations become clear.