Enterprise class code

There’s a natural instinct to assume that everybody else’s code is an untidy, undisciplined mess. But, if we look objectively, some people genuinely are able to write well crafted code. Recently, I’ve come across a different approach to clean code that is unlike the code I’ve spent most of my career working with (and writing).

Enterprise-class Code

There’s a common way of writing code, perhaps particularly common in Java, but happens in C#, too – that encourages the developer to write as many classes as possible. In large enterprises this way of building code is endemic. To paraphrase: every problem in an enterprise code base can be solved by the addition of another class, except too many classes.

Why does this way of writing code happen? Is it a reaction to too much complexity? There is a certain logic in writing a small class that can be easily tested in isolation. But when everyone takes this approach, you end up with millions of classes. Trying to figure out how to break up complex systems is hard, but if we don’t and just keep on adding more classes we’re making the problem worse not better.

However, it goes deeper than that. Many people (myself included, until recently) think the best way to write well-crafted, maintainable code is to write lots of small classes. After all, a simple class is easy to explain and is more likely to have a single responsibility. But how do you go about explaining a system consisting of a hundred classes to a new developer? I bet you end up scribbling on a whiteboard with boxes and lines everywhere. I’m sure your design is simple and elegant, but when I have to get my head around 100 classes just to know what the landscape looks like – it’s going to take me a little while to understand it. Scale that up to an enterprise with thousands upon thousands of classes and it gets really complex: your average developer may never understand it.

An Example

Perhaps an example would help? Imagine I’m working on some trading middleware. We receive messages representing trades that we need to store and pass on to systems further down the line. Right now we receive these trades in a CSV feed. I start by creating a TradeMessage class.

I’m a good little functional developer so this class is immutable. Now I have two choices: i) I write a big constructor that takes a hundred parameters or ii) I create a builder to bring some sanity to the exercise. I go for option ii).

Now I have a builder from which I can create TradeMessage classes. However, the builder requires the strings to have been parsed into dates, decimals etc. I also need to worry about looking up Assets, since the TradeMessage uses the Asset class, but the incoming message only has the name of the asset.

We now test-drive outside-in like good little GOOS developers. We start from a CSVTradeMessageParser (I’m ignoring the networking or whatever else feeds our parser).

We need to parse a single line of CSV, split it into its component parts from which we’ll build the TradeMessage. Now we have a few things we need to do first:

Parse the timestamp

Parse the amount

Parse the trade type

Lookup the asset in the database

Now in the most extreme enterprise-y madness, we could write one class for each of those “responsibilities”. However, that’s plainly ridiculous in this case (although add in error handling or some attempts at code reuse and you’d be amazed how quickly all those extra classes start to look like a good idea).

Instead, the only extra concern we really have here is the asset lookup. The date, amount and type parsing I can add to the parser class itself – it’s all about the single responsibility of parsing the message so it makes sense.

Now – there’s an issue test driving this class – how do I test all these private methods? I could make them package visible and put my tests in the same package, but that’s nasty. Or I’m forced to test from the public method, mock the builder and verify the correctly parsed values are passed to the builder. This isn’t ideal as I can’t test each parse method in isolation. Suddenly making them separate classes seems like a better idea…

Finally I need to create an AssetRepository:

AssetRepository

public Asset lookupByName(String name)

The parser uses this and passes the retrieved Asset to the TradeMessageBuilder.

And we’re done! Simple, no? So, if I’ve test driven this with interfaces for my mocked dependencies, how many classes have I had to write?

TradeMessage

TradeType

TradeMessageBuilder

ITradeMessageBuilder

CSVTradeMessageParser

Asset

AssetRepository

IAssetRepository

TradeMessageBuilderTest

CSVTradeMessageParserTest

AssetRepositoryTest

Oh, and since this is only unit tests, I probably need some end-to-end tests to check the whole shooting match works together:

CSVTradeMessageParserIntegrationTest

12 classes! Mmm enterprise-y. This is just a toy example. In the real world, we’d have FactoryFactories and BuilderVisitors to really add to the mess.

Another Way

Is there another way? Well, let’s consider TradeMessage is an API that I want human beings to use. What are the important things about this API?

TradeMessage

public Date getTimestamp()
public BigDecimal getAmount()
public TradeType getType()
public Asset getAsset()
public void parse(String csvline)

That’s really all callers care about – getting values and parsing from CSV. That’s enough for me to use in tests and production code. Here we’ve created a nice, clean, simple API that is dead easy to explain. No need for a whiteboard, or boxes and lines and long protracted explanations.

But what about our parse() method? Hasn’t this become too complex? Afterall it has to decompose the string, parse dates, amounts and trade types. That’s a lot of responsibilities for one method. But how bad does it actually look? Here’s mine, in full:

Related

Post navigation

11 thoughts on “Enterprise class code”

I’d go for the 7+-2 rule: Classes with more than 7+-2 attributes or methods (besides getters/setters) are too complex to be understood/remembered in some seconds. Same goes for methods with more than 7+-2 statements, packages with more than 7+-2 classes or subpackages, components with more than 7+-2 packages, layers with more than 7+-2 components, architectures with more than 7+-2 tiers.

So try to reach 7+-2 in every layer an you’ll neither have too complex code or design.

But anyway, this discussion is IMHO academic – usually teams have no problems with their tiny or small classes. The problems are usually in those 10 or so classes & methods that are real huge. No matther how many small & tiny classes support them.

Very true – it’s those few really stinky classes that kill you every time. But, there’s also an exercise in trying to avoid adding too much technical debt as we add new code. I’d always believed that lots of small classes were preferable; but I can see cases where a few larger classes can be better and potentially introduce less debt.

I like your rule of thumb – not heard that applied to class size before. Although I disagree – I think classes should be the “right size” (whatever that might mean). If my class reasonably has 12 attributes then I’d rather that than introduce three new classes to introduce some arbitrary level of separation. But the rule of thumb is probably a good guide.

I definitely agree with you that we don’t want the functionality smeared across different classes. That would be shotgun surgery code smell then. I know what you are talking about, I had a similar experience during a code retreat.

Although both solutions might be derived test driven, the second one seems to focus more on the domain. Your requirements for the trading middleware did not list a builder and any technical class you created in the first place (although the decisions are quite sound). When you focus on the problem aka the domain aka the requirements, you are more likely to end up with something simple, that does exactly what you need. 12 classes don’t seem simple compared to 4. I think that’s the cool think behind BDD as it drives you more towards solution number 2, because you always focus on the requirements.

You might be right – although I worked on a team that religiously did ATDD and the codebase was littered with classes named in the implementation domain, not the problem domain. But in general ATDD might help you stay focused on the underlying problem – which I couldn’t/didn’t really do in my toy example. Although it’s still amazingly easy to get sucked into technical nonsense with BuilderManagerFactories…

I agree. The amount of code written at times to acheive not very much is staggering. Not sure why one would need the IBuilder interface, why not a static inner class? A pattern which much of the google code base follows, albeit with a few tweaks is (coming from the java world):

This keeps the number of classes to a minimuim, separates the concerns of object creation and object use, doesn’t clutter up the domain class, and if you want to grab a copy of the domain class and modify it you can via:

You get objects which are immutable, still able to be modified, multiple ways to create them, and with only two classes. Internally there may be some other classes used for creation if required, for example if the object is built from a custom binary format etc.. but these are hidden away within the builder. Should usually have a nice api wrapping custom binary formats anyway to make this trivial.

Hmm that’s an interesting approach. In a previous life I started a pattern of naming classes Foo.Builder, for much the same reason – it adds less clutter to the type browser / namespace. The trouble is you end up with a long primary class – because it is both a Foo and a FooBuilder. For me the thing to always question is whether that length is preferable to the alternative… long classes are not /always/ wrong.

You do, but, if you put the builder right at the bottom, always, then when you get to it you can ignore it. You know what it does, all it does is instantiate the thing at the top. So unless you’re interested in the instantiation part, it can be ignored. IMHO IDE’s should be cleverer and present this in a better way.

The way I see it is, if I need to fix something, can I find the correct place quickly? Are all the bits I need to know right there together? For instantiation it’s the builder, which tends to be pretty thick, for domain logic it’s the class the builder is in. Since no setters in the domain class it’s also smaller so all you’re left with is the domain logic. There is also less tendency to just add this little mutator cos I need to modify it in this one little way… (path to hell for you laddy…). All this goes in the builder.

Ideally a codebase should be obvious, or at the very least consistent, with the inconsistent bits clearly highlighted, and only being a fraction of the whole. Then when you look through the code all the unteresting, standard bits go away as you’re used to ignoring it. So by using the builder pattern in a consistent way you stop seeing it after a while. Domain objects stop exposing all their fields, and you don’t end up with ridiculously long ctor’s.

If the behavior changes per feed i.e. csv or xml then I would begin separating the parser.
The parser will be responsible for building up properties Map “same as your last parse example” which will hold the variables and the map would get set on the TradeMessage class.

Hmm interesting – ordinarily I would agree. But I wonder now if it’s simpler to implement as two parse methods on the TradeMessage class? One that parses as CSV, taking values by index; and one that parses as XML taking values by xpath.

What’s the better approach? Half a dozen classes to handle different parse logic and separating everything, or a more complex TradeMessage class that handles all the logic internally.

What about having a TradeMessage parser (TradeMessageParser/Worker.class) with 2 parse methods on it; csv and xml?

It will be the worker for TradeMessage.
TradeMessage cannot live without it. It is the same as having the 2 parse methods on TradeMessage but the code sits in the worker.
In the end you have 2 classes which gets the work done, no implements or extends other classes; TradeMessage.class and TradeMessageParser/Worker.class

Rentius – nice idea, you don’t get a mass of classes but still get some separation.

I do wonder what real benefit this gives you over having those same methods inside TradeMessage. The separate Parser also requires the setXxx methods to be available, whereas if the parse methods are on TradeMessage we can hide them.

It’s a surprisingly tough balance to strike – but it’s interesting to look at different alternatives.