Tuesday, April 14, 2009

Most businesses run on a suite of diverse applications. Some have been written in house, some have been written by outside suppliers and some are bought as shrink-wrapped products or COTS (Commercial Off The Shelf). They run on different platforms using different technologies. They range in scale from Excel spreadsheets, created and used by non-technical business users, to large scale enterprise applications based on server farms and created by teams of IT professionals.

For example, one of my recent clients used Sage for their accounts, Salesforce.com for the sales team, and RedDot for content management. They also had a suite of applications written in-house. The older ones were VB/SQL client-server systems, with the more recent web based ones being written with .NET. There were also many business processes based on spreadsheets maintained by individual users. Some processes’ data even lived on whiteboards and paper.

The problems that arise from such a disparate suite of applications are well known; you can find them in most enterprises.

Users often have to re-key data, copying it from the screen of one application to input fields in another.

The organisation has more than one source of the same information. For example, a customer’s address might be different on different systems.

Because there is no single source of information, important details are often missed. A customer might be given preferred status on one system, while they are marked as having a legal dispute with the company on another.

It is very hard to gather management information. We might not be able to say how many customers with orders above a certain amount always pay late.

Business processes are manual and ad hoc. They can be lost when a key member of staff moves on. They are often carried out differently at different times by different people, confusing customers, creating inconsistent data and introducing ‘bugs’ into the system.

So what do we do to solve these problems? We try to make the disparate applications talk to each other.

Integration Pathologies

But making the applications talk to each other introduces a whole new set of problems. Most naïve approaches fail primarily for two reasons:

The first is not correctly decoupling the applications. The communication implementation involves the applications knowing far too much about each other’s innards. The most pernicious (and common) form of this is when one application directly accesses another’s database. This reaches an apogee of awfulness when stored procedures execute cross database joins. Multiple applications accessing shared business components can also be a sign of this pathology.

This style of communication quickly becomes a tightly coupled mess. It can very easily prevent any changes being made to the individual applications, as developers recoil from the complications of refactoring all the known (and often unknown) disparate pieces that rely on schemas, stored procedures and other internals. “We can’t touch system-x because who knows what might break.”

In the worst cases, such as when the integration targets the tables of shared databases, business rules can be replicated many times over. This makes it extremely difficult to change them and adds to the forces fossilising the organisation’s software.

The second common pathology is direct application-to-application communication. This is a natural consequence of doing application integration ad-hoc. If we think of an application as a node in a network, it’s easy to see that each additional node requires a new set of connections equal to the number of existing nodes. The task of integration will get successively more complex as we add new systems that require integration.
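To put rough numbers on that growth, here is a minimal Python sketch. It assumes the worst case, where every application ends up connected to every other one and a connection between two applications is counted once. The function name is mine, purely for illustration.

```python
def point_to_point_connections(app_count: int) -> int:
    """Connections needed if every application talks directly to every other."""
    connections = 0
    # Each new application must be wired to all the applications already there.
    for existing_apps in range(app_count):
        connections += existing_apps
    return connections


for app_count in (2, 5, 10, 20):
    print(app_count, "applications:", point_to_point_connections(app_count), "connections")
```

That running total is n(n-1)/2, so under these assumptions ten applications can need up to 45 connections and twenty can need up to 190.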

Each connection requires its own mapping and access to the joined application’s internals. This leads to duplication of effort and business rules.

Before long we find that our integrated applications, rather than creating the wonderful joined up business we envisioned, have made things even worse. The integration effort itself takes up progressively more resources, the tight coupling between applications makes it very hard to change anything and the duplicated business rules and diverse mappings mean that we face an avalanche of poor, inconsistent data.

The trick to doing integration successfully is to connect the applications in such a way that they remain decoupled. This is the core secret to doing Service Oriented Architecture well. We decouple applications by hiding them behind well defined service interfaces. We never allow them to interact directly with each other’s internals.

By hiding each application behind a well known interface, we can allow it to change internally without having those changes propagate throughout the organisation. Each application can remain agile and responsive to business needs.

We control the proliferation of connections by making each application talk through a common Enterprise Service Bus (ESB). We then have only one interface to worry about for each application. We make each application talk a single canonical language that is shared throughout the organisation. Now we only have to manage one mapping per application, between the application’s internal representation and the canonical message schema.
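As a rough sketch of what 'one mapping per application' might look like, the Python below defines a hypothetical canonical customer message and one mapper each for two made-up applications. The CanonicalCustomer type, the field names and both record layouts are assumptions for illustration, not taken from any real system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CanonicalCustomer:
    """Hypothetical canonical message shared by every application on the bus."""
    customer_id: str
    name: str
    postcode: str


def from_accounts(row: dict) -> CanonicalCustomer:
    """The only mapping the (made-up) accounts system needs to own."""
    return CanonicalCustomer(
        customer_id=str(row["AccountNo"]),
        name=row["AccountName"].strip(),
        postcode=row["PostCode"].upper(),
    )


def from_crm(record: dict) -> CanonicalCustomer:
    """The only mapping the (made-up) CRM system needs to own."""
    return CanonicalCustomer(
        customer_id=record["id"],
        name=record["company"],
        postcode=record["address"]["postcode"].upper(),
    )


accounts_row = {"AccountNo": 1042, "AccountName": "  Acme Ltd ", "PostCode": "bn1 1aa"}
crm_record = {"id": "1042", "company": "Acme Ltd", "address": {"postcode": "BN1 1AA"}}

# Both applications describe the same customer once mapped to the canonical form.
print(from_accounts(accounts_row) == from_crm(crm_record))
```

Neither mapper knows anything about the other application's layout. Each one only knows its own internals and the canonical schema.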

The canonical language should consist of messages that are relevant to the business process. We should avoid service interfaces that exhibit CRUD style APIs and instead build an event driven ESB that exchanges coarse grained messages styled as ‘business events’. Changes that are relevant to the organisation are published by applications where the change is sourced, and subscribed to by applications that need to know about the changes.
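The publish and subscribe shape can be sketched in a few lines of Python. This is only an in-process toy that dispatches synchronously, not a real ESB. InMemoryBus, CustomerChangedAddress and the two subscribers are all hypothetical names I have made up for the example.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, DefaultDict, List, Type


@dataclass(frozen=True)
class CustomerChangedAddress:
    """A coarse-grained business event, not a CRUD-style 'update row' call."""
    customer_id: str
    new_address: str


class InMemoryBus:
    """Toy stand-in for an ESB: routes each event to the handlers for its type."""

    def __init__(self) -> None:
        self._handlers: DefaultDict[Type, List[Callable]] = defaultdict(list)

    def subscribe(self, event_type: Type, handler: Callable) -> None:
        self._handlers[event_type].append(handler)

    def publish(self, event: object) -> None:
        # The publisher has no idea who, if anyone, is listening.
        for handler in self._handlers[type(event)]:
            handler(event)


bus = InMemoryBus()
received = []

# Two applications that need to know about the change subscribe to it.
bus.subscribe(CustomerChangedAddress, lambda e: received.append(("shipping", e.new_address)))
bus.subscribe(CustomerChangedAddress, lambda e: received.append(("billing", e.new_address)))

# The application where the change is sourced publishes the event once.
bus.publish(CustomerChangedAddress("1042", "1 New Road, Brighton"))
print(received)
```

The publishing application depends only on the event type and the bus, so subscribers can be added or removed without touching it.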

Messages should be asynchronous and atomic. We need to avoid the situation where one application needs to synchronously call another application to source some data in order to complete an operation. A message should carry all the information needed to complete a business event and messages should not be enrolled in transactions.
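Here is a minimal sketch of an asynchronous, self-contained message, assuming Python's standard queue.Queue as a stand-in for real messaging infrastructure. The OrderPlaced message and the shipping worker are hypothetical. The point is that the consumer never has to call back to the publisher for missing data.

```python
import queue
import threading
from dataclasses import dataclass


@dataclass(frozen=True)
class OrderPlaced:
    """Carries everything a subscriber needs to act on the business event."""
    order_id: str
    customer_name: str
    delivery_postcode: str
    total_pence: int


outbox: queue.Queue = queue.Queue()
shipping_labels = []


def shipping_worker() -> None:
    """Consumes messages later, using only the data inside each message."""
    while True:
        message = outbox.get()
        if message is None:  # sentinel: no more messages
            break
        shipping_labels.append(f"{message.customer_name}, {message.delivery_postcode}")


worker = threading.Thread(target=shipping_worker)
worker.start()

# The publisher drops the message on the queue and carries on; it does not wait
# for the shipping application to do its work.
outbox.put(OrderPlaced("SO-1", "Ada Example", "BN1 1AA", 2599))
outbox.put(None)

worker.join()
print(shipping_labels)
```

If the message carried only an order id, the shipping application would have to call the sales application synchronously to look up the name and postcode, which is exactly the coupling we are trying to avoid.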

We should not concern ourselves with duplicated data between applications, so rather than having a single list of countries, for example, held in a single service, we should be relaxed about duplicate lists held in each application. See Bill Poole’s posts on Centralised vs. Decentralised Data here, here and here.

Integration is hard

Even with all these principles in place, integrating business applications is still a complex task. It is not to be undertaken lightly. We need a relentless business driven focus on integration design and a clear and well communicated vision of how to achieve it. This is doubly hard when we try to disentangle an existing web of poorly designed tightly coupled interactions as described above. However, done well it can be the springboard to a far more flexible organisation.

4 comments:

"We make each application talk a single canonical language that is shared throughout the organisation."

It is practically impossible to come up with a single canonical language that everybody in the organization agrees on. That's because different parts of the organization give different meanings to the same term.

'Customer' is treated differently in Sales, Billing, Shipping, etc. They care about different fields. Same with things like 'Product'.

Part of the service oriented approach of having autonomous business services is that each defines its own language and meaning unambiguously. Mapping is still needed, but it's done at a higher level than the application.

Events are expressed according to a given perspective, and not generically. This gives stability, version tolerance, and much of the other goodness SOA promises.

It's hard to cover it all in either a blog post or comment, but I hope that helps a bit.

You can find recordings of some of my conversations with Bill Poole here:

+1 for giving Bill Poole's blog some love...but why has he stopped posting?? Good post overall but I also stopped at the "single language" proposition.

Udi's point about a canonical language is what I took from Bill's posts on centralized vs. decentralized data. It's not just about replication of reference data, you also can't shy away from "repeating" concepts from different problem domain perspectives.

My experience is that the canonical language is the received wisdom approach for most SOA practitioners, but to me it makes no sense. Because:

1) it presumes a clarity of organizational vision I've yet to encounter in reality.

2) it assumes future stability once you've discovered your organization-wide concepts, which is never the case. People say they'll adapt the concepts as the system evolves but that is impractical.

3) it results in each system being coupled to irrelevant concepts: what does a CustomerService Customer have in common with a Sales Customer?

4) it impedes adoption because it requires enterprise-wide coordination and affects every system.

That said, it's an interesting problem to maintain referential integrity across decentralized applications. And, some concepts (e.g., Customer Profile) might be authoritative. But mapping and aggregation of services is the responsibility of the application consuming the services. ESB mappings are for message transformation -- a translator between service domain concepts.

Decentralizing services meets a lot of resistance but, IMO, the centralized approach is short-sighted hubris and neuters the whole "autonomous" aspect of services.

Yeah, I think the impulse we all share is to try to do concept classification and observe some form of concept-DRY principle. But, it's not just that I haven't seen it done, I don't see how it could be done. In other words, I think the impulse is misplaced when it comes to services.

If you place primacy on autonomy and decoupling, you start to see that you can couple yourself to an irrelevant conceptual domain as easily as you can to a service implementation. My attitude about SOA is influenced by Eric Evans's DDD book. Domain languages have boundaries and you adapt to cross those boundaries. Trying to conceive a Platonic ideal of a Customer, for instance, fails because there's no business obligation to ensure attributive consistency across domains.

An illustrative problem domain is insurance contracts, where concepts such as Contract undergo state changes as they move from potentiality to actuality -- and that's an industry with a fair degree of standardization. I believe Bill Poole has some posts on just such an SOA architecture.

Code Rant

Notepad, thoughts out loud, learning in public, misunderstandings, mistakes. undiluted opinions. I'm Mike Hadlow, an itinerant developer. I live (and try to work in) Brighton on the south coast of England. Please don't mistake me for an expert in anything. I love technology and programming, but make no claims to be any good at it. Much of what you read here may be poorly thought out, wrong, or just plain dangerous.