Thoughts on the Canonical Messaging Pattern

Many BizTalk solutions that I have implemented and worked on have followed the canonical messaging pattern. It’s certainly one of the first things I consider when building a new solution and a concept that I come across often. I would consider it “best practice” to implement such a pattern, given its benefits (which I will outline in this post).

As they say, “a picture paints a thousand words”, so here is a graphical view of this pattern compared against a solution that does not implement it (i.e. a peer-to-peer (P2P) solution):

As you can see with regard to the canonical pattern (in green/with a tick), documents that are logically equivalent map to a standard application-specific format (the canonical format). Let’s unpack this statement a little.

The term logically equivalent is specific to our application; for example, external purchase orders in the formats indicated in the diagram above are equivalent in the context of the solution and so map to a standard format internally. This means that in the context of the application, these external purchase order formats are the same and will be processed in the same way. To stores and suppliers, however, these different purchase order formats are quite distinct.

Canonical format describes how documents will be represented internally in our solution. In BizTalk, this has to be in XML (since BizTalk uses XML internally to represent messages).

The next question is how to build a canonical document that can represent documents that are logically equivalent but may be formatted quite differently. Actually, that framing is not quite right: the canonical document should be created first, independently of any external representations (e.g. to represent the essence of what a purchase order is), and only then should we decide how external representations map to it. In the case of BizTalk, this will typically involve writing some XSLT that converts the various formats to or from the canonical format.
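To make the idea concrete outside of BizTalk, here is a minimal Python sketch of two maps converging on one canonical document. Everything in it is an assumption made for illustration: the two store formats, the element names and the CanonicalPurchaseOrder layout are invented, and a real BizTalk solution would express these conversions as maps/XSLT rather than Python.

```python
import xml.etree.ElementTree as ET

# Two hypothetical external purchase order formats (invented for this sketch).
STORE_A_PO = "<Order><OrderNo>PO-1001</OrderNo><Item sku='ABC' qty='2'/></Order>"
STORE_B_PO = (
    "<PurchaseOrder id='PO-2002'><Lines><Line>"
    "<Product>XYZ</Product><Quantity>5</Quantity>"
    "</Line></Lines></PurchaseOrder>"
)


def new_canonical(order_id):
    """Create an empty canonical purchase order (hypothetical layout)."""
    po = ET.Element("CanonicalPurchaseOrder")
    ET.SubElement(po, "OrderId").text = order_id
    ET.SubElement(po, "Lines")
    return po


def add_line(po, sku, qty):
    """Append one order line to the canonical document."""
    line = ET.SubElement(po.find("Lines"), "Line")
    ET.SubElement(line, "Sku").text = sku
    ET.SubElement(line, "Quantity").text = str(int(qty))


def store_a_to_canonical(xml_text):
    """Map store A's attribute-based format to the canonical format."""
    src = ET.fromstring(xml_text)
    po = new_canonical(src.findtext("OrderNo"))
    for item in src.findall("Item"):
        add_line(po, item.get("sku"), item.get("qty"))
    return po


def store_b_to_canonical(xml_text):
    """Map store B's element-based format to the canonical format."""
    src = ET.fromstring(xml_text)
    po = new_canonical(src.get("id"))
    for line in src.findall("Lines/Line"):
        add_line(po, line.findtext("Product"), line.findtext("Quantity"))
    return po


def canonical_summary(po):
    """Downstream logic only ever reads the canonical shape."""
    lines = [
        (line.findtext("Sku"), int(line.findtext("Quantity")))
        for line in po.findall("Lines/Line")
    ]
    return po.findtext("OrderId"), lines
```

Note that canonical_summary never needs to know which store a document came from; only the two small map functions carry format-specific knowledge.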

I have to admit that when I first started out building BizTalk solutions, I didn’t immediately grasp the benefits of having canonical representations of messages in my solution. This quickly changed, however. Obviously there is a performance hit, since every message will be transformed twice, but I think this overhead is well justified given the benefits listed below (I have tried to list these in order of importance):

Impact of schema change is minimised – since all messages map to or from the canonical document, if (following our example) a store or supplier decides to change their schema, it will only be necessary to change one map. Compare this to the P2P solution: 4 maps would need to be changed if a store changed their schema and, not only that, each supplier would need to be contacted and regression testing would need to be arranged with each. By utilising a canonical document type, we protect parties from the impact of schema changes.

Minimising impact of change (2) – since orchestrations, for example, will work on the canonical schema, any changes to external schemas will not require orchestration changes and redeployment.

Additional document formats can be added with relative ease – only one additional map would be required to or from the canonical format. It would also only be necessary to deal with one integration partner, and specific knowledge of all downstream message formats is not required – only detailed knowledge of the new message format and the canonical format is needed.

Reduction in solution complexity – with the canonical solution, 7 maps need to be maintained; 12 maps need to be maintained with the P2P solution.
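The map counts quoted in the list above follow from simple arithmetic, which the short Python sketch below spells out. It assumes the diagram shows three stores and four suppliers with one-directional store-to-supplier flows; that is an inference from the figures of 7, 12 and 4 given above, not something stated explicitly in the text.

```python
def canonical_map_count(stores, suppliers):
    """One map per party: store -> canonical, or canonical -> supplier."""
    return stores + suppliers


def p2p_map_count(stores, suppliers):
    """One map for every store/supplier pairing."""
    return stores * suppliers


def maps_touched_by_store_change(suppliers, canonical):
    """Maps to rework when a single store changes its schema."""
    return 1 if canonical else suppliers


# Assumed party counts, inferred from the figures quoted in the post.
STORES, SUPPLIERS = 3, 4

print(canonical_map_count(STORES, SUPPLIERS))  # 7
print(p2p_map_count(STORES, SUPPLIERS))        # 12
```

The gap widens as parties are added: the canonical count grows by one per new party, while the P2P count grows by the number of parties on the other side.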

Here are a couple of caveats that I have come across with respect to this pattern:

There can be only one canonical representation for your logical message type! I recently worked on a solution where Xsd.exe had been used to create classes for the canonical schemas and then these classes were used in the solution orchestrations… As the canonical schemas changed, the classes were not recreated. This can introduce subtle bugs; for example, if you were to assign canonical message 1 (schema) to canonical message 2 (class) in your orchestration, data not defined in message 2 will be lost… So it is definitely best practice to ensure that only one canonical representation is available in your solution.

It is harder to implement this pattern retrospectively, after the solution is in Production. So even if your solution is simple, do yourself a favour and future-proof it by baking in a canonical schema.
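The data loss described in the first caveat can be illustrated with a small Python analogy. This is not Xsd.exe output or BizTalk behaviour; the class, element names and sample document are all hypothetical, and the sketch only shows the general mechanism of a stale class silently dropping fields it does not know about.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class StalePurchaseOrder:
    """Stand-in for a class generated from an OLD copy of the canonical schema."""

    order_id: str

    @classmethod
    def from_xml(cls, xml_text):
        root = ET.fromstring(xml_text)
        # Only fields the stale class knows about are read; the rest is ignored.
        return cls(order_id=root.findtext("OrderId"))

    def to_xml(self):
        root = ET.Element("CanonicalPurchaseOrder")
        ET.SubElement(root, "OrderId").text = self.order_id
        return ET.tostring(root, encoding="unicode")


# The canonical schema has since gained a DeliveryDate element (hypothetical).
CURRENT_CANONICAL = (
    "<CanonicalPurchaseOrder>"
    "<OrderId>PO-1001</OrderId>"
    "<DeliveryDate>2024-01-31</DeliveryDate>"
    "</CanonicalPurchaseOrder>"
)

# Passing the current message through the stale class loses DeliveryDate
# without raising any error.
round_tripped = StalePurchaseOrder.from_xml(CURRENT_CANONICAL).to_xml()
print(round_tripped)
```

Nothing fails at runtime, which is what makes this kind of bug subtle: the message simply comes out the other side with less data than it went in with.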

I hope this post demonstrates the benefits of the canonical messaging pattern and why solutions should implement it.

In my experience so far, and depending on the stance/clout of the client, the canonical schema hasn’t been changed unless the client decides that data in the new external schemas would be worth including in the canonical representation. Rather, the owner of the external schema has been required to fit in with the requirements of the canonical schema (this may involve external parties making changes to their systems). If a canonical schema is continually modified to fit external parties, this defeats one of the purposes of having an internal representation.

It’s also tempting to change the canonical schema for the purposes of internal plumbing/logic, but instead of changing the schema I would use promoted properties, i.e. data that can be associated with the message but is not required to be included in the message itself.

One single internal schema would be quite unwieldy to work with, but I fully agree with you regarding the use of direct binding to the MessageBox database.

Hi James. I wonder how you approach managing changes to the canonical schema over time? In the past I have versioned schema projects to implement new/needed functionality and deployed both the old and new into production. This allows existing projects to continue working with the old schema (provided you specified your assembly version during design) until you are ready to update them.

I’m also curious whether you make use of options such as Any elements to provide the flexibility to push system-specific data through the canonical from one map to another? Seems like an interesting option …

Hi – with regard to managing changes to the canonical schema over time… As you know, a schema versioning strategy (in general) is an important consideration. Where changes are purely additive, so that documents in previous versions should still be processed, I would use an XML attribute to indicate the current version and increment the minor version (for example, 1.0 would become 1.1). This indicates to clients using a superseded version that their version is not the latest, while still allowing the document to be processed.

Where changes mean that previous versions are no longer valid, I would increment the version number indicated in the schema namespace… For example, I would use the namespace “http://mycompany/schemaname/v2” when previously it would have been “http://mycompany/schemaname/v1”. A decision would need to be made at this point about whether v1 and v2 should coexist (I would keep both in the same assembly; creating another assembly version is, I think, unnecessary). If projects need to work with the old schema (something I would push hard to prevent) then both versions would need to coexist, and BizTalk would differentiate them by their different namespaces.

Of course your versioning strategy would be constrained by company policy, but given that canonical schemas are internal to the solution only, we should be able to implement our own versioning strategy.
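As a rough illustration of the scheme described above (major version in the namespace, minor version in an attribute), here is a small Python sketch. The attribute name "version", the root element name and the table of latest minor versions are assumptions made for the example; only the two namespaces come from the discussion, and BizTalk itself resolves message types from the namespace and root node rather than from code like this.

```python
import xml.etree.ElementTree as ET

NS_V1 = "http://mycompany/schemaname/v1"
NS_V2 = "http://mycompany/schemaname/v2"

# Hypothetical: the latest minor version published within each namespace.
LATEST_MINOR = {NS_V1: "1.1", NS_V2: "2.0"}


def classify(xml_text):
    """Return (namespace, version, superseded) for a canonical document."""
    root = ET.fromstring(xml_text)
    # ElementTree renders a namespaced tag as "{namespace}localname".
    if not root.tag.startswith("{"):
        raise ValueError("document has no namespace")
    namespace = root.tag[1:].split("}", 1)[0]
    if namespace not in LATEST_MINOR:
        raise ValueError("unknown schema namespace: " + namespace)
    # "version" is an assumed attribute name carrying the minor version.
    version = root.get("version")
    superseded = version != LATEST_MINOR[namespace]
    return namespace, version, superseded


old_minor = '<PurchaseOrder xmlns="http://mycompany/schemaname/v1" version="1.0"/>'
print(classify(old_minor))
```

A document on a superseded minor version is still accepted and merely flagged, whereas a breaking change shows up as a different namespace and can be handled by separate logic or rejected outright.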

I’m not keen on the Any element at all :-)… I certainly avoid it in any schemas I am asked to design and build. I consider that a schema represents a well-defined and explicit contract between parties, and hence using an Any type defeats this objective.