Many communities are touting XML as a tool that will significantly ease some of their most complex data representation and transport problems. This is an incredible statement when you stop and think about it. A language, one that could have been written at any time in the past, is now going to significantly simplify complex problems in computing. By the sound of it, you would think that XML was an algorithm, or a technology, or perhaps even an entire machine, but it is only a grammar for defining markup languages.

This article will examine the use of XML for asynchronous and synchronous communications. Many software vendors that occupy this space directly or indirectly are already implementing XML support in their products. In the Enterprise Application Integration (EAI) market, which encompasses all types of communications methods, vendors who currently offer XML support or who are working XML into their products include: IBM, Microsoft, TIBCO, TSI Software, Vision Software, Vitria, and webMethods.

With this much support behind the standard, it is certain to prosper and simplify communications among all applications, whether they are intraorganization or extraorganization.

Before going further, let's first debunk some myths about XML.

Myth:

XML is a replacement for HTML.

Fact:

HTML will represent presentation in the Web browser for some time. XML just helps to separate the content from the presentation. Eventually, XML must be mapped to some visual representation for the data to be useful to people. In many cases, this representation will still be HTML.

Myth:

XML is a technology.

Fact:

XML is a language that allows you to define your own markup language, which can then be processed by a consistent set of tools. Several initiatives are being designed in coordination with XML, such as XSL, XML Schema, XML Query, XPointer, and XLink, but without tools that implement specific behaviors based on these specifications, the markup by itself does nothing.

Myth:

XML makes it easier for businesses to exchange information electronically.

Fact:

This one is a partial truth rather than a complete myth. Agreements between people make it easier for businesses to exchange information electronically. This means that two companies can receive the same benefit from a comma-delimited file as from one authored in XML, as long as both sides agree on the format in advance. However, XML makes it easier to develop the applications that will implement the agreements made by humans and, furthermore, to do so in a way that is extensible and can easily change over time to meet the needs of the business. Whereas a change to the field ordering in a comma-delimited file from the sending company might break the application on the receiving side, an XML application locates data by element name, so it does not have to make assumptions about where in the message the data sits. The end result is that XML applications tend to be more robust in the face of changes to the data being exchanged.
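To make the field-ordering point concrete, here is a minimal Python sketch. The order record, its field positions, and the element names (Order, Id, Customer, Total) are all hypothetical and chosen only for illustration; the sketch simply contrasts position-based access with name-based access when a sender swaps two fields.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical order record; field and element names are illustrative only.
CSV_ORIGINAL = "1001,Acme Corp,250.00\n"
CSV_REORDERED = "1001,250.00,Acme Corp\n"  # the sender swapped two columns

XML_ORIGINAL = (
    "<Order><Id>1001</Id><Customer>Acme Corp</Customer>"
    "<Total>250.00</Total></Order>"
)
XML_REORDERED = (
    "<Order><Id>1001</Id><Total>250.00</Total>"
    "<Customer>Acme Corp</Customer></Order>"
)


def customer_from_csv(text):
    """Position-based access: assumes the customer is always the second field."""
    row = next(csv.reader(io.StringIO(text)))
    return row[1]


def customer_from_xml(text):
    """Name-based access: finds the Customer element wherever it appears."""
    return ET.fromstring(text).findtext("Customer")


if __name__ == "__main__":
    print(customer_from_csv(CSV_REORDERED))  # silently returns the total
    print(customer_from_xml(XML_REORDERED))  # still returns the customer
```

Note that the comma-delimited reader does not even fail in this case; it quietly hands back the wrong value, which is arguably worse than an exception.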

This last myth is pertinent to the rest of this article, because messaging involves exchanging information between applications, whether they are within the same company or across different companies.

Purposes for XML in Messaging
For the messaging community, XML can be used for three distinct purposes: 1) to define the container for message content; 2) to define the content of the message; and 3) to describe the content of a message. In this section we will look at the implications of using XML for each of these purposes.

First, we will examine the possibilities derived from using XML as the container for a message. When it is acting as a container, XML is ignorant of the data stored inside of it. For XML, this usually means that the data is stored inside of a CDATA (character data) section within one of its elements. A CDATA section allows users to put sequences of characters inside an XML document that will not be examined by an XML parser. That is, all the data in an XML document is evaluated against the rules of the XML 1.0 grammar, unless the parser is specifically told to ignore it.

The following code is an example of data stored in a CDATA section:

<![CDATA[... inside of this CDATA section,
the parser will not recognize it.]]>

As you can see, XML is well suited to act as a container for any type of data. Note, however, that a random binary sequence could happen to contain the CDATA section termination characters ]]>, which would end the section prematurely. To store binary data, it is highly recommended to Base64-encode it first.
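The following Python sketch shows one way the container idea might look in practice. The Message element and its encoding attribute are assumptions made for this example, not part of any standard; the point is only that text can travel inside a CDATA section, while binary data is safer when it is Base64-encoded first.

```python
import base64
import xml.etree.ElementTree as ET


def wrap_text(payload):
    """Wrap text in a CDATA section inside a hypothetical Message element."""
    if "]]>" in payload:
        raise ValueError("payload contains the CDATA terminator ']]>'")
    return "<Message><![CDATA[" + payload + "]]></Message>"


def wrap_binary(payload):
    """Base64-encode bytes; the Base64 alphabet cannot produce ']]>'."""
    encoded = base64.b64encode(payload).decode("ascii")
    return '<Message encoding="base64">' + encoded + "</Message>"


def unwrap(document):
    """Return the payload as bytes, decoding Base64 when it is flagged."""
    root = ET.fromstring(document)
    text = root.text or ""
    if root.get("encoding") == "base64":
        return base64.b64decode(text)
    return text.encode("utf-8")


if __name__ == "__main__":
    # Markup characters inside CDATA reach the application untouched.
    print(unwrap(wrap_text("<b>not parsed</b> & left alone")))
    # Binary data that happens to contain ']]>' survives the round trip.
    print(unwrap(wrap_binary(b"\x00\x01]]>\xff")))
```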

By using XML as the container, there is a flexible and extensible header that can be utilized by a host of message-oriented middleware products. For example, if a company is forced to use both TIBCO and MQSeries in the same project, they could establish an XML grammar to encapsulate the message, thereby allowing them to distribute the same message throughout both middleware environments. Granted, there will need to be a process written for each of these messaging vehicles that can extract the contents and route according to the XML-defined message headers, but the development of the sending application will be greatly simplified and made more modular by design.

XML message containers are also an excellent vehicle for defining data flow in a workflow application. In workflow applications, there is a docket of information that must be passed through a number of processes, such as approval, transformation, routing, or validation, but as the docket moves between these processes there is a need to capture the output of the processing that has occurred each step of the way. XML makes a great format for aggregating the output of the processing information, while also encapsulating the data in the docket.

The following code illustrates an XML message that could be used in a workflow application:

(Element markup not preserved; surviving values, in order: JP Morgenthal, COMPLETE, Steve Yelity, NOT STARTED)

Because of XML's hierarchical nature, each of the slips will be processed in order. It is not difficult to see how this message could be used in tandem with Java and some messaging middleware to implement workflow. Each Process tag has an associated name, which could translate directly into the name of a Java class, which in turn could be handed the entire message in XML, or just its content as a stream of characters.
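As a rough illustration of this idea, the sketch below uses Python rather than Java and a dictionary of functions rather than class lookup. The element names (Workflow, Slip, Process, Owner, Status, Output, Docket), the process names, and the docket text are all assumptions made for the example; only the owner names and status values come from the article's message. Each pending slip is handled in document order, and the output of its process is recorded back into the message.

```python
import xml.etree.ElementTree as ET

# Hypothetical workflow message; the element names are assumptions.
MESSAGE = """
<Workflow>
  <Slip>
    <Process name="Approval"/>
    <Owner>JP Morgenthal</Owner>
    <Status>COMPLETE</Status>
  </Slip>
  <Slip>
    <Process name="Routing"/>
    <Owner>Steve Yelity</Owner>
    <Status>NOT STARTED</Status>
  </Slip>
  <Docket><![CDATA[purchase order 42]]></Docket>
</Workflow>
"""


def approval(docket):
    return "approved: " + docket


def routing(docket):
    return "routed: " + docket


# The Process name selects the code to run (a class name in the Java case).
HANDLERS = {"Approval": approval, "Routing": routing}


def run_pending(message):
    """Run every slip that is not COMPLETE, in document order."""
    root = ET.fromstring(message)
    docket = root.findtext("Docket")
    for slip in root.findall("Slip"):
        status = slip.find("Status")
        if status.text == "COMPLETE":
            continue
        handler = HANDLERS[slip.find("Process").get("name")]
        # Capture the output of this step alongside the docket.
        ET.SubElement(slip, "Output").text = handler(docket)
        status.text = "COMPLETE"
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(run_pending(MESSAGE))
```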

There is a second use for XML in messaging and that is as a format for the content of the message itself. Significant benefits can be gained by marking up message content with a grammar that is relevant to the task being defined by the message. For example, if the message is an event that represents that a specific type of transaction, such as a sale, has occurred, it would be useful for the receiving application to expect a message that uses XML to define the components of the transaction, such as Price, Customer, Items, etc.

Of course, each designer must decide whether the ease of long-term maintenance that the application gains by using XML to define the message content outweighs the simplicity of forcing the sending application to be intimate with the internal structures of the receiving application. For example, two components of the same sales system might use a shared library to define the Customer object. For these two components, it makes perfect sense to just communicate using the customer object structure. However, if each of these components is sitting inside a different application, the requirement to share the customer object structure could eventually result in erroneous and problematic conditions.

Using XML as the message content also enables a more intelligent form of messaging called content-based routing (CBR). With CBR, the messaging middleware uses the message's content to define where it should be routed and which processes should be applied to it. Programming a CBR facility is not straightforward without a message format like XML. Imagine programming a CBR to route on the fourth field of a comma-delimited file. If one application sends a field out of place, the CBR will either route the message to the wrong location or throw an exception.

With XML, these mechanisms can route much more intelligently. Not only can they process without assumptions by programming the CBR to react to a particular element name, but they can also ensure that it happens in the proper context. For example, many purchase orders have both shipping and billing customer information. A CBR could be programmed to act on the Customer element inside of the Shipping element when the Customer element inside of the Billing element is blank, signifying that they are the same. This is far easier and more robust than programming the CBR to use the fifth field if the tenth field is blank, as one would do if one were using a comma-delimited format.
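A small Python sketch of such a context-sensitive rule follows. The PurchaseOrder layout, the customer names, and the queue names in ROUTES are hypothetical; the logic mirrors the rule described above, falling back to the Customer element inside Shipping when the one inside Billing is blank.

```python
import xml.etree.ElementTree as ET

# Hypothetical routing table; the queue names are assumptions.
ROUTES = {"Acme Corp": "queue.acme", "Globex": "queue.globex"}
DEFAULT_ROUTE = "queue.default"


def routing_customer(order_xml):
    """Use Billing/Customer, or Shipping/Customer when billing is blank."""
    root = ET.fromstring(order_xml)
    billing = (root.findtext("Billing/Customer") or "").strip()
    if billing:
        return billing
    return (root.findtext("Shipping/Customer") or "").strip()


def route(order_xml):
    """Pick a destination queue from the message content alone."""
    return ROUTES.get(routing_customer(order_xml), DEFAULT_ROUTE)


if __name__ == "__main__":
    order = (
        "<PurchaseOrder>"
        "<Shipping><Customer>Acme Corp</Customer></Shipping>"
        "<Billing><Customer></Customer></Billing>"
        "</PurchaseOrder>"
    )
    print(route(order))
```

Because the rule names its elements and their parents, it keeps working if the sender emits Billing before Shipping or adds new elements around them.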

Finally, the last use for XML in messaging is to describe the content of a message. When it is used in this way, XML provides quick and flexible introspection on the message information. It differs from using XML as a container because the messaging system does not necessarily use this information to make decisions about the route or transport that the message needs, but describes the actual message content's schema. Additionally, this metadata may be stored outside of the message, perhaps even in a repository.

Using XML in this way provides message brokers with the ability to understand how to access the document's content. For example, if a comma-delimited message describes an invoice structure, XML may be used to define the fields that are used for billing, shipping, or accounting, and how to pull this data out of the message. In this way, message brokers can perform quick generic transformations on messages to make them suitable for use by other environments.
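The sketch below illustrates the metadata idea in Python under some loud assumptions: the MessageFormat and Field elements, their name, position, and group attributes, and the sample invoice are all invented for this example. The descriptor lives outside the message, and a generic routine uses it to pull named fields out of a comma-delimited invoice.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical descriptor, stored apart from the message (say, in a repository).
DESCRIPTOR = """
<MessageFormat name="Invoice" delimiter=",">
  <Field name="InvoiceNumber" position="0" group="accounting"/>
  <Field name="BillTo" position="1" group="billing"/>
  <Field name="ShipTo" position="2" group="shipping"/>
  <Field name="Amount" position="3" group="accounting"/>
</MessageFormat>
"""


def extract(descriptor_xml, message, group):
    """Return the fields of one group, using only the descriptor to find them."""
    fmt = ET.fromstring(descriptor_xml)
    reader = csv.reader(io.StringIO(message), delimiter=fmt.get("delimiter", ","))
    row = next(reader)
    return {
        field.get("name"): row[int(field.get("position"))]
        for field in fmt.findall("Field")
        if field.get("group") == group
    }


if __name__ == "__main__":
    invoice = "INV-7,Acme Corp,Warehouse 9,125.50"
    print(extract(DESCRIPTOR, invoice, "accounting"))
```

If the sender later changes the field order, only the descriptor needs to change; the extraction routine stays generic.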

Of ORBs, RPCs, and the Like
Of course, no discussion of messaging would be complete without examining the class of products that rely on messaging as a foundation. Object Request Brokers (ORBs) and Remote Procedure Calls (RPCs) rely on messaging to invoke functionality located in a remote address space. However, in contrast to the type of messaging we just discussed, in which the message is exposed to the application directly, ORBs and RPCs use messaging transparently.

The messaging that supports ORBs and RPCs is highly specialized. It is a special format that allows for high-performance distributed computing by being able to identify components of a message quickly as coarse entities. Making those entities fine-grained would require a significant amount of additional work to retrieve the message's data. Consider the following comparison.

Here is a sample RPC message (not using any particular distributed object standard):

01 05 Reverse_ 08 F10DCA 02 10 abcdefghij

Here is a sample RPC message based on XML:

(Element markup not preserved; surviving values, in order: Reverse_, i8, abcdefghij)

Both of these messages require some form of parsing, but the former rebuilds the invocation signature as the data is pulled out. With the latter format, there is more work necessary to build this signature. For one thing, strings, such as the attribute number on Parameters, need to be converted back into integers before they can be used; this is not necessary in the former format. This point may seem trivial, but in the scheme of doing hundreds of method invocations for a single task, the overhead can affect performance significantly.
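To show where that extra work comes from, here is a minimal Python sketch of dispatching an XML-based call. Apart from the Parameters element and its number attribute, which the text mentions, every name here (Call, Parameter, the method and type attributes, the reverse function) is an assumption made for illustration and does not follow any particular RPC standard.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML-based RPC request.
REQUEST = """
<Call method="reverse">
  <Parameters number="1">
    <Parameter type="string">abcdefghij</Parameter>
  </Parameters>
</Call>
"""


def reverse(text):
    return text[::-1]


METHODS = {"reverse": reverse}
CONVERTERS = {"string": str, "int": int}


def dispatch(request_xml):
    """Rebuild the invocation signature from text, then make the call."""
    call = ET.fromstring(request_xml)
    params = call.find("Parameters")
    # Every value arrives as a string and must be converted before use.
    expected = int(params.get("number"))
    args = [
        CONVERTERS[p.get("type")](p.text)
        for p in params.findall("Parameter")
    ]
    if len(args) != expected:
        raise ValueError("parameter count does not match the number attribute")
    return METHODS[call.get("method")](*args)


if __name__ == "__main__":
    print(dispatch(REQUEST))
```

The conversions and lookups are cheap individually, but they are repeated on every invocation, which is the overhead the comparison is pointing at.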

This is not to say that an XML-based RPC does not have value, because it does. It just does not make sense to replace the core formats of CORBA's IIOP or Microsoft's DCOM with XML at this time. However, XML-based RPCs do make sense for the Web.

On the Web there is a fundamental understanding that connections are short-lived entities. We're not going to connect to a Web server and leave that connection open all day. The Web server is going to give us a particular block of information and then terminate our connection. So the Web server needs to be able to provide information one block at a time.

XML-based RPCs can provide us with the programming metaphor we wish to have during development, which is that we are calling a function locally, but executing it transparently over the network. When it comes to the Web, we will not have the ability to ask a Web server 50 questions (that is, to invoke 50 method calls, the common method of operation in distributed object computing) to get all the information we need. But we can use XML-based RPCs to ship the request for information to the server once and to obtain a result set with all the answers in it. Then we can ask our 50 questions locally.

One benefit of operating this way is that the information does not need to be reflected to the application as one large return set. The XML-based RPC would move objects across the wire by value and reconstruct the object locally. For example, if a BankAccount object existed on a server in Michigan that supported both CORBA- and XML-based RPC, there would be two methods for accessing the server. Using CORBA, clients could connect to the BankAccount server and send individual method calls to find out balance information and to perform debit and credit transactions, which may or may not work well through the Internet. However, an XML-based RPC could send a request for the remote BankAccount object instance to serialize itself into the XML return message, from which it could be reconstructed on the local side. Reconstruction might occur in the form of a local CORBA object, but would work equally well if the object were just represented as a tree of elements in memory and accessed using XML's Document Object Model. Operations would be carried out on the local data set, and then all the changes would be transmitted back to the server for update.
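The by-value pattern might be sketched in Python as follows. The XML layout (BankAccount, Balance, Changes, Change), the account number, and the choice to track the balance in cents are assumptions for this example; no network transport is shown, only the serialize, reconstruct, operate locally, and report changes cycle.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class BankAccount:
    number: str
    balance_cents: int
    pending: list = field(default_factory=list)  # changes made locally

    def credit(self, cents):
        self.balance_cents += cents
        self.pending.append(("credit", cents))

    def debit(self, cents):
        if cents > self.balance_cents:
            raise ValueError("insufficient funds")
        self.balance_cents -= cents
        self.pending.append(("debit", cents))


def to_xml(account):
    """Server side: serialize the object by value into the return message."""
    root = ET.Element("BankAccount", number=account.number)
    ET.SubElement(root, "Balance").text = str(account.balance_cents)
    return ET.tostring(root, encoding="unicode")


def from_xml(text):
    """Client side: reconstruct a local object from the XML."""
    root = ET.fromstring(text)
    return BankAccount(root.get("number"), int(root.findtext("Balance")))


def changes_to_xml(account):
    """Client side: batch all local changes into one update message."""
    root = ET.Element("Changes", number=account.number)
    for kind, cents in account.pending:
        ET.SubElement(root, "Change", kind=kind, cents=str(cents))
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    local = from_xml(to_xml(BankAccount("MI-001", 10000)))
    local.debit(2500)
    local.credit(500)
    print(local.balance_cents)
    print(changes_to_xml(local))
```

Two network exchanges replace what would otherwise be one round trip per method call, at the cost of having to reconcile the batched changes on the server.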

Toward Extra-Enterprise Integration
Charles Allen of webMethods recently suggested the term Extra-Enterprise Integration (EEI) as a way to explain to customers that they should do internal integration with the same architecture they use for external integration. I believe that there is a significant amount of truth to this idea. Many customers still look at sharing information with suppliers, partners, and customers as different from sharing data between departments.

EEI is a data-centric process, and it is usually an asynchronous process. There are reasons for the divergence in architectures for internal and external communications. For one, many internal communications are happening over a local area network within the same building, which allows users to count on continuous connectivity, and thus synchronous processes, to move data between systems. Data moving outside of the company must rely on private dial-up networks, Value-Added Networks (VANs), or Wide-Area Networks (WANs). These systems have reduced bandwidths and less reliability, so it is most effective to move as much data as possible to the receiving application in one shot.

Another reason for the divergence is that within the company, there is less concern for security than there is with outside parties. It is highly likely that applications written within the same organization can gain access to the data from another application directly. Data coming in from outside will usually go through some testing and validation process before it is delivered into the internal systems.

While these two reasons explain why the divergence has occurred, there is a strong argument that the assumptions made about integrating internally are incorrect and will eventually prove disastrous for the enterprise. For example, all systems should validate and test data before it is accepted, and underestimating the need for security between departments opens the doors to internal hackers. By forcing internal applications to be integrated using the same methodology as they would be if they were integrated with external applications, in the long term, companies will be spared interrupted service because of problems that can be caused by more direct integration methods. This will also make the applications more modular and allow them to be integrated more easily with other systems that have yet to be developed or acquired.

XML is an important part of EEI. It enables companies to define standard grammars for messages that will flow between systems. These standards can then be shared with their suppliers, partners, and customers if they need to open the system to applications outside of the company.

Conclusion
In this article we looked at the potential uses of XML in tandem with asynchronous and synchronous messaging. We identified that XML can act as the message container, as the message format, and as the format for the message's metadata. All three of these uses provide extensibility and robustness when exchanging data between applications.

We also examined the role of XML in the world of distributed object computing. XML-based RPCs can greatly increase the efficiency of using the Web as a network transport, while still providing the convenient programming paradigm that object models, such as CORBA and DCOM, provide.

Finally, we illustrated the importance of the role of XML in integrating applications both inside and outside of the enterprise. It is important not to assume that all connections will be synchronous and that object servers will always be available. It is also important to understand that developing with a data-centric model can provide additional security by forcing data into and out of application gateways. Architects need to weigh the performance implications of integrating applications in this way carefully against the benefits it provides, which are extensibility and isolation from change.

JP Morgenthal (jp@ncfocus.com) is president and Director of Research for NC.Focus, an industry analyst firm that specializes in EAI. Morgenthal is also a co-author of Manager's Guide to Distributed Environments and the forthcoming Enterprise Application Integration Using XML.