Chapter 5. Common mistakes, problems and anti-patterns

This chapter presents some of the common mistakes and problems people face when writing code
using Axiom, as well as anti-patterns that should be avoided.

Violating the javax.activation.DataSource contract

When working with binary (base64) content, it is sometimes necessary to write a
custom DataSource implementation to wrap binary data that is
available in a different form (and for which Axiom or the Java Activation Framework
has no out-of-the-box data source implementation). Data sources are also sometimes
(but less frequently) used in conjunction with OMSourcedElement
and OMDataSource.

The Javadoc of the DataSource interface is very clear on the expected
behavior of the getInputStream method:

/**
* This method returns an InputStream representing
* the data and throws the appropriate exception if it can
* not do so. Note that a new InputStream object must be
* returned each time this method is called, and the stream must be
* positioned at the beginning of the data.
*
* @return an InputStream
*/
public InputStream getInputStream() throws IOException;

A common mistake is to ignore this requirement and return the same InputStream
instance on every call, typically a stream that was handed to the data source when
it was created. What makes this mistake so vicious is that it will very likely not
cause problems immediately. The reason is that Axiom is optimized to read the data
only when necessary, which in most cases means only once. However, in some cases
it is unavoidable to read the data several times. When that happens, the broken
DataSource implementation will cause problems that may
be extremely hard to debug.
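
The difference between a broken and a conforming implementation can be sketched as
follows. This is only an illustration under stated assumptions: so that it compiles
without the activation JAR, the hypothetical StreamSource interface stands in for
javax.activation.DataSource and declares only getInputStream, and the class names
SingleStreamSource, ByteArraySource and DataSourceContractDemo are invented for
this example.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical stand-in for javax.activation.DataSource (assumption: only the
// getInputStream part of the contract matters for this illustration).
interface StreamSource {
    InputStream getInputStream() throws IOException;
}

// Broken: hands out the same stream on every call, so a second reader
// finds it already consumed (or closed).
final class SingleStreamSource implements StreamSource {
    private final InputStream in;

    SingleStreamSource(InputStream in) {
        this.in = in;
    }

    @Override
    public InputStream getInputStream() {
        return in;
    }
}

// Conforming: keeps the data and returns a new stream positioned at the
// beginning of the data on every call.
final class ByteArraySource implements StreamSource {
    private final byte[] data;

    ByteArraySource(byte[] data) {
        this.data = data.clone();
    }

    @Override
    public InputStream getInputStream() {
        return new ByteArrayInputStream(data);
    }
}

final class DataSourceContractDemo {
    // Reads the source to the end, as a consumer would, and returns the
    // number of bytes that were available.
    static int drain(StreamSource source) {
        try (InputStream in = source.getInputStream()) {
            return in.readAllBytes().length;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With a three-byte payload, calling drain twice on a SingleStreamSource yields 3 and
then 0, whereas a ByteArraySource yields 3 both times. A stream type that rejects
reads after close would fail the second call with an I/O error instead.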

Imagine for example[3]
that such a broken implementation is used to produce an
MTOM message. At first this will work without any problems because the data
source is read only once when serializing the message. If later on the MTOM
threshold feature is enabled, the broken implementation will (in the worst case)
cause the corresponding MIME parts to be empty or (in the best case) trigger an
I/O error because Axiom attempts to read from an already closed stream.
The reason for this is that when an MTOM threshold is set, Axiom reads the data
source twice: once to determine if its size exceeds the
threshold[4] and once during
serialization of the message.

Issues that “magically” disappear

Quite frequently, users post messages on the Axiom-related mailing lists about
issues that seem to disappear by “magic” when they try to debug
them. The reason why this can happen is simple. As explained earlier, Axiom uses
deferred building, but at the same time does its best to hide that from the user,
so that they don't need to worry about whether the object model has already been
built or not. On the other hand, when serializing the object model to XML or when
requesting a pull parser (XMLStreamReader) from a node,
the code paths taken may be radically different depending on whether or not
the corresponding part of the tree has already been built. This is especially
true when caching is disabled.

While the end result should be the same in all cases, it is also clear that
in some circumstances an issue that occurs with an incompletely built tree may
disappear if there is something that causes Axiom to build the rest of the object
model. What is important to understand is that the “something” may
be as trivial as a call to the toString method of an
OMNode! The fact that adding
System.out.println statements or logging instructions
is a common debugging technique then explains why issues sometimes seem to
“magically” disappear during debugging.

Finally, it should be noted that inspecting an OMNode
in a debugger also causes a call to the toString
method on that object. This means that by just clicking on something in the
“Variables” window of your debugger, you may completely change the
state of the process that is being debugged!
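
The mechanism can be reproduced with a toy model that does not use Axiom at all.
In the sketch below, the hypothetical LazyNode class pulls its children from an
iterator on demand (playing the role of deferred building), and its toString
method consumes the rest of the input as a side effect. The class and its methods
are invented for this illustration and are not part of the Axiom API.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Toy stand-in for a node with deferred building (not an Axiom class).
final class LazyNode {
    private final Iterator<String> source;              // plays the role of the parser
    private final List<String> children = new ArrayList<>();

    LazyNode(List<String> input) {
        this.source = input.iterator();
    }

    // True once everything has been pulled from the underlying source.
    boolean isComplete() {
        return !source.hasNext();
    }

    // Builds only as much as is needed to answer the question.
    String firstChild() {
        if (children.isEmpty() && source.hasNext()) {
            children.add(source.next());
        }
        return children.isEmpty() ? null : children.get(0);
    }

    // Side effect: rendering the node forces the rest of it to be built.
    @Override
    public String toString() {
        while (source.hasNext()) {
            children.add(source.next());
        }
        return children.toString();
    }
}
```

After a call to firstChild the node is still incomplete. A single
System.out.println(node), or a debugger evaluating toString, changes that, and any
code path that depends on isComplete behaves differently from then on.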

The OM-inside-OMDataSource anti-pattern

Weak version

OMDataSource objects are used in conjunction with
OMSourcedElement to build Axiom object model instances
that contain information items that are represented using a framework or API
other than Axiom. Wrapping this “foreign” data in an
OMDataSource and adding it to the Axiom object model
using an OMSourcedElement in most cases avoids the
conversion of the data to the “native” Axiom object
model[5].
The OMDataSource contract requires the implementation
to support two different ways of providing the data, both relying on StAX:

The implementation must be able to provide a pull parser
(XMLStreamReader) from which the infoset can be
read.

The data source must be able to serialize the infoset to an
XMLStreamWriter (push).

For the consumer of an event-based representation of an XML infoset, it is in
general easier to work in pull mode. That is the reason why StAX has gained
popularity over push-based approaches such as SAX. On the other hand, for a producer
such as an OMDataSource implementation, it is exactly the
other way round: it is far easier to serialize an infoset to an
XMLStreamWriter (push) than to build an
XMLStreamReader from which a consumer can read (pull) events.
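
The asymmetry can be seen with plain StAX, independently of Axiom. In the sketch
below, the producer writes a small infoset to an XMLStreamWriter as straight-line
code, and the consumer pulls events from an XMLStreamReader in a simple loop. The
PushPullDemo class and the person, name and age elements are hypothetical and
chosen only for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import javax.xml.stream.XMLStreamWriter;

final class PushPullDemo {
    // Producer side (push): the structure of the code mirrors the structure
    // of the document, so no explicit state has to be kept.
    static void serialize(String name, int age, XMLStreamWriter writer)
            throws XMLStreamException {
        writer.writeStartElement("person");
        writer.writeStartElement("name");
        writer.writeCharacters(name);
        writer.writeEndElement();
        writer.writeStartElement("age");
        writer.writeCharacters(Integer.toString(age));
        writer.writeEndElement();
        writer.writeEndElement();
    }

    // Convenience wrapper that pushes the events into a string.
    static String toXml(String name, int age) {
        try {
            StringWriter out = new StringWriter();
            XMLStreamWriter writer = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            serialize(name, age, writer);
            writer.flush();
            writer.close();
            return out.toString();
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }

    // Consumer side (pull): the consumer decides when to advance and can
    // stop as soon as it has found what it needs.
    static String readName(String xml) {
        try {
            XMLStreamReader reader =
                    XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && "name".equals(reader.getLocalName())) {
                    return reader.getElementText();
                }
            }
            return null;
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Offering the same data as an XMLStreamReader directly, without going through a
string or a tree, would require the producer to implement the whole reader
interface and to track its position in the data between calls, which is
considerably more work than the serialize method shown here.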

Experience indeed shows that the most challenging part in creating an
OMDataSource implementation is to write the
getReader method. In the past, to avoid that difficulty some
implementations simply built an Axiom tree and returned the
XMLStreamReader provided by
OMElement#getXMLStreamReader(). For example, older versions of ADB
(Axis2 Data Binding) used the following code[6]:

The MTOMAwareOMBuilder class referenced by this code was a special
implementation of XMLStreamWriter building an Axiom tree from the
sequence of events sent to it. The code then used this Axiom tree to get the
XMLStreamReader implementation. While this was a functionally correct
implementation of the getReader method, it is not a good
solution from a performance perspective and also contradicts some of the ideas on
which Axiom is based, namely that the object model should only be built when necessary.

Starting with Axiom 1.2.14, there is a solution to avoid this anti-pattern.
OMDataSource implementations that cannot provide a meaningful
XMLStreamReader instance should extend
org.apache.axiom.om.ds.AbstractPushOMDataSource and only
implement the serialize method.
OMSourcedElement will handle OMDataSource implementations extending this class
differently when it comes to expansion: instead of using OMDataSource#getReader() to
expand the element, it will use OMDataSource#serialize(XMLStreamWriter) (with a special
XMLStreamWriter that builds the descendants of the OMSourcedElement). Note that this means
that such an OMSourcedElement will be expanded instantly, and that deferred building of
the descendants is not applicable. Nevertheless, this approach is significantly more efficient
than using the OM-inside-OMDataSource anti-pattern.

Strong version

There is also a stronger version of the anti-pattern which consists in
implementing the serialize method by building an Axiom tree
and then serializing the tree to the XMLStreamWriter.
Except for very special cases, there is no valid reason
whatsoever to do this! To see why this is so, consider the two
possible cases:

The OMDataSource already implements the
getReader method in a proper way, i.e. without
building an intermediary Axiom tree. To properly implement
serialize, it is then sufficient
to pull the events from the reader returned by a call to
getReader and copy them to the
XMLStreamWriter. The easiest and most efficient
way to do this is to extend org.apache.axiom.om.ds.AbstractPullOMDataSource
(available in Axiom 1.2.14), which implements the serialize
method in exactly that way.
There is thus no need to build an intermediary object model in this case.

The getReader method also uses an intermediary
Axiom tree[7].
In that case it doesn't make sense to use an OMSourcedElement
in the first place! At least it doesn't make sense if one assumes that
in general the OMSourcedElement will either be
serialized or its content accessed after being added to the tree. Indeed,
in this case the Axiom tree will be built at least once (if not multiple times),
so that the code might as well use a normal OMElement.
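
For the first of these two cases, the idea of copying events from a reader to a
writer can be sketched with plain StAX. This is a simplified illustration and not
the code of AbstractPullOMDataSource: the hypothetical StaxCopyDemo class handles
only elements, attributes and character data, and it ignores namespaces, comments
and processing instructions.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import javax.xml.stream.XMLStreamWriter;

final class StaxCopyDemo {
    // Pulls events from the reader and pushes the equivalent calls to the
    // writer. No intermediary object model is built.
    static void copy(XMLStreamReader reader, XMLStreamWriter writer)
            throws XMLStreamException {
        while (reader.hasNext()) {
            switch (reader.next()) {
                case XMLStreamConstants.START_ELEMENT:
                    writer.writeStartElement(reader.getLocalName());
                    for (int i = 0; i < reader.getAttributeCount(); i++) {
                        writer.writeAttribute(
                                reader.getAttributeLocalName(i), reader.getAttributeValue(i));
                    }
                    break;
                case XMLStreamConstants.CHARACTERS:
                    writer.writeCharacters(reader.getText());
                    break;
                case XMLStreamConstants.END_ELEMENT:
                    writer.writeEndElement();
                    break;
                default:
                    // Simplification: all other event types are dropped.
                    break;
            }
        }
    }

    // Parses the input and writes it back out using only the copy loop.
    static String roundTrip(String xml) {
        try {
            XMLStreamReader reader =
                    XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            StringWriter out = new StringWriter();
            XMLStreamWriter writer = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            copy(reader, writer);
            writer.flush();
            writer.close();
            reader.close();
            return out.toString();
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Because events are forwarded one at a time, memory usage does not depend on the
size of the document, which is the property that building an intermediary Axiom
tree gives up.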

This only leaves the very special case where the OMSourcedElement
is in general neither accessed nor serialized, either because it will usually be somehow
discarded or because the code uses OMDataSourceExt#getObject()
to retrieve the raw data. Even in that case one can argue that in general
it should not be too hard to implement at least the serialize
method properly by transforming the raw or foreign data directly to StAX events written to the
XMLStreamWriter.

[4] To do this, Axiom doesn't read the entire data source,
but only reads up to the threshold.

[5] An exception is when code tries to access the children
of the OMSourcedElement. In this case, the
OMSourcedElement will be expanded,
i.e. the data will be converted to the native Axiom object model.