The Data-Centric Modus Operandi

The opinions expressed by this blogger and those providing comments are theirs alone and do not reflect the opinion of Automated Trader or any employee thereof. Automated Trader is not responsible for the accuracy of any of the information supplied by this article.

Data distribution is about observing a changing world. A
system whose communication is based on this paradigm tends to
become data-centric: it becomes more concerned with
modeling the first-class concepts of its business domain and less
concerned with managing second-class "who-told-whom-to-do-what"
middleware concepts like queues and messages. Along the way, it
enjoys the benefits of decreased coupling and improved
reliability, scalability, and performance.

Data Distribution and Its Kin

Classically, messaging is an evolution of the remote
method invocation (RMI) paradigm - an attempt to make that
paradigm less coupled and more scalable by making it
asynchronous. A message says "I tell you to do this." When
compared with RMI, "I" and "you" are more abstract, both in
identity and multiplicity, and the request can be queued for
processing at a later time or by another party without making the
sender wait. These are improvements, but the interaction remains
coupled, because the roles of "I" and "you" (often in the guises
of "client" and "server" or the trendier "service consumer" and
"service provider"), as well as the intention of what action
should be performed, are still very much in play.

Eventing, like data distribution, is preoccupied with
changes to the world. An event says "I changed in this way." It
reduces coupling by entirely removing both the recipient of that
information and any notion of intention from your business logic
and your mental model; who might receive an event, and what they
might choose to do as a result, are not the business of the event
source. But state management remains a problem, because in order
to understand the change that occurred, all recipients must have
an up-to-date understanding of the state of the world prior to
the latest event - "the price went up by a dollar" doesn't do me
any good if I don't know what the price was before. This temporal
coupling means that every recipient must process every event in
order, whether those events are interesting or not, just in case
the interpretation of a subsequent interesting event should
happen to require the state established by a previous
otherwise-uninteresting one.
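
To make that temporal coupling concrete, here is a toy sketch in plain
Python. The names (PriceDelta, apply_events) and the prices are invented
for illustration and are not taken from any eventing library. It shows
that a recipient who skips even one delta ends up with a wrong picture
of the world.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PriceDelta:
    """Hypothetical event: 'the price changed by this much'."""
    symbol: str
    change: float


def apply_events(opening_prices, events):
    """Replay every delta, in order, on top of a known starting state."""
    prices = dict(opening_prices)
    for event in events:
        # Each event only makes sense relative to the state before it.
        prices[event.symbol] = prices[event.symbol] + event.change
    return prices


events = [
    PriceDelta("ACME", +1.0),
    PriceDelta("ACME", -0.25),
    PriceDelta("ACME", +1.0),
]

# A recipient that processed every event, in order.
full_view = apply_events({"ACME": 20.0}, events)

# A recipient that ignored the second, "uninteresting" event.
lossy_view = apply_events({"ACME": 20.0}, [events[0], events[2]])

print(full_view["ACME"], lossy_view["ACME"])
```

The two views disagree about the current price, even though the lossy
recipient saw the most recent event.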

The resulting processing and state management are complex and
expensive. As a mitigation, they are frequently factored out of
the applications that need the data and into state-management
"servers" that "clients" must query using a message-centric or
even RMI-based approach - a huge regression in engineering
practice! The system becomes complicated by the presence of
multiple interacting communication paradigms, and the servers
(which serve no business role) introduce performance and
fault-tolerance choke points.

A data-centric architecture eliminates these problems by
simplifying the interactions. A data sample says simply "the
world is like this." It thereby eliminates coupling not only in
terms of source, recipients, and their intentions, but also in
terms of time. There's no longer any need for recipients to
process or store information they don't care about, because
samples don't implicitly encompass previous samples. Therefore it
becomes perfectly reasonable for one observer to examine the
state of the world every second, or every minute, or every hour -
and for another to observe every single intermediate state, even
if those states change from one to the other many times a second.
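
As a rough illustration of why that works, the sketch below models
samples as self-describing snapshots. It is plain Python with
hypothetical names (PriceSample, observe) and made-up prices, not DDS
code. Because each sample states the whole price rather than a change,
a slow observer can drop intermediate samples and still be correct.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PriceSample:
    """Hypothetical sample: 'the price is this', with no reference to the past."""
    symbol: str
    price: float
    timestamp: float  # seconds since some arbitrary start


# Nine samples, one every quarter second, from t=0.0 to t=2.0.
samples = [PriceSample("ACME", 20.0 + 0.25 * i, 0.25 * i) for i in range(9)]


def observe(all_samples, min_interval):
    """Keep a sample only if min_interval seconds have passed since the last kept one."""
    kept = []
    for sample in all_samples:
        if not kept or sample.timestamp - kept[-1].timestamp >= min_interval:
            kept.append(sample)
    return kept


every_state = observe(samples, 0.0)      # sees all nine states
once_per_second = observe(samples, 1.0)  # sees a sparse but still accurate view

print([s.price for s in once_per_second])
```

Both observers agree on the price at every instant they both looked at;
the slower one simply looked less often.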

Modeling the World with DDS

A set of DDS entities, and the data they distribute and manage,
define a view into this changing "world."

A "domain" defines the boundaries of the world, the set of
information that a collaborating group of applications might find
interesting. A "domain participant" defines the presence of some
application in that world; it is the data-centric analogue to
what is frequently known as a "connection" in messaging
middleware.

A "type" is a structural description of some part of the
world - for example, an Antelope is brown in color and has
four legs and two horns; a Ferrari is red in color and has four
wheels and two seats. A type has a formal definition, usually
(though not always) in a declarative language like XSD or OMG
IDL, and it implies a corresponding definition in the target
programming language.

A "quality-of-service" (QoS) definition defines the fidelity
with which one or more parties are able to describe the world.
For example, will the description contain every state the world
passes through or only a subset? Will observers have access to
new states of the world only, or will they be able to see
previous states as well? If the latter, how far back will those
previous states go?

A "topic" defines some aspect or subset of the world
consisting of similar objects. As such, it combines a type, which
defines the structure of those objects, with a QoS definition,
which defines how they can be observed to change.

An "instance" defines a single object in the group defined by
a topic. For example, a topic may be used to distribute the
positions of airplanes as detected by a radar. Each plane would
be an instance. All radar tracks have the same structure (type)
and are updated in the same way (QoS). But they are also distinct
from one another: it matters whether the plane at a given
location happens to be American Airlines flight 123 or Delta
flight 456.
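
The relationship between type, topic, and instance can be sketched in
plain Python. This is only an analogy under stated assumptions: the
RadarTrack class, its fields, and the choice of the flight number as
the identifying key are all invented here, and a real DDS type would
normally be declared in IDL or XSD as described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RadarTrack:
    """Hypothetical type: the structure shared by every radar track."""
    flight: str       # assumed key field: identifies the instance
    latitude: float
    longitude: float


def latest_by_instance(samples):
    """Group samples by key, keeping only the newest one per instance."""
    latest = {}
    for sample in samples:
        latest[sample.flight] = sample
    return latest


# Three samples on one "topic", describing two instances.
samples = [
    RadarTrack("AA123", 40.0, -74.0),
    RadarTrack("DL456", 33.5, -84.5),
    RadarTrack("AA123", 40.5, -74.5),  # a newer state of the same plane
]

world = latest_by_instance(samples)
print(sorted(world))
```

The newer AA123 sample replaces the older one without disturbing DL456,
which is the sense in which instances are distinct from one another.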

A "data writer" defines a source of information about a
particular subset of the world (topic). As such, it may override
the QoS of its topic - multiple parties may provide information
about the same part of the world but with different degrees of
fidelity.

A "data reader" defines an observer of a particular subset of
the world (topic). As such, it may also override the QoS of its
topic. Furthermore, it may be able, or may wish, to observe only
certain states of the world. For example, it may only be
interested in airplanes flying over a particular geographic area
or in stocks trading at over $20/share.
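
A reader's restricted interest can be pictured as a predicate applied
before a sample ever reaches the application. The sketch below is a toy
model in plain Python; the Quote and FilteringReader names are
hypothetical and this is not how any particular DDS product exposes
filtering.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Quote:
    """Hypothetical sample type for a stock price."""
    symbol: str
    price: float


class FilteringReader:
    """Toy reader that only accepts samples matching its stated interest."""

    def __init__(self, predicate):
        self._predicate = predicate
        self.received = []

    def on_sample(self, sample):
        # Samples outside the reader's interest are dropped before
        # the application has to look at them.
        if self._predicate(sample):
            self.received.append(sample)


# Interested only in stocks trading at over $20/share.
reader = FilteringReader(lambda quote: quote.price > 20.0)

for quote in [
    Quote("ACME", 25.0),
    Quote("PENNY", 0.5),
    Quote("BIGCO", 20.0),
    Quote("MEGA", 120.0),
]:
    reader.on_sample(quote)

print([q.symbol for q in reader.received])
```

The application code never sees PENNY or BIGCO, so it has no state to
maintain for them.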

By creating a data reader with a certain QoS definition, an
application makes an affirmative statement that it wishes to
observe a certain portion of the world under a certain set of
circumstances. For example, it may state that it is interested in
observing the most recent five states (samples) of the objects
(instances) in its part of the world (topic), but it doesn't need
to process changes more frequently than once every second.

This statement is one of interest only; it in no way requires the
observer to actually observe a certain set of samples in a
certain way or within a certain period of time. On the one hand,
the observer may choose to be notified asynchronously of every
new sample and to respond to it immediately. On the other, it may
"go away" to other business and return hours later; when it does,
it will find the most recent five samples of each instance,
occurring no more frequently than once every second, waiting for
it. In the meantime, DDS will have taken care of all of the
necessary data reception, filtering, and replacement in order to
make that happen.
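
The behavior described above, five samples per instance and no more
than one per second, can be imitated with a small cache. The ReaderCache
class below is a hypothetical stand-in written in plain Python, with its
depth and min_separation parameters chosen to echo the example. It is a
sketch of the idea, not of the DDS API or of how any implementation
stores data.

```python
from collections import deque


class ReaderCache:
    """Toy per-instance cache: bounded history plus a minimum sample spacing."""

    def __init__(self, depth, min_separation):
        self._depth = depth
        self._min_separation = min_separation
        self._history = {}        # key -> deque of (timestamp, value)
        self._last_accepted = {}  # key -> timestamp of last kept sample

    def deliver(self, key, timestamp, value):
        """Called by the 'middleware' for every arriving sample."""
        last = self._last_accepted.get(key)
        if last is not None and timestamp - last < self._min_separation:
            return False  # arrived too soon after the previous kept sample
        self._last_accepted[key] = timestamp
        history = self._history.setdefault(key, deque(maxlen=self._depth))
        history.append((timestamp, value))  # oldest entry falls off when full
        return True

    def read(self, key):
        """Called by the application whenever it chooses to look."""
        return list(self._history.get(key, ()))


cache = ReaderCache(depth=5, min_separation=1.0)

# Twenty samples for one plane, arriving every half second.
for i in range(20):
    cache.deliver("AA123", i * 0.5, i)

# The application "returns" later and reads what is waiting for it.
waiting = cache.read("AA123")
print(waiting)
```

The application did no work while the twenty samples arrived, yet it
finds exactly five, spaced one second apart, when it finally reads.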