
Data-Centric Architecture: A Model for the Era of Big Data

As businesses struggle to handle large volumes of data rapidly, technologies used in industry and by the masters of the universe on Wall Street are suddenly becoming fashionable... again.

Wired and wireless communication networks are making data collection and transmission cheap and widespread. In the future, networks will weave many devices and subsystems into complex integrated distributed systems that will become the fabric of business and daily life.

Building such distributed systems is far from simple, however. They must be assembled from independently developed software components. Integration, especially combined with real-time performance demands, becomes the key challenge.

This article outlines fundamental design principles for integrating distributed systems from independent components. I take a data-centric approach, because data is the key element that must flow through the various systems.

The key to data-centric design is to separate data from behavior. The data and data-transfer contracts then become the primary organizing constructs. With carefully controlled data relationships and timing, the system can then be built from independent components with loosely coupled behaviors. In traditional or object-oriented design, interactions between components (such as method calls) drive changes to the data; in data-centric design it is the reverse: data changes drive the interactions between components.
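A minimal sketch of "separating data from behavior," assuming a hypothetical flight-tracking domain: the data types are plain immutable records that serve as the contract, and the behavior is a stateless function that reacts to a data value and may produce new data. The names `FlightTrack`, `AltitudeAlarm`, `check_altitude`, and the altitude threshold are illustrative inventions, not part of any standard.

```python
from dataclasses import dataclass
from typing import Optional

# Data: plain, immutable records that form the contract between components.
@dataclass(frozen=True)
class FlightTrack:
    flight_id: str
    altitude_ft: float
    speed_kts: float

@dataclass(frozen=True)
class AltitudeAlarm:
    flight_id: str
    altitude_ft: float

# Hypothetical threshold, for illustration only.
MIN_SAFE_ALTITUDE_FT = 1000.0

# Behavior: a stateless function that reacts to a data value and may emit
# new data. It knows nothing about who produced the track or who will
# consume the alarm.
def check_altitude(track: FlightTrack) -> Optional[AltitudeAlarm]:
    if track.altitude_ft < MIN_SAFE_ALTITUDE_FT:
        return AltitudeAlarm(track.flight_id, track.altitude_ft)
    return None

print(check_altitude(FlightTrack("UA100", 800.0, 140.0)))
# AltitudeAlarm(flight_id='UA100', altitude_ft=800.0)
```

Because the component's only inputs and outputs are data values, it can be replaced, duplicated, or moved to another system without changing the components around it.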

The resulting loosely coupled software components with data-centric interfaces are then integrated into a working system through a data bus. The data bus connects data producers to consumers and enforces the associated Quality of Service (QoS) contracts on the data transfers. This design technique is naturally supported by the Data Distribution Service (DDS) specification for real-time systems, which is a standard from the Object Management Group. Implementations of this standard are available from many vendors.
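The producer-to-consumer routing described above can be illustrated with a toy, in-process stand-in for a data bus. This is a hypothetical sketch, not the DDS API: `DataBus`, `Subscription`, and `history_depth` are invented names, and the only QoS it models is a per-consumer "keep the last N samples" history, loosely inspired by DDS's HISTORY policy. Real DDS implementations also handle discovery, networking, and many more QoS policies.

```python
from collections import defaultdict, deque
from typing import Any, Callable, Deque, Dict, List, Optional

class Subscription:
    """One consumer's view of a topic, with its own QoS (here, only a history depth)."""

    def __init__(self, history_depth: int,
                 on_data: Optional[Callable[[Any], None]] = None) -> None:
        # Keep-last history: the oldest sample is dropped once the depth is reached.
        self.samples: Deque[Any] = deque(maxlen=history_depth)
        self.on_data = on_data

class DataBus:
    """Hypothetical in-process bus that routes samples from producers to consumers by topic."""

    def __init__(self) -> None:
        self._subs: Dict[str, List[Subscription]] = defaultdict(list)

    def subscribe(self, topic: str, history_depth: int = 1,
                  on_data: Optional[Callable[[Any], None]] = None) -> Subscription:
        sub = Subscription(history_depth, on_data)
        self._subs[topic].append(sub)
        return sub

    def publish(self, topic: str, sample: Any) -> None:
        # The producer names only the topic; it never references its consumers.
        for sub in self._subs[topic]:
            sub.samples.append(sample)
            if sub.on_data is not None:
                sub.on_data(sample)

bus = DataBus()
delivered: List[Any] = []
tower = bus.subscribe("FlightTrack", history_depth=3)
dashboard = bus.subscribe("FlightTrack", history_depth=1, on_data=delivered.append)
for alt in (9000, 8500, 8000, 7500):
    bus.publish("FlightTrack", {"flight_id": "UA100", "altitude_ft": alt})

print([s["altitude_ft"] for s in tower.samples])      # [8500, 8000, 7500]
print([s["altitude_ft"] for s in dashboard.samples])  # [7500]
```

The two consumers receive the same stream but keep different amounts of it, which is the point of attaching QoS to each data transfer rather than hard-coding it into the producer.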

The techniques described here are proven in hundreds of mission-critical applications including robotics, unmanned vehicles, medical devices, transportation, combat systems, finance and simulation.

A Future Distributed System

To understand the dynamic nature of next-generation distributed systems, it is helpful to examine a representative scenario: an air traffic control system. A future air traffic control system integrates a variety of disparate systems into a seamless whole: a system of systems. On the edge is a real-time avionics system inside the aircraft. The control tower in the center of the figure communicates with the avionics system and, in turn, with the enterprise data servers at the airport. The system thus spans connectivity from the "edge" (devices) to the "enterprise" (infrastructure services).

The data in the avionics system flows at high rates and is time-critical. Violating timing constraints could result in the failure of the aircraft or jeopardize safety. Although aircraft traditionally operate as independent units, future aircraft must integrate closely with automated traffic control and ground systems.

The control tower is another independent real-time system; it monitors the aircraft in its region, coordinates their traffic flow and generates alarms to highlight unusual conditions. The data flowing in this system is time-sensitive, because both local and wide-area operation depend on its timely delivery. However, the tower may have a greater tolerance for delays than the avionics system.

The control tower communicates with the airport's enterprise information system, which tracks flight status and other data and may communicate with multiple control towers and other enterprise information systems. The enterprise system is also responsible for synthesizing a dashboard view of passenger information and flight arrival and departure status. Because it is not in the time-critical path, the enterprise information system can be more tolerant of delays than the other systems.

Key Design Challenges

This so-called "system of systems" must deal with many issues, such as correctly handling myriad differences in data exchange, performance, and real-time requirements. The architecture also involves different technology stacks, design models, and component lifecycles.

To support system growth and evolution, the integration must be robust enough to handle changes on either side of an interface. To do this, only minimal assumptions should be made about the interfaces between systems—the interface specifications should describe only the invariants in the interaction. Behavior can then be implemented independently by each system; the interface between them should not include any component-specific state or behavior. This avoids tight coupling.
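One way to picture an interface specification that states "only the invariants" is a topic contract holding the field types and a few predicates every sample must satisfy, with no component-specific state or logic. The sketch below assumes a hypothetical `TopicContract` class and a `FlightTrack` topic; neither comes from the article or from DDS.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple

@dataclass(frozen=True)
class TopicContract:
    """Hypothetical interface spec: topic name, field types, and invariants only."""
    name: str
    fields: Dict[str, type]
    invariants: Tuple[Tuple[str, Callable[[Dict[str, Any]], bool]], ...] = ()

    def violations(self, sample: Dict[str, Any]) -> List[str]:
        """Return a description of every way the sample breaks the contract."""
        errors = [f"{fname}: expected {ftype.__name__}"
                  for fname, ftype in self.fields.items()
                  if not isinstance(sample.get(fname), ftype)]
        if errors:  # invariants assume well-typed fields, so stop here
            return errors
        return [desc for desc, check in self.invariants if not check(sample)]

# The contract says nothing about alarm thresholds, display logic, or any
# other behavior; each side implements that independently.
FLIGHT_TRACK = TopicContract(
    name="FlightTrack",
    fields={"flight_id": str, "altitude_ft": float},
    invariants=(
        ("flight_id must be non-empty", lambda s: len(s["flight_id"]) > 0),
        ("altitude_ft must be >= 0", lambda s: s["altitude_ft"] >= 0),
    ),
)

print(FLIGHT_TRACK.violations({"flight_id": "UA100", "altitude_ft": 8000.0}))  # []
print(FLIGHT_TRACK.violations({"flight_id": "", "altitude_ft": -5.0}))
# ['flight_id must be non-empty', 'altitude_ft must be >= 0']
```

Either side of the interface can change its internals freely as long as the samples it exchanges keep satisfying these checks.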

The systems on either side of an interface may differ in quantitative aspects of their behavior, such as data volumes, update rates, and real-time constraints. The term "impedance mismatch" is shorthand for all the non-functional differences in the information exchange between two systems. Critically, a developer can capture these non-functional aspects of the information exchange by attaching QoS attributes to the data transfer. When QoS terms are explicit, responses to impedance mismatches can be automated, monitored, and governed.
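To show how explicit QoS terms let a mismatch be detected automatically, here is a small sketch modeled loosely on DDS's "requested versus offered" (RxO) matching for the DEADLINE and RELIABILITY policies: a producer's offer satisfies a consumer's request only if it is at least as strong. It is plain Python, not a DDS API; the `QoS` class, the `mismatches` function, and all numeric values are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import List

class Reliability(IntEnum):
    BEST_EFFORT = 0
    RELIABLE = 1  # the stronger guarantee ranks higher

@dataclass(frozen=True)
class QoS:
    deadline_ms: int          # maximum interval between successive updates
    reliability: Reliability

def mismatches(offered: QoS, requested: QoS) -> List[str]:
    """Return the reasons a producer's offered QoS cannot satisfy a consumer's request."""
    problems = []
    # Promising an update at least every offered.deadline_ms satisfies any
    # consumer that is willing to wait that long or longer.
    if offered.deadline_ms > requested.deadline_ms:
        problems.append(f"deadline: offered {offered.deadline_ms} ms > "
                        f"requested {requested.deadline_ms} ms")
    if offered.reliability < requested.reliability:
        problems.append(f"reliability: offered {offered.reliability.name} < "
                        f"requested {requested.reliability.name}")
    return problems

# Illustrative values only (not from the article).
avionics_offer = QoS(deadline_ms=50, reliability=Reliability.RELIABLE)
status_offer = QoS(deadline_ms=1000, reliability=Reliability.BEST_EFFORT)
tower_request = QoS(deadline_ms=200, reliability=Reliability.RELIABLE)

print(mismatches(avionics_offer, tower_request))  # []
print(mismatches(status_offer, tower_request))
# ['deadline: offered 1000 ms > requested 200 ms', 'reliability: offered BEST_EFFORT < requested RELIABLE']
```

A fast avionics feed can serve the more tolerant tower, while a slow, best-effort enterprise feed cannot; because both sides state their terms, the mismatch surfaces at connection time instead of as a silent runtime failure.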
