
A Real-Time HPC Approach for Optimizing Multicore Architectures

By Aljosa Vrancic and Jeff Meisel, March 22, 2010

Complex math is at the heart of many of the biggest technical challenges. With multicore processors, the type of calculations that would have required a supercomputer can now be performed in real-time, embedded environments.

Aljosa Vrancic is a principal engineer at National Instruments. He holds a B.S. in electrical engineering from the University of Zagreb, and an M.S. and a Ph.D. in physics from Louisiana State University. Jeff Meisel is the LabVIEW product manager at National Instruments and holds a B.S. in computer engineering from Kansas State University.

Because the tasks that require acceleration are so computationally intensive, the typical high-performance computing (HPC) problem traditionally could not be solved on a normal desktop computer, let alone an embedded system. However, disruptive technologies such as multicore processors are enabling more and more HPC applications to be solved with off-the-shelf hardware.

Real-time HPC enters the picture when the number crunching must happen in a deterministic, low-latency environment. Many HPC applications run offline simulations thousands of times and then report the results. This is not a real-time operation because there is no timing constraint specifying how quickly the results must be returned; the results just need to be calculated as fast as possible.

Traditionally, these applications have been developed using a message-passing protocol such as MPI (with MPICH a common implementation) to divide tasks across the different nodes in the system. A typical distributed computing scenario looks like Figure 1: one head node acts as the master and distributes processing to the slave nodes in the system.

Figure 1: Example configuration in a traditional HPC system.

By default, this configuration is not real-time friendly because of the latencies associated with networking technologies such as Ethernet. In addition, the synchronization implied by the message-passing protocol is not necessarily predictable at granular timing in the millisecond range. Such a configuration could potentially be made real-time by replacing the communication layer with real-time hardware and software (such as reflective memory), and by adding manual synchronization to prioritize tasks and ensure their completion in a bounded timeframe. Generally speaking, though, the standard HPC approach was not designed for real-time systems and presents serious challenges when real-time control is needed.

An Embedded, Real-Time HPC Approach with Multicore Processors

The approach outlined in this article is based on a real-time software stack, as described in Table 1, and off-the-shelf multicore processors.

Table 1: Real-Time Software Stack.

Real-time applications have algorithms that need to be accelerated, but they often also involve the control of real-world physical systems -- so the traditional HPC approach is not applicable. In a real-time scenario, the result of an operation must be returned in a predictable amount of time. The challenge is that, until recently, it has been very hard to solve an HPC problem while simultaneously closing a control loop in under 1 millisecond.

Furthermore, a more embedded approach may need to be implemented, where physical size and power constraints place limitations on the design of the system.

Now consider a multicore architecture, where today you can find up to 16 processing cores.

From a latency perspective, a multicore architecture found in off-the-shelf hardware replaces communication over Ethernet with inter-core communication governed by system bus speeds, so return-trip times are much more tightly bounded. Consider the simplified diagram of a quad-core system in Figure 2.

Figure 2: Example configuration in a multicore system.

In addition, multicore processors can utilize symmetric multiprocessing (SMP) operating systems -- a technology that general-purpose operating systems like Windows, Linux, and Mac OS have used for years to automatically load-balance tasks across available CPU resources. Real-time operating systems are now offering SMP support as well. This means a developer can specify timing and prioritize tasks across many cores at once, while the OS handles the thread interactions. This is a tremendous simplification compared with message passing and manual synchronization, and it can all be done in real time.

Real-Time HPC System Description

For the approaches outlined in this article, Figure 3 represents the general software and hardware approach that has been applied.

