Taming concurrency in heterogeneous computing

One of the most time-consuming and costly aspects of programming a multicore system is taming the often-problematic beast of concurrency. Concurrency means that multiple parts of an application execute at the same time, and it can lead to non-determinism, the ultimate headache for software developers. Problems are easier to tackle at the design stage; left until the verification stage, they cost more and cause delays. Luis Murillo, VP of Engineering at Silexica, takes a look at this topic and discusses solutions for a problem that gets more challenging as computers become more complex.

Understanding the intended and unintended interactions between hardware and software components is becoming harder as architectures grow more heterogeneous and interconnect topologies get more complicated. These interactions affect the performance, power consumption and cost of a project. Many universities are researching this from a software perspective, while a small number are looking at the implications from the hardware side. Very few are analyzing it from a combined software/hardware viewpoint, which is crucial given the closer relationship now expected between the two.

As modern systems rely more heavily on “multicore muscle” and heterogeneity, devices become more prone to concurrency-related bugs that only occur at system level. These bugs are increasingly hard to reproduce, understand and fix; in fact, it isn’t uncommon to hear of devices that ship with bugs that only manifest after several days or even months of operation. Such a bug can be caused, for example, by a harmful software data race that programmers did not anticipate, or by a violation of a hardware synchronization protocol rule that wasn’t covered during verification.
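To make the data-race case concrete, here is a minimal C++ sketch. The function names count_racy and count_with_mutex are hypothetical, chosen only for illustration. Both functions let several threads increment one shared counter; the first does so without synchronization and is deliberately broken, the second wraps each increment in a critical section.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// DELIBERATELY BROKEN: several threads increment `counter` with no
// synchronization. This is a data race (undefined behavior in C++); in
// practice some increments are lost, and how many varies from run to run.
long count_racy(int num_threads, int iterations) {
    assert(num_threads > 0 && iterations >= 0);
    long counter = 0;
    std::vector<std::thread> workers;
    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&counter, iterations] {
            for (int i = 0; i < iterations; ++i) {
                ++counter;  // unsynchronized read-modify-write
            }
        });
    }
    for (auto& w : workers) w.join();
    return counter;
}

// Fixed version: every read-modify-write happens inside a critical section
// guarded by one mutex, so no increment is lost.
long count_with_mutex(int num_threads, int iterations) {
    assert(num_threads > 0 && iterations >= 0);
    long counter = 0;
    std::mutex counter_mutex;
    std::vector<std::thread> workers;
    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&counter, &counter_mutex, iterations] {
            for (int i = 0; i < iterations; ++i) {
                std::lock_guard<std::mutex> lock(counter_mutex);
                ++counter;
            }
        });
    }
    for (auto& w : workers) w.join();
    return counter;
}
```

The mutex-protected version always returns num_threads * iterations, whereas the racy one may appear correct in many runs and fail only occasionally, which is exactly what makes such bugs hard to reproduce. Building with ThreadSanitizer (-fsanitize=thread in Clang and GCC) will typically report the race in count_racy at run time.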

When HW events interact deeply with SW, as in drivers or embedded applications, buggy interactions can occur and manifest at any level: in the HW protocols, in the privileged OS space, or in the user application.

Concurrency very often leads to non-deterministic behavior, which is as difficult as it gets for engineers and developers: reproducing a failure can take many attempts and careful consideration of the different combinations of inputs and environment settings, control and data flow, execution schedules, components that behave stochastically (e.g. a cache or a branch predictor in a CPU) and many other factors outside the developers’ control.

To complicate matters further, traditional debugging and profiling methods are normally intrusive: they alter the system state and sometimes even help conceal the bugs. If a methodology has to be intrusive, it should ideally expose more bugs rather than hide them.

Taming concurrency requires mastering advanced concepts from the distinct domains of hardware and software. As a result, it easily turns into an interdisciplinary endeavor spread across teams, geographies, budgets and project phases. If an organization does not have the right experts at the right time, this is a recipe for disaster.

Furthermore, there is a shortage of specialized tools that help developers reason about dependencies, parallelism and determinism, especially at system level.

Reasoning about concurrency with methodologies designed for less complex systems is an extremely difficult mental challenge. Developers must anticipate the effects of factors they neither control nor know, while keeping in their heads hundreds or thousands of possible system states.

Avoiding concurrency bugs at design time, or detecting them during verification, requires tools for classifying and understanding, and often for abstractly modeling and simulating, the hardware events and software actions that affect the system’s determinism. These include SW synchronization primitives such as mutexes, as well as the OS scheduler.

Using traditional SW debugging methods such as source-level debuggers or OS kernel tracers to understand dependencies and interactions has limits, because these methods are intrusive. Very few specialized products are available to support developers in this process.

Some interesting technologies can assist during the design and debugging phases, such as simulation on so-called Virtual Platforms, which are SW models of HW systems.
Analyzing system-level HW/SW interactions in a truly non-intrusive environment can help in inspecting and manipulating erroneous system states. However, it comes at the cost of customizing, rolling out and maintaining the virtual platforms, as well as the debugging scripts and additional plugins.

Another example is the subset of compile-time static source code analyzers that support mainstream parallel SW environments such as POSIX threads, for instance the Clang/LLVM Thread Safety Analysis extensions. These help the software developer avoid and detect bugs, although static analysis has its limitations: it cannot always resolve all dependencies precisely; it cannot resolve every pointer dereference in languages like C or C++; and it gives only a source-code-centric view, with no correlation to other system-level events. So static analysis is indeed helpful, but it is not a silver bullet.
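As an illustration of the Clang Thread Safety Analysis mentioned above, the following sketch follows the annotation-macro pattern described in the Clang documentation. The Mutex wrapper and the Account class are hypothetical examples. When compiled with clang++ and -Wthread-safety, the analysis checks that every access to balance_ happens while mu_ is held; on other compilers the macros expand to nothing and the code still builds.

```cpp
#include <cassert>
#include <mutex>

// Annotation macros: real Clang attributes, no-ops on other compilers.
#if defined(__clang__)
#define THREAD_ANNOTATION(x) __attribute__((x))
#else
#define THREAD_ANNOTATION(x)
#endif

#define CAPABILITY(x) THREAD_ANNOTATION(capability(x))
#define GUARDED_BY(x) THREAD_ANNOTATION(guarded_by(x))
#define ACQUIRE(...) THREAD_ANNOTATION(acquire_capability(__VA_ARGS__))
#define RELEASE(...) THREAD_ANNOTATION(release_capability(__VA_ARGS__))
#define NO_THREAD_SAFETY_ANALYSIS THREAD_ANNOTATION(no_thread_safety_analysis)

// Hypothetical wrapper that tells the analysis this type is a lock.
// The bodies are exempted from checking because they only forward to
// std::mutex, which is not annotated in every standard library.
class CAPABILITY("mutex") Mutex {
 public:
  void lock() ACQUIRE() NO_THREAD_SAFETY_ANALYSIS { impl_.lock(); }
  void unlock() RELEASE() NO_THREAD_SAFETY_ANALYSIS { impl_.unlock(); }

 private:
  std::mutex impl_;
};

// Hypothetical class whose shared state must only be touched under mu_.
class Account {
 public:
  void deposit(long amount) {
    assert(amount >= 0);
    mu_.lock();
    balance_ += amount;  // OK: mu_ is held here
    mu_.unlock();
  }

  long balance() {
    mu_.lock();
    long b = balance_;  // OK: mu_ is held here
    mu_.unlock();
    return b;
  }

  // Uncommenting this method makes `clang++ -Wthread-safety` warn that
  // writing `balance_` requires holding `mu_`:
  // void unsafe_deposit(long amount) { balance_ += amount; }

 private:
  Mutex mu_;
  long balance_ GUARDED_BY(mu_) = 0;
};
```

The warning for unsafe_deposit is produced purely from the annotated source, which also illustrates the limitation noted above: the analysis knows nothing about which threads actually run, on which cores, or in what order.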

Silexica has spent many years developing the right combination of technologies to help bridge the gap in taming concurrency. Our SLX programming tools are powered by a source code analysis engine based on Clang/LLVM that correlates static information with dynamic data coming from execution traces. It helps reveal static and dynamic function-call and data-access dependencies, access frequencies, call counts and more. It also understands threading and synchronization semantics and computes their possible effects on the application flow.

With SLX, developers can identify inter-thread dependencies such as creation/join relationships, functions common to several threads, uses of synchronization functions, critical sections, and shared variables together with how access to them is protected.
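To show what these dependency kinds look like in source code, here is a small hypothetical C++ producer/consumer program. The names Channel, produce, consume and run_pipeline are illustrative only, and the sketch does not show how SLX reports anything. It contains a creation/join relationship, shared variables protected by a mutex, critical sections, and calls to synchronization functions (std::mutex and std::condition_variable).

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>

// Shared state: both threads access `items` and `closed`, always under `mu`.
struct Channel {
    std::mutex mu;
    std::condition_variable ready;
    std::queue<int> items;  // shared variable, protected by mu
    bool closed = false;    // shared variable, protected by mu
};

// Producer thread: pushes 1..n, then marks the channel as closed.
void produce(Channel& ch, int n) {
    for (int i = 1; i <= n; ++i) {
        {
            std::lock_guard<std::mutex> lock(ch.mu);  // critical section
            ch.items.push(i);
        }
        ch.ready.notify_one();  // synchronization: wake the consumer
    }
    {
        std::lock_guard<std::mutex> lock(ch.mu);  // critical section
        ch.closed = true;
    }
    ch.ready.notify_one();
}

// Consumer thread: sums items until the channel is closed and drained.
long consume(Channel& ch) {
    long sum = 0;
    std::unique_lock<std::mutex> lock(ch.mu);
    for (;;) {
        // Blocks (releasing mu) until there is work or the producer is done.
        ch.ready.wait(lock, [&ch] { return !ch.items.empty() || ch.closed; });
        if (ch.items.empty()) break;  // closed and fully drained
        sum += ch.items.front();
        ch.items.pop();
    }
    return sum;
}

// Creation/join relationship: the calling thread spawns both workers and
// joins them before reading the result.
long run_pipeline(int n) {
    assert(n >= 0);
    Channel ch;
    long result = 0;
    std::thread consumer([&ch, &result] { result = consume(ch); });
    std::thread producer(produce, std::ref(ch), n);
    producer.join();
    consumer.join();
    return result;
}
```

Whatever the interleaving, run_pipeline(n) returns n * (n + 1) / 2, because every access to the shared queue and flag happens inside a critical section and the consumer only stops once the producer has closed the channel.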

Finally, to account for hardware-centric properties such as the number of available processors in the microarchitecture, SLX also allows abstract behavior simulation of your application’s synchronization and communication patterns. This makes it possible to formulate and answer “what-if” questions such as “How will the existing threads be scheduled?”, “When will threads wait on other threads?” or “Will there be any conflicts or bottlenecks on platform resources?”
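To give a feel for the kind of answer an abstract model can produce, here is a deliberately simple what-if simulator. It is a hypothetical toy, not SLX’s model: each thread is reduced to a duration and at most one earlier thread it waits for (for example a join), and threads are placed greedily on whichever core becomes free first. The ThreadModel, SimResult and simulate names are illustrative assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical abstract model of one thread: how long it runs and which
// earlier thread it must wait for (-1 means no dependency).
struct ThreadModel {
    int duration;
    int waits_for;
};

struct SimResult {
    int makespan;         // time at which the last thread finishes
    int total_wait_time;  // core time spent idle waiting on dependencies
};

// Greedy list scheduling: threads are taken in the given order, each goes to
// the core that frees up first, and it starts once both that core and its
// dependency are done. A what-if model, not a real OS scheduler.
SimResult simulate(const std::vector<ThreadModel>& threads, int num_cores) {
    assert(num_cores > 0);
    std::vector<int> core_free(num_cores, 0);  // when each core becomes idle
    std::vector<int> finish(threads.size(), 0);
    SimResult r{0, 0};
    for (std::size_t i = 0; i < threads.size(); ++i) {
        const ThreadModel& t = threads[i];
        assert(t.waits_for < static_cast<int>(i));  // dependencies point back
        auto core = std::min_element(core_free.begin(), core_free.end());
        int dep_done = t.waits_for >= 0 ? finish[t.waits_for] : 0;
        int start = std::max(*core, dep_done);
        r.total_wait_time += start - *core;  // idle time caused by the wait
        finish[i] = start + t.duration;
        *core = finish[i];
        r.makespan = std::max(r.makespan, finish[i]);
    }
    return r;
}
```

For example, with the workloads {{4, -1}, {4, -1}, {2, 0}, {2, 1}}, this model gives a makespan of 12 time units on 1 core and 6 on 2 cores. On 4 cores the makespan stays at 6, and the two dependent threads leave their cores idle for 8 units in total: adding processors does not help because those threads must wait on other threads.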

This is the Silexica approach to taming concurrency with SLX. To avoid headaches and shorten integration effort, we’d encourage you to review your current methodology and address concurrency issues as early as possible.

Luis Gabriel Murillo is VP of Engineering at Silexica. He is committed to finding solutions for complicated engineering and computing problems and to developing tools for multicore analysis, simulation, compilation, debugging and optimization. He leads the engineering team at Silexica and previously worked as a full-time researcher at the Institute for Communication Technologies and Embedded Systems (ICE) at RWTH Aachen University (Germany). He holds an MSc in Embedded Systems Design from the University of Lugano (Switzerland).

Silexica provides software development solutions that enable technology companies to take intelligent products such as autonomous cars from concept to deployment. The SLX programming tools help developers implement software to run efficiently on embedded supercomputers by offering deep understanding of how software behaves on the system.