Forcing Rare Bugs to Appear – an Interview with Tingting Yu

By Jakob Engblom

One use of Simics that has always been fascinating is to use the power you have over the target to somehow force error conditions in software. The hardware state control inherent in a tool like Simics should be useful to knock a system off of the expected common path and into code less tested, and to reveal bugs that normally only appear in particular rare circumstances. Recently, there was a paper published that discussed exactly such an application of Simics. In this blog post, I will be interviewing Tingting Yu from the University of Nebraska at Lincoln (UNL) about her paper.

TY: ESQuaReD is the software engineering group at the University of Nebraska, Lincoln. The group is part of the Computer Science and Engineering Department at UNL. Currently, there are 30 graduate students and six faculty members in the group. In a recent ranking of International Software Engineering Scholars, three of our faculty members – Gregg Rothermel, Matt Dwyer, and Sebastian Elbaum – were ranked in the top 50 in the world.

JE: What is your topic of research?

TY: My primary research area is software testing – designing testing methodologies to improve dependability of complex software such as embedded systems and concurrent programs.

We have developed testing adequacy criteria to test interaction faults between system layers (application, OS, and HAL/BSP) in embedded systems. We have proposed a family of property-based test oracles to detect faults that are not detectable by output-based test oracles. To improve observability and controllability when testing these systems, we are developing a testing framework called SimTester, which uses readily available features of virtual machines (VMs) to generate property-based test oracles that detect "hard-to-observe" faults, such as faults caused by hardware interrupts, multithreading, and memory accesses. In addition, SimTester provides fine-grained controllability to deal with non-determinism (e.g., controlling hardware events to increase the chance of exposing faults).

JE: So, the purpose of the test oracles is to detect issues in the system that do not necessarily manifest themselves as output errors, is that right?

TY: In the testing literature, a test oracle is the device by which engineers determine whether a test case has elicited a failure in a system. Test oracles can detect issues for both program outputs and internal states.

The ones we are using focus on detecting issues in the internal states of the system that may not manifest themselves at the outputs. For example, when a data race occurs, the output is not necessarily wrong. One scenario is that one thread wins the race and completely overwrites the data written by the other thread, but the data written by the two threads is the same. However, another interleaving may corrupt the data and propagate the error to the output. Using internal oracles that monitor data accesses on the VM helps test engineers detect the fault more effectively, without waiting for another specific interleaving or input data.
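To make the idea of an internal oracle concrete, here is a minimal sketch in Python. It is not SimTester code: the `Access` record, the `"app"`/`"isr"` labels, and the function name are all hypothetical, and the trace is hand-written rather than captured from a simulator. The property checked is "no other context writes a shared address between one context's read and its following write", which is violated regardless of whether the final value happens to look correct.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Access:
    context: str    # hypothetical label, e.g. "app" or "isr"
    address: int    # address of the shared variable
    is_write: bool  # True for a store, False for a load


def find_interrupted_updates(trace):
    """Return (address, updater, intruder) for each read-then-write update
    that had a write from another context land in between."""
    last_read = {}   # (context, address) -> index of most recent read
    last_write = {}  # address -> (index, context) of most recent write
    violations = []
    for index, access in enumerate(trace):
        key = (access.context, access.address)
        if not access.is_write:
            last_read[key] = index
            continue
        read_at = last_read.pop(key, None)
        previous = last_write.get(access.address)
        if read_at is not None and previous is not None:
            write_at, writer = previous
            if writer != access.context and write_at > read_at:
                violations.append((access.address, access.context, writer))
        last_write[access.address] = (index, access.context)
    return violations


SHARED = 0x1000  # made-up address for illustration

# The handler writes between the application's load and store.
racy_trace = [
    Access("app", SHARED, False),
    Access("isr", SHARED, True),
    Access("app", SHARED, True),
]

# The handler writes only after the application's update is complete.
clean_trace = [
    Access("app", SHARED, False),
    Access("app", SHARED, True),
    Access("isr", SHARED, True),
]

print(find_interrupted_updates(racy_trace))
print(find_interrupted_updates(clean_trace))
```

Because the check looks only at the order of accesses and not at the values written, it flags the first trace even in the case where both writers store the same data and the output looks fine.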

JE: How did you use Simics?

TY: We use Simics to detect concurrency faults that occur due to interactions between software and hardware. We achieve the level of observability and controllability needed to reveal such faults by utilizing the abilities of Simics to interrupt execution without affecting the states of the virtualized system, to monitor function calls, variable values and system states, and to manipulate hardware to force events such as interrupts and traps.

As such, SimTester is able to stop the system execution at a point of interest and force a traditionally non-deterministic event to occur. SimTester then monitors the effects of the event on the system and determines whether there are any anomalies. For example, to detect data races between applications and interrupt handlers, we set memory breakpoints provided by Simics to monitor shared variable access. To increase the chance of exposing data races, we directly change hardware interrupt pin status to force a specific interrupt to occur when a shared variable is accessed in an application.
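The effect of forcing an interrupt at a shared-variable access can be illustrated with a toy model. The sketch below deliberately does not use the Simics API: `ToyMachine`, `on_read`, and `force_interrupt` are invented stand-ins for a memory breakpoint and an interrupt-pin change, and the counter address is made up. It only shows why timing the interrupt to land inside a read-modify-write sequence exposes a lost update that a benign timing hides.

```python
class ToyMachine:
    """Invented stand-in for a simulated target; not a real simulator API."""

    def __init__(self, isr):
        self.memory = {}
        self.read_hooks = {}  # address -> callback, a toy "memory breakpoint"
        self.isr = isr
        self.in_isr = False

    def on_read(self, address, callback):
        self.read_hooks[address] = callback

    def read(self, address):
        value = self.memory.get(address, 0)
        hook = self.read_hooks.get(address)
        # Fire the hook after the load, and never from inside the handler.
        if hook is not None and not self.in_isr:
            hook(self, address)
        return value

    def write(self, address, value):
        self.memory[address] = value

    def force_interrupt(self):
        self.in_isr = True
        try:
            self.isr(self)
        finally:
            self.in_isr = False


COUNTER = 0x2000  # made-up address of a shared counter


def isr(machine):
    machine.write(COUNTER, machine.read(COUNTER) + 1)


def app_increment(machine):
    value = machine.read(COUNTER)      # load
    machine.write(COUNTER, value + 1)  # store, possibly of a stale value


def run(force_at_access):
    machine = ToyMachine(isr)
    machine.write(COUNTER, 0)
    if force_at_access:
        # Force the interrupt exactly when the application loads the counter.
        machine.on_read(COUNTER, lambda m, addr: m.force_interrupt())
        app_increment(machine)
    else:
        # Benign timing: the interrupt arrives after the update is complete.
        app_increment(machine)
        machine.force_interrupt()
    return machine.memory[COUNTER]


print(run(force_at_access=False))
print(run(force_at_access=True))
```

In both runs the application and the handler each increment the counter once. With the benign timing the counter ends at 2; with the forced timing the application stores a stale value and the counter ends at 1.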

JE: That is very cool indeed. So you basically use Simics to provoke bugs that would be intermittent and extremely rare on actual hardware. And just as importantly, you can detect when the bugs occur, which is often a non-trivial exercise.

TY: Yes, it works like this. There are four major components in the framework in addition to Simics itself, as shown below. Simics provides APIs that can be accessed via Python scripts; thus all components except the test oracles are Python scripts.

The configuration script specifies the points of interest, such as locations at which to set memory breakpoints, addresses of variables that need to be monitored, and machine instructions that need to be monitored (e.g., the interrupt return instruction iret).

The execution controller specifies certain hardware events to be invoked at particular points in program execution. It can also inject data into device ports and force an interrupt handler to take different paths. Before forcing an interrupt, the execution controller checks the hardware state to see whether the interrupt could actually be issued at that point, in order to avoid aberrant interrupts.
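The check made before forcing an interrupt might look something like the following sketch. The state fields here (a global enable flag, a mask register, and an in-service register) are assumptions chosen for illustration; the interview does not say which hardware conditions SimTester inspects, and real interrupt controllers differ in detail.

```python
from dataclasses import dataclass


@dataclass
class ToyInterruptState:
    """Assumed, simplified view of the interrupt hardware."""
    global_enable: bool  # CPU-level interrupt enable flag
    mask: int            # bit n set means IRQ n is masked
    in_service: int      # bit n set means IRQ n is currently being handled


def can_force(state, irq):
    """True if raising this IRQ now would be possible on the modeled hardware."""
    bit = 1 << irq
    return (
        state.global_enable
        and not (state.mask & bit)
        and not (state.in_service & bit)
    )


def try_force(state, irq, raise_irq):
    """Raise the IRQ through the supplied callback only when it is feasible."""
    if not can_force(state, irq):
        return False
    raise_irq(irq)
    return True


raised = []
state = ToyInterruptState(global_enable=True, mask=0b0100, in_service=0b0000)
print(try_force(state, 1, raised.append))  # IRQ 1 is unmasked
print(try_force(state, 2, raised.append))  # IRQ 2 is masked
```

Skipping infeasible injections keeps the test from reporting failures that could never happen on real hardware.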

The execution observer monitors and generates information that can either be recorded into a file for offline analysis or fed directly into the test oracles for online analysis. The test oracles then determine whether a test case has elicited a failure in the system. Any anomalies are reported in the result file.
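As a rough illustration of the observer's two output paths, the sketch below logs each event as a JSON line for offline analysis and also passes it to a list of oracle functions for online analysis. The event format, the `ToyObserver` class, and the bounds-checking oracle are all hypothetical and stand in for whatever SimTester actually records.

```python
import io
import json


def make_bounds_oracle(base, size):
    """Build an oracle that flags writes outside [base, base + size)."""
    def oracle(event):
        if event["kind"] == "write" and not (base <= event["address"] < base + size):
            return "write outside buffer at 0x%x" % event["address"]
        return None
    return oracle


class ToyObserver:
    def __init__(self, log=None, oracles=()):
        self.log = log                # file-like object for offline analysis
        self.oracles = list(oracles)  # callables for online analysis
        self.anomalies = []

    def record(self, event):
        if self.log is not None:
            self.log.write(json.dumps(event) + "\n")
        for oracle in self.oracles:
            message = oracle(event)
            if message:
                self.anomalies.append(message)


log = io.StringIO()  # an in-memory stand-in for a log file
observer = ToyObserver(log, [make_bounds_oracle(0x3000, 16)])
observer.record({"kind": "write", "address": 0x3004})
observer.record({"kind": "write", "address": 0x3010})
print(observer.anomalies)
```

Keeping the oracles as plain callables means the same checks could be replayed later over the logged events instead of running online.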

JE: So you let the simulator use its control over the target to fire off asynchronous hardware events like interrupts at points in time which are likely to cause issues, is that correct?

TY: Simics provides rich APIs that can let users directly manipulate hardware states to force interrupts to occur at the points of interest without instrumentation or adding additional tool support.

JE: How do you set up a test case?

TY: We generate test cases based on code-coverage-based test adequacy criteria. Different coverage criteria are used for different types of faults, depending on the points of interest. For example, to generate test cases relevant to race conditions, we first statically identify the variables shared between the applications and the interrupt handlers. We then manually generate a set of test cases that cover the feasible shared variables.

To generate tests relevant to deadlocks, we generate test cases to cover the lock acquire operations. This allows us to detect faults in every corner of programs. The test case generation process can also be automatic using techniques such as dynamic symbolic execution.
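The adequacy check described in this answer can be sketched as a small set computation. The variable names, test names, and helper functions below are made up for illustration; in practice the access sets would come from static analysis and from observing each test run.

```python
def shared_variables(app_accesses, handler_accesses):
    """Variables touched by both the application and an interrupt handler."""
    return set(app_accesses) & set(handler_accesses)


def uncovered_targets(targets, exercised_by_test):
    """Targets that no test case in the suite exercises."""
    covered = set()
    for exercised in exercised_by_test.values():
        covered |= set(exercised) & targets
    return targets - covered


# Hypothetical access sets for an application and its interrupt handler.
app_accesses = {"rx_count", "tx_buf", "mode"}
handler_accesses = {"rx_count", "tx_buf", "tick"}
targets = shared_variables(app_accesses, handler_accesses)

# Hypothetical record of which variables each test case exercises.
suite = {"t1": {"rx_count", "mode"}}
print(uncovered_targets(targets, suite))

suite["t2"] = {"tx_buf"}
print(uncovered_targets(targets, suite))
```

The same check applies to the deadlock criterion if the targets are lock acquire sites instead of shared variables: the suite is adequate once the uncovered set is empty.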

JE: What types of bugs do you think this approach works the best for?

TY: Our testing framework is designed to detect all kinds of faults in both sequential and concurrent programs. However, beyond the faults that are detectable by existing runtime checking tools (e.g., Valgrind, Pacer) running at the user application level, our approach targets the types of faults that are more OS and hardware dependent, i.e., faults that are hard to observe and hard to control using code instrumentation.
