Virtual prototyping methodology to boot Linux on the ARM Cortex-A15

SoC development teams worldwide have begun a steady move to a virtual prototype methodology to improve accuracy and accelerate the design process for all kinds of applications. For those of you who aren’t familiar with virtual prototypes, let’s start with a definition, then look at how an engineer recently used virtual prototyping to boot Linux on the ARM® Cortex™-A15.

Virtual prototypes are fast, functional software models of a system that can execute production code. With benefits ranging from software development to architectural exploration and early functional verification using abstract models, their rising popularity is easy to understand.

Almost every virtual prototype deployment, though, suffers from the same problem: the virtual prototype either runs fast while sacrificing cycle accuracy, or it is cycle accurate but too slow for software development.

Some virtual prototypes attempt to solve this problem by giving up a bit of both speed and accuracy, producing a “best of both worlds” system that claims the strengths of each approach with none of the downsides. In practice, however, this pleases no one: it is too slow for the software team and not accurate enough for architects and firmware engineers. Fortunately, there’s a way to create a single virtual prototype that is both fast and accurate.

I recently worked with an engineer to help him boot Linux on a virtual prototype containing an ARM Cortex-A15. In this case, he was developing a mobile application processor, but the same steps apply to almost all complex SoC designs.

To get a true measure of the SoC’s performance, the engineer needed to run benchmarks on top of an operating system, in this case Linux. The benchmarks included Dhrystone, CoreMark and tiobench, a multi-threaded I/O benchmark used to measure file system performance. Running them served two primary purposes. Most obviously, the results helped determine the relative performance of the device under test (DUT). The benchmarks also did an effective job of generating large amounts of representative system traffic to stress the system and identify optimization opportunities.

Each benchmark required a significant number of simulation cycles to complete, in addition to the huge number of cycles required simply to boot the OS. Because of this large number of required execution cycles, this type of use case is not typically considered with traditional cycle-accurate prototypes. Instead, engineers have opted for cycle-approximate models, which can lead to inaccurate and unoptimized SoC designs. Or, more often, they have skipped this optimization step entirely during the design phase and waited to run these benchmarks in prototypes, when it was too late to make changes based on the results.
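
To make the cycle-count argument concrete, here is a rough back-of-the-envelope sketch in Python. It simply divides the number of cycles to simulate by the simulation speed to estimate host wall-clock time. Every number in it (the boot and benchmark cycle counts and the two simulation speeds) is a hypothetical placeholder chosen for illustration, not a measurement from the project described here, and the function names are my own.

```python
def wall_clock_hours(total_cycles: float, cycles_per_second: float) -> float:
    """Host hours needed to simulate total_cycles at a given simulation speed."""
    if cycles_per_second <= 0:
        raise ValueError("cycles_per_second must be positive")
    return total_cycles / cycles_per_second / 3600.0


# Hypothetical placeholder values (assumptions, not measured data).
BOOT_CYCLES = 2e9        # assumed cycles to boot the OS
BENCHMARK_CYCLES = 1e9   # assumed cycles for one benchmark run
SPEEDS = {
    "fast functional model": 50e6,  # assumed simulated cycles per second
    "cycle-accurate model": 50e3,   # assumed simulated cycles per second
}


def estimate(speeds=SPEEDS, cycles=BOOT_CYCLES + BENCHMARK_CYCLES):
    """Return estimated host hours per model for the given cycle budget."""
    return {name: wall_clock_hours(cycles, speed) for name, speed in speeds.items()}


if __name__ == "__main__":
    for name, hours in estimate().items():
        print(f"{name}: {hours:.2f} host hours")
```

Substituting your own cycle counts and simulator speeds shows quickly whether booting an OS and running a benchmark fits in an interactive session or needs an overnight run, which is the trade-off that pushes teams away from cycle-accurate models for this use case.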