FPGA tool bottleneck stalls HPC

Santa Cruz, Calif. -- Gregg Cooke, a director in the emerging-technologies investigation team at a large investment bank in Chicago, is not your typical user of FPGA design tools. But he's representative of a new breed of customers in fields such as finance, biosciences, and oil and gas exploration, where people with no FPGA hardware experience are trying to use the devices in high-performance computing systems--and are often struggling with FPGA design tools.

FPGAs can potentially provide orders-of-magnitude speedups over conventional processors for compute-intensive algorithms, with greatly reduced power consumption. So it's no surprise that FPGA design is going far outside its traditional domain into a growing number of high-performance computing (HPC) systems, modules and boards.

The untapped market is potentially huge, but there's a problem. Current FPGA synthesis, placement and routing tools are written for hardware designers, not software programmers simply trying to accelerate an algorithm. A new generation of electronic system-level C-language compilers is attempting to bridge the gap, but there's still a lot of work involved in moving C-language algorithms onto FPGAs.

At the investment bank in Chicago, Cooke is working with a large compute grid that's running bond risk calculations. Bonds have 30-year lives, and the calculations are extremely computationally intensive. "Before we knew it, we were shoveling blades into our data center just to feed this beast," Cooke said. "We were casting about for ways to reduce heat in our data centers, and FPGA technology seemed like a good bet."

Cooke is working on pilot projects that use FPGAs to accelerate the Black-Scholes algorithm, an option-pricing model built on a stochastic process and frequently used in equities trading. One goal is to allow software people to port existing C-language code directly to FPGAs, without having to touch VHDL. "I want to work totally in C," Cooke said. "The problem is that C tools are just barely mature enough to support what we want to do."

Cooke has had experience with embedded hardware, and he previously worked with space lab applications. "In the bank as a whole, I only found two people who have explicit experience with FPGAs," Cooke said.

Current C-language FPGA compilers don't support the full syntax of C, Cooke said. Missing features include "structs," constructs that are heavily used when C++ code is brought into C, and "switch" statements. "Hardware engineers don't use these constructs, so we're placing new demands on C-to-gates compiler vendors," he said.

FPGA placement and routing is another bottleneck. Last summer, when Cooke tried placing and routing a large piece of code, his 16-Gbyte machine ran out of memory. The FPGA vendor predicted the job would take a week and consume all 16 Gbytes. "That's impossible," Cooke said. "We need to be able to make changes to code, compile and run, sometimes multiple times within a day."

Cooke has used the Impulse C compiler from Impulse Accelerated Technologies Inc. (Kirkland, Wash.). It's close to ANSI C and supports portability across devices, he said, but it could benefit from more efficient pipeline generation and smaller code size.

Pushing particles

Peter Messmer, deputy vice president for accelerator technology at Tech-X Corp. (Boulder, Colo.), a contract engineering company involved in computational physics, is running "particle push" simulations in which particles are accelerated and then evaluated for force, velocity and position. Messmer too is using the Impulse C compiler, along with FPGA boards from Pico Computing Inc. (Seattle). "We are desperate to get those particles pushed faster and right now, on an Opteron, a single particle push costs us on the order of 0.7 microseconds," he said. Messmer is not yet sure what FPGAs can do, because he hasn't yet succeeded in getting the algorithm to run, but he's hoping for a five- to tenfold speedup.

"I like the Impulse C approach, but there are stumbling blocks all the way from the beginning, starting with the way we write algorithms," Messmer said. "We're not exploiting all the parallelism that's offered by the FPGA." While the compiler can extract some instruction-level parallelism automatically, it's up to the programmer to put in the system-level parallelism, he noted. "It would be very nice if it was as simple as just running a compiler on a standard C application," Messmer said.
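To make the idea of a particle push concrete, here is a minimal sketch in plain C, under simplifying assumptions that are not taken from Tech-X: a single uniform electric field, no magnetic field, and a simple explicit time step. The function name `push_particles`, the flat array layout and every parameter name are hypothetical.

```c
#include <stddef.h>

/*
 * Advance n particles by one time step dt (illustrative sketch only).
 *   pos, vel  - flat arrays of 3*n doubles (x, y, z per particle)
 *   field     - uniform electric field, an assumption made for brevity
 *   q_over_m  - charge-to-mass ratio shared by all particles
 */
void push_particles(double *pos, double *vel, size_t n,
                    const double field[3], double q_over_m, double dt)
{
    for (size_t i = 0; i < n; ++i) {
        for (int k = 0; k < 3; ++k) {
            /* Force gives acceleration, which updates the velocity... */
            vel[3 * i + k] += q_over_m * field[k] * dt;
            /* ...and the new velocity updates the position. */
            pos[3 * i + k] += vel[3 * i + k] * dt;
        }
    }
}
```

In this sketch no particle reads another particle's data, so every iteration of the outer loop could in principle run concurrently. A sequential C loop does not say so explicitly, which is the kind of system-level parallelism Messmer describes as being left to the programmer.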

Robert Shuler, an engineer at NASA working on software radio, wanted to use a C compiler to put sequential parts of his signal-processing code into an FPGA, but found that FPGA tools and conventional C-language programs use different memory models. As a result, programmers can't really reuse existing C-language code in FPGA applications. "In most cases, it amounts to starting over and reprogramming the application," he said.

A big market

The HPC market potential is not lost on FPGA vendors. "You start out with applications like financial or oil and gas, and they could start buying thousands of FPGAs," said Steve Lass, senior director of software product marketing at Xilinx Inc. "What I see is that this will eventually expand to accelerate pretty much anything." Right now, said Lass, Xilinx is mostly seeing people who are "kicking the tires" for FPGA-based HPC, mainly in the financial area. Current tools, he said, are just about good enough for that. "But when it comes to doing something like straight ANSI C with no modifications, the tools aren't there."

Lass acknowledged that FPGA placement and routing run-times have to be a lot faster for HPC users. "The run-times are in hours," he said. "I think if we can get it down to 10 minutes, that's what these guys would like to see." Lass also noted that the placement and routing tools are written for hardware designers, with error messages no software engineer is going to understand. One solution, he said, is for electronic system-level tools to hide the details.

Similarly, one approach to FPGA programming is to construct APIs that hide the hardware details from the programmer, said Misha Burich, senior vice president of software and system engineering at Altera Corp. "Somebody would develop functions on the FPGA and provide an API to the developer of the application," he said. In this sense, "the FPGA is not really programmed by the developer of the application." As an example, Burich pointed to FPGA coprocessor provider XtremeData Inc., which couples the Impulse C compiler with a platform support package and IEEE 754 math library in an integrated solution for "financial analytics." As for place and route, Burich said Altera has a research project to determine if an FPGA-based accelerator would be a good way to speed FPGA placement and routing.

Several companies provide compilers for FPGA-based HPC applications. Celoxica Ltd. offers its DK Design Suite for C-based algorithmic entry, simulation and synthesis. The goal, said Jeff Jussel, vice president of marketing, is to make FPGAs as easy to program as general-purpose CPUs. "The perfect thing is to let them write any type of C code," Jussel said. "But the reality is that you can write unsynthesizable C code. So there are certain things we have to ask them to do."

For handling parallel algorithms, Celoxica provides its proprietary Handel-C language. And to ease the programming burden, it offers specific compiler capabilities and libraries for selected algorithms. "If it's in the finance space and we know what you're doing, we can take raw C," Jussel said. "For oil and gas, we know what an acoustic-wave algorithm looks like, and we can take that and process it."