Spiral (www.spiral.net) is a program and hardware design generation system for linear transforms such as the discrete Fourier transform, discrete cosine transforms, filters, and others. We are currently extending Spiral beyond its original problem domain, using coding algorithms (Viterbi decoding and JPEG 2000 encoding) and image formation synthetic aperture radar, SAR) as examples. For a user-selected problem specification, Spiral autonomously generates different algorithms, represented in a declarative form as mathematical formulas, and their implementations to find the best match to the given target platform. Besides the search, Spiral performs deterministic optimizations on the formula level, effectively restructuring the code in ways unpractical at the code or design level. Spiral generates specialized single-size implementations or adaptive general-size autotuning libraries, and utilizes special instructions and multiple processor cores. The implementation generated by Spiral rival the performance of expertly hand-tuned libraries.

In this talk, we give a short overview on Spiral. We explain then how Spiral generates efficient programs for parallel platforms including vector architectures, shared and distributed memory platforms, and GPUs; as well as hardware designs (Verilog) and automatically partitioned software/hardware implementations. We overview how Spiral targets the Cell BE and PowerXCell 8i, the Blue Gene/P PPC450d processors, as well as Intel's upcoming Larrabee GPU and AVX vector instruction set. As all optimizations in Spiral, parallelization and partitioning are performed on a high abstraction level of algorithm representation, using rewriting systems.