The case for integrating FPGA fabrics with CPU architectures

September 07, 2017, anysilicon

As transistor scaling delivers diminishing returns, designers are debating how they can approach designs in such a way that they're not relying on packing yet more transistors onto a chip to achieve speed increases. One of the biggest innovations in this industry is going to come from a fundamental reapplication of a technology that has been known and understood for some time: enter the humble FPGA.

FPGAs started life as discrete devices in 1984, when Xilinx and Actel began to introduce products used primarily in low-volume industrial applications and prototyping, as a useful 'band-aid' to patch holes in system logic. Altera, Lucent, and Agere drove FPGAs into networking and telco applications. Subsequent process shrinkage, reductions in mask costs, and the integration of SRAM blocks, large MACs, sophisticated configurable I/O, and banks of SerDes marked a period of FPGA growth from 1995. Over the last 10 years FPGAs have continued to proliferate, and prices have fallen to the point that FPGAs are adding significant value even to high-volume applications, in functions previously associated only with DSPs, GPUs, and MCUs.

This idea has historically met with resistance. SoC developers were initially concerned about size, speed, and cost, but this simply isn't the case any more: FPGAs have made order-of-magnitude advances on every one of these measures. Nor is it the case that CPUs have advanced to the point where they can reasonably take on the required processing loads.

The basics

Every modern CPU you'll encounter essentially employs the load-store/modified Harvard architecture, wherein 'instructions' and 'data' are stored separately and transmitted along different signal pathways. Instructions are communicated on the control plane, which describes how data will be acted upon as well as administering 'housekeeping' for the overall system.
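To make the load-store idea concrete, here is a purely illustrative sketch (not any real ISA) of a toy machine in which instructions and data live in separate memories, and the ALU only ever touches registers loaded from the data memory:

```python
# Illustrative sketch of a load-store machine with separate instruction
# and data memories, in the spirit of a (modified) Harvard design.
# The instruction names and encoding here are invented for illustration.

def run(imem, dmem, regs):
    """Execute instructions from imem; data lives only in dmem and regs."""
    pc = 0
    while pc < len(imem):
        op, *args = imem[pc]          # instructions arrive on the control path
        if op == "load":              # load rd, addr : regs[rd] <- dmem[addr]
            rd, addr = args
            regs[rd] = dmem[addr]
        elif op == "add":             # add rd, ra, rb : regs[rd] <- ra + rb
            rd, ra, rb = args
            regs[rd] = regs[ra] + regs[rb]
        elif op == "store":           # store ra, addr : dmem[addr] <- regs[ra]
            ra, addr = args
            dmem[addr] = regs[ra]
        pc += 1
    return dmem

# Compute dmem[2] = dmem[0] + dmem[1]: every data access goes via load/store.
dmem = [3, 4, 0]
run([("load", 0, 0), ("load", 1, 1), ("add", 2, 0, 1), ("store", 2, 2)],
    dmem, regs=[0] * 4)
print(dmem[2])  # 7
```

Note how even a one-line computation costs four instruction fetches on the control plane — the overhead the next paragraph describes.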

The constant need to load (often highly complex) instructions and store the resulting data, the need to keep a number of different hard-wired fabrics on standby to act on data within the same chip, and the need to continually switch context (every 100 cycles or so) to carry out different tasks all make the humble CPU relatively inefficient at handling complex yet largely consistent data-plane operations.

An FPGA is an array of logic blocks connected by reconfigurable routing channels that can be rewired to perform very specific functions. Both FPGAs and CPUs, of course, process instructions and data separately using memory and logic. FPGAs use memory to configure lookup tables, multiplexers, partially populated interconnect matrices, and a number of other elements. There is a major difference, however: a CPU is optimised for rapid context switching. It loads instructions and data from registers and memories and then, within a few cycles, loads a whole new set of instructions and data.
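The lookup table (LUT) mentioned above is the basic FPGA logic element, and it is simple enough to sketch directly. The following illustrative Python models a 4-input LUT: the 16-entry table plays the role of the configuration SRAM, and 'reconfiguring' the fabric just means writing a different table into the same element:

```python
# Illustrative model of a k-input LUT, the basic FPGA logic element.
# The tabulated list stands in for the configuration SRAM.

def make_lut(func, k=4):
    """'Configure' a k-input LUT by tabulating func over all 2**k inputs."""
    table = []
    for i in range(2 ** k):
        bits = [(i >> b) & 1 for b in range(k)]
        table.append(func(*bits) & 1)
    return table

def eval_lut(table, *bits):
    """Evaluate the configured LUT: pack the input bits into a table index."""
    idx = sum(b << n for n, b in enumerate(bits))
    return table[idx]

# The same physical element configured two different ways:
xor4 = make_lut(lambda a, b, c, d: a ^ b ^ c ^ d)
and4 = make_lut(lambda a, b, c, d: a & b & c & d)
print(eval_lut(xor4, 1, 0, 1, 1))  # 1
print(eval_lut(and4, 1, 0, 1, 1))  # 0
```

Once the table is written, evaluating any 4-input Boolean function is a single memory read — which is why a configured FPGA behaves like hard-wired logic.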

The process of reconfiguring an FPGA's functionality, on the other hand, is relatively slow and resource-intensive, requiring new configurations to be moved into the configuration RAM, and is therefore impractical to perform too often. However, once configured, an FPGA can emulate digital logic at speeds comparable to hard-wired circuits. Therefore, just as the CPU excels at performing varied tasks, the FPGA excels at performing repetitive (and particularly highly parallelised) tasks that repeat for thousands of cycles and are only occasionally redefined.
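A back-of-envelope calculation shows why the slow reconfiguration step rarely matters in practice. The cycle counts below are invented purely for illustration, not vendor figures; the point is only that a one-off configuration cost amortises to almost nothing over a long run of repetitive work:

```python
# Illustrative amortisation arithmetic (made-up numbers, not vendor data):
# a one-off reconfiguration cost spread over many cycles of fast execution.
reconfig_cycles = 1_000_000    # one-off cost to load a new configuration
work_cycles_per_item = 1       # configured fabric processes one item per cycle
items = 100_000_000            # items processed before the next reconfiguration

total = reconfig_cycles + items * work_cycles_per_item
overhead = reconfig_cycles / total
print(f"reconfiguration overhead: {overhead:.2%}")  # about 1%
```

With these assumed numbers the reconfiguration cost is under one percent of total cycles; the longer the task runs between redefinitions, the more the FPGA behaves like fixed-function hardware.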

Industry activity signals that CPU and FPGA fabric integration is coming

The industry has shown considerable faith that these two device types can drive value through closer integration: Intel’s $16.7Bn acquisition of Altera is testimony to this fact. Intel is hoping to develop modules that use both Altera’s FPGAs and Intel’s CPUs to accelerate the datacentre. Given the drive towards greater performance, lower costs, and better power usage, it is inevitable that these two structures will now start to move into the same device, with FPGA fabric integrated onto CPUs as IP blocks.

Achronix hopes to be a major catalyst in this area, having introduced embedded FPGA (eFPGA) IP derived from its earlier Speedster family of standalone FPGAs. The Speedster family was already known in the industry for its high performance and its sophisticated routing architecture. The resulting eFPGA IP, Speedcore, ably demonstrates many of the potential advantages of integrating FPGA fabric into CPU/SoC devices.

For one, SoCs incorporating eFPGA fabric are capable of higher performance than is possible with a separate FPGA chip. This is partly due to the higher bandwidth that can be made available on-chip. What's more, with no need for signals to pass through SerDes and protocol encoding such as PCIe, latencies are significantly (around 10x) lower. eFPGA elements can also be architected to be cache coherent, and integration of FPGA fabric has also been shown to reduce power use significantly. Power savings are also realised at the wider system level, thanks to the ability to remove supporting components such as cooling, passives, and clock generators.

Secondly, the implementation of eFPGA opens up an important new dimension. Unlike discrete FPGAs, which come in very fixed ranges of size and performance, the balance between logic gates and memory in an eFPGA IP block can be changed within the design tools in an almost unrestricted fashion. This gives the designer the ability to specify just the right amount of FPGA fabric required to accelerate the appropriate functions. This flexibility ensures that the right CPU-core-to-FPGA mix is achieved within the SoC, and that the FPGA fabric is only as big as it needs to be, optimising silicon area, cost, and power consumption.

The activity levels we're seeing tell me we're at a point where FPGA fabrics can be integrated into everyday CPU/SoC designs usefully, affordably, and practically, all without adding impractical amounts of time or complexity to the design process. This can only be facilitated, of course, by extensive design tools, which are critical in developing embedded hardware of this complexity. It is, as far as I'm concerned, inevitable that the industry will head in this direction. The really exciting questions are: what unexpected applications will be found for this general architectural tactic, and how will the eFPGA industry respond to satisfy the demand?

Karl Stevens

Regarding the last paragraph's claim that this can only be facilitated "by extensive design tools which are critical in developing embedded hardware of this complexity": are these tools under development? Otherwise this looks like a solution looking for a problem.
Also, how is this different from an FPGA with an embedded CPU, which both Xilinx and Intel already offer?