Supporting CPUs Plus FPGAs (Part 3)

While it has been possible to pair a CPU and FPGA for quite some time, two things have changed recently. First, the industry has reduced the latency of the connection between them and second, we now appear to have the killer app for this combination. Semiconductor Engineering sat down to discuss these changes and the state of the tool chain to support this combination, with Kent Orthner, system architect for Achronix, Frank Schirrmeister, senior group director for product management in the System & Verification Group at Cadence, Ellie Burns, marketing director for HLS and low power and Gordon Allan, Questa product manager, both from Mentor Graphics. Part one can be found here; part two is here. What follows are excerpts from the conversation.

Schirrmeister: For an SoC solution, I can see a higher-level language for that intuitively working at the next higher level of software plus programmable hardware. It seems as if it becomes application specific. People ask about high frequency trading and, yes, we can use an FPGA-based prototype and you get the right latency. But how are the algorithms going to be split between the CPU and FPGA?

Allan: Partitioning opens up a problem, which is security. It doesn’t matter if it is trading or media processing, you want it to be secure. Whenever you have a new partition with configurable logic, then you introduce new risks and the device could be hijacked and behave as it should except for some activity that it is performing that is not desired.

Orthner: By putting the FPGA on the same die, don’t you finish up with much stronger security? Arm has its entire TrustZone system where they have an MMU that controls memory access and they understand about each master in the system. Is it a trusted component or not? By having the CPU and the FPGA on the trusted interconnect you can control who has access and it becomes quite secure. When you go over something like PCI Express, then you have to decide at design time, ‘Is the secure zone going to extend to the FPGA or not?” If it is in the secure zone, then it needs access to all of the security functions. Then you have this large hole where, if you do get a bitstream into the FPGA, that isn’t trusted. Then it can get access to the main memory.

Allan: It is simply a toolchain issue, especially in industries driving security on chip, such as the media industry. In order to get certified for HDMI or copy protection, you need to demonstrate in your verification flow that you have done the security check. You have drawn the dotted line between the secure zone and the downloadable zone. If we draw the line on-chip or off-chip, it still remains the same. That is why we see increased usage of the Secure Check flow to demonstrate to the certifying authorities or to your customers that you have verified for security. Now being downloadable over the Internet into an IoT device seems to add a new level of insecurity, but if the foundation is there…

Orthner: It is very similar today to when you download a new app onto your phone. The OS runs it behind an MMU that gives it a view of memory space, so it cannot see the rest of the system.

Allan: Like a virtual machine.

Orthner: Exactly. I imagine a day—hopefully not too far away—once embedded FPGAs have taken off, when you can download a new game onto your phone and the game includes an FPGA accelerator that is just part of what you are downloading. It programs the FPGA attached to the CPU and off you go. To get there we have to deal with security first.

SE: Does that make the FPGA a resource controlled by the OS that will now get to decide which apps can have access to the FPGA resources at any specific time?

Burns: And then how do you write these accelerators. What does that look like in the future? Do you have a library of accelerators?

Schirrmeister: You may remember Critical Blue. They would analyze the software and figures out where the bottlenecks are and provide guidance for you to create an application specific accelerator. That was used to a certain extent but they morphed the technology to do multicore optimization. That type of automation may be coming back.

Burns: It may come back around. We are getting to a point where the hardware architectures between the FPGA, there is a big change that is happening as these architectures change.

Allan: It makes a lot more sense.

Burns: Now you have changed the latency, it makes a whole lot more sense. There weren’t many things you could accelerate before because of the latency. So those of those approaches may have been before their time.

Orthner: I was looking at this paper yesterday about using FPGA technology for genome sequencing. The things that need to be accelerated are relatively small grained. When they accelerated this on an FPGA, it ran something like 400X slower. They realized that even if they could make the FPGA run infinitely fast, just the communications overhead with PCI express meant that there was no way to get the benefit of acceleration. They worked around it by batching a bunch of jobs and sending it as one giant message dump to the FPGA. They ended up getting a couple of hundred times improvement over a regular CPU, but they have to really deal with the high latency and low bandwidth limitations.

Burns: When I look at the new embedded solutions, while Xilinx and Intel battle it out for software acceleration in the cloud, they will kill each other, but now we have this new technology with low latency.

Schirrmeister: Isn’t there something in-between? The FPGA can be programmed by everyone but I remember doing a project back in the 90s where a customer was looking at video conferencing which required video encoding and decoding. Intel came out with the MMX instruction set extensions and it made video much easier so they no longer need an exotic solution. So the question becomes, can you do it all in hardware or all in software, you can do various mixes of hardware and software and at some point you go back to making your processor specific. So there is a continuum of ways in which you can accelerate a task. In the embedded world you also have to look at the Tensilicas and the ARCs of the world, where instead of having a dedicated FPGA you have an instruction set extension. The interesting question remains: How do you program it? In 2006 to 2008, we had a myriad of languages for programming special-purpose hardware. Five people worldwide knew these languages and that killed them. The same was true of specific processors such as the IBM cell. Just because you have found an abstraction does not mean that everyone can use that architecture.

Orthner: The same could be said about GPUs. Look at the number of people who can write CPU code versus GPU code.

Allan: It does matter in some ways, and you still have to be able to verify the end product. If we are talking about IoT, field upgradeable devices – how many different scenarios do you have to verify with the interactions between a real-world device and synchronous updates happening.

Schirrmeister: Field upgradeability scares me because of the security intrusion possibilities. While it may be crucial to many aspects of IoT and some application domains, it is also a security issue.

Burns: The same with driverless cars. A co-worker got a new car, and brand new features show up after a software upgrade.

Allan: What does exist in the toolchain today that may become irrelevant? Every aspect of traditional ASIC design applies to this new world and the new architectures, whether we program in Cuda or HLS or RTL, we still need to do verification. If the value of the platform is time to market or field upgradeability, then verification has to keep pace with that and to embrace new verification technologies and platforms and engines so we are not the thing that is holding the industry back.

Burns: All of these industries that we are talking about are changing at a phenomenal rate.

Allan: 2G, 3G were not optimized for power. By 5G it all seems to be coming together. The smartchips will support all of them and we can reset the economics of the ecosystem. It will depend on a small subset of platforms and a few flows that are fit for purpose. I can see parts of the radio algorithm going onto an FPGA and that could be upgraded over time. I can see IoT devices having small pockets of FPGA architecture for different functions including part of the radio. Then you can upgrade them so they don’t have to be replaced every 5 years. Or to upgrade the keys so that when things go wrong, let’s change the locks on the house, rather than rebuild the whole house. These devices need to last a generation rather than being a disposable piece of plastic.