You are currently viewing SemiWiki as a guest which gives you limited access to the site. To view blog comments and experience other SemiWiki features you must be a registered member. Registration is fast, simple, and absolutely free so please, join our community today!

Accelerating Edge Inference with Flex Logix’s InferX X1

For a long time, memories were the primary technology driver for process development. If you built memories, you got access to cutting-edge process information. If you built other products, this could give you a competitive edge. In many cases, FPGAs are replacing memories as the driver for advanced processes. The technology access benefits still apply and at least one company, Flex Logix, is reaping those benefits.

An eye-popping overview of the benchmark results was presented at the Spring Linley Processor Conference on the same day. This popular conference was conducted as a virtual event. I got a chance to attend many of the sessions and I can say that The Linley Group did a great job capturing their live event in a virtual setting, delivering both high-quality presentations and providing informal access to the presenters. Expect to see more events like this.

The presentation was given by Vinay Mehta, AI inference technical marketing manager at Flex Logix. Prior to joining Flex Logix, Vinay spent two years at Lyft designing next generation hardware for Lyft’s self-driving systems. His activities included demonstration of quantization and hardware acceleration of neural networks and evaluation of edge and data center inference accelerator hardware and software. Vinay is a very credible speaker on AI topics.

The InferX X1 is completing final design checks and will tape out soon (TSMC 16FFC). It contains 4,000 MACs interconnected with Flex Logix’s EFLX eFPGA technology. This flexible interconnect helps the product achieve high utilization. Total power is 13.5 watts max, with typical power consumption substantially lower. Chip samples and a PCIe evaluation card are expected in Q3 2020. The part has flexibility built-in to support low-latency operation for both complex and simpler models at the edge.

Vinay covered many aspects of customer benchmarks for the InferX X1. To begin with, power and size stack up as shown in the figure below. The Flex Logix part appears to be lower power and less expensive (thanks to the small die size).

Regarding performance benchmarking, Vinay spent some time reviewing the various benchmarks (e.g., MobileNet, ResNet-50, Inception v4, YOLOv3). He also explained that many benchmarks assume perfectly ordered data, which often is not the case in real-world workloads. Putting this all together to examine benchmark performance for latency-sensitive edge applications yields the figure below. Note these results focus on latency without regard to power and cost.

Vinay pointed out that the view above isn’t holistic in the sense that customers will be interested in the combination of performance, power and cost. Looking at the benchmark data through the lens of throughput relative to die size, which is a proxy for cost, you get the figure below. The InferX X1 has a clear advantage thanks to its small size and efficient utilization of resources.

Vinay then spent some time explaining how various convolutional neural network (CNN) algorithms are mapped to the InferX X1 architecture. The ability to “re-wire” the part based on the memory access patterns of the particular convolutional kernel or series of convolutional kernels being implemented is a key reason for the results portrayed in the figure above. Flex Logix’s embedded FPGA technology provides this critical level of differentiation, as it allows for more complicated operations (such as 3D convolutions) to map efficiently to its unique 1D systolic architecture.

Vinay’s talk covered many more aspects of real-time inference requirements at the edge. There was also a very useful Q&A session. If you weren’t able to attend his presentation at the Linley Conference, there is a replay available. I highly recommend you catch that. You can register to access the replay here. The presentation is on day four, April 9, 2020 at 11:10 AM.