Convolutional Accelerator for Convolutional Neural Networks (CNN)

NeuralCAx16 combines 256 multipliers with a carefully crafted data-path design. It is flexible enough to handle any filter size and channel configuration. Implemented in a current process technology, NeuralCAx16 can run at over 1 GHz and outperform many GPU/CPU CNN solutions.
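The headline numbers above can be turned into a back-of-envelope throughput estimate. The sketch below is illustrative only: the function names and the utilization assumption (every multiplier busy every cycle) are ours, not part of the NeuralCAx16 specification.

```python
# Hypothetical throughput estimate for a 256-multiplier accelerator
# running at 1 GHz (figures taken from the text; names are illustrative).

MULTIPLIERS = 256   # parallel multiply units in the array
CLOCK_HZ = 1.0e9    # 1 GHz operating frequency

def peak_gmacs(multipliers=MULTIPLIERS, clock_hz=CLOCK_HZ):
    """Peak multiply-accumulate throughput in GMAC/s."""
    return multipliers * clock_hz / 1e9

def conv_layer_macs(h, w, c_in, c_out, k):
    """MAC operations for one convolutional layer (stride 1, 'same' padding)."""
    return h * w * c_in * c_out * k * k

# Example: a 56x56, 64-in/64-out channel, 3x3 layer (ResNet-like shape).
macs = conv_layer_macs(56, 56, 64, 64, 3)
min_cycles = macs // MULTIPLIERS  # lower bound, assuming full utilization
```

At full utilization this gives 256 GMAC/s of peak compute; real layer times depend on how well the data path keeps the multiplier array fed.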
A typical application of NeuralCAx16 is presented in Figure 1. The host processor can be a RISC engine, a microprocessor, or any in-house developed ASIC core. Its functions are handling memory data, performing the operations of the other layers, and interfacing to the real world.

Features

Single clock synchronous design

Straightforward interfaces for SoC applications

8/16-bit fixed-point and 16-bit floating-point operations

Fully synthesizable, technology-independent Verilog RTL code

High performance

Programmable for different channel and filter sizes
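To illustrate what "programmable for different channel and filter sizes" implies in practice, the sketch below shows how a host driver might tile an arbitrary convolution onto a fixed 256-multiplier array. The tile shapes and the pass descriptor are hypothetical; NeuralCAx16's actual dispatch interface is not documented here.

```python
# Hypothetical host-side tiling of an arbitrary KxK convolution onto a
# fixed array of 256 multiplier lanes. Tile sizes and the pass format
# are illustrative assumptions, not the NeuralCAx16 programming model.

def tile_ranges(total, tile):
    """Yield (start, length) pairs covering [0, total) in chunks of `tile`."""
    for start in range(0, total, tile):
        yield start, min(tile, total - start)

def schedule_conv(c_in, c_out, k, lanes=256):
    """Break a KxK, c_in -> c_out convolution into per-pass workloads.

    Each pass maps at most `lanes` input-channel x output-channel
    products onto the multiplier array.
    """
    ci_tile = min(c_in, 16)                 # input channels per pass
    co_tile = max(1, lanes // ci_tile)      # output channels per pass
    passes = []
    for co_start, co_len in tile_ranges(c_out, co_tile):
        for ci_start, ci_len in tile_ranges(c_in, ci_tile):
            passes.append({
                "out_ch": (co_start, co_len),
                "in_ch": (ci_start, ci_len),
                "taps": k * k,
            })
    return passes

# Example: a 3x3 convolution with 64 input and 128 output channels.
passes = schedule_conv(c_in=64, c_out=128, k=3)
```

The point of the sketch is that a fixed multiplier count does not restrict layer shapes: any channel or filter configuration reduces to a sequence of passes that each fit the array.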

Benefits

The convolutional layers typically consume more than 95% of the computation while a CNN is in operation. With that burden offloaded, a general-purpose processor such as a RISC core can handle the remaining operations. NeuralCAx16 enables chip developers to build their own AI/deep-learning processors or application-specific ASICs in a matter of months.
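The claim that convolution dominates CNN compute can be checked with a rough MAC count. The layer shapes below are a simplified VGG-like stack chosen purely for illustration; they are not tied to any particular NeuralCAx16 deployment.

```python
# Rough MAC-count comparison showing why convolutional layers dominate
# CNN compute. Layer shapes are a simplified, VGG-like example.

def conv_macs(h, w, c_in, c_out, k):
    """MACs for one conv layer at stride 1 with 'same' padding."""
    return h * w * c_in * c_out * k * k

def fc_macs(n_in, n_out):
    """MACs for one fully connected layer."""
    return n_in * n_out

# Three representative 3x3 conv layers.
conv_total = (conv_macs(224, 224, 3, 64, 3)
              + conv_macs(112, 112, 64, 128, 3)
              + conv_macs(56, 56, 128, 256, 3))

# Two representative fully connected layers.
fc_total = fc_macs(4096, 4096) + fc_macs(4096, 1000)

conv_share = conv_total / (conv_total + fc_total)
```

Even with this small stack, the convolutional layers account for well over 95% of the MACs, which is why offloading them leaves the host with a manageable residual workload.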