Compiling

This will compile the original C file together with legup_pcie_wrappers.c (which you can modify yourself).

Mandelbrot speedup beyond multiple accelerators

Mandelbrot should exhibit a nearly linear speed-up as the number of accelerators increases, since each pixel is computed independently of the others. The number of accelerators is limited by the number of DSPs on the FPGA. I don't think we can keep all DSPs busy for the majority of clock cycles, but we can try to get close without running into memory bottlenecks.
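To make the scaling argument concrete, here is a minimal sketch of how the image rows could be divided among accelerators. The names row_range and rows_for_accel are hypothetical and not part of the existing code; the sketch assumes a static, row-wise split.

```c
#include <assert.h>

/* Hypothetical helper type: half-open range of image rows [begin, end)
 * assigned to one accelerator. */
typedef struct {
    int begin;
    int end;
} row_range;

/* Split `height` rows as evenly as possible across `num_accels`
 * accelerators. The first (height % num_accels) accelerators each
 * receive one extra row. */
row_range rows_for_accel(int height, int num_accels, int accel_id)
{
    assert(height >= 0);
    assert(num_accels > 0 && accel_id >= 0 && accel_id < num_accels);

    int base = height / num_accels;   /* rows every accelerator gets */
    int extra = height % num_accels;  /* leftover rows to hand out */

    row_range r;
    r.begin = accel_id * base + (accel_id < extra ? accel_id : extra);
    r.end = r.begin + base + (accel_id < extra ? 1 : 0);
    return r;
}
```

Note that rows near the set take more iterations than rows far from it, so an even row split does not guarantee an even split of work; interleaving rows across accelerators is one alternative worth trying.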

DSP usage

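We can try turning on resource constraints to cap the DSPs used per accelerator, and possibly multi-pumping (clocking the DSPs faster than the rest of the circuit so each one serves more than one multiply per system cycle).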
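A rough way to see what these two knobs buy us is a back-of-the-envelope bound on the accelerator count. The function below is only a sketch: max_accels and its parameters are hypothetical names, and it assumes one DSP block per multiplier, which will not hold for wide multiplies that span several DSP blocks.

```c
#include <assert.h>

/* Upper bound on the number of accelerators that fit in a DSP budget.
 *   total_dsps      - DSP blocks available on the device (caller-supplied)
 *   mults_per_accel - multipliers each accelerator uses per cycle
 *   pump_factor     - 1 for no multi-pumping, 2 if each DSP is clocked
 *                     at twice the system clock and serves two multiplies
 * Assumption: one DSP block implements exactly one multiplier. */
int max_accels(int total_dsps, int mults_per_accel, int pump_factor)
{
    assert(total_dsps >= 0 && mults_per_accel > 0 && pump_factor > 0);

    /* Ceiling division: DSP blocks one accelerator actually occupies. */
    int dsps_per_accel = (mults_per_accel + pump_factor - 1) / pump_factor;

    return total_dsps / dsps_per_accel;
}
```

For example, with a budget of 100 DSPs and 4 multipliers per accelerator, the bound is 25 accelerators without multi-pumping and 50 with a pump factor of 2. The real numbers depend on the device and on what the synthesis report says each accelerator uses.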

Memory bottleneck

With ~50-200 accelerators, memory access will start to become a bottleneck. Here are some ideas:

Turn on loop pipelining to decrease the number of accelerators needed, at the cost of using multiple DSPs per accelerator

Split accelerators between both memory ports

Try LVT (live value table) multi-ported memories and/or multi-pumping (James should know this well)

The optimum would be for every two accelerators to share one dual-ported memory. This won't be flexible enough to extend to other benchmarks, but it is valid for Mandelbrot with a fixed input size.
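The pairing scheme in the last idea can be sketched as a simple mapping from accelerator index to memory bank and port. The names mem_slot, slot_for_accel and banks_needed are hypothetical, and the sketch assumes accelerators are numbered from 0 and each bank has exactly two ports.

```c
#include <assert.h>

/* Hypothetical description of where one accelerator attaches:
 * which dual-ported memory (bank) and which of its two ports. */
typedef struct {
    int bank;
    int port;
} mem_slot;

/* Accelerators 0 and 1 share bank 0, accelerators 2 and 3 share
 * bank 1, and so on; the low bit of the index selects the port. */
mem_slot slot_for_accel(int accel_id)
{
    assert(accel_id >= 0);

    mem_slot s;
    s.bank = accel_id / 2;
    s.port = accel_id % 2;
    return s;
}

/* Number of dual-ported memories needed for num_accels accelerators
 * (rounded up, so an odd accelerator gets a bank to itself). */
int banks_needed(int num_accels)
{
    assert(num_accels >= 0);
    return (num_accels + 1) / 2;
}
```

With a fixed input size, each bank only needs to hold the part of the image its two accelerators write, which is why this layout suits Mandelbrot but not benchmarks where accelerators must read each other's data.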