Field Programmable Gate Arrays (FPGAs) have an ever-expanding impact on more
and more applications, ranging from Deep Neural Networks to High-Performance
Computing (HPC) and other uses such as customization of Instruction Set Extensions and
computation offloading in systems with tightly coupled embedded FPGAs (eFPGAs). As
applications diverge in complexity, performance, memory needs and area limitations,
there is a need for a wider variety of FPGA architectures. However, developing and
implementing new FPGA architectures remains challenging and time-consuming, because
they rely heavily on custom layout and require design software and flows tailored to
each specific architecture. This leads to the production of more generic products.
Many academic works focus on automating the FPGA design generation process, in an
attempt to promote customizability and reduce time-to-market. Other approaches target
only the exploration process, seeking the optimal architecture for a specific use case
using area and delay estimation models.
In this thesis we combine the two approaches. We develop an extension
for the popular open-source tool Verilog-to-Routing (VTR) that exports user-specified
FPGA architectures as Verilog, supports custom user hard blocks (RAMs, DSPs, FP units),
and generates bitstreams for given benchmarks. Our objective is to create synthesizable,
technology-independent RTL code that can be synthesized with any standard cell library.
Using our proposed ASIC flow, we obtain real area and delay measurements for an FPGA
architecture instead of relying on estimation models, and then proceed with the
exploration of optimal FPGA architectures for given sets of benchmarks.
Using our VTR extension, we perform FPGA design-space exploration for a set of
HPC-oriented benchmarks derived using Xilinx's High-Level Synthesis (HLS). Our
exploration first identifies Pareto-optimal FPGA architectures with respect to the size
of Lookup Tables (LUTs) and the number of LUTs per Configurable Logic Block (CLB), and
then explores the size of routing channels and the configuration of wire segments. We
also compare the optimal FPGA architectures derived from the HPC benchmarks with
the respective architectures derived from the generic MCNC benchmarks.
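As an illustration of what identifying Pareto-optimal architectures involves, the following is a minimal sketch that assumes area and delay are the two objectives to be minimized. The architecture names and the numbers are hypothetical placeholders, not results from this thesis, and the helper functions are not part of VTR.

```python
from typing import List, NamedTuple


class ArchPoint(NamedTuple):
    """One candidate architecture with its measured cost metrics."""
    name: str     # hypothetical label, e.g. LUT size K and LUTs per CLB N
    area: float   # normalized area (lower is better)
    delay: float  # normalized critical-path delay (lower is better)


def dominates(a: ArchPoint, b: ArchPoint) -> bool:
    """True if a is no worse than b in both metrics and strictly better in one."""
    no_worse = a.area <= b.area and a.delay <= b.delay
    strictly_better = a.area < b.area or a.delay < b.delay
    return no_worse and strictly_better


def pareto_front(points: List[ArchPoint]) -> List[ArchPoint]:
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Illustrative values only; these are not measurements from the thesis.
candidates = [
    ArchPoint("K4_N8", 1.00, 1.30),
    ArchPoint("K5_N8", 1.10, 1.15),
    ArchPoint("K6_N10", 1.25, 1.00),
    ArchPoint("K6_N4", 1.40, 1.20),  # worse than K5_N8 in both metrics
]

front = pareto_front(candidates)
for p in front:
    print(p.name, p.area, p.delay)
```

In this sketch the K6_N4 point is discarded because another candidate is better in both area and delay, while the remaining three points each represent a different area/delay trade-off.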
Finally, we create Tcl scripts for synthesis and back-end implementation (place
and route) that adapt to any architectural characteristics and size, automating the
ASIC flow for new FPGA chips.