Lab 5 - Adding a PL Peripheral

This lab involved adding a peripheral (Block RAM) to the PL and connecting it to the PS via the AXI Interconnect.

Picking up from where we left off, we add the AXI BRAM Controller IP:

After making a couple of changes to the IP (bus width, etc.), run Block Automation, which will automatically add a Block Memory Generator.

The PS doesn't have an AXI Master Port yet, so edit the PS7: enable M_AXI_GP0 and enable FCLK_CLK0 (50 MHz). Run Connection Automation once more:

Vivado automatically adds the AXI Interconnect block and the PS Reset, and the Designer Assistant makes the connections between the BRAM Controller, AXI Interconnect & PS7. It also wires up the Clock & Reset.

The Address Editor tab shows us the address to which the BRAM Controller has been mapped.

Here's what the 'high level' schematic looks like. The BRAM Generator is 2nd from the left, followed by the AXI BRAM Controller, AXI Interconnect & Zynq7.

Since this is an implemented design, Vivado lets you look at what's in each of those blocks. The BRAM Generator has a couple of Flip-Flops & LUTs which eventually connect to a RAMB36E1, which is the primitive for a "36K-bit Configurable Synchronous Block RAM", or Block RAM. The output width is 32 bits, and since we had set the width of the BRAM Generator to 64 bits, 2 of these are connected in parallel.
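The "2 in parallel" figure is just a rounded-up division. Here is a minimal sketch of that arithmetic, taking the 32-bit per-primitive width from the paragraph above as given; the function name is made up for illustration and is not part of any Xilinx tool or API.

```c
/* Number of primitives that must sit side by side to cover a data width.
 * Rounds up, so a width that is not an exact multiple still fits.
 * (Hypothetical helper, for illustration only.) */
unsigned primitives_in_parallel(unsigned total_width_bits,
                                unsigned primitive_width_bits)
{
    return (total_width_bits + primitive_width_bits - 1) / primitive_width_bits;
}
```

Calling primitives_in_parallel(64, 32) gives the two RAMB36E1 primitives seen in the schematic.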

I tried tracing the path of the databus from the BRAM to the AXI Interface. This involved opening up the lower levels of the components, which displays the actual primitives that the design is mapped to in hardware. It also exposes many internal datapaths, and after expanding the cone a couple of times, Vivado was already displaying over 10,000 nets. For reference, the image on the right is a zoomed-in version of the right of the highlighted section on the left. Thanks to Block Automation, we do not need to wire all this up manually!

This is what the implemented design looks like:

We don't have much of a design in the PL (technically, only BRAM), but the BRAM Controller & Interconnect use up some of the programmable logic.

HW Chapter 7 video: Zynq PS DMA Controller

Now that the BRAM in the PL is connected to the PS, it's time to consider how it will be used. Since it has been mapped to a memory address, the simplest approach is to use pointers to copy data between that address and an array. However, this isn't the best solution when it comes to performance, since the data has to go from the PL to the Central Interconnect via the Slave GP port, then to the On-Chip Memory and L2 Cache before making it to the DRAM. The CPU processes each transfer, so it gets held up as well. If you make use of DMA, not only is the CPU free to continue executing, but the data path is shorter since it bypasses the cache.
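For the pointer-based approach, a rough sketch could look like the following. The base address here is an assumption for illustration only; the real value is whatever the Address Editor reports for the BRAM Controller. The function names are also made up and are not from the Xilinx drivers.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical base address: use the one shown in the Address Editor. */
#define BRAM_BASE_ADDR 0x40000000u

/* Copy `words` 32-bit words from the memory-mapped BRAM into a normal array.
 * `volatile` makes the compiler perform every access to the mapped region. */
void bram_read(const volatile uint32_t *bram, uint32_t *dst, size_t words)
{
    for (size_t i = 0; i < words; i++) {
        dst[i] = bram[i];
    }
}

/* Copy `words` 32-bit words from a normal array into the memory-mapped BRAM. */
void bram_write(volatile uint32_t *bram, const uint32_t *src, size_t words)
{
    for (size_t i = 0; i < words; i++) {
        bram[i] = src[i];
    }
}
```

On the board, the first argument would be something like (volatile uint32_t *)BRAM_BASE_ADDR. Every word passes through the CPU inside these loops, which is exactly the overhead that DMA avoids.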

The DMA controller itself is complex: transfers are controlled by the DMA instruction execution engine, which has its own instruction set. It supports up to 8 channels, each of which has its own thread, and it uses round-robin arbitration to ensure that all channels have equal priority.
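To get a feel for what round-robin arbitration between 8 channels means, here is a small software model. It is only an illustration of the general idea, not how the controller is actually implemented in hardware, and the names are made up.

```c
#include <stdbool.h>

#define NUM_CHANNELS 8

/* Return the next channel to service, or -1 if no channel is requesting.
 * The search starts one past the channel serviced last, so every requesting
 * channel gets a turn before any channel is serviced a second time.
 * Pass -1 as last_serviced when nothing has been serviced yet. */
int round_robin_next(const bool requesting[NUM_CHANNELS], int last_serviced)
{
    for (int step = 1; step <= NUM_CHANNELS; step++) {
        int candidate = (last_serviced + step) % NUM_CHANNELS;
        if (requesting[candidate]) {
            return candidate;
        }
    }
    return -1;
}
```

With channels 1 and 5 requesting, successive calls alternate between them, so neither channel can starve the other.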