Advanced eXtensible Interface (AXI) Reference Guide

Xilinx has adopted the Advanced eXtensible Interface or AXI protocol for Intellectual Property (IP) cores generated by their tools. This tech note provides a guide to the basic concepts of the AXI interface and some of the Xilinx IP that supports it.

The AXI protocol was created by ARM and is backward compatible with existing AHB and APB interfaces. Since the FPGA vendors have standardized on ARM CPUs for their SOCs like the Xilinx Zynq and Ultrascale SOC, Intel Cyclone, Arria, and Stratix SOCs, it makes sense that they would use the AXI protocol. Xilinx has widely adopted this interface, whereas Intel supports both the AXI and the Avalon interfaces.

Key Features

The key features of the AXI protocol are:

Provides separate address/control and data phases

Supports unaligned data transfers, using byte strobes

Supports burst transactions with only the start address issued

Has separate read and write data channels

Supports issuing multiple outstanding addresses

Supports out-of-order transaction completion

Permits easy addition of register stages to provide timing closure

The AXI protocol provides many enhanced features, but this article does not go into detail on the fancy features that most designers don’t need. Instead, it concentrates on the normal interfaces that will suffice for most designs.

The current version of AXI is AXI4, which is widely supported by the FPGA vendors. It should be noted, however, that the Zynq-7000 actually supports AXI3, the previous version of the protocol. The differences are minor, and the Xilinx tools automatically insert the necessary adaptation logic to translate between AXI3 and AXI4.

Xilinx Supports Three Interfaces

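Xilinx supports three variants of the AXI4 protocol:

AXI4 - This is the full AXI4 protocol which exposes all of the features of the interface.

AXI4 Lite - This is a simplified, non-bursting interface for low bandwidth connections such as register access.

AXI4 Streaming - This is a data-centric interface for bursting large amounts of data without requiring an address.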

Master / Slave Transactions

Operations on AXI4 involve masters and slaves. Operations are initiated by a master, with the slave responding to the operation. Transfers on an individual channel, however, can be sourced by either the master or the slave, as described in the next section on channels.

Channels

AXI communications take place in channels. Each channel is essentially a separate bus, but these channels are combined to facilitate what would normally be thought of as a single bus. For example, reads take place over the read address channel, sourced by the master, and the read data channel, sourced by the slave. Writes use three channels: the write address, write data, and write response channels. Having separate channels allows for concurrency, for example issuing read requests while the previously requested data is being returned. The following figure from the ARM AXI Specification illustrates the AXI4 Read Channel:

The master requests the read by providing the address, burst length, and other attributes on the read address channel, and the slave responds by sending the requested data on the read data channel. Each channel is like an individual bus with its own strobes.

The following figure illustrates the AXI4 Write Channel:

The write address channel is essentially the same as the read address channel, validating the address, burst size, and other attributes of the cycle. Data is sent on the write data channel. Writes are considered buffered so that the master can perform writes without slave acknowledgement of the previous write. After the entire burst is complete, a single completion is sent from slave to master on the write response channel.

The five channels can be identified by the prefixes on the signal names:

AW – Write Address Channel

W – Write Data Channel

B – Write Response Channel

AR – Read Address Channel

R – Read Data Channel

For example, AWREADY and AWVALID are the ready and valid strobes for the write address channel whereas WREADY and WVALID are the ready and valid strobes for the write data channel. The following timing diagram displays the basic handshake:

Signaling

Like most synchronous interfaces today, an AXI interface operates from a single clock. Different interfaces in a system can run from separate clocks, but this is dependent upon the specific IP core. All channels include READY and VALID strobes which validate the transfer. Considering these signals from the perspective of the Write Data Channel, the slave issues READY when it is able to accept information (data, address, etc.) and the master issues VALID to validate information on the interface. Once the master issues VALID, it must remain asserted until the transfer occurs (READY high). A transfer takes place when both READY and VALID are high, so either side can generate wait states when desired. The data channels also have a LAST signal which indicates the final data item of a burst.
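The handshake rules above can be illustrated with a small cycle-by-cycle model. This is only a sketch in Python, not HDL and not any Xilinx API; the function name simulate_channel and its arguments are made up for this example. It models one channel, holds VALID once it has been raised, and records a transfer on every cycle where VALID and READY are both high.

```python
def simulate_channel(items, source_wants, sink_ready):
    """Cycle-by-cycle model of a single VALID/READY channel.

    items        -- payloads the source will send, in order
    source_wants -- per-cycle flags: source is willing to raise VALID
    sink_ready   -- per-cycle flags: the sink's READY output
    Returns a list of (cycle, payload) for each completed transfer.
    """
    transfers = []
    pending = list(items)
    valid = False  # VALID as held by the source between cycles
    for cycle, (wants, ready) in enumerate(zip(source_wants, sink_ready)):
        # Once raised, VALID stays high until the transfer completes.
        if pending and (valid or wants):
            valid = True
        # A transfer happens only when VALID and READY are both high.
        if valid and ready:
            transfers.append((cycle, pending.pop(0)))
            valid = False  # may be raised again on the next cycle
    return transfers


# The sink stalls on cycles 0, 1 and 3; the source is idle on cycle 5.
result = simulate_channel(
    ["D0", "D1", "D2"],
    source_wants=[1, 0, 1, 1, 0, 0],
    sink_ready=[0, 0, 1, 0, 1, 1],
)
# result == [(2, "D0"), (4, "D1")]
```

In this run the slave inserts wait states by holding READY low, and on the last cycle READY is high but nothing moves because the master has not asserted VALID.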

Normally the READY strobe defaults to high, but is deasserted when the slave can't accommodate any more data or requests. The following image displays an AXI4 write on the three write channels:

In the above timing diagram, you can see the address is validated on the Write Address channel, and this is followed by the data on the Write Data channel. The slave responds with the status on the Write Response channel.

There are dependencies between the different channels, some defined in the AXI4 standard, and some defined by specific Xilinx IP modules.

Full AXI4

The timing diagram above shows the basic signals used for a write transfer, but it leaves out many of the optional signals in the full AXI4 protocol. The following is the full list of signals for the three AXI4 write channels:

Write Address Channel

AWID - Write address ID. This signal is the identification tag for the write address group of signals.

AWADDR - Write address. The write address gives the address of the first transfer in a write burst transaction.

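AWLEN - Burst length. The burst length gives the exact number of transfers in a burst; the field is encoded as the number of transfers minus one. This information determines the number of data transfers associated with the address. This changed between AXI3 and AXI4: AXI3 allows bursts of up to 16 transfers, whereas AXI4 extends incrementing bursts to 256 transfers.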

AWSIZE - Burst size. This signal indicates the size of each transfer in the burst.

AWBURST - Burst type. The burst type and the size information determine how the address for each transfer within the burst is calculated.

Write Response Channel

BREADY - Response ready. This signal indicates that the master can accept a write response.
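As a rough illustration of how the burst length and burst size fields determine the address of each transfer, the following Python sketch computes the beat addresses of a burst. It rests on several assumptions: it handles only the incrementing burst type, it assumes the start address is aligned to the transfer size, and the function name burst_beat_addresses is invented for this example. Per the AXI specification, AWLEN holds the number of transfers minus one and AWSIZE holds the base-2 logarithm of the bytes per transfer.

```python
def burst_beat_addresses(awaddr, awlen, awsize):
    """Addresses of each transfer in an incrementing burst.

    Assumes an INCR burst type and a start address aligned to the
    transfer size; wrapping and fixed bursts are not modelled.
    """
    num_beats = awlen + 1         # AWLEN encodes (transfers - 1)
    bytes_per_beat = 1 << awsize  # AWSIZE encodes log2(bytes per transfer)
    return [awaddr + i * bytes_per_beat for i in range(num_beats)]


# Four 32-bit (4-byte) transfers starting at 0x1000.
addresses = burst_beat_addresses(awaddr=0x1000, awlen=3, awsize=2)
total_bytes = len(addresses) * (1 << 2)
# [hex(a) for a in addresses] == ['0x1000', '0x1004', '0x1008', '0x100c']
# total_bytes == 16
```

Only the first address is ever driven on the write address channel; the remaining addresses are implied by these fields.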

The AXI4 protocol as defined is generic in that it doesn’t specify timing, allows for a wide variety of bus widths, and provides much flexibility in the type of transfers supported. For full AXI4, Xilinx supports data bus widths of 32 through 1024 bits in powers of 2. Burst lengths of up to 256 are supported. There are many features that aren’t of general interest, many of which aren’t supported by the Xilinx IP. Examples of such features are locked / exclusive access, protection/cache bits, quality of service, and the low power interface. These can easily be tied off to static values if one desires to interface to the full AXI4, which is a requirement for bursting capabilities.

One somewhat nasty thing that must be dealt with for full, bursting AXI4 is that bursts are not allowed to cross 4KB boundaries. This means that a burst which would cross a boundary must be broken at the boundary, with a second, partial burst initiated for the remaining data.
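The splitting described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the helper name split_at_4kb is hypothetical, it works on byte ranges only, and it does not enforce the maximum burst length, which a real master would also have to respect.

```python
def split_at_4kb(start_addr, num_bytes, boundary=4096):
    """Split a byte range into chunks that never cross a 4KB boundary.

    Returns a list of (address, byte_count) tuples, one per burst.
    """
    chunks = []
    addr = start_addr
    remaining = num_bytes
    while remaining > 0:
        # Bytes left before the next boundary is reached.
        room = boundary - (addr % boundary)
        size = min(room, remaining)
        chunks.append((addr, size))
        addr += size
        remaining -= size
    return chunks


# A 256-byte transfer starting 128 bytes below a 4KB boundary
# becomes two 128-byte bursts.
bursts = split_at_4kb(0x0F80, 256)
# bursts == [(0x0F80, 128), (0x1000, 128)]
```

A transfer that already fits inside one 4KB page comes back as a single chunk, so the same routine can be applied to every request.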

AXI4 Lite

AXI4 Lite is designed for low bandwidth, simple connections. A typical use of the Lite interface would be to read and write device registers from the ARM CPUs. The biggest differences between full AXI4 and AXI4 Lite are that Lite is restricted to 32 and 64-bit data widths, and bursting is not allowed. Additionally, many of the more obscure features of full AXI4 aren’t available, and the ports for these features don’t exist. Note that this includes the LAST signal, which is not needed and not present for the read and write data channels. The following are the signals available for AXI4 Lite for the five channels:

AXI4 Streaming Protocol

The AXI4 and AXI4 Lite protocols are memory mapped in that transactions always require an address. The AXI4 Streaming protocol removes that requirement. It is a data-centric protocol for bursting large amounts of data and includes much flexibility. Streaming signals can be identified by their “T” prefix.

The streaming interface supports byte enables (the TKEEP strobe) and unaligned transfers (TSTRB strobe), and the same data handshake is used to transfer data: TVALID and TREADY. There is a TLAST strobe which can be used to indicate the end of transfer or other user defined termination condition. The TDEST signal can be used to indicate routing information from the source to the destination. It can also be used in conjunction with the identification field, TID.
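To make the roles of TKEEP and TLAST concrete, here is a small Python sketch that chops a byte payload into stream beats. It is an illustration only: the function name packetize, the 4-byte bus width, and the choice to place the first byte in the lowest byte lane are assumptions made for this example, and TLAST is used here simply to mark the last beat of the payload.

```python
def packetize(payload, bytes_per_beat=4):
    """Split a bytes payload into (tdata, tkeep, tlast) beats.

    tkeep has one bit per byte lane; a set bit marks a valid byte.
    tlast is True only on the final beat of the payload.
    """
    beats = []
    for offset in range(0, len(payload), bytes_per_beat):
        chunk = payload[offset:offset + bytes_per_beat]
        tkeep = (1 << len(chunk)) - 1            # low lanes carry valid bytes
        tdata = int.from_bytes(chunk, "little")  # first byte in lane 0
        tlast = offset + bytes_per_beat >= len(payload)
        beats.append((tdata, tkeep, tlast))
    return beats


# Six bytes on a 32-bit stream: one full beat, then a partial final beat.
stream = packetize(bytes([0x11, 0x22, 0x33, 0x44, 0x55, 0x66]))
# stream == [(0x44332211, 0xF, False), (0x6655, 0x3, True)]
```

The final beat carries only two valid bytes, so its TKEEP value has just the two low bits set while TLAST signals the end of the transfer.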

AXI Datamover IP

To stream data to or from the Zynq PS memory, the AXI Datamover IP is very useful. This IP provides an S2MM interface to stream data to the memory mapped domain, or an MM2S interface to stream data from the memory mapped domain. One nice feature is that it automatically detects and handles the 4KB boundary crossing. Cycles are initiated on an AXI command channel, and then the data transfers take place on dedicated AXI4 streaming interfaces. For S2MM, the data is received on a streaming interface and then written from the Datamover on an AXI4 write interface. There is also a dedicated channel for returning status from the transfer. One can quite easily construct a DMA engine using this core (which Verien has done).

The width of the data bus can be 8 to 1024 bits, with a programmable burst length of up to 256 data beats. There is also an optional indeterminate Bytes To Transfer (BTT) mode which is useful for streaming data without knowing the size of the transfer a priori.

AXI4 Interconnect and Smartconnect IP

Xilinx provides the AXI4 Interconnect IP core which can be used to manage AXI4 connections. The AXI Smartconnect is a newer version of the Interconnect core, though Xilinx recommends the Interconnect for lower performance applications (AXI4 Lite applications). These cores provide crossbar connectivity, support for multiple clock domains, FIFOs, width conversion, and protocol conversion. Note that these cores are used to connect AXI to AXI. If you are simply transferring data to the Zynq PS memory subsystem, then you may not need this.

Some of the features that this IP is useful for are: converting from full AXI4 to AXI4 Lite, gluing asynchronous clock domains together, fanning in multiple AXI4 interfaces to a single interface, or fanning out one to many.

Designing with AXI

Designing for the AXI protocol can take on different forms. When using the IP Integrator (IPI) in Vivado, IPI takes care of most of the work for you. For the most part, you can simply configure and connect the modules, but it helps to know what's going on behind the scenes for high bandwidth connections.

If designing in HDL, it's a bit easier to design to the AXI4 Lite protocol than the full AXI4. But designing to full AXI4 really isn't as bad as one would think. It's true, there are many signals to deal with, but most of these are tied off to static values. One easy way to access these static values is to write out an example project from the IP Integrator.

For verification, there are bus functional models available. Originally, Xilinx provided an AXI Bus Functional Model (BFM), but this has been replaced by the AXI Verification IP. Here at Verien, we've developed our own, which we use for our designs and can customize as needed to verify the design.