QoS value as a priority indicator

The CCI-550 uses the QoS value as a priority indicator for arbitration of requests. The QoS value can be from an input to a slave interface, or it can be overwritten by a programmed value.

The CCI-550 uses the QoS value when selecting the request to admit into the main transaction queue. Requests with the highest QoS value have the highest priority unless an anti-starvation mechanism is activated. The CCI-550 uses a Least Recently Granted (LRG) scheme when two or more transactions share the highest priority. The arbiter has starvation avoidance mechanisms to prevent high bandwidth requests from stalling lower priority requests indefinitely.
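
As an illustration only, the following Python sketch models highest-QoS-first selection with an LRG tie-break. The function name, the data structures, and the use of a grant timestamp are assumptions made for this example. The sketch does not model the anti-starvation mechanisms and does not describe the CCI-550 arbiter implementation.

```python
def select_request(pending, last_granted):
    """Pick the slave interface to grant next.

    pending: dict mapping interface name -> QoS value of its pending request.
    last_granted: dict mapping interface name -> time of its most recent grant.
                  Interfaces that were never granted are absent.
    """
    top_qos = max(pending.values())
    candidates = [name for name, qos in pending.items() if qos == top_qos]
    # LRG tie-break: the candidate granted longest ago (or never) wins.
    return min(candidates, key=lambda name: last_granted.get(name, -1))


# S1 and S2 share the highest QoS value; S2 was granted less recently.
pending = {"S0": 12, "S1": 14, "S2": 14}
last_granted = {"S0": 5, "S1": 9, "S2": 3}
winner = select_request(pending, last_granted)
```

In this example, `winner` is `"S2"`, because S2 ties with S1 on QoS value but was granted earlier.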

The CCI-550 propagates QoS values. This determines the service rate when the downstream interconnect and slave devices are sensitive to the QoS value. For example, the NIC-400 Network Interconnect is sensitive to the QoS value.

Note

Ensure that you balance the relative priorities of all slave interfaces. For example, setting each one to the highest QoS value reduces the arbitration to LRG, and there is no advantage in using the QoS value.

You can override the ARQOS and AWQOS input signals on each slave interface by using a programmable register. The value from this register is only applied if the relevant static input signal, QOSOVERRIDE[6:0], is HIGH. CCI-550-generated transactions use the QoS value of the trigger transaction, or the override value if the QOSOVERRIDE signal is set.

Note

The QOSOVERRIDE signal only applies to transactions for which the ARQOS or AWQOS signals are set to a
value of zero. Therefore, each interface can have a mixture of overridden traffic and
other traffic, with an unaffected non-zero QoS value.
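
The override rule can be summarized with a short Python sketch. The function and parameter names are hypothetical: `override_enabled` stands for the relevant QOSOVERRIDE input and `override_value` for the programmed register value. This is a behavioral model of the rule as stated in this section, not a description of the hardware.

```python
def effective_qos(incoming_qos, override_enabled, override_value):
    """Return the QoS value used for a request on one slave interface."""
    # The override only replaces an incoming ARQOS or AWQOS value of zero.
    if override_enabled and incoming_qos == 0:
        return override_value
    # Non-zero values, or any value when the override is disabled, pass through.
    return incoming_qos
```

For example, with the override enabled and a programmed value of 9, a request with a QoS value of 0 is issued with 9, while a request with a QoS value of 5 keeps the value 5.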

High and low priority requests

You can use the QoS Threshold Register to set a QoS value threshold that classifies requests as high or low priority. A high priority request is a read or write request with an ARQOS or AWQOS value that is equal to or greater than the threshold.

In heavy congestion, high priority requests use a TT reserved slot to take a
fast path through the CCI-550.
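
The threshold classification described in this section amounts to a single comparison, sketched here in Python. The function name is hypothetical, and `threshold` stands for the value programmed in the QoS Threshold Register.

```python
def is_high_priority(qos, threshold):
    """Classify a read or write request by its ARQOS or AWQOS value."""
    # A request is high priority when its QoS value is equal to or
    # greater than the programmed threshold.
    return qos >= threshold
```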

QoS value regulation based on requested bandwidth

You can configure each CCI-550 slave interface to have a bandwidth regulator for read and write requests. The regulator enables you to modify the QoS value of read and write requests to suit the allocated bandwidth through each slave interface.

To use the regulator, each slave interface has a read and write bandwidth allocation and a QoS value range. You can set the programmable bandwidth_allocation value in bytes per cycle. The CCI-550 has 128-bit interfaces, which transfer up to 16 bytes per cycle, and bandwidth_allocation is a 4-bit value that represents 0-15 bytes per cycle. The following table shows the bandwidth_allocation settings, where the CCI-550 is running at 800MHz.

Table 2-7 bandwidth_allocation settings

bandwidth_allocation    Bytes per cycle    Bandwidth (GB/s)
0b0000                  0                  0
0b0001                  1                  0.8
0b0010                  2                  1.6
0b0011                  3                  2.4
0b0100                  4                  3.2
0b0101                  5                  4.0
0b0110                  6                  4.8
0b0111                  7                  5.6
0b1000                  8                  6.4
0b1001                  9                  7.2
0b1010                  10                 8.0
0b1011                  11                 8.8
0b1100                  12                 9.6
0b1101                  13                 10.4
0b1110                  14                 11.2
0b1111                  15                 12.0
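
Each bandwidth figure in Table 2-7 is the number of bytes per cycle multiplied by the clock frequency. The following Python sketch reproduces the table for an 800MHz clock. The function name and the `clock_mhz` parameter are hypothetical and are used only for this illustration.

```python
def allocation_gb_per_s(bandwidth_allocation, clock_mhz=800):
    """Convert a 4-bit bandwidth_allocation value to GB/s."""
    if not 0 <= bandwidth_allocation <= 0b1111:
        raise ValueError("bandwidth_allocation is a 4-bit value")
    # bytes per cycle * cycles per second, scaled from MB/s to GB/s
    return bandwidth_allocation * clock_mhz / 1000


# Rebuild the table: encoding -> bandwidth in GB/s at 800MHz.
table = {code: allocation_gb_per_s(code) for code in range(16)}
```

For example, `table[0b0110]` is 4.8, which matches the 6 bytes per cycle row.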

When enabled on an interface, the regulator uses the maximum QoS value when
requests are issued at a rate lower than, or equal to, their allocation. If requests are
issued at a rate greater than the allocation, the regulator reduces the QoS value until
either the request bandwidth reduces or the minimum QoS value is reached. The regulator
has an accumulator that tracks the excess requested data, in bytes, and modifies the QoS
value according to a programmable value excess_bytes_per_qv. The following table shows
the possible excess_bytes_per_qv values.

Table 2-8 excess_bytes_per_qv values

Encoding    Excess, in bytes
0b000       256
0b001       512
0b010       1024
0b011       2048
0b100       4096
0b101       8192
0b110       16384
0b111       32768
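
The following Python sketch decodes the excess_bytes_per_qv field and shows one plausible mapping from accumulated excess to QoS value. The linear mapping, one QoS level per excess_bytes_per_qv bytes of excess, is an assumption made for illustration. This section does not specify the exact accumulator behavior, and the function names are hypothetical.

```python
def excess_bytes_per_qv(encoding):
    """Decode the 3-bit excess_bytes_per_qv field (Table 2-8)."""
    if not 0 <= encoding <= 0b111:
        raise ValueError("excess_bytes_per_qv is a 3-bit field")
    # The values double with each encoding step, starting at 256 bytes.
    return 256 << encoding


def regulated_qos(excess_bytes, encoding, qos_max, qos_min):
    """Assumed mapping from accumulated excess, in bytes, to a QoS value."""
    # Assumption: drop one QoS level per excess_bytes_per_qv bytes of excess.
    drop = excess_bytes // excess_bytes_per_qv(encoding)
    # The QoS value never falls below the programmed minimum.
    return max(qos_min, qos_max - drop)
```

With an encoding of 0b100 (4096 bytes per QoS value) and a QoS range of 8 to 14, this model gives a QoS value of 14 with no excess, 12 after 8KB of excess, and 8 once the excess is large enough to reach the minimum.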

The regulator has a nominal 64-byte granule size, and most transactions are expected to be of cache line length. Therefore, the size of any transaction that is not a multiple of 64 bytes is rounded up to the next multiple of 64 bytes.
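
The rounding rule can be sketched in Python as follows. The function and constant names are hypothetical.

```python
GRANULE = 64  # nominal regulator granule size, in bytes


def rounded_size(num_bytes):
    """Round a transaction size up to a multiple of the 64-byte granule."""
    # Ceiling division by the granule size, then scale back to bytes.
    return -(-num_bytes // GRANULE) * GRANULE
```

For example, a 16-byte transaction is accounted as 64 bytes, and a 65-byte transaction as 128 bytes.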

Example of using the bandwidth regulator

This example system uses a CCI-550 running at 800MHz to connect:

Two CPU clusters.

A display processor.

A GPU.

The example system also has the following characteristics:

Each CPU cluster requests read data at an average of 1GB/s. However, at
times it can saturate the read data channel, and therefore has a peak bandwidth
of 12.8GB/s.

The display processor requires an average of 2.8GB/s of read bandwidth and
has a 32KB read data buffer that must not underflow.

The GPU consumes an average of 6.0GB/s, but peaks at 12.8GB/s. The GPU is
more tolerant of bandwidth variations than the other processors.

The memory system can provide 16GB/s of read bandwidth.

The following table summarizes the bandwidth requirements for each
processor.

Table 2-9 Example system bandwidth allocation

Component    Average read bandwidth (GB/s)    Peak read bandwidth (GB/s)
Cluster 1    1.0                              12.8
Cluster 2    1.0                              12.8
Display      2.8                              2.8
GPU          6.0                              12.8
Total        10.8                             41.2

The table demonstrates that the memory cannot provide sufficient bandwidth to
cater for the peak rates of each processor. It is therefore necessary to use QoS to
manage bandwidth allocations.
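
The totals in Table 2-9 can be checked with a few lines of Python. The variable names are chosen for this illustration, and the figures are the ones given in this example.

```python
# (average, peak) read bandwidth in GB/s for each component, from Table 2-9
requirements = {
    "Cluster 1": (1.0, 12.8),
    "Cluster 2": (1.0, 12.8),
    "Display": (2.8, 2.8),
    "GPU": (6.0, 12.8),
}
memory_bw = 16.0  # read bandwidth that the memory system can provide, in GB/s

# Round to one decimal place to avoid floating-point noise in the sums.
total_avg = round(sum(avg for avg, _ in requirements.values()), 1)
total_peak = round(sum(peak for _, peak in requirements.values()), 1)

average_fits = total_avg <= memory_bw  # 10.8GB/s fits within 16GB/s
peak_fits = total_peak <= memory_bw    # 41.2GB/s does not
```

The average demand fits within the memory bandwidth, but the combined peak demand does not, which is why the example relies on QoS regulation.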

When allocating QoS values, assume that requests with higher values are
serviced ahead of requests with lower values.

To achieve the lowest latency, the CPU clusters must issue requests with the
highest priority. However, during periods of peak request rates, the CPUs might use
all the available memory bandwidth. To prevent this, the CPUs must be regulated to a
QoS value that is lower than that of the display processor.

The following table shows example QoS values for the system.

Table 2-10 Example system QoS values

Component    Maximum QoS value    Minimum QoS value
Cluster 1    14                   8
Cluster 2    14                   8
Display      12                   12
GPU          7                    7

You can set the bandwidth_allocation for each CPU cluster to a value that is higher than its average, for example 4.8GB/s. The memory controller can provide 16GB/s, leaving 6.4GB/s of memory bandwidth for the display and GPU. The display has a read buffer of 32KB, which must not underrun. The CPUs are given a maximum QoS value that is two levels higher than that of the display processor. If you set the QoS regulators so that the CPU QoS value is modified at a rate of 4KB per QoS value, each CPU is permitted 2 x 4KB, that is, 8KB, of excess data before its QoS value falls to that of the display processor. The following figure shows the effective QoS assignments and ranges.

Figure 2-2 QoS value and display buffer underrun


If the two CPU clusters request data at a peak rate, the display can be
starved. However, the maximum excess data the clusters can request at a priority
higher than the display is 2 x 8KB, that is, 16KB. This excess is sufficiently lower
than the 32KB display buffer size, so the buffer is unlikely to underrun, provided
the available memory bandwidth does not drop below 4.8 + 4.8 + 2.8, that is,
12.4GB/s. Some contingency is built into this example, to allow for inaccuracies in
assumptions.
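
The arithmetic behind this example can be restated as a short Python sketch. The variable names are hypothetical, and the values are those used in this example.

```python
KB = 1024

excess_per_qv = 4 * KB               # regulator setting: 4KB per QoS value
cpu_max_qos, display_qos = 14, 12    # from Table 2-10

# Each cluster stays above the display priority for two QoS levels.
levels_above_display = cpu_max_qos - display_qos
excess_per_cluster = levels_above_display * excess_per_qv  # 8KB
total_excess = 2 * excess_per_cluster                      # two clusters: 16KB

display_buffer = 32 * KB
buffer_margin = display_buffer - total_excess              # remaining 16KB

# Memory bandwidth needed for both CPU allocations plus the display, in GB/s.
required_bw = round(4.8 + 4.8 + 2.8, 1)
```

Here `total_excess` is 16KB, which is half of the 32KB display buffer, and `required_bw` is 12.4GB/s, which is below the 16GB/s that the memory system provides.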

QoS value regulation is only applied to requests with a QoS value of 0. You can tie off the ARQOS and AWQOS inputs LOW if you want the CCI-550 to always drive the QoS values. Alternatively, you can drive both these inputs to zero for traffic you want to regulate, and to a non-zero value for other traffic.

Regulation based on outstanding transactions

Each slave interface has a programmable mechanism for limiting the number of outstanding read and write transactions.

An Outstanding Transaction (OT) is a read request
that has not yet received its last beat of read data, or a write request that has not yet
received a response. You can use the OT regulation mechanism with QoS value mechanisms or when
the system is not sensitive to the QoS value.

There is a combined OT count for read and write transactions, and this count
includes all possible request types. Two-part DVM messages count as two outstanding
transactions, and transactions that the CCI-550 splits into 64-byte granules count as multiple transactions.

The hardware implementation sets the maximum number of OTs for a slave interface. This maximum is the reset value of the Maximum OT register. The minimum value for the register is SIx_W_MIN + 2, which is the number of tracker slots that are reserved for requests from each slave interface to prevent deadlock. If you write a value outside these limits, the value is limited to the nearest bound, and the limited value is set and read back.
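
The limiting behavior can be modeled as a clamp, as in the following Python sketch. The function and parameter names are hypothetical: `si_w_min` stands for the SIx_W_MIN configuration value and `hw_max` for the implementation-defined maximum. The numbers in the usage note are illustrative only.

```python
def clamp_max_ot(written, si_w_min, hw_max):
    """Return the value that is set and read back after a register write."""
    lower = si_w_min + 2  # tracker slots reserved to prevent deadlock
    # Values below the minimum or above the maximum are limited to the bound.
    return max(lower, min(written, hw_max))
```

For example, assuming `si_w_min` is 4 and `hw_max` is 64, writing 1 reads back as 6, writing 100 reads back as 64, and writing 32 reads back unchanged.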

The OT limit sets a maximum bandwidth for the attached master, based on the
average response latency from downstream. You can use the following approximation to allocate
memory bandwidth resource among various masters in the system:

OT limit = maximum bandwidth * average latency / bytes per request

For example, if the average latency between arrival at the main CCI-550 tracking structures and downstream
response is 128ns, the maximum required bandwidth is 8GB/s, and requests are 64 bytes in
length, then the necessary OT limit for an ACE-Lite master assuming a negligible hit rate
is:

max OT = 8 * 128 / 64 = 16
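
The approximation can be wrapped in a small Python helper to explore other bandwidth and latency combinations. The function name and the choice to round up to a whole number of transactions are assumptions made for this sketch.

```python
import math


def ot_limit(max_bandwidth_gb_s, avg_latency_ns, bytes_per_request=64):
    """Approximate the OT limit needed to sustain a given bandwidth."""
    # GB/s multiplied by ns gives bytes, so no unit scaling is needed.
    bytes_in_flight = max_bandwidth_gb_s * avg_latency_ns
    # Assumption: round up so that the limit does not cap the bandwidth
    # below the requested figure.
    return math.ceil(bytes_in_flight / bytes_per_request)


limit = ot_limit(8, 128)  # the example above: 8GB/s, 128ns, 64-byte requests
```

With the values from the example above, `limit` is 16.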

Note

For ACE masters, the time from the response to the RACK or WACK acknowledgement must be included in the response
latency.