As mentioned before, buffers can be very power-hungry too. To get
the most benefit out of signal gating, a signal with SSA should be
gated before it propagates into buffers. When buffers are inserted
along a wire to reduce interconnect delay, gating at the most
upstream location reduces SSA in both the wire and the buffers.
When the wire is driven by an output buffer, gating should be
placed before the buffer. For a fully dedicated output network,
instead of using one large output buffer to drive the whole output
network, it is better to use a smaller buffer for each dedicated
interconnect and its receiving DPU while gating the signal with
SSA before each buffer. We call such output buffers split output
buffers. Fig. 11 shows the two output buffer options for a
dedicated output network in which a DPU output sends data to two
receivers (Load1 and Load2). Fig. 11(b) also shows the gating
locations and control signals (en1 and en2). For the shared output
network implemented in the trunk-branches style, there are shared
and dedicated parts in the output network. To maximize the benefit
of SSA gating, instead of using a single large output buffer, we
need to use split buffers, i.e., a buffer for the shared part
and a dedicated buffer for each dedicated part. Then we can gate
SSA before each dedicated buffer. Fig. 12
shows the two output buffer options for a trunk-branches output
network. Only two receivers (Load1 and Load2) are shown.
Fig. 12(b) also shows the gating locations
and control signals (en1 and en2). In both the dedicated and
shared output network cases, split buffers consume no more power
or area than the single large buffer, and they also
facilitate SSA gating.
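
The effect of gating before split buffers can be illustrated with a toy switched-capacitance model. The sketch below is not the evaluation flow used in this work; it assumes that dynamic power is proportional to the number of transitions times the capacitance being switched, that the split buffers together have the same capacitance as the single large buffer, and that each gating cell adds a small input capacitance that sees every transition. All function names, parameters, and capacitance values are hypothetical.

```python
"""Toy model: switched capacitance of single vs. split output buffers.

Assumptions (not from the paper): power ~ transitions * capacitance,
and each gating cell's input sees every transition at the DPU output.
"""


def single_buffer_switched_cap(n_transitions, buffer_cap, branch_caps):
    """One large buffer drives every branch; nothing is gated,
    so every transition (useful or spurious) charges everything."""
    return n_transitions * (buffer_cap + sum(branch_caps))


def split_buffer_switched_cap(n_transitions, buffer_caps, branch_caps,
                              useful_fractions, gate_cap, shared_cap=0.0):
    """One gated buffer per branch.

    useful_fractions[i] is the fraction of transitions that receiver i
    actually needs (its enable is asserted); the rest are gated off.
    shared_cap models the ungated shared buffer and trunk of a
    trunk-branches network (0.0 for a fully dedicated network).
    """
    # The shared part, if any, still sees every transition.
    total = n_transitions * shared_cap
    for buf, wire, useful in zip(buffer_caps, branch_caps, useful_fractions):
        # Gating cell input is upstream of the gate: always switches.
        total += n_transitions * gate_cap
        # Buffer, wire, and load only switch on transitions let through.
        total += useful * n_transitions * (buf + wire)
    return total


if __name__ == "__main__":
    n = 1000                      # transitions at the DPU output
    branches = [10.0, 6.0]        # wire + load capacitance per receiver
    single = single_buffer_switched_cap(n, 4.0, branches)
    split = split_buffer_switched_cap(
        n, [2.5, 1.5], branches, useful_fractions=[0.5, 0.25], gate_cap=0.5
    )
    print("single large buffer:", single)
    print("split gated buffers:", split)
```

Under these assumptions, the saving grows with the fraction of transitions that a receiver does not need. When every transition is useful to every receiver, the split version costs slightly more because of the gating cells, which is why gating is only worthwhile for signals that actually carry SSA.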