Use DCM and MMCM for Xilinx FPGA Clock Deskew

Clock generation and distribution are a major concern in complex FPGA designs. A design that ignores clock phase alignment may fail to close timing, may barely meet timing at the cost of unnecessary power and gate count, or may need extra logic to accommodate asynchronous clocks. Here I use a Xilinx FPGA as an example to explain my understanding of how to use a DCM to achieve clock de-skew. The DCM has been replaced by the MMCM in the latest Xilinx FPGAs, but the idea is the same, so allow me to use the DCM for convenience.

Fig1 shows a typical usage of a DCM with internal feedback. Why use a DCM, and what problem does it solve? First, the internal logic needs its clock (clk C) to be balanced at all FF clk inputs. Normally we use the global clock distribution network to achieve this (for simplicity, let me ignore BUFR and BUFH), so a BUFG is instantiated to drive the global clk network. If the FPGA clock domain doesn’t need to be synchronized with the FPGA input clock (clk A), you don’t need a DCM: clk A just drives the BUFG and the FPGA internal logic still works. But if the FPGA logic needs to exchange data with external logic in the clk A domain, a DCM is needed, because the BUFG delay is normally large and so clk A and clk C have a large skew. With the Fig1 scheme, the DCM adjusts its internal delay until its two clock inputs, clk B and clk C, are balanced (toggling at the same time). Assuming the IBUFG delay is small, clk A and clk C are then balanced as well.
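To make the Fig1 argument concrete, here is a minimal timing sketch in Python. It is only a toy model of my reading of the figure: the delay values, the 10 ns period, and the helper name dcm_inserted_delay are all assumptions for illustration, not numbers from any datasheet. The point is that the DCM adds whatever delay makes the feedback path a whole number of clock periods, which cancels the BUFG delay but not the IBUFG delay.

```python
# Toy model of Fig1 (DCM with internal feedback).
# All delay values below are hypothetical, in picoseconds.
PERIOD_PS = 10_000   # assumed 100 MHz input clock
IBUFG_PS = 800       # assumed input buffer delay (clk A -> clk B)
BUFG_PS = 3_500      # assumed BUFG + global clock tree insertion delay


def dcm_inserted_delay(feedback_path_ps, period_ps=PERIOD_PS):
    """Delay the DCM must add so that (DCM delay + feedback path delay)
    is a whole number of periods, i.e. CLKFB edges line up with CLKIN."""
    return (-feedback_path_ps) % period_ps


# Without a DCM: clk A -> IBUFG -> BUFG -> clk C
skew_a_to_c_no_dcm = IBUFG_PS + BUFG_PS

# With a DCM: clk A -> IBUFG -> clk B -> DCM -> BUFG -> clk C,
# and clk C is fed back to the DCM feedback input.
clk_b_arrival = IBUFG_PS
clk_c_arrival = clk_b_arrival + dcm_inserted_delay(BUFG_PS) + BUFG_PS

# Edges one full period apart are indistinguishable, so wrap by the period.
skew_b_to_c = (clk_c_arrival - clk_b_arrival) % PERIOD_PS
skew_a_to_c = clk_c_arrival % PERIOD_PS

print("no DCM, skew A->C:", skew_a_to_c_no_dcm, "ps")
print("with DCM, skew B->C:", skew_b_to_c, "ps")
print("with DCM, skew A->C:", skew_a_to_c, "ps")
```

In this model clk B and clk C end up perfectly aligned, and the remaining skew between clk A and clk C is exactly the IBUFG delay, which is why that delay needs to be small.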

In Fig2, the top DCM balances clk A and clk B using internal feedback, and the bottom DCM balances clk A and clk C using external feedback. As a result, the component that generates and uses clk A, this FPGA itself (which uses clk B for its internal logic), and the next component, which uses clk C, are all in the same (balanced) clk domain. Note that the board routing delay from point C to point D on the next component and the delay from point C to point E on this FPGA’s input need to be the same and SMALL. Small is important: the DCM only guarantees that clk A and clk E are balanced, so if there is a big delay between clk C and clk E, clk A and clk C are not balanced.
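The external-feedback case can be sketched the same way. Again this is a toy model under my own assumptions: the feedback loop is taken to be DCM output -> OBUF -> point C -> board trace -> point E -> IBUFG -> DCM feedback input, both IBUFGs are assumed to have the same delay, and every number and function name is hypothetical.

```python
# Toy model of the bottom DCM in Fig2 (external feedback).
# All delay values are hypothetical, in picoseconds.
PERIOD_PS = 10_000   # assumed 100 MHz clock
IBUFG_PS = 800       # assumed delay of both input buffers
OBUF_PS = 1_200      # assumed output buffer delay


def signed_phase(t_ps, period_ps=PERIOD_PS):
    """Wrap an arrival time into [-period/2, period/2) relative to clk A."""
    return (t_ps + period_ps // 2) % period_ps - period_ps // 2


def external_feedback_skews(trace_cd_ps, trace_ce_ps):
    """Return the skew of points C, D and E relative to clk A."""
    clkin_arrival = IBUFG_PS                   # clk A seen at the DCM input
    loop_ps = OBUF_PS + trace_ce_ps + IBUFG_PS  # DCM output back to feedback input
    dcm_ps = (-loop_ps) % PERIOD_PS             # DCM aligns feedback with input
    at_c = clkin_arrival + dcm_ps + OBUF_PS
    arrivals = {"C": at_c, "D": at_c + trace_cd_ps, "E": at_c + trace_ce_ps}
    return {name: signed_phase(t) for name, t in arrivals.items()}


# Matched, short traces: D and E line up with clk A, C leads slightly.
print(external_feedback_skews(trace_cd_ps=300, trace_ce_ps=300))

# Long feedback trace, short trace to D: E is still aligned, D is not.
print(external_feedback_skews(trace_cd_ps=300, trace_ce_ps=2_000))
```

With matched traces the clock arriving at D tracks the clock at E, and point C is early by the trace delay. That offset is the reason the trace delays should be small as well as equal.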

hollis, I think that’s the case. The IBUF and OBUF insertion delays are small, so their variation is small too. The BUFG delay reflects the clock tree insertion delay, so if a clock drives a large number of FFs, the clock tree delay can be large as well.