Time to Digital Converter Used in All Digital Pll

Master Thesis ICT Time to Digital Converter used in ALL digital PLL Master of Science Thesis In System-on-Chip Design By Chen Yao Stockholm, 08, 2011 Supervisor: Dr. Fredrik Jonsson and Dr. Jian Chen Examiner: Prof. Li-Rong Zheng Master Thesis TRITA-ICT-EX-2011:212 1 ACKNOWLEDGEMENTS I would like to thank: Professor Li-Rong Zheng for giving me the opportunity to do my master thesis project in IPACK group at KTH. Dr. Fredrik Jonsson for providing me with the interesting topic and guiding me for the overall research and plan. Dr.

Jian Chen for answering all my questions and making the completion of the project possible. Geng Yang, Liang Rong, Jue Shen, Xiao-Hong Sun in IPACK group for the discussion and valuable suggestions during the thesis work. My mother Xiu-Yun Zheng and my husband Ming-Li Cui for always supporting and encouraging me. i ABSTRACT This thesis proposes and demonstrates Time to Digital Converters (TDC) with high resolution realized in 65-nm digital CMOS. It is used as a phase detector in all digital PLL working with 5GHz DCO and 20MHz reference input for radio transmitters.

Introduction All digital phase locked loop (ADPLL) is employed as frequency synthesizer in radio frequency circuits to create a stable yet tunable local oscillator for transmitters and receivers due to its low power consumption and high integration level. It accepts some frequency reference (FREF) input signal of a very stable frequency of and then generates frequency output as commanded by frequency command word (FCW). The desired frequency of output signal is an FCW multiple of the reference frequency. For an ideal oscillator operating at all power is concentrated around , but the spectrum spreads into nearby frequencies in practical situation.

This spreading is referred as phase noise which can cause interference in adjacent bands in transmitters and reduce selectivity in receivers [1]. Fig. 1. Effect of LO phase noise in transmitter [1] For example, shown as Fig. 1, when a noiseless receiver must detect a weak desired signal at frequency in the presence of a powerful nearby transmitter generating at frequency with substantial phase noise, the desired signal will be corrupted by phase noise tails of transmitter. Thus the modern radio communication systems require strict specifications about phase noise of synthesizers. In the ADPLL, the time to digital converter (TDC) serves as the phase frequency detector (PFD) meanwhile the digitally controlled Oscillator (DCO) replaces the VCO.

The core module is DCO which deliberately avoids analog tuning voltage controls. The DCO is similar to a flip flop whose internal is analog but the analog nature does not propagate beyond the boundaries. Compared to the analog PLL, the loop filter can be implemented in a fully digital manner which will save a large amount of area and maintain low power consumption. 1 Fig. 2. Block diagram of the phase-domain ADPLL frequency synthesizer [2] Fig. 2 shows a type II ADPLL which includes two poles at zero frequency. It has better filtering capabilities of oscillator noise compared to type I ADPLL, leading to improvements in the overall phase noise performance. The ariable phase signal is determined by counting the number of rising clock transitions of the DCO oscillator clock. The reference phase signal is obtained by accumulating the Frequency Command Word (FCW) with every rising edge of the retimed Frequency Reference (FREF) clock. The sampled variable phase is subtracted from the reference phase in a synchronous arithmetic phase detector which is defined by = + ? [k] [2]. Fig. 3. Retiming of the reference clock signal (FREF) [3] 2 There are two asynchronous clock domains, FREF and CKV, and it is difficult to compare the two digital phase values physically at different time instances without facing the metastability problem.

During frequency acquisition, their edge relationship is not known, and during phase lock, the edges will exhibit rotation if the fractional FCW is nonzero [1]. Therefore, it is imperative that the digital-word phase comparison should be performed in the same clock domain. This is achieved by retiming process which is performed by oversampling the FREF clock with CKV for synchronization purpose (fig. 3). The retimed clock, CKR is used to synchronize the internal ADPLL operations. However, the retiming process generates a fractional phase error in CKV cycles which is estimated by TDC [3]. The DCO produces phase noise at high frequency, while the TDC determines the in band noise floor [4].

The noise contribution of TDC within the loop bandwidth at output of ADPLL is where denotes the delay time of a delay cell in the TDC chain, is the period of RF output and is the frequency of the reference clock [1]. The equation above indicates that a smaller leads to smaller quantization noise from TDC. As a result, the effort is devoted to achieve high resolution TDC in order to obtain low phase noise of ADPLL. Fig. 4. Operating principle of time-to-digital converter [5] Fig. 4 illustrates the principle of time-to-digital converter based on digital delay line. The start signal is delayed by delay elements and sampled by the arrival of the rising edge of stop signal.

The sampling process which can be implemented by flip-flops freezes the state of delay line as the stop signal occurs. The outputs of flip-flop will be high value if the start signal passes the delay stages and the sampling process will generate low value if the delay stages have not been passed by start signal. As a result, the position of high to low transition in this thermometer code indicates how far the start signal can be propagated in the interval spanned by start and stop signal. 3 2. State of art 2. 1 Buffer delay line TDC Fig. 5. Buffer delay line TDC [5] The start signal ripples along the buffer chain and flip-flops are connected to the outputs of buffers. On the arrival of stop signal the state of delay line is sampled by flip-flops.

One of the obvious advantages of this TDC is that it can be implemented fully digital. Thus it is simple and compact. However, the resolution is relatively low since it is the delay of one buffer. 2. 2 Inverter delay line TDC Fig. 6. Inverter delay line TDC [5] The resolution in this TDC is the delay of one inverter which is doubled compared to buffers delay chain. In this case, the length of measurement intervals is not indicated by the position of high to low transition but by a phase change of the alternation of high to low sequence. Consequently, the rise and fall delay of inverter should be made equal which requires highly 4 match of the process.

In addition, the resolution is still limited by technology and therefore not high enough in our application of ADPLL. 2. 3 Vernier TDC Fig. 7. Vernier delay line TDC [6] Vernier delay line TDC is capable of measuring time interval with sub-gate resolution. It consists of two delay lines which delay both start signal and stop signal. The delay in the first line is slightly larger than the delay in the second line. During the measurement, the start signal propagates along the first line and the stop signal occurs later. It seems like the stop signal is chasing start signal. In each stage, it catches up by = Delay1- Delay2 Therefore the resolution is dependent on the difference of two delay stages instead of one delay element.

Although the Vernier delay line TDC improves the resolution effectively, the area and power consumption is increased dramatically as the dynamic range becomes larger due to that each stage costs two buffers and one flip-flop. Besides, the conversion time will be increased and in a result it might be not feasible to work in a system. 5 2. 4 Gated ring oscillator (GRO) TDC Fig. 8. Gated ring oscillator TDC [6] The GRO TDC could achieve large dynamic range with small number of delay elements. It measures the number of delay element transitions during measurement interval. By preserving the oscillator state at the end of the measurement interval [k? ], the quantization error [k? 1], from that measurement is also preserved. In fact, when the following measurement of [k? 1] is initiated, the previous quantization error is carried over as [k] = [k? 1]. This results in first-order noise shaping of the quantization error in the frequency domain. Apart from the quantization noise, according to the well-known barrel shift algorithm for dynamic element matching, GRO TDC structure realizes first order shaping of mismatch error [6]. Thus, we can expect that this architecture ideally achieve high resolution without calibration even in the presence of large mismatch. 6 3 System level design 3. 1 Goal

The proposed TDC is designed to work with a 5GHz DCO and a 20MHz reference input while the circuit is fabricated in 65nm IBM CMOS technology; the supply voltage is 1. 2V and development environment is Cadence 6. 1. 3. Fig. 9. Test bench for measuring rising/falling time of input of TDC In order to find out the rising/falling time of the input signal for TDC, the 5GHz sine wave signal which is the same as the output of DCO in ADPLL is put through the inverter with the smallest size and the rising/falling time of the output of inverter is measured (Fig. 9) . 7 Fig. 10. Input and output of inverter Rising/falling time = 16. 58ps. This value is applied to model the practical case of input signals for TDC.

The purpose for putting the sinusoid signal generated from DCO passing through the smallest inverter is to model the worst case for TDC with weakest driving ability. As the system level simulation result of ADPLL presents, the dynamic range of TDC is 20ps. The converter resolution is required to be around 2ps meanwhile the power consumption should be kept as low as possible. Since in the application of this ADPLL, sub-gate resolution and small dynamic range are targeted, two kinds of topologies of TDC are proposed. One is Vernier delay line TDC and the other one is parallel TDC. The comparison of these two architectures is concluded and both of them are designed on schematic level. 8 3. 2 Vernier delay line TDC

Start Matched delay cell1 EN EN_ Delay1 Delay1 Delay1 Start_ Matched delay cell1 D Q D_ CLK Delay1 D Q0 D_ CLK Delay1 Delay1 D Q26 D_ CLK Stop Fig. 11. Diagram of Vernier delay line TDC 200ps Matched delay cell2 Delay2 Delay2 Delay2 start 20ps stop enable Valid output 2ns TDC_output Fig. 12. Timing of the interfaces of Vernier TDC As the description about Vernier TDC before, the start signal and stop signal are propagated by two delay line with small delay difference each stage respectively. The clock gating technology controlled by enable signal is used to realize low power dissipation. The timing relationship of interfaces is described in Fig. 2 which indicates that enable signal should be set to high value half 9 cycle of start signal ahead of the stop rising edge and the conversion time is about 2ns. The delay time of each stage in TDC is about 60ps to 70ps and 27 stages are design to cover the whole dynamic range so that the conservative estimation of conversion time of TDC would be no more than 2ns. The next stage of TDC in ADPLL should sample the output when it is stable. Since the period of FREF is 50ns which means that the instance of measurement occur every 50ns, it is reasonable to adopt the method of serial conversion and prepare the valid output data after 2ns delay. 3. 3

The delay cells connected to stop signal are sized for delays = 0+? ?N =? . The time difference between the delayed stop signal is quantized with a resolution The conversion results are available immediately after the rising of stop signal. 3. 4 Performance comparison Parallel TDC Parallel delay elements with gradually increasing propagation delays are simultaneously sampled on the arrival of stop signal. No loop structure feasible Sub-gate resolution Conversion time independent from resolution Susceptible to variations Not feasible to high dynamic range Careful layout design Vernier TDC Principle Start and stop signals propagate along two delay lines with slightly different delays.

Loop structure Pros Loop structure possible Sub-gate resolution Modular structure High dynamic range possible with loop structure Differential delay lines Conversion time depends on measurement interval and resolution Cons Table1. Performance comparison between Vernier TDC and parallel TDC 11 4 Schematic design and simulation 4. 1 Sense Amplifier Based Flip-Flop Flip-Flops are critical to the performance of Time to Digital Converter due to the tight timing constraints and low power requirements. Metastability is a physical phenomenon that limits the performance of comparators and digital sampling elements, such as latches and flip-flops. It recognizes that it akes a nonzero amount of time from the start of a sampling event to determine the input level or state [15]. This resolution time gets exponentially larger if the input state change gets close to the sampling event. In the limit, if the input changes at exactly the same time as the sampling event, it might theoretically take an infinite amount of time to resolve. During this time, the output can dwell in an illegal digital state somewhere between zero and one. However, this flip flop is supposed to be reused in ADPLL so that the metastable condition of the retimed reference clock CKR is not acceptable. One reason is that the metastability of any clock could introduce glitches and double clocking in the digital logic circuitry being driven.

The other reason is that it is quite likely that within a certain metastability window between FREF and CKV, the clock to Q delay of the flip flop would have the potential to make CKR span multiple DCO clock periods. This amount of uncertainty is not acceptable for proper system operation [4]. For the application of TDC, due to that the metastability sampling window should be no larger than the high resolution to avoid bubbles in TDC code [7], sensed amplifier based flip-flop (SAFF) is chosen. 12 VDD MP1 MP2 MP3 MP4 MN3 VDD MN4 D MN1 MN5 MN2 D_ CLK MN6 Pulse Generator Symmetrical SR latch S_ S R VDD R_ MP7 MP8 MP5 MP6 MP9 Q MP10 Q_ MN9 MN10 MN7 MN11 MN12 MN8 Fig. 15. Symmetric SAFF The SAFF shown as Fig. 5 consists of sense amplifier in the first stage and SR latch in the second stage. The amplifier senses complementary differential inputs and produces monotonous transitions from high to low logic level on one of the outputs following the leading clock edge. The SR latch captures each transition and holds the state until the next leading clock arrives [8]. When CLK is low, S_ and R_ are charged to high level through MP1 and MP4 meanwhile MN6 is closed. If D is high, S_ will be discharged through MN3, MN1 and MN6 which is opened by clock leading edges. Accordingly, R_ is hold to high level and Q is high in this case. The additional transistor MN5 is used to provide the discharging patch to ground. For example, when 13 ata is changed as CLK is high which means D is low and D_ is high at this time, S_ would be charging to high level if there is no MN5. However, S_ could be discharged through MN3, MN5, MN2 and MN6 since MN5 provides another path to ground. Although SR latch is able to lock the state of outputs of sense amplifier, MN5 prevents potential charging caused by leakage current even after the input data is changed and therefore guarantee the stable outputs of flip-flop. The SR latch, as the output stage, is kind of symmetric topology with equivalent pull-up and pulldown transistors network. Q+ = S + R_·Q Q_+ = R + S_·Q_ In the equations above, Q represents a current sate and Q+ represents a future state after the transition of clock.

Thus this circuit has equal delays of outputs and provides identical resolution of the rising and falling meta-stability of their input data. In addition, the data input capacitive loading is only one NMOS transistor and the interconnect capacitance parasitic is minimized. 4. 1. 1 Schematic design The basic principles of the SAFF design are that the size of the input transistors should be small enough to minimize the load effect of SAFF and large enough to ensure the speed of it. The PMOS and NMOS networks should be matched and the sizes of transistors are adjusted to obtain equal delay of differential outputs. Fig. 16. Schematic of SAFF 14 Fig. 17. Schematic of Sense Amplifier Fig. 18. Schematic of symmetric SR latch 15 4. 1. 2 Sampling window simulation Fig. 19.

Test bench of SAFF The ideal switch is used to initialize the output signal Q otherwise Q will be floating at the beginning of simulation which would result in unpredictable rising or falling edge at the beginning therefore make it difficult to measure a fixed number of signal transition edge. In the practical case, the initial value of inputs of flip flop is either zero or one. The simulation is performed by tuning the delay time of CLK in order to change the time interval between CLK and D/D_. There are several cases simulated to verify the timing constraints of SAFF including setup timing, hold timing and sample window. 1. Normal sampling 16 Fig. 20. Normal Sampling Case Data D changes from zero to one and then is sampled after it is stable for a while. The crossing point of Q and Q_ is around 600mV which means there are equal delay of clock to Q and clock to Q_ due to the symmetric topology of SAFF. 2.

Setup time simulation Setup time is the minimum time prior to triggering edge of the clock pulse up to which the data should be kept stable at flip flop input so that data could be properly sampled. This is due to the input capacitance present at the input. It takes some time to charge to the particular logic level at the input. During the simulation, the input data is changing from low to high and high value is supposed to be sampled. Sweep the position of CLK to find out when SAFF cannot capture the correct data. 17 Fig. 21. Extreme case of sampling for setup time simulation The clock to Q delay is increasing exponentially when input data is approaching the clock triggering edge.

When the data comes later than clock edge for 15ps, the clock to Q delay is up to about 280ps shown in Fig. 21. If the data comes even later than this, the output of flip flop will enter into metastable state or will never output high value. 3. Hold time simulation Hold time is the minimum time after the clock edge up to which the data should be kept stable in order to trigger the flip flop at right voltage level. This is the time taken for the various switching elements to transit from saturation to cut off and vice versa. During the simulation, the input data is changing from high to low and high value is supposed to be sampled. Sweep the position of CLK to find out when SAFF cannot capture the correct data. 18 Fig. 22.

Extreme case of sampling for hold time simulation The clock to Q delay is increasing exponentially if transition of input data from one to zero happens close to the clock edge. As long as the data could keep stable long enough the flip flop is capable of recognizing it during limit time interval. The hold timing constraint is that data should be stable after the clock rising at least 16ps (Fig. 22) to guarantee flip-flop could sample the right value otherwise the flip flop will enter into illegal state or never output high value. 4. Sampling window 19 2. 9 2. 8 2. 7 2. 6 x 10 -10 Tclk-Q 2. 5 setup time 2. 4 2. 3 2. 2 2. 1 2 -0. 5 hold time 0 0. 5 1 1. 5 2 Tdata-clk 2. 5 3 3. 5 x 10 4 -11 Fig. 23. Sampling window simulation

Sampling window is defined as the time interval in which the flip-flop samples the data value. During the interval any change of data is prohibited in order to ensure robust and reliable operation [8]. The flip-flop delay increases as the signal approaches the point of setup and hold time violation until the flip-flop fails to capture the correct data [9] which is displayed in Fig. 23. Metastability is modeled in critical flip-flops by continuous inspection of the timing relationship between the data input and clock pins and producing an unknown output on the data output pin if the delay to clock skew falls within the forbidden metastable window. Referring to Fig. 3, the metastable window is defined as an x-axis region such that the clock to Q delay on the y-axis is longer by a certain amount than the nominal clock to Q delay. For example, if the nominal clock to Q delay is 200ps when the data to clock timing is far from critical, the metastability window would be 15ps if one can tolerate clock to Q delay increase by 20ps. If one can tolerate a higher clock to Q delay increase of 30ps, the metastable window would drop to 6 ps. A question could be asked as to how far this window can extend. The limitation lies in the fact that for a tight data to clock skew, the noise or other statistical uncertainty, such as jitter, could arbitrarily resolve the output such that the input data is missed.

Therefore, for a conventional definition of setup time, not only must the output be free of any metastable condition, but the input data have to be captured correctly. For this reason, the setup and hold times are conservatively defined in standard-cell libraries for an output delay increase of 10 or 20% over nominal. The specific nature of TDC vector capturing does not require this restrictive constraint. Here, any output-level resolution is satisfactory for proper operation as long as it is not metastable at the time of capture, and consequently, 20 the metastable window could be made arbitrarily small [1]. This SAFF demonstrates very narrow sampling window less than 1ps according to the simulation results. 4. 2

Vernier delay line TDC There are several components in Vernier delay line TDC including inverter, SAFF, matched delay cell, delay cell 1 and delay cell 2 in which matched delay cell has the same circuit topology with other two delay cells except that it has enable control pins. 4. 2. 1 Delay cells There are several methods to implement delay elements. The most popular three methods for designing variable delay cells are shunt capacitor technique, current starved technique and variable transistor technique [10]. In this thesis project, current starved delay element is employed because of its simple structure and relatively wide delay range of regulation.

Vdd VBP M4 M2 M6 Vdd in C M1 M5 out VBN M3 Fig. 24. Current starved delay element As can be seen from the Fig. 24, there are two inverters between input and output of this circuit. The charging and discharging currents of the output capacitance of the first inverter, composed of M1and M2, are controlled by the transistors M3 and M4. Charging and discharging currents depend on the bias voltage of M3 and M4 respectively. In this delay element, both rising and 21 falling edges of input signal can be controlled. By increasing/decreasing the effective on resistance of controlling transistor M3 and M4, the circuit delay can be increased /decreased.

Fig. 25. Schematic of Matched delay cell As the enable signal is set to high level, the input signal will pass through this delay cell. The enable signal should be set to high level before the active edge of input signal comes. The differential start signal and stop signal passed through this delay cell to produce matched rising/falling edge signal for the next stage in TDC. With respect to design of the size of transistors, the input transistors of the delay cell should be relatively large to shield the load effect of SAFF meanwhile allow T5 to control the changing and discharging current through the capacitors of the first stage of inverter.

The second stage of inverter should have enough driving ability for 5GHz input signals and therefore the sizes are specified large enough to withdraw sufficient current from power supply for transition. Due to that the differential signals are delayed, the delay cell is also required to have matched PMOS and NMOS networks to achieve equal delay time for rising or falling input signals. 22 Fig. 26. Schematic of delay cell 1 Fig. 27. Schematic of delay cell 2 23 The only difference between these two delay cells above is the size of transistor T5. The W/L ratio of T5 in delay cell 2 is a bit larger than delay cell 2 makes the delay of delay cell 2 is slightly shorter than delay cell 1. These two delay cells constitute two delay lines for Vernier TDC. Fig. 28.

Schematic of Vernier delay line TDC This Vernier TDC includes 27 stages of delay cells for the reason that it should cover the dynamic range of 20ps and the additional offset value introduced by the setup timing of SAFF. The first dumpy stage of delay cell is used to match the differential input signals for the following delay lines so that the input signals for each stage are characterized with the same rising or falling time. As a result, the delay difference between each delay pair for start and stop signal is only dependent on the different size of transistors in the current starved delay cell. 24 4. 2. 2 Simulation results The input of Vernier TDC, the delay difference between the start and stop signal, is swept from 0 to 20ps.

The Differential Non Linearity (DNL) is the deviation in the difference between two successive threshold points from 1LSB. Integral Non Linearity (INL) is the deviation of the actual output. Both of them are calculated and reported in Fig. 32. The maximum DNL is +0. 4LSB while the maximum INL is -0. 89LSB. The process (skew) parameter files in the model directory contain the definition of the statistical distributions that represent the main process variations for the technology. This gives designers the capability of testing their designs under many different process variations to ensure that their circuits perform as desired throughout the entire range of process specifications. This is a Monte Carlo approach to the checking of designs.

While being the most accurate test, it can also be time consuming to run enough simulations to obtain a valid statistical sample. Fig. 33. Monte Carlo simulation of the resolution for Vernier delay line TDC When running Monte Carlo to include FET mismatch, BOTH the Spectre mismatch and process vary statements are active. This will turn on both process and mismatch variations. Spectre provides the unique capability of running process variations independent of mismatch variations. This capability is not supported for this release. The average resolution calculated by averaging the delay difference between two delay lines is around 1. 66ps. The average power over one period is 148. 1E-6 W.

The maximum power consumption is about 3. 6mW and the conversion time is around 2ns which is in accordance with the interfacing time estimation in system level design. Since the enable signal closed the TDC after the conversion is completed, the start signal with high frequency is prohibited to propagate so as to eliminate the unnecessary transition of delay cells and in a result saving the power dissipation. 27 4. 3 4. 3. 1 Parallel TDC Delay cells In order to design a serial of delay cells with the equal difference of delay time used in parallel TDC, the size of the transistor in a current starved structure is swept. Fig. 34. Delay cell in Parallel TDC 28

Fig. 35. Delay time Vs width of transistor T5 Unlike Vernier TDC, only stop signal is delayed by various delay cells in parallel TDC. Thus the control of rising edge required, and then the size of transistor T5 is adjusted. As can be seen from Fig. 34, the size of transistors M1, M2, M4 and M5 is basically determined by the load capacitance which refers to the CLK pin of SAFF in this situation. Transistor T5 should be much smaller than M2 so that the discharging current could be controlled by T5. As the size of T5 increases, the delay time becomes smaller which means the delay cell is faster. According to the parameter analysis result in Fig. 5, the size of T5 can be determined by selecting the size corresponding to the delay time with 2ps difference for a serial delay cells. Fig. 36. Schematic of Parallel TDC 29 As the analysis in system level design, the delay cells are sized for delays = 0 + ? ?N. The single stop signal is delayed in parallel TDC, therefore the matched delay cell connected to differential start signal is used to cancel the 0 and offset value. 4. 3. 2 Simulation results Similarly to Vernier TDC simulation, the input of parallel TDC, the delay difference between the start and stop signal, is swept from 0 to 20ps. The resolution and linearity are calculated and analyzed by conversion results from TDC. Fig. 37.

Input of parallel TDC (stop – start) = 0ps 30 Fig. 38. Input of parallel TDC (stop – start) = 20ps The offset value of this TDC is 1 observed from Fig. 37. The result shown in Fig. 38 indicates that the start signal has passed through 11 stages of delay cells as the input is 20ps. Resolution = (20ps – 0ps)/ (11 – 1) = 2ps. 20 18 16 Output of parallel TDC (ps) 14 12 10 8 6 4 2 0 0 2 4 6 8 10 12 14 Input of parallel TDC (ps) 16 18 20 Fig. 39. Parallel TDC transfer function 31 1 INL DNL DNL and INL [LSB] 0. 5 0 -0. 5 0 2 4 6 8 10 12 Input of parallel TDC 14 16 18 20 Fig. 40. Parallel TDC linearity DNL and INL are calculated and reported in Fig. 40. The maximum DNL is +0. LSB while the maximum INL is 1LSB. The average power over one period is 87. 33E-6 W which is much smaller than Vernier TDC. The reason is that the clock gating technology controlled by enable signal eliminates the redundant transition of delay cells. As the system level design indicates, the parallel TDC only works for 420ps each period of stop signal because that the conversion is completed instantly due to the intrinsic characteristic of parallel TDC and therefore there is no power consumption during the rest time. Although the peak power consumption is approximately equivalent to Vernier TDC, the average power dissipation is decreased dramatically. 32 Layout and post-simulation 5. 1 Layout of SAFF and post-simulation For the layout of radio frequency circuit the interconnection parasitic will be a critical problem. In an audio application for instance parasitic will probably be a minor concern. However, the operation frequency of this circuit is 5GHz which means that the interconnection parasitic will influence the performance of circuit dramatically. To minimize this influence, we could move interconnections to higher metals and make the metals carry current rather than poly. Besides, the floor plan should be as compact as possible to optimize the parasitic and impedance of interconnections. GND T0

Symmetric SR Latch T15 T14 T13 T8 T9 T5 T3 Q_ T1 T12 T10 T11 T7 Q T6 T4 T2 VDD T0 T2 T4 T3 T5 T9 T1 D T6 T7 D_ CLK T8 CLK GND Sensed Amplifier Fig. 41. Floor Plan of SAFF 33 There are several steps for floor plan. First step is to examine the size of transistors and split transistor size in a number of layout oriented fingers. Then identify the transistors than can be placed on the same stack according to the principles of using almost the same number of fingers per stack and put the transistors with common drain or source together. In the floor plan shown in Fig. 41, power line VDD is reused by SR latch and sensed amplifier to make the connections compact.

Fig. 42. Layout of SAFF 34 In the development environment of Cadence 6. 1. 3, Calibre is used for DRC and Assura is used to do LVS check and RCX. Post-simulation is then performed with av_extracted view. Fig. 43. Post-simulation of sampling window Compared to Fig. 23, Fig. 43 illustrates that the timing constraint point moved from 16ps to 29ps which will affect the offset value of TDC. In addition, the delay time from clock leading edge to output Q is increased. However, this SAFF after layout can be employed to avoid meta-stability effectively due to that the sampling window is still less than 1ps. 5. 2 Layout of parallel TDC and post-simulation

In this TDC system, the clock distribution network formed as a tree distributes the signal to all the delay cells. To reduce the clock uncertainty, the network requires highly matched topology showed as Fig. 44 below. 35 Clock Fig. 44. Floor plan of Clock distribution This kind of topology guarantees the equal delay from the common point clock to each element. Fig. 45. Layout of parallel TDC After DRC and LVS, the RC net list is extracted to do post-simulation. The input of parallel TDC after layout, the delay difference between the start and stop signal, is swept from 0 to 30ps. The resolution and linearity are calculated and analyzed by conversion results from TDC. Fig. 46.

Parallel TDC linearity after layout DNL and INL are calculated and reported in Fig. 49. The maximum DNL is 0. 33LSB while the maximum INL is 0. 5LSB. The average power over one period is 442. 1E-6 W. The maxim total current is about 3. 24mA. The peak power consumption is almost the same as the TDC before layout, but there are obvious ripples even the TDC is disabled due to that the parasitic capacitors increase the time for charging and discharging. 5. 3 Comparison and analysis Technique Parallel 2-level DL parallel Pseudo-diff DL VernierGRO CMOS [µm] 0. 065 0. 35 0. 13 0. 09 0. 09 Supply [V] 1. 2 3 1. 2 1. 3 1. 2 Power [mW] 3. 89 50 2. 5 6. 9 4. 32 Resolution [ps] 3 24 12 17 6. 4 INL/DNL 0. 5/0. 3 -1. 5/0. 55 -1. 15/1 0. 7/0. 7 – Work This [12] [3] [7] [13] Table2. Comparison to previous work Table2 compares the proposed TDC to prior published work in CMOS technology. This TDC features the fastest resolution with the best linearity. The power consumption is not directly comparable because the results from the other works are corresponding to different input range. However, it still indicates that this TDC consumes very low power due to that the start signal 38 only passes two buffers and the stop signal with low frequency is delayed. The TDC error has several components: quantization, linearity and randomness due to thermal effects.

As can be seen from table5, the implemented TDC achieves medium linearity which can be improved if the layout is enhanced from floor plan considering the parasitic effects. With respect to quantization noise, the total noise power generated from this kind of TDC is spread uniformly over the span from dc to the Nyquist frequency without modulation. As a result, the proposed TDC contributes the lowest noise floor due to high resolution. = =3ps, , = 20MHz, we obtain = -104. 3 dBc/Hz. Banerjee’s figure of merit (BFM) [14], being a 1-Hz normalized phase noise floor, is defined as BFM = where is a sampling frequency of the phase comparison and N= is the frequency division ratio of a PLL.

It is used to compare the phase performance of PLLs with different reference frequencies and division ratios. In this TDC based ADPLL, BFM = -225. 3dB. Even though state-of-the-art conventional PLLs implemented in a SiGe process can outperform the ADPLL presented here in the in band phase noise, -213 dB in reference and -218 dB in reference, the worst case BFM of -205 dB appears adequate even for GSM applications, since there are no other significant phase-noise contributions as in the conventional PLLs [4]. However, the Gated Ring Oscillator TDC is able to push most of the noise to high frequency region which is then filtered by the loop filter in ADPLL through holding oscillation node state between measurements.

The obvious drawback of this TDC is that the dynamic range is relatively small which will limit the application of it. Parallel TDC is not feasible to compose the loop structure so that the area and power dissipation will be increased dramatically if larger dynamic range is required. But the Vernier TDC designed in this thesis can be used in the loop structure for large dynamic range. 39 6 Conclusion In this thesis, two kinds of Time to Digital Converters are designed with Vernier and parallel structure on schematic level respectively. The performance of these two TDCs are concluded and compared. In the Vernier TDC, only two delay cells are designed and then reused to constitute two delay lines with slightly different delay time.

This architecture is easy to implement and reduces the mismatch with delay cells. But the conversion time dependent on resolution and measurement interval time is relatively long since the signals are propagating along the delay cells in serial. On the other hand, in parallel TDC, the process of conversion is completed instantaneously due to that the signals are passing through the delay cells and then captured in parallel. Thus it has lower average power dissipation over one period. However, a set of delay cells are designed which obviously introduce nonlinearities. To minimize the mismatch problem, the single stop signal is delayed instead of two input signals for avoiding the differential mismatch situation.

To sum up, both of the TDCs achieve sub-gate resolution which is able to meet the application requirements and Vernier TDC has higher resolution and better linearity but longer conversion time and larger power consumption compared to parallel TDC according to the simulation results. The parallel TDC is chosen to be implemented on layout. Comparing the results from schematic simulation and post-simulation, the performance is decreased on resolution, linearity and power consumption after layout. The major reason for this phenomenon is the parasitic capacitance of transistors and real wires which is a significant factor to affect the final properties in high frequency circuits.

In the stage of schematic design, the sizes of transistors are not fully considered and results in difficulties on floor plan of layout. Specifically, the transistors are rather difficult to split into the same fingers per stack and therefore the floor plan is not compact enough to minimize the interconnections. Besides, the parasitic capacitance should have been emulated on schematic simulation in order to predict the effect after layout otherwise it would be very time consuming if the schematic design is modified after layout. In addition, the size of transistors is very small which makes them comparable to wire parasitic effects. Although small transistors are with smaller parasitic capacitance and less power consumption, they will more sensitive to layout mismatch.

The function of the TDCs designed and implemented in the thesis is guaranteed for the application but the performance needs to be improved. The layout turns out to be an essential stage for the final characteristics of the circuits. With a more thoughtful design flow and sophisticated consideration for mismatch, the circuits after layout could maintain the performance as schematic level. 40 7 Future work There is plenty of more work to be done to improve the performance of TDC. Due to that the TDC is essential to the aggressive goal of phase noise from all digital PLL, other kinds of architectures of it are worth to try for the required resolution and dynamic range. Since the performance of circuit after layout is not identical with schematic, the size of transistors could be modified for layout oriented. To reduce the parasitic effects, layout should be improved from a better floor plan. Vernier TDC with higher resolution and better linearity could be implemented on layout which can tolerate first order PVT variation if two delay chains are well matched [11]. Although the Vernier TDC and parallel TDC achieve high resolution, they have very low efficiency when measuring large time intervals, which requires extra hardware and power consumption. To overcome this limitation, a Vernier Ring TDC has been proposed recently.

Unlike the conventional Vernier TDC, this novel TDC places the Vernier delay cells in a ring format such that the delay chains can be reused for measuring large time intervals. Digital logic monitors the number of laps the signals propagate along the rings. Arbiters are used to record the location where the lag signal catches up with the lead signal. The reuse of Vernier delay cells in a ring configuration achieves fine resolution and large detectable range simultaneously with small area and low power consumption [11]. This architecture of Vernier Ring TDC combines the Vernier delay lines and GRO topology is worth to implement for wide application. ? ? 41 8 [1] [2] [3] [4] [5] [6] [7]