Static timing analysis is the process of verifying that every signal path in a design meets the required clock-cycle timing, whether or not all of those signal paths can actually be exercised. Static timing analysis is not used to verify the functionality of the design, only that the design meets its timing goals. In theory, timing verification could also be accomplished by running exhaustive gate-level simulations with SDF back-annotation of actual timing values after a design is placed and routed. This is often referred to as dynamic timing verification.

Timing analysis using Synopsys tools on a completely synchronous design is relatively easy to perform using either DesignTime within the Synopsys Design Compiler or Design Analyzer environments, or by using PrimeTime.

Timing analysis on modules with two or more asynchronous clocks is error prone, more difficult and can be time consuming.

Static timing analysis on signals generated from one clock domain and latched into sequential elements within a second, asynchronous clock domain is inaccurate and for the most part worthless.

The timing information for a signal latched by a clock that is asynchronous to the latched signal is inaccurate because the phase relationship between the signal and the asynchronous clock is always changing; therefore, the static timing analysis tool would have to check an infinite number of phase relationships between the signal and asynchronous clock.

The fact is, one must assume that signals that pass from one clock domain to another at some point will violate either setup or hold times on the destination sequential element. There is no good reason to perform timing analysis on signals that are generated in one clock domain and registered in another asynchronous clock domain. It is a given that these signals DO violate setup and hold times on the destination register. This is why synchronizers (see section 3.0) are needed, to alleviate the problems that can occur when a signal is passed from one clock domain to another.

For RTL modules that have two or more asynchronous clocks as inputs, a designer will be required to indicate to the static timing analysis tool which signal paths should be ignored. This is accomplished by "setting false paths" on signals that cross from one clock domain to another. This can be a tedious and error prone job unless the guidelines in the next two sections are followed.

Clock Naming Conventions

Guideline: Use a clock naming convention to identify the clock source of every signal in a design.

Reason: A naming convention helps all team members to identify the clock domain for every signal in a design and also makes grouping of signals for timing analysis easier to do using regular expression "wild-carding" from within a synthesis script. A number of useful clock naming conventions have been used by various design teams. Examples included: uClk for the microprocessor clock, vClk for the video clock and dClk for the display clock. Each signal was synchronized to one of the clock domains in the design and each signal-name had to include a prefix character identifying the clock domain for that signal. Any signal that was clocked by the uClk would have a u-prefix in the signal name, such as uaddr, udata, uwrite, etc. Any signal that was clocked by the vClk would similarly have a v-prefix in the signal name, such as vdata, vhsync, vframe, etc. The same signal naming convention was used for all signals generated by any of the other clocks in the design.

Using this technique, any engineer on the ASIC design team could easily identify the clock-domain source of any signal in the design and either use the signals directly or pass the signals through a synchronizer so that they could be used within a new clock domain. The naming convention alone contributed significantly to the productivity of the design team. How do we know there was a productivity gain? One of the design engineers started his part of the ASIC design using his own naming convention, ignoring the convention in use by the other design team members. After much confusion about the signals entering and leaving his design partition, a team meeting was called and the non-compliant designer was "strongly encouraged" to rename the signals in his part of the design to conform to the team naming convention. After the signal names were changed, it became easier to interface to the partition in question. There were fewer questions and less confusion after the change.

Design Partitioning

Guideline: Only allow one clock per module.

Reason: Static timing analysis and the creation of synthesis scripts are more easily accomplished on single-clock modules or groups of single-clock modules.

Guideline: Create a synchronizer module for each set of signals that pass from just one clock domain into another clock domain.

Reason: It is a given that any signal passing from one clock domain to another clock domain is going to have setup and hold time problems. No worst-case (max time) timing analysis is required for synchronizer modules. Only best-case (min time) timing analysis is required between the first and second stage flip-flops to ensure that all hold times are met. Also, gate-level simulations can more easily be configured to ignore setup and hold time violations on the first stage of each synchronizer.

By partitioning a design to permit only one clock per module, static timing analysis becomes a significantly easier task. The next logical step was to partition the design so that every module input signal was already synchronized to the same clock domain before entering the module. Why is this significant? If all signals entering and leaving the module are synchronous to the clock used in the module, the design is now completely synchronous! Now static timing analysis can be run on the entire module without any "false paths", and Design Compiler can be used to "group" all of the same-clock synchronous modules to perform complete, sequential static timing analysis within each clock domain.

There is one exception to the above recommendation. Multi-clock designs require at least some RTL modules to pass signals from one clock domain to modules that are clocked within a different clock domain.

For example, a design team created separate synchronizer modules that permitted signals from one and only one clock domain to be passed into a module that synchronized the signals into a new clock domain. Using the naming convention described earlier, all processor-clock generated signals (u-signals) would be used as inputs to a module that might be clocked by the video clock. This module was called the "sync_u2v" module and the RTL code did nothing more than take each u-signal input and run it through a pair of flip-flops clocked by vClk. Aside from the vClk and reset inputs, every other input signal to the "sync_u2v" module had a "u" prefix and every output signal from that same module had a "v" prefix.
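A minimal sketch of what such a "sync_u2v" module might look like is shown here. The module name and vClk come from the description above; the WIDTH parameter, the active-low reset rst_n, and the port names u_in and v_out are assumptions made for illustration only.

```verilog
// Sketch of a two-flop synchronizer for u-domain signals entering the
// vClk domain. WIDTH, rst_n, u_in and v_out are illustrative names.
module sync_u2v #(
    parameter WIDTH = 1              // number of independent 1-bit signals
) (
    input  wire             vClk,    // destination-domain clock
    input  wire             rst_n,   // active-low asynchronous reset (assumed)
    input  wire [WIDTH-1:0] u_in,    // signals generated in the uClk domain
    output wire [WIDTH-1:0] v_out    // the same signals, synchronized to vClk
);
    reg [WIDTH-1:0] v_meta;          // first stage: may go metastable
    reg [WIDTH-1:0] v_sync;          // second stage: used by vClk logic

    always @(posedge vClk or negedge rst_n) begin
        if (!rst_n) begin
            v_meta <= {WIDTH{1'b0}};
            v_sync <= {WIDTH{1'b0}};
        end else begin
            v_meta <= u_in;          // only this stage sees asynchronous data
            v_sync <= v_meta;        // no logic between the two stages
        end
    end

    assign v_out = v_sync;
endmodule
```

Each bit is synchronized on its own, so the sketch suits independent control signals. It makes no attempt to keep the bits of a multi-bit value consistent with each other.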

No worst-case timing analysis is required on the "sync" modules because we know that every input signal to these modules will have timing problems; otherwise, we would not have to pass the signals through synchronizers. The only timing analysis that we need to perform within synchronizer modules is min-time (hold time) analysis between the first and second flip-flop stages for each signal. In general, if there are n asynchronous clock domains, the design will require n(n-1) synchronizer modules, two for each pair of clock signals (example: using the uClk and vClk signals: the two synchronizer modules required would be sync_u2v and sync_v2u). Only if there are no signals that pass between two specific clock domains will a pair of synchronizer modules not be required.

After modifying all of the RTL files to create either completely synchronous modules or synchronizer modules, the task of generating synthesis scripts became trivial. All of the script files which previously included "set_false_path" commands were either deleted or significantly simplified. All timing problems were easily identified and fixed (because they were all within single-clock domain groupings) and the final synthesis runs completed two weeks earlier than anticipated, putting the project back on schedule and completely justifying the decision to repartition the design.

Synthesis Scripts & Timing Analysis

Following the guidelines of the previous section helps to simplify the timing analysis and synthesis scripting tasks associated with a multi-clock design. Those guidelines are: permit only one clock per module, require that all signals entering non-synchronizer modules are in the same clock domain that is used to clock that module, and require that synchronizer modules only permit input signals from one other clock domain.

Grouping

Group together all non-synchronizer modules that are clocked within each clock domain. One group should be formed for each clock domain in the design. These groups will be timing verified as if each were a separate, completely synchronous design.

Identifying False Paths

In general, only the inputs to the synchronizer modules require "set_false_path" commands. If a clock-prefix naming scheme is used, then wild-cards can be used to easily identify all asynchronous inputs. For example, the sync_u2v module should have inputs that all start with the letter "u". The following dc_shell command should be sufficient to eliminate all asynchronous inputs from timing analysis:

set_false_path -from { u* }

Performing Min-Max Timing Analysis

Each grouped set of modules for each clock domain is now a completely synchronous sub-design and tools such as DesignTime or PrimeTime can be used to verify worst-case timing (including setup time checks) and best-case timing (including hold time checks). The synchronizer blocks are timing verified separately. Worst-case timing checks are not required because these modules are just composed of flip-flops to synchronize asynchronous input signals; therefore, there are no long path delays and the outputs are fully registered. After setting false paths on all of the asynchronous inputs, best-case (minimum) timing verification is conducted to ensure that hold times are met on all signals that are passed from the first to second stage synchronizing flip-flops.

Synchronizing Fast Signals Into Slow Clock Domains

A general problem associated with synchronizers is that a signal from a sending clock domain might change values twice before it can be sampled into a slower clock domain. This problem must be considered any time signals are sent from one clock domain to another. Synchronizing slower control signals into a faster clock domain is generally not a problem since the faster clock signal will sample the slower control signal one or more times. Recognizing that sampling slower signals into faster clock domains causes fewer potential problems than sampling faster signals into slower clock domains, a designer might want to take advantage of this fact and try to steer control signals towards faster clock domains. This has been explained in Part 1 of this article.

The purpose of synchronizing signals is to protect downstream logic from the metastable state of the first flip-flop in a new clock domain.

A simple synchronizer comprises two flip-flops in series without any combinational circuitry between them. This design ensures that the first flip-flop exits its metastable state and its output settles before the second flip-flop samples it. You also need to place the flip-flops close to each other to ensure the smallest possible clock skew between them.

Foundries help with signal synchronization by providing synchronizer cells. These cells usually comprise a flip-flop with a very high gain that uses more power and is larger than a standard flip-flop. Such a flip-flop has reduced setup- and hold-time requirements for the input signal and is resistant to oscillation when the input signal causes a metastable condition.

Another type of synchronizer cell contains two flip-flops, thus easing your job by placing the flip-flops close to each other and preventing you from placing any combinational logic between them. For synchronization to work properly, the signal crossing a clock domain should pass from a flip-flop in the original clock domain to the first flip-flop of the synchronizer without passing through any combinational logic between the two (see Fig below).

This requirement is important because the first stage of a synchronizer is sensitive to glitches that combinational logic produces. A long enough glitch that occurs at the correct time could meet the setup-and-hold requirements of the first flip-flop in the synchronizer, leading the synchronizer to pass a false-valid indication to the rest of the logic in the new clock domain.

A synchronized signal is valid in the new clock domain after two clock edges. The signal delay is between one and two clock periods in the new clock domain. A rule of thumb is that a synchronizer circuit causes two clock cycles of delay in the new clock domain, and a designer needs to consider how synchronization delay impacts timing of signals crossing clock domains.

Synchronizers fall into one of three basic categories: level, edge-detecting, and pulse.

Level Synchronizer: In a level synchronizer, the signal crossing a clock domain stays high and stays low for more than two clock cycles in the new clock domain. A requirement of this circuit is that the signal needs to change to its invalid state before it can become valid again. Each time the signal goes valid, the receiving logic considers it a single event, no matter how long the signal remains valid. This circuit is the heart of all other synchronizers.

Edge Synchronizer: The edge-detecting synchronizer circuit adds a flip-flop to the output of the level synchronizer (see Fig below). The output of the additional flip-flop is inverted and ANDed with the output of the level synchronizer. This circuit detects the rising edge of the input to the synchronizer and generates a clock-wide, active-high pulse. Switching the inverter on the AND gate inputs creates a synchronizer that detects the falling edge of the input signal. Changing the AND gate to a NAND gate results in a circuit that generates an active-low pulse.

The edge-detecting synchronizer works well at synchronizing a pulse going to a faster clock domain. This circuit produces a pulse that indicates the rising or falling edge of the input signal. One restriction of this synchronizer is that the width of the input pulse must be greater than the period of the synchronizer clock plus the required hold time of the first synchronizer flip-flop. The safest pulse width is twice the synchronizer clock period. This synchronizer does not work if the input is a single clock-wide pulse entering a slower clock domain; however, the pulse synchronizer solves this problem.
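The rising-edge version of this circuit can be sketched in a few lines of Verilog. The module and signal names below (sync_edge_detect, async_in, rise_pulse, rst_n) are invented for illustration, and the active-low asynchronous reset is an assumption rather than part of the description above.

```verilog
// Sketch of a rising-edge-detecting synchronizer: a two-flop level
// synchronizer followed by one extra flip-flop and an AND gate.
module sync_edge_detect (
    input  wire clk,         // clock of the new (receiving) domain
    input  wire rst_n,       // active-low asynchronous reset (assumed)
    input  wire async_in,    // registered signal from the other domain
    output wire rise_pulse   // one-clock-wide, active-high pulse
);
    reg meta;                // first synchronizer stage
    reg sync;                // second stage: level synchronizer output
    reg sync_d;              // the additional flip-flop

    always @(posedge clk or negedge rst_n) begin
        if (!rst_n) begin
            meta   <= 1'b0;
            sync   <= 1'b0;
            sync_d <= 1'b0;
        end else begin
            meta   <= async_in;
            sync   <= meta;
            sync_d <= sync;
        end
    end

    // Level synchronizer output ANDed with the inverted delayed copy.
    assign rise_pulse = sync & ~sync_d;
endmodule
```

Moving the inversion to the other AND input (~sync & sync_d) would give the falling-edge variant mentioned in the text.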

Pulse Synchronizer: The input signal of a pulse synchronizer is a single clock-wide pulse that triggers a toggle circuit in the originating clock domain (see Fig below). The output of the toggle circuit switches from high to low and vice versa each time it receives a pulse and passes through the level synchronizer to arrive at one input of the XOR gate, while a one-clock-cycle-delayed version goes to the other input of the XOR. For one clock cycle, each time the toggle circuit changes state, the output of this synchronizer generates a single clock-wide pulse.

The basic function of a pulse synchronizer is to take a single clock-wide pulse from one clock domain and create a single clock-wide pulse in the new domain. One restriction of a pulse synchronizer is that input pulses must have a minimum spacing between pulses equal to two synchronizer clock periods. If the input pulses are closer, the output pulses in the new clock domain are adjacent to each other, resulting in an output pulse that is wider than one clock cycle. This problem is more severe when the clock period of the input pulse is greater than twice the synchronizer clock period. In this case, if the input pulses are too close, the synchronizer does not detect every one.
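The toggle, level synchronizer and XOR arrangement described above might be sketched as follows. All names (sync_pulse, clk_src, clk_dst, pulse_in, pulse_out and the two resets) are hypothetical, and the separate active-low reset per domain is an assumption.

```verilog
// Sketch of a pulse synchronizer: toggle flop in the source domain,
// two-flop level synchronizer plus one delay flop in the destination
// domain, and an XOR that turns each toggle into a one-clock pulse.
module sync_pulse (
    input  wire clk_src,     // originating clock domain
    input  wire rst_src_n,   // active-low reset, source domain (assumed)
    input  wire pulse_in,    // single clk_src-wide input pulse
    input  wire clk_dst,     // new clock domain
    input  wire rst_dst_n,   // active-low reset, destination domain (assumed)
    output wire pulse_out    // single clk_dst-wide output pulse
);
    reg toggle;              // changes state once per input pulse
    always @(posedge clk_src or negedge rst_src_n) begin
        if (!rst_src_n)    toggle <= 1'b0;
        else if (pulse_in) toggle <= ~toggle;
    end

    reg meta, sync, sync_d;  // level synchronizer plus delayed copy
    always @(posedge clk_dst or negedge rst_dst_n) begin
        if (!rst_dst_n) begin
            meta   <= 1'b0;
            sync   <= 1'b0;
            sync_d <= 1'b0;
        end else begin
            meta   <= toggle;
            sync   <= meta;
            sync_d <= sync;
        end
    end

    // High for one clk_dst cycle each time the toggle changes state.
    assign pulse_out = sync ^ sync_d;
endmodule
```

The sketch does nothing to enforce the minimum pulse spacing discussed above; that remains the responsibility of the logic driving pulse_in.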

I am composing this article to explore various aspects of clock and data synchronization.

The first part of the article talks about Level Synchronizers, Edge Synchronizers and Pulse Synchronizers. The second part deals with Synthesis and Scripting Techniques for Designing Multi-Asynchronous Clock Designs.

Today's design flows offer designers many software programs to help them create million-gate circuits, but these programs do not solve the problem of signal synchronization. It is up to the designer to know reliable design techniques that reduce the risk of failure for circuits communicating across clock domains.

The first step in managing multiclock designs is to understand the problem of signal stability. When a signal crosses a clock domain, it appears to the circuitry in the new clock domain as an asynchronous signal. The circuit that receives this signal needs to synchronize it. Synchronization prevents the metastable state of the first storage element (flip-flop) in the new clock domain from propagating through the circuit.

Metastability is the inability of a flip-flop to arrive at a known state in a specific amount of time. When a flip-flop enters a metastable state, you can predict neither the element's output voltage level nor when the output will settle to a correct voltage level. During this settling time, the flip-flop's output is at some intermediate voltage level or may oscillate and can cascade the invalid output level to flip-flops farther down the signal path. The input must be stable during a small window of time around the active edge of the clock for any flip-flop. This window of time is a function of the design of the flip-flop, the implementation technology, operating conditions, and the load on the output for outputs that are not buffered. Sharp edge rates on the input signal minimize the window. More windows of vulnerability arise as the clock frequency increases, and the probability of hitting the window increases as the data frequency increases.

Clock jitter is the deviation from the ideal timing of clock transition events. Because such deviation can be detrimental to high-speed data transfer and can degrade performance, jitter must be kept to a minimum in a high-speed system.

High-speed signaling is very sensitive to jitter. As signals toggle faster and faster, tighter restrictions fall on the signal transmitter and receiver. In many high-speed data applications, the clock edge must fall within a tight margin of time to capture data correctly. The more jitter in a system, the more often the clock edge will fall outside the margin. The frequency of clock edge deviations from the acceptable margin translates to the system's bit error rate (BER).

/* example 2a */
reg a, b, c;
always @(posedge clock)
begin
    b = a;
    c = b;
    /* Only c will be a flip flop, b will go away after synthesis. */
    /* We could delete the 2 above assignments and replace them with c=a; b=a;
       In fact, b is the same as c and can be eliminated. */
end
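For contrast with example 2a, here is a minimal sketch of the same two assignments written with nonblocking assignments. The module name shift2 and its ports are made up for illustration. With nonblocking assignments both right-hand sides are sampled before either register updates, so b and c each become a flip-flop and c follows a two clock edges later.

```verilog
// Sketch: nonblocking version of example 2a (names are illustrative).
module shift2 (
    input  wire clock,
    input  wire a,
    output reg  b,
    output reg  c
);
    always @(posedge clock) begin
        b <= a;   // b gets the value a had just before the edge
        c <= b;   // c gets the value b had just before the edge
    end
endmodule
```

Here the order of the two statements inside the always block no longer changes the result, unlike the blocking version above.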

Note that I am talking about SIMULATOR memory, not flip-flop count after synthesis. In most cases, the simulator has to remember the value before and after posedge clock if a reg goes between modules, in order to "execute modules in parallel", so there may be no savings.

Some people like blocking because you can see sharing of resources more readily. // example 5

Some of the simulators out there will execute module abc first and then module xyz. This effectively transfers the contents of c to a in ONE clk cycle. This is what some people refer to as a simulator race condition. Other simulators will execute module xyz and then module abc, giving a different simulation result. In some simulators, the order of execution cannot be controlled by users.

Basics of Clock Tree Synthesis:

The main idea is to balance the skew between endpoints. Clock trees are built with the following constraints.

Clock Skew: Difference between the clock arrival times.

Clock Latency: Max delay between the clock root and clock leaf.

Transition Time

Clock buffers are usually bigger in size and have a shorter transition time as well as more even rise and fall times.

Clock nets are generally routed first and on higher metal layers with minimum detouring, to give them the highest priority in routing.

A few other clock tree related topics that will be covered subsequently are:

Signal Integrity issues/Clock nets aggressor/clock shielding: Clock nets due to their importance have to be protected from becoming either aggressors or victims in SI closure. They are generally shielded with vdd or gnd to prevent that.

Effective skew: Worst skew between two flops that are talking to each other. This is either equal to or lower than the worst skew.

Useful skew: This is a concept where the skew (the difference in arrival of the clock at the flops) is used deliberately to fix setup violations.
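As a rough illustration of why useful skew helps setup, the usual flop-to-flop timing checks can be written out. The symbols are assumptions introduced here, not taken from the text above: T is the clock period, s is the skew defined as capture-clock arrival minus launch-clock arrival, t_cq is the launching flop's clock-to-output delay, and t_max and t_min are the longest and shortest logic delays between the two flops.

```latex
\begin{aligned}
\text{setup:}\quad & t_{cq} + t_{max} + t_{setup} \le T + s \\
\text{hold:}\quad  & t_{cq} + t_{min} \ge t_{hold} + s
\end{aligned}
```

With illustrative numbers T = 10 ns, t_cq = 1 ns, t_max = 9 ns and t_setup = 0.5 ns, the setup check fails by 0.5 ns at s = 0 and just passes at s = 0.5 ns. The hold inequality shows the cost: the same positive skew removes 0.5 ns of hold margin on that path.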

"Safe" State Machines: If the number of states (N) is a power of 2 and you use a binary or gray-code encoding algorithm, then the state machine is "safe." This ensures that you have M registers where N = 2^M. Because all of the possible state values (or register statuses) are reachable, the design is "safe."

"Unsafe" State Machines: If the number of states is not a power of 2, or if you do not use a binary or gray-code encoding algorithm with fully defined states (e.g., one-hot), then the state machine is "unsafe" as it can stray into an undefined state.

FSM types and significance in detail:

Binary Encoding:
1. States are numbered starting from binary '0' and above.
2. One flip-flop for every bit of the encoded binary number.
3. States are assigned in binary sequence.
Adv:
1. Fewer flip-flops: ceil(log2(n)) for n states.
2. Less area, so good for area-constrained circuits.
Dis-Adv:
1. More than one bit can flip at any time.
2. Getting into a stale state is possible.
3. Complex decoding logic is necessary to find the state that you are currently in.
4. More ff's toggling at the same time causes more power to be consumed.

Gray Encoding:
1. States are numbered starting from binary '0' and above in gray style.
2. One flip-flop for every bit of the encoded gray code.
3. Assign adjacent gray codes to adjacent states.
Adv:
1. Same number of ff's as binary.
2. Only one bit is different for adjacent states, so there is less chance of getting into a stale state.
3. Only one ff changes at any given time, so less power is consumed.
4. Less area, so good for area-constrained circuits.
Dis-Adv:
1. Decoding logic is complex.
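One common way to obtain gray codes for consecutively numbered states is the reflected binary Gray code, which XORs the binary number with itself shifted right by one bit. The sketch below is only an illustration; the module name bin2gray and the width parameter W are assumptions, not part of any particular FSM.

```verilog
// Sketch: reflected binary Gray code for a W-bit state number.
module bin2gray #(
    parameter W = 3              // number of state bits (illustrative)
) (
    input  wire [W-1:0] bin,     // state number in plain binary
    output wire [W-1:0] gray     // corresponding gray-coded value
);
    // Consecutive values of bin give codes that differ in exactly one bit.
    assign gray = bin ^ (bin >> 1);
endmodule
```

For W = 3 the sequence 0 through 7 maps to 000, 001, 011, 010, 110, 111, 101, 100. The single-bit-change property only holds for transitions between consecutively numbered states, so it benefits mostly sequential, counter-like state machines.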

One Hot Encoding:
1. One flip-flop for every state rather than one flip-flop for every bit.
2. Only one flip-flop can be '1' at any time; all others must be '0'.
Adv:
1. Very simple decoding logic, so checking for a particular state is as easy as reading the corresponding ff.
2. Only two ff's change their state on each transition, so less power is consumed.

Dis-Adv:
1. More ff's.

Suited for FPGAs:
1. Uses the ff's in the CLBs for state decoding.
2. Fewer routing hops are required for decoding.

Metastability is the ability of a non-equilibrium electronic state to persist for a long period of time. Usually the term is used to describe a state that doesn't settle into equilibrium within the time required for proper operation.

The flip-flop is a device that is susceptible to metastability. It has two well-defined stable states, traditionally designated 0 and 1, but under certain conditions it can hover between them for longer than a clock cycle. This condition is known as metastability. In most cases it is considered a failure mode of the logic design and timing philosophy or implementation.

The most common cause of metastability is violating the flip-flop's setup and hold times. During the time from the setup to the hold time, the input of the flip-flop should remain stable; a change in the input in that time will have a probability of setting the flip-flop to a metastable state.

In a typical scenario where data travels from the output of a source flip-flop to the input of target flip-flop, metastability is caused by either:

(1) the target clock having a different frequency than the source clock, in which case the setup and hold time of the target flip-flop will be violated eventually, or

(2) the target and source clock having the same frequency, but a phase alignment that causes the data to arrive at the target flip-flop during its setup and hold time. This can be caused by fixed overhead or variations in logic delay times on the worst case path between the two flip flops, variations in clock arrival times (clock skew), or other causes.

Delay statements, e.g. @(posedge clock), require careful attention if there are several in a row. If there are only delays on the positive edge of the clock you can implement them with a state machine:

2 : begin // this will definitely begin at the negative edge as state 1 precedes it
        command3;
        state <= 3;
    end

3 : state <= 4; // we arrive at the positive edge of the clock, but need to wait a clock cycle

4 : if (clock == 1) // wait for the positive edge
    begin
        command4;
        state <= 0;
    end // we'll get back to state 0 at the negative clock edge, the right time for command1
endcase
end

As you can see, multiple clock edges require care to implement in synthesizable Verilog.

Note: in general commandi refers to a block of commands. It is assumed there is an appropriate clock for the case statement state machines. Care is required in setting appropriate reset states, initialization, and completion of use of a state machine:

* Is there a signal to tell the state machine to begin?
* Does a done signal go high, signalling the state machine has finished?
* When it is not in operation, does the state machine idle correctly? Does it change signal values shared with other code? Does it set outputs from it to appropriate idling values?
* Is the state machine reset to the idle state by a reset signal?
* Ensure that you initialize all registers.
* Ensure that your state register has the correct bit width - if it is too small, assigning a larger state value will just return it to an earlier state.
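The checklist above can be illustrated with a small skeleton. This is only a sketch under stated assumptions: the module name fsm_skeleton, the state names, the synchronous active-high reset and the one-cycle done pulse are all choices made for this example, and the two RUN states merely stand in for blocks of commands.

```verilog
// Sketch of a state machine with start/done handshaking, an idle state
// and a reset, following the checklist. All names are illustrative.
module fsm_skeleton (
    input  wire clock,
    input  wire reset,   // synchronous, active-high (assumed)
    input  wire start,   // tells the state machine to begin
    output reg  done     // high for one cycle when finished
);
    // Three states need a 2-bit state register.
    localparam IDLE = 2'd0, RUN1 = 2'd1, RUN2 = 2'd2;
    reg [1:0] state;

    always @(posedge clock) begin
        if (reset) begin
            state <= IDLE;             // reset returns to the idle state
            done  <= 1'b0;             // every register is initialized
        end else begin
            done <= 1'b0;              // idle value unless overridden below
            case (state)
                IDLE: if (start) state <= RUN1;
                RUN1: state <= RUN2;   // first block of commands goes here
                RUN2: begin            // last block of commands goes here
                    done  <= 1'b1;
                    state <= IDLE;
                end
                default: state <= IDLE; // recover from the unused encoding
            endcase
        end
    end
endmodule
```

Assigning done a default value before the case statement keeps it at its idle value in every state that does not explicitly drive it.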

When implementing Verilog tasks in modules, the best approach is to group tasks that have the same output signals into separate modules. If different modules control the same signal, then explicit arbitration logic is required to specify which module is controlling the signal at a given time. To put tasks in a separate module, you will require start and completion handshaking signals.

1 : begin
        do_update_a_and_b = 0; // stays high for only one cycle
        if (done_update_a_and_b)
        begin
            // we assume the values of x and b weren't needed on the previous cycle, otherwise additional circuitry is needed
            // or x_temp and b_temp values need to be used on that cycle - it's very difficult to coordinate this correctly
            // in the general case
            x = x_temp;
            b = b_temp;
            top_state <= 2;
        end
    end

3 : begin
        do_update_a_and_b = 0; // stays high for only one cycle
        if (done_update_a_and_b)
        begin
            // we assume the values of x and b weren't needed on the previous cycle, otherwise additional circuitry is needed
            // or x_temp and b_temp values need to be used on that cycle - it's very difficult to coordinate this correctly
            // in the general case
            x = x_temp;
            b = b_temp;
            top_state <= 4;
        end
    end

4 : command1;
endcase

Now if we didn't care about having a couple of additional cycle delays between updates (i.e. assuming nothing depends on the variable values immediately, and nothing else is changing variable values), we could implement this in a far simpler fashion:

In some special cases, it may not be necessary to have done signals, but in general the blocks of commands being executed in parallel by fork may finish at different times. Again, if a cycle delay between command1 and the other commands executing is acceptable, then this code is simpler:

As y or z may have different values after a clock cycle passes, care needs to be taken in choosing the simpler alternative, which does not exactly implement the behavioural code.

Again, if a cycle delay between command1 and the other commands executing is acceptable, simpler code is the following:

case (state)
    0 : begin
            command1;
            state <= 1;
        end

    1 : if (x != 0)
        begin
            command2;
        end
        else command3;
endcase

You also need to add the variable state, reg state. If a cycle delay between command1 and command3 does not matter, then the following is simpler, but not identical to the original:

case (state)
    0 : begin
            command1;
            state <= 1;
        end

    1 : if (x != 0) // wait until this is true
            command3;
endcase

The latter approach is preferred in many cases for coding simplicity.

In general, we (human beings) express negative numbers by placing a minus (-) sign at the left end of the number. Similarly, when representing integers in binary format, we can let the left-most bit be the sign bit. If the left-most bit is a zero, the integer is positive; if it is a one, it is negative.

To make it easy to design computers which do integer arithmetic, integers should obey the following rules:

(1) Zero is positive and -0 = 0
(2) The top-most bit should tell us the sign of the integer.
(3) The negative of a negative integer is the original integer, i.e., -(-55) is 55.
(4) x - y should give the same result as x + -y. That is, 8 - 3 should give us the same result as 8 + -3.
(5) Negative and positive numbers shouldn't be treated in different ways when we do multiplication and division with them.

2s complement has become the standard method of storing signed binary integers. For an n-bit word it allows the representation of numbers in the range -2^(n-1) to 2^(n-1) - 1, and has the major advantage of having only one encoding for 0.

A simple and elegant way to represent integers that obeys these rules is called 2s complement. The 2s complement of an integer is calculated by inverting every bit (changing each 1 to 0 and each 0 to 1) and then adding 1 to the result.
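The invert-and-add-one procedure can be checked with a short Python sketch. The 8-bit width and the helper names twos_complement and to_signed are assumptions chosen for the illustration.

```python
def twos_complement(value: int, bits: int) -> int:
    """Return the 2s complement negation of `value` as a `bits`-wide pattern."""
    mask = (1 << bits) - 1
    inverted = ~value & mask       # flip every bit
    return (inverted + 1) & mask   # add 1 and discard any carry out


def to_signed(pattern: int, bits: int) -> int:
    """Interpret a `bits`-wide pattern as a signed 2s complement integer."""
    sign_bit = 1 << (bits - 1)
    return pattern - (1 << bits) if pattern & sign_bit else pattern


neg_55 = twos_complement(55, 8)
print(format(neg_55, "08b"))                 # 11001001
print(to_signed(neg_55, 8))                  # -55
print(twos_complement(neg_55, 8))            # 55: rule (3), -(-55) is 55
print(twos_complement(0, 8))                 # 0: rule (1), -0 = 0
print((8 + twos_complement(3, 8)) & 0xFF)    # 5: rule (4), 8 - 3 equals 8 + (-3)
```

With 8 bits the representable range is -128 to 127, which matches the -2^(n-1) to 2^(n-1) - 1 range for n = 8.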

1's complement addition is distinguished from the 2's complement addition typically encountered in (unsigned) computer arithmetic by how overflow bits are handled. 1's complement overflow bits are carried around back into the sum while 2's complement overflow bits are discarded. In general, the inverse of a number under a given mathematical operation is the value which when operated on with that number returns the identity element. The 1's complement additive inverse of a number is its bitwise complement (replace 0s with 1s and 1s with 0s). This proposal relies on a number and its complement summing to zero (the additive identity element). Actually they sum to negative zero--1's complement addition has two identity elements. Recall that an identity element under a given operation is a value which leaves any other number unchanged when the operation is applied. Under 1's complement arithmetic the addition of either zero (all 0's) or negative zero (all 1's) to a number will generate a sum equal to the original number.

1's complement addition is both associative and commutative (it forms an Abelian group over the unsigned integers), so it is immaterial whether an identity element is added to a number or the number is added to an identity element, or whether the number operates on its inverse or the inverse operates on the number--both arrangements have the same result. Also note that the operation of subtraction is equivalent to adding the inverse (complement) of the number.
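The end-around carry and the two identity elements described above can be seen in a minimal Python sketch. The 8-bit width and the names ones_add and ones_neg are assumptions for the illustration only.

```python
BITS = 8
MASK = (1 << BITS) - 1  # 0xFF


def ones_add(a: int, b: int) -> int:
    """1's complement addition: the carry out of the top bit is added back in."""
    total = a + b
    return (total & MASK) + (total >> BITS)


def ones_neg(a: int) -> int:
    """1's complement additive inverse: flip every bit."""
    return ~a & MASK


x = 0x35
print(hex(ones_add(x, ones_neg(x))))  # 0xff: a number plus its complement is negative zero
print(hex(ones_add(x, 0x00)))         # 0x35: zero is an identity element
print(hex(ones_add(x, 0xFF)))         # 0x35: negative zero is an identity element too
print(ones_add(8, ones_neg(3)))       # 5: subtraction as addition of the inverse
```

The one case where the bit pattern is not preserved is adding negative zero to zero, which yields negative zero; both patterns still represent the value zero.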

A negative setup and hold condition is a very interesting proposition in static timing analysis. Support for this type of condition was added to the Verilog LRM only in the late 1990s (using the $setup and $hold timing checks; negative limits are expressed with the combined $setuphold check).

The basic idea is something like this: Consider a module with an ideal flop in it. There is a data path (from the primary inputs of the module to D of the flop) and a clock path (from the primary inputs to CLK of the flop). Suppose the data path delay is DD and the clock path delay is 0. If we take the clock pulse reaching the primary input of the module as the reference time, the clock pulse reaches the CLK pin of the flop at 0, and the data pulse reaches the D pin DD after it enters the module. Therefore, for the setup check to be met, the data pulse must reach the primary inputs of the module at -DD, which means the setup requirement is DD. Now consider a clock path delay of CD. The clock pulse now reaches the flop only after time CD, so the data pulse need not begin so early; it has to begin at -DD+CD (the pulse is simply shifted right by CD). The setup requirement is now DD-CD. If CD>DD, the setup requirement becomes negative, which means the data pulse can reach the primary input of the module after the clock pulse has reached there.

Similarly for hold: Consider that the data delay is 0 and the clock delay is CD. The data must not change for at least CD for the flop to be able to latch it, so the hold requirement is CD. Now consider a data delay of DD. The data now has to stay stable only for CD-DD, which is the new hold requirement. If DD>CD, the hold requirement is negative. If we analyse these results mathematically, we can see that setup relation + hold relation = (DD-CD) + (CD-DD) = 0.
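The two relations can be tabulated with a small Python sketch of the idealized model above. The function names and the sample delay values are assumptions for illustration, not library data.

```python
def setup_requirement(dd: float, cd: float) -> float:
    """Setup requirement at the module boundary for data delay dd and clock delay cd."""
    return dd - cd


def hold_requirement(dd: float, cd: float) -> float:
    """Hold requirement at the module boundary for data delay dd and clock delay cd."""
    return cd - dd


# (data delay DD, clock delay CD) pairs chosen only for illustration
for dd, cd in [(3.0, 0.0), (3.0, 1.0), (1.0, 3.0)]:
    setup = setup_requirement(dd, cd)
    hold = hold_requirement(dd, cd)
    # In this ideal model the two requirements always sum to zero.
    print(f"DD={dd} CD={cd} setup={setup} hold={hold} sum={setup + hold}")
```

The last pair has CD > DD, so the setup requirement is negative, while the first two have DD > CD and therefore a negative hold requirement.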

Physically, this implies that an infinitesimally short pulse (a delta pulse) can be captured, which is of course not possible. A more accurate model would be:

setup_val < DD-CD (for setup to be met, the data should begin at least DD-CD before 0)
hold_val < CD-DD (for hold to be met, the time for which the data is stable should always be greater than hold_val)

The model we described, the module with an ideal flop, is actually a real-world flop. In an actual flop there is more than one data path and clock path. Therefore the more accurate description would be:

DDmax-CDmin >= setup_val (for setup to be met)
CDmax-DDmin >= hold_val (for hold to be met)

These kinds of relationships, especially the ones where a negative relation can hold, cause problems in simulators. Take for example a data pulse which rises at 0.0 and falls at 2.0. The clock pulse rises at 3.0, and the data delay is 1.0. Assume the origin at the clock pulse (3.0). Then the data rise is at -3.0 and the fall is at -1.0. The setup relationship may be specified as 2.0, which means data should be present at 0.0-2.0 = -2.0. Data will arrive at -3.0+DD-CD = -3.0+1.0-0.0 = -2.0 (setup OK). The hold relationship may be specified as -1.0, which means data must not change until 0.0+(-1.0) = -1.0. According to our relationship, data will not change until 0.0+CD-DD = 0.0-1.0 = -1.0 (hold OK).

All looks hunky-dory, but there is a catch. There is no problem with the timing checks; however, in software, the simulator would capture the falling edge of the data (the one at 2.0) rather than the high level. So the simulator gets functionally incorrect results, even though the timing is accurate. If both the setup and hold relationships were positive, this would never have happened. The fix is actually very simple: instead of taking an ideal clock, the simulator takes a delayed clock. All calculations are then done with respect to this delayed clock (in the above example the clock is delayed by -1 w.r.t. the data), so the simulator will not latch the falling edge.

Question: What is the maximum distance of the I2C bus?

This depends on the load of the bus and the speed you run at. In typical applications, the length is a few meters (9-12 ft). The maximum capacitive load has been specified (see also the electrical specs in the I2C FAQ). Another thing to take into account is the amount of noise picked up by long cabling. This noise can disturb the signal transmitted over the bus so badly that it becomes unreadable.

The length can be increased significantly by running at a lower clock frequency. One particular application - clocked at about 500Hz - had a bus length of about 100m (300ft). If you are careful in routing your PCB's and use proper cabling (twisted pair and/or shielded cable), you can also gain some length.

If you need to go far at high speed, you can use an active current source instead of a simple pull-up resistor. Philips has a standalone product for this purpose. Using a charge pump also reduces "ghost signals" caused by reflections at the end of the bus lines.

Question: I'd like to extend the I2C bus. Is there something like a repeater for I2C?

Yes indeed this exists. Philips manufactures a special chip to buffer the bi-directional lines of the I2C bus. Typically, this is a current amplifier. It forces current into the wiring (a couple of mA). That way you can overcome the capacitance of long wiring.

However, you will need this component on both sides of the line. The charge pump in these devices can deliver currents up to 30 mA, which is way too much for a normal I2C chip to handle. With these buffers you can handle loads up to 2 nF. The charge amplifier 'transforms' this load down to a 200 pF load, which is still acceptable to I2C components.

Question: Are there stand-alone I2C controllers available?

Yes indeed. There is a special chip to do the I2C interfacing. The PCD8584 or PCF8584 incorporate a complete I2C interface. These chips are designed in such a way that they can interface to almost any microcontroller around.

Question: Can I abort an ongoing I2C bus transmission?

Is it okay to abort an ongoing transmission at any time?

According to the specification, this should work. It depends on the layout of the component. A real I2C compatible IC will be able to handle this. It might make sense to test this before you use it.

Usually, when a START or STOP condition is detected, the internal logic of the chip is forced into a certain state. Internally, the logic that detects START and STOP is different from the logic that does all other processing. The START together with the address register is to be considered as a functional unit inside the chip.

When a START is detected, all internal operations are cancelled and the chip will compare the incoming data with its own address.

When a STOP is detected, ALL chips on the bus will reset their internal logic to IDLE mode except for the START detector (this is also used to cut power consumption). Therefore, when a start condition is issued on the bus, the START detector will 'wake-up' the rest of the internal logic.

Question: Do I need to generate an ACK in read mode on the last byte? My chip starts sending data and occupies the bus...

This is a somewhat puzzling case. Usually, if you have read the last byte from a chip and generate an ACK, the chip should do nothing more, so the bus should be clear for you to create a STOP condition. Apparently, some chips start transmitting data again. One such chip is the PCF8574 I/O expander. Under the I2C specification, a master receiver signals the end of a read by not acknowledging the last byte, so that the slave releases the data line and the master can generate the STOP.

Though not always desirable, this feature can come in handy. If you need to sample incoming data fast, then you just continue reading from the chip. This prevents you from losing 'arbitration' of the bus in a multi-master environment.

It also speeds things up. You don't have to address the chip over and over again, so you save the time for the START, Address, ACK and STOP stages for every subsequent byte read. This can lead to a more than doubled transfer rate.

Question: Why does the SCL line have to be bi-directional?

The clock line needs to be bi-directional when using a MULTI-MASTER protocol and when using the synchronization protocol.

When you are using only one Master then this is not required since the clock will always be generated by this device. If you run Multi-master then this changes. One master must be able to receive data from another master. At that time it must be able to receive clock information via the clock line also.

Question: How can I monitor the I2C bus?

There are a few commercial I2C monitors/debuggers around that can do this.

There is another possibility: using the stand-alone I2C controller PCF8584 from Philips. This chip has a mode in which it does not take part in the real I2C action but only records what is going on. It listens to all addresses, but does not generate any acknowledge. Using some software routines and an MCU you could build a universal I2C data logger.

Question: How can I test / debug the I2C bus?

There is no general way to debug an I2C bus. However, a few guidelines might help to get it running.

The first thing is to check the levels on the bus. You should see a clear signal that has a low level that is lower than 0.8 volts and a high level which is at least 3.5 volts.

If the high level is not high enough or does not rise fast enough, then you can try to lower the value of the pull-up resistor. You must take care, however, not to surpass the maximum allowable current in the I2C driver stage. The minimum allowable resistor for a 5 volt driven I2C bus is about 5 V / 3 mA ≈ 1700 ohms. A typical value of 4700 ohms should work fine.
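The pull-up limit above is a simple Ohm's-law calculation, sketched here in Python. The function name is hypothetical. The second call subtracts a 0.4 V low-level output voltage, which is an assumption taken from the usual I2C sink specification (0.4 V at 3 mA) rather than from the text above.

```python
def min_pullup_ohms(vdd: float, sink_current: float = 3e-3, vol: float = 0.0) -> float:
    """Smallest pull-up resistor that keeps the driver's sink current within limits."""
    return (vdd - vol) / sink_current


# Simple estimate used in the text: the full supply voltage across the resistor.
print(round(min_pullup_ohms(5.0)))            # 1667
# Assumed refinement: the driver holds the line at up to 0.4 V while sinking 3 mA.
print(round(min_pullup_ohms(5.0, vol=0.4)))   # 1533
```

Either way, the typical 4700 ohm value stays comfortably above the minimum.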

Make sure the bus is not 'stuck' to '0'. This could be the result of a bad power supply (chips go into latch up during power-on) or a bad chip.

There are a few commercial I2C monitors/debuggers around.

Question: Which microcontrollers have an on-chip I2C interface?

A LOT of MCU's have a real I2C interface implemented in hardware, but this should not restrict the use of the I2C bus on other MCU's. ANY MCU can be made to talk to I2C using some small software routines.

There are microcontrollers with on-chip I2C modules as well as stand-alone I2C bus peripherals.

It can be shown for XOR that x' = x xor 1, so the inverter is fine. We still need to show that (x + y) or (x.y) can be built from XOR; if we can do that, we are done. I have not found a way to do it yet, so the answer may be "not possible".

Proof with a new approach:

The question "can a mux be built from XOR only" is the same as "can an arbitrary logic function be implemented with XOR only", given that mux is universal. So, we only need to show that some simple function cannot be implemented (as a counterexample). Let's try AND.

Suppose x.y could be implemented with XOR only. Then x.y would be the output of some XOR gate in the circuit. Let's call the inputs of this XOR gate f(x,y) and g(x,y). The truth tables for f and g will look like:

x  y  |  f  g
-------------
0  0  |  a  a
0  1  |  b  b
1  0  |  c  c
1  1  |  d  d'

If you expand this truth table to the 16 possible cases, you will see that either f or g will be an AND or OR function in each case. OR is basically an AND with complemented inputs and output; so, effectively, an AND function is needed to synthesize AND. This will go on ad infinitum, so the task is impossible. So, a mux cannot be built with XOR gates only.
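The conclusion can also be checked by brute force. The Python sketch below starts from the inputs x and y plus the constants 0 and 1 (the constants are an assumption, matching the x xor 1 inverter used earlier) and repeatedly XORs every pair of reachable functions until nothing new appears. The helper names table and xor_closure are hypothetical.

```python
from itertools import product

ROWS = list(product((0, 1), repeat=2))  # all (x, y) input combinations


def table(fn):
    """Truth table of a two-input function as a tuple over ROWS."""
    return tuple(fn(x, y) for x, y in ROWS)


def xor_closure(seeds):
    """All truth tables reachable by XOR-ing functions from `seeds` any number of times."""
    reachable = set(seeds)
    while True:
        new = {tuple(a ^ b for a, b in zip(f, g)) for f in reachable for g in reachable}
        if new <= reachable:
            return reachable
        reachable |= new


seeds = [table(lambda x, y: x), table(lambda x, y: y),
         table(lambda x, y: 0), table(lambda x, y: 1)]
reachable = xor_closure(seeds)

print(len(reachable))                            # 8 of the 16 two-input functions
print(table(lambda x, y: x & y) in reachable)    # False: AND is not reachable
print(table(lambda x, y: x | y) in reachable)    # False: OR is not reachable
```

Only x, y, their complements, XOR, XNOR and the two constants are reachable, so neither AND nor OR (and therefore no mux) can be built this way.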

1.) You are adding 8 10-bit numbers. How many bits do you need for the result?
2.) You are adding two 2's-complement numbers together. One is 8 bits, the other is 4 bits. How do you handle the fact that the two numbers are not the same width?
3.) You have a free-running clock. How would you design a divide-by-3 circuit? Duty cycle is not an issue.
4.) Describe the design of a FIFO circuit.
5.) Describe the design of a bus arbiter circuit.
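For the first two questions, the arithmetic can be checked with a short Python sketch. It assumes the numbers in question 1 are unsigned and uses sign extension as one common way to handle question 2; the helper names bits_for_sum and sign_extend are hypothetical.

```python
def bits_for_sum(count: int, width: int) -> int:
    """Bits needed to hold the sum of `count` unsigned `width`-bit numbers."""
    max_sum = count * ((1 << width) - 1)
    return max_sum.bit_length()


def sign_extend(pattern: int, from_bits: int, to_bits: int) -> int:
    """Widen a 2's-complement pattern by copying its sign bit into the new upper bits."""
    if pattern & (1 << (from_bits - 1)):
        pattern |= ((1 << to_bits) - 1) & ~((1 << from_bits) - 1)
    return pattern


print(bits_for_sum(8, 10))                            # 13: the largest sum is 8 * 1023 = 8184
print(format(sign_extend(0b1101, 4, 8), "08b"))       # 11111101: -3 stays -3 in 8 bits
print(format(sign_extend(0b0101, 4, 8), "08b"))       # 00000101: +5 stays +5 in 8 bits
```

After sign extension both operands are 8 bits wide and can be added with an ordinary 8-bit adder.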

Why is a "for" loop not advisable in RTL even though it is synthesizable?

I agree with this explanation. The thing you MUST remember when coding any RTL for synthesis is that you are designing logic, not running software on a processor. Way too often I see individuals treating RTL as if it were serially executed code running on a processor. This type of code usually results in very poor synthesis and timing results. When I am given code like this to work on, usually after someone has designed something poorly, it usually gets thrown in the trash and redesigned. So, even though a software-like construct exists in RTL and it "may" be synthesizable and simulate the desired function, that does not mean one should use it.

When coding RTL, you should really think about what you are trying to design in hardware, not necessarily how to code a function using the software constructs of RTL. After you know what the desired hardware function is, then code the RTL to implement that function. What you should do to answer your question is ask yourself what hardware am I trying to create, and will this for loop create it. I recommend an excellent book that covers a lot of coding techniques for various hardware functions in VHDL and Verilog. It is "HDL Chip Design. A practical guide for designing, synthesizing and simulating ASICs and FPGAs using VHDL or Verilog" by Douglas J. Smith. I consider this book a must have for front end design. It has many examples of good and bad code.

Summary: though a "for" loop is synthesizable, it is not too friendly in terms of resources or synthesis time. A small loop may be OK, but for a big loop, or a loop inside a loop, the synthesis tool first has to unroll the whole logic and then do the implementation, which may not always give a good quality of result in terms of timing and area. That is why the "for" loop is preferred mainly for testbenches and not for RTL, although it is still used in selected cases in RTL too.

a) Generally speaking, SETUP fixing is always DIFFICULT. It can be resolved by inserting buffers (as you mentioned) only in cases where the SETUP violation is caused by large load/slew violations, which cause huge delays in combinatorial blocks. Say there is an AND gate that is driving much more load than it should, and you see an A-to-Y delay of 3 ns for it. This load violation can be fixed by adding a buffer after the AND gate; you may now see that the AND gate has a delay of only 1.5 ns and the added BUFFER contributes 0.3 ns. The path through these cells now takes 1.8 ns instead of 3 ns, so you gain 1.2 ns in the data path. To see whether load/slew violations are causing your SETUP failure, run report_timing with -cap -tran (assuming you also annotate set_load or a SPEF file while doing STA). But if load/slew is NOT the culprit, then it is indeed tough to fix SETUP and you may need to revisit the logic structure between the flops.

b) HOLD fixing is comparatively far easier: simply add buffers in the data path. There are lots of automated scripts, and even DC can do that with -fix_hold. This is generally done at the last stage, after the CLK routing has been done.

c) I would say both are equally IMPORTANT and any one of them is sufficient enough to cause a RESPIN :-(

Finally, the MOST important thing to remember is that SETUP is frequency dependent, while HOLD is NOT!

Note: This is generally true if you use only +ve edge clocking. If you mix both +ve and -ve edges in your design, then hold time also has a frequency dependency.

Question: How do you find the resource elements consumed during the design stage, that is, before RTL coding? Is it necessary to draw the low-level gate elements to calculate the logic gate consumption? I don't think designers go for detailed low-level diagrams to calculate the resource utilisation. Then how do you approximately find the resources used in a design?

Ans: The main purpose of this exercise is generally to find the number of flops (or synchronous elements) in the logic. Even before RTL coding, one can get a basic idea of the flop count by guessing the state elements, the number of registers for counters, retime blocks, etc. Then, based on the type of circuitry, we try to guess how much combinatorial logic there will be between flops (e.g. for a pipelined design it will be less); that is again some percentage of the sequential elements. At this stage we do not try to include buffers/inverters for load or different fanouts; they can all be counted as part of the combinatorial logic.

Finally, an approximate number of gates (depending upon the technology library) can be inferred for the RTL module.

Negative hold time is generally seen where a delay is already added in the data path inside the flop. This is usually done by the library vendor.

Assume the flop that the foundry gives us as a library part has ports named CLK-port and Data-port. In essence this is a wrapper and should be treated as one. Inside it we have the actual flop, whose ports are CLK-in and Data-in. CLK-port is connected directly to CLK-in, while Data-port goes through some delay element (either a buffer or a routing net) to Data-in. So even if the actual flop has a hold requirement of, say, 0.2 ns, if the data delay element is 0.5 ns, the library will specify a HOLD requirement of -0.3 ns for the flop. This signifies that even if the data changes 0.3 ns before CLK, it can still be latched, and the actual flop (inside the wrapper) will still meet its 0.2 ns HOLD (at Data-in, the data changes 0.2 ns after the clock change).
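The wrapper arithmetic can be sketched in a few lines of Python. Times are held as integer picoseconds to avoid floating-point rounding, and the function names are hypothetical.

```python
def library_hold_ps(internal_hold_ps: int, data_delay_ps: int) -> int:
    """Hold requirement seen at the wrapper's Data-port."""
    return internal_hold_ps - data_delay_ps


def change_time_at_flop_ps(change_at_port_ps: int, data_delay_ps: int) -> int:
    """Time (relative to the clock edge) at which a Data-port change reaches Data-in."""
    return change_at_port_ps + data_delay_ps


hold_spec = library_hold_ps(internal_hold_ps=200, data_delay_ps=500)
print(hold_spec)  # -300, i.e. a -0.3 ns hold requirement in the library

# Data changing 300 ps before the clock at the port reaches Data-in 200 ps after the clock,
# which just meets the internal 200 ps hold requirement.
print(change_time_at_flop_ps(hold_spec, 500))  # 200
```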

Advantage: The biggest advantage is fewer iterations after layout, and easier and less painful synthesis (otherwise HOLD fixing can be an iterative process).