One more clock delay on next silicon inputs

Wendy at ON Semi noticed there's a potential metastability problem with the way we are registering pin inputs. We added another set of flops for the incoming pin signals. This means that there is now one more clock delay on all inputs from pins. I will make new FPGA files soon so we can confirm this doesn't cause any problems with existing code, especially what will go into ROM.

Comments

Thanks Chip. So we will see the pins as they were one clock earlier than we do now.
IIRC we currently see the pin state 1 clock prior to the start of a testp and 2 clocks prior for ina/inb, so that will now be 2 and 3 respectively.
IIRC outputs appear 3 clocks after the instruction completes. Will this remain the same?

For the current ES silicon (v32i), presuming the delay from DRVx to the pin is 3 clocks after the end of the DRVx instruction, a minimum delay of 5 clocks (waitx #3) is required for the TESTP x instruction to see the output value on the pin, i.e. the TESTP instruction latches the pin value 1 clock prior to the start of the TESTP instruction.

On the respin silicon, this is expected to become waitx #4, i.e. the pin is latched 2 clocks prior to the start of the TESTP instruction. INx instructions should likewise latch the pins one clock earlier than they do now.
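The arithmetic behind those two waitx values can be sketched in a few lines of Python. This is only a model, not PASM2: the function name and parameters are made up for illustration, it takes waitx #n as occupying n + 2 clocks (consistent with "5 clocks (waitx #3)" above), and it assumes the sample must land strictly after the clock on which the pin changes, which is the assumption that reproduces the figures quoted here.

```python
def min_waitx(out_latency: int, sample_lead: int) -> int:
    """Smallest n for which `waitx #n` lets TESTP see the new pin level.

    Time 0 is the end of the DRVx instruction.
    out_latency: clocks from the end of DRVx until the pin changes.
    sample_lead: clocks before the start of TESTP at which the pin is latched.
    """
    n = 0
    # waitx #n occupies n + 2 clocks, so TESTP starts at clock n + 2
    # and latches the pin at clock (n + 2) - sample_lead.
    # Assumption: the latch point must fall strictly after the pin changes.
    while (n + 2) - sample_lead <= out_latency:
        n += 1
    return n


print(min_waitx(out_latency=3, sample_lead=1))  # current ES silicon
print(min_waitx(out_latency=3, sample_lead=2))  # respin, one more input flop
```

With a 3-clock output latency this gives 3 for the current silicon and 4 for the respin, matching the waitx values above.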

Here is the test code for the P2D2. Change the clock values in the CON section as marked for P2-EVAL.

I've just run that on FPGA with v33j image at 80 MHz. I had to add a couple more values for "w". Here's a terminal output of values from 2 through 5. (Note: The 3 inputs that are always set are the push buttons on the P123 board.)

24 MHz - w=3 (w=2 fails)
200 MHz - w=4 (w=3 varies per pin so is on the edge)
300 MHz - w=5 (w=4 varies per pin so is on the edge)

This is the essence of the test

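drvl pin        ' drive the pin low
waitx #8        ' let the low level settle
drvh pin        ' drive the pin high
waitx w         ' delay under test
testp pin wc    ' read the pin state into C
rcl x,#1        ' shift C into bit 0 of x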
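For anyone not used to the testp/rcl idiom, here is a rough Python model of how the results pile up in x. It assumes the snippet is repeated once per sample; the helper names and the sample values are hypothetical, not taken from the actual test code.

```python
def rcl(x: int, c: int) -> int:
    """Model of `rcl x,#1`: shift x left one bit and bring C in at bit 0 (32-bit)."""
    return ((x << 1) | (c & 1)) & 0xFFFFFFFF


def collect(samples):
    """Accumulate one C flag per test iteration, first sample ending up highest."""
    x = 0
    for c in samples:
        x = rcl(x, c)
    return x


# Hypothetical run: four iterations where the second one missed the high level.
print(f"{collect([1, 0, 1, 1]):04b}")  # prints 1011
```

A 0 bit in the result marks an iteration where testp sampled the pin before the new level had arrived.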

Hmm.. That test seems to be saying the async path delays exceed a clock period at higher sysclks (which the added delay is not going to fix, but I think that was done for other reasons).
It also means
* ROM code may not work at higher sysclks (some wanted to call into ROM routines?)
* Deterministic pin-pin paths may not be possible above some sysclk, and may require sysclk bands (with PVT edges) above that sysclk.
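To put the reported w values in time units, here is a small Python sketch. It only converts the extra wait clocks (beyond the w=3 that works at 24 MHz) into nanoseconds using the numbers posted above; it is not a measurement of the pad delay, and the names are made up for illustration.

```python
BASE_W = 3  # w that works at 24 MHz in the results above

# sysclk in MHz -> smallest w reported as reliable
results = {24: 3, 200: 4, 300: 5}


def period_ns(mhz: float) -> float:
    """Clock period in nanoseconds for a sysclk given in MHz."""
    return 1000.0 / mhz


def extra_wait_ns(mhz: float, w: int) -> float:
    """Time covered by the wait clocks added on top of BASE_W."""
    return (w - BASE_W) * period_ns(mhz)


for mhz, w in results.items():
    print(f"{mhz} MHz: period {period_ns(mhz):.2f} ns, "
          f"extra wait {extra_wait_ns(mhz, w):.2f} ns")
```

The extra wait works out to 5.00 ns at 200 MHz and 6.67 ns at 300 MHz, so a fixed delay of a few nanoseconds is enough to cross one or two clock boundaries at those speeds.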

If you enable clocking in the digital pin mode, things should firm up. In your current test, you are stacking up a bunch of delays, whereas if clocking were to be enabled, the 3.3V I/O pin would register the input and output, hiding the internal propagation delays for which setup and hold time requirements ARE covered for clocked mode. If I could redesign the I/O pad, I would make it always synchronous (clocked).

Chip,
My ROM SD code is working nicely at 24MHz which is the target speed so I think I will just leave that part of the code as-is. Do you agree?

If you enable clocking in the digital pin mode, things should firm up.

Chip,
That has never made one bit of difference to consistency right from first discovery! This is why I've always been raising concern.

So, enabling clocking does increase latency by one clock, for both input and output, right?

But you are seeing additional delay times that are longer than a clock at very high frequencies. Those delays must be from the circuitry in the 3.3V I/O pad, then, which is not going to become different from what it currently is. If I had known the chip was going to actually work at 300MHz, I would have designed the I/O pad to work faster.

At this point, the I/O pad is what it is.

Wendy at ON is implementing some extra timing constraints which will group the delivery of all core-to-pin signals within 300ps, aside from being within setup-time and hold-time requirements. The pin-to-core IN signals will be constrained similarly. This will regulate asynchronous pin I/O, so that all pins will behave as identically as is reasonably possible.

So, enabling clocking does increase latency by one clock, for both input and output, right?

That is my understanding, yes. +1 clock for each.

But you are seeing additional delay times that are longer than a clock at very high frequencies. Those delays must be from the circuitry in the 3.3V I/O pad, then, which is not going to become different from what it currently is. If I had known the chip was going to actually work at 300MHz, I would have designed the I/O pad to work faster.

I run through this with w varying from #2 to #5, outputting either a 0 or 1. I test with the pin both low and high initially.
At different speeds w needs to be different, and at some frequencies there is a value of w where the result can go either way, so we are on the edge with that value.

And Cluso has just proven the problem is not only no better in the real silicon but the overall effect is worse because the finished product has so much over-clock-ability.

I don't know what to do about it. An I/O pin is big and slow compared to internal signals. It's like the difference between a hydraulic backhoe and your hands on the controls. Just to implement ESD protection on the I/O pad, you incur 1000x the capacitive loading of the core-side signals.