Spent some time looking at this, and AFAICT, while there has been some refactoring, the suspend/resume paths in the i8042.c, atkbd.c, and serio.c drivers have not changed in behavior. This is difficult to reproduce with the given test of typing 'ls' "really quickly", but I've been able to reproduce it via the method in #9777: pointing Browse at a long webpage and then waking up via the up or down arrow key. As in #9777, the browser will just continuously scroll until I press another key. This is not always reproducible either, but it is more consistent than the 'ls' method, as it only requires one quick keypress.

The only clue I've noticed so far is that when we see this issue, the keyboard device is redetected, i.e. the old device goes away and a new input device is created:

[ 142.102501] input: AT Translated Set 2 keyboard as /devices/platform/i8042/serio0/input/input7

One difference between XO-1 and XO-1.5 that I noticed in the 802 build is that we're using the uinput driver as part of olpc-kbdshim, and I'm wondering whether something in that code path could be causing the reconnect/reset.

new info: there is a very high correlation between enabling console printk messages and the problem going away. with no console dmesg output, i can reproduce it almost every time using the 'type a character at a VT screen' method. with full console printk, i can almost never make it happen.

In both cases the break code is sent at the same point in the stream. One could argue that sending the break code up in the middle of the commands Linux is sending is marginal, but nothing about the above streams seems invalid.

None of the commands that Linux sends above are necessary. The keyboard is already fully operational at the time the wakeup key is pressed.

i don't understand the ifdef'ed logic in i8042_suspend()/i8042_resume().

it looks to me like, if we're on a recent OLPC board, we'll skip the calls to i8042_controller_reset() and i8042_controller_check(), but we'll still run almost all of i8042_resume(), from i8042_controller_selftest() onward. perhaps some of this is correct, but it seems like we shouldn't need to reset the controller.

Just to be explicit: I believe that the code that needs to be skipped, per the above, absolutely will cause the observed false-repeat symptom, depending on an uncontrollable race between the OS issuing its commands and the keyboard sending the break event from the other direction.

The i8042_command() routine does not attempt to distinguish between valid command response bytes and interspersed events from the keyboard. It is in fact possible at the protocol level to make the distinction (the value sets for the two cases are disjoint), but doing so would add some complexity to the command response parsing logic (it would have to do a range check and redirect data events to a queue). My reading of i8042_command() is that it does not make the distinction; it just counts N incoming bytes and stuffs them into the param[] array. That being the case, i8042_command() *will* eat keyboard events if they arrive during a window of vulnerability.
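To make the failure mode concrete, here is a minimal, self-contained C model (not kernel code) of two ways to read an N-byte command response from a byte stream. read_response_naive() mirrors the behavior described above: it takes the next N bytes, whatever they are. read_response_filtered() sketches the suggested alternative: each byte is checked, and anything that is not a response is redirected to an event queue. All names here are hypothetical, and the ack_only() classifier, which accepts only the PS/2 ACK byte 0xFA, is an illustrative assumption standing in for the real response/event distinction.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical classifier: returns true if a byte is a command response. */
typedef bool (*is_response_fn)(uint8_t byte);

/* Illustrative assumption: the modeled command answers only with ACK (0xFA). */
bool ack_only(uint8_t byte)
{
    return byte == 0xFA;
}

/*
 * Naive model: treat the next n bytes as the response, whatever they are,
 * so a keyboard event arriving in this window ends up in param[].
 * Returns the number of input bytes consumed.
 */
size_t read_response_naive(const uint8_t *in, size_t in_len,
                           uint8_t *param, size_t n)
{
    size_t i;

    for (i = 0; i < n && i < in_len; i++)
        param[i] = in[i];
    return i;
}

/*
 * Filtering model: bytes that fail is_response() are queued in events[]
 * (which must have room for in_len bytes) instead of landing in param[].
 * Returns the number of input bytes consumed.
 */
size_t read_response_filtered(const uint8_t *in, size_t in_len,
                              uint8_t *param, size_t n,
                              is_response_fn is_response,
                              uint8_t *events, size_t *n_events)
{
    size_t got = 0, i = 0;

    *n_events = 0;
    while (got < n && i < in_len) {
        uint8_t b = in[i++];

        if (is_response(b))
            param[got++] = b;            /* genuine response byte */
        else
            events[(*n_events)++] = b;   /* keyboard event: keep it */
    }
    return i;
}
```

For an input stream {0xD0, 0xFA} (an arbitrary event byte arriving just before the ACK) and N = 1, the naive reader returns 0xD0 as the "response", so the event is lost and the real ACK is left over, while the filtered reader returns 0xFA and queues 0xD0.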

1) The underscores above are an artifact of wiki formatting meeting identifier names prefixed with a double underscore, not an attempt at emphasis.

2) Even if one were to enhance the command routine to distinguish data from command responses, skipping these commands on resume on OLPC would still be the right thing: in our case they are redundant and time-consuming, because the EC and its internal 8042 remain powered and retain their state.
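A rough model of point 2, again a hypothetical sketch rather than the i8042/serio code: resume is reduced to a list of steps, and a hypothetical ec_retains_state flag (standing in for "the EC and its internal 8042 remain powered") makes it issue none of them. The step names and their order are illustrative assumptions, not the kernel's actual sequence.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical steps standing in for the controller/keyboard/mouse setup. */
enum resume_step {
    STEP_CONTROLLER_RESET,
    STEP_CONTROLLER_SELFTEST,
    STEP_KBD_RECONNECT,
    STEP_AUX_RECONNECT,
    STEP_COUNT
};

struct platform_model {
    /* Hypothetical: true when the EC and its 8042 stay powered across suspend. */
    bool ec_retains_state;
};

/*
 * Fill out[] (room for STEP_COUNT entries) with the steps this model's
 * resume would perform, and return how many there are.
 */
size_t plan_resume(const struct platform_model *p, enum resume_step *out)
{
    size_t n = 0;

    /* State survived suspend: send no commands at all. */
    if (p->ec_retains_state)
        return 0;

    out[n++] = STEP_CONTROLLER_RESET;
    out[n++] = STEP_CONTROLLER_SELFTEST;
    out[n++] = STEP_KBD_RECONNECT;
    out[n++] = STEP_AUX_RECONNECT;
    return n;
}
```

Skipping the steps outright, rather than only filtering responses, also means there is no command window for a wakeup key's break code to fall into, and it covers the mouse (aux) re-setup as well as the keyboard.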

05-12-2009 18:28:24 [Mitch_Bradley] pgf: the d4 commands are mouse parameters
05-12-2009 18:29:58 [smithbone] yeah. write next command to aux device
05-12-2009 18:37:01 [Mitch_Bradley] ... Did you see my annotations?
05-12-2009 18:37:21 [Mitch_Bradley] starting at line 105
05-12-2009 18:37:56 [Mitch_Bradley] actually line 100 is the smoking gun
05-12-2009 18:38:31 [Mitch_Bradley] but all those keyboard commands from 89 .. 124 really shouldn't be happening for us
05-12-2009 18:38:51 > pgf: ah. i hadn't seen them. looking now.
05-12-2009 18:39:11 [Mitch_Bradley] also we should do the same thing for the mouse - suppress the re-setup
05-12-2009 18:53:38 > pgf: currently i'm suspicious of serio_resume() forcing a port reconnect.
05-12-2009 18:53:52 > pgf: i wish i had a better picture of the layers involved, and who calls who.
05-12-2009 19:05:07 [Mitch_Bradley] pgf: I think keyboard/atkbd.c/atkbd_reconnect() is the current culprit
05-12-2009 19:32:35 > pgf: Mitch_Bradley: likely. i'm wondering why we don't see this on XO-1.
05-12-2009 21:24:35 [Mitch_Bradley] pgf: possibly doesn't happen on XO-1 because of small timing differences that cause the break event not to coincide with the command stuff
05-12-2009 21:29:21 [smithbone] pgf: because on XO resume is slower?

Anyway, I've taken Paul's patch and also tweaked the serio code (which was *always* reconnecting ports during resume). Now, according to those i8042 dbg() messages that you guys have been looking at, there is no I/O along those paths during suspend, and during resume there is only: