Commit Message

Hi,
while testing how acpiphp behaves when an eSATA express card is repeatedly
inserted and removed from a slot, I realized few new things. The
while true; do echo -n "$(date +%T) SlotStatus "; setpci -s1c.7 0x5a.w; sleep 1; done
reveals that when I cold boot with a card in a slot, it reports 0040. When I boot with
an empty slot, we start with 0000 instead.
The first insertion of the card moves us to 0140.
If the mechanics works well, value stays at 0140 and upon sucessfull eject it falls down to 0100.
If the card slips out (like in this case) the value reports 0140 while the
slot is actually empty. Then the next insert in unnoticed, value is still 0140, and following
eject (even eject) rescues us and the slot reports 0100 properly.
The above is a "general" behavior I would say. I haven't seen the 0040 status before
but could be I did not test coldplug with pciehp yet while using the setpci shell
command.
A side note:
To prepare for even dimmer future, let me tell you that I already saw more values!
During eject there is also 0148 possible and that falls down to 0108 upon eject.
Funny was that I left the eSATA card in the slot while starring at the new numbers
and meanwhile they started due to some background activity changing back and forth
(I am not showing it here). Probably because of the sata_sil24 being half dead.
I am placing a file http://195.113.57.32/~mmokrejs/tmp/3.7.10_acpiphp_pcie_aspm_off.tar.bz2 (200kB)
which contains lspci outputs every second and lsusb, dmesg:
while true; do lspci -vvvxxx > "lspci_loop.$(date +%T)"; sleep 1; done
It seems lspci does not always report all changes in sufficient detail unless
I use -vvvxxx. Below is tracking by every second when I inserted a card and
it slipped out, followed by another (successful) insertion into the slot.
I am showing some of the diffs here but open along this email a file slot_status.txt
from the tar.bz2 above. Definitely, below you can see PresDet is changed on the SltSta
Status: line only while NOT on the Changed: line. From other threads I opened I conclude
this is a common bug to pciehp and acpiphp, or is a lspci bug, or a hardware/bios bug.
I speculate the "SltSta: Changed:" line of lspci is not updated properly, or at all,
not only the PresDet in it.
I am curious whether the below hex diffs could be transformed into some readable
line in default, "lspci -vv" output.
# diff -u -w lspci_loop.20:17:32 lspci_loop.20:17:33
Another insert ...
# diff -u -w lspci_loop.20:18:49 lspci_loop.20:18:52
etc.
Now, maybe somebody have the guts to look into the held locks in hung tasks
as shown in dmesg_after_alt_sysrq_w.txt. ;-)
[ 361.740663] INFO: task kworker/u:4:1615 blocked for more than 120 seconds.
[ 361.740670] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 361.740675] kworker/u:4 D ffffffff81ca46c0 0 1615 2 0x00000000
[ 361.740688] ffff88040b3b9c88 0000000000000046 0000000000000096 ffff88040d2f5d00
[ 361.740698] ffffffff815e48c1 ffff88040b8a6c80 ffff88040b3b9fd8 ffff88040b3b9fd8
[ 361.740707] 0000000000011700 ffff88040b8a6c80 0000000000000292 ffff880405d08000
[ 361.740718] Call Trace:
[ 361.740738] [<ffffffff815e48c1>] ? _raw_spin_unlock_irqrestore+0x3a/0x41
[ 361.740741] [<ffffffff815e3b27>] schedule+0x5f/0x61
[ 361.740745] [<ffffffff8134a71e>] ata_port_wait_eh+0x6d/0xcb
[ 361.740750] [<ffffffff81090712>] ? __init_waitqueue_head+0x4c/0x4c
[ 361.740752] [<ffffffff813463ae>] ata_port_probe+0x25/0x32
[ 361.740754] [<ffffffff813463eb>] async_port_probe+0x30/0x4d
[ 361.740758] [<ffffffff81095a38>] async_run_entry_fn+0xc0/0x1a4
[ 361.740760] [<ffffffff8108b7da>] process_one_work+0x1ae/0x2d2
[ 361.740762] [<ffffffff8108b770>] ? process_one_work+0x144/0x2d2
[ 361.740771] [<ffffffff81095978>] ? async_schedule+0x12/0x12
[ 361.740773] [<ffffffff8108c4d2>] worker_thread+0x13e/0x1d5
[ 361.740774] [<ffffffff8108c394>] ? manage_workers+0x23d/0x23d
[ 361.740776] [<ffffffff8108ff0a>] kthread+0xac/0xb4
[ 361.740778] [<ffffffff8108fe5e>] ? __init_kthread_worker+0x54/0x54
[ 361.740781] [<ffffffff815e542c>] ret_from_fork+0x7c/0xb0
[ 361.740783] [<ffffffff8108fe5e>] ? __init_kthread_worker+0x54/0x54
[ 361.740785] 2 locks held by kworker/u:4/1615:
[ 361.740785] #0: (events_unbound){.+.+.+}, at: [<ffffffff8108b770>] process_one_work+0x144/0x2d2
[ 361.740790] #1: ((&entry->work)){+.+.+.}, at: [<ffffffff8108b770>] process_one_work+0x144/0x2d2
Sure, acpiphp (but also pciehp) should prevent any drivers to pickup the newly formed
device and force 6sec delay, because the cards sometimes slip out of the slot. It is
meaningless to introduce the delay in every PCI driver, that should be handled by those
handling express slot hotplug. Until then, we will see in logs things like this.
The eSATA used to OOPs my kernel hardly in 3.4 and the Firewire still kills the kernel
with 3.7 and 3.9-rc1 (stacks not shown, were posted already in the past). Compared with
that, the kworker hung task is an improvement as it does not kill my running kernel
and my work. ;-)
Martin
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html