I have checked through the forums and my kernel is properly configured. I hear that it has something to do with the order in which EHCI and something else are loaded up. Has there been a satisfactory solution to this yet, and, if so, what is it?

Alex

Last edited by evoweiss on Wed Jul 17, 2013 5:38 pm; edited 4 times in total

I had similar problems trying to mount the memory stick of my cellphone.

Code:

update-usbids

helped me out

Hi, the problem went away and returned, though using update-usbids did not help either this time or last. It's really frustrating and I'm just wondering what the cause is and how to fix the damn thing. The USB drive is fairly new, too.

Again, after waiting for a pretty good while, the problem seemed to resolve itself. I'm wondering if this has something to do with the drive going into suspend mode before being unmounted or, again, some udev configuration problem. Any help/tips would be appreciated.

Does anybody have a clue as to what may be going on? The fact that it was working for a while and over reboots and that it worked as a USB 1.0 device would suggest this is a configuration problem of some sort. Any help would be appreciated.

This problem is usually caused when ehci binds to the device instead of ohci. If you never need ehci and ehci-hcd is a module, not compiled-in, then you can just blacklist ehci-hcd and the problem will go away.

If you need ehci for other devices or if it is compiled-in then you need to play around in /sys. First do an "ls -F /sys/bus/pci/drivers/*_hcd/". On my system, I get:

The devices are the symlinks (with trailing @ signs in the ls listing). The trick is to unbind the slow USB device from ehci and bind it to ohci (or, if needed, uhci). For example, if I wanted to unbind 0000:00:12.2 from ehci and rebind it to ohci, I would, as root, run:
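For anyone following along, here is a rough sketch of the listing and the unbind/rebind step as shell functions. The function names and the SYSFS_DRV variable are made up for illustration; unbind and bind are the standard sysfs driver attributes. Driver directory names differ between kernels (ehci_hcd on older ones, ehci-pci on some newer ones), so list first and use whatever names actually show up on your system.

```shell
# SYSFS_DRV is a hypothetical override so the functions can be tried on a
# scratch directory before touching the real /sys.
SYSFS_DRV="${SYSFS_DRV:-/sys/bus/pci/drivers}"

# Print "<driver> <pci-address>" for every device bound to a *hci* driver.
list_hcd_bindings() {
    for dev in "$SYSFS_DRV"/*hci*/*:*; do
        [ -L "$dev" ] || continue        # bound devices are symlinks
        echo "$(basename "$(dirname "$dev")") $(basename "$dev")"
    done
}

# Detach one PCI device from driver $2 and attach it to driver $3.
rebind_hcd() {
    addr="$1"; from="$2"; to="$3"
    printf '%s' "$addr" > "$SYSFS_DRV/$from/unbind" &&
    printf '%s' "$addr" > "$SYSFS_DRV/$to/bind"
}

# Example with the address from the post above (run as root):
#   list_hcd_bindings
#   rebind_hcd 0000:00:12.2 ehci_hcd ohci_hcd
```

The kernel may refuse the bind write if the target driver does not claim that PCI function; in that case rebind_hcd returns non-zero and the device is simply left unbound from the first driver.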

It's the high speed mode I need. The USB drive runs very slowly on ohci, so ehci is preferred. Fortunately, I discovered that the problem resolved itself when upgrading to kernel 3.9.6 (I read about this somewhere). It's odd and I apologize for taking your time as I should have marked this as solved. However, your advice is helpful in case the problem returns.

Best,

Alex

That is strange. If ehci-hcd is compiled as a module, then make sure that module gets loaded. Did the lspci output change after the power outage? This site has the lspci output of many computers. If your make/model is listed, you could compare what you get with what others get.

If power outages trigger the problem then what fixes it?

I'm now wondering if maybe it is a hardware problem. A variable lspci output would indicate that. Another possibility is that there is file system corruption, but why that would always target ehci is a mystery.

Quote:

That is strange. If ehci-hcd is compiled as a module, then make sure that module gets loaded.

EHCI, as well as the other USB-related stuff, is compiled into the kernel.

Code:

# USB HID support
#
CONFIG_USB_HID=y
# CONFIG_HID_PID is not set
# CONFIG_USB_HIDDEV is not set
CONFIG_USB_ARCH_HAS_OHCI=y
CONFIG_USB_ARCH_HAS_EHCI=y
CONFIG_USB_ARCH_HAS_XHCI=y
CONFIG_USB_SUPPORT=y
CONFIG_USB_COMMON=y
CONFIG_USB_ARCH_HAS_HCD=y
CONFIG_USB=y
# CONFIG_USB_DEBUG is not set
# CONFIG_USB_ANNOUNCE_NEW_DEVICES is not set

#
# Miscellaneous USB options
#
# CONFIG_USB_DYNAMIC_MINORS is not set
# CONFIG_USB_SUSPEND is not set
# CONFIG_USB_DWC3 is not set
CONFIG_USB_MON=y
# CONFIG_USB_WUSB_CBAF is not set

#
# USB Host Controller Drivers
#
# CONFIG_USB_C67X00_HCD is not set
# CONFIG_USB_XHCI_HCD is not set
CONFIG_USB_EHCI_HCD=y
# CONFIG_USB_EHCI_ROOT_HUB_TT is not set
# CONFIG_USB_EHCI_TT_NEWSCHED is not set
CONFIG_USB_EHCI_PCI=y
# CONFIG_USB_OXU210HP_HCD is not set
# CONFIG_USB_ISP116X_HCD is not set
# CONFIG_USB_ISP1760_HCD is not set
# CONFIG_USB_ISP1362_HCD is not set
CONFIG_USB_OHCI_HCD=y
# CONFIG_USB_OHCI_HCD_SSB is not set
# CONFIG_USB_OHCI_HCD_PLATFORM is not set
# CONFIG_USB_EHCI_HCD_PLATFORM is not set
# CONFIG_USB_OHCI_BIG_ENDIAN_DESC is not set
# CONFIG_USB_OHCI_BIG_ENDIAN_MMIO is not set
CONFIG_USB_OHCI_LITTLE_ENDIAN=y
CONFIG_USB_UHCI_HCD=y
# CONFIG_USB_SL811_HCD is not set
# CONFIG_USB_R8A66597_HCD is not set
# CONFIG_USB_HCD_SSB is not set
# CONFIG_USB_CHIPIDEA is not set

#
# USB Device Class drivers
#
# CONFIG_USB_ACM is not set
# CONFIG_USB_PRINTER is not set
# CONFIG_USB_WDM is not set
# CONFIG_USB_TMC is not set

Quote:

Did the lspci output change after the power outage? This site has the lspci output of many computers. If your make/model is listed, you could compare what you get with what others get.

No change at all.

Quote:

If power outages trigger the problem then what fixes it?

I have yet to figure that out. Last time I think it was compiling a fresh (new) kernel. However, that may have just been a coincidence.

Quote:

I'm now wondering if maybe it is a hardware problem. A variable lspci output would indicate that. Another possibility is that there is file system corruption, but why that would always target ehci is a mystery.

I think both are unlikely but have no way to check. The thing is, there used to be no problem. Moreover, if I disable USB 2.0 support, it works, though the USB drive is then very slow.

I have tracked down where the problem was but I don't know exactly what it was. Anyway, an earlier kernel of mine (3.8.13) did not have the problem. I copied over the configuration file, did a make oldconfig, recompiled the kernel, etc. and all is well again.

This is the third really strange problem I've seen this week. It feels like reality is leaking. One of the really strange problems also involved this ehci stuff. A USB stick wasn't being recognized, but the device would reliably show up after doing an "ls /sys/bus/pci/drivers/*_hcd". Yes, just the ls command made the device show up. I hit the other one when I was trying to debug the first one.

Quote:

This is the third really strange problem I've seen this week. It feels like reality is leaking. One of the really strange problems also involved this ehci stuff. A USB stick wasn't being recognized, but the device would reliably show up after doing an "ls /sys/bus/pci/drivers/*_hcd". Yes, just the ls command made the device show up. I hit the other one when I was trying to debug the first one.

Strange these things. In any event, should I have any updates for you, I will let you know.

I just rebooted and it happened again. I then rebooted once more and no problem.

Also, it seems to work consistently off one of my older kernels (3.8.13). Here's the output from a diff of the two kernel config files.

Code:

3c3
< # Linux/i386 3.8.13-gentoo Kernel Configuration
---
> # Linux/x86 3.9.6-gentoo Kernel Configuration
41d40
< CONFIG_HAVE_IRQ_WORK=y
48d46
< CONFIG_EXPERIMENTAL=y
111a110
> CONFIG_RCU_STALL_COMMON=y
194a194
> # CONFIG_HAVE_64BIT_ALIGNED_ACCESS is not set
195a196
> CONFIG_ARCH_USE_BUILTIN_BSWAP=y
199a201
> CONFIG_HAVE_KPROBES_ON_FTRACE=y
223d224
< CONFIG_GENERIC_SIGALTSTACK=y
224a226,227
> CONFIG_OLD_SIGSUSPEND3=y
> CONFIG_OLD_SIGACTION=y
280a284,285
> # CONFIG_X86_GOLDFISH is not set
> # CONFIG_X86_INTEL_LPSS is not set
368a374
> # CONFIG_HAVE_BOOTMEM_INFO_NODE is not set
567d572
< # CONFIG_WAN_ROUTER is not set
573a579
> # CONFIG_VSOCKETS is not set
611a618
> CONFIG_FW_LOADER_USER_HELPER=y
652a660
> # CONFIG_BLK_DEV_RSXX is not set
662a671
> # CONFIG_ATMEL_SSC is not set
681a691
> # CONFIG_VMWARE_VMCI is not set
1032a1043
> CONFIG_MOUSE_PS2_CYPRESS=y
1065a1077
> CONFIG_TTY=y
1088a1101
> CONFIG_SERIAL_8250_DEPRECATED_OPTIONS=y
1095a1109
> # CONFIG_SERIAL_8250_DW is not set
1109a1124
> # CONFIG_SERIAL_RP2 is not set
1154a1170
> CONFIG_GPIO_DEVRES=y
1164a1181
> # CONFIG_BATTERY_GOLDFISH is not set
1172,1174c1189,1193
< # CONFIG_FAIR_SHARE is not set
< CONFIG_STEP_WISE=y
< # CONFIG_USER_SPACE is not set
---
> # CONFIG_THERMAL_GOV_FAIR_SHARE is not set
> CONFIG_THERMAL_GOV_STEP_WISE=y
> # CONFIG_THERMAL_GOV_USER_SPACE is not set
> # CONFIG_THERMAL_EMULATION is not set
> # CONFIG_INTEL_POWERCLAMP is not set
1234d1252
< # CONFIG_STUB_POULSBO is not set
1401a1420
> # CONFIG_HID_STEELSERIES is not set
1432,1433c1451,1454
< # CONFIG_USB_SUSPEND is not set
< CONFIG_USB_MON=y
---
> CONFIG_USB_SUSPEND=y
> # CONFIG_USB_OTG is not set
> # CONFIG_USB_DWC3 is not set
> # CONFIG_USB_MON is not set
1441c1462,1465
< # CONFIG_USB_EHCI_HCD is not set
---
> CONFIG_USB_EHCI_HCD=y
> # CONFIG_USB_EHCI_ROOT_HUB_TT is not set
> # CONFIG_USB_EHCI_TT_NEWSCHED is not set
> CONFIG_USB_EHCI_PCI=y
1446,1451c1470,1471
< CONFIG_USB_OHCI_HCD=y
< # CONFIG_USB_OHCI_HCD_SSB is not set
< # CONFIG_USB_OHCI_HCD_PLATFORM is not set
< # CONFIG_USB_OHCI_BIG_ENDIAN_DESC is not set
< # CONFIG_USB_OHCI_BIG_ENDIAN_MMIO is not set
< CONFIG_USB_OHCI_LITTLE_ENDIAN=y
---
> # CONFIG_USB_OHCI_HCD is not set
> # CONFIG_USB_EHCI_HCD_PLATFORM is not set
1515a1536
> # CONFIG_USB_SISUSBVGA is not set
1526a1548,1549
> # CONFIG_OMAP_USB3 is not set
> # CONFIG_OMAP_CONTROL_USB is not set
1565a1589
> # CONFIG_MAILBOX is not set
1569c1593
< # Remoteproc drivers (EXPERIMENTAL)
---
> # Remoteproc drivers
1574c1598
< # Rpmsg drivers (EXPERIMENTAL)
---
> # Rpmsg drivers
1749d1772
< # CONFIG_SPARSE_RCU_POINTER is not set
1753a1777,1781
>
> #
> # RCU Debugging
> #
> # CONFIG_SPARSE_RCU_POINTER is not set
1761a1790
> CONFIG_HAVE_DYNAMIC_FTRACE_WITH_REGS=y
1858a1888,1889
> # CONFIG_CRYPTO_CRC32 is not set
> # CONFIG_CRYPTO_CRC32_PCLMUL is not set
1925d1955
< CONFIG_PERCPU_RWSEM=y

Just another update, though I don't know what good it will do. I had to shut the computer down for a bit and then back on. The problem resumed. However, after rebooting into the 3.8.13 kernel, and then rebooting back into 3.9.6, the problem went away again.

It seems like there is something wrong with the kernel configuration or otherwise, but I cannot suss it out. Any idea on what the best way to move forward and figure this out would be?

I suggest you look at the dmesg output and try to spot the differences between when it is broken and when it is working. You should probably focus your attention on lines that contain "hci_hcd".
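One way to make that comparison less painful is to save just the host-controller lines from each boot and diff them. This is only a sketch: the function name, the file names, and the exact grep pattern are my assumptions (the pattern is a bit wider than "hci_hcd" so it also catches the ehci-pci style names some kernels print).

```shell
# Keep only host-controller lines from a kernel log read on stdin and strip
# the leading "[   12.345678]" timestamps so that two boots line up in diff.
# The pattern is an assumption: it matches ehci_hcd, ohci_hcd, uhci_hcd,
# xhci_hcd and the ehci-pci / ohci-pci names used by some kernels.
hcd_lines() {
    grep -E '[eoux]hci[_-](hcd|pci)' | sed 's/^\[ *[0-9.]*\] *//'
}

# Hypothetical usage (file names are placeholders):
#   dmesg | hcd_lines > hcd-working.txt    # after a good boot
#   dmesg | hcd_lines > hcd-broken.txt     # after a bad boot
#   diff hcd-working.txt hcd-broken.txt
```

Stripping the timestamps matters because they change on every boot and would otherwise make every line show up as different.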

So the problem starts, usually after a power outage and then lasts through reboots. But if you rebuild the kernel then it goes away. Is that right?

It sort of sounds like a filesystem problem. Some file or files get corrupted when the system goes down and then they get repaired when you rebuild. The part that doesn't make sense is: why would it be the same file getting corrupted in the same place every time?

Another approach is to install tripwire (or something like it). Tripwire will keep a database of checksums of files. It was designed as an intrusion detection system so you would be alerted if a file got changed (presumably by an intruder). Have it keep track of your kernel and all of your modules. If the problem occurs and none of those have changed then you can probably rule out file system corruption. There still might be corruption but if the kernel and the modules don't change then I don't see how reinstalling the kernel would fix it.
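If a full Tripwire setup feels like overkill, a plain checksum list covers the same ground for this particular test. A minimal sketch, assuming GNU coreutils and findutils; the function names and example paths are placeholders, and this is a stand-in for Tripwire rather than how Tripwire itself works.

```shell
# Record SHA-256 checksums for every regular file under the given paths.
# Usage: snapshot_sums OUTFILE PATH...
# Keep OUTFILE outside the directories being scanned.
snapshot_sums() {
    out="$1"; shift
    find "$@" -type f -print0 | sort -z | xargs -0 -r sha256sum > "$out"
}

# Re-check a saved snapshot. Prints only files that are missing or changed
# and returns non-zero if there are any.
check_sums() {
    sha256sum --quiet -c "$1"
}

# Hypothetical usage while things are working (paths are assumptions):
#   snapshot_sums /root/kernel.sha256 /boot /lib/modules/"$(uname -r)"
# and later, when the problem shows up again:
#   check_sums /root/kernel.sha256
```

If check_sums stays silent while the USB problem is present, the kernel image and modules on disk have not changed since the snapshot.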

You could also install smartmontools (if you haven't already) and check the health of your hard drive. The output is rather cryptic but I'm sure there are instructions or tools somewhere for deciphering it.
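As a starting point, smartctl -H prints an overall health verdict, which is far easier to read than the full attribute table. The helper below just pulls that verdict out of the output; its name and the /dev/sda device are assumptions, so substitute your own drive.

```shell
# Reduce `smartctl -H` output (read on stdin) to the verdict after the colon.
# Matches the ATA wording ("... test result: PASSED") and the SCSI wording
# ("SMART Health Status: OK").
smart_verdict() {
    grep -E 'overall-health self-assessment test result|SMART Health Status' |
        sed 's/.*: *//'
}

# Hypothetical usage, as root (device name is an assumption):
#   smartctl -H /dev/sda | smart_verdict
#   smartctl -a /dev/sda        # full attribute table and error log
```

A passing verdict does not prove the disk is fine, but a failing one is a strong hint to look at the drive before anything else.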

Quote:

I suggest you look at the dmesg output and try to spot the differences between when it is broken and when it is working. You should probably focus your attention on lines that contain "hci_hcd".

I will definitely try that next time around.

Quote:

So the problem starts, usually after a power outage and then lasts through reboots. But if you rebuild the kernel then it goes away. Is that right?

Apparently not (see recent emails). I have one kernel, 3.8.13, that seems to have no problem at all from what I can tell. Maybe it's a fluke, as I haven't systematically checked, but that's the nature of the beast (hard to predict when things will foul up). If, after booting up with the error messages, I reboot into that kernel, all seems to go fine. Then the next time I boot into 3.9.6 all is well, too.

Quote:

It sort of sounds like a filesystem problem. Some file or files get corrupted when the system goes down and then they get repaired when you rebuild. The part that doesn't make sense is: why would it be the same file getting corrupted in the same place every time?

No idea...

Quote:

Another approach is to install tripwire (or something like it). Tripwire will keep a database of checksums of files. It was designed as an intrusion detection system so you would be alerted if a file got changed (presumably by an intruder). Have it keep track of your kernel and all of your modules. If the problem occurs and none of those have changed then you can probably rule out file system corruption. There still might be corruption but if the kernel and the modules don't change then I don't see how reinstalling the kernel would fix it.

I'll try this if dmesg doesn't yield any clues.

Quote:

You could also install smartmontools (if you haven't already) and check the health of your hard drive. The output is rather cryptic but I'm sure there are instructions or tools somewhere for deciphering it.

I'll give that a go, too, if I don't get anything with dmesg. Watch this space.