Description of problem:
General protection exception during input device removal.
Version-Release number of selected component (if applicable):
kernel-2.6.9-78.0.5.ELsmp
How reproducible:
Surprise removal of a USB root hub or USB input devices triggers this problem. It is infrequent, occurring perhaps once per several hundred device removals. This has only been seen on systems with 8 CPUs.
Steps to Reproduce:
1. Induce moderate (disk-IO) workload.
2. Perform surprise device removals.
Actual results:
Kernel panic occurs
Expected results:
No panic
Additional info:
Two memory dumps from this problem are available. Analysis of the dumps will be attached. In summary, the problem seems to occur because there is no locking or reference counting to protect input_devices_read from referencing structures concurrently with their deallocation by unregistering input devices.
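The kind of protection described as missing can be illustrated with a minimal userspace sketch. This is not the kernel code and not the fix that was eventually shipped; it assumes a pthread mutex standing in for whatever kernel primitive a real patch would use. All names here (demo_dev, demo_register, demo_unregister, demo_devices_read) are hypothetical. The point is only that the reader and the unregister path take the same lock, so an entry cannot be freed while the list is being walked.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical userspace stand-in for an input device list entry. */
struct demo_dev {
    char name[32];
    struct demo_dev *next;
};

static struct demo_dev *demo_list;
static pthread_mutex_t demo_lock = PTHREAD_MUTEX_INITIALIZER;

/* Register: link a new entry while holding the lock. Returns 0 on success. */
int demo_register(const char *name)
{
    struct demo_dev *d = malloc(sizeof(*d));

    if (d == NULL)
        return -1;
    snprintf(d->name, sizeof(d->name), "%s", name);

    pthread_mutex_lock(&demo_lock);
    d->next = demo_list;
    demo_list = d;
    pthread_mutex_unlock(&demo_lock);
    return 0;
}

/* Unregister: unlink and free under the same lock the reader takes.
 * Returns 1 if an entry was removed, 0 if the name was not found. */
int demo_unregister(const char *name)
{
    struct demo_dev **pp, *d;
    int found = 0;

    pthread_mutex_lock(&demo_lock);
    for (pp = &demo_list; (d = *pp) != NULL; pp = &d->next) {
        if (strcmp(d->name, name) == 0) {
            *pp = d->next;   /* unlink before freeing */
            free(d);
            found = 1;
            break;
        }
    }
    pthread_mutex_unlock(&demo_lock);
    return found;
}

/* Reader: loosely analogous to a /proc read handler. It walks the list
 * with the lock held, so no entry can be freed mid-traversal.
 * Returns the number of bytes written (excluding the terminating NUL). */
size_t demo_devices_read(char *buf, size_t len)
{
    size_t off = 0;
    const struct demo_dev *d;

    assert(buf != NULL && len > 0);
    buf[0] = '\0';

    pthread_mutex_lock(&demo_lock);
    for (d = demo_list; d != NULL; d = d->next) {
        int n = snprintf(buf + off, len - off, "N: Name=\"%s\"\n", d->name);

        if (n < 0 || (size_t)n >= len - off) {
            buf[off] = '\0';   /* drop the entry that did not fit */
            break;
        }
        off += (size_t)n;
    }
    pthread_mutex_unlock(&demo_lock);
    return off;
}
```

Without the lock in demo_devices_read, a concurrent demo_unregister could free an entry the reader is still dereferencing, which is the same class of use-after-free the dumps appear to show.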

Created attachment 323846
Crash analysis from USB device re-route
This is the analysis of a panic on 2008-11-12. The trigger for this panic was an AC switch. This operation moves the external USB devices from one root hub to another. Apparently the panic occurred during unregistration of the KB and mouse, before they were re-registered on the other root hub.

Created attachment 323848
Crash analysis from USB Root hub removal
This is the analysis of a panic on 2008-11-16. The active IO subsystem was broken. As a result, the PCI devices in that chassis are removed and the USB devices are switched over to the control of the other IO chassis. Apparently the panic occurred due to un-registration of the KB and mouse; however, the memory image shows them re-registered on the surviving USB root hub (PCI device 0000:0b:1d.0).

I have not reached a conclusion on whether this is a regression.
Stratus hit bug 453507 early in this test cycle. To eliminate that, we have moved to the latest errata kernel for RHEL4.7 since that is what our customers would be running. Consequently we do not have enough test time on the kernel released with RHEL4.7 to determine whether this is a regression in the errata kernel.
Given that we have run similar tests (but on slower processors) with RHEL4.6, it seems this problem may have been introduced in RHEL4.7. However, the problem may already have been in the RHEL4.6 code base, and the faster processors may be necessary to open the window enough to hit the race condition.

I don't believe this is a regression; rather, it's a latent issue that only shows up when you (a) have a lot of CPUs and (b) are doing very fast surprise device removals while also reading /proc/bus/input/devices.
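The reader side of that condition can be generated with a small loop that reopens and rereads the file while the removals are being performed. The following is a rough sketch of such a helper, not the test tool actually used; the function name reread_file and the buffer size are assumptions made for illustration.

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: open, fully read, and close `path` the given
 * number of times. Returns the total bytes read, or -1 on any
 * open/read error. */
long reread_file(const char *path, int iterations)
{
    char buf[4096];
    long total = 0;
    int i;

    for (i = 0; i < iterations; i++) {
        ssize_t n;
        int fd = open(path, O_RDONLY);

        if (fd < 0)
            return -1;
        /* Drain the file; each pass re-runs the kernel's read handler. */
        while ((n = read(fd, buf, sizeof(buf))) > 0)
            total += n;
        close(fd);
        if (n < 0)
            return -1;
    }
    return total;
}
```

Calling reread_file("/proc/bus/input/devices", N) with a large N from one or more processes during the surprise removals keeps the read path busy; how many readers are needed to hit the window is not known from this report.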
This bug is similar to the RHEL5 Bug 468915. Note that the input.c code is very different between the two kernels, though, so a different fix will be required for this one. The underlying issue remains the same: in both RHEL4 and RHEL5 kernels there is insufficient locking of the input device lists.
This bug is a bit more difficult to reproduce than Bug 468915, though. I have not been able to reproduce it in the Red Hat lab using the 4-CPU system at my disposal. Bob has been able to reproduce it within hours in the Stratus lab using a faster 8-CPU system.
I'm working on a patch.

This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release. Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products. This request is not yet committed for inclusion in an Update
release.

~~ Attention Partners! Snap 1 Released ~~
RHEL 4.8 Snapshot 1 has been released on partners.redhat.com. There should
be a fix present, which addresses this bug. NOTE: there is only a short time
left to test, please test and report back results on this bug
at your earliest convenience.
If you encounter any issues, please set the bug back to the ASSIGNED state and
describe the issues you encountered. If you have found a NEW bug, clone this
bug and describe the issues you encountered. Further questions can be
directed to your Red Hat Partner Manager.
If you have VERIFIED the bug fix, please select your PartnerID from the
Verified field above. Please leave a comment with your test results details.
Include which arches tested, package version and any applicable logs.
- Red Hat QE Partner Management

~~ Attention! Snap 4 Released ~~
RHEL 4.8 Snapshot 4 has been released on partners.redhat.com. There should
be a fix present that addresses this bug. NOTE: there is only a short time
left to test, please test and report back results on this bug ASAP.
The latest kernel build can be obtained here:
http://people.redhat.com/vgoyal/rhel4/
If you encounter any issues, please set the bug back to the ASSIGNED state and
describe the issues you encountered. If you have found a NEW bug, clone this
bug and describe the issues you encountered. Further questions can be
directed to your Red Hat Partner Manager.
If you have VERIFIED the bug fix, please select your PartnerID from the
Verified field above. Please leave a comment with your test results details.
Include which arches tested, package version and any applicable logs.

An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.
http://rhn.redhat.com/errata/RHSA-2009-1024.html
