Updating to the new kernel caused kernel panics related to DMAR. Disabling all virtualization options in the BIOS allowed the kernel to boot. The latest 3.x kernels have all been quite unstable for me, but disabling the virtualization options seems to improve stability as well. There is talk that kernel options such as intel_iommu=off help with this issue, but I haven't tried them.

intel_iommu=    [DMAR] Intel IOMMU driver (DMAR) option
        on
                Enable intel iommu driver.
        off
                Disable intel iommu driver.
        igfx_off [Default Off]
                By default, gfx is mapped as normal device. If a gfx
                device has a dedicated DMAR unit, the DMAR unit is
                bypassed by not enabling DMAR with this option. In
                this case, gfx device will use physical address for
                DMA.
        forcedac [x86_64]
                With this option iommu will not optimize to look
                for io virtual address below 32-bit forcing dual
                address cycle on pci bus for cards supporting greater
                than 32-bit addressing. The default is to look
                for translation below 32-bit and if not available
                then look in the higher range.
        strict [Default Off]
                With this option on every unmap_single operation will
                result in a hardware IOTLB flush operation as opposed
                to batching them for performance.
        sp_off [Default Off]
                By default, super page will be supported if Intel IOMMU
                has the capability. With this option, super page will
                not be supported.

Try the kernel parameters (intel_iommu=off in particular). I don't have VT-d capable hardware for testing and can't debug this myself, so the panic message itself would be quite interesting as well.

To capture the message, Pause or Scroll Lock might work; photographing the screen with a digital camera could also help.

That lkml message, or at least the Red Hat bugzilla it references, points quite strongly towards hardware (BIOS) bugs; interestingly, all reports seem to affect i5-520M CPUs as well. Are you running the newest BIOS version for your notebook?

Unless today's lkml post leads to new discoveries or patches, the only fix would be to disable VT-d support (DMAR) completely, which you can already do now by passing intel_iommu=off to your kernel. However, VT-d is not something to give up easily, as it provides quite some functionality on capable hardware. Interestingly, DMAR has already been enabled on our i386 kernels for quite some time without any corresponding negative reports, so I'm currently soliciting internal feedback for the i7-860, i7-920 and i7-2720QM, which should support VT-d as well. Among the larger mainstream distros, at least Red Hat/Fedora appears to default to DMAR as well (at least since Fedora 13, in other words for more than a year by now).
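For reference, here is a minimal sketch of making intel_iommu=off persistent. It assumes GRUB 2 is the boot loader on a Debian-based install; the "quiet" value is only a placeholder for whatever options are already on your command line. For a one-off test you can instead press "e" in the GRUB menu and append the option to the "linux" line.

```shell
# /etc/default/grub (fragment; this file uses shell variable syntax)
# "quiet" is a placeholder for the options already set on your system.
GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=off"

# After saving the file, regenerate the boot menu as root and reboot:
#   update-grub
```

Removing the option again and re-running update-grub restores the previous behaviour.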

I have been running this system with Intel VT-d disabled. So tonight I enabled it and updated to the 3.1 kernel. I didn't get a panic, but it wasn't pretty, either. In the first shot, it hangs at this point for about 20 seconds (sorry for the bad focus):

then it goes on like this, and finishes booting:

I rebooted 3 times -- it is very consistent. Here is the relevant section of dmesg:

Leaving VT-d enabled in the BIOS, neither 3.1-0.slh.1-aptosid-amd64 nor 3.1-0.slh.2-aptosid-amd64 can boot with "iommu=off". I never saw a kernel panic, but I saw "IOMMU_MAP..." and "ATA_13..." errors flying by for 20 - 30 seconds with no progress on the boot.

So, 3.1-0.slh.2-aptosid-amd64 with plain vanilla boot codes is working well with VT-d enabled, on this hardware.
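Since this thread mentions both "intel_iommu=off" and "iommu=off", which are different strings, it can help to confirm exactly which option a running kernel was booted with. A small sketch follows; has_param is a hypothetical helper name, not an existing tool.

```shell
# Return success if the exact parameter $1 appears as a whole word
# in the kernel command line string $2.
has_param() {
    case " $2 " in
        *" $1 "*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Usage on a running system: read the command line the kernel booted with.
if [ -r /proc/cmdline ]; then
    cmdline=$(cat /proc/cmdline)
    if has_param "intel_iommu=off" "$cmdline"; then
        echo "intel_iommu=off is active"
    else
        echo "intel_iommu=off not found in: $cmdline"
    fi
fi
```

Matching on whole words means "iommu=off" is not mistaken for "intel_iommu=off" or the other way round.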

Bod

Posted: 25.10.2011, 16:37

During boot I get this error:

Waiting for /dev to be fully populated...[ 5.685530] cfg80211: failed to add phy80211 symlink to netdev!

This happens with kernels 3.1-0.slh.1-aptosid-686 and 3.1-0.slh.2-aptosid-686.

My machine is an HP Compaq nw8240 laptop.

Please keep different issues separate (new topic) and go a bit more into depth (what are the practical effects/breakage, full dmesg, a little more about your hardware/wlan card, etc. pp.); it's easier to follow the debugging that way.

Bod

Posted: 25.10.2011, 17:09

Quote:

Please keep different issues separate (new topic) and go a bit more into depth (what are the practical effects/breakage, full dmesg, a little more about your hardware/wlan card, etc. pp.); it's easier to follow the debugging that way.

Thank you. I thought it was a problem with the new kernel, since the problem did not exist with the old one (3.0).

I want to be clear that this bug report is very specific to Sandy Bridge processors with VT-d enabled. Please post details on your processor and whether virtualization is enabled, and especially whether "VT for direct IO" is enabled. This is the specific setting that causes issues on my computer.
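A rough sketch for collecting those details from a system that does boot. The helper names cpu_model and iommu_lines are made up for this example. The BIOS setting itself cannot be read this way, but DMAR lines in the kernel log usually suggest that the firmware exposes VT-d to the kernel.

```shell
# Print the CPU model from cpuinfo-style text on stdin (first match only).
cpu_model() {
    sed -n 's/^model name[[:space:]]*:[[:space:]]*//p' | head -n 1
}

# Keep only kernel log lines that mention DMAR or IOMMU (case-insensitive).
iommu_lines() {
    grep -iE 'dmar|iommu'
}

# Usage on the affected machine (dmesg may need root on some setups).
if [ -r /proc/cpuinfo ]; then
    cpu_model < /proc/cpuinfo
fi
dmesg 2>/dev/null | iommu_lines || true
```

Posting the output of both together with the exact kernel version makes reports easier to compare.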

slh,

I've updated the BIOS; this has not changed the error.
Pause and Scroll Lock didn't help, but the digital camera did.

Here is the output that is repeated constantly. I've tried leaving the computer for hours to see if it would eventually boot; it doesn't.