Attached a diff for the kernel driver which should fix the problem. After you have applied the diff to the VirtualBox kernel driver sources (which are located at /usr/src/vboxhost-4.3.26), please recompile the host kernel modules with

/etc/init.d/vboxdrv setup

and start your VM. Please make sure to run this on a Linux kernel with 'nosmap' and 'nosmep' removed.
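For anyone unsure whether the overrides are really gone from the running kernel, here is a minimal sketch that looks for them on the kernel command line. The helper name `smap_smep_overrides` is made up for illustration, and reading /proc/cmdline assumes a Linux host; it only checks boot parameters, not what the CPU actually has enabled.

```python
def smap_smep_overrides(cmdline):
    """Return the 'nosmap'/'nosmep' parameters found on a kernel command line."""
    # Boot parameters are whitespace-separated tokens.
    return [p for p in cmdline.split() if p in ("nosmap", "nosmep")]


if __name__ == "__main__":
    try:
        # /proc/cmdline holds the command line of the running kernel (Linux only).
        with open("/proc/cmdline") as f:
            current = f.read()
    except OSError:
        current = ""
    found = smap_smep_overrides(current)
    print("still present:", found if found else "none")
```

If the list is empty, the test run matches the conditions asked for above.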

Thanks. Unfortunately I don't understand why EFLAGS.AC is still not set. Could you repeat the experiment and attach all corresponding items from the same VM session:

The VBox.log file

The Linux kernel log

The vboxdrv.ko file if different than vboxdrv.ko.2.gz

This will help me to debug the problem because the VBox.log file contains the load addresses of the VMM modules. Unfortunately we cannot reproduce the problem as we still don't have Broadwell hardware.

I will attach it in a few seconds. Though I am not sure if this is identical to what I used... The logs were made with a package I compiled myself, I now have installed my distribution's packages. Both were built in a clean chroot.

eworm, that's important. I'm running VBox on a Linux 4.0.0 host and never saw such problems for many weeks now. It would be nice if you could provide at least a VBox.log file together with the output of 'dmesg' and the corresponding vboxdrv.ko as you provided before.

I've applied the latest patch in this thread and it has resolved the issue I was encountering with Virtualbox freezing the host. I have yet to encounter any further trouble but I will keep an eye out for the issue eworm mentions and will follow-up if I encounter it. Thanks!

Hi frank; having the same issue as eworm on my Thinkpad T450s, using the latest virtualbox 4.3.28, so can confirm the issue isn't fixed. I've uploaded my system.log, and am trying to find the other information such as my virtualbox log, will upload it as I find it.

fardog and eworm, your log files indicated that your VM processes still crash at the very same position. At the moment we cannot explain this. It's also interesting that only ArchLinux users seem to be affected, at least I'm not aware of users having 4.3.28 installed and having problems with SMAP. One developer installed ArchLinux on a Broadwell laptop and still was not able to reproduce the problem.

Could you install this 4.3 test build and try to reproduce the crash? In that case, please attach the VBox.log file, the output of 'dmesg' and the vboxdrv.ko module as you already did before. Thank you!

Last time my system crashed with linux 4.0.2-1 and Virtualbox 4.3.26 (+ patches). Given that it happens really seldom, I cannot tell whether or not the latest versions are still affected. Configuration did not change since then, though.

I am not sure how to reliably test this... Even rebooting the guest twenty times and more in a row without issues does not indicate it is fixed. I will think about it...

Wondering what influence the guest setup has... Does it matter? I took a look at the last crash logs available and saw that the BUG follows:

kernel: device bridge entered promiscuous mode

Where bridge is a bridge interface with static IP and dhcp daemon. Anything else that could have an effect?

Hi frank; I won't be able to test that build until later tonight or tomorrow, but will give it a go. For the time being, I've uploaded my kernel config (for version linux 4.0.2-1, I haven't upgraded to the latest 4.0.3-1 yet, although it looks like 4.0.4-1 is imminent in Arch's repos). This is the version that the crash logs above were from.

Please note: this config.gz was from the running system, which has nosmap set as a boot parameter since that's how I can get virtualbox to run (I depend on it heavily for work); I'm not sure if that shows up in the config file, but I didn't want it to confuse you. The crash logs above are from a different boot, when I was NOT running the nosmap flag.

Is this related? Possibly we have to disable automatic NUMA page balancing by setting pTask->mm->numa_next_scan (src/VBox/Runtime/r0drv/linux/memobj-r0drv-linux.c, line 1551) for every CPU?

No, completely unrelated. Look at your kernel crash dump:

CR4: 00000000003427e0, so bits 20 (SMEP) and 21 (SMAP) are set. That means that SMAP is activated.

BUG: unable to handle kernel paging request at 00007f8460fcd000. That means that the kernel is accessing memory which is mapped into userland. This is considered hacky, but for historical reasons VirtualBox still works this way. For example, on 32-bit hosts it would not be possible to map the complete guest address space into the 1G kernel address space.

EFLAGS: 00010202. That means that bit 18 of EFlags (AC) is clear. But with VBox 4.3.28 this bit is supposed to be set on SMAP-enabled hosts.

That means that the AC flag is cleared somewhere in the kernel code and currently we don't know where. We even installed ArchLinux on a SMAP-enabled laptop, unfortunately no success...
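The register reading above can be reproduced with a few bit masks. This is only a sketch for checking other crash dumps: the bit positions (CR4 bit 20 = SMEP, bit 21 = SMAP, EFLAGS bit 18 = AC) are the architectural ones, while the function name `decode` and the dictionary layout are made up for illustration.

```python
CR4_SMEP = 1 << 20   # Supervisor Mode Execution Prevention enabled
CR4_SMAP = 1 << 21   # Supervisor Mode Access Prevention enabled
EFLAGS_AC = 1 << 18  # AC flag; when set in ring 0 it suspends the SMAP check


def decode(cr4, eflags):
    """Extract the SMEP/SMAP/AC bits from raw CR4 and EFLAGS values."""
    return {
        "smep": bool(cr4 & CR4_SMEP),
        "smap": bool(cr4 & CR4_SMAP),
        "ac": bool(eflags & EFLAGS_AC),
    }


# Values taken from the crash dump quoted above.
state = decode(0x00000000003427E0, 0x00010202)
print(state)  # {'smep': True, 'smap': True, 'ac': False}
```

SMAP on with AC clear is exactly the combination in which a ring-0 access to a user page faults.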

Digging through kernel code I found a place where clac() is called, but there is no stac() before. Possibly that is the place where things go wrong?

No :-)

It works like this: stac() sets the AC flag. If the AC flag is set in R0, then the SMAP check (whether R0=kernel is allowed to access R3=userland memory) is disabled. clac() clears the AC flag and therefore enables the SMAP check. The latter is the default in recent Linux on Broadwell CPUs.

The place you found is just the last part of an error handler. The code for copying data from user to kernel obviously needs to have the AC flag set to temporarily disable the SMAP check. That's done for instance in copy_user_generic_string (see copy_user_64.S). The copy_user_handle_tail() function is called if there was a normal page fault while accessing the provided user data from the kernel.
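As a toy model of the stac()/clac() bracket described above (not kernel code, and not how VirtualBox implements it): the `Cpu` class, `read_user` and `copy_from_user_model` are hypothetical names, and the model ignores everything about SMAP except the AC flag.

```python
class Cpu:
    """Very simplified ring-0 CPU state: only SMAP and EFLAGS.AC are modelled."""

    def __init__(self, smap_enabled=True):
        self.smap_enabled = smap_enabled
        self.ac = False  # EFLAGS.AC, clear by default

    def stac(self):
        self.ac = True   # set AC: SMAP check suspended

    def clac(self):
        self.ac = False  # clear AC: SMAP check active again

    def read_user(self, user_memory, addr):
        # A ring-0 access to a user page faults if SMAP is on and AC is clear.
        if self.smap_enabled and not self.ac:
            raise PermissionError("SMAP violation: user page accessed with AC clear")
        return user_memory[addr]


def copy_from_user_model(cpu, user_memory, addr):
    """Bracket the user access with stac()/clac(), as the copy routines do."""
    cpu.stac()
    try:
        return cpu.read_user(user_memory, addr)
    finally:
        cpu.clac()  # also runs on the error path, like the tail handler


cpu = Cpu()
mem = {0x7F8460FCD000: 42}
print(copy_from_user_model(cpu, mem, 0x7F8460FCD000))  # 42
```

A clac() without a visible stac() in an error path is therefore expected: it only restores the default state after the bracket was opened elsewhere.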

I've hit this bug regularly (1/4 virtual machine boots avg) since this report was filed. Also followed duplicate/similar reports regarding Broadwell, but this report seems to have the most relevant info.

I just crashed 3/3 times, and each requires a hard power-off of the host. This is a data-loss-potential bug. I'm surprised to see it unresolved.