
Yes, use kernel 3.16 from Debian backports. We've been running a few Haswell nodes with this kernel for two months without any problems. Before that we had daily VM freezes.
This works only for KVM; OpenVZ is not included in this kernel. If you're on OpenVZ, there is no fix afaik.
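For reference, the backports kernel can be pulled in roughly like this (a minimal sketch, assuming a PVE 3.x host on Debian Wheezy; the mirror URL may differ for your setup):

Code:
    # add the wheezy-backports repo and install the 3.16 kernel on the PVE host
    echo "deb http://http.debian.net/debian wheezy-backports main" > /etc/apt/sources.list.d/backports.list
    apt-get update
    apt-get -t wheezy-backports install linux-image-amd64
    reboot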

RobHost GmbH | Managed Hosting


@robhost - As I said, I am not using OpenVZ, as pointed out in the separate link I provided, which fully describes the issue. On the other hand, I am not running only Debian, so I cannot simply use backports everywhere; as I mentioned, I am also using CentOS 6.7 with the 2.6.x kernel branch, and since these environments are production ones, that is out of the question.

I have checked the C-states configuration in the BIOS of the Dell servers. They were disabled, as I have set the performance profile to "Performance" on both servers.
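For anyone wanting to cross-check the BIOS setting from the OS side, the effective C-state configuration can also be inspected at runtime (a sketch; requires the cpupower tool, and the intel_idle sysfs path may vary by kernel):

Code:
    # list the C-states the kernel driver actually exposes, with usage counters
    cpupower idle-info
    # deepest C-state the intel_idle driver is allowed to enter (0 = effectively disabled)
    cat /sys/module/intel_idle/parameters/max_cstate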


@robhost - As I said, I am not using OpenVZ, as pointed out in the separate link I provided, which fully describes the issue...

Click to expand...

You missed the point: use the backported kernel on the PVE host, NOT on your VMs.

RobHost GmbH | Managed Hosting


It still leaves some shadow over the final answer, but it brings another argument into play: the qcow2 compat version (compat: 0.10 vs. compat: 1.1). Although, in my case, I installed bare metal from a 3.4.x branch CD from the start and did not upgrade from 3.1.
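For what it's worth, the compat level of an existing image can be checked, and upgraded in place, with a reasonably recent qemu-img (a sketch using a hypothetical disk path; the VM should be stopped before amending):

Code:
    # show the image header, including "compat: 0.10" or "compat: 1.1"
    qemu-img info /var/lib/vz/images/100/vm-100-disk-1.qcow2
    # rewrite the image header to the newer compat level
    qemu-img amend -o compat=1.1 /var/lib/vz/images/100/vm-100-disk-1.qcow2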

Hi, I'm running 15x Dell R630, with Intel(R) Xeon(R) CPU E5-2687W v3 @ 3.10GHz processors and kernel 3.10, and I don't have any problem.
All on NFS or Ceph, no local disk.

(running around 1000 VMs, Debian and Windows guests)

Click to expand...

Still, that doesn't mean the issue doesn't exist.

There is certainly a difference between my R730xd's (running on both local and remote storage) and your R630's in terms of chipset and the CPUs used, but I have a hard time believing that the instruction set of my E5-2630 v3 CPUs is so new that it causes trouble here.

There is definitely a catch on the software side, as the Dell BIOS is very straightforward.

Although, that doesn't apply here. As I described, I am running two setups in different locations, one with local storage and one with remote storage. So the problem cannot come only from the LSI firmware bug: one of the setups runs on central storage and the problem still exists there (with the 2.6.x kernel on CentOS 6.7 that I described before).

So, although I have some servers on the x-density framework, not all of them are using local storage as their source storage.

Although, that doesn't apply here. As I described, I am running two setups in different locations, one with local storage and one with remote storage...

Click to expand...

Ok.

About the BIOS, have you done the latest upgrade? (There are some CPU microcode updates for Intel processors.)

As I said, I have returned with more info on the topic, in order to shed some light on the research I have managed to gather so far.

Following my last post, I focused on how the internal disk scheduler is set inside the VMs, changing it from the default values to deadline on all of them. This improved stability quite a bit, but it was not enough to stop this bug from manifesting.
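The scheduler can be switched inside a guest roughly like this (a sketch, assuming a virtio disk named vda; on CentOS 6 guests the persistent setting goes on the kernel command line rather than /etc/default/grub):

Code:
    # inside the guest: check the current scheduler (the active one is shown in brackets)
    cat /sys/block/vda/queue/scheduler
    # switch to deadline at runtime
    echo deadline > /sys/block/vda/queue/scheduler
    # to persist across reboots, boot with elevator=deadline on the kernel command line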

Since the last post I have been constantly checking the status of the VMs and their resource utilisation via my NMS. What I observed is that at the moment a VM gets stuck, even a lightly loaded one, its memory buffer and cache values start to rise steadily (even if the lock is later cleared via the add/del disk method described earlier). While the vCPUs are in this "locked state", the host's context switches, system interrupts and load average skyrocket on the graphs, and the system I/O activity freezes completely.
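The same symptoms can be watched live from the host even without an NMS (a sketch; column meanings as documented in vmstat(8)):

Code:
    # print host-wide stats every second: "in" = interrupts/s, "cs" = context switches/s
    vmstat 1
    # one-shot view of the load average
    uptime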

Of all the technologies described on that site, VMCS Shadowing produced the most kernel error reports on the current (in-use) kernel branches. A further lookup reveals the "kvm_vm_ioctl" KVM kernel function to be the central point of all sorts of misbehaviour.

So, as I understand it (I'm not a kernel developer), the "leak" comes from the following logic:

Running a virtual machine allocates the configured vCPUs to the KVM process (via vCPU and ioctl calls) and sets up I/O scheduling in the qemu-kvm instance. The vCPUs are tied to the physical CPUs and RAM, which the KVM instruction set binds through ioctl system calls; these create a set of file descriptors in the current process that handle the disk access role.

The lock-up only occurs when kvm_vm_ioctl tries to free some memory resources it previously allocated. What is most interesting is that there is no 3.16.x version to test with in the pve repository; the bug was supposedly fixed in 3.10, but we don't know the correlation between 3.10.47 and the pve-kernel-3.10.0-13-pve revision, nor whether the latter includes the fix. This might explain why robhost is running stable on the 3.16.x branch from backports, and the issue supposedly got fixed for good in the 4.1.x kernel branch.
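While a VM is in that locked state, the ioctl activity of its kvm process can be observed directly (a sketch, assuming VM ID 100; Proxmox keeps the process PID in /var/run/qemu-server/100.pid):

Code:
    # attach to all threads of the VM's kvm process and trace only ioctl calls
    strace -f -e trace=ioctl -p $(cat /var/run/qemu-server/100.pid)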

Regarding Spirit's CPU model compared to mine, it is well known that each CPU branch version/revision (mine entry-to-mid range, his high end) shares one major architecture with only minor changes across the high-to-low range, plus whatever CPU microcode support the hardware/motherboard vendors include in each BIOS update.

Currently I'm running stability tests with the 3.16.x kernel from backports, which forces me to drop the stable 3.10.x pve kernel release (perhaps until a 3.16 pve kernel appears, although I doubt it will: 3.10 is dead next year, and work on the 4.x branch for Proxmox 4 is far too advanced for anybody to reconsider bug fixing on a dead-end kernel version/product).

Hopefully my logic and explanations are close to right and this will help others in the future.

As I said, I have returned with more info on the topic, in order to shed some light on the research I have managed to gather so far...

Click to expand...

Nice debugging. I was looking through the Red Hat 3.10 kernel update changelogs (because the current proxmox 3.10 kernel has not been updated since May 2015), but nothing new related to KVM has been backported by Red Hat.

If you need a more recent kernel than 3.16, you can try the Proxmox 4.0 kernel; it should work:

No clues, but I have seen this on two different E5 v3 processors with Proxmox VE 4.0.
System 1: dual E5-2650L v3, Supermicro motherboard
System 2: dual E5-2683 v3, ASRock motherboard

Click to expand...

Separately, I have a different environment running on Core i7 socket 2011 v1 CPUs and have never encountered this issue there.

If I am to take a wild general guess, considering how the Linux kernel works (it molds itself onto the hardware system), I suppose the v3 architecture from Intel is one step ahead of the kernel development schedule to be considered a fully stable and supported platform, whereas v1 poses no trouble.


If I am to take a wild general guess, considering how the Linux kernel works, I suppose the v3 architecture from Intel is one step ahead of the kernel development schedule...

Click to expand...

It's not kernel development in general, but the RHEL kernels (which PVE uses in a patched version). This is why the newer 3.16 and 4.0 kernels do not have this problem (and PVE 4.0 does not either, thanks to its 4.0 kernel).
Imho, upgrading to PVE 4, or using the backported 3.16 kernel, or even the 4.0 kernel from PVE 4 on PVE 3, is the only way to fix this issue.

RobHost GmbH | Managed Hosting


@e100 - I tested every possible setting, even iSCSI and with hotplug disabled, all with the same results. But while some have other fish to fry, we stubborn people prefer a solid solution to this bug, so we can use virtio (the best performance) without any issue.

As for a tested solution to this issue: I can confirm after testing that the backports kernel 3.16 still caused problems on the virtual machines I was running. On one setup I was still experiencing lock-ups on VMs running 2.6 kernels (CentOS 6.7), and on 3.16 the vCPU and I/O wait (Debian 8.2) misbehaved when transferring a VM to another host in the cluster; on the other hand, on another setup which is only running 7.9 VMs, the backports kernel did help solve the issue.

As of now, I can conclude that all the issues I posted and explained in another thread of mine ( http://forum.proxmox.com/threads/24277-VM-high-vCPU-usage-issues ) are totally gone.
Following the trial-and-error suggestion Spirit made - upgrading all cluster nodes to the 4.2.3-2-pve kernel even while running Proxmox version 3.4, and watching the outcome - has succeeded.
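For anyone wanting to repeat this, the PVE 4 kernel package can be installed on a 3.4 node roughly like this (a sketch, assuming the pve-kernel-4.2.3-2-pve .deb has already been fetched from the Proxmox 4 repository; exact file names may differ):

Code:
    # on each cluster node: install the newer kernel package and reboot into it
    dpkg -i pve-kernel-4.2.3-2-pve_*_amd64.deb
    update-grub
    reboot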

As a rule of thumb, which I identified a couple of years ago when I first started finding my way around how Proxmox works and what its pros and cons are, there is an important fact to keep in mind:

Run the hypervisor host with a kernel version at least equal to, or from the same branch as, the kernels of the VMs that you're planning to deploy on it.
