When does Nova apply its filters (RAM, CPU, etc.)? At instance creation and at (live) migration of existing instances, of course. But what about existing instances that have been shut down while, in the meantime, more instances have been launched on the same hypervisor?

When you start one of those pre-existing instances, even with RAM overcommitment you can end up with the OOM killer forcefully shutting down instances once you reach the limits. Is there something I've been missing, or is this a misconfiguration of my scheduler filters? Or is it simply the admin's task to keep an eye on the load?

On 08/30/2018 08:54 AM, Eugen Block wrote:
> Hi Jay,
>
>> You need to set your ram_allocation_ratio nova.CONF option to 1.0 if
>> you're running into OOM issues. This will prevent overcommit of memory
>> on your compute nodes.
>
> I understand that, the overcommitment works quite well most of the time.
>
> It just has been an issue twice when I booted an instance that had been
> shutdown a while ago. In the meantime there were new instances created
> on that hypervisor, and this old instance caused the OOM.
>
> I would expect that with a ratio of 1.0 I would experience the same
> issue, wouldn't I? As far as I understand the scheduler only checks at
> instance creation, not when booting existing instances. Is that a
> correct assumption?

The system keeps track of how much memory is available and how much has been assigned to instances on each compute node. With a ratio of 1.0 it shouldn't let you consume more RAM than is available even if the instances have been shut down.
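To illustrate the point, here is a minimal sketch (not Nova's actual code; all names are illustrative) of how a capacity check with an allocation ratio behaves:

```python
# Toy capacity check, illustrating how an allocation ratio gates claims.
# Not Nova source; names are made up for this sketch.

def fits(total_mb, ratio, allocated_mb, requested_mb):
    """Return True if the request fits under total * ratio."""
    capacity_mb = total_mb * ratio
    return allocated_mb + requested_mb <= capacity_mb

# With ratio 1.0, a 24 GB host that has already granted 24 GB
# (including to shut-down instances) rejects any further request:
print(fits(24576, 1.0, 24576, 4096))   # False

# With ratio 1.5 the same host would still accept it (overcommit):
print(fits(24576, 1.5, 24576, 4096))   # True
```

The key detail is that `allocated_mb` counts every instance assigned to the host, running or not.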

On 08/30/2018 10:54 AM, Eugen Block wrote:
> Hi Jay,
>
>> You need to set your ram_allocation_ratio nova.CONF option to 1.0 if
>> you're running into OOM issues. This will prevent overcommit of memory
>> on your compute nodes.
>
> I understand that, the overcommitment works quite well most of the time.
>
> It just has been an issue twice when I booted an instance that had been
> shutdown a while ago. In the meantime there were new instances created
> on that hypervisor, and this old instance caused the OOM.
>
> I would expect that with a ratio of 1.0 I would experience the same
> issue, wouldn't I? As far as I understand the scheduler only checks at
> instance creation, not when booting existing instances. Is that a
> correct assumption?

To echo what cfriesen said, if you set your allocation ratio to 1.0, the system will not overcommit memory. Shut down instances consume memory from an inventory management perspective. If you don't want any danger of an instance causing an OOM, you must set your ram_allocation_ratio to 1.0.
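For reference, a minimal nova.conf fragment on the compute node (as of the releases current at the time of this thread the option is read from the [DEFAULT] section; check your release's configuration reference):

```ini
# /etc/nova/nova.conf on the compute node
[DEFAULT]
# 1.0 disables RAM overcommit; values > 1.0 allow overcommit.
ram_allocation_ratio = 1.0
```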

> To echo what cfriesen said, if you set your allocation ratio to 1.0,
> the system will not overcommit memory. Shut down instances consume
> memory from an inventory management perspective. If you don't want
> any danger of an instance causing an OOM, you must set you
> ram_allocation_ratio to 1.0.

Let's forget about the scheduler; I'll try to make my question a bit clearer.

Let's say I have a ratio of 1.0 on my hypervisor, and let it have 24 GB of RAM available, ignoring the OS for a moment. Now I launch 6 instances, each with a flavor requesting 4 GB of RAM; that would leave no space for further instances, right?

Then I shut down two instances (freeing 8 GB of RAM) and create a new one with 8 GB of RAM; the compute node is full again (assuming all instances actually consume all of their RAM).

Now I boot one of the shut-down instances again. The compute node would require an additional 4 GB of RAM for that instance, and this would lead to OOM, isn't that correct? So a ratio of 1.0 would not prevent that from happening, would it?

Regards,
Eugen

Quoting Jay Pipes <jaypipes@gmail.com>:

> On 08/30/2018 10:54 AM, Eugen Block wrote:
>> Hi Jay,
>>
>>> You need to set your ram_allocation_ratio nova.CONF option to 1.0
>>> if you're running into OOM issues. This will prevent overcommit of
>>> memory on your compute nodes.
>>
>> I understand that, the overcommitment works quite well most of the time.
>>
>> It just has been an issue twice when I booted an instance that had
>> been shutdown a while ago. In the meantime there were new instances
>> created on that hypervisor, and this old instance caused the OOM.
>>
>> I would expect that with a ratio of 1.0 I would experience the same
>> issue, wouldn't I? As far as I understand the scheduler only checks
>> at instance creation, not when booting existing instances. Is that
>> a correct assumption?
>
> To echo what cfriesen said, if you set your allocation ratio to 1.0,
> the system will not overcommit memory. Shut down instances consume
> memory from an inventory management perspective. If you don't want
> any danger of an instance causing an OOM, you must set you
> ram_allocation_ratio to 1.0.
>
> The scheduler doesn't really have anything to do with this.
>
> Best,
> -jay
>
> _______________________________________________
> Mailing list: http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
> Post to     : openstack@lists.openstack.org
> Unsubscribe : http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack

On Mon, Sep 3, 2018 at 1:27 PM, Eugen Block <eblock@nde.ag> wrote:
> Hi,
>
>> To echo what cfriesen said, if you set your allocation ratio to 1.0,
>> the system will not overcommit memory. Shut down instances consume
>> memory from an inventory management perspective. If you don't want
>> any danger of an instance causing an OOM, you must set you
>> ram_allocation_ratio to 1.0.
>
> let's forget about the scheduler, I'll try to make my question a bit
> clearer.
>
> Let's say I have a ratio of 1.0 on my hypervisor, and let it have 24
> GB of RAM available, ignoring the OS for a moment. Now I launch 6
> instances, each with a flavor requesting 4 GB of RAM, that would
> leave no space for further instances, right?
> Then I shutdown two instances (freeing 8 GB RAM) and create a new one
> with 8 GB of RAM, the compute node is full again (assuming all
> instances actually consume all of their RAM).

When you shut down the two instances, the physical RAM will be deallocated, BUT Nova will not remove the resource allocation in placement. Therefore your new instance, which requires 8 GB of RAM, will not be placed on the host in question, because on that host all 24 GB of RAM is still allocated, even if it is not physically consumed at the moment.

> Now I boot one of the shutdown instances again, the compute node
> would require additional 4 GB of RAM for that instance, and this
> would lead to OOM, isn't that correct? So a ratio of 1.0 would not
> prevent that from happening, would it?

Nova did not place the instance requiring 8 GB of RAM on this host, as described above. Therefore you can freely start up the two 4 GB instances on this host later.
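The walkthrough above can be traced with a toy model of placement accounting (illustrative only, not the real placement service or its API): allocations are created at scheduling time and survive a stop, so the 8 GB instance is never admitted in the first place.

```python
# Toy model (not Nova/placement code) of per-host RAM accounting.
# Allocations are made at scheduling time and are NOT released when an
# instance is merely stopped.

class Host:
    def __init__(self, total_gb, ratio=1.0):
        self.capacity = total_gb * ratio
        self.allocated = 0

    def claim(self, gb):
        """Schedule an instance: refuse if the claim exceeds capacity."""
        if self.allocated + gb > self.capacity:
            return False  # scheduler must pick another host (or NoValidHost)
        self.allocated += gb
        return True

    def stop(self, gb):
        """Stopping frees physical RAM but keeps the allocation."""
        pass  # self.allocated is deliberately unchanged

host = Host(24, ratio=1.0)
for _ in range(6):
    assert host.claim(4)      # six 4 GB instances fill the host exactly
host.stop(4)
host.stop(4)                  # stop two instances: allocation stays at 24
print(host.claim(8))          # False: the 8 GB instance is refused, so
                              # restarting the stopped instances is safe
```

With a ratio above 1.0 the 8 GB claim would have succeeded, and restarting the stopped instances could then exceed physical RAM, which matches the OOM Eugen observed.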


On 09/03/2018 07:27 AM, Eugen Block wrote:
> Hi,
>
>> To echo what cfriesen said, if you set your allocation ratio to 1.0,
>> the system will not overcommit memory. Shut down instances consume
>> memory from an inventory management perspective. If you don't want any
>> danger of an instance causing an OOM, you must set you
>> ram_allocation_ratio to 1.0.
>
> let's forget about the scheduler, I'll try to make my question a bit
> clearer.
>
> Let's say I have a ratio of 1.0 on my hypervisor, and let it have 24 GB
> of RAM available, ignoring the OS for a moment. Now I launch 6
> instances, each with a flavor requesting 4 GB of RAM, that would leave
> no space for further instances, right?
> Then I shutdown two instances (freeing 8 GB RAM) and create a new one
> with 8 GB of RAM, the compute node is full again (assuming all instances
> actually consume all of their RAM).
> Now I boot one of the shutdown instances again, the compute node would
> require additional 4 GB of RAM for that instance, and this would lead to
> OOM, isn't that correct? So a ratio of 1.0 would not prevent that from
> happening, would it?

I'm not entirely sure what you mean by "shut down an instance". Perhaps this is what is leading to confusion. I consider "shutting down an instance" to be stopping or suspending an instance.

As I mentioned below, shutdown instances consume memory from an inventory management perspective. If you stop or suspend an instance on your host, that instance is still consuming the same amount of memory in the placement service. You will *not* be able to launch a new instance on that same compute host *unless* your allocation ratio is >1.0.

Now, if by "shut down an instance", you actually mean "terminate an instance" or possibly "shelve and then offload an instance", then that is a different thing, and in both of *those* cases, resources are released on the compute host.
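The lifecycle distinction Jay draws can be summarized in a small table of which operations release the host's allocation (a sketch of the behavior described above, not Nova code):

```python
# Toy summary (illustrative, not Nova source) of which lifecycle
# operations release a compute host's placement allocation, per the
# explanation above: stop/suspend keep it; shelve-offload and delete
# release it.

RELEASES_ALLOCATION = {
    "stop": False,           # instance still owns its RAM in placement
    "suspend": False,
    "shelve": False,         # shelved but not yet offloaded from the host
    "shelve_offload": True,  # resources released on the compute host
    "delete": True,
}

def allocated_after(op, gb):
    """RAM (GB) still allocated on the host after the operation."""
    return 0 if RELEASES_ALLOCATION[op] else gb

print(allocated_after("stop", 4))            # 4
print(allocated_after("shelve_offload", 4))  # 0
```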

Best,
-jay

