Oracle Blog

Saturday May 26, 2012

I did some tests with tmem using an Oracle Database 11gR2 and swingbench setup. You can see a graph below. Let me try to explain what this means.

Using Oracle VM 3 with some changes to how dom0 boots (additional parameters at the boot prompt) and with UEK2 as the guest kernel in my VM, I can make use of autoballooning. The graph below is very simple: the horizontal axis is a timeline, and the vertical axis shows how much memory the VM is actually using/needing. I created three 16GB VMs that I wanted to run on an Oracle VM server with 36GB of RAM (so more VM memory than is physically available in the server). When I start a 16GB VM, the Linux guest immediately balloons down to about 700MB. It automatically releases to the hypervisor the pages it doesn't need, memory that would otherwise just sit free/idle. Then I start a database with a 4GB SGA. As you can see, the moment the DB starts, the VM grows to just over 4GB. Then I start swingbench runs with 25, 50, 100, 500 and 1000 users. Every time a run starts, you can see memory use go up, and when swingbench stops it goes down again. In the end, after the last run with 1000 users, I also shut down the database instance and memory drops all the way back to 700MB.
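To make the shape of that graph easier to picture, here is a toy model of a guest that sizes itself from its committed memory. It is a minimal sketch, assuming a simple policy: the target is the guest's committed memory plus a small reserve, growth to the target happens at once, and shrinking releases only a fraction of the surplus per step (hysteresis). The function names, the 64MB reserve and the hysteresis factor of 8 are hypothetical illustrations, not the actual tunables of the UEK2 ballooning code, and the committed-memory numbers in the trace are made up.

```python
from __future__ import annotations


def next_size(current_mb: int, committed_mb: int,
              reserve_mb: int = 64, down_hysteresis: int = 8) -> int:
    """Return the guest's memory size (MB) after one balloon adjustment.

    Hypothetical policy: target = committed memory + a small reserve.
    Grow to the target at once; shrink toward it by only a fraction of
    the surplus per step so the guest does not oscillate.
    """
    goal = committed_mb + reserve_mb
    if goal >= current_mb:
        # Growing: take what the workload needs immediately.
        return goal
    # Shrinking: hand 1/down_hysteresis of the surplus back to the hypervisor.
    return current_mb - (current_mb - goal) // down_hysteresis


def simulate(start_mb: int, committed_mb_samples: list[int]) -> list[int]:
    """Apply next_size once per sample of the guest's committed memory."""
    sizes = []
    size = start_mb
    for committed in committed_mb_samples:
        size = next_size(size, committed)
        sizes.append(size)
    return sizes


# Illustrative trace (made-up numbers): idle boot, DB with a 4GB SGA
# starts, then the DB shuts down again.
idle, with_db = 636, 4800
trace = simulate(16 * 1024, [idle] * 200 + [with_db] * 5 + [idle] * 200)
```

The trace starts at 16GB, decays toward roughly 700MB, jumps to just over 4.7GB as soon as the database's memory is committed, and decays back once it is released. Growing immediately but shrinking gradually is a deliberate asymmetry in this sketch: starving a busy guest is worse than holding on to a little idle memory for a few extra intervals.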

I ran three guests, each with its own database and swingbench. Thanks to dynamic ballooning, with the guests cooperating with the hypervisor, I was able to start all three 16GB VMs, and I saw no performance impact. When the hypervisor had free memory, cleancache kicked in and the guests made use of those pages, including deduplication and compression of the pages.
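Deduplication and compression of cached pages can be illustrated with a small hypervisor-side pool. This is a minimal sketch, assuming identical pages are detected by a content hash and stored once, zlib-compressed, with a reference count per stored copy. The class name, the (guest, file, page index) key layout and the choice of SHA-256 and zlib are hypothetical; this is not how Xen's tmem implements it internally.

```python
from __future__ import annotations

import hashlib
import zlib

PAGE_SIZE = 4096


class DedupCompressPool:
    """Toy hypervisor-side cache for clean guest pages (hypothetical sketch).

    Identical pages are stored once (deduplicated by content hash) and kept
    zlib-compressed. Keys are illustrative (guest, file, page index) tuples.
    """

    def __init__(self) -> None:
        self.index: dict[tuple, str] = {}  # key -> content digest
        self.blobs: dict[str, bytes] = {}  # digest -> compressed page
        self.refs: dict[str, int] = {}     # digest -> number of keys using it

    def put(self, key: tuple, page: bytes) -> None:
        self.flush(key)  # overwriting a key releases its old content first
        digest = hashlib.sha256(page).hexdigest()
        if digest not in self.blobs:
            self.blobs[digest] = zlib.compress(page)
        self.refs[digest] = self.refs.get(digest, 0) + 1
        self.index[key] = digest

    def get(self, key: tuple) -> bytes | None:
        digest = self.index.get(key)
        if digest is None:
            return None  # miss: the guest reads the page from disk instead
        return zlib.decompress(self.blobs[digest])

    def flush(self, key: tuple) -> None:
        digest = self.index.pop(key, None)
        if digest is None:
            return
        self.refs[digest] -= 1
        if self.refs[digest] == 0:  # last user gone: free the stored copy
            del self.refs[digest]
            del self.blobs[digest]

    def stored_bytes(self) -> int:
        return sum(len(blob) for blob in self.blobs.values())
```

With three guests running the same database software, many clean pages are byte-identical. In this sketch two guests caching the same page consume a single compressed copy, and flushing one guest's key leaves the other guest's copy readable.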

If you want to play with this yourself, you can run the following command in dom0 to get decent statistics out of the setup:

    xm tmem-list --long --all | /usr/sbin/xen-tmem-list-parse

It shows you the compression ratio, the cache ratios, etc. I used those statistics to generate the chart below. This is yet another example of how, when you control both the hypervisor and the guest operating system and make them work together, you get better and more interesting results than with black-box VM management.
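As a rough guide to reading such statistics, the two headline numbers can be derived from a few raw counters. This is a minimal sketch, assuming one common definition of each ratio. The field names are hypothetical and are not the keys that xen-tmem-list-parse prints; the tool's own definitions may differ.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class TmemCounters:
    """Hypothetical counters for one tmem pool.

    The field names are illustrative; they are NOT the keys printed by
    xen-tmem-list-parse.
    """
    gets: int          # lookups the guest attempted
    found_gets: int    # lookups that returned a page
    raw_bytes: int     # uncompressed size of the pages currently held
    stored_bytes: int  # memory actually used after compression

    def hit_ratio(self) -> float:
        """Fraction of lookups served from tmem instead of disk."""
        return self.found_gets / self.gets if self.gets else 0.0

    def compression_ratio(self) -> float:
        """Bytes of page data held per byte of tmem memory used."""
        return self.raw_bytes / self.stored_bytes if self.stored_bytes else 0.0
```

For example, 250 hits out of 1000 lookups is a 0.25 hit ratio, and 8192 bytes of pages held in 4096 bytes of memory is a 2.0 compression ratio.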

Some developers on my team have spent quite a few years researching and developing better ways to manage memory across servers, both physical and virtual. You often hear me talk about cooperative memory management in the context of Oracle VM. I believe it is important to do memory overcommit the right way, with both the guest OS and the hypervisor knowing what's going on, instead of having the hypervisor do things behind the guest's back. Transparent memory overcommit causes unpredictable behavior for applications. That might be of little concern when consolidating lots of small VMs that aren't doing much of interest, but when you have applications that are memory intensive by nature, such as an Oracle database or any sort of production environment, there need to be better ways to do this. Better, here, means more efficient, more predictable and more cooperative between host and guest.

It's a long road to getting all these stars aligned, but we are very close now in terms of Linux kernel features to help with this. Over the last few years Dan Magenheimer has been working hard on what's called transcendent memory, a collection of features that includes cleancache, frontswap and ramster. Now, with Linux 3.4, we have tmem, cleancache and ramster upstream. Frontswap is still pending, but we're working on it.

As a refresher, you can read more about cleancache in this writeup I did just about a year ago. Frontswap is trickier, because here we use transcendent memory as a swap device. Pages that are swapped out cannot just disappear; they have to persist until the OS decides it no longer needs them. In other words, with cleancache we provide extra memory (extra cache), with the understanding that this extra memory might just disappear. With frontswap, we provide swap space through the same mechanism, with the understanding that it is not going to disappear. One way frontswap differs from regular swap is its size. A swap device, whether a file or a physical/logical volume, is pre-created with a specific size and does not grow or shrink dynamically. Frontswap is more dynamic. The advantage of frontswap is that you can use RAM on a remote server or in the hypervisor as very fast storage. It can also compress memory pages using zcache.
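The difference in guarantees between the two can be sketched as two toy stores. This is an illustrative model only, assuming simplified semantics: the ephemeral store may lose any page at any time, while the persistent store may refuse a page up front but must return every page it accepted. The class and function names are hypothetical and do not mirror the kernel's cleancache or frontswap hooks; compression via zcache and remote RAM via ramster are not modeled.

```python
from __future__ import annotations


class EphemeralStore:
    """cleancache-like semantics: the host may drop any page at any time."""

    def __init__(self) -> None:
        self.pages: dict[int, bytes] = {}

    def put(self, key: int, page: bytes) -> None:
        self.pages[key] = page

    def get(self, key: int) -> bytes | None:
        # A miss is always legal, even right after a put; the guest then
        # reads the clean page back from disk.
        return self.pages.get(key)

    def reclaim(self) -> None:
        # The hypervisor takes the memory back without asking the guest.
        self.pages.clear()


class PersistentStore:
    """frontswap-like semantics: a put may be refused, but an accepted page
    must be returned intact until the guest invalidates it."""

    def __init__(self, capacity_pages: int) -> None:
        self.capacity_pages = capacity_pages
        self.pages: dict[int, bytes] = {}

    def put(self, key: int, page: bytes) -> bool:
        if key not in self.pages and len(self.pages) >= self.capacity_pages:
            return False  # refused: the guest must use its real swap device
        self.pages[key] = page
        return True

    def get(self, key: int) -> bytes:
        return self.pages[key]  # accepted pages are never silently dropped

    def invalidate(self, key: int) -> None:
        self.pages.pop(key, None)

    def resize(self, capacity_pages: int) -> None:
        # Unlike a pre-sized swap file or volume, capacity can change at run
        # time. In this sketch, shrinking only affects future puts.
        self.capacity_pages = capacity_pages


def swap_out(store: PersistentStore, disk: dict[int, bytes],
             key: int, page: bytes) -> None:
    """Try the fast RAM-backed store first; fall back to the swap device."""
    if not store.put(key, page):
        disk[key] = page


def swap_in(store: PersistentStore, disk: dict[int, bytes], key: int) -> bytes:
    """Bring a page back into guest RAM and release its swapped copy."""
    if key in store.pages:
        page = store.get(key)
        store.invalidate(key)
    else:
        page = disk.pop(key)
    return page
```

The design point the sketch tries to capture is where the guarantee sits: an ephemeral store can always say "no" later, so the guest keeps the authoritative copy on disk; a persistent store can only say "no" at put time, which is why the guest must be prepared to fall back to its fixed-size swap device.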

If you want to play with ramster, Dan wrote up a simple how-to.
Dan also wrote up a presentation which you can find here.

I'm looking forward to frontswap getting into the kernel, to many of these features reaching customers through the Unbreakable Enterprise Kernel, and, as the products evolve, to management of these features making its way into Oracle VM.

About

Wim Coekaerts is the Senior Vice President of Linux and Virtualization Engineering for Oracle. He is responsible for Oracle's complete desktop to data center virtualization product line and the Oracle Linux support program.