Linux HugeTLBfs: Improve MySQL Database Application Performance

Applications that access a lot of memory (several GBs) may obtain performance improvements by using large pages due to reduced Translation Lookaside Buffer (TLB) misses. HugeTLBfs is a memory management feature offered by the Linux kernel, which is valuable for applications that use a large virtual address space. It is especially useful for database applications such as MySQL, Oracle and others. Other server software that uses the prefork or a similar model (e.g. Apache web server) will also benefit.

The CPU’s Translation Lookaside Buffer (TLB) is a small cache used for storing virtual-to-physical mapping information. By using the TLB, a translation can be performed without referencing the in-memory page table entry that maps the virtual address. However, to keep translations as fast as possible, the TLB is usually small. It is not uncommon for large memory applications to exceed the mapping capacity of the TLB. Applications can use the huge page support in the Linux kernel through either the mmap system call or the standard System V (SysV) shared memory system calls (shmget, shmat).
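To make the capacity problem concrete, here is a minimal sketch of the arithmetic: the memory a TLB can map at once is roughly the number of entries times the page size. The entry count of 64 is a hypothetical value chosen for illustration; real TLB sizes vary by CPU.

```shell
# Print how much memory (in kB) a TLB can map at once.
# $1: number of TLB entries (hypothetical value, varies by CPU)
# $2: page size in kB
tlb_reach_kb() {
    echo $(( $1 * $2 ))
}

tlb_reach_kb 64 4       # 4 KB pages: prints 256
tlb_reach_kb 64 2048    # 2 MB pages: prints 131072
```

With the same number of entries, 2 MB pages cover 512 times more memory than 4 KB pages, which is why large-memory applications see fewer TLB misses.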

Only selected hardware and operating systems support memory pages larger than the default 4KB. The following configuration was tested on 64-bit RHEL 5.3 using a stock kernel, on a machine with plenty of RAM and multiple CPUs.

How do I verify that my kernel supports hugepage?

A kernel built with hugepage support shows the number of configured hugepages in /proc/meminfo. To check, run:

# grep -i huge /proc/meminfo

If no HugePages lines appear, you need to build a Linux kernel with the CONFIG_HUGETLBFS option.
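The check can also be scripted. The following is a minimal sketch, assuming that a kernel with hugepage support exposes a HugePages_Total line in /proc/meminfo; the function name and the optional path argument are illustrative and not part of any standard tool.

```shell
# Report whether a meminfo-style file advertises hugepage support.
# $1: path to inspect (assumption for testability; defaults to /proc/meminfo)
has_hugepage_support() {
    meminfo="${1:-/proc/meminfo}"
    # Succeeds only if a HugePages_Total line is present.
    grep -q '^HugePages_Total:' "$meminfo" 2>/dev/null
}

if has_hugepage_support; then
    echo "hugepage support: yes"
else
    echo "hugepage support: no (kernel may lack CONFIG_HUGETLBFS)"
fi
```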

How do I configure HugeTLBfs?

The HugeTLBfs feature permits an application to use a much larger page size than normal, so that a single TLB entry can map a larger address space. A HugeTLB entry can vary in size. For example, the i386 architecture supports 4K and 4M (2M in PAE mode) page sizes, the ia64 architecture supports multiple page sizes (4K, 8K, 64K, 256K, 1M, 4M, 16M, 256M), and ppc64 supports 4K and 16M. To allocate hugepages, set the number of hugepages in /proc/sys/vm/nr_hugepages. Enter:

# sysctl -w vm.nr_hugepages=40

The above command will try to configure 40 hugepages in the system. Now check /proc/meminfo:

# grep -i huge /proc/meminfo

Sample output:

HugePages_Total: 40 – The size of the pool of hugepages. On a busy server with 16/32GB RAM, you can set this to 512 or higher.

HugePages_Free: 40 – The number of hugepages in the pool that are not yet allocated.

HugePages_Rsvd: 0 – The number of hugepages for which a commitment to allocate from the pool has been made, but no allocation has yet been made.

Hugepagesize: 2048 kB – The size of each hugepage.
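The fields above can be combined to see how much memory the pool sets aside. This is a small sketch that multiplies HugePages_Total by Hugepagesize; the function name is illustrative, and it reads meminfo-style text on standard input so it can be tried on the sample output as well as on a live system.

```shell
# Print the hugepage pool size in MB from meminfo-style text on stdin.
hugepage_pool_mb() {
    awk '
        /^HugePages_Total:/ { total = $2 }
        /^Hugepagesize:/    { size_kb = $2 }
        END { print total * size_kb / 1024 }
    '
}

# Example use on a live system:
# hugepage_pool_mb < /proc/meminfo
```

For the sample output above (40 pages of 2048 kB each) this prints 80, that is, 80MB.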

Configure MySQL to use HugeTLBfs

In MySQL, large pages can be used by InnoDB to allocate memory for its buffer pool and additional memory pool. Find the mysql user id:

# id mysql

Sample output:

uid=27(mysql) gid=27(mysql) groups=27(mysql)

Open /etc/sysctl.conf:

# vi /etc/sysctl.conf

Add the following configuration:

# Set the number of pages to be used.
# Each page is normally 2MB, so a value of 40 = 80MB.
# Set it to 512 or higher if you have lots of memory
vm.nr_hugepages=40
# Set the group number (mysql group number is 27) that is allowed to access this memory. The mysql user must be a member of this group.
vm.hugetlb_shm_group=27
# Increase the amount of shmem allowed per segment
# This depends upon your memory, remember your
kernel.shmmax = 68719476736
# Increase total amount of shared memory.
kernel.shmall = 4294967296
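Rather than picking vm.nr_hugepages arbitrarily, it can be derived from the amount of memory the pages should cover, such as the InnoDB buffer pool size. The following is a rough sketch of that arithmetic; the function name is illustrative, and a real deployment may need some headroom beyond the buffer pool itself.

```shell
# Print the number of hugepages needed to cover a given amount of memory.
# $1: memory to cover in MB (for example, the InnoDB buffer pool size)
# $2: hugepage size in kB (defaults to 2048, as in the sample output)
hugepages_needed() {
    mem_mb="$1"
    page_kb="${2:-2048}"
    # Ceiling division so a partial page still gets a whole hugepage.
    echo $(( ($mem_mb * 1024 + $page_kb - 1) / $page_kb ))
}

hugepages_needed 80      # prints 40
hugepages_needed 1024    # prints 512
```

Note that kernel.shmmax is expressed in bytes, while kernel.shmall is expressed in pages, so the two values are not directly comparable.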

Save and close the file. Open /etc/security/limits.conf:

# vi /etc/security/limits.conf

Append the following lines to set the max locked-in-memory address space to unlimited:

@mysql soft memlock unlimited
@mysql hard memlock unlimited

Save and close the file. Finally, restart the mysql server:

# /etc/init.d/mysqld restart
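After the restart, it is worth checking whether MySQL actually took pages from the pool. The sketch below is one possible check, not a step from this article: it reports how many hugepages are allocated (HugePages_Total minus HugePages_Free) and how many are reserved. The function name is illustrative. Also note that the MySQL documentation describes a large-pages server option that has to be enabled for InnoDB to use huge pages; that step is not shown here, so treat it as something to verify for your MySQL version.

```shell
# Summarize hugepage usage from meminfo-style text on stdin.
hugepage_usage() {
    awk '
        /^HugePages_Total:/ { total = $2 }
        /^HugePages_Free:/  { free_pages = $2 }
        /^HugePages_Rsvd:/  { rsvd = $2 }
        END { printf "allocated=%d reserved=%d\n", total - free_pages, rsvd }
    '
}

# Example use on a live system:
# hugepage_usage < /proc/meminfo
```

If both numbers stay at 0 while MySQL is running, the server is most likely not using the hugepage pool.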

A note about mount command option

If your application uses huge pages through the mmap() system call, you have to mount a file system of type hugetlbfs like this:

# mount -t hugetlbfs none /myapp

Mount options such as uid and gid give more control over the mount.
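The original example with uid and gid options is not reproduced here, so the following is only a sketch of what such a mount could look like. It builds the command and prints it without running it. The uid, gid and mode options are documented hugetlbfs mount options; the values 27, 0770 and /myapp are placeholders taken from the examples in this article, and the function name is illustrative.

```shell
# Print (but do not run) a hugetlbfs mount command with ownership options.
# $1: uid, $2: gid, $3: mode, $4: mount point (all values are placeholders)
hugetlbfs_mount_cmd() {
    uid="$1"; gid="$2"; mode="$3"; mountpoint="$4"
    echo "mount -t hugetlbfs -o uid=$uid,gid=$gid,mode=$mode none $mountpoint"
}

hugetlbfs_mount_cmd 27 27 0770 /myapp
# prints: mount -t hugetlbfs -o uid=27,gid=27,mode=0770 none /myapp
```

Review the printed command and run it as root once the values match your system.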

Posted by: Vivek Gite

The author is the creator of nixCraft and a seasoned sysadmin, DevOps engineer, and a trainer for the Linux operating system/Unix shell scripting.


The FreeBSD virtual memory subsystem now supports fully transparent use of superpages for application memory; application memory pages are dynamically promoted to or demoted from superpages without any modification to application code. This change offers the benefit of large page sizes such as improved virtual memory efficiency and reduced TLB (translation lookaside buffer) misses without downsides like application changes and virtual memory inflexibility. This is disabled by default and can be enabled by setting a loader tunable vm.pmap.pg_ps_enabled to 1. Add vm.pmap.pg_ps_enabled=1 to /boot/loader.conf.

Huge pages are only supported by the InnoDB engine, and vm.nr_hugepages should be set to match the memory it uses (i.e. the buffer pool size), not some arbitrary value as you’re suggesting.

And why raise the shared memory limit to 16GB if you’re going to use only 1GB (512 pages)? (8GB is the default in Linux 2.6.)

Your post does not provide any useful information over what is already in the MySQL documentation, and by omitting critical details it is actually harmful. If you’re going to blindly copy+paste existing documentation as if it was new, at least get it right.

I have set the vm.nr_hugepages value to 512 and 1024 MB and added mysql group 27. Now I am not able to figure out what the values of kernel.shmmax and kernel.shmall should be. Also, this is what the mysql variables show.

Fair comment; however, I think these articles are supposed to be hints, and people should then understand the what and why and tune it to reflect their needs. Copying any configuration file or how-to with no understanding (in general) is just the worst.

I really do have a question: can I control the size of the TLB from Linux? I understand it is like a page table cache and is totally controlled by the CPU. But I really want to play around with the TLB, make some changes and try to mess with it. Does anybody here have any idea how to do that? It need not be on Linux; maybe on one of the new OSes for multi/many-core processors?