Friday, January 16, 2009

Enabling InnoDB Large Pages on Linux

In MySQL 5.0, InnoDB gained the ability to use Linux Large Page support for allocating memory for the buffer pool and additional memory pool.

A few customers have asked about using it and there is virtually no documentation on what is required on Linux to enable it. I actually ended up having to read some of the Linux kernel source code to figure out some of this.

To set this up and use it, you first need a kernel that supports it. All of the recent RHEL kernels do by default from what I can tell. On my Ubuntu systems, I'm not seeing it enabled normally.

Then, at the OS level, you will need to do the following:

# Set the number of pages to be used
# Each page is normally 2MB, so this would be 40 MB
# This actually allocates memory, so it requires that much memory to be available
echo 20 > /proc/sys/vm/nr_hugepages
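The page count above is just the memory size divided by the page size, rounded up. A minimal sketch of that arithmetic follows; the helper names `hugepages_needed` and `hugepage_size_kb` are made up for illustration, and the 2 MB default is only the common case. Any extra memory MySQL allocates beyond the size you pass in is not accounted for.

```shell
#!/bin/sh
# hugepages_needed BYTES [PAGE_KB]   (hypothetical helper)
# Prints how many huge pages are needed to cover BYTES of memory.
hugepages_needed() {
    bytes=$1
    page_kb=${2:-2048}                 # assumption: 2 MB pages unless told otherwise
    page_bytes=$((page_kb * 1024))
    # Round up so a trailing partial page is still covered.
    echo $(( (bytes + page_bytes - 1) / page_bytes ))
}

# hugepage_size_kb [MEMINFO_FILE]    (hypothetical helper)
# Prints the huge page size in kB as reported by the kernel.
hugepage_size_kb() {
    awk '/^Hugepagesize:/ { print $2 }' "${1:-/proc/meminfo}"
}

# Example (not run here): size the pool for 40 MB using the kernel's page size
#   hugepages_needed $((40 * 1024 * 1024)) "$(hugepage_size_kb)"
```

With 2 MB pages, `hugepages_needed $((40 * 1024 * 1024))` prints 20, matching the value written to nr_hugepages above.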

# Set the group number that is allowed to access this memory
# The mysql user must be a member of this group
echo 102 > /proc/sys/vm/hugetlb_shm_group
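The group number (102 here) is specific to each system, so it is worth checking that the mysql user really belongs to the group you write into hugetlb_shm_group. A small sketch of such a check, using the standard `id` command; the function name `user_in_group` is made up for illustration.

```shell
#!/bin/sh
# user_in_group USER GID   (hypothetical helper)
# Succeeds if USER is a member of the group with numeric id GID.
user_in_group() {
    # id -G prints every numeric group id of the user, space separated.
    id -G "$1" 2>/dev/null | tr ' ' '\n' | grep -qx "$2"
}

# Example (not run here): find the mysql user's primary group and check it
#   gid=$(id -g mysql)
#   user_in_group mysql "$gid" && echo "mysql is in group $gid"
```

If the check fails, either pick a different group or add the mysql user to the one you intend to use before writing its number to hugetlb_shm_group.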

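For MySQL usage, you would normally want shmmax (the maximum size of a single shared memory segment, in bytes, set in /proc/sys/kernel/shmmax) to be close to shmall (the system-wide shared memory total, set in /proc/sys/kernel/shmall and counted in pages rather than bytes).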

You would normally want to put these into an rc file or similar so that they are applied at every boot, early in the boot sequence and before MySQL starts.
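One way such a boot-time script could be structured is sketched below. The function name `apply_hugepage_settings` and its optional third argument are assumptions added for illustration: the extra argument only exists so the function can be pointed at a scratch directory for a dry run instead of the real /proc/sys/vm.

```shell
#!/bin/sh
# apply_hugepage_settings PAGES GID [VM_DIR]   (hypothetical helper)
# Writes the huge page count and the allowed group id.
# VM_DIR defaults to /proc/sys/vm; writing there requires root.
apply_hugepage_settings() {
    pages=$1
    gid=$2
    vm_dir=${3:-/proc/sys/vm}
    echo "$pages" > "$vm_dir/nr_hugepages"      || return 1
    echo "$gid"   > "$vm_dir/hugetlb_shm_group" || return 1
}

# Example (not run here), as root from an rc script, before MySQL starts:
#   apply_hugepage_settings 20 102
```

Returning a non-zero status when a write fails lets the calling rc script log the problem instead of starting MySQL against a half-configured system.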

To verify that the pages were allocated, run:

cat /proc/meminfo | grep -i huge

The final OS-level step: in order to make use of the hugetlb_shm_group, you need to give the mysql user an 'unlimited' value for the memlock limit. This can be done either by editing /etc/security/limits.conf or by adding the following to your mysqld_safe:

ulimit -l unlimited

This will cause the root user to set it to unlimited before switching to the mysql user.
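To confirm the limit took effect, you can compare the value reported by `ulimit -l` (in kB) against the amount of memory you expect to lock. A rough sketch; the function name `memlock_allows` is made up, and the limits.conf lines in the comments assume the usual pam_limits syntax of domain, type, item, value.

```shell
#!/bin/sh
# Typical /etc/security/limits.conf entries (assumed pam_limits syntax):
#   mysql soft memlock unlimited
#   mysql hard memlock unlimited

# memlock_allows LIMIT NEEDED_KB   (hypothetical helper)
# LIMIT is the output of `ulimit -l`: either "unlimited" or a number of kB.
# Succeeds if that limit permits locking NEEDED_KB kilobytes.
memlock_allows() {
    [ "$1" = "unlimited" ] && return 0
    # A non-numeric LIMIT makes the comparison fail, which is treated as "no".
    [ "$1" -ge "$2" ] 2>/dev/null
}

# Example (not run here), as the mysql user, for a 40 MB pool:
#   memlock_allows "$(ulimit -l)" 40960 || echo "memlock limit too low"
```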

Finally, you will want to add the large-pages option to your my.cnf:

[mysqld]
large-pages

With this option, InnoDB will use large pages automatically for the two memory pools. If it cannot, it will fall back to traditional memory and write a warning to the error log.

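Once MySQL is running, you can verify that the large pages are being used by checking that the free page count has dropped (or the reserved count has risen) in: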

cat /proc/meminfo | grep -i huge
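If you want the numbers in a form that is easier to compare before and after starting MySQL, the relevant /proc/meminfo fields can be summarized with a little awk. This is only a sketch: the function name `hugepage_usage` is made up, and "used" here is simply total minus free, so pages that are reserved but not yet touched still show up as free.

```shell
#!/bin/sh
# hugepage_usage [MEMINFO_FILE]   (hypothetical helper)
# Prints one summary line from the HugePages_* fields of /proc/meminfo.
hugepage_usage() {
    awk '
        /^HugePages_Total:/ { total = $2 }
        /^HugePages_Free:/  { free  = $2 }
        /^HugePages_Rsvd:/  { rsvd  = $2 }
        END {
            # Missing fields default to 0 (e.g. a kernel without huge page support).
            printf "total=%d free=%d rsvd=%d used=%d\n", total, free, rsvd, total - free
        }
    ' "${1:-/proc/meminfo}"
}

# Example (not run here): compare the output before and after starting mysqld
#   hugepage_usage
```

A non-zero `used` or `rsvd` value after MySQL starts suggests the buffer pool landed in large pages; if both stay at zero, check the error log for the fallback warning.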

Solaris also has the ability to use large pages (of different sizes as well), but MySQL doesn't support that yet.

2 comments:

I did a few small benchmarks using customer-provided code, and in some cases I saw an improvement of a few percent. It does not seem very dramatic. I have gotten no feedback from customers that this gives a big increase in production.

I have seen more customers use it in an attempt to pin the InnoDB buffer pool in memory. It does do this quite well.

I know that the large page support on Solaris is quite a bit better (it supports pages much bigger than 2MB). I wonder if there would be a bigger gain there.