Exadata 11.2.3.2.1 NFS Issues – Ksplice Support for Exadata?

When Oracle released version 11.2.3.2.1 of the Exadata Storage Server software, I was a little excited. There were numerous one-off patches for the previous release, 11.2.3.2.0, which was the first version to support the Exadata X3, write-back flash cache, UEK on the X#-2 systems, etc. With that many large changes introduced in one version, some bugs in the .0 release were to be expected. Fortunately, Oracle was quick to fix many of those issues, but the result was several separate patches to update the cellsrv software.

I was working with a colleague last week, and we were ready to apply this patch to a customer’s Exadata system. Everything went off without a hitch, upgrading from 11.2.2.4.2 straight to 11.2.3.2.1. We even applied the patch to the customer’s quarter rack in rolling mode, which took under 6 hours to complete. After everything was back up and running, we took an archive log backup using RMAN. For this customer, we back everything up to NFS because it won’t fit within the FRA, and they don’t want to leave backups inside the production system. We were greeted with a strange error when we tried to kick off the backup job in RMAN:

It didn’t matter what we were trying to back up, just that it was going to NFS. This backup job had worked fine prior to the patch (we took a backup immediately preceding the maintenance window), but we had applied both a database bundle patch (this database was 11.2.0.2) and the latest storage server patch (11.2.3.2.1), which updates the Linux OS to OEL 5.8 and introduces the Oracle Unbreakable Enterprise Kernel into the mix.

We checked the mount options to make sure that everything was ok, and saw that it was:
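As a rough illustration of that kind of check, here is a minimal sketch that reads mount entries in `/proc/mounts` format and flags any NFS mount lacking an expected option. The function name `check_nfs_mounts` and the `EXPECTED_OPTS` list are hypothetical and chosen only for the example; they are not Oracle's recommended option set, so adjust the list to whatever your platform documentation calls for.

```shell
#!/usr/bin/env bash
# Sketch: list NFS mounts and flag any that lack an expected option.
# EXPECTED_OPTS is an assumption for illustration only.
EXPECTED_OPTS="${EXPECTED_OPTS:-rw hard}"

check_nfs_mounts() {
    # Input lines follow /proc/mounts: device mountpoint fstype options dump pass
    local dev mnt fstype opts rest opt missing
    while read -r dev mnt fstype opts rest; do
        # Skip anything that is not an NFS mount
        case "$fstype" in
            nfs|nfs4) ;;
            *) continue ;;
        esac
        missing=""
        for opt in $EXPECTED_OPTS; do
            # Match the option as a whole comma-separated token
            case ",$opts," in
                *",$opt,"*) ;;
                *) missing="$missing $opt" ;;
            esac
        done
        if [ -z "$missing" ]; then
            echo "OK $mnt $opts"
        else
            echo "MISSING $mnt:$missing"
        fi
    done
}
```

On a live system this would be run as `check_nfs_mounts < /proc/mounts`. Note that the kernel reports some options in a different spelling than the one used in `/etc/fstab`, so the expected list has to match what `/proc/mounts` actually prints.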

After poking around a bit, we opened a service request, which was answered pretty quickly by Oracle support. It turns out that there is a known bug with the NFS driver included in the version of the UEK packaged with 11.2.3.2.1. Oracle provided three possible fixes, which I’ll detail below: enable direct NFS, switch to the non-UEK kernel, or apply a ksplice patch on the UEK.

Enable direct NFS

First, I’ll cover enabling direct NFS. I actually have a blog post in the works, but to give you a quick once-over on it, here goes. Direct NFS is Oracle’s own NFS client, built into the database, and it does not interfere with the kernel’s NFS driver. Any operations that go through the database (RMAN, Data Pump, etc.) will use this client, which is optimized for database operations. Processes that use direct NFS operate in user space (like FUSE), which has less overhead than going through the kernel. Because direct NFS does not use the bad kernel NFS driver for backup operations, the bug is sidestepped. It goes without saying that if your databases are interacting with NFS, you should use direct NFS. There is no penalty for doing so, and it’s really easy to do. Shut down your database instance, and relink for direct NFS:
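A minimal sketch of that relink step, assuming an 11.2 database home where the `dnfs_on` target in `ins_rdbms.mk` is available (the documented way to enable the Direct NFS client in 11.2). The wrapper function `enable_dnfs` is a hypothetical name used here for illustration, and it assumes every instance running from the home has already been shut down.

```shell
#!/usr/bin/env bash
# Sketch: relink a database home with the Direct NFS client enabled.
# Assumes all instances using this ORACLE_HOME are already shut down.
enable_dnfs() {
    # Take the home from the first argument, else from ORACLE_HOME
    local home="${1:-${ORACLE_HOME:-}}"
    if [ -z "$home" ] || [ ! -f "$home/rdbms/lib/ins_rdbms.mk" ]; then
        echo "ins_rdbms.mk not found under '$home/rdbms/lib'" >&2
        return 1
    fi
    # dnfs_on enables the Direct NFS ODM library; dnfs_off reverts the change
    ( cd "$home/rdbms/lib" && make -f ins_rdbms.mk dnfs_on )
}
```

Run it as the software owner with `enable_dnfs "$ORACLE_HOME"`, then start the instance again. If the relink took effect, the alert log should mention the Direct NFS ODM library at startup, and `v$dnfs_servers` should show rows once the database has touched an NFS file.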

Switch to non-UEK

The next option provided by Oracle support was to switch from the Unbreakable Enterprise Kernel included with 11.2.3.2.1 to the Red Hat Compatible Kernel. To do this, follow the instructions in the readme for the patch (patch #14522699).

Ksplice on the UEK

The last option is the most interesting, in my opinion – applying an online kernel patch using ksplice. If you’re not familiar with ksplice, it is a magical little piece of software acquired by Oracle which allows administrators to apply kernel patches to the kernel running in memory, eliminating the need for a reboot (generally). Because ksplice isn’t fully supported on Exadata, we have a few limitations on what can be done, but this is an interesting glimpse into what may be coming in future releases. This patch (provided by Oracle Support) is a single ksplice patch for this bug. To install, unzip the patch, copy the patch file to the uptrack directory, and apply the patch. All of the following must be done as root.

That’s all there is to it. This will need to be applied on both nodes, and since it’s not a full installation of ksplice, the patch will have to be reapplied upon each reboot of the compute node. If you really want to, you could create a script to apply the patch when the server boots up. On full installations of ksplice, the kernel will be “respliced” upon each reboot.
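A boot-time script along those lines might look like the sketch below. Everything in it is an assumption made for illustration: the function name `reapply_ksplice`, the log path, and above all the default `APPLY_CMD`, which should be replaced with the exact apply command given in the patch instructions from Oracle Support.

```shell
#!/usr/bin/env bash
# Sketch: reapply a ksplice patch at boot. Hypothetical helper, not from Oracle.
# APPLY_CMD is an assumption; substitute the command from the patch README.
APPLY_CMD="${APPLY_CMD:-uptrack-upgrade -y}"
LOG_FILE="${LOG_FILE:-/var/log/ksplice-reapply.log}"

reapply_ksplice() {
    # First word of APPLY_CMD is the tool that must be on the PATH
    local tool="${APPLY_CMD%% *}"
    if ! command -v "$tool" >/dev/null 2>&1; then
        echo "$(date) $tool not found, patch not applied" >> "$LOG_FILE"
        return 1
    fi
    # Run the apply command and record the outcome with the running kernel
    if $APPLY_CMD >> "$LOG_FILE" 2>&1; then
        echo "$(date) ksplice patch applied on $(uname -r)" >> "$LOG_FILE"
    else
        echo "$(date) ksplice patch failed on $(uname -r)" >> "$LOG_FILE"
        return 1
    fi
}
```

One way to use it would be to source the file and call `reapply_ksplice` from `/etc/rc.local` on each compute node, so the patch goes back in before the databases start taking backups. Logging the kernel version helps confirm later that the patch was applied to the kernel it was built for.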

Really helpful blog, I don’t think you are going to be alone in hitting this one!

I take it the uptrack command comes with the patch, rather than native on the Exadata? Did you have to get the patch specially from Oracle support, or is it generally available? i.e. do you have a patch number?

I installed the exact same patch last week. The ksplice patching was indeed really easy.
I have also been looking into using our local yum repo server for the ksplice offline client, but had to abandon it due to a lack of disk space (and time).

Have you done some investigations into it?

Also, I have been investigating backup performance, and it seems that without dnfs you really need to enable async I/O via the filesystemio_options parameter to get some real performance (but I had to hand the environment back before I could do some formal comparison testing).
Dnfs uses async I/O without having to specify it in the filesystemio_options parameter.

This repository ISO image contains only the Oracle Exadata release 11.2 Latest channel, which includes the 11.2.3.2.1 Base channel plus fixes for bugs 16263472, 16046497 and 15956690. If RPMs from additional channels are required, then they must be obtained via other methods. This patch supersedes the contents of the ISO contained in patch 15991297. Customers already on 11.2.3.2.1 should consider applying this updated patch. Refer to the patch README for more information.