Virtualizing Oracle 11g on RHEV 3.0 & NetApp

Last week, I had the pleasure of presenting at Red Hat’s mini-theater twice at Oracle Open World. I had a great week splitting my time between the Red Hat and NetApp booths. Since then, I’ve been asked multiple times to share my presentation on Deploying Oracle 11g on RHEV & NetApp, and while I have no issue with that, the embedded demos make it larger than most corporate email systems will allow in or out (mine included). Soooooooo.. I figured, why not just re-work it as a blog post? (Warning: this post is a little longer than my other posts.. it didn’t feel right breaking this one up.)

The presentation was short and sweet (~20 minutes), but I still managed to fit in 2 recorded demos, which I have included below. Here is the breakdown of what was covered:

Using RHEV constructs to reduce licensing and maintenance costs associated with Oracle 11g

Using NetApp and RHEL to ease support issues associated with virtualizing Oracle 11g

Use NetApp to reduce the time & cost of backup, restore, and testing of Oracle 11g

The demos include:

Backing up, corrupting, then restoring Oracle 11g

Live migrating Oracle 11g (under load) without losing performance

Licensing & Maintenance Costs

Our first challenge is to help limit the cost of licensing when virtualizing Oracle 11g. For the uninitiated, Oracle licensing generally falls into 1 of 2 categories: “Named User Plus” and “Per Processor”. “Named User Plus” is generally used where the number of users and/or devices connecting to the database is easily identified, whereas “Per Processor” is generally used where that number is NOT easily identified. In either case, the number of processors and cores plays a factor in licensing.

Additionally, Oracle has the concept of “hard” and “soft” partitioning as a means of determining how a license is to be applied to a server. In hard partitioning, not all CPUs are made available to the Oracle database, so only the CPUs that are made available must be accounted for. In soft partitioning, the database may not be using all of the available CPUs, but it has the potential to touch them all.

In the realm of modern Virtualization, Oracle has taken a hardline approach to this:

Oracle considers RHEV and most other virtualization platforms to be “soft” partitioning. The only virtualization platform that Oracle supports under hard partitioning is their own OVM. Really, all they are doing is pinning CPUs to a VM, and therefore to the Oracle database, but you can do that with any hypervisor… Honestly, I can’t roll my eyes hard enough to indicate my disdain..

So how does soft partitioning work in the context of RHEV? Simply put, if you were running a single virtual instance of the Oracle database, along with your other virtualized apps, on an 8 node RHEV cluster, Oracle would require licensing for all CPUs/cores on all 8 hypervisors. The reasoning is that the database could potentially touch any of the CPUs in the cluster, and therefore Larry wants his pound of flesh. Again, it’s his product, he can name the terms, but that doesn’t mean I have to like it.

This is what I would refer to as an “Open Cluster”. For the sake of argument, let’s say that each of our 6 servers has 4 quad core CPUs, giving us 96 cores that we have to account for in licensing. Using Oracle’s current rules, the list price of this would be (96 cores x core factor of .25), or licensing for 24 CPUs at $47.5k each ($1,140,000), x 2 database instances ($2,280,000), + 22% maintenance fee ($501,600), for a total of $2,781,600. OUCH!! Even if you get a 50% discount, you’re still into 7 figures for 2 databases…

Let’s fix this, or at least mitigate it. With a little planning, we now have our virtualized database instances running in a separate 2 node cluster. Instead of running all hypervisors in the same RHEV Cluster, we create separate Clusters in the same RHEV Data Center. This is not a nested cluster or a cluster within a cluster; this is a separate entity.

For the 2 servers with 4 quad core CPUs each, our equation becomes more reasonable, as we’re now licensing for 32 cores instead of 96, and our maintenance fee shrinks along with it. Our equation now becomes (32 cores x a core factor of .25) = licensing for 8 CPUs at $47.5k each, or $380k. $380k x 2 database instances = $760k, + 22% maintenance fee of $167k = $927k. Still expensive, but much more affordable than $2.8m.
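For anyone who wants to plug in their own hardware, here is a minimal Python sketch of the arithmetic above. It only uses the inputs stated in this post (core factor of .25, $47.5k list price per licensed CPU, 22% maintenance, 2 database instances); the class and function names are made up for illustration, and real quotes will depend on your contract and Oracle’s current price list.

```python
from dataclasses import dataclass


@dataclass
class Cluster:
    """A RHEV cluster described only by its CPU layout (illustrative type)."""
    servers: int
    sockets_per_server: int
    cores_per_socket: int

    @property
    def cores(self) -> int:
        return self.servers * self.sockets_per_server * self.cores_per_socket


def oracle_list_cost(cluster, instances=2, core_factor=0.25,
                     price_per_cpu=47_500, maintenance_rate=0.22):
    """List price plus maintenance, following the equation in this post.

    All defaults are the figures quoted in the post, not official pricing.
    """
    licensed_cpus = cluster.cores * core_factor        # cores x core factor
    licenses = licensed_cpus * price_per_cpu * instances
    maintenance = licenses * maintenance_rate
    return licenses + maintenance


# The two layouts discussed above: 6 servers vs. a separate 2 node cluster,
# each server with 4 quad core CPUs.
open_cluster = Cluster(servers=6, sockets_per_server=4, cores_per_socket=4)
island = Cluster(servers=2, sockets_per_server=4, cores_per_socket=4)

for name, cluster in (("open cluster", open_cluster), ("2 node cluster", island)):
    cost = oracle_list_cost(cluster)
    print(f"{name}: {cluster.cores} cores -> ${cost:,.0f}")
```

The point of keeping the cluster layout separate from the pricing inputs is that you can change either one on its own: shrink the cluster the database is allowed to run in, or swap in whatever discount and maintenance rate you actually negotiated.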

We gain the benefits of Live Migration and better utilization, but we’re not punished (as much) for gaining the flexibility afforded with virtualization. Larry’s pound of flesh is now represented by a few ounces and it hurts just a little less.

** sources at the bottom.

Getting Support for a Virtualized Oracle Database

Let’s start with a familiar situation.. Something goes bump in the night with your prized Oracle 11g database that just happens to be virtualized on something that doesn’t rhyme with “OVM”. You call support and are told to reproduce the issue on bare-metal. You calmly explain that because of the KVM architecture, there really is no difference between a native database and a virtualized one, but it falls on deaf ears. Meanwhile, you’ve got developers, admins, managers, and customers trying to get you to move faster to get things resolved.

Here’s where some additional server planning and clever use of NetApp come to the rescue. Enter “SnapManager for Oracle”. SnapManager for Oracle (SMO) is a NetApp product that provides backup, recovery, and cloning for Oracle databases. It works with single-instance or RAC deployments, as well as RMAN, ASM, and a bunch of other acronyms that are really important to all of you DBA-types. It also works equally well with SAN or NAS. Here’s where we tie it into easing support of virtualized databases:

Maintain a physical server somewhere else in the data center, running the same version of RHEL and the same patch level of the Oracle database. Its whole purpose in life is this one moment: easing the stress of getting support issues resolved. In the same time (faster, actually) that it takes to open your support call, you could use SMO to create a zero-space clone of the entire database and mount it up on your physical server. Skip the initial maddening support call that makes you wonder whether you would have been better off going to that truck driving school and getting your CDL.

The cloning process completes in seconds and does not take up additional space. (The cloning process is very similar to the backup process as seen in the demo below.) Mounting the cloned database on the physical server takes a few more seconds. Then you call Oracle support for help on your “non-virtualized” database. Try not to be too smug about getting one up on “the man”. When the support issue is resolved, unmount the clone, destroy the clone, and go about your business. The database has just gone from virtual to physical and back to virtual.

As a side note, these same database clones can be (and are) used to rapidly spin up dev/test environments. Because the clone is a zero-space copy, you can test against the entire dataset. Want to see how going from 4 core CPUs to 6 core CPUs affects your performance? Clone the database and find out. Want to verify that the latest Oracle patch set fixes the issues that concern you without causing new issues? Clone the database and find out, without affecting the production database at all.

Backup & Restore Oracle 11g

Here is the other big use of SMO – backup and restore. In the demo below, we show that the database is healthy and running, and then back it up. As soon as that is done, we corrupt the database by nuking the tablespace, then restore it.

Live Migration Without Performance Impact

This was always something that scared most of the DBAs that I’ve worked with: “don’t virtualize my database, it will never handle live migration!!” That’s what makes this last demo really cool. We simulate 100 users by way of a TPC-C transaction, courtesy of Benchmark Factory (http://www.quest.com/benchmark-factory). We show the live performance graph while the virtualized database is running and under load, then initiate a live migration. Then we flip back to the graph and show there is no drop in performance during the migration or even during the final cut-over. In other words, the whole process is completely transparent to end users and applications. And because the database itself never stops running, no connections are dropped either.

Oracle database + Live Migration = BFF’s!!

So, that was the gist of my brief presentation at Red Hat’s mini-theater during Oracle Open World. The core of it is that a little pre-planning in RHEV and creative use of database clones can make a huge difference in licensing costs, easing support calls, and easing dev/test environments. And the fact is that live migration of an Oracle database isn’t really scary at all.

hope this helps,

Captain KVM

** Just so you folks know I’m not just making up dollar figures or wild claims on licensing, here are my sources:

One thing to keep in mind (I already shared this with you, but for the sake of your readers) is that in addition to doing “Cluster Islands,” as I like to call them, you can also avoid some SERIOUS licensing costs by re-evaluating whether you need STANDARD or ENTERPRISE Oracle licenses. By combining virtualization with the NetApp stuff you mention in this post, you can do away with a lot of the stuff that usually drives people into Enterprise licensing.

Standard drops you down to a PER-SOCKET licensing model, instead of those complicated PER-CORE multiplier models.

Sorry we missed each other at OOW. Yes, this works with NFS. SnapManager for Oracle works ~really~ well. We’ve converted a lot of DBA’s over the years into seeing the value of SMO, especially over some of the more traditional tools. Backup, recovery, and cloning is scary easy with SMO.

As far as RDMs go, that’s not the way I would do it (that’s just me). I would let the virtualization platform (RHEV, VMware, etc) handle the virtual disk that the VM boots from, but I would directly mount the application storage to the VM. So your VM’s main disk would contain the OS and Oracle binaries, but your other Oracle directories (that hold logs, tablespaces, etc) would be mount points for LUNs or NFS exports. That’s how we set up the lab at NetApp where we did our testing and recorded the demos.
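If you want to sanity-check that layout from inside a VM, here is a small Python sketch, assuming a Linux guest where the mount table is readable in the usual `/proc/mounts` format. The sample mount table, device names, and the `/u02`, `/u03`, `/u04` paths are all hypothetical placeholders; substitute your own Oracle directories. It simply flags any directory that falls through to the root filesystem (the VM’s boot disk) instead of sitting on its own LUN or NFS mount.

```python
from pathlib import PurePosixPath

# Hypothetical mount table in /proc/mounts format (device, mount point, type, ...).
SAMPLE_MOUNTS = """\
/dev/vda1 / ext4 rw 0 0
filer01:/vol/oradata /u02/oradata nfs rw,hard 0 0
/dev/mapper/redo_lun /u03/redo ext4 rw 0 0
"""


def parse_mounts(mounts_text):
    """Return {mount_point: fs_type} from /proc/mounts-style text."""
    table = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 3:
            table[fields[1]] = fields[2]
    return table


def mount_point_for(path, table):
    """Return the deepest mount point that contains path, or None."""
    p = PurePosixPath(path)
    for candidate in [p, *p.parents]:
        if str(candidate) in table:
            return str(candidate)
    return None


def dirs_on_root_disk(dirs, table):
    """List the directories that live on "/" rather than on their own mount."""
    return [d for d in dirs if mount_point_for(d, table) == "/"]


table = parse_mounts(SAMPLE_MOUNTS)
# Hypothetical Oracle data, redo, and archive log directories.
oracle_dirs = ["/u02/oradata", "/u03/redo", "/u04/arch"]
for d in dirs_on_root_disk(oracle_dirs, table):
    print(f"{d} is on the VM boot disk, not on a LUN or NFS export")
```

On a real guest you would pass the contents of `/proc/mounts` to `parse_mounts` instead of the sample string.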

Hello Captain,
Thank you for partially addressing one of my earlier questions (Oracle and RHEV). I’m only dropping this comment because I saw the bit on OVM.
Why Xen sucks: it not only pins CPUs to vCPUs, it also pre-allocates RAM. Ten years ago that was fine; on a non-RISC box, you couldn’t expect much performance from guests unless the hypervisor had already sliced and diced the resources. But in the era of AMD-V and VT-x, faster (and much, much larger amounts of) RAM, SSDs, iSCSI, FC, … this list goes on for a while 🙂 , we can harvest a bigger part. I mean, for example, with kernel 3.2x you can have up to 160 vCPUs and your only worry will be to cool them down (I almost fused my CPU to the radiator a month ago 🙂 )
All best,
Stoyan

Careful here.. …keeping a physical around and running for purposes of a support issue is a good way to get an Oracle audit fired in your direction.

Oracle has not reacted kindly to this kind of use of their products and could easily require a customer to license the physical with full-use licensing (whether named or CPU-based). Inactive environments loaded with Oracle could be considered a “standby” server – these must be licensed in their view if there’s a “hot” way to leverage them (i.e. invoking a V2P…).

You make an excellent point. I should have added that the physical server should be shut down when not in use. I’m certainly not advocating that folks avoid paying licenses. Another option is to simply purchase a “standard”, i.e. NOT enterprise license, for the standby server. List on enterprise is about $47,500 and standard is around $13,000 (last time I checked).
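To put rough numbers on that for the standby box, here is a quick Python sketch. It assumes a standby server with 4 quad core CPUs (the same hardware used in the post’s examples, which is my assumption here), the core factor of .25 from the post, and the approximate list prices mentioned above; the function names are just for illustration, and none of this is a substitute for reading your own contract.

```python
def enterprise_list_price(cores, core_factor=0.25, price_per_cpu=47_500):
    """Enterprise: licensed per core, scaled by the core factor."""
    return cores * core_factor * price_per_cpu


def standard_list_price(sockets, price_per_socket=13_000):
    """Standard: licensed per socket, no core multiplier."""
    return sockets * price_per_socket


# Assumed standby server: 4 sockets, 4 cores each.
sockets, cores_per_socket = 4, 4
enterprise = enterprise_list_price(sockets * cores_per_socket)
standard = standard_list_price(sockets)

print(f"Enterprise: ${enterprise:,.0f}")
print(f"Standard:   ${standard:,.0f}")
```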

Hi,
I am installing RHEV systems and have a problem configuring a second nic for the hosts. I have contacted RH support and they seem not to be able to help. While Oracle RAC can operate on hosts with only one active nic it would not be optimal for production use with hundreds of concurrent users. Any Suggestions?

Where exactly are you having the issue? I’m currently running a virtualized 4 node RAC deployment on RHEV and Clustered ONTAP in my test lab. I have several NICs present on each VM. My hypervisors (rhev-h) each have 3 VLANs (1 data VLAN, 1 Mgmt/rhevm VLAN, and 1 NFS VLAN). The VMs have 5 NICs (3 on 1 data, 1 on mgmt, and 1 on the NFS).

Have you done live migrations with Oracle RAC under load in your test environment? I migrated a couple of Oracle RAC nodes at exactly the same time tonight while they were under 100% CPU load. Worked a treat. I’m pretty sure this was a first as I was doing it in a hyperconverged infrastructure environment with Nutanix equipment.

Don’t get me started on Oracle Licensing and Support. On the support front Oracle has to provide commercially reasonable support if you’re running a supported DB version on a supported OS version, even though there is no official support statement for KVM.

If it’s a DB problem, they’ll help you fix it. If it’s a hypervisor / storage problem, they’ll refer you to your hypervisor vendor or storage vendor usually before requiring reproduction. As you point out, there is very little difference. I’ve written a lot about fighting the Oracle FUD on vSphere and now will be doing similar with KVM.

Also you have the 10 day rule that you can use in the case of Oracle systems that are running on top of a clustered file system, so you can run one node on KVM and failover to something else. Need to check the exact contract wording on this, just to make sure it’s a valid use of the 10 day rule. It usually has to be a failure scenario, not maintenance or similar support issues where things are still online. Contract is king.

Interested to hear what testing with RAC you have and haven’t done. My test was with 12c.

Thanks for stopping by long enough to post and ask questions! It’s been a while, but yes I’ve done Oracle 11g RAC and single instance migrations. I’ve not done it with Nutanix; I did it all while I was still working at NetApp. NetApp and Oracle work ~really~ well together, as do NetApp and Red Hat, and Red Hat and Oracle.. so I was willing to gamble that putting the 3 together would be killer. Not that I was the 1st or even the 1000th.. I specifically wanted to address some issues like licensing and the fact that if you virtualize, you don’t necessarily need to go with “enterprise” licenses.. you could actually go with standard. You can imagine that didn’t make me very popular with Oracle… I haven’t done anything with 12c.. I returned to Red Hat in June of last year and I’m really focused more on RHEL-OSP, RHEV, and CloudForms now..

So, still very much focused on KVM. Nice to hear from another writer!!

Thanks for leaving questions and comments. Both are always appreciated. If I understand your question correctly, you’re virtualizing Oracle on RHEV and NetApp. You want to use the NetApp SMO, but want to know if you need to install the host utilities kit for FC. If you are using “RHEV-H” (the Red Hat thin hypervisor), then you will not be able to install the host utilities. If you are using RHEL 6 with KVM (the “thick” hypervisor), then you can and should use the host utilities kit.

How do you plan to connect your virtualized Oracle databases to the NetApp storage? The easiest way is to use NFS (NFSv3 or Oracle dNFS), followed by iSCSI.