This blog is by a long-time Oracle storage professional who has history with both NetApp and EMC.

May 07, 2009

Response to comments from Alessandro Perilli blog

I have noticed that a large number of comments have been submitted to the recent post on Alessandro Perilli’s blog which links to my recent post on Oracle’s support policy for VMware virtualization. Many, if not most, of these comments are actually comments directed at my post, not Alessandro’s post. Therefore, I will respond to them here. I am also attempting to post this as a comment on Alessandro’s blog, but that has not appeared on his site as of yet, and I wanted my response to be available.

To respond to several of the points raised by these comments:

Why would anyone want to run Oracle on a single vCPU under VMware?

Not sure how to parse this one. VMware ESX 3.5 allows 4 vCPUs per VM. Our performance testing was done with 4-vCPU VMs (2 VMs per server on an 8-core box, consisting of 2 quad-core processors). Four servers were in the HA cluster, so a total of 32 physical cores were in the configuration, all of which were allocated to vCPUs on 8 VMs. This is laid out pretty thoroughly in the reference architecture published on Powerlink and EMC.com. You can find the EMC.com version here.
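If you want to check the arithmetic on that configuration, here is a quick sketch. The numbers are the ones quoted above; the variable names are just my own labels for illustration.

```python
# Figures quoted in the paragraph above; names are illustrative only.
servers = 4               # hosts in the VMware HA cluster
sockets_per_server = 2    # two quad-core processors per host
cores_per_socket = 4
vms_per_server = 2
vcpus_per_vm = 4          # the ESX 3.5 per-VM maximum used in testing

physical_cores = servers * sockets_per_server * cores_per_socket
total_vms = servers * vms_per_server
total_vcpus = total_vms * vcpus_per_vm

# Every physical core is matched by exactly one vCPU (no overcommit).
print(physical_cores, total_vms, total_vcpus)
print("vCPU-to-core ratio:", total_vcpus / physical_cores)
```

In other words, the test configuration did not overcommit CPU: each vCPU had a physical core behind it.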

In terms of why you would want to run Oracle in a virtualized environment using VMware, that is pretty well laid out in my blog. But to summarize:

We found better performance in a virtualized environment using VMware HA cluster vs. Oracle RAC. I will deal with the fairness of that comparison in a later item below.

The cost of VMware HA cluster is less than Oracle RAC, and VMware is appropriate for many customers. Not all, but many. For those customers, RAC would also work, but is, again, vastly more expensive. I lay out the use cases where VMware HA cluster works as an alternative to RAC in another item below.

Manageability is higher with VMware HA cluster than RAC. Believe me, I should know. I run both routinely in our testing environment.

Why should I believe that performance of VMware HA cluster is higher than RAC when VMware will not allow third parties to benchmark their product?

Well, I am a third party, and the program I manage publishes performance benchmark results of both virtualized and physically booted Oracle production environments. These are available on Powerlink. The actual performance results are pretty confidential and proprietary, but I tend to open the kimono on this blog, as you have seen. We are using a TPC-C-like workload run under Quest Benchmark Factory for Databases. We do not claim that this is a published and audited TPC-C result, obviously. However, we have lots of experience running this.

I can tell you no one was more surprised by the VMware performance result than I was. We are still continuing to actively profile performance of both physically booted and virtualized Oracle environments. (In fact, we are beginning to profile OVM as well.) I had no particular axe to grind on this either way. The results were simply what they were.

RAC and VMware HA cluster are not comparable, are they?

Of course they are. Both Oracle Cluster Ready Services (CRS, the underlying technology behind Oracle RAC) and VMware HA cluster are cluster software products. They do basically the same sort of thing: provide high availability for applications. They simply do so in a different sort of way. RAC provides one single database image across multiple physically booted servers. VMware HA cluster provides transparent failover of VMs using VMware VMotion technology. Both are intended to protect application uptime and client access by providing a high availability solution.

Granted, they have different levels of HA. I would consider RAC to be a fault tolerant technology, in that the physical loss of a node will not result in database downtime (but may result in loss of client access). VMware HA cluster is a high availability product. A brief downtime is inevitable when a VM is being rebooted on a surviving node of the cluster after node failure. In our experience, VMware HA cluster works pretty well at getting the VMs back up and running, though.

In the end, it depends on what the customer needs, the level of HA being one of the issues. I cover that more in a later section.

Isn't RAC free with SE?

This is true, but very misleading. RAC is free with SE when the total CPU core count in the entire cluster is 2. This is a trivial cluster. Beyond that, EE is required.

And that is where the cost savings come in. EE is required for basically two products in most customer configurations:

RAC

Data Guard

RAC also carries the RAC upcharge above the cost of EE, making it the most expensive Oracle database software product, by far.

Assuming Data Guard is not required (and storage vendors have been competing with Data Guard using products like MirrorView and RecoverPoint for many years), then it comes down to RAC.

Assuming you can provide HA in another manner, then RAC is not required, and therefore SE can be used instead of EE, at 25% of the cost, plus savings on the RAC upcharge. I think you see the point.
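To make the arithmetic behind that point concrete, here is a rough sketch. Only the 25% ratio comes from the text above. The normalized EE price of 1.0, the RAC upcharge of 0.5, and the unit count are placeholder values made up for illustration, not Oracle list prices, and the sketch assumes both editions are licensed on the same number of units, which real licensing metrics may not honor.

```python
# Normalized, hypothetical prices: NOT Oracle list prices.
EE_PRICE = 1.0        # Enterprise Edition, per licensed unit (normalized)
SE_RATIO = 0.25       # SE at 25% of the cost of EE, per the text above
RAC_UPCHARGE = 0.5    # placeholder: RAC option, per licensed unit


def license_cost(units, per_unit_price):
    """Total license cost for a given number of licensed units."""
    return units * per_unit_price


units = 8  # hypothetical number of licensed units in the cluster

ee_rac_total = license_cost(units, EE_PRICE + RAC_UPCHARGE)
se_total = license_cost(units, EE_PRICE * SE_RATIO)
savings_fraction = 1 - se_total / ee_rac_total

print(f"EE + RAC: {ee_rac_total:.2f}  SE: {se_total:.2f}")
print(f"Savings relative to EE + RAC: {savings_fraction:.0%}")
```

Plug in your own negotiated prices and unit counts; the shape of the result is what matters, not these particular numbers.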

The bottom line is that VMware HA cluster provides cost savings, assuming the customer scenario allows this product to be used. Which is what I cover next.

Not all customers can use VMware for an HA solution, and need RAC instead. Right?

Not actually in the comments, but important. Again, EMC is a strong supporter of RAC. We run it in our lab, and we will continue to do so. No one would say that RAC does not work as well as VMware HA cluster or that it is not a great product. It is. But it is expensive and complex.

What RAC provides is two things:

A single database image

True fault tolerance

These are both great advantages, but not all customers need this. For example, I personally visited a very large Fortune 100 company. For confidentiality reasons, I will not use the name here, but believe me, they are a household name. This customer had many single-instance database servers running on 1U and 2U unclustered machines throughout their datacenter. The cost to manage these servers was immense, in the 9 figures per year. Yeah, that's a lot of jack.

The customer wished to consolidate all of these servers into a database cloud. I postulate that the way to do that could consist either of RAC or VMware HA cluster. But consider:

What is the level of HA provided by the current configuration? Very low. Each physical server is presently a single point of failure. VMware HA cluster would be a big improvement.

Does the customer need a single database image? Not at all. These servers are already islands of data.

The simplest and quickest way for this customer to consolidate the servers in this scenario, given the choice between RAC and VMware HA cluster, is VMware HA cluster. It is also the least expensive of those two alternatives.

I suspect that many, many customers are in this situation. For many applications VMware HA cluster is a viable high availability and consolidation solution for production Oracle databases. I would certainly not put a large ERP system on VMware. Nor the billing application for a large telco. But many, many applications could be put up on VMware just fine.

This is in many ways comparable to NAS vs. SAN. I am employed by EMC, so it may surprise you to hear me say this, but remember I came to EMC from NetApp. In my experience (not a scientific sample, but still) approximately 90% of the Oracle databases running in the world could be stored on an NFS server with absolutely no change in the client experience or uptime of the database. The same is probably true with VMware. Perhaps the percentage is higher or lower. I certainly have less experience with VMware at this point than NAS. But time will tell.

BTW, it would help me if folks could post comments to content written by me on my own blog, rather than having to run around and find it elsewhere. Just a thought...

Comments

I'm sorry but the comparison is misleading. RAC provides load-balancing, not just HA, unlike VMware. It doesn't actually work as advertised for many workloads, to be sure, but it's still a major difference.

--------------------

Response:

Actually, VMware provides load balancing as well, just in a different manner. The load balancing is provided by the Distributed Resource Scheduler (DRS) component of VMware cluster. This is very different from the way in which RAC load balancing works, however. I will probably do a blog post on this soon to point out the differences.

So go start virtualizing your Oracle DB's on the most proven and stable virtualization platform in the market today and tomorrow.

Cheers,
Denis

----------------------

Response:

This is very good news, but unfortunately, it does not apply to the database server, only to the application server and the Fusion middleware server. The use of VMware virtualization for these servers has become so common that Oracle has now made its support position on these servers official.

To say that a single node Oracle database under VMware outperforms a RAC cluster is quite the claim. I'd love to see what you base it on.

To be honest, an old saying relating to apples and oranges comes to mind.

Oracle RAC is a very specialized technology that offers high performance and a minimalistic fault tolerance for specific applications.

I might add that in my experience, most people who consider RAC to be a viable HA solution work in Oracle's marketing department.

I don't see any way that VMware can provide comparable performance without implementing true load balancing technology - just as I don't see how RAC can provide comparable redundancy without implementing full node to node failover such as offered by Veritas Cluster or even Oracle DataGuard.

Regards,

Roy Olsen

------------------

Response:

I base that on real-world TPC-C-like performance testing using Quest Benchmark Factory for Databases on identical hardware, running both physically-booted and virtualized configurations.

You seem to have some objections to RAC as an HA solution, which I am not willing to endorse. I have never said that RAC is not a viable HA solution. In fact, RAC can be up and running for many years without any downtime on the overall database, in my experience.

You should compare CRS with VMware HA, not RAC. Also a question: are you advocating single-schema/single-instance Oracle VMs? Will this not create a proliferation of DB instances that would be difficult to manage with VMware? Also, it looks to me like you are putting the management burden on the VMware/server folks, right?

What are your views on consolidation with multiple schemas on single/multiple instances running on single machines but with CRS?

Thanks yveke.

-------------------------

Response by TOSG:

When I discuss the clustering layer of RAC, I am referring to CRS, yes.

The scale-out model is one which is frequently referred to by Larry Ellison. This occurred in the context of the Exadata II launch last week. It is the case that there are management issues with running many smaller instances, rather than one large instance. However, there are also management issues for having many logical schemas within a single instance. Overall, I would say it depends on what you want to do. If you want to run something like SaaS, many small instances is a good way to go.

Running many instances within a single OS environment (i.e. physically booted) is also an option. It can get confusing though. Also, memory management is a challenge. All SGAs are pulling from the same pool of memory. VMware helps with this.

There is less overhead on multiple OS images than you might think in a VMware context. Transparent page sharing recoups much of the memory of similar OS images.

Oracle SE with RAC is indeed 'noddy' but I believe that it extends to 2 *sockets* per machine not two cores. You can actually get some reasonable performance if you invest in fast multi-core (but single die) CPU architecture.

Oracle Standard Edition One may only be licensed on servers that have a maximum capacity of 2 sockets. Oracle Database Standard Edition can only be licensed on servers that have a maximum capacity of 4 sockets.

I know a lot of customers who go through all sorts of contortions, under-sizing their hardware so that they can "get away" with Standard Edition rather than Enterprise Edition.

But there are a lot of other capabilities (for example, Transparent Data Encryption/TDE, Partitioning, the ever-popular Diagnostics & Tuning) which are simply unavailable on SE. For certain customers (e.g. folks who want a SOX, HIPAA, or PCI-DSS compliance framework for their DB) Standard Edition would not suffice.

So I would not agree with any statement that EE+RAC penalizes customers. Is it expensive? yes, it is. Does it make sense for many customers? yes, it does. Besides there's always the ULA..

The views expressed on this post are my own and do not necessarily reflect the views of Oracle.

disclaimer: The opinions expressed here are my personal opinions. I am a blogger who works at EMC, not an EMC blogger. This is my blog, and not EMC's. Content published here is not read or approved in advance by EMC and does not necessarily reflect the views and opinions of EMC.