Optimizing Oracle 10g on Linux: Non-RAC ASM vs. LVM

Using benchmarks to answer the question of whether automatic storage management performs as well as Linux filesystems using an LVM.

It's been over a year since my first and enthusiastic automatic storage
management (ASM) article, titled "Optimizing Oracle 10g on Linux Using
Automated Storage Management", was published; it is still available online.
Since then, quite a lot has changed in terms of the software technologies
now available:

- Red Hat released Advanced Server 4.0, built on the 2.6 kernel.

- Along with AS 4.0, Red Hat released a greatly improved LVM with a GUI.

- Red Hat released the Global File System, which is now part of Fedora.

- Oracle released version 10.2.0.1 of the database (10g Release 2).

- Oracle released version 2.0 of the ASM kernel driver and libraries.

- Oracle released version 2.0 of the Oracle Cluster File System.

As you can see, the software technology landscape has changed so extensively
that it has reopened the entire ASM debate. In my first ASM paper, I simply
assumed people either would be utilizing ASM or not--without considering
RAC usage ramifications. During this past year at shows, conferences and
on-site visits, a number of people have told me that although ASM makes
obvious sense for RAC environments, they also want to know whether ASM is a
viable alternative for non-RAC environments. Specifically, does ASM perform
as well as Linux filesystems using a logical volume manager (LVM)?

Of course, that's a challenge far too enticing to pass up, especially
when tools such as Quest's Benchmark Factory are available that make
these tests trivial. So on to the races.

Test Criteria

Looking back at the technology change listing above, what we want to
benchmark is the new LVM vs. ASM 2.0 on Red Hat Advanced Server 4.0's
2.6 kernel running Oracle 10g Release 2. In other words, we want to test
all of the latest and greatest software technology available for non-RAC
scenarios. The goal is simply to benchmark their fundamental performance
characteristics against one another and, where possible, declare a winner.
For that purpose, we need to simulate two radically different kinds of
real-world workloads to cover differing needs. Thus, the following
industry-standard benchmark tests are being used:

- The TPC-C benchmark measures on-line transaction processing (OLTP)
performance: many concurrent users running short, update-intensive
transactions against the database.

- The TPC-D benchmark measures a broad range of decision-support
applications requiring complex, long-running queries against large and
complex databases.

Both tests simulate 100 users against 1GB databases. Although these
test parameters are not large, they are the maximum realistic values
that our limited test hardware can accommodate, and results from such
tests should be sufficient for extrapolating to larger environments.

Test Setup

Setting up an industry-standard database benchmark, such as TPC-C
or TPC-D, using Quest's Benchmark Factory is a snap and can be done in five
easy steps. First, after opening the application, press the New toolbar icon to
launch the New Project wizard. From there, specify that you want to create a
Standard Benchmark Workload, as shown in Figure 1.

Figure 1. Creating a Standard Benchmark
Workload

Second, choose which industry standard benchmark you want to perform
from the list of available tests, as shown in Figure 2.

Figure 2. Choosing the Benchmark

Third, choose the approximate database size to create for performing
the benchmark, as shown in Figure 3. Remember, Benchmark Factory has to
create and populate it.

Figure 3. Choosing the Database Size

Fourth, select the number of concurrent users you want to simulate for
performing the benchmark, as shown in Figure 4. As a side note, you can
run this from one or more Windows computers.

Figure 4. Selecting Concurrent Users

Fifth, run the test and record the results. The total time it generally takes
to configure a standard benchmark is roughly 30 seconds.

Disk Layout

In order for the benchmark test results to provide a fair, apples-to-apples
comparison, both the LVM and ASM disk layouts must be similar
enough to draw meaningful and reliable conclusions. That means
neither setup should get preferential treatment in the allocation
of devices. To that end, Figure 5 shows how the two environments were
allocated across four identical IDE disks; you can tell they are IDE disks
by the hdb1 through hde2 device-naming convention. These were 7,200 RPM SATA
IDE disks with 2MB of cache each. Notice also how two inner and two outer disk
partitions were allocated to each solution. The idea was to eliminate any
unintentional speed advantage due to the faster transfer rates of outer
disk tracks. Finally, no operating system, swap or database binary
files are on these disks; they were used solely for database data.

Figure 5. Allocating Devices

Although SCSI obviously would be the preferable choice, SATA IDE is
increasingly popular for low-cost RAID arrays. The results obtained
should apply equally well to faster and more reliable disk technologies,
such as SCSI, as well as to popular storage appliances, such
as NAS and SAN devices. The chief goal here was to implement Oracle's SAME
(stripe and mirror everything) approach. Even though there are only
four disks, we nonetheless should be able to compare these two
methods' fundamental striping capabilities. And, minus all the other
bells-and-whistles distractions, that's essentially the heart of the
question people have been asking: do the ASM striping algorithms match
up well against those of the more mature LVM?
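For readers who want to reproduce the LVM half of such a layout, the commands look roughly like the following. This is a sketch only: the device names, volume-group name, stripe width and mount point are assumptions, not the exact values used in these tests.

```shell
# Register the four database partitions as LVM physical volumes
# (hypothetical device names; match them to your own Figure 5 layout).
pvcreate /dev/hdb1 /dev/hdc2 /dev/hdd1 /dev/hde2

# Collect them into one volume group dedicated to Oracle data.
vgcreate oradata_vg /dev/hdb1 /dev/hdc2 /dev/hdd1 /dev/hde2

# Create a logical volume striped across all four disks (-i 4)
# with a 64KB stripe size -- the "stripe everything" half of SAME.
lvcreate -i 4 -I 64 -l 100%FREE -n oradata_lv oradata_vg

# Lay an ext3 filesystem on it and mount it for database datafiles.
mkfs.ext3 /dev/oradata_vg/oradata_lv
mount /dev/oradata_vg/oradata_lv /u02/oradata
```

The ASM side needs no striping commands at all: a disk group created over the same four partitions (for example, CREATE DISKGROUP data EXTERNAL REDUNDANCY DISK '/dev/hdb1', ...) is striped automatically, which is exactly the behavior these benchmarks compare.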

The Early Results

Remember as we look at these results that we're not worrying about which
environment is easier to set up and maintain, because as the prior paper
clearly pointed out, ASM has numerous advantages in those areas. Our goal
here is simply to see how they perform in head-to-head speed tests. So
the results here focus on only that aspect--speed.

Let's look first at the TPC-C results. Remember, we simulated 100 concurrent
users accessing a 1GB database. The results are shown in Figure 6.

Figure 6. The TPC-C Results

Basically, the TPC-C results were too close to award a winner. I suspect
a key reason for the lack of any major difference is that the Oracle
data, index, temporary and rollback segments did not have to grow or
shrink by any measurable amount in this type of load test, because
OLTP transactions tend to be short and bursty in nature.
Thus, we were measuring primarily read-only access across four disk
stripes. Therefore, we have to call the TPC-C benchmark test results a
draw, with neither ASM nor LVM showing any real performance advantage.

Note: Although this tie was unexpected, it clearly shows why you need
to consider more than one type of benchmark test when comparing such
radically different technologies. Benchmark Factory offers additional
database benchmarking tests, including TPC-B, TPC-D, AS3AP and Scalable
Hardware Benchmark. Make sure that you choose the tests that best
reflect the database environment you will be building and maintaining.

Now let's look at the TPC-D. Again, we simulated 100 concurrent
users accessing a 1GB database. The results are shown in Figure 7.

Figure 7. The TPC-D Results

Here we have a clear-cut winner. The LVM ran 30% faster, achieved a 25%
higher transactions-per-second rate, delivered 56% higher throughput in
kilobytes per second and had a 108% better average response time. I suspect
the real differentiator here was the temporary segment allocation necessary
for the large GROUP BY and ORDER BY operations.

Going the Extra Mile

I was not entirely happy, however, with simply running the industry-standard
benchmarks and speculating as to why the results ended up as they did. I
wanted a little more clarity regarding objects' segment creation and
allocation--and the corresponding tablespace growth issues. My belief
was that the LVM somehow handles space allocation due to object growth
more efficiently than ASM does. Of course, this seems totally contrary
to what one would expect, as ASM touts the advantages of raw devices
without the headaches. So how could the ext3 filesystem on top of the
LVM be faster? To test this premise, I devised a simple, brute-force
benchmark: a simple table with two indexes whose data format would
yield predictable growth with increasing row counts. Thus, I could test
the object space creation and allocation for both tables and indexes
with one simple script. The script is provided below.
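The script itself did not survive in this copy of the article, so here is an illustrative reconstruction of the kind of script described. The table name, column layout, credentials and commit interval are my own assumptions, not the author's originals:

```shell
#!/bin/sh
# Illustrative reconstruction (not the author's original script):
# create a table with two indexes, bulk-load a given row count and
# let SQL*Plus report the elapsed time. Fixed-width CHAR columns make
# segment growth predictable as the row count increases.
ROWS=${1:-10000}   # row count passed on the command line

sqlplus -s scott/tiger <<EOF
SET TIMING ON
CREATE TABLE growth_test (
  id   NUMBER    NOT NULL,
  pad1 CHAR(100) NOT NULL,
  pad2 CHAR(100) NOT NULL
);
CREATE INDEX growth_test_ix1 ON growth_test (id);
CREATE INDEX growth_test_ix2 ON growth_test (pad1);

BEGIN
  FOR i IN 1 .. $ROWS LOOP
    INSERT INTO growth_test VALUES (i, 'A', 'B');
    IF MOD(i, 10000) = 0 THEN
      COMMIT;  -- periodic commits keep rollback usage bounded
    END IF;
  END LOOP;
  COMMIT;
END;
/
DROP TABLE growth_test;
EXIT
EOF
```

Running the script repeatedly with row counts from 10,000 up to 100,000,000 against each storage configuration would produce the timing series graphed in Figure 8.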

The results of calling this script for row counts from 10,000 to
100,000,000 for both LVM and ASM are shown in Figure 8.

Figure 8. Script-Generated Results

The results from this additional experiment were quite simple and
conclusive. Although both approaches used exactly the same amount of space,
the LVM run times consistently beat the ASM run times by anywhere from
10 to 14%. As you can see by the graph's lines, the trend seems clear: LVM is
slightly more efficient at bulk data loads than is ASM.

The Final Results

So, what does all of this mean? For people doing RAC, ASM is a
viable and credible approach for disk space management, with numerous
administrative and maintenance benefits to its credit. But for those simply
doing non-RAC database deployments, ASM is not yet as scalable as the Linux
ext3 filesystem on an LVM. And although all these
benchmarks were done using the standard LVM included with Red Hat and
other popular Linux distributions, it's quite possible that an
enterprise-targeted LVM, such as those available from IBM or Veritas, would
best even these results. Therefore, people not doing RAC who care more about
performance than administrative ease should, for now, stick with the
Linux filesystems and an LVM.

Bert Scalzo is a product architect for Quest Software and a member of
the TOAD development team. He designed many of the features in the TOAD DBA
module. He has worked as an Oracle DBA with versions 4 through
10g and has worked for both Oracle Education and Consulting. Mr. Scalzo
holds several Oracle Masters, as well as a BS, MS and PhD in Computer Science,
an MBA and several insurance industry designations. He can be reached at
bert.scalzo@quest.com or
bert.scalzo@comcast.net.

Comments

I'd say that if you've got a RAID controller in the machine, I'd definitely use the external redundancy. If you choose to go the external route, you really only need the mirror and not striping. ASM is going to stripe whatever you give it, and then you would have two different components striping. Not only is that confusing, but it can *at times* work against you.
Personally, I don't completely trust 10g yet, and I would probably opt to use a separate mirroring mechanism rather than ASM...

I too am torn between utilizing the hardware (SAN) for RAID-10 or merely turning the disks over to ASM and letting Oracle perform the RAID-10 operations. I seem to find conflicting information regarding 10g RAC on Red Hat Linux 4. Oracle MetaLink suggests using ASM for RAID, while Oracle Press suggests that hardware RAID would be more efficient than software RAID (and I'd think ASM is software RAID). Additionally, my impression is that ASM RAID-10 would allow Oracle to increase performance by utilizing the extents on the mirror if it detects that a particular disk is getting too much I/O. I'm not sure Oracle would know how to access the mirror extents if the RAID-10 were handled at the hardware level, but I could be wrong. In your opinion, how would you configure 8 x 300mb disks to be used in an Oracle 10g RAC environment on Red Hat Linux?

Went ahead and Googled RAW; I don't think I found the right definition (or did I?). From http://www.dpreview.com/learn/?/Glossary/Digital_Imaging/RAW_01.htm:

"Unlike JPEG and TIFF, RAW is not an abbreviation but literally means "raw" as in "unprocessed". A RAW file contains the original image information as it comes off the sensor before in-camera processing so you can do that processing afterwards on your PC with special software."

As far as I can tell, in this context RAW means "raw disk devices", a.k.a. disk devices without a filesystem. Raw in a digital-imaging context is what you quoted. Raw is better in most UNIX contexts for Oracle because UNIX kernels tend to optimize for multi-user, multi-process workloads--even the filesystems are designed this way--and do not help much for a single application accessing massive amounts of data. For this reason, Oracle tends to want as many resources for itself as it can (safely) get, and direct access to the disk, as opposed to letting the kernel arbitrate between itself and other processes.

I too am involved in an Oracle 10g RAC rollout. We have five Dell PE6650s running RHAS 3.0, dual Emulex HBAs connected to Brocade switches, backed by an EMC Symmetrix storage array that is configured to allocate meta-LUNs in chunks of 36G each... I also have a CX700 storage device, which is used for our dump space.
I just wanted to point out that ASM is slightly different from ASMlib. From what I've found, ASMlib merely attempts to put a label on the disks you are planning on using in your ASM-managed database; I compare it to a filesystem's label. So you actually can use ASM-managed disks without using ASMlib. The main difference is the discovery string used when setting up your ASM disk group.
My current issue is that I've seen vgdisplay and lvdisplay commands take up significant system time on my LVM-managed ext3 filesystem for my dump space. Any ideas on how to tune that so it doesn't take up so many resources? They are being kicked off about every 15 minutes, and I've yet to ascertain from where.

Yes - I pointed out in one of my many replies that on the Dell EMC SAN array, I'm using ASM without ASMlib, since Dell and EMC advise against it. So on that setup I'm using ASM with raw devices and no ASMlib, as you point out. But I'm still not too happy with the raw performance on Linux; it's not the natural and automatic speed demon I'm accustomed to on other UNIX platforms. You seem to have to work to get raw to outperform, whereas on the other UNIX platforms I've worked on, the switch to raw is simply always faster.
As for the LVM issues, let me think about that and get back to you. I'm on vacation Aug 31 - Sep 11, so I've not been able to do a great job in replying to people's questions--my mind is elsewhere :)

Next time you do this type of testing, I recommend using a much larger database. Also, if you can't use raw devices, then you may want to try the Veritas file system; Veritas claims that it achieves the same I/O performance as raw devices for Oracle.

Yes - I've used Veritas with the Quick I/O option (speed benefits of raw with none of the management headaches), and it is my preferred method, hands down. But not everyone has that experience or is willing to pay for a file system when ext3 is free. As for a larger database, I also repeated these tests with a 10GB database, with the exact same results. Basically, I kept the SGA size relatively small versus the database size, so as to keep the database from becoming cached in memory and producing meaningless results.

Excuse me, but Oracle says that ASM is basically a file system in a separate Oracle instance. If you give ASM a small database cache, then you're basically taking away ASM's file cache. If that's so, then you should also take away the ext3 filesystem cache in your test.

On the other hand, if you gave ASM the same amount of cache as ext3 but didn't give the database very much cache, that's wrong too, because it's not following Oracle's best practices. With Oracle, the database cache has always been more important than the file system cache. Oracle has always recommended (before ASM) that you set up a small file system cache and let Oracle cache the data in the database buffer cache instead. Their reasoning has always been that the two caches are redundant and Oracle's cache is more intelligent, because it keeps better track of the blocks users need. In fact, the buffer cache has gotten better at this with each new version, because Oracle knows what rows/blocks a user is going to need--it's not a simple LRU list anymore.

I've just realized the first paragraph of my previous post shows my lack of experience with ASM, since the ASM instance doesn't have a database cache. It does have a cache, but "show sga" shows the database cache is zero. I don't even know if you can change the cache it does have.

However, I have just a little more to add to my second paragraph.
Oracle has always told me that, if you could, you would not even use the file system cache (I never could get rid of it entirely on HP-UX), because that would give you more RAM for the database cache, which is the more important of the two redundant caches. We're assuming, of course, that the server is dedicated to Oracle. Given this, maybe ASM is relying on the Oracle database cache to be its "file system" cache.

By keeping the SGA size small, you are giving more memory to the file system cache. This can cause the file system to perform better than ASM. If the memory from the file system cache were given to the SGA buffer cache, then ASM would perform better than ext3. The SGA buffer cache with ASM is more CPU-efficient, since there is no need to copy to/from the file system cache when doing I/O. The file system cache tends to duplicate a lot of the data that is in the SGA buffer cache, so it uses memory less efficiently. A file system will start performing better than raw when its buffer cache is significantly larger than the database buffer cache. It would be interesting to know the sizes of the file system cache and the database cache. If the database is cached in the file system cache, then the results are just as meaningless as if the database were cached in the SGA.

For this article and these particular tests, I used Oracle's new ASMlib version 2.0, which I mention at the top of the article as one of the new technologies. However, I've also tested (although not as part of this article or my first one) using raw devices with ASM. The reason is that ASMlib is not yet fully sanctioned for use with certain EMC SAN arrays; it has something to do with unreliability in writing headers across multiple spindles. So in that case I had to use raw devices with ASM, and I saw similar performance issues as when doing simple raw benchmarks--even when making sure to use asynchronous I/O.

If you read the first paper (referred to at the beginning), I did benchmark raw versus cooked versus ASM. Raw performance on Linux for Oracle is not too good. In the first paper, I even tried raw with twice the Oracle cache versus the file system with half the cache, to offset the memory issues you mention. But raw still lost by a wide margin, so it made no sense to benchmark it again. The DBAs who've been asking have simply been asking about file system versus ASM, so that's where I put the focus.

What you forget to explain is how you are comparing a filesystem that caches data in the page cache versus ASM, which goes straight through to disk. I am sorry to say, but this benchmark has not much to do with ASM vs. LVM; it is more a raw disk-access method versus a filesystem.

If you have a 1GB database then, depending on how much RAM you have, you could be caching the entire file, so read-only operations can be hugely faster. And you should give that memory to Oracle, not to the filesystem cache.

Anyway, it's an interesting comparison, but unless you do full direct I/O on ext3, your benchmark comparisons of LVM vs. ASM have not much, if any, relevance at all.

There are documents comparing OCFS versus ext3 versus raw, and how the size of the Oracle buffer cache makes a huge difference depending on the filesystem cache used, particularly for read data.

If you were to redo your tests and make sure you compare LVM to ASM--e.g., either do direct I/O with Oracle on ext3, or use raw devices on the LVM rather than ext3--then we can talk. Until then, this is not very useful information.
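For readers who want to try the direct-I/O variant this commenter suggests, a sketch of the usual knob follows. FILESYSTEMIO_OPTIONS is a real Oracle 10g initialization parameter; the connect string and the choice of "setall" (direct plus asynchronous I/O where the platform supports it) are assumptions to adapt to your environment.

```shell
# Sketch: request direct and asynchronous I/O for Oracle datafiles
# on ext3, bypassing the Linux page cache for database I/O.
sqlplus -s "/ as sysdba" <<EOF
ALTER SYSTEM SET filesystemio_options = setall SCOPE = spfile;
SHUTDOWN IMMEDIATE
STARTUP
EXIT
EOF
```

With the page cache out of the picture, an ext3-on-LVM run measures the same raw-style disk path ASM uses, which is the apples-to-apples comparison being asked for here.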

Please refer to my other reply. Yes, I know it's not an apples-to-apples comparison. But the DBAs who read the first article (where I did benchmark raw, and it lost really big) have simply been asking about file system versus ASM. Basically, most Oracle DBAs seem hesitant to adopt raw for all the management complexities and headaches it adds, so 95% or more simply see the choice as OS file system versus ASM. Since that's what people asked about after the first article, that's where I placed the focus. Plus, let's not forget the huge limitation with raw: just 255 devices. For a big database, then, raw cannot scale to size. Neither LVM nor ASM faces this shortcoming.

I did some tests using a 1TB database with ASM and ext3, using ASMlib and external redundancy (RAID 10).

What I have seen is that in a 32-bit environment, when you try to use a filesystem, Linux takes all the available RAM for file caching, and severe performance degradation happens later when the system starts paging (mainly the PGA).

But by using ASM, direct I/O always happens, and I could see some performance advantage and no paging at all. This was part of a DW migration from 9i to 10g, and I was running all kinds of I/O tests to verify which storage I should use on 10g.

We have RHEL 2.1 on IBM 365s (4-way) running against an EMC CX700. We are using QLogic 2340 HBAs.

No matter what we do, we cannot achieve greater than about 70MB/s read throughput on our array using filesystem tests (with big enough reads to blow out the cache on the array). We are getting the same performance from a single 4+1 RAID 5 group and from 9 x 4+1 RAID 5 groups in a single metaLUN.

We have tried testing on Red Hat RHEL 3 and have received similar results.

EMC has been working closely with us and has clearly stated that they believe RHEL will not read more than 80MB/s (sequential) from a single LUN. They have tests in their lab to back that up. The interesting thing is that they say they can get 180MB/s doing sequential writes.

I'm currently doing a rollout on a Dell (EMC) CX700 with 30 disks--and yes, I'm using ASMlib; no real choice in my mind. As I said in the article, if all you care about is speed, and you can somehow ignore the numerous management advantages of ASM (which I cannot), then there is a minor advantage to using a file system. But on the EMC array I'm using, EMC says do NOT use ASMlib--not even 2.0; it's not supported yet. So I have to use raw devices with ASM. And for whatever reason, raw on Linux is nowhere near the speed demon that raw is on all the other UNIX platforms I've worked on. As for the file system using too much memory, that's all tunable via configuration parameters, so I've avoided that problem. But I did not go into those details because I wanted to stay at a higher level--more Oracle-related.

I attended Amazon's Oracle OpenWorld presentation on ASM performance comparisons... good stuff! Read it--it doesn't leave much room for debate. These guys are living those results on their 16-node Linux cluster.
