James Morle performance whitepaper

"Solid State Disks in an Oracle Environment" White paper excerpt
The New Rules of Deployment
by James A. Morle, Scale Abilities Ltd

About The Author
James Morle is the founder of Scale Abilities Ltd, a specialist consulting company offering both high-end consulting and unique software products. With over 10 years' experience architecting some of the world's largest Oracle systems, James is a well-respected member of the Oracle community. James is also the author of the critically acclaimed book Scaling Oracle8i: Building Highly Scalable OLTP System Architectures.

1.0 Introduction
This is yet another paper about solid-state disks (SSDs). So, why should you read such a paper? The reason is simple: One should always reassess the suitability of technology over time. For example, a 1930s Theremin audio synthesizer probably didn’t fit too well, either physically or musically, into orchestras of the time, and yet the entire 80s music genre was dominated by music of the synthesized variety! Things change, and none more quickly than the industry of computing.

2.0 Background
For this paper, I used an SSD from Imperial Technology (http://www.imperialtech.com). Imperial believe strongly in SSDs, and they have been making them longer than anybody else. After reading my Sane SAN whitepaper [Morle 2001], Imperial graciously lent me some of their equipment (en route to a trade show), namely a MegaRAM 1000 with 3GB of storage, in order for me to ascertain the place for SSDs in modern Oracle deployments.

The attribute which has not changed a great deal is the performance of physical disk [Morle 2001]. Combined with the increase in speed of CPUs, this fact makes the performance gulf between processing data and storing data wider than it has ever been. To address this issue, there are two alternatives: caching, and SSDs.
The most common approach with modern hardware is to adopt a large cache in the storage tier. Ranging in size from a few MB to 32GB+, this cache is designed to accelerate both writes (by deferring the physical writes), and reads (by servicing reads from memory). However, there are a few downsides to this:

1. Writes can only occupy a (typically small) percentage of the total cache before they must be flushed to disk to make space for subsequent writes. As soon as this happens, write performance is comparable to physical disk performance.
2. Read caching relies on temporal and spatial locality of reference:
a. Temporal locality is where accesses to the same data are made within a short period.
b. Spatial locality is where accesses are made to data that is close to recently read data.
3. In a multi-cache hierarchy [Morle 1999] such as Oracle database servers, most of the useful temporal and spatial cache hits are already ‘stolen’ by the Oracle buffer cache.

Clearly, there are cases where caching does not provide the required speedup, so let’s look into whether there is a place for SSDs in this modern world.

4.0 Uses for Solid-State Disk
All this talk is well and good, but what are SSDs actually useful for in a modern Oracle environment? Here are a few examples, in order of generalized impact, where the pros of SSD can be maximized whilst limiting the cons. For the sake of clarity, none of the configurations discussed includes redundancy.
4.1 Redo Logs!
Big surprise: exactly as they have always been used. However, it's worth looking at how they help save money and increase performance in a modern environment. In a high-performance database server, I always advocate the use of at least two physical disk drives (plus mirrors), 100% dedicated to redo logs [Morle 1999]. The idea is that LGWR writes one log sequentially, then switches to the alternate drive at each log switch. This leaves the previous drive free (until the next switch) for the ARCx processes to read for the purposes of archive logging. This technique maximizes the available bandwidth of the drives and minimizes head movement, thus optimizing write latency and improving commit performance for the end user. Often, this involves doing a very unpopular thing: wasting significant capacity on the drives. Not many sites require 146GB+ (2x 73GB drives) of online redo log, and even if they did, it's likely they would need more bandwidth and lower latency than a single physical drive (remember, we only use one at a time) could provide. A more likely scenario is that perhaps 10GB of each drive is used.
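The alternating two-drive layout described above can be sketched in DDL. The mount points and sizes here are illustrative assumptions, not from the paper:

```sql
-- Hypothetical mount points /redo1 and /redo2 are the two dedicated drives.
-- Groups alternate between drives, so LGWR writes to one drive while the
-- ARCx processes read the previous log from the other.
ALTER DATABASE ADD LOGFILE GROUP 1 ('/redo1/redo_g1.log') SIZE 512M;
ALTER DATABASE ADD LOGFILE GROUP 2 ('/redo2/redo_g2.log') SIZE 512M;
ALTER DATABASE ADD LOGFILE GROUP 3 ('/redo1/redo_g3.log') SIZE 512M;
ALTER DATABASE ADD LOGFILE GROUP 4 ('/redo2/redo_g4.log') SIZE 512M;
```

With an even number of groups split this way, each log switch lands on the opposite drive from the one just filled.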

I am also a big advocate of RAID 1+0 (mirrored then striped) disk layouts for general high performance use. The problem is, allocating these two disks for redo logs often upsets the applecart on the nice, uniform disk layout.

This is where SSD can help out. By deploying a single SSD, all redo logs can be located away from the RAID 1+0 array, whilst providing low latency writes and high bandwidth reads (for archiving). There is no contention between the log writer and the archiver in this scenario, regardless of whether, for example, operational difficulties have prevented archiving for a while. It gets better…
Server consolidation is becoming popular. Put all your storage into one big SAN, and run your databases off domains of a single machine. Perhaps you run lots of Linux or Windows servers? It’s likely that a SAN is still the centre of the storage universe, even though you operate many small servers. In any of these cases, it’s likely that the same single SSD can support the redo logs of every database. That’s the beauty of zero contention and high bandwidth.
There is another scenario where multiple threads of redo are used and high performance is needed: Real Application Clusters. If the promise of capacity-on-demand for RAC ever becomes the norm, there will be an awful lot of redo threads to find homes for. Not only that, but any one of these redo threads can become a limiting factor in the performance of the whole cluster. Therefore, a ten-node RAC cluster would need a minimum of twenty physical spindles for redo alone (excluding protection), of which a potentially very small amount of the capacity would be used. A single (small) SSD could provide this service for all nodes, and at a price point that compares well to standard disks!
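In a RAC database, each instance owns its own thread of redo, and groups are added per thread. A minimal sketch, assuming a shared SSD volume mounted at the hypothetical path /ssd:

```sql
-- Each RAC instance has its own redo thread; with a single shared SSD,
-- every thread's log groups can live on the same device without contention.
ALTER DATABASE ADD LOGFILE THREAD 2 GROUP 5 ('/ssd/redo_t2_g5.log') SIZE 256M;
ALTER DATABASE ADD LOGFILE THREAD 2 GROUP 6 ('/ssd/redo_t2_g6.log') SIZE 256M;
ALTER DATABASE ENABLE PUBLIC THREAD 2;
```

Repeating this per node keeps the per-node spindle count at zero while every thread still gets low-latency writes.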

As a final word on redo logs: the cache management of many cached disk arrays is optimized to be 'fair share' – that is, to not allow any one system or disk object to dominate the shared cache resource. In reality, however, certain database objects, namely redo logs, would benefit from an unfair share of this cache, used purely as a write cache. Achieving the same effect on a cached disk array can be a complex and non-intuitive affair!

4.2 Non-Data Tablespaces
SSD can be an effective tool in the ‘support’ tablespace types:
· TEMPORARY
· UNDO (or Rollback Tablespaces prior to 9i)
Both of these tablespace types contain special types of data, for the purpose of supporting Oracle in the provision of its primary function. The TEMPORARY-type of tablespace is used as a staging area during large sorts, and the UNDO-type of tablespace is used to store the undo, or rollback, information for DML operations. Both have attributes which make them strong SSD candidates.
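Placing these tablespaces on SSD requires nothing unusual; the datafiles are simply created on the SSD-backed filesystem. A sketch, with a hypothetical mount point /ssd and illustrative sizes:

```sql
-- TEMPORARY tablespace on SSD: sort I/O bypasses the buffer cache,
-- so it benefits directly from low-latency storage.
CREATE TEMPORARY TABLESPACE temp_ssd
  TEMPFILE '/ssd/temp_ssd01.dbf' SIZE 2000M
  EXTENT MANAGEMENT LOCAL UNIFORM SIZE 1M;

-- UNDO tablespace on SSD (9i automatic undo management).
CREATE UNDO TABLESPACE undo_ssd
  DATAFILE '/ssd/undo_ssd01.dbf' SIZE 1000M;
```

The instance is then pointed at the new tablespaces via the default temporary tablespace and the undo_tablespace parameter.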

I consider these tablespace types to be listed in order of merit, though this will vary between applications. TEMPORARY tablespaces stand to benefit the most, because of their direct-write pathology, whilst UNDO tablespaces are somewhat buffered by the Oracle buffer cache. However, if you are running a pure OLTP system that never sorts on disk, the UNDO tablespace would likely be the best candidate. Checking your I/O statistics in v$filestat and v$tempstat will quickly show the area demanding the most attention!
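The check suggested above boils down to a pair of queries. A sketch, joining the statistics views to their filename views on file#:

```sql
-- Datafile I/O: which files do the most physical reads and writes?
SELECT d.name, f.phyrds, f.phywrts, f.readtim, f.writetim
  FROM v$filestat f, v$datafile d
 WHERE f.file# = d.file#
 ORDER BY f.phyrds + f.phywrts DESC;

-- Tempfile I/O: v$tempstat joins to v$tempfile the same way.
SELECT t.name, s.phyrds, s.phywrts
  FROM v$tempstat s, v$tempfile t
 WHERE s.file# = t.file#
 ORDER BY s.phyrds + s.phywrts DESC;
```

If the tempfiles dominate, TEMPORARY is the SSD candidate; if the undo datafiles dominate, UNDO is.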

HOWEVER, the reality of most commercial (as opposed to scientific) databases is that there is a very definite working set of data that is much smaller than the total size of the database. If this set can be identified, and operationally staged on the SSD, then 80% of the gain will be realized with significantly lower cost than putting the whole database onto SSD. For example, consider the following potential SSD targets:
· The ‘current/active’ partitions of large OLTP systems. When a partition becomes less current (old reservations, for example), it can be copied (using Transportable Tablespaces) to a standard disk area, freeing the SSD for a newer set of data.
· Specific datafiles involved in large batch runs. Using a level 8 10046 trace file and looking at the ‘db file %’ wait events, it can easily be determined how much speedup certain files (identified by the p1 value) would gain if placed on SSD.
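The trace-file analysis in the last bullet can be automated. A minimal sketch, assuming the classic 10046 WAIT line format in which p1 carries the file number for 'db file %' events:

```python
import re
from collections import defaultdict

# Matches classic 10046 WAIT lines, e.g.:
#   WAIT #7: nam='db file sequential read' ela= 1254 p1=12 p2=8191 p3=1
WAIT_RE = re.compile(r"WAIT #\d+: nam='(db file [^']+)' ela=\s*(\d+) p1=(\d+)")

def waits_by_file(trace_lines):
    """Sum 'db file %' wait time (in ela units) per datafile number (p1)."""
    totals = defaultdict(int)
    for line in trace_lines:
        m = WAIT_RE.search(line)
        if m:
            totals[int(m.group(3))] += int(m.group(2))
    return dict(totals)

# Illustrative trace lines (fabricated for the example).
sample = [
    "WAIT #7: nam='db file sequential read' ela= 1254 p1=12 p2=8191 p3=1",
    "WAIT #7: nam='db file scattered read' ela= 900 p1=12 p2=100 p3=8",
    "WAIT #3: nam='db file sequential read' ela= 40 p1=5 p2=2 p3=1",
    "WAIT #3: nam='log file sync' ela= 15 p1=1 p2=0 p3=0",
]
print(waits_by_file(sample))  # {12: 2154, 5: 40}
```

The file numbers with the largest totals (here, file# 12) are the ones whose relocation to SSD would remove the most wait time from the batch run.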

6.0 Summary
The deployment of Solid-State Disk can be an extremely cost-effective solution to gaining faster, more predictable performance from a SAN. It would be my preference to implement SSD in conjunction with an existing disk-based SAN, using the guidelines in this paper to maximize the benefits and value.