Oracle's enormous 11g release rumbles with an impressive array of performance and management aids, elegant application testing, standbys that earn their keep, and the promise of lower storage requirements

I like to define a five-point touch system for my database upgrades. If the new version doesn't change my life in five ways, then it's not a significant upgrade. I'll typically quantify my need by approximating how many hours I spend each week performing certain tasks, and then estimate how much time the upgrade will save me. If I spend five hours every week dealing with resource usage and the new release will do it automatically, then I figure the upgrade will save me five hours a week. Now all I have to do is quantify four other features the same way, and I can sell it to management.

I expect most Oracle Database shops will find at least five of these life changers in Oracle Database 11g. But there's one feature, Real Application Testing, that's so compelling, it's almost enough reason to upgrade on its own. There's not a shop out there that doesn't make code changes, and they all need a solid way of reproducing production workloads to certify those changes without affecting the production environment. Real Application Testing does the trick.

Combining Database Replay and SQL Performance Analyzer, Real Application Testing allows you to capture a workload and its performance stats and replay it, either on the same box or on another box, and compare the performance results. This level of insight into comparative workloads is something that most database vendors are still struggling with.

Active Data Guard is a bit hampered by difficult installation and poor documentation, at least on Windows, but it's a must-have for any DBA wanting to put idle standby databases to good use. Similarly, Data Guard's new Snapshot Standby feature will help DBAs get a grip on change control or application testing.

I would be cautious in implementing Advanced Compression, though. I don't think it's implemented as well as it could be, and it's expensive enough that the cost benefits won't be realized by the largest portion of customers. However, handled correctly, Advanced Compression can help a DBA achieve a high level of data de-duplication.

Result Cache is another new feature that will give you exactly what you ask for, so it should be used with caution. But the tests I ran showed that beautiful results are possible if you understand the technology, clearly define your goals, and work within the bounds of your system.

Automatic health monitoring and data corruption checks also highlight this release. Oracle Database 11g is both proactive and decisive when the possibility for corruption arises.

Overall, this is a very strong release. It has a couple of bumps, but nothing that should hold anyone back from an upgrade. And there are so many new features, there's bound to be something for everyone -- large and small shops alike.

Tuning and health Are developers getting worse as databases mature, or are they just not able to grow their skills at the same rate the requirements are exploding? Either way, Oracle is doing a lot to help DBAs discover and deal with resource-hogging code.

For starters, Database 11g adds self-learning capabilities to Oracle's automatic SQL tuning. Now the engine detects high-load SQL statements and saves them for tuning during a maintenance window. It can apply some automatic fixes or suggest structural changes, such as indexes. If you're leery of letting the engine do too much tuning, you can let it simply capture your queries and suggest fixes. Then, if the engine turns out to be right most of the time, you can start trusting it more. One problem is that Oracle can't take your entire workload into account, so it may suggest changes that would damage your normal flow for a query that gets run only every so often.

There's more than one way to quantify a runaway workload, and Oracle has risen to the occasion here as well. When workloads go awry, it's typically in one of the three main areas of server performance: CPU, memory, or disk I/O. Usually resource governors quantify a runaway workload by measuring CPU or memory, but Oracle Database 11g has also instituted per-session I/O limits.

These I/O limits allow you to specify a maximum value (either in I/O requests or in megabytes) of work that connections are allowed to perform on the server. I/O limits are a very important addition (especially in the case of large warehouses) because these systems can become disk-bound very easily, and CPU or memory resource capping doesn't adequately address disk contention.

I/O caps can also help DBAs put the kibosh on long-running queries. Because there's no way to natively define a policy that checks whether a query has used 20 percent of the CPU for 20 minutes, for example, DBAs will often write their own checks, and then have the code do something based off of the results. Being able to cap total I/O consumption means no longer having to manage these checks by hand.

And as usual, you can move these long-running queries to a lower-priority resource group as they become a problem. It's like being put in the penalty box at a hockey game. If you're going to write SQL that drags the system down, then you'll be given fewer resources so that you can't affect others. And of course, as a result, your query will take a lot longer.

Result CacheResult Cache is a feature that can make or break you, depending on who's holding the wheel. Result Cache allows you to effectively pin a query result into a special buffer in memory, therefore bypassing the disk lookup for subsequent calls to the same query. You can cache entire queries, subqueries, or even PL/SQL functions. Of course, the query call has to be the same as the cached version -- and therein lies the rub. Quite often, queries differ only in the parameters being passed (the case with most OLTP queries), so you'll have a lot of individual queries being cached, but very little reuse.

To that end, Oracle gives you three different levels in which you can specify your result cache: the database level, the session level, and the query level. There's a lot of power in those options, for both good and evil, so I would suggest sticking with the query or session level unless you've thoroughly tested it at the database level and you know exactly what to expect. At the database level, all query results will be cached, including those that don't need it. And by default, the result cache is assigned a fairly small portion of the buffer, so you're not likely to see tremendous results unless you expand the allocation.

Oracle has implemented Result Cache pretty well, and it does exactly what it's told. In my initial tests, I had no problems seeing results. I tested against just a couple queries to get the feel for it, and I did see vast improvements in my query times. After I bumped up the number of queries and parameterized them, however, I didn't see nearly the same changes as I did the first time. That's not unexpected at all and in no way outlines any kind of shortcoming with the feature. I'm merely pointing out that you need to plan carefully and test thoroughly before you implement it in production.

It should also go without saying that this feature is aimed at improving performance on disk-bound systems and isn't going to do any good on memory-bound systems. If you're already experiencing memory pressure, carving out a buffer in an already constrained system is only going to make things worse. I'd also like to point out that in general, this isn't going to help a great deal with your OLTP workloads. And if you use it on your decision support workloads, be mindful of the query results you're caching. It's easy to forget that even 64-bit systems have memory limits, and that caching a 200-million-row result might not be the best use of your memory resources.

I like Result Cache for what I call nuisance queries, which are lookups you do to satisfy joins or to pull results that rely on table scans -- things of that nature. But Result Cache is a good feature that will let you do pretty much whatever you want to do.

Fighting corruptionOracle Database 11g puts just as much effort into monitoring and recovering from corruption as it puts into query performance. Automatic health monitoring is a really big feature in this category because it runs reactive checks or manual checks. Reactive checks get run when a critical error occurs. These checks can examine database structure, data block integrity, redo integrity, and other conditions. When Oracle runs these checks, it creates a report and quite often suggests fixes.

Automatic quarantine of corrupt undo segments is just plain cool. When Oracle discovers a corrupt undo tablespace, it will quarantine the segment and not allow future transactions to use it. This allows Oracle to contain the damage and keep it from spreading.

The Fast Analyze feature will cure a big headache for DBAs of very large databases. Fast Analyze allows you to scan for table-index corruption much faster than before. This is really important because much of this type of maintenance is done during scheduled maintenance windows, and if the analysis operation can't even complete during the window, then neither can the fix.

Snapshot StandbyData Guard is Oracle's technology for providing a transactionally complete standby database in case of disaster. Data Guard protects against all kinds of system and network failures, and it isn't constrained by location. The standby database can be in the same room or thousands of miles away. But like other fail-over solutions, including remote mirroring and local clustering, the standby is completely idle and unavailable while the primary is online.

A terrific new Data Guard feature, called Snapshot Standby, lets you put the standby database into a temporary read/write mode, allowing you to test database changes while still providing the original HA/DR protection. This feature alone could change the way companies manage best practices around database development, change control, benchmarking, application upgrades, and related tasks.

For example, say you need to make a change to a major stored procedure on your production database. The problem is that, without testing it against your production workload, you have no idea how much, if at all, the change will improve your system performance. Combining Snapshot Standby with Database Replay (see below), you can test limitless scenarios. All you have to do is record your production workload on the primary database, put your standby database into read/write mode, and implement the code changes on the standby. Then you can replay your workload on the standby and compare the performance counters from the replay with those from your initial capture.

When you put your standby into Snapshot Standby mode, it stops applying logs from the primary. (The logs are still sent across, they're just not applied.) When you've finished testing, you can put the standby back in read mode. The standby will then automatically discard all of your changes, return to the state it was in before you tested the new code, and apply the logs that are waiting to be applied. Your standby is never physically out of sync with the primary, only logically.

There are many other scenarios where a Snapshot Standby can come in handy, from troubleshooting production issues to index tuning to disk placement and partitioning. You can use Snapshot Standby to test backups and index reorgs while under heavy user loads as well. The possibilities are practically endless.

There's another practical purpose. One of the biggest problems for DBAs is keeping analysts, developers, and others out of the production system. They all have legitimate reasons for reading data out of the system, but for performance and compliance reasons, you'll typically want to limit their access as much as possible. Typically you'd put together another server for their use and keep it as in sync with the production database as you can, generally through backup/restore or replication. With Snapshot Standby, you can do it all in one system.

Active Data GuardLike Snapshot Standby, Active Data Guard lets you have your standby and use it to. A new option introduced in Database 11g, Active Data Guard allows you to throw a standby instance into read mode to support real-time queries, solving one of the biggest issues businesses have with their OLTP databases: namely, how to separate OLTP and read activity while providing near-real-time reporting.

Active Data Guard not only allows you to make the standby readable, but you can do so with minimal configuration changes on your client. You can point reporting applications at the standby directly, or you can create what is essentially a read service on the primary server that will act as a broker that routes workload based on the type of activity. When the application connects to one service, it writes to the local OLTP database, and when it connects to the other service, the queries get routed to the standby to fulfill the reporting requests. This is powerful functionality that makes it easy for shops to report from their OLTP data.

The downside is that you can't make any permanent changes to the standby, so you're stuck with the indexing and other configurations of the OLTP database. As anyone who has ever put a reporting database together knows, OLTP index requirements are rarely the same as reporting index requirements.

Plus, configuring Active Data Guard, at least on Windows, is no picnic. I spent weeks in e-mail, on the phone, and in online conference with Oracle reps trying to get Active Data Guard up and running. The directions given online are incomplete, and there's no guidance on troubleshooting at all. Had I been alone trying to set this up, it never would have happened.

That said, there's nothing in the Active Data Guard setup that couldn't be done by a wizard, and in fact a wizard will be coming in Grid Control 11g, according to Oracle. Built-in monitoring and alerts would also be an improvement. As it stands, you have to set up your own scripts and alerts to monitor whether your logs are being shipped and applied.

Caveats aside, Active Data Guard is a huge leap forward, and DBAs will love what it can do. After I got the standby up and running, it seemed to perform very well and did exactly what it's supposed to do. I could hit it with queries, write to it, and it would resync with the primary when I switched back. It's also easy to use. It takes just a single statement at the command line to switch between modes.

Real Application TestingReal Application Testing, an option comprising Database Replay and SQL Performance Analyzer, is a new feature in Oracle Database 11g that allows you to capture a workload, replay it on the same system or a different one, and then compare the results. Database Replay will replay your workload exactly as it happened complete with concurrency and timing, allowing you to fully test system changes against your actual production workload. Thus, you can see the true impact of changes to the database (index changes, percentage free, table partitioning, and so on) before introducing those changes into production.

In my tests, Database Replay was easy to configure, and it performed exactly as expected. You have to learn just a couple simple concepts related to setting it up -- such as how to create the directory object in the database to capture the workload and how to start the replay from the command line -- but once you get past them, it's smooth sailing.

My test was a 50-user read/write mixed workload. In between tests, I deleted indexes from my tables so that I could see changes in the report numbers. As expected, dropping the indexes did increase the performance of the writes, while decreasing the performance of the reads. I was able to verify that the replay mechanism re-created all my threads and ran them flawlessly.

Setting up a capture is a simple four- or five-step process. You typically restart the database, set up options for the capture (which parts of the workload to include or exclude), create the directory to save the capture files, and set the capture start time and duration or manually start it and stop it. If you don't want to capture the system-level activity such as background processes or indexing operations, you can easily filter them out. Or if you want to limit the capture to just a specific application or a specific piece of code you were having a problem with, you can isolate them to make it easier to look at the deltas. The GUI makes configuring exclusion or inclusion filters easy to do.

You can replay the workload on the host system itself (as I did) or from one or more clients. If you capture a large workload, Database Replay has a calibration tool that will tell you how many replay clients you will need.

The reports produced are replete with result numbers comparing the capture workload and the replay workload. You can get literally hundreds of different calculations, but while all of these different metrics are available, they're not very pretty. In fact, they're just one table of numbers after another. It would be nice to have some high-level graphs and such to point you in the right direction. You can export the data, however, so you should be able to do something else with it if you wish.

Another way to go about analyzing your workload, complete with graphs, is to simply record the data in SQL Performance Analyzer as you run your replay workloads. The database doesn't know the difference between a workload being fed by 100 users or by a generator hitting it with 100 virtual users, so you can simply capture the live data in Performance Analyzer while you're doing your replay tests.

The easiest way to get bang for your Real Application Testing buck is to use Database Replay in conjunction with Data Guard's Snapshot Standby feature, discussed above, which saves the state of your standby database and allows you to restore it later (along with logs that have been shipped in the meantime). The combination of Database Replay and Snapshot Standby gives you a powerful and flexible way to test changes to your applications. You simply capture the workload on the primary using Database Replay, set your standby in read/write mode, make the changes you need to make, and then replay the workload on the standby. When you've finished testing, you can record your results and put the standby back into a consistent state with the primary.

Figuring out how to accurately test changes in your own real-world environment without doing it in production is one of the biggest problems in database management. Database Replay solves this problem in an easy and elegant way.

Advanced CompressionAnother separately licensed option is Advanced Compression. A way of writing data to disk so that it takes up less space, Advanced Compression can reduce storage costs, reduce memory and network bandwidth requirements, and even improve query performance.

Advanced Compression saves storage space by replacing duplicate values (like the date in orders placed throughout a given day) in a data block with symbols for those values. In other words, Advanced Compression doesn't actually compress the values, but normalizes the data just like database design does. The more repetitive your data is, the more success you're going to have with Advanced Compression. Also, because Advanced Compression works at the block level, the compression ratio will depend on how you order the data in your tables.

In addition to storing more data on disk, Advanced Compression can improve I/O performance. Because more data is being squeezed into less space, it's easier for the database to satisfy queries. The database is physically pulling less data from disk for the same information, so it can do it faster. The catch is that these benefits only extend to table scans, not indexed queries.

So Advanced Compression will improve the performance of some queries, but not others. And it will dramatically reduce storage requirements for some kinds of data, but not others. Further, even for the same kinds of data, overall compression ratios will differ depending on how often the data repeats and how you order (index) the data.

Finally, percent free (PCTFREE), the percentage of a page left empty for inserts or updates, will also be a factor. The better the data "compresses," the more of it will fit on a page, and the fewer pages the engine must traverse to satisfy the query. These gains could be seen everywhere from slightly faster queries to reduced blocking.

I tested Advanced Compression against two separate databases. In the first case, the database was generated from the TPC-C order entry benchmark, which simulates a complete environment for online transaction processing. In the second case, the database was the OLTP Table Compression Test Kit provided by Oracle.

You would expect from Advanced Compression that different tables and different data will behave differently, and this is just what my two tests showed. Two tables in the Oracle Test Kit not only compressed to a quarter of their original size, but reaped big gains in query performance and even improvements in write speeds. The TCP-C database tables compressed far less, ranging from 15 to 57 percent savings, and saw no benefit to query or write performance. In fact, performance took a hit. (See details of the two tests and tables of the results.)

Naturally, the results I got in my tests, and the results that Oracle gets in its tests, will not be the same as the results you get in your tests. Expect your compression results to differ from one table to another as well, sometimes dramatically. And be aware that making changes to the base-level table can change your compression ratio. If business or reporting requirements change (as they often do) and you have to order your table differently, you could lose a good deal of your compression. Worst-case scenario, you might suddenly find that you no longer have enough disk space.

Also note that Advanced Compression could impose a cost in system resources. When I set up the read tests for the TCP-C data, I ordered the tables to cluster all the like data together and then queried from that data specifically. I expected that clustering like data would greatly improve scan performance. But that's not what happened. So unless you test thoroughly and with your real production data, you may be surprised by the results.

Advanced Compression can be a huge asset for the smart DBA who manages it correctly and has the forethought to pull it off and maintain it well, but it adds a level of complexity to making changes that you have to be careful about. And even before you implement it, you'll need to do plenty of regression testing to ensure that your apps will work as before.