Cache Everything Without Breaking the Bank

I’m often surprised at how small the buffer caches allocated to Oracle databases are. When I started as a DBA at Amazon in the early 2000′s, we were moving our Oracle databases to 32-bit Linux. I shudder to recall all the crazy things we used to do, such as fiddling with the stack address space, to get an SGA over 1.2 GB or so. It’s alarming to me that these techniques are still documented for Oracle 11gR2 – I hope no one is still doing this – if you are, you have my sympathies.

In 64-bit architectures, it would seem that there is no reason to have such a small buffer cache. RAM prices have dropped precipitously, the datasets within databases themselves have grown like kudzu. Nonetheless, I still find small Oracle buffer caches wherever I go. The median buffer cache size I observe in our customer base is 1,040 megabytes – which would fit in my 2002 32-bit Linux machine without too much bother.

Here is a histogram of customer buffer cache sizes I’ve encountered since 2011:

The implication of unchecked dataset growth and relatively static buffer cache sizes is that a smaller and smaller proportion of the database is cached. Here is a histogram showing the percentage of a database that can be cached, from the same customers as above (although here I’ve excluded databases smaller than 64 GB):

How much performance impact would increasing these tiny caches have? It is hard to say with certainty – but there are some clues. Oracle has a cache advice feature that populates the v$db_cache_advice table. This feature forecasts the reduction (or increase) in physical reads that would come from growing (or shrinking) the buffer cache. The table includes forecasts for buffer cache sizes up to around twice the current size.

The results are often shockingly non-linear, here is an example table from a customer database showing the forecast proportion of physical reads as the buffer cache increases to twice the current size of 928 MB:

CACHE SIZE – MB

PERCENTAGE OF PHYSICAL READS RELATIVE TO BASELINE

928

100%

960

99.91%

1,040

99.80%

1,120

99.72%

1,200

99.65%

1,280

99.59%

1,360

99.56%

1,440

99.52%

1,520

99.23%

1,600

6.43%

You can see that at around 1,600 MB of buffer cache physical reads drop off dramatically. Evidently, the working set of the database would basically fit inside 1,600 MB, but not 1,520 MB. Increasing our buffer cache from 928 to 1520 gets us basically nothing (less than 1% reduction in physical reads), while going from 1,520 to 1,600 would eliminate over 90% of physical reads.

When I look at the output from v$db_cache_advice on customer databases I see a wide range of impacts from doubling the buffer cache, from basically no impact, to eliminating pretty much all read I/O:

It’s hard to know exactly what the working set size of a database will be, and you can see from the table above that the differences between caching nearly the entire working set and the entire working set can be enormous. As discussed in a prior blog post, reads accounted for 87% of database I/O on the sample I have access to.

Of course, an upper bound on the size of the working set would be the size of the database.

Why wouldn’t you just cache the entire database?

Doing so would eliminate ~87% of database I/O to the SAN in typical cases.

One obvious reason is cost. Assuming I’m interpreting this price list correctly, the cost difference for a single storage engine in a VMAX 40K with 48 GB of RAM, versus one with 256 GB of RAM, is $222,165 – or $1.04 per megabyte. It’s not likely you’ll be able to spend a cool million in order to cache your 1 TB database.

Delphix uses cheap server RAM to cache database data. HP, Cisco, and Dell all offer servers that can house 1 TB or more of RAM. You can price out a Dell PowerEdge M820 with 32 cores and 1.0 TB or RAM for around $45,000.

Delphix’s cache consolidation feature stretches that server RAM a long way, getting much more benefit than you would if you placed it in the database servers themselves. Just as Delphix consolidates shared physical storage blocks among the databases it manages, it’s cache also consolidates the blocks in memory.

Consider a typical scenario with a ~1 TB production database, that has several copies for:

Development

QA

Staging

Production firefighting

Offloading production reports

If each of these copies were managed by the same Delphix Server with 1 TB of RAM, you would be able to cache the entire working set for all the systems:

As an aside, another nice feature of this kind of caching is that each database benefits from the warm Delphix cache created by the others.

vSphere 5.1 has a 1 TB limit for the RAM available to a single VM. Because Delphix is deployed as a virtual machine, it also inherits this 1 TB limit on its cache size.

This 1 TB limit is not very constraining as it turns out. 72% of databases I see at our customers are 1 TB or less, so this approach can cache the entirety of a substantial portion of all databases out there.

Summary:

A decade ago RAM was expensive and 32 bit architectures (when used) limited Oracle buffer cache sizes. Despite exponential decreases in the cost of RAM, and the elimination of architectural barriers to assigning more RAM to Oracle, buffer cache sizes remain tiny as a proportion of the total database size.