Idle Thoughts – SSD, Redo Logs, and Sector Size

18122011

December 18, 2011

It seems that I have been quite busy lately with computer related tasks that are not associated with Oracle Database, making it difficult to slow down and focus on items that have a foundation in logic. Today I have had some spare time to dig further into the recently released “Oracle Core” book. A Note on page 123 (obviously a logical page number) gave me a moment to pause. Quoting a small portion of the Note:

“Recently disk manufacturers have started to produce disks with a 4KB sector size… From 11.2 Oracle lets the DBA choose which is the lesser of two evils by allowing you to specify the block size for the log file as 512 bytes, 1KB, or 4KB.”

Do you see it yet, the reason to pause?

–

–

–

I am typing this blog article on a laptop that is a bit over a year old with two 256GB Crucial SSD drives in a RAID 0 arrangement. Even though the laptop supports 3Gb/s transfer speeds from the internal drives rather than the more recent 6Gb/s transfer speeds, the drive arrangement is very quick (quick and potentially subject to a very bad data loss if one drive in the RAID 0 array fails), hitting a speed of 539 MB per second in a parallel full table scan.

I believe that the Crucial SSD drives have a 4 KB sector size, much like many other SSD drives. It seems that some of Intel’s higher-end single layer drives use a 16 KB sector size (4000h = 16,384, see page 17 for the X25 series and page 22 for the 510 series). I recall from reading the Oracle Database documentation the following statement:

“Unlike the database block size, which can be between 2K and 32K, redo log files always default to a block size that is equal to the physical sector size of the disk. Historically, this has typically been 512 bytes (512B).

Some newer high-capacity disk drives offer 4K byte (4K) sector sizes for both increased ECC capability and improved format efficiency. Most Oracle Database platforms are able to detect this larger sector size. The database then automatically creates redo log files with a 4K block size on those disks.”

Do you see it yet, the reason to pause?

–

–

–

I recently read an interesting article titled De-Confusing SSD (for Oracle Databases) that inspired several interesting comments. I think that the following two statements in the article contributed to the comment count:

SSD’s base memory unit is a cell, which holds 1 bit in SLC and 2 bits in MLC. Cells are organized in pages (usually 4k) and pages are organized in blocks (512K). Data can be read and written in pages, but is always deleted in blocks. This will become really important in a moment.

* Placing redo logs on SSD is not recommended. Exadata now has a “Smart Flash Logging” feature that uses redo logs on SSD. Note that it uses redo logs *also* on SSD. This feature allows Oracle to write redo in parallel to a file on SSD and a file on the magnetic disk, and finish the operation when one of the calls is successful.

Do you see it yet, the reason to pause?

–

–

–

So, the “Oracle Core” book states that as of Oracle Database 11.2 the DBA is able to select a 512 byte, 1024 byte, or 4096 byte (4 KB) block size for the redo log files. Idle thinking… I have Oracle Database 11.2.0.2 installed on this laptop for educational purposes, let’s try executing a simple SQL statement that retrieves the block size for the redo log files:

So, if the sector size of my Crucial SSD drive is 4 KB, I have a (default) block size for the redo log files that potentially conflicts with the documentation, and if this is not an isolated issue, it might explain (in part) why the De-Confusing SSD (for Oracle Databases) article states that placing redo logs on SSD is not recommended.

Do you see it yet, the reason to pause?

–

–

–

I wonder if in testing redo log performance on SSD drives, if those drives were provided a “fair” test environment, where the BLOCKSIZE of the redo log files was sized appropriately for the write characteristics of SSD drives? Any thoughts?

Thanks for sharing your performance results regarding 512 byte and 4096 byte blocksizes for the redo logs – the results are interesting. I wonder if the native sector size of the SSD drives in an Exadata system are 4 KB or if they are actually larger than that size (for example, the apparent 16 KB sector size of some Intel SSD drives).

In your article, you state that, “elapsed time reduced by 70% [when using the 4096 byte redo log blocksize]”. A marketing person could probably spin your results a couple of different ways:

Regardless of the above number spinning, based on your test results, SSD in the system is still slower than the spinning rust (serial attached SCSI hard drives). Now if only Oracle Database supported a 16 KB blocksize for the redo logs…

Oh boy! I can see the next ROT (rule of thumb) already: “Use 4KB for file system size for redo logs on SSD!”. Regardless of release level, setup, storage architecture, etc.
Agree entirely: I see heaps of reasons to pause. And T-H-I-N-K!
As usual: ECC applies – Experiment! Check! Compare!

Before we’re all saddled with another 10 years of ROT setting in…

I’m curious on how long it’ll be before someone will claim SSD on SAN Fast Provisioning needs a 4K file system block size…
Mark my words: it’ll happen!

You are probably right that someone will take the results out of context and then formulate a general rule. Time to super-size your redo log block size. Let’s see, the table states that I need a 16 KB block size for this Intel SSD… I just type these commands and …

I want to make certain that people understand that I am not recommending to use or not use a 4096 byte block size for redo on SSD. It was but an idle thought while I awaited a LOG FILE SYNC wait to complete. ;-)

Hints for Posting Code Sections in Comments

********************
When the spacing of text in a comment section is important for readability (execution plans, PL/SQL blocks, SQL, SQL*Plus output, etc.) please use a <pre> tag before the code section and a </pre> tag after the code section:

<pre>

SQL> SELECT
2 SYSDATE TODAY
3 FROM
4 DUAL;
TODAY
---------
01-MAR-12

</pre>
********************
When posting test case samples, it is much easier for people to reproduce the test case when the SQL*Plus line prefixes are not included - if possible, please remove those line prefixes. This:

SELECT
SYSDATE TODAY
FROM
DUAL;

Is easier to execute in a test case script than this:

SQL> SELECT
2 SYSDATE TODAY
3 FROM
4 DUAL;

********************
Greater than and Less than signs in code sections are often interpretted as HTML formatting commands. Please replace these characters in the code sections with the HTML equivalents for these characters: