Sunday, December 28, 2008

Analyzing the impact of the Vxfs filesystem block size on Oracle

I am often asked what the ideal Vxfs filesystem block size is for an Oracle DB block size of 16K. I always reply: 8K (the maximum on Vxfs).

All along, my reasoning was that with, say, a 1K filesystem block size, a 16K Oracle block read would end up as 16K/1K = 16 IO requests to the filesystem, and the same for writes. With a filesystem block size of 8K, that would drop from 16 requests to 16K/8K = 2 requests. Or so I thought.
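To make that initial assumption concrete, here is a minimal sketch of the arithmetic behind it. It models only my original (and, as it turns out, mistaken) guess that every filesystem block costs its own IO request; the function name is made up for illustration.

```python
def naive_io_requests(db_block_bytes: int, fs_block_bytes: int) -> int:
    """Requests per Oracle block read IF each filesystem block needed its own IO.

    This is the assumed model only, not how Vxfs actually behaves.
    """
    # Ceiling division: a partly used filesystem block still costs one request.
    return -(-db_block_bytes // fs_block_bytes)


KB = 1024
print(naive_io_requests(16 * KB, 1 * KB))  # 16 requests on a 1K filesystem
print(naive_io_requests(16 * KB, 8 * KB))  # 2 requests on an 8K filesystem
```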

I decided to test to see what exactly was happening and it proved that I was wrong – at least with respect to Vxfs.

Firstly some background about Vxfs –

Vxfs is an extent-based filesystem, meaning it allocates space to files not as individual blocks but as extents. An extent is a contiguous set of filesystem blocks. Extent sizes vary, and the way a file is created greatly influences extent sizing. As a file grows, more extents are added to it.

The interesting part about Vxfs and extents is that IO within an extent is never split: a request for a contiguous set of blocks within an extent is satisfied with a single IO request. If a request spans extents, it results in multiple IO requests, quite similar to how a db file scattered read is split at Oracle extent boundaries. From the Vxfs guide -

"By allocating disk space to files in extents, disk I/O to and from a file can be done in units of multiple blocks. This type of I/O can occur if storage is allocated in units of consecutive blocks. For sequential I/O, multiple block operations are considerably faster than block-at-a-time operations. Almost all disk drives accept I/O operations of multiple blocks."

So coming back to Oracle – some test scenarios

I decided to test and see for myself.

The environment is Solaris 9 on an E4900 with Storage Foundation for Oracle Enterprise Edition. Oracle is 10.2.0.3 using VRTS ODM.

I created 2 tablespaces – one on a 1K filesystem and the other on an 8K filesystem. Each had one datafile of size 5G.

Identical tables with ~1000 rows were created on both the tablespaces. Indexes were created on both tables on relevant columns.

A vxtrace showed that Oracle was issuing requests of 16K or larger, and each one was a single IO. They were not broken up into smaller IO requests as one would normally have expected. I could not use truss because IO requests show up as ioctl calls when using ODM. There was no read IO smaller than 32 blocks (16K, since 32 x 512 bytes = 16K), confirming that IOs are not split on filesystem block boundaries.
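The check I applied to the trace can be sketched as below. The 512-byte unit is implied by "32 blocks (16K)"; the sample lengths in the loop are made up purely for illustration and are not values from the actual trace, and the helper names are hypothetical.

```python
SECTOR_BYTES = 512          # implied by "32 blocks (16K)": 16384 / 32 = 512
DB_BLOCK_BYTES = 16 * 1024  # Oracle DB block size used in the test


def io_bytes(sectors: int) -> int:
    """Convert an IO length given in 512-byte blocks to bytes."""
    return sectors * SECTOR_BYTES


def is_whole_db_blocks(sectors: int) -> bool:
    """True if the IO covers one or more complete DB blocks."""
    size = io_bytes(sectors)
    return size >= DB_BLOCK_BYTES and size % DB_BLOCK_BYTES == 0


# Illustrative lengths only, NOT taken from the real vxtrace output.
for length in (32, 64, 256):
    print(length, io_bytes(length), is_whole_db_blocks(length))
```

Had IOs been split on 1K filesystem blocks, lengths of 2 blocks would have shown up, and `is_whole_db_blocks(2)` returns False.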

So the reads behave exactly as documented. Oracle does reads only in multiples of the DB block size. On either a 1K or an 8K Vxfs block filesystem, a read of 16K or a multiple of 16K is a sequential read of contiguous blocks and hence is satisfied by a single IO request - as long as the request can be met from a single extent.
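The single-extent condition can be illustrated with a small model. This is only a sketch of the rule as described here, not Vxfs code: the extent layout is hypothetical, extents are given as (first file block, number of blocks), and each extent is assumed to be physically separate from its neighbours.

```python
from typing import List, Tuple

Extent = Tuple[int, int]  # (first file block, number of blocks); hypothetical layout


def split_by_extent(start: int, length: int, extents: List[Extent]) -> List[Tuple[int, int]]:
    """Split a request for `length` blocks at `start` into one piece per extent touched."""
    pieces = []
    end = start + length
    for ext_start, ext_len in extents:
        ext_end = ext_start + ext_len
        lo = max(start, ext_start)   # first requested block inside this extent
        hi = min(end, ext_end)       # one past the last requested block inside it
        if lo < hi:
            pieces.append((lo, hi - lo))
    return pieces


extents = [(0, 64), (64, 64)]             # two made-up 64-block extents
print(split_by_extent(8, 16, extents))    # [(8, 16)]          -> one IO
print(split_by_extent(56, 16, extents))   # [(56, 8), (64, 8)] -> two IOs
```

The filesystem block size does not appear anywhere in the split; only the extent boundaries matter.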

So from an IO perspective, it really does not matter whether you use 1K or 8K.

Now there is another aspect to this - filesystem overhead, fragmentation, extent sizing and space management.

A 1K filesystem block size would reduce space wastage at the cost of having to manage many more blocks (filesystem overhead), whereas an 8K filesystem block size would be ideal for an Oracle instance using a DB block size of 8K or higher.

From a filesystem management perspective, using an 8K filesystem block size makes better sense, as Oracle never stores data in units smaller than the DB block size. An 8K filesystem block size reduces the number of blocks and, correspondingly, the filesystem overhead in maintaining them. I do not know if anyone uses a 4K DB block size any more. All I have seen are 8K and higher.
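To put a rough number on "fewer blocks", here is the block count for one 5G datafile like the ones in the test, at each filesystem block size. It assumes 5G means 5 x 1024^3 bytes and ignores any filesystem metadata; the function name is made up for illustration.

```python
GIB = 1024 ** 3
KIB = 1024


def fs_blocks(file_bytes: int, fs_block_bytes: int) -> int:
    """Number of filesystem blocks needed to hold a file of the given size."""
    # Ceiling division: a partly filled last block still occupies a whole block.
    return -(-file_bytes // fs_block_bytes)


datafile_bytes = 5 * GIB  # assumption: "5G" = 5 * 1024**3 bytes
for block_size in (1 * KIB, 8 * KIB):
    print(f"{block_size // KIB}K filesystem: {fs_blocks(datafile_bytes, block_size):,} blocks")
```

That works out to 5,242,880 blocks at 1K against 655,360 at 8K, an eightfold difference in the number of blocks the filesystem has to track for the same datafile.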

To reduce fragmentation, it is best if the datafile uses a single extent (as it will when created on a database using VRTS ODM). The extent here refers to a Vxfs extent, not a tablespace extent. To keep each datafile as a single Vxfs extent, never extend existing datafiles; always add new datafiles to increase tablespace capacity.

You can find out the extents allocated to a file by running vxstorage_stats - it is an invaluable tool. Fragmentation status can be identified by running fsadm. Normally when using ODM, fragmentation should be minimal.