Last night the RVaLUG (Richmond, VA Linux Users Group) hosted Jared Smith, the Fedora Project Leader. We had a great time, with a full room enjoying the talk and discussion afterward, along with pizza provided by 5AM.

Below is the video* of Jared's talk and part of the discussion. Just like Linux, our technology will continue to improve as we continue to videotape our speakers and meetings.

In the hopes of keeping the video quality as high as possible, we uploaded the video to veoh.com, which allows larger file sizes than YouTube or Vimeo. It does, however, require the download of a browser plugin to see the full videos. This may not be the final video solution, but it is our first effort.

Innovation in IT -- a great talk

The performance of Postgresql: SSD vs consumer HDD -- Summer of 2010

Abstract

I benchmark the database-processing performance of two commercially available SSDs against two traditional SATA HDDs.

Introduction

Here in the 5AM IT department, we are all about server virtualization. We deploy almost everything into either our own virtual server infrastructure or any number of third-party cloud providers. We just LOVE virtual machines.

While virtualization has solved many problems, it has also created a few. One obvious but huge challenge is that each virtual server we add is yet another system to manage (with new responsibilities like status monitoring and log aggregation).

As our virtual server count increases, we’ll need to build a system to reliably collect vital operational data. And hundreds of devices and servers generate thousands of data samples per minute that need to be captured, stored, and reported.

The quest for a low-power server with high-performance I/O

By definition, any server that can handle tens of thousands of pieces of operational data per minute must handle a huge I/O load. This will require a RAID 5, 6, or 10 subsystem. But adding a RAID to a server involves significant cost, complexity, size, and power requirements.

This is something we would love to avoid.

The dream: two redundant, low-power (and low-cost) identical 1U servers at each site

1U server types have room for only one (or two) standard SATA hard drives. But SATA drives tend to crumble under any rigorous database load.

So until recently, low-power 1U servers were inadequate because they fall short on disk I/O.

Along comes the SSD

The word on the street is that SSDs offer RAID-class performance. If true, the introduction of low-cost SSD storage devices makes it feasible to use a low-power 1U server as a high-I/O appliance.

But really, how much better does an SSD perform than a traditional hard drive (HDD) for an application like a high read/write relational database? Google was unhelpful in locating generic studies of SSD performance in RDBMS environments. Searches for both MySQL and Postgresql benchmark data didn't yield much either.

So I decided to conduct my own small apples-to-apples comparison benchmarks of SSDs vs. traditional desktop hard drives.

Who's who in this benchmark?

Let's introduce the cast of characters:

Storage Devices to be evaluated:

1.5 TB 7200 RPM Drive – Seagate ST31500341AS

300 GB 10,000 RPM Drive – Western Digital WD3000HLFS

128 GB SSD – Patriot PKA128GS25SSDR

100 GB SSD – Patriot PI100GS25SSDR

Host DB:

Hardware

SG45H7 Shuttle desktop

8 GB RAM

Q9650 3.0 GHz quad core

Host OS: Ubuntu 10.04 running the 2.6.32-24 kernel

Filesystem: EXT4

Client 1:

Hardware

ASUS GS1

4 GB RAM

T7500 2.2 GHz dual core

Host OS: Xubuntu 10.04 running the 2.6.32-24 kernel

Client 2:

Hardware

Dell Vostro 220S

4 GB RAM

E7400 2.8 GHz dual core

Host OS: Windows 7 Professional

Network: Switched gigabit ethernet

Database: Stock Ubuntu Postgresql 8.4.4

How about some pre-test tests?

Before starting the actual database benchmarks, I decided to establish the independent raw read/write performance of each drive. SSD manufacturers always tout these numbers as one of the main ways to judge SSD performance. So my goal was to (a) discover what each device was able to do, and (b) see whether the manufacturers' raw read/write claims bear any resemblance to reality.

Luckily, Ubuntu added a raw performance benchmark tool to its Disk Utility program that allowed me to conduct these raw I/O tests with ease.

Armed with that, I tested each drive.
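(For reference, Disk Utility's benchmark is graphical; a rough command-line equivalent can be sketched with dd. The 256 MB size and the temp-file path here are my own illustrative choices, and a real test would target a file on the drive under evaluation.)

```shell
# Sequential write test: 256 MB of zeros, flushed to disk before dd reports a rate
TESTFILE=/tmp/disk_bench.bin
dd if=/dev/zero of="$TESTFILE" bs=1M count=256 conv=fdatasync

# Sequential read test (on a real run, drop the page cache first, as root:
#   sync && echo 3 > /proc/sys/vm/drop_caches)
dd if="$TESTFILE" of=/dev/null bs=1M

rm -f "$TESTFILE"
```

This only approximates sequential throughput; it says nothing about access time, which is where the SSDs stand out below.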

1.5 TB Seagate:

For the past year and a half, this has been the standard 1.5 TB desktop hard drive. The manufacturer claims a sustained data transfer rate of 135 MB/s and an average latency of 4.16 ms.

How does it really perform?

The average read and write rates start strong but end up below the manufacturer's stated maximum (102.9 MB/s read and 118.3 MB/s write).

The average access time, 15.3 ms in these tests, is nowhere near as fast as the claimed 4.16 ms spec.

300 GB Western Digital:

This hard drive was THE standard 10K RPM drive for hardcore gamers and performance workstations. I was unable to locate manufacturer claims for sustained data transfer rate, but they do claim an average seek time of 4.2 ms.

The average read and write rates appear more consistent than the 1.5 TB drive's but rate about the same overall. The average access time, 7.2 ms, is much closer to the manufacturer's 4.2 ms spec than what the 1.5 TB drive achieved.

128 GB SSD:

This drive was purchased a few months ago for its explicit compatibility with Mac OS X. The manufacturer claims transfer rates of 260 MB/s read and 180 MB/s write, and makes no seek time claims.

The average read and write rates are very consistent at about 130 MB/s but are well below the vendor's claims of 260 MB/s read and 180 MB/s write. The average access time of 0.2 ms is an order of magnitude faster than what the mechanical drives can achieve.

100 GB SSD:

I purchased this drive from my local Fry's because it had the highest read/write rates of any drive they had in stock (yet the cheapest price for that class). The manufacturer claims transfer rates of 285 MB/s read and 275 MB/s write, and makes no seek time claims.

The average read and write rates are quite consistent and come close to the manufacturer's claims of 285 MB/s read and 275 MB/s write. Awesome.

The average access time of 0.1 ms is an order of magnitude faster than the mechanical drives, and even faster than the 128 GB SSD.

Postgresql benchmark background --- pgbench

The benchmark used in this evaluation, pgbench (http://developer.postgresql.org/pgdocs/postgres/pgbench.html), ships with most Postgresql distributions. In this series of tests, pgbench is used in a mostly default fashion. In that "default" form, pgbench roughly emulates the industry-standard TPC-B transactional benchmark.
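For readers who want to reproduce a minimal run, a pgbench session looks roughly like this. The host, user, and database names mirror the commands used later in this post; the scale, client, and transaction counts are illustrative choices, not the exact values used in these tests:

```shell
# Seed the pgbench tables; each unit of scale (-s) creates 100,000 account rows,
# so -s 100 yields 10 million rows
pgbench -h 192.168.101.120 -U userone -i -s 100 userone

# Run the default TPC-B-like workload: 4 concurrent clients, 1,000 transactions
# each, with transactions per second reported at the end
pgbench -h 192.168.101.120 -U userone -c 4 -t 1000 userone
```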

The rough set of operations performed per transaction on the pre-seeded tables: update an account's balance, read that balance back, update the corresponding teller and branch balances, and insert a row into the history table.

The average transactions per second for each storage device (averaged out over 10 runs):

7200RPM - 34

10K RPM - 47

SSD128 - 245

SSD100 - 184

This was a little unexpected: the 'slow' (128 GB) SSD is 1.3 times faster than the 'fast' (100 GB) SSD.

I ran this test 10 times over the course of two hours, so these results are repeatable. Weird.

The expected (but pleasing) result, of course, is that both SSDs are five to seven times faster than the 7200 RPM hard drive.

Postgresql benchmark #2 – one hundred million rows, two client machines

For this test I pre-seeded each database with one hundred million 'account' rows (scale factor 1000) using the following commands:

client-1:~$ pgbench -h 192.168.101.120 -U userone -i -s 1000 userone

..and..

client-2:~$ pgbench -h 192.168.101.120 -U usertwo -i -s 1000 usertwo

The on-disk footprint of the pre-seeded data was around 45 GB.

The purpose of pre-seeding two different databases under the same instance of Postgresql was this: when running two different client machines, I wanted each client to draw from a different set of account tables. This ensures the Postgresql server cannot reuse cached data between client sessions, so data is pulled directly from disk as much as possible.
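The concurrent runs could be sketched like this from a single shell with background jobs, though in the actual tests each pgbench ran on its own client machine (client and transaction flags are again illustrative):

```shell
# Each client hits its own database, so no cached data is shared between them
pgbench -h 192.168.101.120 -U userone -c 4 -t 1000 userone &
pgbench -h 192.168.101.120 -U usertwo -c 4 -t 1000 usertwo &
wait    # block until both runs finish, then compare the reported tps
```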

The average transactions per second for each storage device (averaged out over 10 runs):

7200RPM - 14

10K RPM - 23

SSD128 - 74

SSD100 - 122

NOTE: These are the tps from client machine 1 only, not an aggregate of the reported tps from both client machines.

This is more like it. The 'fast' (100 GB) SSD dipped only 33% from its single-client numbers, while the 'slow' (128 GB) SSD dipped 70%.

Both SSDs are still five to seven times faster than the 7200 RPM hard drive.

Conclusion

While I am not completely shocked, the results are pretty impressive: without any tuning, modern SSD storage devices are five to fourteen times faster than their rotational brothers under a default 8.4.X Postgresql configuration. Come to think of it, in my 20+ years of working with databases, I have never seen overall database performance increase by half an order of magnitude or more with so little work involved.

Ok, maybe I am a little shocked.

Anyway, here are some follow-up conditions I may test:

More rows: I plan to bump the account row count up to 400 million, creating roughly a 90 GB footprint for the database. It will be interesting to see how things scale there.

More physical clients: I have a Mac Mini and a MacBook Pro sitting idle on my gigabit LAN. It will be interesting to see how things scale with more clients slamming the DB.

RAID: It would be great to throw in some tps numbers for a RAID-based system, just to get more context for the SSD results.

Different file systems: EXT4 is not the only game in town. My next tests may include XFS and Btrfs as alternative file systems.

UPDATE #1 (2010-08-08): I've started a thread on the Postgresql performance mailing list regarding this post. Check it out.