I'm currently having trouble with a database server for a DB with lots of small writes and a few in comparison big reads. Read performance is more important because people is involved, writes are performed by automated clients. This database is currently 30GB and will grow up until a hundred or so.

4 Answers
4

32GB of RAM is a nice typical round number for most Server CPU's but if you are looking at Xeon 5500 (Nehalem) CPU's then remember that they are configured optimally in banks of 3 DIMM's per CPU, and multiples of 6 DIMMs for dual socket servers so you will see better performance with 24\48GB RAM rather than 32GB.

There are plenty of servers that will take 10 disks but whether you would consider them reasonably priced or not I can't say. I doubt that you will find any 1U like the R300 that will take 10 disks - HP's 360 G6 can cram 8 2.5" SFF drives into 1U for a basic list price of just over $2K. Dell's R610 is roughly equivalent (it's also a 1U dual socket Xeon 5500 system) but only gets 6 drives internally. Even if you bump up to the DL380\R710 2U class servers

Rackable towers like the HP ML370 G6 run for around $2.5k basic but can take up to 24 drives. Dell's T710 runs to around 16.

The prices above are the minimum list prices for the chassis with 1 CPU, no disks and virtually no RAM. A configured HP ML370 with those drives, 24 GB of RAM and dual Xeon E5540 CPU's will have a list price of around $10k. The bulk of that comes from your drive choices - you could probably save about $1500\$2K by opting for a single socket variant but the marginal cost of the RAM might increase as the 4GB DDR3 DIMM's are about 50% more expensive per GB than 2GB modules.

Edited to add some idea of performance comparisons.

A Xeon 5540 2.53Ghz CPU will be about 50% better clock for clock compared to your existing CPU. A dual socket setup with SQL 2005 should scale out reasonably efficiently so in pure CPU terms a dual socket system like those listed above will have about 3x the cpu power of your R300.

Set up with balanced DDR3 RAM @ 1066Mhz you should see about 3x the memory bandwidth, possibly more since I suspect that your current R300 setup isn't optimal.

In terms of disk performance you're increasing your effective random IO capacity by at least a factor if 5 (obviously enough) but by adding in a decent advanced RAID controller and properly segregating the functionality you will probably see substantially better performance improvements over and above that.

The Tylersburg chipset used in (almost) all of the current Xeon 5500 based systems also separates Memory IO and the rest which is harder to quantify but will invariably be of some benefit.

These systems are many times faster than your existing setup but whether that actually translates into improved performance for your users depends on what your current bottlenecks are. If it's a poorly designed app or network congestion beefing up the server alone isn't going to result in such clear gains.

If you have lots of disks, e.g. in some form of SAN, then RAID10 is the obvious choice as it combines speed and fault tolerance. However in most cases there is a limit on the number of disks that can be fitted in a server, and in that case you need to consider RAID5 as an alternative.

Your 4 disk RAID10 arrays are about the same speed as a 2 disk RAID0, or a bit less than twice the speed of an unraided disk, so they won't be that fast. Using a 4 disk RAID5 instead would give an array with faster streaming read and write speed, but a bit slower random access write speed (I've tested this on a Dell Perc5/i controller and I assume it's true for other controllers).

I would consider using your 10 disks as a 6 disk RAID10 and a 4 disk RAID5. Split the RAID5 into a small C: partition for the OS and the rest as a D: partition for the log files. Logs tend to be sequentially written so the slower random access speed isn't such an issue. Put the data on the RAID10 where they'll benefit from the high random access speed, particularly the high random access write speed.

NB I'm not recommending you organise the disks as RAID10 and RAID5, however I am recommending that you test using the disks in this format and compare the speed to your suggestion of RAID1 + RAID10 + RAID10.

Before I started buying hardware, I would do some investigating with Performance Monitor to make sure that the disks are lagging. If the performance problems are caused by blocking by those "automated clients", you may find that performance on the new server isn't very much different than what you have now.

Edit #1:
I like to look at the Avg. Disk sec/transfer numbers. These are, essentially, a measure of the latency of the drive. I used to use the queue numbers, but you were supposed to divide them by the number of spindles in the RAID array. You don't always know that number, particularly with SAN storage, and the Microsoft Ninjas told me to give up on the queue depth numbers, back in 2002 or so.

(Normally, I watch the sec/transfer numbers, the read and writes per sec numbers (they give a nice feel for how the disk is going along) and the logical and physical page read values (helps to spot when some other, non-SQLServer thing is using the disk and when SQL is just scanning lots of data out of RAM). The last value is under the SQL counters, not the disk counters. And plain old processor utilzation, of course, for each available core (rather than "total").

With the Avg. Disk sec/tranfer values, I just look for average values below 50 ms for data drives and 20 ms for log drives. Note that your current set up has the OS and logs on the same spindles, meaning that they will fight for control of the drive heads and this will drive up your latencies.

Note that high read rates for databases may merely mean bad indexing. It's pretty hard to work through indexing issues on a web forum. With so much RAM in the new box, you may wind up storing almost all of the data in RAM. This will reduce the load on the disks, but you may find yourself with high processor utilization because SQL still needs to scan through all of that data; it's just in RAM, rather than on disk. Having it in RAM will be faster than on disk, but it's still going to be slower than a well-indexed database.

Also, I'd try to keep my data and tempdb on different sets of spindles because when you are doing heavy writes to tempdb, it generally means that you are doing heavy reads from the data (something like select * into #report from huge_table, or possibly use of distinct or group by for large result sets). If your application doesn't use tempdb too much it might not matter very much.

For a well tuned SQL Server, you shouldn't need to store the the entire database in RAM. I've got servers with 12 GB of RAM that server 100's of GB of data.

Also, you could probably save a little money by going with 10KRPM drives for the OS. Once SQL is up and running, it shouldn't touch the boot drive very much at all.

If I were in your spot, I'd be most worried about the lack of RAID redundancy with your current set up, followed by (probably) long backup times on that USB drive.

If I was stuck with that USB drive for a while, I might look at doing a FULL backup once a week, followed by DIFFERENTIAL backups once a day. Depending on how much of your data really changes on any given day, that ought to cut down on the amount of data you have to move to the USB drive.

Is this a dedicated SQL Server? (it should be!) 32GB RAM is fine and CPU seems fine...

Is this server only hosting this single database or is it being used by many applications with different databases?

what is the workload of the server? How many transactions per second (read/write)?

How critical is the data? If it's very important (ie - must not lose it) then bias the build to redundancy. If it has to be 100% available (no downtime) then bias to resiliency (and get a bigger budget). If it's JUST a performance database (i.e. a data warehouse) then you can optimise for performance.

You must look at the performance counters to find out where the current bottleneck is. High disk I/O and disk queues? High CPU? (often caused by limited RAM and context trashing)

Is the SQL code written right? I've seen big machines (and yours is quite big) brought to their knees by poor SQL code and tiny machines (1 CPU, 1GB RAM) run load's of well written and busy databases.

Generally, keep away from RAID 5 (your database size is small enough not to need to extra capacity), RAID10 is best.

Separate DATA and TEMP on different drive arrays.

You should get away with a twin drive RAID 1 for LOGS, a twin drive for TEMP (create as many TEMP datafiles as CPU's across the two drives) and then the DATA can be on a larger 4 to 6 RAID10 array.