I'm trying to either prove or disprove that the slow front-end (web UI) performance is, at least in part, attributable to a disk bottleneck. I've run SQLIO to gather some metrics, but they'll always be just numbers without context. For example:

How can I provide context for the above so I can state definitively that the disk subsystem is underperforming for a multi-client UI tool? All of the articles I've read online only seem to use SQLIO as a tuning tool (optimize within the existing hardware scope), not as a tool to determine whether the architecture is suitable in the first place.

Anyone have any ideas? Am I using the right tool? How can I either make this case or move on to something else?

Alright, since you have stats showing latency from the disk system, I would try to prove it.

You're right that latency figures on their own can't confirm that the disk system is the bottleneck. Without a baseline it is impossible to be sure (the disk latency could always have been at those figures while something else causes the problem).

I would try to prove it by running a query that retrieves a large number of records from the database.

Initially (if the table has not been queried recently), SQL Server has to read the data from disk and copy it into the buffer cache in memory. Once that is done, subsequent executions of the query return quickly. If the initial execution is significantly slower than the repeats, that points to the disk system as the bottleneck.

Showing the run time of the first execution against subsequent executions should be quite a good way of demonstrating the problem.
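That comparison can be scripted directly. A sketch in T-SQL, where `dbo.BigTable` is a placeholder table name; note that `DBCC DROPCLEANBUFFERS` requires sysadmin rights and empties the buffer cache server-wide, so as said above, don't run this against a live system:

```sql
-- Flush dirty pages first so dropping clean buffers gives a true cold cache.
CHECKPOINT;
DBCC DROPCLEANBUFFERS;   -- force the next read to come from disk

SET STATISTICS TIME ON;  -- print elapsed/CPU time per statement
SET STATISTICS IO ON;    -- print physical vs. logical reads per statement

-- Cold run: pages must be read from disk (watch the "physical reads" count).
SELECT COUNT(*) FROM dbo.BigTable;

-- Warm run: the same pages are now cached ("logical reads" only).
SELECT COUNT(*) FROM dbo.BigTable;
```

If the cold run is dramatically slower and STATISTICS IO shows high physical reads, the disk system is implicated; if both runs are equally slow, the problem lies elsewhere.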

If your live workload consists of many different queries (so data is constantly being read from disk for the first time), this demonstrates where the problem is. However, if the slowness occurs even on repeated executions of the same query, which should be served from cache, the problem lies elsewhere.

I would caution against running this on your live system as it would affect performance.

This sounds good and is roughly what I've been doing. Do you know of any MS recommendation white papers I can compare my latency results against, to present to non-technical individuals? That would really go a long way toward proving that we're trying to get blood from a stone.

Your SQLIO test file size is far too small to be a valid test. I prefer to make it something close to the size of the database or at least 20 GB. Run a short throwaway test to create the test file, and use it for the remaining tests. This makes sure you are reading/writing data from/to the disk, instead of the disk controller cache.
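For example, using SQLIO's parameter-file option (a sketch assuming SQLIO 1.5 syntax; the path, thread count, and 20480 MB size are placeholders to adjust for your environment):

```
:: param.txt contains one line: <file path> <# threads> <mask> <size in MB>
::   C:\SQLTest\testfile.dat 8 0x0 20480

:: Short throwaway sequential-write run that creates the test file at full size:
sqlio -kW -s10 -fsequential -o8 -b64 -Fparam.txt

:: Subsequent tests reuse the same file, e.g. random 8K reads with latency stats (-LS):
sqlio -kR -frandom -o8 -b8 -s120 -LS -Fparam.txt
```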

Also, you should really run a series of tests, adding to the workload on each test until you reach the limit of the acceptable Avg. Latency, say 40 ms. You can do this by increasing the number of threads on each test.
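That escalation can be scripted as a batch loop (a sketch; the thread counts, 120-second duration, and file path are arbitrary placeholders, and `-LS` is what emits the Min/Avg/Max latency figures you compare against your 40 ms ceiling):

```
@echo off
:: Increase -t (threads) each run until Avg_Latency exceeds the acceptable limit.
for %%t in (1 2 4 8 16 32) do (
    echo ==== %%t threads ==== >> results.txt
    sqlio -kR -frandom -b8 -t%%t -o8 -s120 -LS C:\SQLTest\testfile.dat >> results.txt
)
```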

You also need to run tests for sequential read, sequential write, random read, and random write. Random read and random write are probably the most important for SQL Server.

Also, run these tests with different block sizes, say 4K, 8K, 16K, 32K, and 64K. 64K is probably the most important for SQL Server.
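The whole matrix (read/write, sequential/random, 4K through 64K blocks) can be driven the same way. Again a sketch with placeholder paths, thread count, and durations:

```
@echo off
:: R/W  x  sequential/random  x  4K-64K block sizes, one 120-second run each.
for %%k in (R W) do (
  for %%f in (sequential random) do (
    for %%b in (4 8 16 32 64) do (
      echo ==== -k%%k -f%%f block %%bK ==== >> matrix.txt
      sqlio -k%%k -f%%f -b%%b -t4 -o8 -s120 -LS C:\SQLTest\testfile.dat >> matrix.txt
    )
  )
)
```

Tagging each run in the output file makes it straightforward to pull the latency numbers into a spreadsheet afterwards.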

One final suggestion: set up a standard test suite and run it on every new SQL Server before you put it into production. Make sure the IO performance is OK before you go live. Save the test results for each server so that you can refer to them later and compare them to a new series of tests if there are problems later. An Excel worksheet is a good place to store the data.

I know this seems like a lot of work, but it can save a lot of trouble in the long run, and it gives you hard numbers to show there is a problem. I have managed hundreds of servers and I can tell you from experience that disk IO performance problems are not uncommon, especially with SANs.