Search results matching tag 'SAN'http://sqlblog.com/search/SearchResults.aspx?o=DateDescending&tag=SAN&orTags=0Search results matching tag 'SAN'en-USCommunityServer 2.1 SP2 (Build: 61129.1)Is the SAN dying???http://sqlblog.com/blogs/rick_heiges/archive/2012/12/11/is-the-san-dying.aspxWed, 12 Dec 2012 03:55:00 GMT21093a07-8b3d-42db-8cbf-3350fcbf5496:46614RickHeiges<p>Is the SAN dying?</p><p>The reason that I ask this question is that MSFT has unleashed technologies this year that point in that direction</p><ul><li>Always ON Availability Groups shuns shared storage</li><li>Windows 2012 has Storage Replication Technology that does not require a SAN</li><li>Windows 2012 has Hyper-V Replica Technology that does not require a SAN</li><li>PDW v2 continues to reinforce the approach to avoid shared storage</li></ul><p>I'm not saying that SAN technology does not have its place or does not have benefits inherent to the beast.&nbsp; I'm just pointing out that MSFT has made investments in technology that diminish the need for SANs.&nbsp; </p><p>Thoughts?<br></p>New Project Starting. Got Gas?http://sqlblog.com/blogs/merrill_aldrich/archive/2012/09/10/new-project-starting-got-gas.aspxMon, 10 Sep 2012 22:32:02 GMT21093a07-8b3d-42db-8cbf-3350fcbf5496:45118merrillaldrich<p>“Storage is just like gasoline,” said a fellow DBA at the office the other day.</p> <p>This DBA, Mike is his name, is one of the smartest people I know, so I pressed him, in my subtle and erudite way, to elaborate.</p> <p>“Um, whut?” I said.</p> <p>“Yeah. Now that everything is shared – VMs or consolidated SQL Servers and shared storage – if you want to do a big project, like, say, drive to Vegas, you better fill the car with gas. Drive back and forth to work every day? Gas. Same for storage.”</p> <p>This was a light-bulb-above-my-head moment.</p> <p>Now that everything is consolidated onto shared infrastructure, all the way down to complete servers, the way we think about funding IT projects has to change too. It used to be that if you wanted to do a project, you would enumerate what the systems would cost, then price and go buy them. It was like this: this new project will need a <strong>bulldozer</strong> and an <strong>excavator</strong>, and maybe a <a href="http://www.askmen.com/top_10/entertainment_250/285_top_10_list.html">Super-Zooper-Flooper-Do</a>, let’s buy them for the project, and then they will arrive on a truck and we will install them, and the project will move forward. Many people are still thinking this way, but it’s now officially backward. We don’t buy discrete items for projects anymore, we buy a slice of shared infrastructure. And planning for that infrastructure has to change, or you will be, as many organizations are, forever, endlessly, exasperatingly short of it.</p> <h2>Gas Up Early</h2> <p>Imagine you and your friends are cruising down the road on a beautiful day, and someone decides you need to, simply MUST drive to Southern California. Do you at that point look around at each other and say “OK, who has gas money?” Perhaps. But hopefully not if you run a large business.</p> <p>Worse, do you just start driving that direction, and when you get down to 1/8 of a tank, <em>then</em> ask everyone in the car? Again, maybe, but not too many people travel this way who are over 25. I think, anyway.</p> <p>So the obvious question is, and I see this in many companies, why do we pile projects onto shared infrastructure like SAN storage and VM clusters without planning what infrastructure will be required to take us where we want to go? Answer: the organizations haven’t finished shifting their thinking. They think, hey presto, now we don’t <em>need</em> to buy those unique pieces of equipment any more, we just get “free” VMs and databases and storage from that magic bottomless pool. But that’s only the first stage. They haven’t realized yet, at an organizational level, who fills that pool of resources up, and how quickly, and how much it costs.</p> <h2>Watch the Gauge</h2> <p>Part of the difficulty is there’s no single “gas gauge” to tell an organization how the shared infrastructure is doing – you need some clever, forward thinking administrators to do that, and they, in turn need some tools. Further, it’s pretty hard today to estimate what “slice” of shared infrastructure a project will need, and how or whether to literally charge back for that resource. That means you have one arm making plans for all the places the organization will drive, with no idea how much gas is in the tank, and perhaps another arm with its eye on the fuel level, but which doesn’t know what the travel plans are. If you just start driving, at some point someone’s going to be standing by the side of the road with a thumb out and a gas can.</p> <p>And here’s another gotcha – you can’t, from a practical point of view, keep on filling this tank one gallon at a time, while always near empty. It’s not safe or economical. Do you really want to buy disks or shared servers and try to install them monthly? Weekly?</p> <p>So start thinking about your servers and storage as a commodity, and do it now. Try to get your organization to make this simple shift – we don’t buy pieces of equipment for projects anymore. We buy a platform, then estimate how much more of that platform we need for all upcoming work, and to sustain growth, then implement it.</p>Scandalous II: Shh! I am De-duplicating Compressed Backupshttp://sqlblog.com/blogs/merrill_aldrich/archive/2011/04/23/scandalous-ii-shh-i-am-de-duplicating-compressed-backups.aspxSat, 23 Apr 2011 08:01:50 GMT21093a07-8b3d-42db-8cbf-3350fcbf5496:35122merrillaldrich<p>This is part II of <a href="http://sqlblog.com/blogs/merrill_aldrich/archive/2011/03/25/scandalous-i-virtualization-is-a-workaround-duck.aspx">two Scandalous posts</a>. Watch, mouth agape, as I run with scissors, right up against prevailing wisdom! Unfollow me now, before it’s too late!</p> <p>Here’s the thing. There are two really outstanding posts out there on the ‘tubez that explain in vivid detail the problems with sending compressed data into a de-duplicating appliance. And these guys are both absolutely right. Everything in their posts is correct, and I would ask that, if you haven’t, you please read them before mine:</p> <p>First, Brent Ozar:</p> <p><a title="http://www.brentozar.com/archive/2009/11/why-dedupe-is-a-bad-idea-for-sql-server-backups/" href="http://www.brentozar.com/archive/2009/11/why-dedupe-is-a-bad-idea-for-sql-server-backups/">http://www.brentozar.com/archive/2009/11/why-dedupe-is-a-bad-idea-for-sql-server-backups/</a>&#160;</p> <p>(And, may I say, well done on the Numero Uno Google result for that post. Very nice!)</p> <p>Next Denny Cherry:</p> <p><a title="http://itknowledgeexchange.techtarget.com/sql-server/sql-backup-compression-and-backup-dedup-are-mortal-enemies/" href="http://itknowledgeexchange.techtarget.com/sql-server/sql-backup-compression-and-backup-dedup-are-mortal-enemies/">http://itknowledgeexchange.techtarget.com/sql-server/sql-backup-compression-and-backup-dedup-are-mortal-enemies/</a>&#160;</p> <p>(A very respectable #3 on the Google-ometer.)</p> <p>Now, I’m not kidding. These guys know their stuff, and they are right. Stop reading right now.</p> <p>&#160;</p> <p>&#160;</p> <p>&#160;</p> <p>&#160;</p> <p>Still here? Ok, now come closer.</p> <p><font size="1">Closer.</font></p> <blockquote> <p><font size="1">Shh.</font></p> </blockquote> <blockquote> <p><font size="1">I studied this whole thing very carefully, and I do it anyway.</font></p> </blockquote> <p><font size="2">While it’s true that de-duplication works poorly with compressed data, and if you compare the de-dupe ratios for “usual” uncompressed files with the de-dupe ratios for compressed files, the compressed data looks very, very bad. But there’s even more to this story, so much more that we decided to, in a limited way, stuff the compressed files into our DDR anyway.</font></p> <p><font size="2">Here’s why:</font></p> <p><font size="2">Both SQL Server backups and file compression are a deterministic process. If you back up the same database twice, and it has the same data pages in it, and those pages are largely unchanged, then the backup files will be substantially the same. This is true if you compress both files with the same algorithm and settings, too – the data in the compressed files will be largely identical. It will not be like any OTHER files on your network, but the two files will be similar to one another.</font></p> <p><font size="2">If you change a small percentage of the data pages in the data file, that will still be true: a compressed backup of the database on, say, Monday will be mostly the same as a compressed backup of the same database, with modest changes, on Tuesday.</font></p> <p><font size="2">What that means is that if I have a 1 TB database, which I do, that produces a 250 GB compressed backup file, and that database receives mainly incremental changes from day to day or week to week, then each successive backup will be similar to the previous one. And if I copy them into a de-duplicating store (at least the one I have to work with) then, while the first file will be basically 100% net new data, the second will de-dupe against the first. It’s not as effective as other types of files, but it does help. Let’s say, for argument, that I get 75% de-duplication of only the two files, instead of the normal 85%+ across many instances of other files, I am still getting 75% de-duplication, and that can be very useful.</font></p> <p><font size="2">Useful how? Well, we have SAN replication married to our de-duplicating store for offsite backup and disaster recovery. That means that each night I have to transmit a LOT of SQL backup data across a WAN to another site. What’s a lot? For me, that just means the pipe is small and the data is much bigger. And that process would go a lot faster if, somehow, by magic, a whole lot of the data were already at the other end of the pipe before I start.</font></p> <p><font size="2">See where I’m going with this? With de-duplicated files, as days and weeks pass, each time we replicate new files from one site to the other, a whole lot of the data <em>is already there at the other site</em>. We only have to transmit the net new data. Even if that’s only 50% (a very poor performance number for de-duplicated storage in most people’s minds) that’s still cutting the data in <em>half.</em> Which is pretty good. Plus it’s compressed, which helps every <em>other</em> aspect of the backup story.</font></p> <p><font size="2">So we have what I think is a good compromise, born out by internal testing:</font></p> <ol> <li><font size="2">Keeping compressed SQL Backups in de-duplicated storage <em>indefinitely,</em> as a replacement for tapes, is impractical. It’s just too expensive. So we keep the SQL Backups in there only for the purpose of DR, and we have a pretty aggressive purge schedule to be rid of old files. The sweet spot seems to be to keep only a week or two.</font></li> <li><font size="2">We use tapes too, for archival purposes, and they have longer retention.</font></li> <li><font size="2">We back up to local (DAS or SAN) disks first at the SQL Server and then copy into the de-duplicating store, so that the backup process performs well and isn’t bottlenecked at the network or at the speed the appliance can receive the files. So backups go to disk, then get copied into the de-dupe store, cancel against whatever is in there, and then it replicates them off site.</font></li> </ol> <p><font size="2">This is not a cheap setup, but it works great. I love it. That 250 GB file I mentioned is available at my other site in a couple of hours, because it’s always mostly there already. Your mileage may vary depending on all the specifics of the technology you have, and, as I said, Brent and Denny are right.</font></p> <p><font size="1">* Professional driver on a closed course; don’t try this at home; no animals were de-duped in the production of this post. </font></p> <p><font size="2"></font></p> <p><font size="2">&#160;</font></p>T-SQL Tuesday #004: Real World SSD’shttp://sqlblog.com/blogs/merrill_aldrich/archive/2010/03/08/t-sql-tuesday-004-real-world-ssd-s.aspxTue, 09 Mar 2010 07:22:00 GMT21093a07-8b3d-42db-8cbf-3350fcbf5496:22981merrillaldrich<p>A contribution for <a target="_blank" href="http://www.straightpathsql.com/archives/2010/03/invitation-for-t-sql-tuesday-004-io/">T-SQL Tuesday #004</a>, hosted by the illustrious Mike Walsh!</p> <p>In the past few weeks I had some correspondence with <a href="http://kendalvandyke.blogspot.com/">Kendal Van Dyke</a> leading up to his SQL Saturday presentation on SSDs, and he got me fired up to share a little of my team’s experience with a real implementation. Over the past four months or so, our IT group at work has deployed a new disk array incorporating enterprise-class fibre channel SSDs for database functions, and I am happy to report it’s been a success. I do have a few nuggets that I learned along the way, which might be useful if you are contemplating a move like this. Some of these tips I’ve talked about before, but they are doubly true in the rarefied (read: expensive) air surrounding SAN SSD’s. You really don’t want to sink this kind of money and not get the maximum possible return!</p> <h2>Project</h2> <p>In the Fall our organization faced a large data migration to a second datacenter, to enhance our disaster recovery and business continuity position. This entailed the purchase of some new infrastructure, and for various reasons the group decided a storage array including SSD’s made strategic sense for the next several years. I had had one eye on SSD’s when they started becoming cost-effective enough to make it into the smaller scale TPC benchmarks – indicating the cost/performance ratio was starting to make sense. Our SAN admin had also been watching them too, for all kinds of reasons, and I think I might have seen her drooling a little when this notion came up.</p> <p>Anyway, we were able to obtain an EMC Symmetrix-class array with three tiers of storage, from SSD’s at the high end to 15k Fibre Channel disks to SATA disks. The array was incorporated into our existing SAN, and attached to a combination of new and repurposed servers, to create production environments for both Oracle and SQL Server. We tested the complete environment as pre-production for several weeks, mainly to validate the IO performance of this new architecture relative to what we had been running, and then have progressively cut production systems over to the equipment.</p> <p>I can say that it’s performed really well – the Oracle and SQL Server systems that have SSD storage on the new array definitely saw a net boost in performance. Most of the time, our Oracle DBA’s report that they see about a 2/3 increase in performance, with some exceptions. I am running a data warehouse function on the SSD’s, and I can say we shaved perhaps 25% off of our ETL/load window by moving to the new hardware. At the same time, I would like to caution that, though it might seem like these things are supernatural in terms of performance, they are not a make-all-your-problems-disappear silver bullet. They are dramatically faster for some particular tasks, but not for everything.</p> <h2>Method</h2> <p>With help from EMC, EMC professional services consultants and our internal staff, we went through this overall process to design and deploy:</p> <ol> <li>We came up with a general size and performance profile of all the existing systems we planned to move and/or consolidate onto the new array. Importantly, we got the data from real performance stats collected off our existing servers and arrays, so we could describe to EMC what the real IO requirements looked like, as well as the capacity needed. IO usually drives the cost of storage more than capacity in the database world, and we had to take pains to make sure our execs and other parties didn’t get sidetracked into looking only at capacity (so many TB) instead of performance.</li> <li>EMC came back with a general design showing how many of what type of disks would be in the array. That&nbsp; design set most of our budget parameters :-). The new array ended up being sort of mid-range, with 46 200 GB STEC Fibre Channel SSD’s, 124 450 GB 15k RPM FC Disks, and 36 1TB 7200 RPM SATA disks.</li> <li>While the array was in the manufacturing and testing queue at EMC, we did a deep dive into the specifics of how to arrange the data on all the available devices. We took about a month’s worth of performance counters from all our existing storage and <a target="_blank" href="http://sqlblog.com/blogs/merrill_aldrich/archive/2009/10/29/using-historical-perf-counters-for-storage-planning.aspx">mashed that into aggregate IO targets</a> for the new servers on the new array. EMC professional services then took that data and proposed a design showing exactly how the LUNs and files would physically lay out. We did a couple of back-and-forth iterations before settling on the final design.</li> <li>The basic guiding principle around the SSDs was to use them for the files that would benefit most: the read-intensive ones. We literally found the read-intensive files or disks on our existing systems from the IO counters and placed those, only, on the SSD tier. The rest of the files are on conventional 15K FC disks. All the SSD’s are set up as RAID5 (reads being the priority) and all the 15K disks are mirrored pairs, for performance.</li> <li>Then The Beast, as I like to call it, was delivered, put together and health-checked by EMC.</li> <li>We moved our pre-production data to the new system, and ran automated performance tests for about a month, just so we knew how to expect this thing to act.</li> <li>Finally, we moved production systems over one at a time.</li> </ol> <h2>Take-Aways</h2> <p>I learned a lot going through this process – and it was really fun, to boot. Here are some basic ideas to think about if you are looking at SSDs in SAN:</p> <ol> <li>If you don’t have a system for gathering IO performance counters on a rolling basis I strongly recommend, as I have <a target="_blank" href="http://sqlblog.com/blogs/merrill_aldrich/archive/2009/10/29/using-historical-perf-counters-for-storage-planning.aspx">before</a>, getting or building one. This capability is vital to right-size your infrastructure and spend money wisely. I think it’s most cost-effective just to buy a monitoring system (and put your valuable working hours into something business-specific that can’t be purchased), but if you can’t afford a monitor, it’s not very hard to build a basic counter-collection system. Worst case, it is possible to log this data with perfmon, but that takes extra effort and can push your schedule out, as you have to gather the data when needed rather than just having it already in a repository.</li> <li>Resist the idea that SSD’s are just so miraculously fast that it’ll be like RAM, or make all your performance worries disappear. In fact they solve mainly one problem – an important problem, but one only. They avoid the penalty of random access to disks, which requires mechanical moving parts to get to some region of a disk, then pass the data on the disk past the disk head. If you suffer from that problem, they will help – a lot. If not, then you should dig deeper and see if SSD’s will really be an advantage for your workload. Writes might not be that much faster, nor sequential reads, depending on the details. You might be best off with a hybrid solution that uses two kinds of storage, and it’s worth considering a design like that. Be realistic about their true performance characteristics and set expectations appropriately.</li> <li>You still have to RAID the devices together. SSD’s can fail just as disks can fail, and if you are running a production system then you will want the redundancy. That obviously has a cost impact. On the other hand, the arithmetic might go like this: if you need, say, 48 disks in RAID 1+0 to get good performance for an existing server (and those disks are probably mostly empty) but SSD’s could get you the same performance (imaginary numbers here) with only 6 or 8 devices in RAID5, then maybe the budget starts to look attractive.</li> </ol> <p>Still, it was really fun last week when I spied our little data warehouse doing over 7,900 IOPS! Sweet!</p>SAN 101 for the DBAhttp://sqlblog.com/blogs/merrill_aldrich/archive/2010/02/16/san-101.aspxWed, 17 Feb 2010 01:45:00 GMT21093a07-8b3d-42db-8cbf-3350fcbf5496:22362merrillaldrich<P>As will become apparent from this post, I am no Storage Admin. My apologies for offending the sensibilities of those who know this topic better than I do!</P>
<P>I get asked occasionally about placing SQL Server data on SAN storage, and I've done it with a few systems, and a lot of smart people helping me,&nbsp;so here's a SAN 101 crib sheet for DBAs who may be new to this.</P>
<H2>SAN Basics</H2>
<OL>
<LI>SAN stands for Storage Area <EM>Network</EM>, like LAN or WAN. The first thing of note is that it's a network, and not a device. Often it's a fiber-optic network, for speed and bandwidth, but sometimes it can be a copper network. It's easy to lose sight of the fact that its a network, because it's not what we all think of as "the network," and because the storage admins are usually the only ones that play there. Hence the colloquial term "a SAN" referring to one storage device. One device on the network is really a Storage Array (SA?), not a Storage Array Network (SAN). Tomato, tom-AH-to, I suppose, but it's important in larger environments to talk about one array separately from the storage network itself.</LI>
<LI>Because it's a network, there's a network protocol for devices to communicate. Usually that is FCP for fiber or iSCSI for copper (with exceptions.) The protocols really send SCSI commands, the same kind many hard disk devices use within a single server. That means, for practical purposes, that devices a server "sees" on the SAN look and act like hard disks rather than network devices. There can be switches in a SAN, and so on, just like other networks.</LI>
<LI>Servers typically connect to a fiber SAN with one or two (maybe more)&nbsp;Host Bus Adapters (HBAs). An HBA just acts as an interpreter, making the devices visible on the storage&nbsp;network appear to the host bus, transparently,&nbsp;as if they were disks. So, if you are building such a setup, you'll have a server with a couple of HBAs, those are connected to&nbsp;a switch or two&nbsp;on the fiber network, which is then connected to one or more storage arrays. If a chunk of storage on one of the arrays is allocated for your server, then you see that storage space&nbsp;as if you'd stuck a hard&nbsp;disk in. Partition, format, write. Simple!</LI></OL>
<H3>Rule 1: There is no Magic</H3>
<P>Storage vendors do have a reputation (they are selling things, after all) to pitch their stuff in a way that makes it seem like all your performance worries will vanish in&nbsp;some kind of&nbsp;magic fiber-optic {poof}. Not so, I'm afraid. Enterprise arrays are good, but they are still subject to the laws of physics. The disks in an array are still disks, the cache is just a cache. This is good for the DBA, in a way, because if you know how to size direct attached storage by spindle count and IOPS, then - whooee! - it's not really that different. There are things that help performance some, like a huge write cache with write re-ordering and fault tolerance, but you don't have to throw out what you know and start over. Lots of random IO still means you need lots of random IO capacity, which still means disks.</P>
<H3>Rule 2: Performance Costs More than Space&nbsp;</H3>
<P>The fundamental mistake I have seen is to hand a SAN administrator (or your SAN vendor) a spec about how much <EM>space</EM> you need for your data. That is a recipe for disaster. What is important is what the <EM>performance</EM> profile is, and, just as with plain ol' disks, IOPS will determine what the hardware has to look like in an array. And you want to do that before you get the price quote, because the performance, not the space, will drive the cost. Ask for a terabyte? You <EM>might</EM> get one 1 TB SATA disk. That won't be happy at all.</P>
<H3>Rule 3: Yes, Direct Attached Storage is Cheaper ... But</H3>
<P>OK, so there's a whole economics problem to be aware of. SAN storage is expensive, sometimes very expensive. In some circumstances, it's still worth it, but the price tag is undeniable. Examples: if you have the same amount of storage (or in the&nbsp;SQL world, amount of storage performance)&nbsp;in direct attached disks, and in SAN storage, the SAN flavor is going to cost a lot more up front. It's pretty easy to spend $1,000 per disk for 15k fiber channel disk drives in an array.</P>
<P>However, with real IT in real organizations of any size, things are more complicated&nbsp;- there isn't one, unchanging and monolithic system. In fact, things, servers, storage requirements, circumstances, applications all change at a breakneck pace. What we do, most of the time, is try to accomodate change and keep things "up." That is life for many IT pros. So it is <EM>not</EM> cheaper to have piles and piles of <EM>unutilized</EM> or <EM>overspec'd </EM>direct attached storage sitting around, married to single servers. If you had to buy twice what you&nbsp;needed, where's the savings? When you buy into a SAN what you are buying is, or should be,&nbsp;flexibility.&nbsp;The flexibility to use it fully, all the time, and to change the storage allocation for all your servers without physically scrapping storage hardware and buying new. If you have a SAN, and it's not providing that value, then you probably wasted your money.</P>
<P>So, if you have one relatively static database server and you're looking at storage options, direct-attached might save you a pile of money. Don't build a SAN for performance around a couple of servers -- there's no compelling performance-per-dollar argument. On the other hand, if you find that you look around your enterprise and see tens or hundreds of servers where there's a huge waste factor for storage in some places, and full disks in others,&nbsp;there's your SAN opportunity. This server over here is low on disk space? OK, let's just add some. That one had too much, and we're wasting space? Take some away. Storage over here is too slow? Migrate it to a&nbsp;set if disks&nbsp;with more performance.</P>
<P>In other words:</P>
<BLOCKQUOTE>
<P>Best case: one SQL Server or cluster&nbsp;with correct direct attached storage, fully utilized. Nice, but this is the real world!</P></BLOCKQUOTE>
<BLOCKQUOTE>
<P>Next best: a heterogeneous, changing&nbsp;environment that uses SAN storage to best advantage</P>
<P>Worse: a collection of over-spec'd DAS hardware that is under 50% utilized, but can't realistically be changed. It seemed cheap at the time, but there's so darn much of it!</P>
<P>Worst: a big, expensive, under-utilized SAN</P></BLOCKQUOTE>
<H3>Rule 4: You Need a Good Relationship with the SAN Admin</H3>
<P>I've blogged about this <A class="" href="http://sqlblog.com/blogs/merrill_aldrich/archive/2009/07/26/san-disk-array-performance-beware-lun-concatenation.aspx" target=_blank>before</A>, but suffice it to say that bad communication with the SAN admin = FAIL. SQL Server often has unique and demanding IO requirements that don't go away just because you have a fancy array. You have to be able to work that out with the storage admins, if you have them, or the vendor, if you are in&nbsp;a smaller shop. Together you will have to talk through the need to separate logs, data and backups, and what the performance profile of each "virtual" disk system needs to be, backed by perf counter data, to prevent the SAN nightmare: "We spent our $5,000,000 and the VP wants to know why it's SLOW."</P>Using Historical Perf Counter Data For Storage Planninghttp://sqlblog.com/blogs/merrill_aldrich/archive/2009/10/29/using-historical-perf-counters-for-storage-planning.aspxFri, 30 Oct 2009 02:42:00 GMT21093a07-8b3d-42db-8cbf-3350fcbf5496:18361merrillaldrichLately I'm faced with a fairly ambitious data center move, and at the same time with an initiative to consolidate sprawling SQL Servers onto centralized clusters. It's a chunk of work, but these two notions have fit together pretty well: as long as we're moving SQL services and touching everything, it seems to be easier to make the consolidation argument to application owners/stakeholders and get that work done at the same time. So it's looking like consolidation and relocation in the same effort. Hopefully it'll be a win squared.
<P>But to come to the point: this collection of projects has led to the purchase of a sizeable new disk array (one chunk of the new storage is even solid state - yee haw!) and that means serious performance and capacity planning. I currently use Spotlight on SQL Server Enterprise to monitor and collect stats about our servers, and it has a nice, automated repository for historical trending baked right in.</P>
<BLOCKQUOTE>
<P>What's that? You don't collect performance stats from your servers over time? Stop burning valuable minutes reading this silly blog and go get (or even build, if you must) a tool! Now! BinGoogle, in no particular order: Idera, Quest, SQL Sentry, Red Gate, and work from there. Sheesh!</P></BLOCKQUOTE>
<P>I happen to have Spotlight available out of the whole group of monitoring products, so I thought I'd share some techniques I have used to project storage requirements from a pile of existing servers onto a new, consolidated set of clusters. The same techniques work for other calculations, like CPU use and so on, I just happen to be laser-focused on storage at the moment. The same techniques probably also work with minor variation against any tool that collects this perf counter data.</P>
<P>The Spotlight repository I have set up is a SQL database that ends up containing hourly performance counter samples for about 30 days. The list of available counters is quite comprehensive, but for this exercise I need the ones the SAN engineers will need to design the LUN layout on our new array. That is:</P>
<BLOCKQUOTE>
<P>IO operations per second, per disk</P>
<P>Bandwidth required (quantity of data read and written per disk, rather than number of operations)</P></BLOCKQUOTE>
<P>Also handy, Spotlight provides Disk Queue values for past time periods, so you can get some idea of which existing disks in your environment are busy, or maybe too busy. We're also able to cross-reference this with the performance counters from our exising SAN arrays, as a double-check.</P>
<P><B>Chunking the Spotlight Repository Data</B></P>
<P>The Spotlight repository schema is fairly easy to understand, especially if you take a peek at some of the provided reporting procs that come with the product. In essence, there's a table full of performance counter samples, and then ancillary tables that list your servers ("monitored objects") and friendly names of counters ("statistic_names," "statistic_keys"). A series of joins will provide a simple output of time, server, and counter value. I went ahead and encapsulated this in a view, so that I could use it for a variety of specific queries:</P><PRE><CODE><SPAN style="COLOR:blue;">CREATE VIEW </SPAN><SPAN style="COLOR:black;">[dbo].[reportingPerfData] <BR></SPAN><SPAN style="COLOR:blue;">AS <BR> SELECT </SPAN><SPAN style="COLOR:black;">a.timecollected</SPAN><SPAN style="COLOR:gray;">, <BR> </SPAN><SPAN style="COLOR:black;">ds.datasource_name</SPAN><SPAN style="COLOR:gray;">, <BR> </SPAN><SPAN style="COLOR:black;">mo.monitored_object_name</SPAN><SPAN style="COLOR:gray;">, <BR> </SPAN><SPAN style="COLOR:black;">sc.statistic_class_name</SPAN><SPAN style="COLOR:gray;">, <BR> </SPAN><SPAN style="COLOR:black;">sn.statistic_name</SPAN><SPAN style="COLOR:gray;">, <BR> </SPAN><SPAN style="COLOR:black;">sk.statistic_key_value</SPAN><SPAN style="COLOR:gray;">, <BR> </SPAN><SPAN style="COLOR:black;">a.raw_value <BR> </SPAN><SPAN style="COLOR:blue;">FROM </SPAN><SPAN style="COLOR:black;">spotlight_perfdata a <BR> </SPAN><SPAN style="COLOR:blue;">INNER JOIN </SPAN><SPAN style="COLOR:black;">spotlight_stat_classes sc <BR> </SPAN><SPAN style="COLOR:blue;">ON </SPAN><SPAN style="COLOR:black;">a.statistic_class_id </SPAN><SPAN style="COLOR:blue;">= </SPAN><SPAN style="COLOR:black;">sc.statistic_class_id <BR> </SPAN><SPAN style="COLOR:blue;">INNER JOIN </SPAN><SPAN style="COLOR:black;">spotlight_stat_names sn <BR> </SPAN><SPAN style="COLOR:blue;">ON </SPAN><SPAN style="COLOR:black;">a.statistic_name_id </SPAN><SPAN style="COLOR:blue;">= </SPAN><SPAN style="COLOR:black;">sn.statistic_name_id <BR> </SPAN><SPAN style="COLOR:blue;">INNER JOIN </SPAN><SPAN style="COLOR:black;">spotlight_datasources ds <BR> </SPAN><SPAN style="COLOR:blue;">ON </SPAN><SPAN style="COLOR:black;">sc.datasource_id </SPAN><SPAN style="COLOR:blue;">= </SPAN><SPAN style="COLOR:black;">ds.datasource_id <BR> </SPAN><SPAN style="COLOR:blue;">INNER JOIN </SPAN><SPAN style="COLOR:black;">spotlight_monitored_objects mo <BR> </SPAN><SPAN style="COLOR:blue;">ON </SPAN><SPAN style="COLOR:black;">a.monitored_object_id </SPAN><SPAN style="COLOR:blue;">= </SPAN><SPAN style="COLOR:black;">mo.monitored_object_id <BR> </SPAN><SPAN style="COLOR:blue;">INNER JOIN </SPAN><SPAN style="COLOR:black;">spotlight_stat_keys sk <BR> </SPAN><SPAN style="COLOR:blue;">ON </SPAN><SPAN style="COLOR:black;">a.statistic_key_id </SPAN><SPAN style="COLOR:blue;">= </SPAN><SPAN style="COLOR:black;">sk.statistic_key_id
GO</SPAN></CODE></PRE>
<P>The only tricky bit I found is that some of this is built on the EAV model (labeled rows in a generic table with a variant value column, instead of separate, clearly named columns). EAV just implies the first thing you have to write is a pivot to get the values you need into appropriate columns, cast to the right types. I wish Quest hadn't set it up this way, but it's a small peeve, and just having the data at all is great.</P>
<P>I'm looking for disk stats, so I have a second query that is built on the first, but limits results to disk performance, and pivots the counter values into separate columns. Again, defining this in a view facilitates reuse:</P><PRE><CODE><SPAN style="COLOR:blue;">CREATE VIEW </SPAN><SPAN style="COLOR:black;">dbo.reportingdiskperfdata <BR></SPAN><SPAN style="COLOR:blue;">AS <BR> SELECT </SPAN><SPAN style="COLOR:black;">[Year]</SPAN><SPAN style="COLOR:gray;">, <BR> </SPAN><SPAN style="COLOR:black;">[Month]</SPAN><SPAN style="COLOR:gray;">, <BR> </SPAN><SPAN style="COLOR:black;">[Day]</SPAN><SPAN style="COLOR:gray;">, <BR> </SPAN><SPAN style="COLOR:black;">daypart</SPAN><SPAN style="COLOR:gray;">, <BR> </SPAN><SPAN style="COLOR:black;">[Hour]</SPAN><SPAN style="COLOR:gray;">, <BR> </SPAN><SPAN style="COLOR:black;">[Minute]</SPAN><SPAN style="COLOR:gray;">, <BR> </SPAN><SPAN style="COLOR:black;">timecollected</SPAN><SPAN style="COLOR:gray;">, <BR> </SPAN><SPAN style="COLOR:black;">[server]</SPAN><SPAN style="COLOR:gray;">, <BR> </SPAN><SPAN style="COLOR:black;">drive</SPAN><SPAN style="COLOR:gray;">, <BR> </SPAN><SPAN style="COLOR:magenta;">SUM</SPAN><SPAN style="COLOR:gray;">(</SPAN><SPAN style="COLOR:black;">pctbusy</SPAN><SPAN style="COLOR:gray;">) </SPAN><SPAN style="COLOR:blue;">AS </SPAN><SPAN style="COLOR:black;">pctbusy</SPAN><SPAN style="COLOR:gray;">, <BR> </SPAN><SPAN style="COLOR:magenta;">SUM</SPAN><SPAN style="COLOR:gray;">(</SPAN><SPAN style="COLOR:black;">readspersec</SPAN><SPAN style="COLOR:gray;">) </SPAN><SPAN style="COLOR:blue;">AS </SPAN><SPAN style="COLOR:black;">readspersec</SPAN><SPAN style="COLOR:gray;">, <BR> </SPAN><SPAN style="COLOR:magenta;">SUM</SPAN><SPAN style="COLOR:gray;">(</SPAN><SPAN style="COLOR:black;">writespersec</SPAN><SPAN style="COLOR:gray;">) </SPAN><SPAN style="COLOR:blue;">AS </SPAN><SPAN style="COLOR:black;">writespersec</SPAN><SPAN style="COLOR:gray;">, <BR> </SPAN><SPAN style="COLOR:magenta;">SUM</SPAN><SPAN style="COLOR:gray;">(</SPAN><SPAN style="COLOR:black;">mbyteswrittenpersec</SPAN><SPAN style="COLOR:gray;">) </SPAN><SPAN style="COLOR:blue;">AS </SPAN><SPAN style="COLOR:black;">mbyteswrittenpersec</SPAN><SPAN style="COLOR:gray;">, <BR> </SPAN><SPAN style="COLOR:magenta;">SUM</SPAN><SPAN style="COLOR:gray;">(</SPAN><SPAN style="COLOR:black;">mbytesreadpersec</SPAN><SPAN style="COLOR:gray;">) </SPAN><SPAN style="COLOR:blue;">AS </SPAN><SPAN style="COLOR:black;">mbytesreadpersec</SPAN><SPAN style="COLOR:gray;">, <BR> </SPAN><SPAN style="COLOR:magenta;">SUM</SPAN><SPAN style="COLOR:gray;">(</SPAN><SPAN style="COLOR:black;">iopersec</SPAN><SPAN style="COLOR:gray;">) </SPAN><SPAN style="COLOR:blue;">AS </SPAN><SPAN style="COLOR:black;">iopersec</SPAN><SPAN style="COLOR:gray;">, <BR> </SPAN><SPAN style="COLOR:magenta;">SUM</SPAN><SPAN style="COLOR:gray;">(</SPAN><SPAN style="COLOR:black;">queuelength</SPAN><SPAN style="COLOR:gray;">) </SPAN><SPAN style="COLOR:blue;">AS </SPAN><SPAN style="COLOR:black;">queuelength <BR> </SPAN><SPAN style="COLOR:blue;">FROM </SPAN><SPAN style="COLOR:gray;">(</SPAN><SPAN style="COLOR:blue;">SELECT </SPAN><SPAN style="COLOR:magenta;">YEAR</SPAN><SPAN style="COLOR:gray;">(</SPAN><SPAN style="COLOR:black;">timecollected</SPAN><SPAN style="COLOR:gray;">) </SPAN><SPAN style="COLOR:blue;">AS </SPAN><SPAN style="COLOR:black;">[Year]</SPAN><SPAN style="COLOR:gray;">, <BR> </SPAN><SPAN style="COLOR:magenta;">MONTH</SPAN><SPAN style="COLOR:gray;">(</SPAN><SPAN style="COLOR:black;">timecollected</SPAN><SPAN style="COLOR:gray;">) </SPAN><SPAN style="COLOR:blue;">AS </SPAN><SPAN style="COLOR:black;">[Month]</SPAN><SPAN style="COLOR:gray;">, <BR> </SPAN><SPAN style="COLOR:magenta;">DAY</SPAN><SPAN style="COLOR:gray;">(</SPAN><SPAN style="COLOR:black;">timecollected</SPAN><SPAN style="COLOR:gray;">) </SPAN><SPAN style="COLOR:blue;">AS </SPAN><SPAN style="COLOR:black;">[Day]</SPAN><SPAN style="COLOR:gray;">, <BR> </SPAN><SPAN style="COLOR:magenta;">CASE <BR> </SPAN><SPAN style="COLOR:blue;">WHEN </SPAN><SPAN style="COLOR:black;">0 </SPAN><SPAN style="COLOR:gray;">&lt;= </SPAN><SPAN style="COLOR:magenta;">DATEPART</SPAN><SPAN style="COLOR:gray;">(</SPAN><SPAN style="COLOR:black;">HOUR</SPAN><SPAN style="COLOR:gray;">,</SPAN><SPAN style="COLOR:black;">timecollected</SPAN><SPAN style="COLOR:gray;">) <BR> AND </SPAN><SPAN style="COLOR:magenta;">DATEPART</SPAN><SPAN style="COLOR:gray;">(</SPAN><SPAN style="COLOR:black;">HOUR</SPAN><SPAN style="COLOR:gray;">,</SPAN><SPAN style="COLOR:black;">timecollected</SPAN><SPAN style="COLOR:gray;">) &lt; </SPAN><SPAN style="COLOR:black;">6 <BR> </SPAN><SPAN style="COLOR:blue;">THEN </SPAN><SPAN style="COLOR:black;">1 <BR> </SPAN><SPAN style="COLOR:blue;">WHEN </SPAN><SPAN style="COLOR:black;">6 </SPAN><SPAN style="COLOR:gray;">&lt;= </SPAN><SPAN style="COLOR:magenta;">DATEPART</SPAN><SPAN style="COLOR:gray;">(</SPAN><SPAN style="COLOR:black;">HOUR</SPAN><SPAN style="COLOR:gray;">,</SPAN><SPAN style="COLOR:black;">timecollected</SPAN><SPAN style="COLOR:gray;">) <BR> AND </SPAN><SPAN style="COLOR:magenta;">DATEPART</SPAN><SPAN style="COLOR:gray;">(</SPAN><SPAN style="COLOR:black;">HOUR</SPAN><SPAN style="COLOR:gray;">,</SPAN><SPAN style="COLOR:black;">timecollected</SPAN><SPAN style="COLOR:gray;">) &lt; </SPAN><SPAN style="COLOR:black;">12 <BR> </SPAN><SPAN style="COLOR:blue;">THEN </SPAN><SPAN style="COLOR:black;">2 <BR> </SPAN><SPAN style="COLOR:blue;">WHEN </SPAN><SPAN style="COLOR:black;">12 </SPAN><SPAN style="COLOR:gray;">&lt;= </SPAN><SPAN style="COLOR:magenta;">DATEPART</SPAN><SPAN style="COLOR:gray;">(</SPAN><SPAN style="COLOR:black;">HOUR</SPAN><SPAN style="COLOR:gray;">,</SPAN><SPAN style="COLOR:black;">timecollected</SPAN><SPAN style="COLOR:gray;">) <BR> AND </SPAN><SPAN style="COLOR:magenta;">DATEPART</SPAN><SPAN style="COLOR:gray;">(</SPAN><SPAN style="COLOR:black;">HOUR</SPAN><SPAN style="COLOR:gray;">,</SPAN><SPAN style="COLOR:black;">timecollected</SPAN><SPAN style="COLOR:gray;">) &lt; </SPAN><SPAN style="COLOR:black;">18 <BR> </SPAN><SPAN style="COLOR:blue;">THEN </SPAN><SPAN style="COLOR:black;">3 <BR> </SPAN><SPAN style="COLOR:blue;">WHEN </SPAN><SPAN style="COLOR:black;">18 </SPAN><SPAN style="COLOR:gray;">&lt;= </SPAN><SPAN style="COLOR:magenta;">DATEPART</SPAN><SPAN style="COLOR:gray;">(</SPAN><SPAN style="COLOR:black;">HOUR</SPAN><SPAN style="COLOR:gray;">,</SPAN><SPAN style="COLOR:black;">timecollected</SPAN><SPAN style="COLOR:gray;">) <BR> AND </SPAN><SPAN style="COLOR:magenta;">DATEPART</SPAN><SPAN style="COLOR:gray;">(</SPAN><SPAN style="COLOR:black;">HOUR</SPAN><SPAN style="COLOR:gray;">,</SPAN><SPAN style="COLOR:black;">timecollected</SPAN><SPAN style="COLOR:gray;">) &lt; </SPAN><SPAN style="COLOR:black;">24 <BR> </SPAN><SPAN style="COLOR:blue;">THEN </SPAN><SPAN style="COLOR:black;">4 <BR> </SPAN><SPAN style="COLOR:blue;">END AS </SPAN><SPAN style="COLOR:black;">daypart</SPAN><SPAN style="COLOR:gray;">, <BR> </SPAN><SPAN style="COLOR:magenta;">DATEPART</SPAN><SPAN style="COLOR:gray;">(</SPAN><SPAN style="COLOR:black;">HOUR</SPAN><SPAN style="COLOR:gray;">,</SPAN><SPAN style="COLOR:black;">timecollected</SPAN><SPAN style="COLOR:gray;">) </SPAN><SPAN style="COLOR:blue;">AS </SPAN><SPAN style="COLOR:black;">[Hour]</SPAN><SPAN style="COLOR:gray;">, <BR> </SPAN><SPAN style="COLOR:magenta;">DATEPART</SPAN><SPAN style="COLOR:gray;">(</SPAN><SPAN style="COLOR:black;">MINUTE</SPAN><SPAN style="COLOR:gray;">,</SPAN><SPAN style="COLOR:black;">timecollected</SPAN><SPAN style="COLOR:gray;">) </SPAN><SPAN style="COLOR:blue;">AS </SPAN><SPAN style="COLOR:black;">[Minute]</SPAN><SPAN style="COLOR:gray;">, <BR> </SPAN><SPAN style="COLOR:black;">timecollected</SPAN><SPAN style="COLOR:gray;">, <BR> </SPAN><SPAN style="COLOR:black;">monitored_object_name </SPAN><SPAN style="COLOR:blue;">AS </SPAN><SPAN style="COLOR:black;">[server]</SPAN><SPAN style="COLOR:gray;">, <BR> </SPAN><SPAN style="COLOR:black;">statistic_name</SPAN><SPAN style="COLOR:gray;">, <BR> </SPAN><SPAN style="COLOR:magenta;">CAST</SPAN><SPAN style="COLOR:gray;">(</SPAN><SPAN style="COLOR:black;">statistic_key_value </SPAN><SPAN style="COLOR:blue;">AS VARCHAR</SPAN><SPAN style="COLOR:gray;">(</SPAN><SPAN style="COLOR:black;">100</SPAN><SPAN style="COLOR:gray;">)) </SPAN><SPAN style="COLOR:blue;">AS </SPAN><SPAN style="COLOR:black;">drive</SPAN><SPAN style="COLOR:gray;">, <BR> </SPAN><SPAN style="COLOR:magenta;">CASE <BR> </SPAN><SPAN style="COLOR:blue;">WHEN </SPAN><SPAN style="COLOR:black;">statistic_name </SPAN><SPAN style="COLOR:blue;">= </SPAN><SPAN style="COLOR:red;">'pctbusy' <BR> </SPAN><SPAN style="COLOR:blue;">THEN </SPAN><SPAN style="COLOR:magenta;">CAST</SPAN><SPAN style="COLOR:gray;">(</SPAN><SPAN style="COLOR:black;">raw_value </SPAN><SPAN style="COLOR:blue;">AS </SPAN><SPAN style="COLOR:black;">FLOAT</SPAN><SPAN style="COLOR:gray;">) <BR> </SPAN><SPAN style="COLOR:blue;">ELSE </SPAN><SPAN style="COLOR:black;">0.0 <BR> </SPAN><SPAN style="COLOR:blue;">END AS </SPAN><SPAN style="COLOR:black;">pctbusy</SPAN><SPAN style="COLOR:gray;">, <BR> </SPAN><SPAN style="COLOR:magenta;">CASE <BR> </SPAN><SPAN style="COLOR:blue;">WHEN </SPAN><SPAN style="COLOR:black;">statistic_name </SPAN><SPAN style="COLOR:blue;">= </SPAN><SPAN style="COLOR:red;">'readspersec' <BR> </SPAN><SPAN style="COLOR:blue;">THEN </SPAN><SPAN style="COLOR:magenta;">CAST</SPAN><SPAN style="COLOR:gray;">(</SPAN><SPAN style="COLOR:black;">raw_value </SPAN><SPAN style="COLOR:blue;">AS </SPAN><SPAN style="COLOR:black;">FLOAT</SPAN><SPAN style="COLOR:gray;">) <BR> </SPAN><SPAN style="COLOR:blue;">ELSE </SPAN><SPAN style="COLOR:black;">0.0 <BR> </SPAN><SPAN style="COLOR:blue;">END AS </SPAN><SPAN style="COLOR:black;">readspersec</SPAN><SPAN style="COLOR:gray;">, <BR> </SPAN><SPAN style="COLOR:magenta;">CASE <BR> </SPAN><SPAN style="COLOR:blue;">WHEN </SPAN><SPAN style="COLOR:black;">statistic_name </SPAN><SPAN style="COLOR:blue;">= </SPAN><SPAN style="COLOR:red;">'writespersec' <BR> </SPAN><SPAN style="COLOR:blue;">THEN </SPAN><SPAN style="COLOR:magenta;">CAST</SPAN><SPAN style="COLOR:gray;">(</SPAN><SPAN style="COLOR:black;">raw_value </SPAN><SPAN style="COLOR:blue;">AS </SPAN><SPAN style="COLOR:black;">FLOAT</SPAN><SPAN style="COLOR:gray;">) <BR> </SPAN><SPAN style="COLOR:blue;">ELSE </SPAN><SPAN style="COLOR:black;">0.0 <BR> </SPAN><SPAN style="COLOR:blue;">END AS </SPAN><SPAN style="COLOR:black;">writespersec</SPAN><SPAN style="COLOR:gray;">, <BR> </SPAN><SPAN style="COLOR:magenta;">CASE <BR> </SPAN><SPAN style="COLOR:blue;">WHEN </SPAN><SPAN style="COLOR:black;">statistic_name </SPAN><SPAN style="COLOR:blue;">= </SPAN><SPAN style="COLOR:red;">'byteswrittenpersec' <BR> </SPAN><SPAN style="COLOR:blue;">THEN </SPAN><SPAN style="COLOR:magenta;">CAST</SPAN><SPAN style="COLOR:gray;">(</SPAN><SPAN style="COLOR:black;">raw_value </SPAN><SPAN style="COLOR:blue;">AS </SPAN><SPAN style="COLOR:black;">FLOAT</SPAN><SPAN style="COLOR:gray;">) / (</SPAN><SPAN style="COLOR:black;">1024 </SPAN><SPAN style="COLOR:gray;">* </SPAN><SPAN style="COLOR:black;">1024</SPAN><SPAN style="COLOR:gray;">) <BR> </SPAN><SPAN style="COLOR:blue;">ELSE </SPAN><SPAN style="COLOR:black;">0.0 <BR> </SPAN><SPAN style="COLOR:blue;">END AS </SPAN><SPAN style="COLOR:black;">mbyteswrittenpersec</SPAN><SPAN style="COLOR:gray;">, <BR> </SPAN><SPAN style="COLOR:magenta;">CASE <BR> </SPAN><SPAN style="COLOR:blue;">WHEN </SPAN><SPAN style="COLOR:black;">statistic_name </SPAN><SPAN style="COLOR:blue;">= </SPAN><SPAN style="COLOR:red;">'bytesreadpersec' <BR> </SPAN><SPAN style="COLOR:blue;">THEN </SPAN><SPAN style="COLOR:magenta;">CAST</SPAN><SPAN style="COLOR:gray;">(</SPAN><SPAN style="COLOR:black;">raw_value </SPAN><SPAN style="COLOR:blue;">AS </SPAN><SPAN style="COLOR:black;">FLOAT</SPAN><SPAN style="COLOR:gray;">) / (</SPAN><SPAN style="COLOR:black;">1024 </SPAN><SPAN style="COLOR:gray;">* </SPAN><SPAN style="COLOR:black;">1024</SPAN><SPAN style="COLOR:gray;">) <BR> </SPAN><SPAN style="COLOR:blue;">ELSE </SPAN><SPAN style="COLOR:black;">0.0 <BR> </SPAN><SPAN style="COLOR:blue;">END AS </SPAN><SPAN style="COLOR:black;">mbytesreadpersec</SPAN><SPAN style="COLOR:gray;">, <BR> </SPAN><SPAN style="COLOR:magenta;">CASE <BR> </SPAN><SPAN style="COLOR:blue;">WHEN </SPAN><SPAN style="COLOR:black;">statistic_name </SPAN><SPAN style="COLOR:blue;">= </SPAN><SPAN style="COLOR:red;">'iopersec' <BR> </SPAN><SPAN style="COLOR:blue;">THEN </SPAN><SPAN style="COLOR:magenta;">CAST</SPAN><SPAN style="COLOR:gray;">(</SPAN><SPAN style="COLOR:black;">raw_value </SPAN><SPAN style="COLOR:blue;">AS </SPAN><SPAN style="COLOR:black;">FLOAT</SPAN><SPAN style="COLOR:gray;">) <BR> </SPAN><SPAN style="COLOR:blue;">ELSE </SPAN><SPAN style="COLOR:black;">0.0 <BR> </SPAN><SPAN style="COLOR:blue;">END AS </SPAN><SPAN style="COLOR:black;">iopersec</SPAN><SPAN style="COLOR:gray;">, <BR> </SPAN><SPAN style="COLOR:magenta;">CASE <BR> </SPAN><SPAN style="COLOR:blue;">WHEN </SPAN><SPAN style="COLOR:black;">statistic_name </SPAN><SPAN style="COLOR:blue;">= </SPAN><SPAN style="COLOR:red;">'queuelength' <BR> </SPAN><SPAN style="COLOR:blue;">THEN </SPAN><SPAN style="COLOR:magenta;">CAST</SPAN><SPAN style="COLOR:gray;">(</SPAN><SPAN style="COLOR:black;">raw_value </SPAN><SPAN style="COLOR:blue;">AS </SPAN><SPAN style="COLOR:black;">FLOAT</SPAN><SPAN style="COLOR:gray;">) <BR> </SPAN><SPAN style="COLOR:blue;">ELSE </SPAN><SPAN style="COLOR:black;">0.0 <BR> </SPAN><SPAN style="COLOR:blue;">END AS </SPAN><SPAN style="COLOR:black;">queuelength <BR> </SPAN><SPAN style="COLOR:blue;">FROM </SPAN><SPAN style="COLOR:black;">dbo.reportingperfdata <BR> </SPAN><SPAN style="COLOR:blue;">WHERE </SPAN><SPAN style="COLOR:black;">datasource_name </SPAN><SPAN style="COLOR:blue;">= </SPAN><SPAN style="COLOR:red;">'windows' <BR> </SPAN><SPAN style="COLOR:gray;">AND </SPAN><SPAN style="COLOR:black;">statistic_class_name </SPAN><SPAN style="COLOR:blue;">= </SPAN><SPAN style="COLOR:red;">'logicaldisks' <BR> </SPAN><SPAN style="COLOR:gray;">AND </SPAN><SPAN style="COLOR:black;">statistic_name </SPAN><SPAN style="COLOR:blue;">IN </SPAN><SPAN style="COLOR:gray;">(</SPAN><SPAN style="COLOR:red;">'readspersec'</SPAN><SPAN style="COLOR:gray;">,</SPAN><SPAN style="COLOR:green;"> <BR> </SPAN><SPAN style="COLOR:red;">'pctbusy'</SPAN><SPAN style="COLOR:gray;">,</SPAN></CODE><CODE><SPAN style="COLOR:red;">'writespersec'</SPAN><SPAN style="COLOR:gray;">,</SPAN><SPAN style="COLOR:red;">'byteswrittenpersec'</SPAN><SPAN style="COLOR:gray;">, <BR> </SPAN><SPAN style="COLOR:green;"> <BR> </SPAN><SPAN style="COLOR:red;">'bytesreadpersec'</SPAN><SPAN style="COLOR:gray;">,</SPAN><SPAN style="COLOR:green;"> <BR> </SPAN><SPAN style="COLOR:red;">'iopersec'</SPAN><SPAN style="COLOR:gray;">,</SPAN><SPAN style="COLOR:red;">'queuelength'</SPAN><SPAN style="COLOR:green;"> <BR> <BR> </SPAN><SPAN style="COLOR:gray;">) <BR> AND </SPAN><SPAN style="COLOR:black;">timecollected </SPAN><SPAN style="COLOR:gray;">&gt;= </SPAN><SPAN style="COLOR:magenta;">DATEADD</SPAN><SPAN style="COLOR:gray;">(</SPAN><SPAN style="COLOR:magenta;">DAY</SPAN><SPAN style="COLOR:gray;">,-</SPAN><SPAN style="COLOR:black;">30</SPAN><SPAN style="COLOR:gray;">,(</SPAN><SPAN style="COLOR:blue;">SELECT MAX</SPAN><SPAN style="COLOR:gray;">(</SPAN><SPAN style="COLOR:black;">timecollected</SPAN><SPAN style="COLOR:gray;">) <BR> </SPAN><SPAN style="COLOR:blue;">FROM </SPAN><SPAN style="COLOR:black;">dbo.spotlight_perfdata</SPAN><SPAN style="COLOR:gray;">))<BR> </SPAN></CODE><CODE><SPAN style="COLOR:gray;">) </SPAN><SPAN style="COLOR:blue;">AS </SPAN><SPAN style="COLOR:black;">pivotperfvals <BR> </SPAN><SPAN style="COLOR:blue;">GROUP BY </SPAN><SPAN style="COLOR:black;">[Year]</SPAN><SPAN style="COLOR:gray;">, <BR> </SPAN><SPAN style="COLOR:black;">[Month]</SPAN><SPAN style="COLOR:gray;">, <BR> </SPAN><SPAN style="COLOR:black;">[Day]</SPAN><SPAN style="COLOR:gray;">, <BR> </SPAN><SPAN style="COLOR:black;">daypart</SPAN><SPAN style="COLOR:gray;">, <BR> </SPAN><SPAN style="COLOR:black;">[Hour]</SPAN><SPAN style="COLOR:gray;">, <BR> </SPAN><SPAN style="COLOR:black;">[Minute]</SPAN><SPAN style="COLOR:gray;">, <BR> </SPAN><SPAN style="COLOR:black;">timecollected</SPAN><SPAN style="COLOR:gray;">, <BR> </SPAN><SPAN style="COLOR:black;">[server]</SPAN><SPAN style="COLOR:gray;">, <BR> </SPAN><SPAN style="COLOR:black;">drive <BR><BR>GO </SPAN>
</CODE></PRE>
<P>If you look closely, there are two other transformations: as the values are pivoted out into labeled columns and there is an expansion of the timecollected value into a hierarchy of year / month / date / time, and four time periods per day ("daypart," meaning early morning, morning, afternoon, evening) along the lines of an Analysis Services / BI hierarchy. This is just to facilitate reporting later with an Excel Pivot Chart.</P>
<P>With these two views we are almost to the point where we've made information out of our data. </P>
<P><B>From History to Plans</B></P>
<P>I am doing relocation, but also consolidation. That means that in many cases two or more disks from the performance history I have will need to be added together to form a projection of how busy a new, consolidated LUN will be, and then what performance will be required of the disk array for that LUN.</P>
<P>In order to get to the final numbers, I next made a small&nbsp; mapping table right in the Spotlight repository database, and used it to map all the existing disks from all the existing servers onto the planned layout of our new SQL Server clusters.</P><PRE><CODE><SPAN style="COLOR:blue;">CREATE TABLE </SPAN><SPAN style="COLOR:black;">dbo.diskmigrationplan </SPAN><SPAN style="COLOR:gray;">( <BR> </SPAN><SPAN style="COLOR:black;">existingserver </SPAN><SPAN style="COLOR:blue;">VARCHAR</SPAN><SPAN style="COLOR:gray;">(</SPAN><SPAN style="COLOR:black;">128</SPAN><SPAN style="COLOR:gray;">), <BR> </SPAN><SPAN style="COLOR:black;">existingdrive </SPAN><SPAN style="COLOR:blue;">VARCHAR</SPAN><SPAN style="COLOR:gray;">(</SPAN><SPAN style="COLOR:black;">128</SPAN><SPAN style="COLOR:gray;">), <BR> </SPAN><SPAN style="COLOR:black;">newserver </SPAN><SPAN style="COLOR:blue;">VARCHAR</SPAN><SPAN style="COLOR:gray;">(</SPAN><SPAN style="COLOR:black;">128</SPAN><SPAN style="COLOR:gray;">), <BR> </SPAN><SPAN style="COLOR:black;">newdrive </SPAN><SPAN style="COLOR:blue;">VARCHAR</SPAN><SPAN style="COLOR:gray;">(</SPAN><SPAN style="COLOR:black;">128</SPAN><SPAN style="COLOR:gray;">)) <BR><BR></SPAN><SPAN style="COLOR:blue;">INSERT INTO </SPAN><SPAN style="COLOR:black;">dbo.diskmigrationplan <BR> </SPAN><SPAN style="COLOR:gray;">(</SPAN><SPAN style="COLOR:black;">existingserver</SPAN><SPAN style="COLOR:gray;">, <BR> </SPAN><SPAN style="COLOR:black;">existingdrive</SPAN><SPAN style="COLOR:gray;">, <BR> </SPAN><SPAN style="COLOR:black;">newserver</SPAN><SPAN style="COLOR:gray;">, <BR> </SPAN><SPAN style="COLOR:black;">newdrive</SPAN><SPAN style="COLOR:gray;">) <BR></SPAN><SPAN style="COLOR:blue;">SELECT DISTINCT </SPAN><SPAN style="COLOR:black;">[server]</SPAN><SPAN style="COLOR:gray;">,<BR> </SPAN><SPAN style="COLOR:black;">drive</SPAN><SPAN style="COLOR:gray;">, <BR> </SPAN><SPAN style="COLOR:black;">[server]</SPAN><SPAN style="COLOR:gray;">, <BR> </SPAN><SPAN style="COLOR:black;">drive <BR></SPAN><SPAN style="COLOR:blue;">FROM </SPAN><SPAN style="COLOR:black;">dbo.reportingdiskperfdata <BR></SPAN><SPAN style="COLOR:blue;">WHERE </SPAN><SPAN style="COLOR:black;">drive </SPAN><SPAN style="COLOR:gray;">NOT </SPAN><SPAN style="COLOR:blue;">IN </SPAN><SPAN style="COLOR:gray;">(</SPAN><SPAN style="COLOR:red;">'c:'</SPAN><SPAN style="COLOR:gray;">,</SPAN><SPAN style="COLOR:red;">'d:'</SPAN><SPAN style="COLOR:gray;">,</SPAN><SPAN style="COLOR:red;">'q:'</SPAN><SPAN style="COLOR:gray;">,</SPAN><SPAN style="COLOR:red;">'x:'</SPAN><SPAN style="COLOR:gray;">) <BR></SPAN><SPAN style="COLOR:red;"><BR></SPAN><SPAN style="COLOR:black;">GO<BR></SPAN></CODE></PRE>
<P>This is just a simple, four-column table that lists the existing server name, existing disk and the planned server name and planned disk. Its purpose is to allow creation of a query that can first sum the performance counters for separate disks that will be consolidated, and then average the performance of the two disks together, over time, to estimate the demand on the consolidated disk. Changing the mapping changes the plan for consolidation, so I can fine tune the old-to-new server plans.</P>
<P>Once I have that mapping table filled in, then edited to relate existing disks to new disks, I can get to a query that will show the projected demand in the new server environment:</P><PRE><CODE><SPAN style="COLOR:green;">-- Sum the counter values from old servers/drives to new servers/drives <BR></SPAN><SPAN style="COLOR:blue;">CREATE VIEW </SPAN><SPAN style="COLOR:black;">reportingdiskperfprojection <BR></SPAN><SPAN style="COLOR:blue;">AS <BR> SELECT </SPAN><SPAN style="COLOR:black;">[Year]</SPAN><SPAN style="COLOR:gray;">, <BR> </SPAN><SPAN style="COLOR:black;">[Month]</SPAN><SPAN style="COLOR:gray;">, <BR> </SPAN><SPAN style="COLOR:black;">[Day]</SPAN><SPAN style="COLOR:gray;">, <BR> </SPAN><SPAN style="COLOR:black;">daypart</SPAN><SPAN style="COLOR:gray;">, <BR> </SPAN><SPAN style="COLOR:black;">[Hour]</SPAN><SPAN style="COLOR:gray;">, <BR> </SPAN><SPAN style="COLOR:black;">[Minute]</SPAN><SPAN style="COLOR:gray;">, <BR> </SPAN><SPAN style="COLOR:black;">timecollected</SPAN><SPAN style="COLOR:gray;">, <BR> </SPAN><SPAN style="COLOR:black;">drive </SPAN><SPAN style="COLOR:blue;">= </SPAN><SPAN style="COLOR:black;">mp.newserver </SPAN><SPAN style="COLOR:gray;">+ </SPAN><SPAN style="COLOR:red;">' ' </SPAN><SPAN style="COLOR:gray;">+ </SPAN><SPAN style="COLOR:black;">mp.newdrive</SPAN><SPAN style="COLOR:gray;">, <BR> </SPAN><SPAN style="COLOR:black;">pctbusy </SPAN><SPAN style="COLOR:blue;">= </SPAN><SPAN style="COLOR:magenta;">SUM</SPAN><SPAN style="COLOR:gray;">(</SPAN><SPAN style="COLOR:black;">pctbusy</SPAN><SPAN style="COLOR:gray;">), <BR> </SPAN><SPAN style="COLOR:black;">readspersec </SPAN><SPAN style="COLOR:blue;">= </SPAN><SPAN style="COLOR:magenta;">SUM</SPAN><SPAN style="COLOR:gray;">(</SPAN><SPAN style="COLOR:black;">readspersec</SPAN><SPAN style="COLOR:gray;">), <BR> </SPAN><SPAN style="COLOR:black;">writespersec </SPAN><SPAN style="COLOR:blue;">= </SPAN><SPAN style="COLOR:magenta;">SUM</SPAN><SPAN style="COLOR:gray;">(</SPAN><SPAN style="COLOR:black;">writespersec</SPAN><SPAN style="COLOR:gray;">), <BR> </SPAN><SPAN style="COLOR:black;">mbyteswrittenpersec </SPAN><SPAN style="COLOR:blue;">= </SPAN><SPAN style="COLOR:magenta;">SUM</SPAN><SPAN style="COLOR:gray;">(</SPAN><SPAN style="COLOR:black;">mbyteswrittenpersec</SPAN><SPAN style="COLOR:gray;">), <BR> </SPAN><SPAN style="COLOR:black;">mbytesreadpersec </SPAN><SPAN style="COLOR:blue;">= </SPAN><SPAN style="COLOR:magenta;">SUM</SPAN><SPAN style="COLOR:gray;">(</SPAN><SPAN style="COLOR:black;">mbytesreadpersec</SPAN><SPAN style="COLOR:gray;">), <BR> </SPAN><SPAN style="COLOR:black;">iopersec </SPAN><SPAN style="COLOR:blue;">= </SPAN><SPAN style="COLOR:magenta;">SUM</SPAN><SPAN style="COLOR:gray;">(</SPAN><SPAN style="COLOR:black;">iopersec</SPAN><SPAN style="COLOR:gray;">), <BR> </SPAN><SPAN style="COLOR:black;">queuelength </SPAN><SPAN style="COLOR:blue;">= </SPAN><SPAN style="COLOR:magenta;">SUM</SPAN><SPAN style="COLOR:gray;">(</SPAN><SPAN style="COLOR:black;">queuelength</SPAN><SPAN style="COLOR:gray;">) <BR> </SPAN><SPAN style="COLOR:blue;">FROM </SPAN><SPAN style="COLOR:black;">dbo.reportingdiskperfdata pd <BR> </SPAN><SPAN style="COLOR:blue;">INNER JOIN </SPAN><SPAN style="COLOR:black;">dbo.diskmigrationplan mp <BR> </SPAN><SPAN style="COLOR:blue;">ON </SPAN><SPAN style="COLOR:black;">mp.existingserver </SPAN><SPAN style="COLOR:blue;">= </SPAN><SPAN style="COLOR:black;">pd.[server] <BR> </SPAN><SPAN style="COLOR:gray;">AND </SPAN><SPAN style="COLOR:black;">mp.existingdrive </SPAN><SPAN style="COLOR:blue;">= </SPAN><SPAN style="COLOR:black;">pd.drive <BR> </SPAN><SPAN style="COLOR:blue;">GROUP BY </SPAN><SPAN style="COLOR:black;">[Year]</SPAN><SPAN style="COLOR:gray;">, <BR> </SPAN><SPAN style="COLOR:black;">[Month]</SPAN><SPAN style="COLOR:gray;">, <BR> </SPAN><SPAN style="COLOR:black;">[Day]</SPAN><SPAN style="COLOR:gray;">, <BR> </SPAN><SPAN style="COLOR:black;">daypart</SPAN><SPAN style="COLOR:gray;">, <BR> </SPAN><SPAN style="COLOR:black;">[Hour]</SPAN><SPAN style="COLOR:gray;">, <BR> </SPAN><SPAN style="COLOR:black;">[Minute]</SPAN><SPAN style="COLOR:gray;">, <BR> </SPAN><SPAN style="COLOR:black;">timecollected</SPAN><SPAN style="COLOR:gray;">, <BR> </SPAN><SPAN style="COLOR:black;">mp.newserver </SPAN><SPAN style="COLOR:gray;">+ </SPAN><SPAN style="COLOR:red;">' ' </SPAN><SPAN style="COLOR:gray;">+ </SPAN><SPAN style="COLOR:black;">mp.newdrive <BR><BR>GO</SPAN>
</CODE></PRE>
<P><B>Chart it Out</B></P>
<P>I've been able to use the output from this last view in two reporting techniques. In order to make a friendly, detailed picture of the performance profile of each disk, I plugged Excel 2007's Pivot Chart straight into the view. This allows me to examine each disk and look at a graph of each counter over the whole time span, to get an idea of when the disk is busy, and how peak usage compares to average usage, etc. It's as simple as clicking the Pivot Chart button, linking the data source for the chart up to the view in SQL Server, then pulling the data columns into the standard pivot table rows / columns / values areas.</P>
<P><IMG height=466 src="http://home.comcast.net/~merrill.aldrich/DiskActivityGraph.GIF" width=600></P>
<P><B>Aggregate</B></P>
<P>Lastly, while the charts showing each disk are handy, it's pretty time consuming to plow through <I>each</I> disk and jot down estimates. Instead, I thought a final bit of SQL work could give me an overview of target numbers for all the disks. The trick here is how to aggregate the results. I don't care so much when a disk is idle, so periods of low use just don't really matter - but they do have to be excluded. I also don't care about spikes or outliers on the high end of the spectrum. One thing I took away from the "engineering lite" classes I had in Architecture school: when you design a building air conditioning system, you don't design around the very hottest days that might happen (that'd be a waste of money). You design around the average of most of the hot days, so that the system keeps the building cool practically all the time, but you haven't wasted money with extra capacity just for the one hottest day in five years.</P>
<P>In order to do this, I need to take the consolidated data for each planned disk (that is, add up the perf counters for disks that will land on consolidated servers) and then compute the average and max values for the normal busy periods. With row_number() ranking, this is pretty easy to do: I have a query rank the 720 available samples of a counter for the whole month in descending order, then from that list I discard the top 5 or so to eliminate spikes or outliers, and then average together the next 100 samples. I'm sure there's a statistics rule and a term for this, like "difference of mean sum square deviation over moving time average," but here I have to confess to knowing practically nothing about statistics beyond what I put in layman's terms above :-).</P>
<P>The query for one such aggregation:</P><PRE><CODE><SPAN style="COLOR:blue;">SELECT </SPAN><SPAN style="COLOR:black;">rankedioperseccounters.drive</SPAN><SPAN style="COLOR:gray;">, <BR> </SPAN><SPAN style="COLOR:magenta;">AVG</SPAN><SPAN style="COLOR:gray;">(</SPAN><SPAN style="COLOR:black;">iopersec</SPAN><SPAN style="COLOR:gray;">) </SPAN><SPAN style="COLOR:blue;">AS </SPAN><SPAN style="COLOR:black;">avgiopersec</SPAN><SPAN style="COLOR:gray;">, <BR> </SPAN><SPAN style="COLOR:blue;">MAX</SPAN><SPAN style="COLOR:gray;">(</SPAN><SPAN style="COLOR:black;">iopersec</SPAN><SPAN style="COLOR:gray;">) </SPAN><SPAN style="COLOR:blue;">AS </SPAN><SPAN style="COLOR:black;">maxiopersec <BR></SPAN><SPAN style="COLOR:blue;">FROM </SPAN><SPAN style="COLOR:gray;">(</SPAN><SPAN style="COLOR:blue;">SELECT </SPAN><SPAN style="COLOR:black;">timecollected</SPAN><SPAN style="COLOR:gray;">, <BR> </SPAN><SPAN style="COLOR:black;">drive</SPAN><SPAN style="COLOR:gray;">, <BR> </SPAN><SPAN style="COLOR:black;">[rank] </SPAN><SPAN style="COLOR:blue;">= </SPAN><SPAN style="COLOR:black;">Row_number</SPAN><SPAN style="COLOR:gray;">() </SPAN><SPAN style="COLOR:blue;">OVER</SPAN><SPAN style="COLOR:gray;">(</SPAN><SPAN style="COLOR:black;">PARTITION </SPAN><SPAN style="COLOR:blue;">BY </SPAN><SPAN style="COLOR:black;">drive </SPAN><SPAN style="COLOR:blue;">ORDER BY </SPAN><SPAN style="COLOR:black;">iopersec </SPAN><SPAN style="COLOR:blue;">DESC</SPAN><SPAN style="COLOR:gray;">), <BR> </SPAN><SPAN style="COLOR:black;">iopersec <BR> </SPAN><SPAN style="COLOR:blue;">FROM </SPAN><SPAN style="COLOR:black;">dbo.reportingdiskperfprojection</SPAN><SPAN style="COLOR:gray;">) </SPAN><SPAN style="COLOR:blue;">AS </SPAN><SPAN style="COLOR:black;">rankedioperseccounters <BR></SPAN><SPAN style="COLOR:blue;">WHERE </SPAN><SPAN style="COLOR:black;">[rank] </SPAN><SPAN style="COLOR:gray;">BETWEEN </SPAN><SPAN style="COLOR:black;">5 </SPAN><SPAN style="COLOR:gray;">AND </SPAN><SPAN style="COLOR:black;">105 <BR></SPAN><SPAN style="COLOR:blue;">GROUP BY </SPAN><SPAN style="COLOR:black;">drive<BR></SPAN></CODE><CODE><SPAN style="COLOR:green;"> </SPAN></CODE></PRE>
<P>Then, a beastly looking query can do them all at once. This is just that same query repeated for each metric:<BR></P><PRE><CODE><SPAN style="COLOR:green;">-- Top 101 of 720 counter values from all samples, clipping max 5 <BR></SPAN><SPAN style="COLOR:blue;">SELECT </SPAN><SPAN style="COLOR:black;">iopersec.drive</SPAN><SPAN style="COLOR:gray;">, <BR> </SPAN><SPAN style="COLOR:black;">avgiopersec</SPAN><SPAN style="COLOR:gray;">, <BR> </SPAN><SPAN style="COLOR:black;">maxiopersec</SPAN><SPAN style="COLOR:gray;">, <BR> </SPAN><SPAN style="COLOR:black;">avgmbytesreadpersec</SPAN><SPAN style="COLOR:gray;">, <BR> </SPAN><SPAN style="COLOR:black;">maxmbytesreadpersec</SPAN><SPAN style="COLOR:gray;">, <BR> </SPAN><SPAN style="COLOR:black;">avgmbyteswrittenpersec</SPAN><SPAN style="COLOR:gray;">, <BR> </SPAN><SPAN style="COLOR:black;">maxmbyteswrittenpersec</SPAN><SPAN style="COLOR:gray;">, <BR> </SPAN><SPAN style="COLOR:black;">avgpctbusy</SPAN><SPAN style="COLOR:gray;">, <BR> </SPAN><SPAN style="COLOR:black;">maxpctbusy</SPAN><SPAN style="COLOR:gray;">, <BR> </SPAN><SPAN style="COLOR:black;">avgqueuelength</SPAN><SPAN style="COLOR:gray;">, <BR> </SPAN><SPAN style="COLOR:black;">maxqueuelength <BR></SPAN><SPAN style="COLOR:blue;">FROM </SPAN><SPAN style="COLOR:gray;">(</SPAN><SPAN style="COLOR:blue;">SELECT </SPAN><SPAN style="COLOR:black;">rankedioperseccounters.drive</SPAN><SPAN style="COLOR:gray;">, <BR> </SPAN><SPAN style="COLOR:magenta;">AVG</SPAN><SPAN style="COLOR:gray;">(</SPAN><SPAN style="COLOR:black;">iopersec</SPAN><SPAN style="COLOR:gray;">) </SPAN><SPAN style="COLOR:blue;">AS </SPAN><SPAN style="COLOR:black;">avgiopersec</SPAN><SPAN style="COLOR:gray;">, <BR> </SPAN><SPAN style="COLOR:blue;">MAX</SPAN><SPAN style="COLOR:gray;">(</SPAN><SPAN style="COLOR:black;">iopersec</SPAN><SPAN style="COLOR:gray;">) </SPAN><SPAN style="COLOR:blue;">AS </SPAN><SPAN style="COLOR:black;">maxiopersec <BR> </SPAN><SPAN style="COLOR:blue;">FROM </SPAN><SPAN style="COLOR:gray;">(</SPAN><SPAN style="COLOR:blue;">SELECT </SPAN><SPAN style="COLOR:black;">timecollected</SPAN><SPAN style="COLOR:gray;">, <BR> </SPAN><SPAN style="COLOR:black;">drive</SPAN><SPAN style="COLOR:gray;">, <BR> </SPAN><SPAN style="COLOR:black;">[rank] </SPAN><SPAN style="COLOR:blue;">= </SPAN><SPAN style="COLOR:black;">Row_number</SPAN><SPAN style="COLOR:gray;">() <BR> </SPAN><SPAN style="COLOR:blue;">OVER</SPAN><SPAN style="COLOR:gray;">(</SPAN><SPAN style="COLOR:black;">PARTITION </SPAN><SPAN style="COLOR:blue;">BY </SPAN><SPAN style="COLOR:black;">drive </SPAN><SPAN style="COLOR:blue;">ORDER BY </SPAN><SPAN style="COLOR:black;">iopersec </SPAN><SPAN style="COLOR:blue;">DESC</SPAN><SPAN style="COLOR:gray;">), <BR> </SPAN><SPAN style="COLOR:black;">iopersec <BR> </SPAN><SPAN style="COLOR:blue;">FROM </SPAN><SPAN style="COLOR:black;">dbo.reportingdiskperfprojection</SPAN><SPAN style="COLOR:gray;">) </SPAN><SPAN style="COLOR:blue;">AS </SPAN><SPAN style="COLOR:black;">rankedioperseccounters <BR> </SPAN><SPAN style="COLOR:blue;">WHERE </SPAN><SPAN style="COLOR:black;">[rank] </SPAN><SPAN style="COLOR:gray;">BETWEEN </SPAN><SPAN style="COLOR:black;">5 </SPAN><SPAN style="COLOR:gray;">AND </SPAN><SPAN style="COLOR:black;">105 <BR> </SPAN><SPAN style="COLOR:blue;">GROUP BY </SPAN><SPAN style="COLOR:black;">drive</SPAN><SPAN style="COLOR:green;"> <BR> </SPAN><SPAN style="COLOR:gray;">) </SPAN><SPAN style="COLOR:blue;">AS </SPAN><SPAN style="COLOR:black;">iopersec <BR> </SPAN><SPAN style="COLOR:magenta;">LEFT </SPAN><SPAN style="COLOR:blue;">JOIN </SPAN><SPAN style="COLOR:gray;">(</SPAN><SPAN style="COLOR:blue;">SELECT </SPAN><SPAN style="COLOR:black;">rankedmbytesreadperseccounters.drive</SPAN><SPAN style="COLOR:gray;">, <BR> </SPAN><SPAN style="COLOR:magenta;">AVG</SPAN><SPAN style="COLOR:gray;">(</SPAN><SPAN style="COLOR:black;">mbytesreadpersec</SPAN><SPAN style="COLOR:gray;">) </SPAN><SPAN style="COLOR:blue;">AS </SPAN><SPAN style="COLOR:black;">avgmbytesreadpersec</SPAN><SPAN style="COLOR:gray;">, <BR> </SPAN><SPAN style="COLOR:blue;">MAX</SPAN><SPAN style="COLOR:gray;">(</SPAN><SPAN style="COLOR:black;">mbytesreadpersec</SPAN><SPAN style="COLOR:gray;">) </SPAN><SPAN style="COLOR:blue;">AS </SPAN><SPAN style="COLOR:black;">maxmbytesreadpersec <BR> </SPAN><SPAN style="COLOR:blue;">FROM </SPAN><SPAN style="COLOR:gray;">(</SPAN><SPAN style="COLOR:blue;">SELECT </SPAN><SPAN style="COLOR:black;">timecollected</SPAN><SPAN style="COLOR:gray;">, <BR> </SPAN><SPAN style="COLOR:black;">drive</SPAN><SPAN style="COLOR:gray;">, <BR> </SPAN><SPAN style="COLOR:black;">[rank] </SPAN><SPAN style="COLOR:blue;">= </SPAN><SPAN style="COLOR:black;">Row_number</SPAN><SPAN style="COLOR:gray;">() <BR> </SPAN><SPAN style="COLOR:blue;">OVER</SPAN><SPAN style="COLOR:gray;">(</SPAN><SPAN style="COLOR:black;">PARTITION </SPAN><SPAN style="COLOR:blue;">BY </SPAN><SPAN style="COLOR:black;">drive </SPAN><SPAN style="COLOR:blue;">ORDER BY </SPAN><SPAN style="COLOR:black;">mbytesreadpersec </SPAN><SPAN style="COLOR:blue;">DESC</SPAN><SPAN style="COLOR:gray;">), <BR> </SPAN><SPAN style="COLOR:black;">mbytesreadpersec <BR> </SPAN><SPAN style="COLOR:blue;">FROM </SPAN><SPAN style="COLOR:black;">dbo.reportingdiskperfprojection</SPAN><SPAN style="COLOR:gray;">) </SPAN><SPAN style="COLOR:blue;">AS </SPAN><SPAN style="COLOR:black;">rankedmbytesreadperseccounters <BR> </SPAN><SPAN style="COLOR:blue;">WHERE </SPAN><SPAN style="COLOR:black;">[rank] </SPAN><SPAN style="COLOR:gray;">BETWEEN </SPAN><SPAN style="COLOR:black;">5 </SPAN><SPAN style="COLOR:gray;">AND </SPAN><SPAN style="COLOR:black;">105 <BR> </SPAN><SPAN style="COLOR:blue;">GROUP BY </SPAN><SPAN style="COLOR:black;">drive</SPAN><SPAN style="COLOR:green;"><BR> </SPAN><SPAN style="COLOR:gray;">) </SPAN><SPAN style="COLOR:blue;">AS </SPAN><SPAN style="COLOR:black;">mbytesreadpersec <BR> </SPAN><SPAN style="COLOR:blue;">ON </SPAN><SPAN style="COLOR:black;">iopersec.drive </SPAN><SPAN style="COLOR:blue;">= </SPAN><SPAN style="COLOR:black;">mbytesreadpersec.drive <BR> </SPAN><SPAN style="COLOR:magenta;">LEFT </SPAN><SPAN style="COLOR:blue;">JOIN </SPAN><SPAN style="COLOR:gray;">(</SPAN><SPAN style="COLOR:blue;">SELECT </SPAN><SPAN style="COLOR:black;">rankedmbyteswrittenperseccounters.drive</SPAN><SPAN style="COLOR:gray;">, <BR> </SPAN><SPAN style="COLOR:magenta;">AVG</SPAN><SPAN style="COLOR:gray;">(</SPAN><SPAN style="COLOR:black;">mbyteswrittenpersec</SPAN><SPAN style="COLOR:gray;">) </SPAN><SPAN style="COLOR:blue;">AS </SPAN><SPAN style="COLOR:black;">avgmbyteswrittenpersec</SPAN><SPAN style="COLOR:gray;">, <BR> </SPAN><SPAN style="COLOR:blue;">MAX</SPAN><SPAN style="COLOR:gray;">(</SPAN><SPAN style="COLOR:black;">mbyteswrittenpersec</SPAN><SPAN style="COLOR:gray;">) </SPAN><SPAN style="COLOR:blue;">AS </SPAN><SPAN style="COLOR:black;">maxmbyteswrittenpersec <BR> </SPAN><SPAN style="COLOR:blue;">FROM </SPAN><SPAN style="COLOR:gray;">(</SPAN><SPAN style="COLOR:blue;">SELECT </SPAN><SPAN style="COLOR:black;">timecollected</SPAN><SPAN style="COLOR:gray;">, <BR> </SPAN><SPAN style="COLOR:black;">drive</SPAN><SPAN style="COLOR:gray;">, <BR> </SPAN><SPAN style="COLOR:black;">[rank] </SPAN><SPAN style="COLOR:blue;">= </SPAN><SPAN style="COLOR:black;">Row_number</SPAN><SPAN style="COLOR:gray;">() <BR> </SPAN><SPAN style="COLOR:blue;">OVER</SPAN><SPAN style="COLOR:gray;">(</SPAN><SPAN style="COLOR:black;">PARTITION </SPAN><SPAN style="COLOR:blue;">BY </SPAN><SPAN style="COLOR:black;">drive </SPAN><SPAN style="COLOR:blue;">ORDER BY </SPAN><SPAN style="COLOR:black;">mbyteswrittenpersec </SPAN><SPAN style="COLOR:blue;">DESC</SPAN><SPAN style="COLOR:gray;">), <BR> </SPAN><SPAN style="COLOR:black;">mbyteswrittenpersec <BR> </SPAN><SPAN style="COLOR:blue;">FROM </SPAN><SPAN style="COLOR:black;">dbo.reportingdiskperfprojection</SPAN><SPAN style="COLOR:gray;">) </SPAN><SPAN style="COLOR:blue;">AS </SPAN><SPAN style="COLOR:black;">rankedmbyteswrittenperseccounters <BR> </SPAN><SPAN style="COLOR:blue;">WHERE </SPAN><SPAN style="COLOR:black;">[rank] </SPAN><SPAN style="COLOR:gray;">BETWEEN </SPAN><SPAN style="COLOR:black;">5 </SPAN><SPAN style="COLOR:gray;">AND </SPAN><SPAN style="COLOR:black;">105 <BR> </SPAN><SPAN style="COLOR:blue;">GROUP BY </SPAN><SPAN style="COLOR:black;">drive</SPAN><SPAN style="COLOR:green;"> <BR> </SPAN><SPAN style="COLOR:gray;">) </SPAN><SPAN style="COLOR:blue;">AS </SPAN><SPAN style="COLOR:black;">mbyteswrittenpersec <BR> </SPAN><SPAN style="COLOR:blue;">ON </SPAN><SPAN style="COLOR:black;">iopersec.drive </SPAN><SPAN style="COLOR:blue;">= </SPAN><SPAN style="COLOR:black;">mbyteswrittenpersec.drive <BR> </SPAN><SPAN style="COLOR:magenta;">LEFT </SPAN><SPAN style="COLOR:blue;">JOIN </SPAN><SPAN style="COLOR:gray;">(</SPAN><SPAN style="COLOR:blue;">SELECT </SPAN><SPAN style="COLOR:black;">rankedpctbusycounters.drive</SPAN><SPAN style="COLOR:gray;">, <BR> </SPAN><SPAN style="COLOR:magenta;">AVG</SPAN><SPAN style="COLOR:gray;">(</SPAN><SPAN style="COLOR:black;">pctbusy</SPAN><SPAN style="COLOR:gray;">) </SPAN><SPAN style="COLOR:blue;">AS </SPAN><SPAN style="COLOR:black;">avgpctbusy</SPAN><SPAN style="COLOR:gray;">, <BR> </SPAN><SPAN style="COLOR:blue;">MAX</SPAN><SPAN style="COLOR:gray;">(</SPAN><SPAN style="COLOR:black;">pctbusy</SPAN><SPAN style="COLOR:gray;">) </SPAN><SPAN style="COLOR:blue;">AS </SPAN><SPAN style="COLOR:black;">maxpctbusy <BR> </SPAN><SPAN style="COLOR:blue;">FROM </SPAN><SPAN style="COLOR:gray;">(</SPAN><SPAN style="COLOR:blue;">SELECT </SPAN><SPAN style="COLOR:black;">timecollected</SPAN><SPAN style="COLOR:gray;">, <BR> </SPAN><SPAN style="COLOR:black;">drive</SPAN><SPAN style="COLOR:gray;">, <BR> </SPAN><SPAN style="COLOR:black;">[rank] </SPAN><SPAN style="COLOR:blue;">= </SPAN><SPAN style="COLOR:black;">Row_number</SPAN><SPAN style="COLOR:gray;">() <BR> </SPAN><SPAN style="COLOR:blue;">OVER</SPAN><SPAN style="COLOR:gray;">(</SPAN><SPAN style="COLOR:black;">PARTITION </SPAN><SPAN style="COLOR:blue;">BY </SPAN><SPAN style="COLOR:black;">drive </SPAN><SPAN style="COLOR:blue;">ORDER BY </SPAN><SPAN style="COLOR:black;">pctbusy </SPAN><SPAN style="COLOR:blue;">DESC</SPAN><SPAN style="COLOR:gray;">), <BR> </SPAN><SPAN style="COLOR:black;">pctbusy <BR> </SPAN><SPAN style="COLOR:blue;">FROM </SPAN><SPAN style="COLOR:black;">dbo.reportingdiskperfprojection</SPAN><SPAN style="COLOR:gray;">) </SPAN><SPAN style="COLOR:blue;">AS </SPAN><SPAN style="COLOR:black;">rankedpctbusycounters <BR> </SPAN><SPAN style="COLOR:blue;">WHERE </SPAN><SPAN style="COLOR:black;">[rank] </SPAN><SPAN style="COLOR:gray;">BETWEEN </SPAN><SPAN style="COLOR:black;">5 </SPAN><SPAN style="COLOR:gray;">AND </SPAN><SPAN style="COLOR:black;">105 <BR> </SPAN><SPAN style="COLOR:blue;">GROUP BY </SPAN><SPAN style="COLOR:black;">drive</SPAN><SPAN style="COLOR:green;"><BR> </SPAN><SPAN style="COLOR:gray;">) </SPAN><SPAN style="COLOR:blue;">AS </SPAN><SPAN style="COLOR:black;">pctbusy <BR> </SPAN><SPAN style="COLOR:blue;">ON </SPAN><SPAN style="COLOR:black;">iopersec.drive </SPAN><SPAN style="COLOR:blue;">= </SPAN><SPAN style="COLOR:black;">pctbusy.drive <BR> </SPAN><SPAN style="COLOR:magenta;">LEFT </SPAN><SPAN style="COLOR:blue;">JOIN </SPAN><SPAN style="COLOR:gray;">(</SPAN><SPAN style="COLOR:blue;">SELECT </SPAN><SPAN style="COLOR:black;">rankedqueuelengthcounters.drive</SPAN><SPAN style="COLOR:gray;">, <BR> </SPAN><SPAN style="COLOR:magenta;">AVG</SPAN><SPAN style="COLOR:gray;">(</SPAN><SPAN style="COLOR:black;">queuelength</SPAN><SPAN style="COLOR:gray;">) </SPAN><SPAN style="COLOR:blue;">AS </SPAN><SPAN style="COLOR:black;">avgqueuelength</SPAN><SPAN style="COLOR:gray;">, <BR> </SPAN><SPAN style="COLOR:blue;">MAX</SPAN><SPAN style="COLOR:gray;">(</SPAN><SPAN style="COLOR:black;">queuelength</SPAN><SPAN style="COLOR:gray;">) </SPAN><SPAN style="COLOR:blue;">AS </SPAN><SPAN style="COLOR:black;">maxqueuelength <BR> </SPAN><SPAN style="COLOR:blue;">FROM </SPAN><SPAN style="COLOR:gray;">(</SPAN><SPAN style="COLOR:blue;">SELECT </SPAN><SPAN style="COLOR:black;">timecollected</SPAN><SPAN style="COLOR:gray;">, <BR> </SPAN><SPAN style="COLOR:black;">drive</SPAN><SPAN style="COLOR:gray;">, <BR> </SPAN><SPAN style="COLOR:black;">[rank] </SPAN><SPAN style="COLOR:blue;">= </SPAN><SPAN style="COLOR:black;">Row_number</SPAN><SPAN style="COLOR:gray;">() <BR> </SPAN><SPAN style="COLOR:blue;">OVER</SPAN><SPAN style="COLOR:gray;">(</SPAN><SPAN style="COLOR:black;">PARTITION </SPAN><SPAN style="COLOR:blue;">BY </SPAN><SPAN style="COLOR:black;">drive </SPAN><SPAN style="COLOR:blue;">ORDER BY </SPAN><SPAN style="COLOR:black;">queuelength </SPAN><SPAN style="COLOR:blue;">DESC</SPAN><SPAN style="COLOR:gray;">), <BR> </SPAN><SPAN style="COLOR:black;">queuelength <BR> </SPAN><SPAN style="COLOR:blue;">FROM </SPAN><SPAN style="COLOR:black;">dbo.reportingdiskperfprojection</SPAN><SPAN style="COLOR:gray;">) </SPAN><SPAN style="COLOR:blue;">AS </SPAN><SPAN style="COLOR:black;">rankedqueuelengthcounters <BR> </SPAN><SPAN style="COLOR:blue;">WHERE </SPAN><SPAN style="COLOR:black;">[rank] </SPAN><SPAN style="COLOR:gray;">BETWEEN </SPAN><SPAN style="COLOR:black;">5 </SPAN><SPAN style="COLOR:gray;">AND </SPAN><SPAN style="COLOR:black;">105 <BR> </SPAN><SPAN style="COLOR:blue;">GROUP BY </SPAN><SPAN style="COLOR:black;">drive</SPAN><SPAN style="COLOR:green;"><BR> </SPAN><SPAN style="COLOR:gray;">) </SPAN><SPAN style="COLOR:blue;">AS </SPAN><SPAN style="COLOR:black;">queuelength <BR> </SPAN><SPAN style="COLOR:blue;">ON </SPAN><SPAN style="COLOR:black;">iopersec.drive </SPAN><SPAN style="COLOR:blue;">= </SPAN><SPAN style="COLOR:black;">queuelength.drive <BR></SPAN></CODE></PRE>
<P>Voila! Real figures for moving a large collection of existing sprawl to a few, hopefully tidy clusters. Wish me luck!</P>SAN Disk Array Performance: Beware LUN Concatenationhttp://sqlblog.com/blogs/merrill_aldrich/archive/2009/07/26/san-disk-array-performance-beware-lun-concatenation.aspxMon, 27 Jul 2009 02:17:00 GMT21093a07-8b3d-42db-8cbf-3350fcbf5496:15503merrillaldrich<p>I'd like to pass along a couple of tips for those new to using SAN storage for SQL Server. SAN Storage is quite expensive, and doubly so if your storage doesn't deliver on the performance front. SAN disk arrays are not magic, and sadly they don't just automagically perform well, marketing to the contrary. These are some items I have found helpful in configuring them:</p>
<ul>
<li>Create and maintain a good relationship with your SAN admin. You need to know the details of how the storage is laid out for your SQL Server, and how to describe what you need in a way that she can understand. Don't treat it like a black box.</li>
<li>Find out about the disk offset / alignment issue. I won't cover that here, as it's well documented <a href="http://msdn.microsoft.com/en-us/library/dd758814.aspx">elsewhere</a>. Just be sure to use DISKPART and the ALIGN parameter on Windows 2003 or earlier.</li>
<li>When you spec anything that is demanding performance-wise, to your SAN administrator, don't just ask for a quantity of space -- include a performance threshold in IO's per second. The number of physical disks required in the array, just as with direct attached storage, will most likely be driven by your performance needs and not the capacity you ask for. DB's that need a lot of disks on DAS also need a lot of disks in a SAN attached array, for all the same reasons. (Here's a great recent post about performance-centric design from <a href="http://www.rodcolledge.com/rod_colledge/2009/05/dbas-behaving-badly-410-storage-configuration.html">Rod Colledge</a>.)</li>
</ul>
<p>With larger LUNs, it's common practice to build them by ganging together smaller groups of disks - for example, if you need 24 disks, they might be organized first into LUNs of 6 disks each, then grouped into the 24-disk set (sometimes called a "meta-LUN"). The details of how that is done are vital to performance.
</p>
<p>For the LUN spec, it's fairly straightforward if you have an existing system that you can use as a benchmark (even if it's underperforming). Do project the quantity of storage that you'll need, but don't stop there -- use Perfmon to record the physical disk counters describing IO per second disk performance with the system under load. If the existing system's disk performance isn't adequate, apply a multiplier to get to the requirements for your new LUN. Provide the SAN administrator the quantity of storage, the IO measure in Reads and Writes per Second and the bandwidth you need in MB per second. It often happens that the IO performance measures are the limiting factor, and that they are limited by physical disk hardware, implying that the SAN Admin will have to provide more disks than would be required for raw capacity.</p>
<p>Next, be aware of the issue with Concatenated LUNs vs. Striped Luns. To take the example of the 24 disk LUN above, there are two ways to gang together those groups of 6 disks: striping and concatenation. Concatenation will chain them together in sequence, such that the first six disks work in concert to provide the first 1/4 of the total storage, then when those are full, the second set provides the next 1/4 of the storage, and so on. That may be OK for another app, like file storage, but this is exactly the wrong effect for a database! If you imagine those flashing busy lights on the disk array, it means that as you start using the LUN, the first six disks will light up and the others will be essentially idle. The resulting IO performance will be about that of a 6 disk array, despite all the empty space provided.</p>
<p>Striping, on the other hand, means that the data will be "banded" across all the disks right from the start, and you'll get the corresponding performance as all the disks work together. This is essential, or else the large quantity of (expensive!) disks is basically wasted.</p>
<p>Worth noting: our EMC arrays' management software issues an ominous message when creating a striped meta-LUN like this, warning that it could take a long time, maybe more than a week, to stripe. The key here is that, as far as I can tell, that only applies if you are dealing with LUNs that already have data on them, and that need to be reorganized. Striping a large, empty volume of space doesn't really take that long.</p>
<p>Finally, test the IO performance of the LUN before putting it into production, and make sure it'll deliver what you require. This can be as simple as a <a href="http://sql-server-performance.com/Community/forums/t/2337.aspx">stress-testing script</a> or a utility like <a href="http://support.microsoft.com/kb/231619">SQLIOSim</a>, combined with Performance Monitor's physical disk counters.</p>