Search results matching tag 'sql'http://www2.sqlblog.com/search/SearchResults.aspx?o=DateDescending&tag=sql&orTags=0Search results matching tag 'sql'en-USCommunityServer 2.1 SP2 (Build: 61129.1)Finally, SSMS will talk to Azure SQL DWhttp://www2.sqlblog.com/blogs/rob_farley/archive/2016/07/12/finally-ssms-will-talk-to-azure-sql-dw.aspxTue, 12 Jul 2016 11:54:14 GMT21093a07-8b3d-42db-8cbf-3350fcbf5496:61592rob_farley<p>Don’t get me started on how I keep seeing people jump into Azure SQL DW without thinking about the parallel paradigm. SQL DW is to PDW, the way that Azure SQL DB is to SQL Server. If you were happy using SQL Server for your data warehouse, then SQL DB may be just fine. Certainly you should get your head around the MPP (Massively Parallel Processing) concepts before you try implementing something in SQL DW. Otherwise you’re simply not giving it a fair chance, and may find that MPP is a hindrance rather than a help. Mind you, if you have worked out that MPP is for you, then SQL DW is definitely a brilliant piece of software.</p> <p>One of the biggest frustrations that people find with SQL DW is that you need (or rather, needed) to use SSDT to connect to it. You couldn’t use SSMS. And let’s face it – while the ‘recommended’ approach may be to use SSDT for all database development, most people I come across tend to use SSMS. </p> <p>But now with the July 2016 update of SSMS, <strong>you can finally connect to SQL DW using SQL Server Management Studio</strong>. Hurrah!</p> <p>…except that it’s perhaps not quite that easy. There’s a few gotchas to be conscious of, plus a couple of things that caused me frustrations perhaps more than I’d’ve liked.</p> <p>First I want to point out that at the time of writing, SSMS is still not a supported tool against PDW. 
You’ve always been able to connect to it to write queries, so long as you can ignore some errors that pop up about NoCount not being supported, but Object Explorer simply doesn’t work, and without Object Explorer, the overall experience has felt somewhat pained.</p> <p>Now, when you provision SQL DW through the Azure portal, you get an interface in the portal that includes options for pausing, or changing the scale, as per this image:</p> <p><img title="image" style="border-left-width:0px;border-right-width:0px;background-image:none;border-bottom-width:0px;padding-top:0px;padding-left:0px;display:inline;padding-right:0px;border-top-width:0px;" border="0" alt="image" src="http://sqlblog.com/blogs/rob_farley/image_5844B315.png" width="569" height="300" /></p> <p>And you may notice that there’s an option to “Open in Visual something” there. Following this link gives you a button that will open SSDT, and connect it to SQL DW. And this works! I certainly had a lot more luck doing this than simply opening SSDT and putting in some connection details. Let me explain…</p> <p>In that image, notice the “Show database connection strings” link. That’s where you can see a variety of connection strings, and from there, you can extract the information you’ll need to make a connection in either SSDT or SSMS. You know, in case you don’t want to just hit the button to “Open in Visual something”.</p> <p><img title="image" style="border-left-width:0px;border-right-width:0px;background-image:none;border-bottom-width:0px;padding-top:0px;padding-left:0px;display:inline;padding-right:0px;border-top-width:0px;" border="0" alt="image" src="http://sqlblog.com/blogs/rob_farley/image_4E15A959.png" width="428" height="455" /></p> <p>When I first used these settings to connect using SSDT (rather than using the “Open in…” button), it didn’t really work for me. 
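</p> <p>For reference, the ADO.NET entry in that list follows the familiar Azure SQL pattern – something along these lines, where the server, database and login are placeholders rather than real values:</p> <blockquote> <p><font face="Consolas">Server=tcp:yourserver.database.windows.net,1433;Database=yourdatabase; <br />User ID=youruser@yourserver;Password={your_password}; <br />Encrypt=True;TrustServerCertificate=False;Connection Timeout=30;</font></p> </blockquote> <p>Note the Database property in particular – it turns out to matter more than you might expect.</p> <p>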
I found that when I used the “New Query” button, it would give me a “SQLQuery1.sql” window, rather than a “PDW SQLQuery1.dsql” window, and this wasn’t right. Furthermore, if I right-clicked a table and chose the “View Code” option, I would get an error. I also noticed that when I connected using the “Open in…” button, it would tell me I was connected to version 10.0.8408.8, but when I tried putting the details in myself, it would say version “12.0.2000”. I’ve since found out that this was my own doing, because I hadn’t specified the database to connect to. And this information turned out to be useful for using SSMS too.</p> <p>There is no “Open in SSMS” button in Azure. But you can connect using the standard Connect to Database Engine part of SSMS.</p> <p><img title="image" style="border-left-width:0px;border-right-width:0px;background-image:none;border-bottom-width:0px;padding-top:0px;padding-left:0px;display:inline;padding-right:0px;border-top-width:0px;" border="0" alt="image" src="http://sqlblog.com/blogs/rob_farley/image_13BCC429.png" width="676" height="466" /></p> <p>And it works! Previous versions would complain about NOCOUNT, and Object Explorer would have a bit of a fit. There’s none of that now – terrific.</p> <p>And you get to see everything in the Object Explorer too, complete with an icon for the MPP database. But the version says 12.0.2000.8 if you connect like this.</p> <p><img title="image" style="border-left-width:0px;border-right-width:0px;background-image:none;border-bottom-width:0px;padding-top:0px;padding-left:0px;margin:0px;display:inline;padding-right:0px;border-top-width:0px;" border="0" alt="image" src="http://sqlblog.com/blogs/rob_farley/image_7BBB7471.png" width="554" height="274" />&#160;</p> <p>To solve this, you need to use the “Options &gt;&gt;&gt;” button in that Connect to Server dialog, and specify the database. 
Then you’ll make the right connection, but you’ll lose the “Security” folder in Object Explorer.</p> <p><img title="image" style="border-left-width:0px;border-right-width:0px;background-image:none;border-bottom-width:0px;padding-top:0px;padding-left:0px;display:inline;padding-right:0px;border-top-width:0px;" border="0" alt="image" src="http://sqlblog.com/blogs/rob_farley/image_625601E8.png" width="515" height="571" /></p> <p><img title="clip_image001" style="border-top:0px;border-right:0px;background-image:none;border-bottom:0px;padding-top:0px;padding-left:0px;border-left:0px;display:inline;padding-right:0px;" border="0" alt="clip_image001" src="http://sqlblog.com/blogs/rob_farley/clip_image001_1C675F79.jpg" width="632" height="232" /></p> <p>Now, it’s not perfect yet.</p> <p>When I look at Table Properties, for example, I can see that my table is distributed on a Hash, but it doesn’t tell me which column it is. It also tells me that the server I’m connected to is my own machine, rather than the SQL Azure instance. </p> <p><img title="image" style="border-left-width:0px;border-right-width:0px;background-image:none;border-bottom-width:0px;padding-top:0px;padding-left:0px;margin:0px;display:inline;padding-right:0px;border-top-width:0px;" border="0" alt="image" src="http://sqlblog.com/blogs/rob_farley/image_36C9E341.png" width="680" height="616" /></p> <p>I can see what the distribution column is within the Object Explorer, because it’s displayed with a different icon, but still, I would’ve liked to have seen it in the Properties window as well. It’s not going to get confused by having a golden or silver key there, as it might in a non-parallel environment, because those things aren’t supported. 
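</p> <p>In the meantime, the PDW-style catalog views will tell you the distribution column directly. Something like this should list each hash-distributed table with its column – I’m assuming here that sys.pdw_column_distribution_properties behaves in SQL DW the way it does in PDW, with distribution_ordinal 1 marking the hash column:</p> <blockquote> <p><font face="Consolas">select t.name as table_name, c.name as distribution_column <br />from sys.tables t <br />join sys.pdw_column_distribution_properties d <br />&#160;&#160;&#160; on d.object_id = t.object_id <br />&#160;&#160;&#160; and d.distribution_ordinal = 1&#160; -- 1 marks the hash distribution column <br />join sys.columns c <br />&#160;&#160;&#160; on c.object_id = d.object_id <br />&#160;&#160;&#160; and c.column_id = d.column_id;</font></p> </blockquote> <p>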
If they do become supported, I hope they manage to come up with another way of highlighting the distributed column.</p> <p><img title="image" style="border-left-width:0px;border-right-width:0px;background-image:none;border-bottom-width:0px;padding-top:0px;padding-left:0px;display:inline;padding-right:0px;border-top-width:0px;" border="0" alt="image" src="http://sqlblog.com/blogs/rob_farley/image_05CCC345.png" width="234" height="202" /></p> <p>One rather large frustration is the very promising link on the database to “Open in Management Portal”,</p> <p><img title="image" style="border-left-width:0px;border-right-width:0px;background-image:none;border-bottom-width:0px;padding-top:0px;padding-left:0px;display:inline;padding-right:0px;border-top-width:0px;" border="0" alt="image" src="http://sqlblog.com/blogs/rob_farley/image_39974A47.png" width="360" height="325" /></p> <p>, which opens a browser within SSMS (not exactly my preferred browser, but it seems like a good use for that feature). 
I’m okay with this, but following the link to the Query Performance Insight page, I’m immediately disappointed:</p> <p><img title="image" style="border-left-width:0px;border-right-width:0px;background-image:none;border-bottom-width:0px;padding-top:0px;padding-left:0px;display:inline;padding-right:0px;border-top-width:0px;" border="0" alt="image" src="http://sqlblog.com/blogs/rob_farley/image_2AEF2F13.png" width="550" height="483" /></p> <p>I get that SSMS doesn’t host the most ideal browser for this kind of thing, and that I’m probably going to be running a separate browser anyway, but I would like this to be addressed in a future update.</p> <p>Probably my biggest frustration is that when I start a new query, I get this set of warnings:</p> <p><img title="image" style="border-left-width:0px;border-right-width:0px;background-image:none;border-bottom-width:0px;padding-top:0px;padding-left:0px;display:inline;padding-right:0px;border-top-width:0px;" border="0" alt="image" src="http://sqlblog.com/blogs/rob_farley/image_59CAE7E9.png" width="478" height="203" /></p> <p>…which suggests that it doesn’t really know about SQL DW. I can tell them to be suppressed, so that the dialog doesn’t re-appear, but I don’t like the feeling that the system is attempting those settings at all.</p> <p>It’s certainly a lot less painful than it was in the past though. I love the fact that I can use the Object Explorer window. 
I love that I can script objects, in a way that feels way more natural to me than in SSDT.</p> <p>This is SSDT:</p> <p><img title="image" style="border-left-width:0px;border-right-width:0px;background-image:none;border-bottom-width:0px;padding-top:0px;padding-left:0px;display:inline;padding-right:0px;border-top-width:0px;" border="0" alt="image" src="http://sqlblog.com/blogs/rob_farley/image_73C138BC.png" width="303" height="249" /></p> <p>This is SSMS:</p> <p><img title="image" style="border-left-width:0px;border-right-width:0px;background-image:none;border-bottom-width:0px;padding-top:0px;padding-left:0px;display:inline;padding-right:0px;border-top-width:0px;" border="0" alt="image" src="http://sqlblog.com/blogs/rob_farley/image_52D056C6.png" width="731" height="254" /></p> <p>, although oddly the SSMS script includes the USE statement at the top, which isn’t supported in SQLDW (I’m sure this won’t be the case for much longer).</p> <p><img title="image" style="border-left-width:0px;border-right-width:0px;background-image:none;border-bottom-width:0px;padding-top:0px;padding-left:0px;margin:0px;display:inline;padding-right:0px;border-top-width:0px;" border="0" alt="image" src="http://sqlblog.com/blogs/rob_farley/image_223F69BF.png" width="357" height="185" /></p> <p>Overall, I’m really pleased that the team has put things in place to make SSMS talk to SQL DW at all. I was beginning to think that SSMS wasn’t going to come to this particular party. 
This release, despite having some way to go just yet, suggests that I’ll soon be using SSMS more when I’m using SQL DW.</p> <p>And therefore, this topic is worthy of Chris Yates’ <a href="https://chrisyatessql.wordpress.com/2016/07/06/t-sql-tuesday-080/" target="_blank">T-SQL Tuesday blog party this month</a> – celebrating the new things that have come along in the SQL world recently.</p> <p><a href="https://chrisyatessql.wordpress.com/2016/07/06/t-sql-tuesday-080/" target="_blank"><img title="TSQL2sDay150x150" style="border-left-width:0px;border-right-width:0px;background-image:none;border-bottom-width:0px;padding-top:0px;padding-left:0px;display:inline;padding-right:0px;border-top-width:0px;" border="0" alt="TSQL2sDay150x150" src="http://sqlblog.com/blogs/rob_farley/TSQL2sDay150x150_5F632544.jpg" width="154" height="154" /></a></p> <p><a href="http://twitter.com/rob_farley" target="_blank">@rob_farley</a></p>Exploring custom visuals in Power BIhttp://www2.sqlblog.com/blogs/rob_farley/archive/2016/05/10/exploring-custom-visuals-in-power-bi.aspxTue, 10 May 2016 04:38:10 GMT21093a07-8b3d-42db-8cbf-3350fcbf5496:61206rob_farley<p>One of my favourite things about Power BI is its extensibility. For ages I would hear about how good some other self-service reporting platforms were, and how things in the Microsoft space were lacking. Power View, in particular, was frustratingly limited in what you could do with it, and I felt somewhat disappointed. 
It was good as a data exploration tool, but simply wasn’t a good reporting environment if you wanted something that was customisable.</p> <p>But in recent times, Power BI has really stepped up, with custom visualisations offering the ability to extend capabilities much further.</p> <p>I hadn’t explored much in the way of custom visuals in Power BI until a while back, even though I was very much aware of the <a href="http://community.powerbi.com/t5/Best-Visual-Contest/con-p/best_visual_contest/tab/entries" target="_blank">competition that was held in September</a>. It had been on my list to explore some of what was possible. And this month, the T-SQL Tuesday topic (hosted by <a href="http://wendyverse.blogspot.com/" target="_blank">Wendy Pastrick</a> – <a href="http://twitter.com/wendy_dance" target="_blank">@wendy_dance</a>) was to <a href="http://wendyverse.blogspot.com.au/2016/04/its-time-for-t-sql-tuesday-78-may-2016.html" target="_blank">learn something new</a> and to blog about it. So it seemed a good idea to learn how to make my own custom visualisation!</p> <p>Now, creativity isn’t exactly my thing. I find it really hard to write songs, for example. I know how to do it – but I quickly become self-critical and get stuck. Writing is easier, because it feels less ‘creative’, and appeals more to the teacher / preacher in me (and I know that takes creativity, especially if you’ve ever seen me present, but it’s different). So sitting down and coming up with a new way of visualising data wasn’t something I was going to do.</p> <p>But I found the <a href="http://community.powerbi.com/t5/Best-Visual-Contest/Synoptic-Panel-by-SQLBI/cns-p/7669" target="_blank">Synoptic Panel</a> visualisation, which is actually the one that won the competition, and learned some of how to use this. This one lets you make your own visualisations based on your own images. It’s really neat.</p> <p>There are two aspects to it. </p> <p>1. <strong>The .pbiviz file</strong>. 
This is what you add as your custom visualisation to the Power BI Desktop. You can get it from the <a href="https://app.powerbi.com/visuals" target="_blank">Power BI Visuals Gallery</a>, along with a bunch of other cool visuals. It’s the one that looks like a room with lots of colours.</p> <p><img title="image" style="border-top:0px;border-right:0px;background-image:none;border-bottom:0px;padding-top:0px;padding-left:0px;border-left:0px;margin:5px;display:inline;padding-right:0px;" border="0" alt="image" src="http://sqlblog.com/blogs/rob_farley/image_65F512B4.png" width="736" height="406" /></p> <p>2. <strong>Your image, with a bunch of extra mark-up. </strong>This is where the magic happens. </p> <p>You head over to <a title="http://synoptic.design/" href="http://synoptic.design/" target="_blank">http://synoptic.design/</a> where there’s a tool for creating it all. You drop in your image (I went with the LobsterPot logo, which was my own design, in a moment of creativity), and picking the “magic wand” icon, click on the various areas of your image.</p> <p><img title="image" style="border-top:0px;border-right:0px;background-image:none;border-bottom:0px;padding-top:0px;padding-left:0px;border-left:0px;margin:5px;display:inline;padding-right:0px;" border="0" alt="image" src="http://sqlblog.com/blogs/rob_farley/image_247F81BD.png" width="429" height="499" /></p> <p>Here I’ve made just two areas, by clicking on the two parts of the logo. I can name them if I want, but already I’m pretty much done. There’s a big button on the right that says “Export to Power BI”, which lets me download an SVG file. Interestingly, I already had an SVG image of the company logo, but I needed the right markup, so I needed the exported one.</p> <p>Now over in the Power BI Desktop tool, I wanted to see what this looked like. I started by importing the custom visualisation. 
I clicked on the ellipsis at the end of the “Visualizations” pane, and got the extra one added.</p> <p><img title="image" style="border-top:0px;border-right:0px;background-image:none;border-bottom:0px;padding-top:0px;padding-left:0px;border-left:0px;margin:5px;display:inline;padding-right:0px;" border="0" alt="image" src="http://sqlblog.com/blogs/rob_farley/image_65B01BC5.png" width="202" height="248" /></p> <p>Clicking this to add it to my report, I got a window that looked like this:</p> <p><img title="image" style="margin:5px;display:inline;" alt="image" src="http://sqlblog.com/blogs/rob_farley/image_3BCFA790.png" width="310" height="336" /></p> <p>And I expected to see some way of getting my LobsterPot claw to display. No such luck. I clicked all over, and saw nothing.</p> <p>But it turned out <strong>I just needed to add some data to it first</strong>. Once I’d done that, I got some options:</p> <p><img title="image" style="margin:5px;display:inline;" alt="image" src="http://sqlblog.com/blogs/rob_farley/image_5FF792D0.png" width="295" height="316" /></p> <p>Hitting “SELECT MAP”, I could find my SVG file and I got my claw!</p> <p>The data I had was just two values. I wanted to be able to colour each section a different colour. But the areas of my image were already red, so I coloured my areas in white, setting the saturation to be less for the higher colours (less white, therefore more red), and more for the lower colours (more white, so less red). 
I made a negative measure to help with this.</p> <p>But quite quickly, I had my claw, with a value of 90 showing quite red, and a value of 50 showing a fainter pink colour.</p> <p><img title="image" style="border-top:0px;border-right:0px;background-image:none;border-bottom:0px;padding-top:0px;padding-left:0px;border-left:0px;margin:5px;display:inline;padding-right:0px;" border="0" alt="image" src="http://sqlblog.com/blogs/rob_farley/image_7D6C748D.png" width="294" height="386" /></p> <p>In hindsight, I should’ve edited my image before I started, making the two areas white, and then I could’ve easily coloured them in red. But in my exploration, I was able to learn some of the capabilities of this useful control (tip: read <a title="https://powerbi.microsoft.com/en-us/blog/visual-awesomeness-unlocked-using-the-synoptic-panel/" href="https://powerbi.microsoft.com/en-us/blog/visual-awesomeness-unlocked-using-the-synoptic-panel/">https://powerbi.microsoft.com/en-us/blog/visual-awesomeness-unlocked-using-the-synoptic-panel/</a>), and I figure that the availability of this visualisation may mean that I never worry about creating my own from scratch.</p> <p>I wonder whether I will one day come up with a visualisation of my own. Perhaps, although for the time being I’ll leave it up to the experts. I can use this one for all kinds of things, I suspect.</p> <p><a href="http://twitter.com/rob_farley" target="_blank">@rob_farley</a></p>How to make text searching go fasterhttp://www2.sqlblog.com/blogs/rob_farley/archive/2016/03/07/how-to-make-text-searching-go-faster.aspxTue, 08 Mar 2016 03:39:01 GMT21093a07-8b3d-42db-8cbf-3350fcbf5496:60765rob_farley<p>...but first, let’s look at one oft-forgotten reason why finding a particular piece of text can be slow, if not impossible: collation. This will then provide a useful platform for making it go faster.</p> <p>I say ‘impossible’, but of course it’s never impossible to find something in a database (assuming it’s there). 
It might take longer, but you can always scan the column for it, starting on the first page and going until you’ve found it. Various things like Full Text Search can help make things easier, but all-too-frequently we see code that searches for SomeText%, or worse: %SomeText%. This is the thing I want to look at – finding non-indexed string patterns.</p> <p>First let’s remember that if we are hoping to use an index, we need to know what language we’re in. I have <a href="http://bit.ly/RFCollation" target="_blank">spoken before</a> about how I picked up a map in Sweden to find Västerås, but couldn’t find it listed in the map’s index. I didn’t realise that in Swedish, ‘ä’ and ‘å’ are not the same as ‘a’, and are found in a different part of the alphabet. When I searched using an English alphabet, I couldn’t find the entry. I might think that ‘Västerås’ and ‘Vasteras’ are the same, but a Swedish person would tell me otherwise. It’s like if I refer to a game as ‘football’, you would need to understand my personal collation setting to know what I was talking about. When Michael Palin sang (as a lumberjack) about wearing high-heels, suspenders and a bra, he wasn’t referring to anything that held his trousers up, despite what people using an American collation setting might think.</p> <p>But this is about making searches for text go faster. If we compare two strings with different collations we get an error, but let’s think about speed.</p> <p>Consider that I’m looking for rows in a table <font face="Consolas">WHERE CommentText LIKE '%Farl%'</font>. Right away, I’m sure you appreciate that no amount of regular indexing on CommentText would let me perform an ordinary b-tree index search to find that row. I could improve it by using other technologies that will allow the individual words in my text to be found, but I’m just looking for a particular piece of text. It’s not even a whole word.</p> <p>For my experiment, I’m using a table on SQL 2014 on my Surface Pro 2. 
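</p> <p>(That cross-collation error, by the way, is easy to see with nothing more than two literals – this deliberately gives each side a different explicit collation:)</p> <blockquote> <p><font face="Consolas">-- fails with a collation conflict error (Msg 468) <br />select 1 <br />where N'Farl' collate Latin1_General_CI_AS = N'Farl' collate Latin1_General_BIN;</font></p> </blockquote> <p>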
It’s a larger version of AdventureWorks2012’s Person.Address with 19 million rows. There is a column called AddressLine1, which has collation SQL_Latin1_General_CP1_CI_AS and has type nvarchar(60). You can create it using code like this:</p> <blockquote> <p><font face="Consolas">CREATE TABLE [Person].[Address_Big]( <br />&#160;&#160;&#160; [BigAddressID] [int] IDENTITY(1,1) NOT NULL, <br />&#160;&#160;&#160; [AddressID] [int] NOT NULL, <br />&#160;&#160;&#160; [AddressLine1] [nvarchar](60) NOT NULL, <br />&#160;&#160;&#160; [AddressLine2] [nvarchar](60) NULL, <br />&#160;&#160;&#160; [City] [nvarchar](30) NOT NULL, <br />&#160;&#160;&#160; [StateProvinceID] [int] NOT NULL, <br />&#160;&#160;&#160; [PostalCode] [nvarchar](15) NOT NULL, <br />&#160;&#160;&#160; [SpatialLocation] [geography] NULL, <br />&#160;&#160;&#160; [rowguid] [uniqueidentifier] NOT NULL, <br />&#160;&#160;&#160; [ModifiedDate] [datetime] NOT NULL <br />); <br />go</font></p> <p><font face="Consolas">insert Person.Address_Big <br />select * from Person.Address; <br />go 100</font></p> </blockquote> <p>I ran this query quite a few times, and it took about 40 seconds to tell me there were no rows returned.</p> <blockquote> <p><font face="Consolas">select * <br />from Person.Address_Big <br />where AddressLine1 like N'%Farl%' <br />option (maxdop 1); <br /></font></p> </blockquote> <p>Obviously no one lives on 203 Farley Avenue, or 1 Farlabulous Drive. But nor do they live at 711 Gofarles Street. You see, despite the fact that I had specified ‘Farl’ with a capital F and lower-case ‘arl’, it didn’t care about that at all. My collation setting told it explicitly to ignore case. That’s what the CI was for in SQL_Latin1_General_CP1_CI_AS. 
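</p> <p>You can see this with a literal – no table required, as long as the database default is that same case-insensitive collation:</p> <blockquote> <p><font face="Consolas">-- 'match', because under CI 'Farl' and 'farl' compare equal <br />select case when N'711 Gofarles Street' like N'%Farl%' then 'match' else 'no match' end;</font></p> </blockquote> <p>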
In fact, if we query <font face="Consolas">select * from fn_helpcollations() where name = 'SQL_Latin1_General_CP1_CI_AS';</font> we see it says “Latin1-General, case-insensitive, accent-sensitive, kanatype-insensitive, width-insensitive for Unicode Data, SQL Server Sort Order 52 on Code Page 1252 for non-Unicode Data”. So it’s not only case-insensitive, it’s also kanatype-insensitive and width-insensitive too. </p> <p>Clearly there is a lot more work for it to do, when scanning large amounts of text looking for a group of consecutive characters that could match inexactly.</p> <p>Now, without changing my table at all, I changed my query like so, telling it to use a binary collation for the search. To search exactly rather than inexactly.</p> <blockquote> <p><font face="Consolas">select * <br />from Person.Address_Big <br />where AddressLine1 like N'%Farl%' collate Latin1_General_BIN <br />option (maxdop 1);</font></p> </blockquote> <p>I could’ve used the SQL collation SQL_Latin1_General_CP437_BIN, but I find it easier to remember the Windows collation, and in Australia, the default settings for SQL are to use Windows collations. They’re a little better than the SQL collations <em>[<a href="http://store-xkcd-com.myshopify.com/collections/everything/products/citation-needed-sticker-pack" target="_blank">citation needed</a>]</em>.</p> <p>But anyway – this query returned in just 7 seconds. I re-ran the original one – 40 seconds. I re-ran this one – 7 seconds. It really was significantly faster. The plan is the same. There is no sneakiness going on. <strong>The search for the binary text was simply faster</strong>.</p> <p>This makes sense. 
If I’m looking for a particular string, it’s going to be quicker if I can just look for the exact bits, and not have to consider what the text might be in a different case, or if width needs to play a part, and so on.</p> <p>Now you might think “Great – I’m going to add that to all my string searches”, but you should understand that there is potential for the results to be different. If there were someone in my table who lived in FARLEY BOULEVARD (in all caps, in the way that French people often write their surnames, for example), then that would have been found in my case-insensitive-collation search, but not in my binary-collation search for the lower-case letters ‘arl’. It’s useful if the data in your system is only stored in capitals, in which case (ha!) you could actually change the collation of your column, but it’s definitely worth considering the benefits of asking for a collation-specific search.</p> <p>And what about grouping, you ask? Ok, maybe I didn’t hear you ask that, but let’s pretend you did.</p> <p>If there’s an index on the column you’re grouping, then changing the collation is going to hurt a bit. Grouping could take advantage of a Stream Aggregate under our indexed collation, but changing the collation throws away the index order (ORDER BY doesn’t get handled well by changing the collation either), which means a Hash is required. But comparing two query plans that both use Hash Match (Aggregate), one on a case-insensitive collation and one on a binary collation, I found the latter was slightly faster. Not as drastic a change as searching, but still 10-30% better. 
One would run in about 12 seconds, and one in about 10.</p> <blockquote> <p><font face="Consolas">select City, count(*) <br />from Person.Address_Big <br />group by City <br />option (maxdop 1);</font></p> <p><font face="Consolas">select City collate Latin1_General_BIN, count(*) <br />from Person.Address_Big <br />group by City collate Latin1_General_BIN <br />option (maxdop 1);</font> <br /></p> </blockquote> <p>Considering what’s going on with a hash function and non-exact strings is actually pretty interesting. HASH(Value) must produce the same value for any two values that are considered equal – such as ‘FARLEY’ and ‘Farley’ in my CI collation. For this to happen, it obviously can’t hash the actual values, it must have to convert the values into a common form that will hash the same way regardless of case, kana, and width. But this is work that is hidden from the query plan. We can see the impact of it through the query speed, but not anywhere in the plan. This will become yet another thing for me to investigate – but not this week before T-SQL Tuesday comes around and I need to publish this post. 
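</p> <p>The equality that any such hash has to respect is easy to demonstrate in a CI database – both of these literals must end up with the same hash value:</p> <blockquote> <p><font face="Consolas">-- 'considered equal' under a case-insensitive collation <br />select case when N'FARLEY' = N'Farley' then 'considered equal' else 'different' end;</font></p> </blockquote> <p>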
<a href="https://twitter.com/SQLBob/status/706141720728215553" target="_blank">New father</a> <a href="http://www.bobpusateri.com/" target="_blank">Bob Pusateri</a> (<a href="http://twitter.com/sqlbob" target="_blank">@sqlbob</a>) is hosting this month, about <a href="http://www.bobpusateri.com/archive/2016/02/invitation-to-t-sql-tuesday-76-text-searchingprocessing/" target="_blank">text searching</a>, in case you hadn’t guessed.</p> <p><a href="http://www.bobpusateri.com/archive/2016/02/invitation-to-t-sql-tuesday-76-text-searchingprocessing/" target="_blank"><img title="TSQL2sDay150x150" style="border-top:0px;border-right:0px;background-image:none;border-bottom:0px;padding-top:0px;padding-left:0px;border-left:0px;margin:5px;display:inline;padding-right:0px;" border="0" alt="TSQL2sDay150x150" src="http://sqlblog.com/blogs/rob_farley/TSQL2sDay150x150_3B2EC9F9.jpg" width="154" height="154" /></a></p> <p><a href="http://twitter.com/rob_farley" target="_blank">@rob_farley</a></p>A Fitbit report in Power BI for T-SQL Tuesdayhttp://www2.sqlblog.com/blogs/rob_farley/archive/2016/02/15/a-fitbit-report-in-power-bi-for-t-sql-tuesday.aspxTue, 16 Feb 2016 00:20:13 GMT21093a07-8b3d-42db-8cbf-3350fcbf5496:60653rob_farley<p>Finding things to publish online for public consumption is something I’m often a bit reluctant to do. Most of my work is for customers, and there’s no way I’m going to share some of their data unless they’ve explicitly allowed it.</p> <p>So when <a href="http://sqlchicken.com" target="_blank">Jorge Segarra</a> (<a href="https://twitter.com/sqlchicken" target="_blank">@sqlchicken</a>) posted a challenge <a href="http://www.sqlchicken.com/2016/02/t-sql-tuesday-75-invitation-jump-into-power-bi/" target="_blank">to publish a Power BI report on the web for T-SQL Tuesday</a>, I had to give some thought about what kind of data to show. 
Luckily, <a href="https://about.me/scottstauffer" target="_blank">Scott Stauffer</a> (<a href="https://twitter.com/sqlsocialite" target="_blank">@sqlsocialite</a>) has been Fitbit-challenging me over recent weeks, and in particular, in some that don’t include some of the big-steppers like <a href="http://stevestedman.com" target="_blank">Steve Stedman</a> (<a href="http://twitter.com/sqlemt" target="_blank">@sqlemt</a>), who should probably change his name to Stepman, considering he has a treadmill desk and does over 100k each week. Anyway – with a group of people who do the same order of magnitude of steps as me, I stretched myself to do better than I had been doing, and figured this could make useful data.</p> <p><a href="http://www.sqlchicken.com/2016/02/t-sql-tuesday-75-invitation-jump-into-power-bi/" target="_blank"><img title="TSQL2sDay150x150" style="border-left-width:0px;border-right-width:0px;background-image:none;border-bottom-width:0px;padding-top:0px;padding-left:0px;margin:5px;display:inline;padding-right:0px;border-top-width:0px;" border="0" alt="TSQL2sDay150x150" src="http://sqlblog.com/blogs/rob_farley/TSQL2sDay150x150_1DABD076.jpg" width="154" height="154" /></a></p> <p>I started with an export of the last 31 days from Fitbit. That’s as much data as you can pull down from them, and although I could go to the effort of getting extra exports and combining them, I didn’t for this. After all, I’d rather be out getting more steps done than analysing them.</p> <p>I had a bit of cleaning to do first, because the data had an annoying first line. In fact, I found it easier to pull the data in as text, remove the top line, then split the data up by the delimiter. 
I could then mark the various columns as numbers, which made life a lot easier.</p> <p><img title="image" style="border-left-width:0px;border-right-width:0px;background-image:none;border-bottom-width:0px;padding-top:0px;padding-left:0px;margin:5px;display:inline;padding-right:0px;border-top-width:0px;" border="0" alt="image" src="http://sqlblog.com/blogs/rob_farley/image_4B2F80EA.png" width="1108" height="757" /></p> <p>After all this was done, I was ready to throw some stuff onto a report (ok, I also added some measures to show the steps as a percentage of the total steps – they have to be measures to handle the division properly). It was easy to get a chart on there, and a card to show some of the numbers. But I wanted to make it a bit more interesting... so I added an average, to show how my increased steppage made an impact on my average.</p> <p>I added a column:</p> <blockquote> <p><font face="Consolas">RollinAvgSteps = <br />AVERAGEX( <br />&#160;&#160;&#160; FILTER(fitbit_export_20160214, <br />&#160;&#160;&#160;&#160;&#160;&#160;&#160; EARLIER(fitbit_export_20160214[Date])&gt;=fitbit_export_20160214[Date]), <br />&#160;&#160;&#160; fitbit_export_20160214[Steps])</font></p> </blockquote> <p>...which takes the average of my steps to date. There are a bunch of ways to achieve this, but this is the way that I chose. And you can see that the average line is (happily) improving! Oh, and because I pulled down the extract on the 14th, there’s a dip at the end. 
My numbers were much healthier by the end of the day, and despite spending way too long NOT walking, I did about 7244 steps that day.</p> <p>You can see the result at <a href="http://bit.ly/RobFitbit" target="_blank">http://bit.ly/RobFitbit</a>, which looks like: </p> <p><a href="http://bit.ly/RobFitbit" target="_blank"><img title="image" style="border-top:0px;border-right:0px;background-image:none;border-bottom:0px;padding-top:0px;padding-left:0px;border-left:0px;margin:5px;display:inline;padding-right:0px;" border="0" alt="image" src="http://sqlblog.com/blogs/rob_farley/image_40B65540.png" width="776" height="444" /></a></p> <p>Oh, and having done all this, I discovered that fellow <a href="http://sqlsaturday.com/464/eventhome.aspx" target="_blank">SQL Saturday Melbourne</a> precon presenter <a href="http://www.radacad.com/be-fitbit-bi-developer-in-few-steps-first-step-get-data-from-csv" target="_blank">Reza Rad has a series on doing Power BI on Fitbit data</a> – I was pleased to see that he did a similar set of transforms to the data.</p> <p><a href="http://twitter.com/rob_farley" target="_blank">@rob_farley</a></p> <p><b>Join Effects With UPDATE</b> – <a href="http://www2.sqlblog.com/blogs/rob_farley/archive/2016/01/11/join-effects-with-update.aspx" target="_blank">http://www2.sqlblog.com/blogs/rob_farley/archive/2016/01/11/join-effects-with-update.aspx</a> – Tue, 12 Jan 2016, by rob_farley</p> <p>A lot of people don’t like UPDATE with a FROM clause. I’m going to explore what’s going on, both logically and within the query plan. And I’m doing this for <a mce_href="http://www.sqlsoldier.com/wp/sqlserver/tsqltuesday74invitationbethechange" target="_blank" href="http://www.sqlsoldier.com/wp/sqlserver/tsqltuesday74invitationbethechange">T-SQL Tuesday #74</a>, hosted by <a mce_href="http://www.sqlsoldier.com" target="_blank" href="http://www.sqlsoldier.com">Robert Davis</a> (<a mce_href="http://twitter.com/sqlsoldier" target="_blank" href="http://twitter.com/sqlsoldier">@sqlsoldier</a>). 
<a mce_href="http://www.sqlsoldier.com/wp/sqlserver/tsqltuesday74invitationbethechange" target="_blank" href="http://www.sqlsoldier.com/wp/sqlserver/tsqltuesday74invitationbethechange"><img title="TSQL2sDay150x150[3]" style="border-top:0px;border-right:0px;background-image:none;border-bottom:0px;float:right;padding-top:0px;padding-left:0px;border-left:0px;display:inline;padding-right:0px;" border="0" alt="TSQL2sDay150x150[3]" width="154" align="right" height="154" src="http://sqlblog.com/blogs/rob_farley/TSQL2sDay150x1503_4D190772.jpg"></a></p> <p>I’m going to use a fresh copy of AdventureWorks2012. I’m using SQL 2012 SP3, but the things I’m looking at should apply to most versions. I’m going to use Production.Product and Production.ProductSubcategory. The Product table has a ProductSubcategoryID column with a foreign key in place, although this column allows nulls, as not every product must be within a subcategory.</p> <p>Our standard UPDATE query doesn’t have a FROM clause. It’s just “UPDATE ... SET ... WHERE ...”, and if we need to hook into other tables, we use sub-queries. Let’s look at why this is significant.</p> <p>The WHERE clause filters rows. A sub-query in the WHERE clause still only filters rows, either by being a scalar expression used in one of the predicates, or being a single-column table expression used in an IN predicate, or a table expression used in an EXISTS clause. 
Any other tables used in sub-queries in the WHERE clause can only be used to help filter the table being updated – they can’t affect the SET clause at all, or cause a row to be updated multiple times.</p> <p>Some examples are like this:</p> <blockquote> <p><font face="Consolas">UPDATE Production.Product <br>SET DiscontinuedDate = SYSDATETIME() <br>WHERE ProductSubcategoryID = (SELECT s.ProductSubcategoryID FROM Production.ProductSubcategory s WHERE s.Name = 'Tights');</font></p> <p><font face="Consolas">UPDATE Production.Product <br>SET DiscontinuedDate = SYSDATETIME() <br>WHERE ProductSubcategoryID IN (SELECT s.ProductSubcategoryID FROM Production.ProductSubcategory s WHERE s.ProductCategoryID = 4);</font></p> <p><font face="Consolas">UPDATE Production.Product <br>SET DiscontinuedDate = SYSDATETIME() <br>WHERE EXISTS (SELECT * FROM Production.ProductSubcategory s&nbsp; <br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; WHERE s.ProductCategoryID = 4&nbsp; <br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; AND s.ProductSubcategoryID = Production.Product.ProductSubcategoryID);</font></p> </blockquote> <p>Using other tables in the SET clause generally means something that returns a scalar value, although this could become more complex using CASE. Still though, the logical impact on the overall query is notable. Something in the SET clause cannot be used to filter the values being updated, or to update a value multiple times. The SET clause is a list of “column = &lt;scalar expression&gt;” clauses, in which each column must come from the table (or table expression) being updated (which means I don’t think it should ever have a table alias), and cannot be listed multiple times. As the expression is scalar, it can’t produce multiple rows or columns. 
</p> <blockquote> <p><font face="Consolas">UPDATE Production.Product <br>SET DiscontinuedDate = (SELECT MAX(s.ModifiedDate) <br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; FROM Production.ProductSubcategory s&nbsp; <br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; WHERE s.ProductSubcategoryID = Production.Product.ProductSubcategoryID) <br>WHERE ProductSubcategoryID IN (SELECT s.ProductSubcategoryID FROM Production.ProductSubcategory s WHERE s.ProductCategoryID = 4);</font></p> </blockquote> <p>Now, all UPDATE statements could be written like this. As an update statement cannot change the number of rows in a table, the net effect on any table is a single row of change (I know triggers could be used to have a larger effect, but that’s a separate topic). I’ve met plenty of people over the years who will argue for never using a FROM clause in an UPDATE clause.</p> <p>You see, a FROM clause can have more of an effect than these sub-queries.</p> <p>Let’s think about what introducing extra tables via a FROM clause can do. For now, let’s start with what’s going on in a SELECT query, when you turn a FROM clause into a FROM ... JOIN clause.</p> <blockquote> <p>1. It can let you access data in the columns of those tables, to use in predicates or expressions.</p> <p>2. It can let you filter the data in the base table, by only allowing rows which match, such as when using an inner join or right outer join.</p> <p>3. It can cause rows in the base table to be returned multiple times, if multiple rows in the joined table match a single row in the base table.</p> <p>4. 
It can introduce NULL rows, if a full or right outer join is being done (or a left outer join with the base table second) and there are rows in the joined table that don’t match any rows in the base table.</p> </blockquote> <p>The particular combination of these will affect the type of join performed by your SELECT query – such as a Semi Join which does number 2, but none of the others. And if it does none, then the join is redundant and won’t appear in the query plan at all.</p> <p>So how does this work in an UPDATE statement?</p> <p>There are two possible ways that a FROM clause can work – one is to include the table being updated in the FROM clause, and the other is to have it NOT included in the FROM clause. If it doesn’t appear in the FROM clause, then predicates to define the matching criteria must be included in the WHERE clause to avoid updating every row. If it does appear in the FROM clause, then I would recommend using the table alias in the UPDATE clause rather than the name of the table itself.</p> <p>(Interestingly PDW does not support “UPDATE ... FROM ... JOIN”, although “UPDATE ... FROM ... 
WHERE” is fine.)</p> <p>So this is fine:</p> <blockquote> <p><font face="Consolas">UPDATE Production.Product <br>SET DiscontinuedDate = s.ModifiedDate <br>FROM Production.ProductSubcategory s <br>WHERE s.ProductSubcategoryID = Production.Product.ProductSubcategoryID;</font></p> </blockquote> <p>As is this:</p> <blockquote> <p><font face="Consolas">UPDATE p <br>SET DiscontinuedDate = s.ModifiedDate <br>FROM Production.Product p <br>JOIN Production.ProductSubcategory s <br>ON s.ProductSubcategoryID = p.ProductSubcategoryID;</font></p> </blockquote> <p>But please be careful about:</p> <blockquote> <p><font face="Consolas">UPDATE Production.Product <br>SET DiscontinuedDate = s.ModifiedDate <br>FROM Production.Product p <br>JOIN Production.ProductSubcategory s <br>ON s.ProductSubcategoryID = p.ProductSubcategoryID;</font></p> </blockquote> <p>It works, but I don’t consider it safe. Because you have the potential to update a table which isn’t mentioned in the FROM clause, you could find yourself inadvertently updating every row in Production.Product. There are safeguards to prevent it happening – this next example gives an error:</p> <blockquote> <p><font face="Consolas">UPDATE Production.Product <br>SET DiscontinuedDate = SYSDATETIME() <br>FROM Production.Product p <br>WHERE Production.Product.ProductSubcategoryID IS NOT NULL;</font> <br></p> </blockquote> <p>, although this one doesn’t, and updates every row in the table – after all, we have a CROSS JOIN going on, because I’ve listed the wrong table.</p> <blockquote> <p><font face="Consolas">UPDATE Production.Product <br>SET DiscontinuedDate = SYSDATETIME() <br>FROM Production.ProductSubcategory p <br>WHERE p.ProductSubcategoryID IS NOT NULL;</font></p> </blockquote> <p>If I’m writing queries, it’s generally fine. But if there’s a system which produces dynamic SQL, I start to worry. 
I’d rather update the alias, and be completely clear about what’s going on.</p> <p>So let’s go with the idea of using the table alias in the UPDATE clause when using the FROM clause, and choosing to always include the table being updated in the FROM clause. Unless we’re using PDW, of course.</p> <p>But the impact of those join effects... let’s look at them.</p> <p>Earlier, we saw this query. An inner join between Product and ProductSubcategory.</p> <blockquote> <p><font face="Consolas">UPDATE p <br>SET DiscontinuedDate = s.ModifiedDate <br>FROM Production.Product p <br>JOIN Production.ProductSubcategory s <br>ON s.ProductSubcategoryID = p.ProductSubcategoryID;</font></p> </blockquote> <p>Now, because s.ProductSubcategoryID is known to be unique (it’s the primary key on s), there is no way that this can cause ‘multiple updates’ to Product. Things are okay here, but filtering could certainly apply. A join is done to get the values from ProductSubcategory, and the rows are fed into the Clustered Index Update operator.</p> <p><a mce_href="http://sqlblog.com/blogs/rob_farley/image12_499D0739.png" href="http://sqlblog.com/blogs/rob_farley/image12_499D0739.png"><img title="image[12]" style="border-top:0px;border-right:0px;background-image:none;border-bottom:0px;padding-top:0px;padding-left:0px;border-left:0px;margin:0px;display:inline;padding-right:0px;" border="0" alt="image[12]" width="644" height="169" src="http://sqlblog.com/blogs/rob_farley/image12_thumb_2AE8ADFF.png"></a></p> <p>Filters are okay here. UPDATE is happy with filters, whether they’re implemented using the WHERE clause or via an ON clause.</p> <p>But what if the unique index weren’t there? 
Then we might see duplicate rows – the next join effect.</p> <blockquote> <p><font face="Consolas">ALTER TABLE [Production].[Product] DROP CONSTRAINT [FK_Product_ProductSubcategory_ProductSubcategoryID]; <br>ALTER TABLE [Production].[ProductSubcategory] DROP CONSTRAINT [PK_ProductSubcategory_ProductSubcategoryID];</font></p> </blockquote> <p>Now what does the plan look like – we should see a Table Scan instead of a Clustered Index Scan because we just dropped the PK, but what other differences?</p> <p><a mce_href="http://sqlblog.com/blogs/rob_farley/image16_39B57488.png" href="http://sqlblog.com/blogs/rob_farley/image16_39B57488.png"><img title="image[16]" style="border-top:0px;border-right:0px;background-image:none;border-bottom:0px;padding-top:0px;padding-left:0px;border-left:0px;margin:0px;display:inline;padding-right:0px;" border="0" alt="image[16]" width="720" height="169" src="http://sqlblog.com/blogs/rob_farley/image16_thumb_18C72343.png"></a></p> <p>It looks very similar, but now throws a Distinct Sort in there. You see, an Update isn’t going to do multiple updates. It won’t allow it. So it does a Distinct Sort on the PK of the Product table, and uses whichever value it cares to for the update.</p> <p>Another option it could’ve used would’ve been to use an Aggregate operator (because GROUP BY and DISTINCT are essentially the same thing), in which case it would’ve needed to apply an aggregate function to s.ModifiedDate while grouping by the Product PK. Which aggregate? The ANY() aggregate, of course – because it doesn’t care which value to use, it just has to be a valid one. I can get this plan by using an OPTION (FAST 1) query hint, because that will avoid doing the Sort, as a Sort is blocking. It also turns the Hash Match into a Nested Loop, because it really wants to get that first row through as quickly as possible. 
It’s a slower query, but lets us see the ANY() aggregate.</p> <p><a mce_href="http://sqlblog.com/blogs/rob_farley/image25_3831061C.png" href="http://sqlblog.com/blogs/rob_farley/image25_3831061C.png"><img title="image[25]" style="border-top:0px;border-right:0px;background-image:none;border-bottom:0px;padding-top:0px;padding-left:0px;border-left:0px;margin:0px;display:inline;padding-right:0px;" border="0" alt="image[25]" width="720" height="382" src="http://sqlblog.com/blogs/rob_farley/image25_thumb_59C337E5.png"></a></p> <p>So we can see that if multiple rows are going to be returned by the FROM clause, this will get shrunk down to a single one. This is how that third ‘join effect’ is handled.</p> <p>Be really careful about this. It’s a bad thing, and the reason why purists don’t like to see a FROM clause in an UPDATE statement.</p> <p>The next (and final – yay!) join effect is to have NULLs introduced.</p> <p>Let’s start by putting those constraints back in:</p> <blockquote> <p><font face="Consolas">ALTER TABLE [Production].[ProductSubcategory] ADD&nbsp; CONSTRAINT [PK_ProductSubcategory_ProductSubcategoryID] PRIMARY KEY CLUSTERED <br>([ProductSubcategoryID] ASC) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]; <br>ALTER TABLE [Production].[Product]&nbsp; WITH CHECK ADD&nbsp; CONSTRAINT [FK_Product_ProductSubcategory_ProductSubcategoryID] FOREIGN KEY([ProductSubcategoryID]) <br>REFERENCES [Production].[ProductSubcategory] ([ProductSubcategoryID]); <br>ALTER TABLE [Production].[Product] CHECK CONSTRAINT [FK_Product_ProductSubcategory_ProductSubcategoryID];</font></p> </blockquote> <p>...and let’s put a RIGHT JOIN in there (for purely academic reasons – I know you’d never do this in real life, although you might put a LEFT JOIN in with the base table second). 
This means that our FROM clause will return an extra row for each ProductSubcategory that has no Products. There aren’t any of them in AdventureWorks2012, but the Query Optimizer doesn’t know that.</p> <p>Before I go any further, let’s quickly make something clear. With a right outer join, the result set of the join contains rows that don't exist in the base table. Obviously we can't update those – there's nothing in the base table for those rows. But we’re going to look at how the query plan handles this situation.</p> <blockquote> <p><font face="Consolas">UPDATE p <br>SET DiscontinuedDate = s.ModifiedDate <br>FROM Production.Product p <br>RIGHT JOIN Production.ProductSubcategory s <br>ON s.ProductSubcategoryID = p.ProductSubcategoryID;</font> <br></p> </blockquote> <p>There are two things of note here in the query plan (apart from the fact that it has put the base table second and used a left join):</p> <p><a mce_href="http://sqlblog.com/blogs/rob_farley/image29_4137FA50.png" href="http://sqlblog.com/blogs/rob_farley/image29_4137FA50.png"><img title="image[29]" style="border-top:0px;border-right:0px;background-image:none;border-bottom:0px;padding-top:0px;padding-left:0px;border-left:0px;margin:0px;display:inline;padding-right:0px;" border="0" alt="image[29]" width="720" height="154" src="http://sqlblog.com/blogs/rob_farley/image29_thumb_7ADFB59C.png"></a></p> <p>For a start, we still see a Distinct Sort! I can assure you that the unique constraint is in there. If I remove the keyword ‘RIGHT’ I go back to my original version without any distinctifying operator. But the thing that’s new here is that Filter. 
Let’s look at what the Filter is doing:</p> <p><a mce_href="http://sqlblog.com/blogs/rob_farley/image36_12326C21.png" href="http://sqlblog.com/blogs/rob_farley/image36_12326C21.png"><img title="image[36]" style="border-top:0px;border-right:0px;background-image:none;border-bottom:0px;padding-top:0px;padding-left:0px;border-left:0px;margin:0px;display:inline;padding-right:0px;" border="0" alt="image[36]" width="431" height="484" src="http://sqlblog.com/blogs/rob_farley/image36_thumb_1FBFBB2D.png"></a></p> <p>It’s filtering on “[IsBaseRow1001] IS NOT NULL”. That value is coming from the Product table, and is simply checking to see whether the row coming out of the Join operator is a match or not. It’s testing to see if we actually have a row in the base table to update. It could’ve tested ProductID for NULL for this, like we would in a query, but I guess it’s quicker to test IsBaseRow for being NULL than to test ProductID. I don’t know much about IsBaseRow1001, but I can tell that it’s not more than a single byte. The Estimated Row Size on the Scan of Product was 15B in the original query, and is 16B in this query. But I’m just guessing here. Theoretically it’s not needed at all, of course, and for testing, could have been a single bit.</p> <p>Or the Query Optimizer could have turned the join into an inner join. After all, we’re not interested in updating a non-row. As much as it’s interesting to see IsBaseRow1001 coming through, I can’t help but think that turning that join operator into an inner join would’ve done the trick. 
But as we don’t see <a mce_href="http://sqlblog.com/blogs/rob_farley/archive/2011/10/04/joins-without-join.aspx" target="_blank" href="http://sqlblog.com/blogs/rob_farley/archive/2011/10/04/joins-without-join.aspx">LEFT JOIN + NULL being turned into an Anti Semi Join</a> either, I’m not too surprised that this translation isn’t happening either.</p> <p>Because there could be multiple Subcategories without Products, there is a possibility of non-unique ProductIDs – the NULLs – coming out of the Join operator. But these are the only ones that could be duplicates, because each Product has at most a single matching ProductSubcategoryID in s. Therefore, once the NULLs have been removed by the Filter, the QO should be able to know that the data coming out of the Filter is unique on ProductID, but it doesn’t use this information, and needs a distinctifying operator to be sure.</p> <p>The Distinct Sort is still on ProductID, but a FAST 1 hint turns it into a Hash Aggregate this time instead of a Stream Aggregate. The reason for this is that a Nested Loop over the Product table isn’t going to find the ProductSubcategories that don’t have Products (although it could if it understood the translation to Inner Join). Therefore, it still performs a Hash Aggregate, does the filter for IsBaseRow1001, and then does a Hash Match (Flow Distinct) on ProductID.</p> <p>It’s interesting to see that we have a Build Residual here on ProductID, despite ProductID being an integer. 
</p> <p><a mce_href="http://sqlblog.com/blogs/rob_farley/image47_43A2AF7E.png" href="http://sqlblog.com/blogs/rob_farley/image47_43A2AF7E.png"><img title="image[47]" style="border-top:0px;border-right:0px;background-image:none;border-bottom:0px;padding-top:0px;padding-left:0px;border-left:0px;margin:0px;display:inline;padding-right:0px;" border="0" alt="image[47]" width="720" height="108" src="http://sqlblog.com/blogs/rob_farley/image47_thumb_4026AF45.png"></a></p> <p>You see, normally in a Hash Match on an integer we wouldn’t see a residual because the hash function produces an integer. It’s because ProductID could have been NULL. The nullability of the column coming through obviously wasn’t changed by the Filter (and let’s face it – it didn’t test ProductID for NULL, it tested IsBaseRow1001).</p> <p><i>Quick interlude for showing that the hash function produces a 32-bit value, and doesn’t need a residual check when hashing on a non-nullable integer (while a nullable integer needs more than 32 bits):</i></p> <p><i>Compare the plans of these two queries. The Hash Match operator in the first one doesn’t have a Probe Residual, because s.ProductCategoryID doesn’t allow NULLs. 
The Hash Match operator in the second does have a Probe Residual, because p.ProductSubcategoryID does allow NULLs, and a nullable integer can cause clashes in the hash table.</i></p> <blockquote> <p><i><font face="Consolas">SELECT * <br>FROM Production.ProductSubcategory s <br>JOIN Production.ProductCategory c ON c.ProductCategoryID = s.ProductCategoryID <br>OPTION (HASH JOIN);</font></i></p> <p><i><font face="Consolas">SELECT * <br>FROM Production.Product p <br>JOIN Production.ProductSubcategory s ON s.ProductSubcategoryID = p.ProductSubcategoryID <br>OPTION (HASH JOIN);</font> <br></i></p> </blockquote> <p><i>Also consider the Hash Match operator in the following query:</i></p> <blockquote> <p><i><font face="Consolas">CREATE TABLE dbo.rf_BigInts (id BIGINT NOT NULL PRIMARY KEY);</font></i></p> <p><i><font face="Consolas">SELECT * <br>FROM dbo.rf_BigInts t1 <br>JOIN dbo.rf_BigInts t2 ON t2.id = t1.id <br>OPTION (HASH JOIN);</font></i></p> <p><i><font face="Consolas">DROP TABLE dbo.rf_BigInts;</font> <br></i></p> </blockquote> <p><i>, where we see a Probe Residual on a non-nullable bigint (a lot more than 32 bits). 
This tells me that bigints can have clashes in the hash table, despite non-nullable integers not showing this.</i></p> <p>Oh yes, we were looking at the Build Residual.</p> <p>If we go back to an Inner Join with FAST 1, where we got a Stream Aggregate, and turn that into a Hash Match on the non-nullable ProductID, we can see that our Build Residual disappears.</p> <blockquote> <p><font face="Consolas">ALTER TABLE [Production].[Product] DROP CONSTRAINT [FK_Product_ProductSubcategory_ProductSubcategoryID]; <br>ALTER TABLE [Production].[ProductSubcategory] DROP CONSTRAINT [PK_ProductSubcategory_ProductSubcategoryID];</font></p> <p><font face="Consolas">UPDATE p <br>SET DiscontinuedDate = s.ModifiedDate <br>FROM Production.Product p <br>JOIN Production.ProductSubcategory s <br>ON s.ProductSubcategoryID = p.ProductSubcategoryID <br>OPTION (HASH JOIN, FAST 1);</font></p> </blockquote> <p><a mce_href="http://sqlblog.com/blogs/rob_farley/image40_5369F695.png" href="http://sqlblog.com/blogs/rob_farley/image40_5369F695.png"><img title="image[40]" style="border-top:0px;border-right:0px;background-image:none;border-bottom:0px;padding-top:0px;padding-left:0px;border-left:0px;display:inline;padding-right:0px;" border="0" alt="image[40]" width="720" height="152" src="http://sqlblog.com/blogs/rob_farley/image40_thumb_58DD889B.png"></a></p> <p>Let’s do a bit of a summary...</p> <p>If you’re doing an UPDATE, you can only update each row in that table one time, no matter what kind of impact your FROM clause might want to have on the base table. You may feel like it's going to update some rows multiple times, but that's not how it works.</p> <p>Each of the join effects is either applied (in the case of a filter) or mitigated (in the case of duplicates or NULLs), so that you can access the data in other tables without fear of having a bad effect on your UPDATE, but don’t do it! 
Because you can’t tell which value gets picked up by the ANY() aggregate (or which row gets kept by a distinctifying operator), you should avoid duplicates completely, so that your UPDATE finds a single value to update each row with.</p> <p>And I would encourage you to use table aliases in your UPDATE clause if you use a FROM clause with a JOIN – but if you don’t use a JOIN, then make sure you include the match in your WHERE clause.</p> <p><a href="http://twitter.com/rob_farley" target="_blank">@rob_farley</a></p> <p><b>A new superpower for SQL query tuners – Number of Rows Read</b> – <a href="http://www2.sqlblog.com/blogs/rob_farley/archive/2015/12/11/a-new-superpower-for-sql-query-tuners-number-of-rows-read.aspx" target="_blank">http://www2.sqlblog.com/blogs/rob_farley/archive/2015/12/11/a-new-superpower-for-sql-query-tuners-number-of-rows-read.aspx</a> – Sat, 12 Dec 2015, by rob_farley</p>
<p>There’s a technique that I’ve been using for some time now – and had been looking for, for even longer – which has just become available to everyone who can see a query plan (version permitting).</p>
<p>Some years back (ok, it was 2010), I started to present at conferences about SARGability and residual predicates. I had <a mce_href="https://sqlbits.com/Sessions/Event7/Understanding_SARGability_to_make_your_queries_run_faster" target="_blank" href="https://sqlbits.com/Sessions/Event7/Understanding_SARGability_to_make_your_queries_run_faster">a session at SQLBits VII</a> about it (one of the first times I’d presented with SQL MVPs from at least five different countries in the room), I presented at the 2011 PASS Summit on “The evils of residualiciousness”, and I even wrote a song with the line “<a mce_href="http://sqlblog.com/blogs/rob_farley/archive/2011/10/16/i-should-ve-looked-the-other-way.aspx" target="_blank" href="http://sqlblog.com/blogs/rob_farley/archive/2011/10/16/i-should-ve-looked-the-other-way.aspx">my predicate’s residual, my seek just runs too slow</a>”. I wrote blog posts about <a mce_href="http://sqlblog.com/blogs/rob_farley/archive/2011/05/19/covering-schmuvvering-when-a-covering-index-is-actually-rubbish.aspx" target="_blank" href="http://sqlblog.com/blogs/rob_farley/archive/2011/05/19/covering-schmuvvering-when-a-covering-index-is-actually-rubbish.aspx">bad covering indexes</a>, and <a mce_href="http://sqlblog.com/blogs/rob_farley/archive/2011/03/22/probe-residual-when-you-have-a-hash-match-a-hidden-cost-in-execution-plans.aspx" target="_blank" href="http://sqlblog.com/blogs/rob_farley/archive/2011/03/22/probe-residual-when-you-have-a-hash-match-a-hidden-cost-in-execution-plans.aspx">TSA security probes</a> (or something like that).</p>
<p>The point of all this is that an index seek, merge join, or hash match is not necessarily effective for quickly locating the rows that you care about. It all comes down to the predicates that are involved, and whether they are SARGable for the index you’re trying to use.</p>
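<p>As a quick illustration (my own example, not from the post – it assumes an index keyed on OrderDate against AdventureWorks’ Sales.SalesOrderHeader), the first predicate below is SARGable and can drive a seek, while the second wraps the column in a function, so it can only be checked as a residual against each row:</p> <blockquote> <p><font face="Consolas">-- SARGable: a seek can navigate straight to the range <br>SELECT SalesOrderID FROM Sales.SalesOrderHeader <br>WHERE OrderDate &gt;= '20080101' AND OrderDate &lt; '20080201'; <br>-- Not SARGable: YEAR() hides the column from the seek <br>SELECT SalesOrderID FROM Sales.SalesOrderHeader <br>WHERE YEAR(OrderDate) = 2008;</font></p> </blockquote>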
<p>Over at <a title="http://sqlblog.com/blogs/rob_farley/archive/2011/05/19/covering-schmuvvering-when-a-covering-index-is-actually-rubbish.aspx" mce_href="http://sqlblog.com/blogs/rob_farley/archive/2011/05/19/covering-schmuvvering-when-a-covering-index-is-actually-rubbish.aspx" href="http://sqlblog.com/blogs/rob_farley/archive/2011/05/19/covering-schmuvvering-when-a-covering-index-is-actually-rubbish.aspx">my “Covering Schmuvvering” post</a>, I describe an index and query on AdventureWorks like this:</p>
<blockquote>
<p><font face="Consolas">CREATE INDEX rf_ix_Covering ON Production.Product(DaysToManufacture) <br>INCLUDE (Name, ProductNumber, Size, ReorderPoint, Color);</font></p>
</blockquote>
<blockquote>
<p><font face="Consolas">SELECT Name, ProductNumber <br>FROM Production.Product <br>WHERE DaysToManufacture &lt; 4 <br>AND ReorderPoint &lt; 100 <br>AND Color = 'Red';</font></p>
</blockquote>
<p>The plan gives me an Index Seek that returns a single row. Woohoo!</p>
<p><img title="image" border="0" alt="image" width="342" height="114" src="http://sqlblog.com/blogs/rob_farley/image_3B3A3400.png"></p>
<p>...but I explain how this is actually really bad, because all the work is being done in the Predicate, not the Seek Predicate.</p>
<p><img title="image" border="0" alt="image" width="340" height="583" src="http://sqlblog.com/blogs/rob_farley/image_59240828.png"></p>
<p>You see, the “Predicate” is the Residual Predicate (my term – you’ll just see it as “Predicate” here, because it’s the same kind of thing that we see in a Scan, which doesn’t have a concept of the Seek Predicate and the Residual Predicate).</p>
<p>The Residual Predicate is the leftover one, that needs to be checked after the Seek Predicate. You might not have one, if every predicate in your query is handled already by the time the Seek is done. But if you do have one, there is extra checking to do.</p>
<p>So just like how a Scan will start on the first page of the index and keep running until it’s either reached the end or it doesn’t need to keep looking (because the operator on its left has stopped asking for more rows), a Seek will find the rows that satisfy the Seek Predicate, and then have to check each one to see if it satisfies any leftover predicates.</p>
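<p>Going back to the covering index example from earlier – one way to shrink that leftover checking (a sketch only; the right key order depends on your wider workload) is to promote the equality predicate into the key, ahead of the range predicate, so that only the ReorderPoint test remains residual:</p> <blockquote> <p><font face="Consolas">-- Hypothetical alternative index: Color (equality) and DaysToManufacture (range) <br>-- can both be Seek Predicates; only ReorderPoint &lt; 100 is left to check per row <br>CREATE INDEX rf_ix_CoveringTighter ON Production.Product(Color, DaysToManufacture) <br>INCLUDE (Name, ProductNumber, ReorderPoint);</font></p> </blockquote>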
<p>The same applies for Merge Joins, which find things that match using the sort order of each data stream, but then need to apply any residual predicates. Hash Matches can be even worse, as the Probe Residual will include the Probe Keys if the hash function could cause two different values to be assigned to the same bucket. <a mce_href="http://sqlblog.com/blogs/rob_farley/archive/2011/03/22/probe-residual-when-you-have-a-hash-match-a-hidden-cost-in-execution-plans.aspx" target="_blank" href="http://sqlblog.com/blogs/rob_farley/archive/2011/03/22/probe-residual-when-you-have-a-hash-match-a-hidden-cost-in-execution-plans.aspx">Read more on probe residuals here</a>.</p>
<p>Back to Index Seeks though...</p>
<p>When the Query Optimizer creates a plan that has a residual predicate in a Seek or Scan, one of the earlier iterations will have seen this predicate applied in a Filter operator to the left of the Index operation. But before the plan is created, the residual predicate is pushed down into the index operation.</p>
<p>If we apply trace flag 9130 (undocumented, so be wary), we can see the plan without the pushdown.</p>
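<p>For a single query, rather than turning the flag on server-wide, the hint form should do the trick – bearing in mind that QUERYTRACEON needs sysadmin rights, and that 9130 remains undocumented, so keep it well away from production:</p> <blockquote> <p><font face="Consolas">SELECT Name, ProductNumber <br>FROM Production.Product <br>WHERE DaysToManufacture &lt; 4 <br>AND ReorderPoint &lt; 100 <br>AND Color = 'Red' <br>OPTION (QUERYTRACEON 9130);</font></p> </blockquote>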
<p><img title="image" style="border-top:0px;border-right:0px;background-image:none;border-bottom:0px;padding-top:0px;padding-left:0px;border-left:0px;margin:5px;display:inline;padding-right:0px;" border="0" alt="image" width="437" height="325" src="http://sqlblog.com/blogs/rob_farley/image_1361259B.png"></p>
<p>And that thick arrow there shows us 407 rows being produced by the Seek operation, despite the single row being produced in our original plan. <b>That original plan did not contain that information.</b></p>
<p><b>But with Service Pack 3 of SQL Server 2012, all that changes.</b></p>
<p>From now on, Index Seeks and Index Scans have an additional property called “<a mce_href="https://support.microsoft.com/en-us/kb/3107397" target="_blank" href="https://support.microsoft.com/en-us/kb/3107397">Actual Rows Read</a>”, which shows through in SSMS as “Number of Rows Read”.</p>
<p><img title="image" style="border-top:0px;border-right:0px;background-image:none;border-bottom:0px;padding-top:0px;padding-left:0px;border-left:0px;margin:5px;display:inline;padding-right:0px;" border="0" alt="image" width="590" height="299" src="http://sqlblog.com/blogs/rob_farley/image_4F0072B4.png"></p>
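<p>If you dig into the showplan XML itself, the new counter appears in the operator’s run-time information – something along these lines (attributes abbreviated by me; the values match the example above):</p> <blockquote> <p><font face="Consolas">&lt;RunTimeCountersPerThread Thread="0" ActualRows="1" ActualRowsRead="407" ... /&gt;</font></p> </blockquote>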
<p>Please note that you need to have a recent-enough version of SSMS to see this come through. If you are running SSMS 2014, you’ll need to wait for the next service pack. Or if you’re running SSMS 2012 or SSMS 2016, make sure you’ve applied the latest service pack or CTP.</p>
<p>This feature is amazing though! A significant part of the query tuning I do is to look for ineffective Seek operations. Many people look for Scans, but don’t consider that a poor Seek is just as bad. Now, we can easily see that this Seek took 407 rows to produce just 1, and a different indexing strategy could be recommended. It wouldn’t surprise me to see warnings start to come through in 3rd party products like <a mce_href="http://www.sqlsentry.com/products/plan-explorer/" target="_blank" href="http://www.sqlsentry.com/products/plan-explorer/">SQL Sentry’s Plan Explorer</a> soon as well, because I think that a Seek with 0.25% effectiveness is worth a warning. And without having to use the trace flag either!</p>
<p><i>(Edit: They do now! Check out <a target="_blank" href="http://blogs.sqlsentry.com/aaronbertrand/sql-sentry-v10-index-analysis">http://blogs.sqlsentry.com/aaronbertrand/sql-sentry-v10-index-analysis</a> )</i></p>
<p><b>It’s like you’ve just been given X-ray vision on the Index operators!</b></p>
<p>Other experts in the space understand the significance of the residual predicate, such as <a mce_href="http://www.littlekendra.com/" target="_blank" href="http://www.littlekendra.com/">Kendra Little</a> (<a mce_href="https://twitter.com/Kendra_Little/" target="_blank" href="https://twitter.com/Kendra_Little/">@Kendra_Little</a>), and now you can see this too.</p>
<p><a mce_href="https://twitter.com/Kendra_Little/status/235547168096796672" target="_blank" href="https://twitter.com/Kendra_Little/status/235547168096796672"><img title="image" style="border-top:0px;border-right:0px;background-image:none;border-bottom:0px;padding-top:0px;padding-left:0px;border-left:0px;margin:5px;display:inline;padding-right:0px;" border="0" alt="image" width="370" height="145" src="http://sqlblog.com/blogs/rob_farley/image_6792A0B5.png"></a></p>
<p>There are a few more things I would like to see to round this feature off. I’d like to see Estimated Rows Read (which I can see using 9130), and I’d like to see similar information for Merge Joins and Hash Matches. If you want to see these implemented, jump onto Connect and vote for these items: <a mce_href="https://connect.microsoft.com/SQLServer/feedback/details/2118542" target="_blank" href="https://connect.microsoft.com/SQLServer/feedback/details/2118542">Estimated Rows Read</a> and <a mce_href="https://connect.microsoft.com/SQLServer/feedback/details/2118547" target="_blank" href="https://connect.microsoft.com/SQLServer/feedback/details/2118547">Merge Join / Hash Match info</a> – and one to <a mce_href="https://connect.microsoft.com/SQLServer/feedback/details/2118561" target="_blank" href="https://connect.microsoft.com/SQLServer/feedback/details/2118561">fix up the naming convention</a>.</p>
<p>...and in the meantime, make sure your local copy of SSMS is up-to-date, and encourage people to upgrade their SQL 2012 boxes to SP3!</p>
<p><a mce_href="http://twitter.com/rob_farley" target="_blank" href="http://twitter.com/rob_farley">@rob_farley</a></p>Automate ETL Testing: Cross-Server Data Comparison with PowerShellhttp://www2.sqlblog.com/blogs/merrill_aldrich/archive/2015/11/22/automate-etl-testing-cross-server-data-comparison-with-powershell.aspxSun, 22 Nov 2015 16:08:00 GMT21093a07-8b3d-42db-8cbf-3350fcbf5496:60056merrillaldrich<p><span style="font-size:13.3333px;">In collaboration with a current client over the past year or so,</span>&nbsp;I've been working on the issue of Test Driven Development for ETL work, especially in the area of automated testing for data. He graciously agreed to allow me to rewrite some of the code we originally developed in Ruby as a set of PowerShell demo scripts and share it through user group and SQL Saturday presentations.</p><p>Attached are the demo scripts and PowerPoint presentation. The demo is based on a copy of AdventureWorks2012, which plays the role of an EDW project's transactional source system, and a second, empty database on a separate SQL Server instance, which plays the role of a new EDW under development.&nbsp;</p><p>There is a single SSIS package that pulls some trivial demo data from AdventureWorks and places it into a table with a different schema in the warehouse. 
There is also a SQL script, for demonstration purposes, that introduces some known errors into the warehouse table, in order to validate that we can find them using tests.</p><p>The idea is that we have untransformed source data on one system and post-ETL transformed data in our new warehouse, and we need a TDD-friendly, automated and repeatable way to compare the results of two queries - one from the source system and a second from the new warehouse, where each query is expected to reproduce exactly the same result set.</p><p>The two queries and the data comparison should work regardless of what the source and destination systems are - they could be different servers or even different RDBMS's. The comparison should run in a fairly automated and unattended fashion and output an "exception report" of all the differences in data between the two systems.</p><p>The PowerShell scripts, numbered in demo order, build in complexity from a single script that issues two queries, all the way to two different methods of automated comparison across servers - hash and merge join - that are generalized for any pairs of queries, and can be stored centrally in a source control system for automated testing.</p>Superhero powers for managing the enterprisehttp://www2.sqlblog.com/blogs/rob_farley/archive/2015/09/07/superhero-powers-for-managing-the-enterprise.aspxTue, 08 Sep 2015 00:16:56 GMT21093a07-8b3d-42db-8cbf-3350fcbf5496:59503rob_farley<p>This month’s T-SQL Tuesday is hosted by <a href="http://www.midnightdba.com/Jen/2015/09/time-for-t-sql-tuesday-70" target="_blank">Jen McCown</a> of <a href="http://twitter.com/midnightdba" target="_blank">@midnightdba</a> fame. She wants us to write about strategies for managing the enterprise, and as a database consultant, I find myself giving advice on this kind of thing to customers remarkably often.</p> <p>No, I’m not going to do stories about LobsterPot customers. We don’t do that. 
What happens between us and the customer stays between us and the customer. However, I can talk about the kinds of things that we look for when we talk to a customer.</p> <p>The thing that I want to look at in this post is that twitch that you get when something doesn’t feel right. The ‘spider-sense’ feeling that Peter Parker gets when there’s something that’s not quite right.<a href="http://www.midnightdba.com/Jen/2015/09/time-for-t-sql-tuesday-70" target="_blank"><img title="TSQL2sDay150x150" style="border-top:0px;border-right:0px;background-image:none;border-bottom:0px;float:right;padding-top:0px;padding-left:0px;border-left:0px;margin:5px;display:inline;padding-right:0px;" border="0" alt="TSQL2sDay150x150" src="http://sqlblog.com/blogs/rob_farley/TSQL2sDay150x150_30D1817F.jpg" width="154" align="right" height="154" /></a></p> <p>Experience is what helps people know what ‘normal’ looks like. I’ve heard stories that people who are trained to spot counterfeit bank notes don’t spend time with fake notes, they spend time with the real ones (on loan, presumably). They learn what ‘normal’ looks like, and get really familiar with it. That way, when something isn’t quite right, they can spot it and raise an alarm.</p> <p>DBAs taking care of an environment, whether small or large, will have learned what ‘normal’ looks like. They should have benchmarks that tell them how various metrics perform, and for those things that they don’t have formal metrics on, they should still be familiar enough to recognise when something isn’t right, even if they can’t tell exactly what. Their ‘spider-sense’ should tingle.</p> <p>If you don’t have benchmarks, then get them. Every time you find something that doesn’t seem right, you will wish you had a benchmark on that thing to be able to quantify your suspicion. 
Feeling like something isn’t right is great, but it won’t be long until someone asks “How do you know this thing needs attention?” and “I just know” probably won’t cut it. If you’re a consultant, you can probably get away with “In my experience...”, because that’s what they’re paying you for, but having supporting evidence – actual numbers – can help, particularly if you’re the main person responsible and need to persuade someone to find the money for a CapEx.</p> <p>Having the numbers handy is useful for a lot of situations, but there is also a whole bunch of tools available to look at. A while back I wrote about how DBAs could use the same kinds of tools that other industries hire data professionals to provide, in a post called “<a href="http://sqlblog.com/blogs/rob_farley/archive/2015/06/09/dbas-and-the-internet-of-things.aspx" target="_blank">DBAs and the Internet of Things</a>”. If you take this kind of approach and start analysing the readings from all kinds of things that affect your database, then you can get ahead of the game. Feed this stuff into something like Azure ML for predictive analytics, and you might be able to have an even-better spider-sense, where the prediction isn’t just based on your own opinions, but on what has caused failures in the past.</p> <p>Too often, the significant thing is some small detail that most people wouldn’t notice, but because of your experience and expertise, you can spot it and work out whether it’s significant or not. Then if you don’t have that particular thing benchmarked, or analysed by other tools, you can include it to see what’s going on.</p> <p>...and develop superhero powers for managing your enterprise. 
It’s something we regularly recommend to DBAs.</p> <p><a href="http://twitter.com/rob_farley" target="_blank">@rob_farley</a></p>SQL 2014 queries under a strobe lighthttp://www2.sqlblog.com/blogs/rob_farley/archive/2015/08/31/sql-2014-queries-under-a-strobe-light.aspxMon, 31 Aug 2015 13:12:42 GMT21093a07-8b3d-42db-8cbf-3350fcbf5496:59445rob_farley<p>I just wrote a <a href="http://sqlblog.com/blogs/rob_farley/archive/2015/08/31/live-query-statistics-in-sql-2014-not-just-sql-2016.aspx" target="_blank">post about Live Query Statistics</a>. Let me show you how this technology in SQL Server 2014 can be used for some amazingly cool (nerdy cool, at least) stuff.</p> <p>Behind the scenes, LQS uses a DMV called sys.dm_exec_query_profiles. When you run a query with SET STATISTICS PROFILE ON, the engine puts data into this DMV for every operator in the plan that produces data (which doesn’t include the SELECT operator, or Compute Scalar, for example). What SSMS does while you’re watching a long-running query is poll this DMV over and over to get the progress data, so that you can see it in those moments between kicking off the query and its completion.</p> <p>When you use LQS on a relatively quick query, say, one that completes in a single second, you can’t exactly watch this information come through. You might be able to use a debugger, and pause operation of your server for a moment while you step through it, but this is far from ideal. And yet a query that completes in a single second might actually need some work. What if this query needs to be able to run many times per second, and you’re looking for strategies to tune every last bit out of it?</p> <p>Clearly LQS is going to be no use.</p> <p>But the workings behind it... that’s another matter. Let me show you.</p> <p>When entomologists want to study the flight of an insect, they take lots of photographs, often using a strobe light to capture a moment with as little blur as possible. 
It allows for incredibly detailed images, like the ones you can see in <a href="http://www.dailymail.co.uk/sciencetech/article-2011394/Amazing-photos-capture-split-second-movements-animals-leaping-flying--single-frame.html" target="_blank">this article from the Daily Mail in the UK</a>.</p> <p><a href="http://www.dailymail.co.uk/sciencetech/article-2011394/Amazing-photos-capture-split-second-movements-animals-leaping-flying--single-frame.html" target="_blank"><img style="margin:5px;" src="http://i.dailymail.co.uk/i/pix/2011/07/05/article-2011394-0CDEE9E900000578-675_964x834.jpg" width="555" height="480" /></a></p> <p><em>(Image only linked from source – please let me know if it’s not there, but also go and look at the amazing pictures that this guy takes)</em></p> <p>I don’t know how many times this insect flapped its wings between each image that was captured, but I think you’ll agree that with enough images, it would be possible to learn a lot about the way that the wing-flapping takes place.</p> <p>This is the same as what I’m doing with queries in SQL Server 2014.</p> <p>Suppose I run a query over and over, with <strong>statistics profile </strong>turned on. The code here runs my sub-second query over and over for six minutes.</p> <blockquote> <p><font face="Consolas">--set statistics profile on</font></p> <p><font face="Consolas">declare @t datetime = dateadd(minute,6,getdate());</font></p> <p><font face="Consolas">while (@t &gt; getdate()) <br />begin</font></p> <p><font face="Consolas">select p.Name, sum(d.OrderQty) as Qty <br />from Sales.SalesOrderDetail d <br />join Production.Product p <br />on p.ProductID = d.ProductID <br />group by p.Name;</font></p> <p><font face="Consolas">end</font></p> </blockquote> <p>I used an SSMS window for this, and told SSMS to discard the results. I only need to run it once to see the results – I just care about the profile stats. 
Incidentally, it returns 266 rows.</p> <p>Now, I know that the plan that’s being used here is:</p> <p><img title="image" style="border-top:0px;border-right:0px;background-image:none;border-bottom:0px;padding-top:0px;padding-left:0px;border-left:0px;margin:5px;display:inline;padding-right:0px;" border="0" alt="image" src="http://sqlblog.com/blogs/rob_farley/image_06AEBF90.png" width="749" height="247" /></p> <p>The Node_IDs of these operators are 1 (the Join), 2 (the Aggregate), 3 (the Scan on SalesOrderDetail), and 7 (the Scan on Product). These numbers have gaps just because of the inner workings of the Query Optimizer. And they go from left to right because that’s how a plan runs. The SELECT calls Node 1, which calls Nodes 2 and 7, and Node 2 calls Node 3.</p> <p>So during five of the six minutes that my query was running over and over and over, I went into a different window and polled the DMV every half a second.</p> <blockquote> <p><font face="Consolas">set nocount on <br />go</font></p> <p><font face="Consolas">declare @t datetime = dateadd(minute,5,getdate());</font></p> <p><font face="Consolas">while (@t &gt; getdate()) <br />begin</font></p> <p><font face="Consolas">insert dbo.rf_strobe <br />select getdate() as myquerytime, *, 'Hash Match' <br />from sys.dm_exec_query_profiles</font></p> <p><font face="Consolas">waitfor delay '0:00:00.5'</font></p> <p><font face="Consolas">end</font> <br /></p> </blockquote> <p>I had made a table which matched the DMV, with a couple of extra columns thrown in. One to record when I captured the moment (so that I could differentiate between moments), and one (called ‘comment’) to comment which query I was monitoring. The DMV contains the sql_handle and plan_handle, so I could’ve differentiated between them later, but I wanted to be able to differentiate between them more easily than that.</p> <p>This data gave me 597 different values for ‘myquerytime’, 597 different moments captured. 
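For reference, a table matching the DMV plus those two extra columns can be created with something like this (a sketch of the approach, not the exact script I used):

```sql
-- A sketch: copy the DMV's column structure, plus the two extra columns,
-- without copying any rows (the false predicate keeps the table empty).
select cast(null as datetime) as myquerytime,
       qp.*,
       cast(null as varchar(100)) as comment
into dbo.rf_strobe
from sys.dm_exec_query_profiles as qp
where 1 = 0;
```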
I don’t know how many different times my query ran in that time – probably far more than 597, although I wouldn’t have really cared if it were less. These 597 moments each had up to 4 rows, showing how each operator was going in its cycle.</p> <p>There are columns in the DMV for all kinds of attributes, but the one that I was most interested in was the row_count, which tells me how many rows the operator has produced. There are columns about CPU, but as I’m looking at a query which runs in a small number of milliseconds, I’m more interested in the number of rows it’s produced.</p> <p>Here’s some of the data that’s in my table.</p> <p><img title="image" style="border-top:0px;border-right:0px;background-image:none;border-bottom:0px;padding-top:0px;padding-left:0px;border-left:0px;margin:5px;display:inline;padding-right:0px;" border="0" alt="image" src="http://sqlblog.com/blogs/rob_farley/image_4EFE9610.png" width="430" height="357" /></p> <p>You can see that when I polled the DMV at 19:28:20.303, Node 3 (the Clustered Index Scan on SalesOrderDetail) had pushed 25241 rows, and no rows had been outputted by the other three. Half a second later, the snapshot showed 91044 rows from that node. Another half a second, and it was 54279, and in the poll at 19:28:21.807, the scan had served up all its rows, and there was output from the other three operators.</p> <p>Each of these is like a strobe image, capturing a moment in the life of the query. </p> <p>To recreate how the query runs, we need to piece them together. To do this, I’m going to assume that the outputted rows are done in the same order each time (which is a reasonable assumption when I’m running the query over and over in quick succession, with no changing parameters or conditions). By summing the row_count across all the operators in each poll, I can order the polls. 
A quick bit of pivotting...</p> <blockquote> <p><font face="Consolas">select row_number() over (order by OverallRowCount) as rownum, * <br />from <br />( <br />select&#160;&#160;&#160;&#160; <br />&#160;&#160;&#160; count(*) + sum(row_count) as OverallRowCount, <br />&#160;&#160;&#160; max(case when node_id = 1 then row_count end) as HashMatchJoin, <br />&#160;&#160;&#160; max(case when node_id = 2 then row_count end) as HashMatchAgg, <br />&#160;&#160;&#160; max(case when node_id = 3 then row_count end) as CIXScan, <br />&#160;&#160;&#160; max(case when node_id = 7 then row_count end) as IXScan <br />from dbo.rf_strobe <br />where comment = 'Hash Match' <br />group by myquerytime <br />) s <br />order by rownum <br />;</font></p> </blockquote> <p>...and I can see the query start:</p> <p><img title="image" style="border-top:0px;border-right:0px;background-image:none;border-bottom:0px;padding-top:0px;padding-left:0px;border-left:0px;margin:5px;display:inline;padding-right:0px;" border="0" alt="image" src="http://sqlblog.com/blogs/rob_farley/image_34C0BD9D.png" width="597" height="660" /></p> <p>..and the moment when the Clustered Index Scan stops outputting more rows:</p> <p><img title="image" style="border-top:0px;border-right:0px;background-image:none;border-bottom:0px;padding-top:0px;padding-left:0px;border-left:0px;margin:5px;display:inline;padding-right:0px;" border="0" alt="image" src="http://sqlblog.com/blogs/rob_farley/image_2AFDE6D6.png" width="522" height="359" /></p> <p>...and the end of the query.</p> <p><img title="image" style="border-top:0px;border-right:0px;background-image:none;border-bottom:0px;padding-top:0px;padding-left:0px;border-left:0px;margin:5px;display:inline;padding-right:0px;" border="0" alt="image" src="http://sqlblog.com/blogs/rob_farley/image_3C04A56A.png" width="510" height="221" /></p> <p>Notice that almost all the time we polled the DMV to see how the query was going, data was still being pulled out of SalesOrderDetail. 
It was only 96.1% (574/597) of the way into the query that data started to be outputted from the blocking Hash Aggregate. And because a Hash Match Join blocks until the hash table has been completed, we have to wait even longer before we eventually start pulling data from the Product table, when we pull up to 504 rows to find the 266 rows of interest.</p> <p>I’m sure you can get a feel for how this query is running from this information.</p> <p>But let’s compare this to what happens if I force a Nested Loop join instead of the Hash Match Join. This plan:</p> <p><img title="image" style="border-top:0px;border-right:0px;background-image:none;border-bottom:0px;padding-top:0px;padding-left:0px;border-left:0px;margin:5px;display:inline;padding-right:0px;" border="0" alt="image" src="http://sqlblog.com/blogs/rob_farley/image_4BA9D1DD.png" width="746" height="219" /></p> <p>I repeated the collection of data, running the same query over and over but with OPTION (LOOP JOIN), and a different comment in the polled data. 
I also decided to use dynamic SQL to query my strobe table to save rewriting the pivot for each plan.</p> <blockquote> <p><font face="Consolas">declare @comment varchar(100) = 'Forced Loop Join'; <br />declare @qry nvarchar(max) = <br />'select row_number() over (order by OverallRowCount) as rownum, * <br />from <br />( <br />select <br />&#160;&#160;&#160; count(*) + sum(row_count) as OverallRowCount <br />' + (select ', max(case when node_id = ' + cast(node_id as varchar(10)) + ' <br />&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; then row_count end) as [' + <br />&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; cast(node_id as varchar(10)) + ': ' + physical_operator_name + ']' <br />&#160;&#160;&#160;&#160;&#160;&#160;&#160; from dbo.rf_strobe <br />&#160;&#160;&#160;&#160;&#160;&#160;&#160; where comment = @comment <br />&#160;&#160;&#160;&#160;&#160;&#160;&#160; group by node_id, physical_operator_name <br />&#160;&#160;&#160;&#160;&#160;&#160;&#160; order by node_id <br />&#160;&#160;&#160;&#160;&#160;&#160;&#160; for xml path(''),type).value('.','nvarchar(max)') <br /> + ' <br />from dbo.rf_strobe <br />where comment = @comment <br />group by myquerytime <br />) s <br />' <br />; <br />exec sp_executesql @qry, N'@comment varchar(100)', @comment = @comment;</font> <br /></p> </blockquote> <p>It started very similarly, but was very different towards the end.</p> <p>The Scan seemed to go for a longer portion of the plan – 98.5% (587/596), but as the Hash Match Aggregate started producing rows, the Nested Loop was pulling the row from the Seek and returning it to the SELECT operator before pulling the next row in. 
You can see the row_count going up equally across the three operators, which is very different to what we saw with the Hash Match Join.</p> <p><img title="image" style="border-top:0px;border-right:0px;background-image:none;border-bottom:0px;padding-top:0px;padding-left:0px;border-left:0px;margin:5px;display:inline;padding-right:0px;" border="0" alt="image" src="http://sqlblog.com/blogs/rob_farley/image_0A31B035.png" width="725" height="787" /></p> <p>A factor I hadn’t considered before became evident at the end. In the Hash Match Join example, we saw a bunch of moments when all the operators had produced their data, which aren’t there in the Nested Loop example. The result of the query is no different, but the tear-down time is much quicker with the Nested Loop – presumably because the Hash Table used for the join doesn’t need to be dropped. I didn’t expect this to be as significant as it seems to be, but we certainly managed to catch six images – about 1% of them – when the operators had all stopped returning data, but the query was still hanging around as far as the DMV was concerned. 
With everything else being identical, I can only assume it’s down to the Hash Table having more to do to shut down than the Nested Loop.</p> <p>Just for fun, I tested the pivot query itself, giving 582 strobe images.</p> <p><img title="image" style="border-top:0px;border-right:0px;background-image:none;border-bottom:0px;padding-top:0px;padding-left:0px;border-left:0px;margin:5px;display:inline;padding-right:0px;" border="0" alt="image" src="http://sqlblog.com/blogs/rob_farley/image_350F35A9.png" width="787" height="161" /></p> <p><img title="image" style="border-top:0px;border-right:0px;background-image:none;border-bottom:0px;padding-top:0px;padding-left:0px;border-left:0px;margin:5px;display:inline;padding-right:0px;" border="0" alt="image" src="http://sqlblog.com/blogs/rob_farley/image_4CC8FDC0.png" width="742" height="117" /></p> <p>The first thing to notice is that the Compute Scalar operators didn’t report anything, as expected.</p> <p><img title="image" style="border-top:0px;border-right:0px;background-image:none;border-bottom:0px;padding-top:0px;padding-left:0px;border-left:0px;margin:5px;display:inline;padding-right:0px;" border="0" alt="image" src="http://sqlblog.com/blogs/rob_farley/image_30BD6037.png" width="676" height="261" /></p> <p>The next is that the right-most Sort was blocking, and had a significant pause after the Scan finished – about 35 rows or 6% of the query time.</p> <p><img title="image" style="border-top:0px;border-right:0px;background-image:none;border-bottom:0px;padding-top:0px;padding-left:0px;border-left:0px;margin:5px;display:inline;padding-right:0px;" border="0" alt="image" src="http://sqlblog.com/blogs/rob_farley/image_58F229FA.png" width="673" height="237" /></p> <p>The Stream Aggregate doesn’t block, but the left-most Sort, which has called the Stream Aggregate (via two Compute Scalars) does, following which there’s another pause (but smaller – fewer rows), after which the Segment and Sequence Project operators don’t block.</p> 
<p><img title="image" style="border-top:0px;border-right:0px;background-image:none;border-bottom:0px;padding-top:0px;padding-left:0px;border-left:0px;margin:5px;display:inline;padding-right:0px;" border="0" alt="image" src="http://sqlblog.com/blogs/rob_farley/image_059D7485.png" width="679" height="328" /></p> <p>At the end of the query we have about 80 rows – well over 10% of the query time – after the Sequence Project has outputted its last row. </p> <p><img title="image" style="border-top:0px;border-right:0px;background-image:none;border-bottom:0px;padding-top:0px;padding-left:0px;border-left:0px;margin:5px;display:inline;padding-right:0px;" border="0" alt="image" src="http://sqlblog.com/blogs/rob_farley/image_6B5F9C11.png" width="679" height="228" /></p> <p><img title="image" style="border-top:0px;border-right:0px;background-image:none;border-bottom:0px;padding-top:0px;padding-left:0px;border-left:0px;margin:5px;display:inline;padding-right:0px;" border="0" alt="image" src="http://sqlblog.com/blogs/rob_farley/image_0CE15C52.png" width="678" height="163" /></p> <p>Now – there is more that I will learn about this still, and I have been making some assumptions about whether the DMV polling gives a sufficiently random moment. But from the things I’ve seen, there is definitely information about queries that I haven’t seen before and which require some further research.</p> <p>Finally, I spent a bit of time looking at visualisation options for this. I immediately thought of the Power View play axis that’s on scatter charts, but sadly I didn’t have enough luck coming up with an effective visualisation very quickly. I had felt like Hans Rosling with his scatter chart about birth rates, and used LAG and MAX() OVER() to come up with a speed of row production compared to the total, but I kept getting the “representative sample” message, which wasn’t conducive. 
I’m sure it won’t be long before this visualisation will be easy, and for the purposes of analysis, I was more interested in exploring the data than in making it look pretty.</p> <p>Strobe photography is very impressive. I just never thought it would apply to T-SQL queries.</p> <p><a href="http://twitter.com/rob_farley" target="_blank">@rob_farley</a></p>Live Query Statistics in SQL 2014, not just SQL 2016http://www2.sqlblog.com/blogs/rob_farley/archive/2015/08/31/live-query-statistics-in-sql-2014-not-just-sql-2016.aspxMon, 31 Aug 2015 08:07:00 GMT21093a07-8b3d-42db-8cbf-3350fcbf5496:59439rob_farley<p>One of the coolest things in SQL Server 2016 is Live Query Statistics (LQS). But did you know that it’s available in SQL 2014? (Edit: Needs SP1)</p> <p>The thing is that we haven’t been able to view it effectively until now – until SQL Server Management Studio 2016 became available.</p> <p>LQS provides the ability to watch an execution plan <b>while the query is still running.</b></p> <p><img width="868" height="496" title="image" style="border-left-width:0px;border-right-width:0px;background-image:none;border-bottom-width:0px;padding-top:0px;padding-left:0px;margin:5px;display:inline;padding-right:0px;border-top-width:0px;" alt="image" src="http://sqlblog.com/blogs/rob_farley/image_5FDB021B.png" border="0"></p> <p>In the image above, you can see that I have a fairly long-running query, and I’ve taken a screen shot while the query is running. I can tell it’s still running because of the dotted lines. At this point, the Hash Match has asked the Index Scan for all the rows it’s going to, and the Index Scan has been marked as 100% done. The Hash Match is now asking the Merge Join for rows. The Merge Join is pulling data out of the Sort, which has already pulled in the rows it needed, and so on.</p> <p>It’s cool stuff.</p> <p>And significantly, this is against <b>SQL Server 2014 (SP1). 
</b>I had to use SSMS 2016, but then the button appeared at the top to include Live Query Statistics...</p> <p><img width="369" height="179" title="image" style="border-left-width:0px;border-right-width:0px;background-image:none;border-bottom-width:0px;padding-top:0px;padding-left:0px;margin:5px;display:inline;padding-right:0px;border-top-width:0px;" alt="image" src="http://sqlblog.com/blogs/rob_farley/image_536F6FA3.png" border="0"></p> <p>...and then I could see them. When I connected to an earlier version, such as SQL Server 2012, the box was disabled and I couldn’t see them.</p> <p>So why not install SSMS 2016 (it’s only the client tool – I’m not suggesting you put it on a server), and see how some of your queries look?</p> <p>You shouldn’t do this against a production machine, because it takes some effort for SQL Server to produce the data used for this. But see how you go. It’s definitely a useful performance-tuning feature which is available right now.</p> <p><a href="http://twitter.com/rob_farley" target="_blank" mce_href="http://twitter.com/rob_farley">@rob_farley</a></p>
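And if you want to peek at what LQS is polling, you can query the DMV directly from a second window while your query runs with statistics profile on – a rough sketch using columns from sys.dm_exec_query_profiles:

```sql
-- Rough sketch: a progress snapshot of whatever is currently being profiled.
-- Run from a second window while the monitored query is executing.
select node_id, physical_operator_name, row_count, estimate_row_count,
       100.0 * row_count / nullif(estimate_row_count, 0) as pct_of_estimate
from sys.dm_exec_query_profiles
order by node_id;
```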