Off to Richmond for SQL Saturday
http://sqlblog.com/blogs/louis_davidson/archive/2015/03/13/off-to-richmond-for-sql-saturday.aspx
Fri, 13 Mar 2015 22:12:00 GMT (drsql)

I got the email prodding speakers to blog about our upcoming sessions, so I got myself up and started to write this blog. It has been such a long time since I have done much side SQL work (other than doing quite a bit of tech editing, along with doing some work for PASS leading up to speaker submissions) that my blog didn't even show up in the blogs list on sqlblog.com. My last blog was right after PASS, when I attended the Summit from my hospital bed.
Since then, it has been quite a bumpy road. For a person who usually travels as much as I do for fun (SQL and Disney) and work, not having left the Nashville area since vacation in September has been weird. But all sorts of stuff has gotten in the way, mostly that I just haven't felt like blogging (heck, I haven't had an entry on my Simple-Talk blog since then either, though a few editorials I wrote pre-surgery were posted on sqlservercentral.com by my editor).
But now, finally, it is time to wake the heck up. I am leaving Nashville this Sunday, heading for a week of work in Virginia Beach, not coincidentally the same week as SQL Saturday Richmond (https://www.sqlsaturday.com/381/eventhome.aspx), where I will be talking about In-Memory OLTP tables and how they affect your database design. It is also pretty cool that Jessica Moss (https://twitter.com/jessicammoss) will be presenting down in Virginia Beach while I am in town, so a stop at the Hampton Roads SQL Server User Group (http://hrssug.thecloudlyfe.com/) is definitely in order for most of the people I work with.
Here is the abstract for my presentation (if you want Jessica's, go here: http://hrssug.thecloudlyfe.com/march-meeting-2015/):
How In-Memory Database Objects Affect Database Design (https://www.sqlsaturday.com/viewsession.aspx?sat=381&sessionid=28084)
With SQL Server 2014, Microsoft has added a major new feature to help optimize OLTP database implementations by persisting your data primarily in RAM. Of course it isn't that simple; internally, everything that uses this new feature is completely new. While the internals of this feature may be foreign to you, accessing the data that uses the structures very much resembles the T-SQL you already know. As such, the first important question for the average developer will be how to adapt an existing application to make use of the technology to achieve enhanced performance. In this session, I will start with a normalized database and adapt the logical and physical database model/implementation in several manners, performance testing the tables and code changes along the way.
Does this mean I am fully back and over my funk? Good grief, I don't know. But I have submitted for 4 other SQL Saturdays over the rest of this year, and I have projects that are just waiting for me to get started. Some days I want to just lie down and not get up until it is time to go back to bed. Others I want to write a new book, travel to the ends of the earth, and talk about SQL Server. The fact is, I am taking this one task at a time, and I look forward to talking about SQL Server for you at 9:45 on the 21st of March. And when that is over, I am going to Dollywood's opening weekend to let the Tennessee Tornado (http://www.dollywood.com/themepark/rides/Tennessee-Tornado.aspx) spin some sense into my head. Hope I see you there (SQL Saturday or Dollywood, either way we can have some fun!)

How In-Memory Database Objects Affect Database Design: Uniqueness
http://sqlblog.com/blogs/louis_davidson/archive/2014/07/06/how-in-memory-database-objects-affect-database-design-uniqueness.aspx
Sun, 06 Jul 2014 20:15:56 GMT (drsql)

This post is part of an ongoing series of blogs I am writing while preparing to give a presentation based on the prefix of this blog's title. It is a work in progress as I explore the effects of the new in-memory model. I would love comments that tell me that I am wrong (especially if it turns out that I am!)

The first issue that arose in my tests was with the new concurrency model. I started out with the stored procedure I will show later in the entry. It checks whether a customer address exists and, if not, creates it. The table is the CustomerAddress table, with the following structure:

[Image: CustomerAddress table structure - http://sqlblog.com/blogs/louis_davidson/image_40F83E64.png]

With on-disk structures, I have a uniqueness constraint on the address columns, along with the CustomerId. The CustomerId is the first column in the constraint declaration, so the performance was great. I don't use a transaction or error handling (yet) because I want to simulate what a normal application might do. I put the code in stored procedure form because, well, it will just be easier for me to manage. (Plus we can change that code to be proper a lot easier, and when I build the natively compiled procedures, things will be easier to simulate.)

Basically, the gist is: look up the address using all of the address columns that are in the uniqueness constraint.
If you get a CustomerAddressId, return it; if not, create the row and then return it, using the alternate key values:

CREATE PROCEDURE Customers.CustomerAddress$CreateAndReturn
(
    @customerId INT,
    @Line1 NVARCHAR(30),
    @Line2 NVARCHAR(30),
    @City NVARCHAR(30),
    @ZipCode VARCHAR(20),
    @USStateCode CHAR(2),
    @AddressTypeCode NVARCHAR(20)
)
AS
BEGIN
    SET NOCOUNT ON;

    DECLARE @customerAddressId INT =
           (SELECT CustomerAddressId
            FROM   Customers.CustomerAddress
            WHERE  CustomerId = @customerId
              AND (Line1 = @Line1 OR (Line1 IS NULL AND @Line1 IS NULL))
              AND (Line2 = @Line2 OR (Line2 IS NULL AND @Line2 IS NULL))
              AND (City = @City OR (City IS NULL AND @City IS NULL))
              AND (ZipCode = @ZipCode OR (ZipCode IS NULL AND @ZipCode IS NULL))
              AND USStateCode = @USStateCode
              AND AddressTypeCode = @AddressTypeCode); --we are assuming validation is done elsewhere

    IF @customerAddressId IS NULL
    BEGIN
        INSERT INTO Customers.CustomerAddress
               (CustomerId, Line1, Line2, City, ZipCode, USStateCode, AddressTypeCode)
        VALUES (@customerId, @Line1, @Line2, @City, @ZipCode, @USStateCode, @AddressTypeCode);

        SET @customerAddressId =
           (SELECT CustomerAddressId
            FROM   Customers.CustomerAddress
            WHERE  CustomerId = @customerId
              AND (Line1 = @Line1 OR (Line1 IS NULL AND @Line1 IS NULL))
              AND (Line2 = @Line2 OR (Line2 IS NULL AND @Line2 IS NULL))
              AND (City = @City OR (City IS NULL AND @City IS NULL))
              AND (ZipCode = @ZipCode OR (ZipCode IS NULL AND @ZipCode IS NULL))
              AND USStateCode = @USStateCode
              AND AddressTypeCode = @AddressTypeCode);
    END;

    RETURN COALESCE(@customerAddressId, -100);
END;
GO

My test data resides in a side database, with a lot of duplicated customers and addresses. The goal is to do a good number of reads, but not make the code so complex that we start to deal with slight variances in addresses.
These are the kinds of things one might do in an ETL process, or certainly as a background process in an OLTP system. For this example, I want to avoid duplication, but only if a row is 100% duplicated.

The code is called in batches of four interconnected sets of procedures. Each returns a surrogate key via a RETURN statement. (It isn't 100% finished, as I haven't really handled things like -100 being returned, even though that really couldn't occur with my current data.)

DECLARE @CustomerId INT
EXEC @CustomerId = Customers.Customer$CreateAndReturn @firstName = 'Tyler', @MiddleName = 'R', @LastName = 'David', @EmailAddress = 'Tyler@email.com'

DECLARE @CustomerAddressId INT
EXEC @CustomerAddressId = Customers.CustomerAddress$CreateAndReturn @customerId = @CustomerId, @Line1 = '9 Gordon Highway', @Line2 = 'Apt 296', @City = 'Buffalo', @ZipCode = '81254', @USStateCode = 'MO', @AddressTypeCode = 'Office'

DECLARE @SalesOrderId INT
EXEC @SalesOrderId = Sales.SalesOrder$Create @CustomerId = @CustomerId, @CustomerAddressId = @CustomerAddressId, @SalesOrderStatusCode = 'New'

EXEC Sales.SalesOrderItem$Create @SalesOrderId = @SalesOrderId, @SalesOrderItemNumber = 48904, @Quantity = 3.6386, @UnitPrice = 14.57, @ProductPriceId = 3779
EXEC Sales.SalesOrderItem$Create @SalesOrderId = @SalesOrderId, @SalesOrderItemNumber = 98015, @Quantity = 3.0596, @UnitPrice = 5.31, @ProductPriceId = 1043

I have 4 files of these stored procedure batches, each file having 10,000 batches in it. I start them all at the same time using the WaitFor Sync Framework I blogged about a few years back (http://sqlblog.com/blogs/louis_davidson/archive/2012/03/20/utility-objects-waitfor-delay-coordinator-sql-server-2008.aspx).
When I was using the on-disk tables, not once did this code fail to work, and not once did I get an error, even though I certainly could have if a duplicated row had been inserted:

DECLARE @customerAddressId INT =
       (SELECT CustomerAddressId
        FROM   Customers.CustomerAddress
        WHERE  CustomerId = @customerId
          AND (Line1 = @Line1 OR (Line1 IS NULL AND @Line1 IS NULL))
          AND (Line2 = @Line2 OR (Line2 IS NULL AND @Line2 IS NULL))
          AND (City = @City OR (City IS NULL AND @City IS NULL))
          AND (ZipCode = @ZipCode OR (ZipCode IS NULL AND @ZipCode IS NULL))
          AND USStateCode = @USStateCode
          AND AddressTypeCode = @AddressTypeCode); --we are assuming validation is done elsewhere

IF @customerAddressId IS NULL
BEGIN
    INSERT INTO Customers.CustomerAddress
           (CustomerId, Line1, Line2, City, ZipCode, USStateCode, AddressTypeCode)
    VALUES (@customerId, @Line1, @Line2, @City, @ZipCode, @USStateCode, @AddressTypeCode);

Time is the enemy. Time passes between the fetch of the CustomerAddressId and the insert. If another connection inserts the row in those moments between these statements, you could get a duplicate key error, which I accepted as a possibility. If I finished this code for on-disk tables, I would handle that error by refetching the key. No problem.
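To make that concrete, here is a minimal sketch of the refetch-on-error pattern I have in mind; this is not the finished code, just an assumption of its shape (2627 is the duplicate key constraint error, 2601 the duplicate unique index error):

BEGIN TRY
    INSERT INTO Customers.CustomerAddress
           (CustomerId, Line1, Line2, City, ZipCode, USStateCode, AddressTypeCode)
    VALUES (@customerId, @Line1, @Line2, @City, @ZipCode, @USStateCode, @AddressTypeCode);
END TRY
BEGIN CATCH
    IF ERROR_NUMBER() IN (2601, 2627) --another connection beat us to the insert
        SET @customerAddressId = (SELECT CustomerAddressId
                                  FROM   Customers.CustomerAddress
                                  WHERE  CustomerId = @customerId
                                    AND (Line1 = @Line1 OR (Line1 IS NULL AND @Line1 IS NULL))
                                    AND (Line2 = @Line2 OR (Line2 IS NULL AND @Line2 IS NULL))
                                    AND (City = @City OR (City IS NULL AND @City IS NULL))
                                    AND (ZipCode = @ZipCode OR (ZipCode IS NULL AND @ZipCode IS NULL))
                                    AND USStateCode = @USStateCode
                                    AND AddressTypeCode = @AddressTypeCode);
    ELSE
        THROW; --anything else is unexpected, so let it bubble up
END CATCH;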
When I changed my table to an in-memory table (I will talk more about the actual structure in a blog or two), I immediately got duplicated data. Why? Three reasons.

1. The in-memory code ran 300% faster with little optimization.

2. No UNIQUE constraint. I put a hash index on the CustomerId so the WHERE clause would use a simple seek operation, but with only a PRIMARY KEY constraint, there was nothing to stop the duplicates.

3. The lock-free optimistic concurrency control. (I won't go into too much detail, but read this if you haven't yet: http://sqlblog.com/blogs/kalen_delaney/archive/2013/10/20/sql-server-2014-in-memory-oltp-hekaton-whitepaper-for-ctp2.aspx)

Each of these factors figured into the issue. Since the code is faster, it is more likely that we will have collisions. This was certainly a factor: in my first tests, I forgot to index the CustomerId, the code ran twice as slow as the on-disk version, and there were no collisions.

Without a uniqueness constraint, the rows get created with no issue, even if you accidentally create duplicates. The lack of constraints is one of my least favorite parts of the whole in-memory structure.

With multi-version concurrency control (MVCC), even while the row is being inserted (or really, as long as the transaction has not been committed), other connections can execute the search for the address and get back no results. The default isolation level is basic snapshot isolation: the second connection sees the table as it was when its transaction started. Even if I escalated to REPEATABLE READ or SERIALIZABLE, it wouldn't eliminate duplication, since the new row's uniqueness is defined on a surrogate key, so there would technically be no collision. Is using a surrogate key the best solution? It isn't feeling like it for these tables, but you know what, it is the norm for most people. I could easily just put the primary key on the natural key here, and let the identity column be the possible (and far less likely) duplication point. But that seems like cheating, and what if a table has 2 natural key choices, or 3, or more? You would be stuck again with this issue (and it will not be easy to solve).
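For reference, this is roughly what the memory-optimized version of the table looks like in SQL Server 2014 syntax. This is a sketch, not my exact table (the bucket counts and durability setting are placeholder assumptions); the point is that the PRIMARY KEY is the only uniqueness enforcement available, and the hash index on CustomerId speeds the seek but does nothing to prevent duplicate addresses:

CREATE TABLE Customers.CustomerAddress
(
    CustomerAddressId INT IDENTITY(1,1) NOT NULL
        PRIMARY KEY NONCLUSTERED HASH WITH (BUCKET_COUNT = 100000),
    CustomerId INT NOT NULL
        INDEX IX_CustomerAddress_CustomerId HASH WITH (BUCKET_COUNT = 100000),
    Line1 NVARCHAR(30) NULL,
    Line2 NVARCHAR(30) NULL,
    City NVARCHAR(30) NULL,
    ZipCode VARCHAR(20) NULL,
    USStateCode CHAR(2) NOT NULL,
    AddressTypeCode NVARCHAR(20) NOT NULL
)
WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);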
Fortunately, this code is going to fail if multiple rows have been committed by the time it executes:

SET @customerAddressId = (SELECT CustomerAddressId
                          FROM   Customers.CustomerAddress
                          WHERE  CustomerId = @customerId
                            AND (Line1 = @Line1 OR (Line1 IS NULL AND @Line1 IS NULL))
                            AND (Line2 = @Line2 OR (Line2 IS NULL AND @Line2 IS NULL))
                            AND (City = @City OR (City IS NULL AND @City IS NULL))
                            AND (ZipCode = @ZipCode OR (ZipCode IS NULL AND @ZipCode IS NULL))
                            AND USStateCode = @USStateCode
                            AND AddressTypeCode = @AddressTypeCode)

It fails because a scalar subquery can only return one value without raising an error. But with MVCC, even this check could be an issue if we want to put it in a transaction and roll back on an error. Sadly, it will not be easy to fix, because it only fails if the transaction containing the competing insert has already been committed.

As I start to think about how to solve this problem, the solution is going to have to be very different than with disk-based tables. There, if I threw a transaction around the search and the insert (and possibly a few locking hints to block other readers), I could stop the duplication easily. However, in all of the MVCC isolation levels, readers are never blocked. I could use an application lock to single-thread the inserts (http://sqlblog.com/blogs/louis_davidson/archive/2007/05/07/using-application-locks-to-implement-a-critical-section-in-t-sql-code.aspx), but that would definitely not be the most efficient operation, and it would not work at all in natively compiled code.
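For completeness, a minimal sketch of what that application-lock approach could look like (my assumption of the pattern, not tested code; it also assumes the database option MEMORY_OPTIMIZED_ELEVATE_TO_SNAPSHOT is ON, so the explicit transaction can touch the in-memory table without isolation hints):

DECLARE @lockResult INT;

BEGIN TRANSACTION;

--serialize all address creation; sp_getapplock returns >= 0 on success
EXEC @lockResult = sp_getapplock @Resource = 'CustomerAddress$Create',
                                 @LockMode = 'Exclusive',
                                 @LockOwner = 'Transaction',
                                 @LockTimeout = 5000;

IF @lockResult >= 0
BEGIN
    --the search and insert from the procedure above go here, now single-threaded

    COMMIT TRANSACTION; --committing releases the transaction-owned lock
END
ELSE
    ROLLBACK TRANSACTION; --could not acquire the lock in time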
In reality, there is another, more common solution (as I expect a lot of people wouldn't even have the alternate key on the address to start with): just give in and accept the duplicates as part of high-performance business. We can minimize the damage by changing the two subqueries to start with:

SET @customerAddressId = (SELECT MIN(CustomerAddressId)

so we always use the first row. With a low-cardinality table like a customer's address, you will only get back a few rows at most, so the aggregate isn't going to be terribly costly. Then you build cleanup routines to reallocate duplicated addresses. This is something you will need to do with the data in any case, since it is altogether possible that the user types '9 Gordon Hwy' instead of '9 Gordon Highway', giving you duplicates you need to clean up anyhow. My current time to enter the 40,000 rows across 4 connections using the command-line SQLCMD interface (the files crashed SSMS!) is around 12 seconds.

When I get back to refining the code tomorrow, this is going to be my new tack for this (and perhaps the customer create) procedure: minimize duplication, but go ahead and allow it. That problem is for the ETL architect to deal with. (Now if much of my daily job wasn't writing ETL, that would actually be comforting.)

How In-Memory Database Objects Affect Database Design: Or does it?
http://sqlblog.com/blogs/louis_davidson/archive/2014/06/30/how-in-memory-database-objects-affect-database-design-or-does-it.aspx
Mon, 30 Jun 2014 21:03:00 GMT (drsql)

This post is part of an ongoing series of blogs I am writing while preparing to give a presentation based on the prefix of this blog's title. It is a work in progress as I explore the effects of the new in-memory model. I would love comments that tell me that I am wrong (especially if it turns out that I am!)

Before I start describing what I have discovered so far along the way, let's make a few educated guesses. I have my data model from the previous blog entry (http://sqlblog.com/blogs/louis_davidson/archive/2014/06/28/how-in-memory-database-objects-affect-database-design-the-logical-and-physical-models.aspx), and I have a plan of attack for entering data. Basically, I plan to take the "normal" way that people deal with data like this, and enter data into each table separately, outside of an overarching transaction.

So I will create (or retrieve) the Customer first, then the CustomerAddress, then I will create the SalesOrder and however many SalesOrderItems the customer (or my data generation using Red Gate's Data Generator!) wants. I may dress it up over time, but for my initial experiments, this is the plan.

For a database with UNIQUE constraints on alternate keys and FOREIGN KEY constraints on relationships, this is all pretty easy and safe. I know I can't duplicate a customer or violate foreign key constraints. How this will play out in the code is still just a bunch of guesses, with my only goal basically being to not have the ETL architect trying to decide whether running me over with a car is punishment enough for me introducing more bad data to deal with.

But the title of the session ends in "...Database Design". The code isn't database design. Rather, as an architect, it is important to realize that the database design affects the code.
So will we need different data structures, or will the normal, normalized model suffice? My initial feeling is that the logical model doesn't change. The actual implementation details (such as the lack of foreign key, unique, and check constraints) will differ, and the hardware/implementation layer changes immensely, and this affects your code in ways you had best understand.

I "think" what I will determine is that the basic data architect's output remains rather constant. The rules of normalization (briefly described here: http://sqlblog.com/blogs/louis_davidson/archive/2011/05/29/normalization-and-how-to-know-when-you-are-done-the-short-version.aspx) don't change at all. You still design the tables and columns based on the same factors of cardinality you would have before. The physical model will be the same, and if your data modeling tool supports the in-memory structures, it is just a matter of choosing which tables should be in-memory and which don't really merit it, followed by a lot of testing (a lot).

For my first tests, I only made the four "hot" tables in-memory:

[Image: In-Mem Demo - Physical Model - Hot Tables - https://drsql.sharepoint.com/Documents/In%20Mem%20Demo%20-%20Physical%20Model%20-%20Hot%20Tables.png]

This design will work fine when dealing with interop code, even if I want to implement data integrity checks against the domain tables myself (see the sketch at the end of this post). If I want to use native code, then all of the tables used will need to be in-memory. Are data integrity checks that important? Well, they are to me, but not so much to a lot of folks who trust their other layers to get it right. My scenario, where all of the domain tables are "static", enables this to work just fine. As long as ProductType never changes, and the other code layers have only the right values, you can easily say "this works" (as long as it does... and your code has been tested for what happens if anything crashes on any given statement to the database... which is not an easy task).

In my next post, I will share at least one of the effects the change to in-memory code has had on the code I am using, and what I am considering doing about it.
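As referenced above, here is a minimal sketch of what one of those manual domain checks might look like in interop code, standing in for the foreign key constraint the in-memory tables can't have (the domain table name and error details are my assumptions for illustration):

IF NOT EXISTS (SELECT *
               FROM   Reference.AddressType
               WHERE  AddressTypeCode = @AddressTypeCode)
    THROW 50000, 'AddressTypeCode does not match a domain value', 1;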
How In-Memory Database Objects Affect Database Design: The Logical and Physical Models
http://sqlblog.com/blogs/louis_davidson/archive/2014/06/28/how-in-memory-database-objects-affect-database-design-the-logical-and-physical-models.aspx
Sat, 28 Jun 2014 20:04:00 GMT (drsql)

This post is part of an ongoing series of blogs I am writing while preparing to give a presentation based on the prefix of this blog's title. It is a work in progress as I explore the effects of the new in-memory model. I would love comments that tell me that I am wrong (especially if it turns out that I am!)

Now that I have the conceptual model I want to work with (and a bunch of "life" tasks out of the way), I head into fleshing out the model. I go ahead and add attributes, along with domains that I will use to build my actual database. The model continues with a minimalist approach (some types, names, etc.), just the basics that are needed, nothing more, nothing less:

[Image: Logical Model - https://drsql.sharepoint.com/Documents/In%20Mem%20Demo%20-%20Logical%20Model.PNG]

As this is a very simple model, I won't drag it out any more than I need to, and I am going to go ahead and make this a physical model:

[Image: Physical Model - https://drsql.sharepoint.com/Documents/In%20Mem%20Demo%20-%20Physical%20Model.PNG]

Basically, I set datatypes, added domain tables for some of the simple domains (to make data checking easier when (if?) I build manual foreign keys in the code, since in-memory tables don't support the concept), and added row create and modify times to every editable table (the four tables Customer, CustomerAddress, SalesOrder, and SalesOrderItem that I will be loading as part of my stress testing). Nothing too fancy, but at the same time not so simplistic that I felt like it would lose any and all meaning. The other tables are not really "transactional", so I will be loading them all at once as a configuration step. My not adding physical columns like row modify times to those tables isn't a suggestion that they aren't needed in a real system (who added that $.10 fare from Boston to LA?), but rather an indication that I wasn't planning on dealing with that here. The four major tables will be loaded at full speed ahead, and as fast as we can take orders from new and existing customers, we will. All other data will simply be configuration data, for me.

On the other hand, if I were going to do a full-day test, a process that might be interesting to include would be a price hike. Who chooses the price? The database code, or the front end? If the price changes while the client has the old price in hand, should the transaction fail and say "Sorry, our prices went up, try again! Thank you, come again," or just accept the previous price? And then, how much time might acceptably pass before the fail notice comes up? Could that be hacked by a crafty 10-year-old to get pricing from a year ago? Discussions like this are why actual design sessions take simply forever to complete, and are very boring to people who strictly code... It is also why we are avoiding the topic here, for the most part.

In terms of our implementation, some of the more interesting columns to deal with are going to be the CustomerNumber and the SalesOrderNumber. Generating unique values for customer consumption is a fairly simple concept, but it will change when we start working with in-memory code, certainly for the 2014 version (see the sketch at the end of this post).

The next steps are already underway (and I am still looking for interesting things to blog about along the way). In the next entry in this session-writing process, I will discuss my methods of testing the structures, the issues I have seen along the way, and anything else that comes up.
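As referenced above, one direction for generating those customer-facing numbers: since memory-optimized tables in 2014 only support IDENTITY(1,1), a SEQUENCE object is a common alternative. The object name and format here are illustrative assumptions, not the actual design:

CREATE SEQUENCE Customers.CustomerNumber_Sequence AS INT START WITH 1;
GO

--fetch the next value and format it into a customer-facing number
DECLARE @value INT = NEXT VALUE FOR Customers.CustomerNumber_Sequence;
SELECT 'CUST' + RIGHT('000000' + CAST(@value AS VARCHAR(10)), 6) AS CustomerNumber;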
Speaking this weekend at SQL Saturday 286 - Louisville
http://sqlblog.com/blogs/louis_davidson/archive/2014/06/19/speaking-this-weekend-at-sql-saturday-286-louisville.aspx
Fri, 20 Jun 2014 03:20:00 GMT (drsql)

Wow, where does the time go? Hope I see every reader there in Louisville this weekend for a lovely time. Say you live in Alaska and it would be too far to go in one day? Hey, we all have our problems (and don't tell anyone, but I am going to try to finish up video versions of these sessions by the end of the year... shh.)

I will be doing my sessions on Database Design Fundamentals and How to Write a DML Trigger, both sessions I have done quite often of late, and I really like how they work now. I have a bit of a reputation as a slide/code tinkerer, and this time is no different, but the biggest thing I have settled on is how I deal with presentations like these, where honestly the subject is more than a one-hour session can handle. Homework.

I will cover the fundamentals of whatever topic in slides, and then demo as much code as I have time for, but the rest is homework. I got the idea when I was prepping to speak to the Richmond User Group, as I was flip-flopping between showing some slides and showing code. I realized that the slides would explain the fundamentals better than slogging through the internals of the code directly, and that when I attend sessions with a lot of code, all I am sitting there thinking is "man, I want that code". So I try to comment the code to make it self-explanatory, run the code before each presentation, enhance it as I have time to, show you enough of the code to get you started, and then give it to you to play with (and if you want to pick it apart, email me at drsql@hotmail.com; I LOVE constructive criticism).

I post all of my slides and code on my website (http://drsql.org), not just because it is part of a presentation, or to make myself feel cool, but mostly so I can reference it as I need it. I use my code snippets all of the time when coding, and I try to keep them up to the version I am using (or writing about, as it were). So hopefully I see you and get to explain the fundamentals; then the (rather wordy at times) slides are there for reference, and the code is there to get you started practicing on your own. Homework.

The abstracts for this weekend's presentations:

Database Design Fundamentals

In this session I will give an overview of how to design a database, including the common normal forms and why they should matter to you if you are creating or modifying SQL Server databases. Data should be easy to work with in SQL Server if the database has been organized as close as possible to the standards of normalization that have been proven for many years. Many common T-SQL programming "difficulties" are the result of struggling against the way data should be structured; they can be avoided by applying the basic normalization techniques, and they are things you find yourself struggling with time and again (e.g., using the SUBSTRING function in a WHERE clause, meaning you can't use an index efficiently).
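On that SUBSTRING point, a small illustration (the table and column names are hypothetical): wrapping the column in a function hides it from an index, while an equivalent prefix search can use one.

--non-sargable: the function must be evaluated for every row, so an index on LastName can't be seeked
SELECT CustomerId FROM Customers.Customer WHERE SUBSTRING(LastName, 1, 3) = 'Dav';

--sargable: a prefix LIKE lets the optimizer seek an index on LastName
SELECT CustomerId FROM Customers.Customer WHERE LastName LIKE 'Dav%';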
How to Write a DML Trigger

Triggers are extremely powerful and useful (if somewhat rarely needed) objects that are coded very similarly to a common stored procedure. Yet for all their similarity, there are some very important differences that need to be understood. In this session, I will walk through what goes into writing a robust DML trigger, starting with a simple version of a trigger and working through some very useful applications of DML triggers.

How In-Memory Database Objects Affect Database Design: Getting Started
http://sqlblog.com/blogs/louis_davidson/archive/2014/03/10/how-in-memory-database-objects-affect-database-design-getting-started.aspx
Tue, 11 Mar 2014 00:05:00 GMT (drsql)

Weird, somehow I lost text in the middle of this post after it was posted... I created it a few days back and set it to post later... Corrected.
So I put in the following abstract for Devlink (and may for SQL Saturday Orlando and/or the PASS Summit). I don't know if I will get accepted, but I am pretty excited to work on this topic one way or another...
<P>"With SQL Server 2014, Microsoft has added a major new feature to help optimize OLTP database implementations by persisting your data primarily in RAM. Of course it isn't that simple, internally everything that uses this new feature is completely new.<SPAN style="mso-spacerun:yes;">&nbsp; </SPAN>While the internals of this feature may be foreign to you, accessing the data that uses the structures very much resembles T-SQL as you already know it. As such, the first important question for the average developer will be how to adapt an existing application to make use of the technology to achieve enhanced performance. In this session, I will start with a normalized database, and adapt the logical and physical database model/implementation in several manners, performance testing the tables and code changes along the way.&nbsp;"</P>
My plan is to start with a moderately straightforward table design and implement a set of scripts that create a bunch of data. Starting out with normal on-disk tables using both ad hoc calls and stored procedures, I will morph the design, first changing some of the tables to in-memory, then changing the procedures to use in-memory objects wherever possible, testing any changes we need to make to the physical structures along the way. Another aspect of the process will be errant data. Part of the experiment will be to have a smattering of bad data to simulate real live programmers and users doing real live stuff.
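Since the plan ends with natively compiled procedures, it is worth showing the shell those take in SQL Server 2014. This is a generic sketch (the procedure and columns are placeholders, not part of the planned design), mostly to show the required options:

CREATE PROCEDURE Customers.Customer$Get
(
    @customerId INT
)
WITH NATIVE_COMPILATION, SCHEMABINDING, EXECUTE AS OWNER
AS
BEGIN ATOMIC WITH (TRANSACTION ISOLATION LEVEL = SNAPSHOT, LANGUAGE = N'us_english')
    --every table referenced from a natively compiled procedure must be memory-optimized
    SELECT CustomerId, FirstName, LastName
    FROM   Customers.Customer
    WHERE  CustomerId = @customerId;
END;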
I think I know the main content already, but I expect to pick up a lot of knowledge between now and then (which I will attribute, naturally!). I have at least 5 months before I would need to present it, so who knows if I will change my mind. Along the way, I will blog semi-frequently about the progress, including what I have learned, code samples, philosophies, etc.
Chapter 7–Enforced Data Protection
http://sqlblog.com/blogs/louis_davidson/archive/2011/06/21/chapter-7-enforced-data-protection.aspx
Tue, 21 Jun 2011 04:36:13 GMT (drsql)

As the book progresses, I find myself veering from the original stated outline quite a bit, because as I teach about this more (and I am teaching a daylong db design class in August at http://www.sqlsolstice.com/ ... shameless plug, but it is on topic :) I start to find that a given order works better. Originally I had slated myself to talk more about modeling here for three chapters, then get back to the more implementation-oriented topics to finish out the book, but now I am going to keep plugging through the implementation tasks, then finish up with the modeling tasks (which I hope I might end up getting some help with... emails are going out once I talk it over with my editor).

In the last edition, the chapter on data protection was more inclusive, covering programmatic data protection, including client code and stored procedures. But, keeping with the basic, implementation-type chapters (and trying my best to shorten chapters to more realistic chunks; the free chapter shouldn't be 1/2 of the book, or so I am told), I will put that off to probably the final chapter.

This chapter is broken up into two main sections, check constraints and triggers. I will demonstrate the following scenarios, and if you see anything missing, please do make suggestions.

Check Constraints

- Simple value checks – like when you want to make sure there are no blank string values: CHECK (LEN(value) > 0)
- Value reasonableness checks – like when a value should be a non-negative integer: CHECK (value >= 0)
- Checks using different tables – like setting up a data-driven format check

Triggers – broken down by AFTER and INSTEAD OF triggers

AFTER:
- Range checks on multiple rows – like when you want to make sure that the sum of rows related to (and including) the newly inserted rows is > 0
- Maintaining summary values (only as necessary) – denormalization, pure and simple, but if you are going to do it, triggers are the way to go (you really shouldn't)
- Cascading inserts – like creating child rows to ensure a one-to-at-least-one relationship is met, or creating a parent
- Child-to-parent cascades – like deleting parent rows when the last child row is deleted
- Maintaining an audit trail – also something that will come up in security: implementing an audit trail of actions on a table. Less needed these days with the built-in auditing features.
- Relationships that span databases and servers – sometimes you just have to implement RI between databases, so it is back to 6.0-style RI

INSTEAD OF:
- Automatically maintaining values – for example, if you want to implement a bulletproof rowLastUpdatedTime column to know when the row last changed, but don't trust the client (who does?)
- Formatting user input – like formatting words in all caps, or proper case. Another thing that might be better done outside of SQL Server, but it is very straightforward to implement
- Redirecting invalid data to an exception table – for example, eliminating data that is outside of the norm. Possibly done better outside of SQL Server code, but if you really want to build something that takes previous data into consideration, this might be a reasonable way
- Forcing no action to be performed on a table, even by someone who technically has proper rights – a simple do-nothing trigger that works because in an INSTEAD OF trigger you have to replicate the action, so you don't

It might seem weird to consider formatting data or redirecting invalid data to another table as data protection, but the point of data protection is to make sure that the data ends up in a reasonable state, and triggers can do some "magical"-seeming stuff. Admittedly, triggers are not a fan favorite with many programmers, because they do magical stuff that the programmer cannot directly control, but in many ways that is the point. If the dev forgets to update the last update date, the ETL may not see the row, and oops, your data is out of sync.

In any case, I do my best to make it clear that you don't in fact have to do any of this, but here are the tools in the tool bag.
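To make the "automatically maintaining values" scenario concrete, here is a minimal sketch of that bulletproof rowLastUpdatedTime pattern (the table and columns are hypothetical stand-ins, not the book's code): the INSTEAD OF trigger replays the update itself and stamps the time, overriding whatever the client sent.

CREATE TRIGGER Sales.SalesOrder$InsteadOfUpdate
ON Sales.SalesOrder
INSTEAD OF UPDATE
AS
BEGIN
    SET NOCOUNT ON;

    --perform the update the caller asked for, but stamp the time ourselves
    UPDATE s
    SET    SalesOrderStatusCode = i.SalesOrderStatusCode,
           rowLastUpdatedTime = SYSDATETIME() --any client-supplied value is ignored
    FROM   Sales.SalesOrder AS s
           JOIN inserted AS i
               ON i.SalesOrderId = s.SalesOrderId;
END;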
See you in Columbus Saturday?
http://sqlblog.com/blogs/louis_davidson/archive/2011/06/06/see-you-in-columbus-saturday.aspx
Mon, 06 Jun 2011 04:32:45 GMT (drsql)

Assuming all goes as planned, I will be in Columbus, OH this Friday night and Saturday for SQL Saturday 75 (http://www.sqlsaturday.com/75/eventhome.aspx). I really love SQL Saturday events the best of all of the events because they are very intimate in nature. As a fairly antisocial person, I sometimes get overwhelmed by the size of other events; even the SQL Rally was just barely in my comfort range. Here the number of people and size of rooms just feels like home, like you are shooting the breeze with a group of friends.

My session will be at 9:00 AM (full schedule: http://www.sqlsaturday.com/75/schedule.aspx), so don't be late!

Characteristics of a Great Relational Database
When queried, most database professionals would mention "normalized" as one of the most important characteristics that tell the difference between a good and bad database design. I won't disagree in the least, but there is so much more to be considered. Even if you did a great job of normalization, poor naming, poorly implemented keys, too many or too few indexes, and so on can derail your design. In this session I will present seven primary characteristics of a design that differentiate between an ugly design that will have your colleagues nitpicking you to death and one that will have them singing your praises. Characteristics such as comprehensible, documented, secure, well performing, and more (including normalized, naturally) will be discussed.

It is the second time I will do this presentation, and the first time where I can see the faces of the recipients, so it will be nice to gauge how people like it. It is a lot of fun actually, though no matter what I am talking about, I want to talk more about normalization, which I believe is the key to improving the databases of the future. So many of the sessions that are given at these things are geared towards systems that are already screwed up and limping along, and I really want to evangelize the merits of doing it right. That is where this presentation fits in: the time period between design and performance tuning, where you determine the future work that is done with the system. Is it well performing, understandable, easy to maintain and use? Or does it take a crew of ten thousand DBAs doing nothing but putting their fingers in the leaks to keep the thing running?

And don't forget all of those BI sessions too... The better you do with the relational database, the easier the dimensional designers/implementers have it too.

Anyhow, I hope to see you all there (well, not all of you, just the SQL nerds who are reading this; the history buffs who are still wondering why we are going to be inside the founder of our continent this weekend, well, I feel sorry for you).

Normalization and How to Know When You Are Done… The short version…
http://sqlblog.com/blogs/louis_davidson/archive/2011/05/29/normalization-and-how-to-know-when-you-are-done-the-short-version.aspx
Sun, 29 May 2011 20:54:15 GMT (drsql)

A while back, I was working on a short article about normalization for a book that never got published (admittedly I wasn't getting paid for the article, and it wasn't for charity, so I wasn't that broken up over it). The task at hand was to, in 2 pages or less, describe the process of normalization and help you to know when you have finished. In my upcoming book Pro SQL Server 2000 + N (where N > 10) Relational Database Design and Implementation, it takes about 45 pages. So it wasn't really a realistic task, especially considering I have spent about a full paragraph letting you know how hard the task is going to be. The most important thing that is missing from this short introduction is examples, which I include in the book in truckloads.

There are two distinct ways that normalization is approached. In the very formal manner, there is a progressive set of "rules" that specify "forms" you are working to achieve. There is nothing wrong with that definition, but progressing through the forms in a stepwise manner is certainly not how any seasoned data architect is likely to approach the problem of designing data storage. Instead, you design with the principles of normalization in mind, and use the normal forms as a test of your design.

The problem with getting a great database design is compounded by how natural the process seems. The first database that the past, uneducated version of me built had 10+ tables, all of the obvious ones like customer, orders, etc., set up so the user interface could be produced to satisfy the client. However, tables for addresses and even order items were left as part of the main tables, making it a beast to work with for queries, and as my employer wanted more and more out of the system, the design became more and more taxed. The basics were there, but the internals were all wrong, and the design could have used about 50 or so tables to flesh out the correct solution.
Soon after (at my next company, sorry Terry), I gained a real education in the basics of database design, and the little 1000-watt halogen light bulb went off...

That light bulb went off because the design that had looked overly complicated in my college database class, the one no normal person would have created (bet you can't guess what my grade was in that class!), was really there to help my design fit in with the tools I was using. It turns out that the people who create relational database engines use the same concepts of normalization to guide how the engine is created that I needed to use for a database to work well. So if the relational engine vendors are using a set of concepts to guide how they create the engine, it turns out to be actually quite helpful if you follow along.

First, let's look at the "formal" rules. The normalization rules are stated in terms of "forms", starting at First Normal Form and including several others, some of which are numbered, some named for the creators of the rule. (Note that in the strictest terms, to be in a greater form, you ought to also conform to the lesser forms; so you can't be in Third Normal Form and not live up to the definition of the First.) To be honest, it is rare that a data architect will actually refer to the normal forms specifically in conversation, unless they are having a nerd argument with a developer who is trying to design an entire customer relationship management system in a single table, but understanding the basics of normalization is essential to understanding why it is needed. What follows is a very quick restatement of the normal forms:

- First Normal Form / definition of a table – attribute and row "shape":
  - All columns must be atomic: one value per column
  - All rows of a table must contain the same number of values: no arrays
  - Each row should be different from all other rows in the table: unique rows
- Boyce-Codd Normal Form – every candidate key is identified, all attributes are fully dependent on a key, and all columns must identify a fact about a key and nothing but a key. Encompasses:
  - Second Normal Form – all attributes must be a fact about the entire primary key and not a subset of the primary key
  - Third Normal Form – all attributes must be a fact about the primary key and nothing but the primary key
- Fourth Normal Form – there must not be more than one multivalued dependency represented in the entity. That is to say, every attribute relates to the key with a cardinality of one. Not a common rule to violate, but it definitely does occur.
- Fifth Normal Form – all relationships are broken down to binary relationships when the decomposition is lossless. Very rarely violated in typical designs.

There are other, more theoretical forms that I won't mention, and you will rarely even encounter their definitions.
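Since this short version lacks the book's examples, one tiny illustration of the first rule may help (the tables here are hypothetical, not from the book). A column holding a list of values violates "one column, one value"; the fix is to make each value a row:

--violates First Normal Form: PhoneNumbers might hold '615-555-1234,615-555-9876'
CREATE TABLE dbo.Customer
(
    CustomerId INT NOT NULL PRIMARY KEY,
    PhoneNumbers VARCHAR(200) NULL
);

--normalized: one phone number per row, and the rows are unique
CREATE TABLE dbo.CustomerPhone
(
    CustomerId INT NOT NULL,
    PhoneNumber VARCHAR(20) NOT NULL,
    PRIMARY KEY (CustomerId, PhoneNumber)
);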
In the reality of the development life cycle, the stated rules are not hard-and-fast rules, but merely guiding principles that can be useful to help you avoid certain pitfalls. In practice, we end up with denormalization (meaning purposely violating a normalization principle for a stated, understood purpose, not ignoring the rules to get done faster), mostly to satisfy some programming or performance need of the consumers of the data (programmers, queriers, etc.).

Once you deeply "get" the concepts of normalization, you really will find that you build a database like a well-thought-out Lego creation, deciding how each piece will fit into the creation before putting pieces together, because disassembling 1000 Lego bricks to make a small change makes Legos more like work than fun. Some rebuilding to keep things agile can be needed, but the more you plan ahead, the less data you will have to reshuffle.

In actual practice, the formal definitions of the rules aren't thought of at all, but the guiding principles they encompass are. In my mind, I use the following four concepts to guide the database I am building, falling back to the more specific rules for the really annoying/complex problems I am trying to solve:

- Columns – one column, one value
- Table/row uniqueness – tables have independent meaning; rows are distinct from one another
- Proper relationships between columns – columns either are a key or describe something about the row identified by the key
- Scrutinize dependencies – make sure relationships between three values or tables are correct. Reduce all relationships to binary relationships if possible.

The question in the title has yet to be conquered: how do you know when you are done? What I left out of the description of normalization was the granularity you go for. The word "atomic" is a common way to describe a table or column that is normalized enough. Atomic would tend to indicate something that is broken down to its absolute lowest form. But if you are a nerd (and would you really be reading this if you weren't?), you know that there are lots of particles smaller than an atom, and when you try to mess with particles smaller than the atom, you get a mushroom cloud that even Timothy Leary would not have approved of.

It is the same way with databases. Tables and columns split to their atomic level have one and only one meaning. Deal with them at a higher level, and you will suffer with lots of substrings and switch attributes to figure out what a table means in a given situation. But break things down too far, and you will suffer even more. My best example of this is a column that holds a large quantity of text. If you never need to use part of the data using SQL, a single column is perfect (a set of notes that the user uses on a screen is a good example); you wouldn't want a paragraph, sentence, and character table to store this information. On the other hand, that same character column is abused when the users start putting coded information in it (because users WILL find a way to work if your software fails them). Then, when you have to search for that information, you will need to begin working with the less comfortable string manipulation functions in SQL... And just try to index a part of a large text column. Possible? Sometimes. Best way to go? Never.

The key to knowing what is normalization and what is an academic exercise for a nerd is to understand the needs of the users (commonly referred to as requirements, as in "Why don't we ever have good requirements before we code!?!").
If it is clear that the user is planning to maintain a list of values and will need to update them programmatically, then it is your job to make each value a row in a table. But if there is no requirement to ever search on a value in that list or programmatically access part of the value, then it might be overkill to do anything other than leave the value alone. It is often best to err on the side of caution, but the ideal relational storage for a document would then be minimally at the word/punctuation level; if you have read this far and are convinced that would be the proper solution, then you need to get a complete book or take a class on the subject before you start creating a relational database.

The reasonable answer to when you are done normalizing is when users have exactly the right number of places to store the data they need and you can query/use the data without parsing it... Easy enough, until the user changes their mind, huh?

Article: An Example of Test-Driven Development
http://sqlblog.com/blogs/andy_leonard/archive/2010/10/29/article-an-example-of-test-driven-development.aspx
Fri, 29 Oct 2010 13:00:00 GMT (andyleonard)

Today, SQLServerCentral (http://www.sqlservercentral.com/) republished the first article in my Test-Driven Database Development series: An Example of Test-Driven Database Development (http://www.sqlservercentral.com/articles/Testing/66553/). (Thanks SSC!)