Tag Archive for Modeling

Robert L. Davis (b|t) is hosting this month’s T-SQL Tuesday blog party. The party was started by Adam Machanic (b|t) in December of 2009.

This month’s T-SQL Tuesday topic is Data Change. This is an important topic for me, since I’m about to create our first Data Warehouse at work. I’ve chosen to champion the companies who are using the Standard Edition of SQL Server.

Crestfallen

I took a class on Change Data Capture this past year at the SQL Saturday in Portland, Oregon. I absolutely loved it. I couldn’t wait to implement it. Then I found out it was an Enterprise Level feature only. I was crestfallen.

I went back to work and asked if we could consider using the Enterprise Edition of SQL Server. I was told that we have too many cores on our Nutanix hardware to make the features we would use worth the cost. You see, SQL Server pricing is based on the number of cores the underlying hardware has, not the number of cores you utilize in your VM. So what is a girl to do? (And guys, I haven’t forgotten you either.)

Since the Change Data Capture feature is not available to us Standard Edition users, let me take you through another pattern that uses auditing columns on each table.

The Basis of The Pattern I Use

I use a pattern that includes four fields on all transactional tables. This (absolutely) includes lookup tables too. The two table types that are an exception to this pattern are audit tables and error tables. I’ll cover why later in this article.

The four fields are CreatedOn, CreatedBy, UpdatedOn, and UpdatedBy. The dates should be DateTime2. CreatedOn is the easiest to populate: you can create a default on the field so that it is populated with GetDate().

The other fields can be a little more tricky depending on your environment. There are two patterns for populating CreatedBy as well as two patterns for populating UpdatedBy and UpdatedOn. Below you will find the pros and cons of each.

Pattern One For CreatedBy

The first pattern uses the SYSTEM_USER function to populate the CreatedBy field. SYSTEM_USER returns the login name of the current connection. With Windows authentication, that is the Windows login (DOMAIN\UserName) that the user used to log in to the application.

Pros

This pattern allows the use of a default on the CreatedBy field, so the field is not dependent on code to populate it.

All users have a validated account.

Accounts that are used by SSIS or services don’t need to have a row in the User table.

Cons

The field will be a varchar or an nvarchar.

The CreatedBy field will need to be parsed to find out which user inserted the record for reporting. (One way around this is to remove the domain before inserting the (n)varchar.)

This is my preferred technique since you don’t need to rely on the User table being immediately populated for a user to start using an application. It also makes it easier when the UpdatedBy field is updated through a pattern below.
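As a rough sketch of pattern one in T-SQL: the table below reuses the OrderStatusType name from the trigger later in this post, but the OrderStatusTypeName column, the constraint names, and the column sizes are assumptions made for illustration, not the schema actually in use.

```sql
-- Hypothetical lookup table. Only the four audit field names come from the post;
-- the other column, the constraint names, and all sizes are assumptions.
CREATE TABLE dbo.OrderStatusType
(
    OrderStatusTypeID   int           IDENTITY(1,1) NOT NULL
        CONSTRAINT PK_OrderStatusType PRIMARY KEY,
    OrderStatusTypeName varchar(50)   NOT NULL,
    CreatedOn           datetime2(3)  NOT NULL
        CONSTRAINT DF_OrderStatusType_CreatedOn DEFAULT (GETDATE()),
    CreatedBy           nvarchar(128) NOT NULL
        CONSTRAINT DF_OrderStatusType_CreatedBy DEFAULT (SYSTEM_USER),
    UpdatedOn           datetime2(3)  NULL,
    UpdatedBy           nvarchar(128) NULL
);
GO

-- The insert names only the business column; the defaults fill in CreatedOn and CreatedBy.
INSERT INTO dbo.OrderStatusType (OrderStatusTypeName)
VALUES ('Pending');

-- One way to drop the domain from DOMAIN\UserName for reporting.
-- When there is no backslash, CHARINDEX returns 0 and the whole name is kept.
SELECT OrderStatusTypeName,
       SUBSTRING(CreatedBy, CHARINDEX('\', CreatedBy) + 1, 128) AS CreatedByUser
FROM dbo.OrderStatusType;
```

The same SUBSTRING expression could be applied to SYSTEM_USER inside the default if you would rather store the name without the domain.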

Pattern Two for CreatedBy

This technique is based on an existing User table and uses the user’s ID for the value of CreatedBy.

Pros

This pattern allows the CreatedBy field to join to the User table through an integer. Information about the user is easy to retrieve through that join, and the integer takes less space than the varchar/nvarchar mentioned above.

CreatedBy does not rely on Windows authentication. This is helpful when not all users of the application have a Windows login account.

Cons

This pattern requires rows in the User table for automated tasks, because SSIS packages and services that insert data automatically need a user ID too.

This pattern requires CreatedBy to be populated by the application instead of by a default value. This requires working with the AppDev team and getting the AppDev team to write some additional code for you. (If you need to resort to bribing them, I suggest pizza, donuts, chocolate, or beer.)

Now let’s talk about UpdatedOn and UpdatedBy. They are a little more tricky to populate since they are populated on an update.

Pattern One For UpdatedBy and UpdatedOn

This pattern requires code to populate the fields, since defaults are not applied when a row is updated. As stated above, you’ll have to work with the AppDev team.

Note: This is not a bad thing. I wholeheartedly believe in having a good relationship with the AppDev team. It makes it easier to work with the team when they have the ability to write SQL. It also has the benefit of being included earlier in design sessions. (I personally have been blessed with a great AppDev lead.)

Pros

This pattern can be used to populate both the UpdatedBy field and the UpdatedOn field using similar code.

Either User IDs or Windows accounts can be used to populate UpdatedBy.

Cons

The pattern is dependent on the AppDev team to remember to include UpdatedBy and UpdatedOn for every updated row, especially when there are tight deadlines.

You need to be vigilant that the fields are always updated, especially with lookup tables.

This is my preferred technique since it doesn’t involve triggers, which we’ll see in pattern two below.
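For illustration, an update written by the application under this pattern might look like the sketch below. The table, the OrderStatusTypeName column, and the variable names are hypothetical. In practice the values would arrive as parameters from the application.

```sql
-- Hypothetical values that the application would pass in as parameters.
DECLARE @OrderStatusTypeID int = 1;
DECLARE @NewName varchar(50) = 'Shipped';

UPDATE dbo.OrderStatusType
SET OrderStatusTypeName = @NewName,
    UpdatedOn = GETDATE(),
    UpdatedBy = SYSTEM_USER   -- or the user's ID when UpdatedBy is an integer
WHERE OrderStatusTypeID = @OrderStatusTypeID;
```

The point of the sketch is that every UPDATE statement has to carry the two extra assignments, which is why the cons above call for vigilance.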

Pattern Two For UpdatedBy and UpdatedOn

This pattern uses the AFTER UPDATE trigger. I chose the AFTER UPDATE trigger because it fires only after the triggering statement has completed, including its constraint checks and cascading changes.

Note: I do not encourage multiple tables to be updated in a single trigger. (Hmm. This sounds like a good future post.)

Pros

This can be used as a stopgap while the AppDev team refactors their code to populate UpdatedBy and UpdatedOn fields as well as CreatedBy.

There is no requirement to rely on the application code to populate these fields.

Cons

UpdatedBy can’t be populated with the user’s ID from the User table.

You might not be allowed to use triggers. (Note: Triggers are not bad when used correctly.)

If you are adding this pattern to an existing trigger declared as “AFTER INSERT, UPDATE, DELETE”, then make sure that the trigger was fired by an update before changing the fields. You can check this by looking for rows in both the inserted and deleted tables that are available inside triggers: an insert only populates inserted, a delete only populates deleted, and an update populates both.

CREATE TRIGGER dbo.Trg_OrderStatusType_iud ON dbo.OrderStatusType
AFTER INSERT, UPDATE, DELETE
AS
BEGIN
    SET NOCOUNT ON;

    -- Rows in both inserted and deleted mean the trigger was fired by an UPDATE.
    IF EXISTS (SELECT 1 FROM inserted) AND EXISTS (SELECT 1 FROM deleted)
    BEGIN
        UPDATE ost
        SET UpdatedOn = ISNULL(i.UpdatedOn, GETDATE()) -- If a NULL was used, it is overwritten with the current date.
        FROM dbo.OrderStatusType AS ost
        INNER JOIN inserted AS i ON ost.OrderStatusTypeID = i.OrderStatusTypeID;
    END
END
GO

Exceptions For Audit and Error Tables

These two types of tables are usually populated by inserts only. Most of the time the CreatedOn field is the only field that needs to be added. I would change the name to AuditedOn and ErroredOn respectively since the CreatedOn will most likely represent the creation of the original record.

Why are Lookup tables not in this list? They can be modified over time. Someone may need a value changed or the value needs to be discontinued. The row can’t be deleted when it is discontinued since that would break referential integrity.

In Summary

By using CreatedOn and UpdatedOn you can compare dates when data needs to be processed without having to compare the entire table.

One last thing on the UpdatedBy and UpdatedOn fields. For a long time I insisted that these two fields be NULL during an insert, but I have since changed my mind. By setting UpdatedBy and UpdatedOn to the same values as CreatedBy and CreatedOn for an insert, you won’t have to deal with the ISNULL function or fall back on CreatedOn for (some) comparisons.
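To make the comparison concrete, here is a minimal sketch of picking up only the rows that changed since the last run. The @LastProcessedOn variable and its value are hypothetical. A real load would read the value from wherever it records its last run.

```sql
-- Hypothetical high-water mark left behind by the previous load.
DECLARE @LastProcessedOn datetime2(3) = '2015-06-01T00:00:00';

-- With UpdatedOn populated on insert, one predicate finds both new and changed rows.
SELECT OrderStatusTypeID, OrderStatusTypeName, UpdatedOn
FROM dbo.OrderStatusType
WHERE UpdatedOn > @LastProcessedOn;

-- If UpdatedOn stayed NULL until the first update, the same check would need:
--   WHERE ISNULL(UpdatedOn, CreatedOn) > @LastProcessedOn
```

Wrapping the column in ISNULL also tends to prevent an index on UpdatedOn from being used for a seek, which is another reason to prefer the first form.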

Thanks For All The Fish

I want to thank Robert L. Davis for hosting this month and I look forward to participating in future T-SQL Tuesday blog parties.

The Fourth Mission

In this installment of SQLCoOp, we are sharing our views on something we’ve learned this month. While I learned quite a bit this month in order to create a new SQLite Database with test data that mapped back to my SQL Server database, I will limit this post to how my first PowerShell script aided me in creating a new SQLite Database from Embarcadero’s ER/Studio Data Architect.

The Project

I’ve been working on a project recently where we have a SQL Server backend on our server and SQLite databases on our client machines. The two databases have quite a few tables in common, but the two database platforms don’t share the same toolsets. Here are the hurdles I had to overcome.

The SQL language that SQLite uses is similar to T-SQL, but not the same. This means scripting tables, constraints, and other SQL objects is slightly different.

I use Embarcadero’s ER/Studio Data Architect (DA) to do all of my data modeling, but there isn’t a driver that will work with DA to compare my database model to a SQLite database so that I can keep them in sync.

I needed to create test data in both databases, but I don’t have an environment where the databases can coexist so that I can write the proper insert scripts.

In this post, I’ll focus on the first two hurdles. The third hurdle was solved using Red Gate’s SQL DataGenerator, Python, and SSIS. I will save this story for another time since it is worthy of a blog post all on its own.

Hurdle 1: Understanding SQLite

The first thing I had to learn was the ins and outs of SQLite. I acquired a great book, Using SQLite, and started reading. I also searched the depths of the internet. What I found was that SQLite likes to store almost everything as integer, numeric, and text, BUT you can still declare the data types that we are familiar with, like varchar, datetime, etc. They just get “interpreted” as integer, numeric, or text depending on the declared type. This is a good reference for understanding how the more familiar data types are actually stored in the SQLite database.

When I was first learning about SQLite, I mistook the storage classes for actual data types. I thought SQLite was going to use a type of implicit conversion, but SQLite is not the same as SQL Server. There is also a big benefit to using the more familiar data types instead of sticking to the storage classes that SQLite uses. The benefit has to do with the ADO.NET class and making life easier for your application developer who has to take data from SQL Server and get it into SQLite.
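A small SQLite experiment shows how declared types map to what is stored. The AffinityDemo table and its columns are made up for this sketch. typeof() is a built-in SQLite function that reports the storage class of a value.

```sql
-- Hypothetical table using familiar SQL Server style type names.
CREATE TABLE AffinityDemo (
    Name      varchar(50),   -- TEXT affinity: the declared type contains "CHAR"
    Quantity  int,           -- INTEGER affinity: the declared type contains "INT"
    CreatedOn datetime       -- NUMERIC affinity: no other affinity rule matches
);

-- Every value is supplied as text on purpose.
INSERT INTO AffinityDemo (Name, Quantity, CreatedOn)
VALUES ('Widget', '42', '2015-06-01 08:30:00');

SELECT typeof(Name), typeof(Quantity), typeof(CreatedOn)
FROM AffinityDemo;
-- text | integer | text
```

The string '42' is converted to an integer because of the column's affinity, while the date string cannot be read as a number and so stays text, even though the column was declared as datetime.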

NOTE: You REALLY need to understand how dates work in SQLite. How the date and time are stored in the underlying database (string, integer, or numeric) determines what “zero” is equivalent to. Read this to find out more.
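As a short illustration of why “zero” depends on the storage form, the queries below use only SQLite’s built-in date functions. The column aliases are invented for the example.

```sql
-- The same instant in the three forms SQLite's date functions understand.
SELECT '1970-01-01 00:00:00'                                  AS StoredAsText,
       CAST(strftime('%s', '1970-01-01 00:00:00') AS integer) AS StoredAsInteger, -- 0
       julianday('1970-01-01 00:00:00')                       AS StoredAsReal;    -- 2440587.5

-- An integer 0 means 1970-01-01 only when it is read as Unix time.
SELECT datetime(0, 'unixepoch');   -- 1970-01-01 00:00:00

-- A bare number is read as a Julian day number, whose day zero is
-- noon on November 24, 4714 B.C., so 0 would not be the Unix epoch here.
SELECT datetime(2440587.5);        -- 1970-01-01 00:00:00
```

In other words, a zero stored as Unix time and a zero stored as a Julian day are thousands of years apart, so the application and the database need to agree on one form.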

There are several software packages out in the world that will allow you to get visual data models of SQLite tables, but I already use Embarcadero’s ER/Studio Data Architect (DA) and I didn’t want to learn another one. I also didn’t want to have my SQL Server data models in one software package and my SQLite data models in a different one. So I had to figure out how to get around this hurdle…and I used PowerShell to do it.

The first thing I did was to create my data models. My first SQLite data model was based on my initial research, which implied I should store all my data in integer and text data types since that is what the engine will be using to store the data. This was a poor assumption on my part. Since I had both my SQL Server and my SQLite database models in DA, I was able to easily update my data types in my SQLite database with their SQL Server counterparts. I did have to inspect each field manually since not all attributes were the same between the two databases. For instance, I had to use CURRENT_TIMESTAMP instead of getdate() and I had to model auto incrementing columns differently.

Note: This is a sample database

SQL Server database

First iteration of my SQLite database

NOTE: I was able to use the Compare and Merge Utility by selecting the option to compare against another model.

Second iteration of my SQLite database

Making a SQLite Script from ER/Studio Data Architect

Before we go down this rabbit hole, there is another piece of information that you need to know about SQLite. You can’t add foreign key constraints after the table is created. This is very frustrating for me since I know of some existing tables that need foreign key constraints and it will be a bigger task than it would be in SQL Server to get foreign key constraints applied. For this project, it means I need to script the foreign keys as well as the primary keys in the table creation script.
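A minimal sketch of what that looks like in SQLite follows. EmployeeAward is the table name from the sample model, while the Employee table, every column, and the constraint name are assumptions for illustration.

```sql
-- SQLite does not enforce foreign keys unless this is switched on for the connection.
PRAGMA foreign_keys = ON;

-- Hypothetical parent table.
CREATE TABLE Employee (
    EmployeeID   integer PRIMARY KEY AUTOINCREMENT,
    EmployeeName varchar(100) NOT NULL
);

-- The foreign key has to be part of CREATE TABLE because
-- SQLite's ALTER TABLE cannot add a constraint afterwards.
CREATE TABLE EmployeeAward (
    EmployeeAwardID integer PRIMARY KEY AUTOINCREMENT,
    EmployeeID      integer NOT NULL,
    AwardName       varchar(100) NOT NULL,
    AwardedOn       datetime NOT NULL DEFAULT CURRENT_TIMESTAMP,
    CONSTRAINT FK_EmployeeAward_Employee
        FOREIGN KEY (EmployeeID) REFERENCES Employee (EmployeeID)
);
```

Adding a foreign key to an existing table generally means creating a new table with the constraint, copying the data across, and swapping the tables, which is why it is a bigger task than in SQL Server.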

There are two operations that need to be done in order to generate a SQLite script from DA. The first is to use the Generate Database Wizard to generate a whole database instead of using the Compare and Merge Utility to compare the database model with the actual database. This operation is outlined below in steps 1 through 6. The second operation is to use a simple PowerShell script to make the necessary changes to the script so that it will no longer be a T-SQL script made to run in SQL Server, but a SQL script ready to run in SQLite. This is outlined in step 7 below.

Here are the steps you’ll need to do:

Step 1

Make sure that none of the auto incrementing fields are set up as primary keys in the database model. The reason for this is that in SQLite every auto incrementing field is a primary key, so the syntax for creating an auto incrementing field is very specific.

Take a look at the EmployeeAward table in DM SQLite above. Its auto incrementing field is set up as an identity column, but not as a primary key. In order for me to have a visual cue that the field is a primary key, I used an object called a Domain. Domains are templates that can be used throughout the database model so that data types, defaults, and even definitions remain consistent and can easily be changed throughout an entire database model or across all database models if the Repository feature of DA is being used.

Step 2

Right click the physical model that you will be working with, and select Generate Database. You will be walked through a wizard. At the end you are going to want to save your settings so that you don’t have to select them again.

Step 3

On Page 1 of the wizard, you can create individual scripts or a single long script. I’ve been doing both so that I can select a single table later on if I need to regenerate it. You will also need to select the folder that you would like the scripts to be created in. I created 3 folders for my project.

MultiScripts (This will hold scripts for each object separately)

SingleScript (This will hold a single script that contains all the objects.)

Standardize (This holds the PowerShell script called StandardizeSQL.ps1 which will be explained in step 7.)

Step 3 (continued)

I don’t do anything on page 2 of the wizard since this script will not run in SQL Server.

Step 4

Page 3 of the wizard is where the fun starts. Set the following options for the tables you want to include in the script. These options are found on the General Options tab. All the other tabs don’t need to have their options modified…unless you used features like sparse columns that can’t be used in SQLite. Then you might need to fiddle with some of the other settings.

Step 5

Save your settings on page 4 of the wizard so that you can quickly run through the wizard next time.

Step 6

If you are asked to open your newly created script, then say no. We’re not done yet.

Step 7

Run the following PowerShell script. You can do this by right clicking the file containing the PowerShell script and selecting Run with PowerShell. Note: The ability to run PowerShell scripts needs to be enabled first.

The script will go through all the *.sql files in the indicated folder and change the T-SQL specific syntax to the SQLite syntax. Here are the syntactical changes.

Int will be changed to integer.

Identity columns will be changed to primary key autoincrement columns.

The words nonclustered and clustered will be removed from primary and foreign key constraints since all indexes are nonclustered and those keywords are not supported in SQLite.

Before you run the script below, you’ll need to change the path to your scripts on line 8.

Note: there is a space before the asterisk on line 8 and there are spaces on either side of “int” on line 12.
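To illustrate the effect of the three substitutions (this is not the PowerShell script itself), here is a hypothetical before and after. The table and column names are invented, and the exact text that DA generates may differ from the commented T-SQL shown here.

```sql
-- T-SQL roughly as a modeling tool might generate it (assumed input, shown as comments):
--   CREATE TABLE AwardType(
--       AwardTypeID   int IDENTITY(1,1),
--       AwardTypeName varchar(50) NOT NULL
--   );
--   CREATE TABLE EmployeeAwardType(
--       EmployeeID  int NOT NULL,
--       AwardTypeID int NOT NULL,
--       CONSTRAINT PK_EmployeeAwardType PRIMARY KEY NONCLUSTERED (EmployeeID, AwardTypeID)
--   );

-- The same tables after the substitutions, in SQLite syntax:
CREATE TABLE AwardType(
    AwardTypeID   integer PRIMARY KEY AUTOINCREMENT,  -- int IDENTITY(1,1) became this
    AwardTypeName varchar(50) NOT NULL
);
CREATE TABLE EmployeeAwardType(
    EmployeeID  integer NOT NULL,                     -- int became integer
    AwardTypeID integer NOT NULL,
    CONSTRAINT PK_EmployeeAwardType PRIMARY KEY (EmployeeID, AwardTypeID)  -- NONCLUSTERED removed
);
```

The change from int to integer matters for more than style: SQLite only accepts AUTOINCREMENT on a column declared as INTEGER PRIMARY KEY.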

To follow our quest for SQL knowledge through this collaborative project, follow the #SQLCoOp tag on Twitter.

See you next time!!

On a SQL Collaboration Quest

Four SQL professionals gathered from the four corners of the world to share their SQL knowledge with each other and with their readers: Mickey Stuewe from California, USA, Chris Yates from Kentucky, USA, Julie Koesmarno from Canberra, Australia, and Jeffrey Verheul from Rotterdam, The Netherlands. They invite you to join them on their quest as they ask each other questions and seek out the answers in this collaborative blog series. Along the way, they will also include other SQL professionals to join in the collaboration.

As I was hanging our family Advent Calendar up, I thought I would make one for you, my readers. I’ll be sharing my scripts from two of my favorite products as well as tips on how to create your own.

Second Step Is Creating The Change Tracking Fields

Today I’ll share with you my macro called Add Change Tracking Attributes. This script will add the attributes InsertedBy, InsertedDTS, ModifiedBy, and ModifiedDTS to the selected entities. These attributes are based on the domains that were created by the macro in yesterday’s post found here. If you add these fields, and then decide you don’t want them, you can use my macro called Remove Attribute From Selected Tables, which I blogged about here to remove them.

Take a look at lines 81 and 82. These two lines are needed to be able to give the default a name and here is why: This macro creates attributes based off of domains. The Inserted attributes have defaults. If we don’t name the defaults, then SQL Server will provide names for them…and they are not pretty. In order for us to be able to override the domain, we have to set the enableoverride property for two properties. Then we’ll be able to provide a name to the default. This step can’t be done in the domain, because defaults are a type of constraint and all constraints need to have unique names.
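To show why the names matter on the SQL Server side, here is a sketch of a table with explicitly named defaults. Only the four change tracking column names come from this post. The Widget table, the data types, the default expressions, and the DF_ naming convention are assumptions.

```sql
-- Hypothetical table with explicitly named default constraints.
CREATE TABLE dbo.Widget
(
    WidgetID    int           IDENTITY(1,1) NOT NULL
        CONSTRAINT PK_Widget PRIMARY KEY,
    InsertedBy  nvarchar(128) NOT NULL
        CONSTRAINT DF_Widget_InsertedBy DEFAULT (SYSTEM_USER),
    InsertedDTS datetime2(3)  NOT NULL
        CONSTRAINT DF_Widget_InsertedDTS DEFAULT (GETDATE()),
    ModifiedBy  nvarchar(128) NULL,
    ModifiedDTS datetime2(3)  NULL
);
GO

-- List the default constraint names on the table. Without the CONSTRAINT clauses,
-- SQL Server generates names that start with DF__ followed by truncated table and
-- column names and a hexadecimal suffix.
SELECT dc.name AS DefaultName, c.name AS ColumnName
FROM sys.default_constraints AS dc
INNER JOIN sys.columns AS c
    ON c.object_id = dc.parent_object_id
   AND c.column_id = dc.parent_column_id
WHERE dc.parent_object_id = OBJECT_ID('dbo.Widget');
```

Generated names also differ from one database to the next, which makes scripted changes to a default harder to write than when the name is predictable.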

Tomorrow I’ll show you a Red Gate Snippet used to create the triggers for the tables with these fields.

Bradley Balls (b|t) is hosting this month’s T-SQL Tuesday blog party. The party was started by Adam Machanic (b|t) in December of 2009. The topic this month is Second Chances. Bradley has asked us to write about something we would like to have changed if we were given a second chance. I’m going to write about something that I actually was given a second chance to do differently.

Setting the stage

[Neo sees a black cat walk by them, and then a similar black cat walk by them just like the first one]
Neo: Whoa. Déjà vu.
[Everyone freezes right in their tracks]
Trinity: What did you just say?
Neo: Nothing. Just had a little déjà vu.
Trinity: What did you see?
Cypher: What happened?
Neo: A black cat went past us, and then another that looked just like it.
Trinity: How much like it? Was it the same cat?
Neo: It might have been. I’m not sure.
Morpheus: Switch! Apoc!
Neo: What is it?
Trinity: A déjà vu is usually a glitch in the Matrix. It happens when they change something.

This is an ironic topic. Today is the last day at my current job and the topic I’ve chosen to write about has to do with one of my first assignments four and a half years ago. I first arrived at my company in January. Do you know what happens in January? The Sales Team has a new structure that needs to be applied to all of their reports as of yesterday. That particular year, they added a new layer to their hierarchy. The database model couldn’t handle it and neither could the reports. I proposed a new model using recursion, both in the database model and in the reports, and it was approved. It proved to provide flexibility in the years to come. It had one flaw remaining though. It had maintained the current practice of assigning Clients to Sales People. That doesn’t sound too bad, until you know that when a Sales Person leaves, all of their Client records have to be updated… one by one by someone in sales. It also caused problems when there wasn’t a Sales Person available to assign to the clients right away.

Changing direction

This past January I had an opportunity to improve upon my original design. I simply changed directions. In the past, each Client had a Sales Person and each Sales Person had a Territory. Now, each Client has a Territory and each Territory has a Sales Person. If someone leaves, only ONE Territory record needs to be updated with a new Sales Person. If a new Sales Person is not available, then the Territory still shows up in the reports. This change was completely transparent to the report users.

The best part came a month after the new model was implemented. The Sales Team needed to have a single Sales Person represent different Territories in different Parent Territories. That was not possible with the old model. A Sales Person could only have one Territory, but with the new model it was possible… and it was already in place.
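A rough T-SQL sketch of the new direction follows. All table, column, and constraint names are hypothetical, since the real model is not shown here. The self-referencing ParentTerritoryID column stands in for the recursive hierarchy.

```sql
-- All names below are hypothetical.
CREATE TABLE dbo.SalesPerson
(
    SalesPersonID   int IDENTITY(1,1) NOT NULL CONSTRAINT PK_SalesPerson PRIMARY KEY,
    SalesPersonName varchar(100) NOT NULL
);

CREATE TABLE dbo.Territory
(
    TerritoryID       int IDENTITY(1,1) NOT NULL CONSTRAINT PK_Territory PRIMARY KEY,
    TerritoryName     varchar(100) NOT NULL,
    ParentTerritoryID int NULL   -- NULL at the top of the hierarchy
        CONSTRAINT FK_Territory_Parent REFERENCES dbo.Territory (TerritoryID),
    SalesPersonID     int NULL   -- NULL while no Sales Person is available
        CONSTRAINT FK_Territory_SalesPerson REFERENCES dbo.SalesPerson (SalesPersonID)
);

CREATE TABLE dbo.Client
(
    ClientID    int IDENTITY(1,1) NOT NULL CONSTRAINT PK_Client PRIMARY KEY,
    ClientName  varchar(100) NOT NULL,
    TerritoryID int NOT NULL
        CONSTRAINT FK_Client_Territory REFERENCES dbo.Territory (TerritoryID)
);
GO

-- When a Sales Person leaves, one Territory row changes, however many Clients it covers.
UPDATE dbo.Territory
SET SalesPersonID = 2      -- hypothetical replacement
WHERE TerritoryID = 10;    -- hypothetical territory

-- A recursive CTE walks the hierarchy from the top down for reporting.
WITH TerritoryTree AS
(
    SELECT TerritoryID, TerritoryName, ParentTerritoryID, 0 AS Depth
    FROM dbo.Territory
    WHERE ParentTerritoryID IS NULL
    UNION ALL
    SELECT t.TerritoryID, t.TerritoryName, t.ParentTerritoryID, tt.Depth + 1
    FROM dbo.Territory AS t
    INNER JOIN TerritoryTree AS tt ON t.ParentTerritoryID = tt.TerritoryID
)
SELECT TerritoryID, TerritoryName, Depth
FROM TerritoryTree
ORDER BY Depth, TerritoryName;
```

Because SalesPersonID sits on Territory and is not unique there, the same Sales Person can appear on several Territories under different parents, which is the flexibility described above.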

Thanks for all the fish

Thanks go out to Bradley Balls for hosting this month’s T-SQL Tuesday blog party. Please visit his website at http://www.sqlballs.com.