As a DBA working with SQL Server, you will encounter data issues that need to be fixed. One such issue is the duplication of data. Duplicate data can happen for a number of reasons; one is the absence of constraints such as primary keys.

Luckily, there are ways for the Data Professional to clean up such a quagmire. First, let’s set the stage with a really simple example.

Transact-SQL

CREATE TABLE DupeDemo (DemoID INT, SomeInt INT)

INSERT INTO DupeDemo
SELECT TOP 10000
    DemoID = ABS(CHECKSUM(NEWID())) % 100 + 1,
    SomeInt = ABS(CHECKSUM(NEWID())) % 50 + 1
FROM Master.dbo.SysColumns t1
CROSS JOIN Master.dbo.SysColumns t2

Here, we are simply creating a table and populating it with two columns of randomized data. This should give us a significant amount of duplicate data and a good enough test bed to demonstrate how to quickly fix the issue. Due to the random nature of these inserts, some values will end up with a significant number of duplicates while others may have no duplicates at all.

Let’s check our work and see how well we did with the duplicate creation.

Transact-SQL

SELECT *
FROM DupeDemo
ORDER BY DemoID, SomeInt

Each time I have run this, I have come up with a good test bed, and it should work relatively well for you too. Now we can work on eliminating those duplicates. To better illustrate that we have dupes, and their distribution, let's have a look at this query.

Transact-SQL

SELECT DemoID, SomeInt, COUNT(SomeInt) AS Counted
FROM DupeDemo
GROUP BY DemoID, SomeInt
ORDER BY DemoID, SomeInt

This query will yield results like what we see in the image to the right.

Similarly, I can run a query contained within a CTE and take advantage of the ROW_NUMBER() function to help find where I may have dupes.

Transact-SQL

WITH dedupe AS (
    SELECT DemoID, SomeInt
        , ROW_NUMBER() OVER (PARTITION BY DemoID, SomeInt ORDER BY SomeInt) AS Rownum
    FROM DupeDemo
)
SELECT * FROM dedupe
WHERE Rownum > 1
ORDER BY DemoID, SomeInt

Now that we know for certain that we have a large number of dupes, let's finally work on eliminating them. We can take a query just like the last one and alter it a bit so it deletes the dupes. Under normal circumstances, I would make absolutely certain that the tables to be affected by these deletes were backed up. You can accomplish that by either copying the data in the table into a new table or by running a database backup.
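If you go the table-copy route, it can be as simple as the following (the DupeDemo_Backup name is just an illustrative choice):

```sql
-- Copy all rows, duplicates included, into a safety table
-- before running any deletes.
SELECT *
INTO DupeDemo_Backup
FROM DupeDemo;
```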

Transact-SQL

WITH dedupe AS (
    SELECT DemoID, SomeInt
        , ROW_NUMBER() OVER (PARTITION BY DemoID, SomeInt ORDER BY SomeInt) AS Rownum
    FROM DupeDemo
)
DELETE dedupe
WHERE Rownum > 1

Really easy script. This will delete all source table records that are dupes. You can validate that by rerunning either of the first two queries used to demonstrate that duplicate data existed. This is illustrated in the following image.

Since SQL 2005, removing bad duplicate data has become substantially easier. This example should help to demonstrate that as well as provide a method to create a repeatable test while learning the technique.

In SQL Server, there is a good set of preloaded reports to help you monitor the Instance. These reports can be found under the Standard Reports and can be reached via context menu. To find them, right click your Instance in Object Explorer from within SSMS. Then navigate the context menu – much like the menus shown here.

You can see from the image that there is a decent number of good reports available to choose from.

When you run one of these reports, it is much like running a report from SSRS. However, SSRS does not need to be installed in order to run these reports.

Occasionally, you may encounter an error when trying to run these reports. The error that you may encounter is as follows:

This error message seems to pop up commonly after an upgrade to SQL Server is performed (CU or SP). Sometimes, it can crop up without an upgrade having been recently performed.

In the event that you encounter this particular error, you should try installing BIDS. Though it is not necessarily a prerequisite, it is a good idea to have it installed anyway, and doing so resolves this error message.

Besides looking into these Standard Reports (and hopefully circumventing the aforementioned error), there is one more recommendation. For SQL 2005, there is a download for the Performance Dashboard Reports, and SQL 2008 introduces a new feature called the Management Data Warehouse.

As database professionals, we have a need to benchmark performance of the database, processes, and essentially overall performance. When benchmarking, it is preferable to get a baseline and then run the same benchmark tests on a periodic basis and compare those results to the baseline.

Recently I was reminded of a couple of tools that should be in every DB Professional's vocabulary and tool set. Each one is used for a different purpose. Those tools are:

How many DB professionals have never had to deal with bitwise operations in SQL Server? Who has never had a single value in the database represent more than one data value? Have you ever had one of these fields serve as the (implicit or explicit) foreign key to a source table?

Sometimes it can seem a bit tricky dealing with such data – especially the last. I’m not going to delve into the complexity of such scenarios. I do want to present a basic intro however.

There are three basic operators for bitwise operations. The operators are &, |, and ^. These operators perform logical operations against integer type data. Here is an example of what each would return when using the same values:

[codesyntax lang="tsql"]

SELECT 175 & 75 AS BitAnd, 175 | 75 AS BitOr, 175 ^ 75 AS BitXOr

[/codesyntax]

[codesyntax lang="tsql"]

BitAnd  BitOr  BitXOr
11      239    228

[/codesyntax]

When performing a Bit & operation, the bits in the result are set to 1 if and only if both bits (for the current bit being resolved) in the input expressions have a value of 1. Any other combination results in a 0.

When performing a Bit | operation, the bits in the result are set to 1 if either of the corresponding bits is a 1. A 0 is returned when both values are 0 in the corresponding bit.

When performing a Bit ^ operation, the bits in the result are set to a value of 1 if exactly one of the two corresponding bits has a value of 1. Any other combination (both 0s or both 1s) results in a value of 0.
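Laying the example values out in binary makes each of the results above easy to verify by hand:

```text
175 = 1010 1111
 75 = 0100 1011
---------------
  &   0000 1011 = 11    (bits set in both)
  |   1110 1111 = 239   (bits set in either)
  ^   1110 0100 = 228   (bits set in exactly one)
```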

As I work with this a bit, I will be sharing a particular scenario that has been quite interesting for me.

I often see a request for some scripts to help with database maintenance. Sometimes those questions come in the form of recommendation requests for maintenance plans. As many already know, there are some really good recommended scripts by Ola Hallengren. There is also the fantastic index defrag script by Michelle Ufford.

Under normal circumstances, my recommendation would be that the maintenance of your database depends on your environment. However, that doesn't always work well. Sometimes, something is needed quickly to get up and running while you figure out your environment and your database (let's say you just started a new job or just inherited the database duties).

Under such circumstances, it is good to have something ready to go. Thus, I recommend using these resources and learning from them.

This is just a quick reminder about the meeting coming up on March 10, 2011 at 6:30 PST. You can find more information about the event from here.

I am looking forward to this presentation. I think it can be highly useful for both Production DBAs and Development DBAs. Please join us Thursday evening to get some good information. Anybody who wants to attend is welcome, whether from Vegas or not. (After all, I will be attending, and I do not currently live in Vegas.)

We will be recording the meeting and providing a link to it after the meeting. So if you can’t make it, you may still be able to access it.

Last year I introduced a couple of scripts that I worked on. Those scripts developed into a series comparing some methods and MS-provided stored procs that can help you retrieve table sizes in SQL Server.

I pulled out one of those scripts recently in order to find which tables in an R&D database were consuming a lot of space. While running it, I realized that it could easily be used to help me with an additional task: finding the ten biggest tables in a database that doesn't necessarily need to keep that data. Or maybe I just need to clean out the data so I can test populating the database. This script is predicated on a lack of foreign keys, but it can easily be adapted.
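The original script from that series isn't reproduced here, but the general idea can be sketched against the partition-stats DMV (the choice to generate TRUNCATE commands is an assumption for illustration, not the script from the series):

```sql
-- A minimal sketch: list the ten largest tables by reserved space
-- and generate TRUNCATE commands for them. TRUNCATE is only valid
-- when no foreign keys reference the tables.
SELECT TOP 10
    SCHEMA_NAME(t.schema_id) + '.' + t.name AS TableName,
    SUM(ps.reserved_page_count) * 8 AS ReservedKB,
    'TRUNCATE TABLE ' + QUOTENAME(SCHEMA_NAME(t.schema_id))
        + '.' + QUOTENAME(t.name) + ';' AS ResetCmd
FROM sys.dm_db_partition_stats ps
INNER JOIN sys.tables t
    ON ps.object_id = t.object_id
GROUP BY t.schema_id, t.name
ORDER BY ReservedKB DESC;
```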

I think the script, along with the prior explanations, is pretty straightforward. It can quickly help reset those LARGE tables for continued testing. Of course, that is predicated on you not already having a script for that, and on you not knowing which tables need to be reset (maybe you are on a large team).

This is just one example of a script that can be useful for more than what it was first designed to do. As DB professionals, we often come across situations where a prior script can easily be repurposed for the task at hand. Knowing that, and how to do it, is an important tool in your toolbox.

The Ides of March are upon us and that means that we have another opportunity to learn some stuff about SQL Server. We have that opportunity because it is time for the monthly S3OLV User Group meeting.

This month, Glenn Berry of SQL Server MVP fame has volunteered to present to our group via Livemeeting. Glenn will be teaching us how to properly Select and Size database hardware in regards to OLTP performance. Here is the abstract for this presentation.

The foundation of database performance is the underlying server hardware and storage subsystem. Even the best designed and optimized database application can be crippled by an inadequate hardware and storage infrastructure. Recent advances in new processors and chipsets, along with improvements in magnetic and SSD storage have dramatically changed the evaluation and selection process compared to the past. Many database professionals struggle to keep up with new technology and often simply let someone else make their hardware selection and sizing decisions. Unfortunately, the DBA usually gets the blame for any performance issues that crop up later. Don’t let this happen to you! This session covers current and upcoming hardware from both Intel and AMD and gives you the tools and resources to make better hardware selection decisions to support SQL Server OLTP workloads.

Details

We will be holding this learning opportunity on March 10, 2011 between 6:30 PM and 8:30 PM PST.

A little about our Presenter

Glenn works as a Database Architect at NewsGator Technologies in Denver, CO. He is a SQL Server MVP, and he has a whole collection of Microsoft certifications, including MCITP, MCDBA, MCSE, MCSD, MCAD, and MCTS, which proves that he likes to take tests. His expertise includes DMVs, high availability, hardware selection, full text search, and SQL Azure. He is also an Adjunct Faculty member at University College – University of Denver, where he has been teaching since 2000. He recently completed the Master Teacher Program at Denver University – University College. He is the author of two chapters in the book SQL Server MVP Deep Dives, and blogs regularly at http://sqlserverperformance.wordpress.com.

Several moons ago I learned that I had won one of the categories for the 2010 SQLServerpedia Blogger Awards. You can read about that here. Of all of the promises made in that post, I have yet to fulfill two of them (one is being fulfilled now). I can’t fulfill the other promise because I do not have a cubicle at work – just an open work area. That said, I am prominently displaying my trophy in my work area.

As you can see in the image, I have displayed my trophy (though the photo does not prove that it is at work).

Again, I am thankful to have won and I will continue to blog and try to put out useful information to the blogosphere.

Thanks to all who voted for my entry (which was really just one part of a series).

P.S. I have a really good reason for this post coming out so late too. My award was delivered in the middle of me moving from Vegas back to Utah.

Well, we are coming around into a new year, and with it, PASS Summit preparations are once again well under way. The time has come again for people to volunteer to help with PASS Summit. The call has gone out!! Allen Kinsel has blogged about it here.

For those that haven't volunteered or helped in the past, it doesn't require too much, and there are many ways in which you can help. Assistance is needed with everything from abstract review to speaker ratings to some database work to ushering during the Summit. PASS has put together a little survey to help you volunteer for Summit 2011. You can find the survey here.

So what do you get by volunteering to help at PASS? You get a sense of accomplishment for having helped, in whatever minuscule way, to put on a pretty big conference for the best community out there: the SQL Server Community. I liked helping out last year and have volunteered again. I hope to be selected and wouldn't mind doing something different this year. On the flip side, I wouldn't mind doing the same things I did last year either.

I know, this wasn’t much of a motivational speech. Hopefully this works better for you – JUST DO IT.

Within the world of SQL Server, there are a few things one can be certain of, and chief among them is that things will change. This is true of the features in SQL Server, and Extended Events in particular is constantly evolving, which underscores that constant change.

What is this gaping hole in the coverage of Extended Events? To be honest, it is not a very complicated topic or very difficult gap to fill. It’s just something that has been overlooked. The gap boils down to this: how does one consistently find the correct path to the Extended Event Log file (XEL file)?
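One way to answer that question is to ask the session target metadata directly. This is a hedged sketch, not the only approach; the XML shape queried here is what the event_file target reports in its target_data on recent versions:

```sql
-- Pull the full path of the XEL file for each running
-- Extended Events session from its event_file target metadata.
SELECT
    s.name AS SessionName,
    CAST(t.target_data AS XML).value(
        '(EventFileTarget/File/@name)[1]', 'nvarchar(260)') AS XelFilePath
FROM sys.dm_xe_sessions s
INNER JOIN sys.dm_xe_session_targets t
    ON s.address = t.event_session_address
WHERE t.target_name = 'event_file';
```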

Legislation and regulation sometimes dictates that certain activities must be tracked within a database. On occasion, it will be required that queries be audited and tracked to reach compliance with the legislation or regulation. To achieve this compliance, this article will demonstrate how to use Extended Events to audit statements being executed within the database.