Problem

Reading some of the comments from
my last tip got me thinking about the performance of
table joins on different
datatypes. I had always assumed (I know I should know better) that as long
as the columns, regardless of whether or not there is a foreign key relationship
defined, are indexed that there should be no performance issues based on datatype
of the join column chosen to join two tables. In this tip we will compare
joining tables using the following datatypes to see if there is in fact any performance
difference: INT, BIGINT, GUID (one randomly generated and generated using the newsequentialid()
function).

Solution

In order to perform this test, let's setup 4 similar sets of parent/detail
tables. The only difference will be that each parent table primary key and
its associated detail table join column will use a different column type: one of
INT, BIGINT and GUID (one randomly generated and generated using the
newsequentialid() function). Below is the T-SQL to generate these tables
and load some sample data.

As with the last test we did let's run through some
SELECT queries both
before and after the indexes on the tables have been rebuilt to see if there is
any difference in performance with the indexes being more or less fragmented.
Below is the state of the
indexes after the initial data load.

TableName

IndexName

IndexType

FragmentationPct

PageCount

parent_seqguid

PK__parent_s__CF31D692708D8E83

CLUSTERED

0.71

1253

parent_int

PK__parent_i__CF31D69240B1A39F

CLUSTERED

0.42

952

parent_guid

PK__parent_g__CF31D6926ABDEAE2

CLUSTERED

99.23

1821

parent_bigint

PK__parent_b__CF31D6927AAC240B

CLUSTERED

0.47

1052

detail_seqguid

detail_seqguid_join_col

NONCLUSTERED

0.83

3220

detail_int

detail_int_join_col

NONCLUSTERED

0.57

1733

detail_guid

detail_guid_join_col

NONCLUSTERED

99.38

5970

detail_bigint

detail_bigint_join_col

NONCLUSTERED

0.67

2231

To test the performance we will run two different SELECT statements. One
that returns all the records from the parent and the associated records from the
detail table. The second query will return a single record from the parent
and the associated record from the detail table. Note that when querying a
single record from the parent table you need to make sure that the detail table
has the same number of records to ensure the test results are not skewed based on the
size of the dataset returned. Below is the T-SQL for these statements.

As always we'll use
SQL Profiler to collect the performance statistics. Below are the SQL
Profile results from this first test run.

Table/Query

CPU (ms)

Reads

Duration (ms)

parent_int/Full Join

1295

6242

10431

parent_bigint/Full Join

1397

6832

11523

parent_guid/Full Join

1483

8616

14734

parent_seqguid/Full Join

2512

8208

14008

parent_int/Single Record

0

27

2

parent_bigint/Single Record

0

27

1

parent_guid/Single Record

0

27

2

parent_seqguid/Single Record

0

27

1

Looking at these results we see that in order to join the entire table the seqguid
case uses quite a bit more CPU than the other 3 cases. Also both GUID cases
performed ~20% more IO than the integer test cases. In the test cases where
the query was limited to just one record from the parent table the performance was
almost identical for all 4 test cases. Even though we have quite a bit of
fragmentation with the index on the random GUID column it still performs just as
well as the other datatypes in this case.

Now let's rebuild our indexes, verify we don't have any more fragmentation
and run through our test cases again. Below is the T-SQL to rebuild the indexes.

Now let's rerun our select queries again and see if there is any difference
with performance. Below are the
SQL Profiler results from this second test run.

Table/Query

CPU (ms)

Reads

Duration (ms)

parent_int/Full Join

1102

6243

10231

parent_bigint/Full Join

1328

6832

11275

parent_guid/Full Join

1295

8044

14101

parent_seqguid/Full Join

2532

8203

14206

parent_int/Single Record

0

27

2

parent_bigint/Single Record

0

27

2

parent_guid/Single Record

0

27

2

parent_seqguid/Single Record

0

27

1

These results match almost exactly with our first test run. The only difference
being, now that the index has been rebuilt, the random GUID datatype is performing
fewer reads in the case where we are querying the entire parent table. Although
it's not quite as good as either of the integer test cases.

So there we have it. At least for this simple test case, we have demonstrated
that there is very little difference with table join performance using these 4 datatypes.

About the author

Post a comment or let the author know this tip helped.

All comments are reviewed, so stay on subject or we may delete your comment. Note: your email address is not published. Required fields are marked with an asterisk (*).

*Name
*Email
Email me updates

Signup for our newsletter
I agree by submitting my data to receive communications, account updates and/or special offers about SQL Server from MSSQLTips and/or its Sponsors. I have read the privacy statement and understand I may unsubscribe at any time.

Looking at these results we see that in order to join the entire table the seqguid case uses quite a bit more CPU than the other 3 cases.

[/quote]

But the table above is showing the BigInt join with the highest CPU usage. 2532. Am I missing something?

I agree by submitting my data to receive communications, account updates and/or special offers about SQL Server from MSSQLTips and/or its Sponsors. I have read the privacy statement and understand I may unsubscribe at any time.