In this chapter from Inside Microsoft SQL Server 2008 T-SQL Programming, get a clear picture of how the different temporary object types behave, in which circumstances you should use each, and whether you should use them at all.

Itzik Ben-Gan

T-SQL programming often involves the need to materialize data temporarily. Temporary tables are just one solution; other ways for handling an independent physical or logical materialization of a set include table variables and table expressions such as views, inline user-defined functions (UDFs), derived tables, and common table expressions (CTEs).

You might need to physically persist interim states of your data for performance reasons, or just as a staging area. Examples of such scenarios include:

Materializing aggregated data to some level of granularity (for example, employee and month), and issuing running, sliding, and other statistical reports against that data

Materializing a result of a query for paging purposes

Materializing result sets of interim queries, and querying the materialized data

Materializing the result of a query with the GROUPING SETS, CUBE and ROLLUP options, and issuing queries against that data

Walking through the output of a cursor and saving information you read or calculate per row for further manipulation

Pivoting data from an Open Schema environment to a more traditional form, and issuing queries against the pivoted data

Creating a result set that contains a hierarchy with additional attributes such as materialized paths or levels, and issuing reports against the result

Holding data that needs to be scrubbed before it can be inserted

One of the benefits of materializing data in a temporary table is that it can be more compact than the base data, with preprocessed calculations, and you can index it when it might be inefficient or impractical to index all the base data. In terms of performance, you typically benefit from materializing the data when you need to access it multiple times, but in some cases, even when all you have is a single query against the data, you benefit.

You might also need to materialize interim sets logically in virtual temporary tables (table expressions) to develop solutions in a modular approach. I’ll show examples in this chapter that address this need as well. Either way, there are many cases in which using temporary tables, table variables, or table expressions can be useful.

There’s a lot of confusion around choosing the appropriate type of temporary object for a given task, and there are many myths regarding the differences between temporary tables and table variables. Furthermore, temporary tables and table variables are often misused because of lack of knowledge of efficient set-based programming.

In this chapter, I will try to provide you with a clear picture of how the different temporary object types behave, in which circumstances you should use each, and whether you should use them at all. At the end of the chapter, I’ll provide a summary table (Table 7-1) that contrasts and compares the different types. This table covers the factors you should take into consideration before making your choice.

Temporary Tables

SQL Server supports two types of temporary tables: local and global. For the most part, I’ll focus on local temporary tables because this is the type you would typically consider in the same situations as table variables and table expressions. I’ll also describe global temporary tables, but these typically have different uses than local temporary tables.

Local Temporary Tables

I’ll start with some fundamentals of local temporary tables before showing examples, and I’ll do the same whenever discussing a new temporary object type. When referring to temporary tables in this section, assume that the discussion pertains to local ones.

You create and manipulate a temporary table just as you would a permanent one, for the most part. I’ll point out the aspects of temporary tables that are different from permanent ones, or aspects that are often misunderstood.

tempdb

Temporary tables are created in tempdb, regardless of the database context of your session. They have physical representation in tempdb, although when they’re small enough and Microsoft SQL Server has enough memory to spare, their pages reside in cache. SQL Server persists the temporary table’s pages on disk when there is too little free memory. Furthermore, tempdb’s recovery model is SIMPLE and cannot be changed. This means that bulk operations against temporary tables can benefit from minimal logging. Also, SQL Server supports a deferred drop feature in tempdb. When the application drops a large temporary table SQL Servers defers the drop activity to a background thread, so the application can continue working immediately.

Unlike user databases, tempdb is created from scratch as a copy of the model database every time you restart SQL Server, hence there’s no need for a recovery process in tempdb. This fact leads to optimizations that you can benefit from when modifying data in tempdb regardless of the object type you are working with (temp table, table variable, or even a regular table). The transaction log doesn’t need to be flushed to disk and therefore transactions in tempdb are committed faster. Also, certain types of modifications against objects in tempdb (mainly INSERT and UPDATE operations on heap and LOB data) can benefit from optimized logging: because you don’t need to run a redo phase from the log (roll forward transactions that were committed after the last checkpoint) only the value before the change needs to be recorded in the log—not the value after the change. Later in the chapter I’ll provide more details about working with tempdb.

One reason to use a temporary table is to take the load off of a user database when you need to persist temporary data. You can also enjoy the fact that tempdb is treated differently from user databases.

TIP

My preferred method for checking whether an object already exists is to use the OBJECT_ID function. If the function returns NULL, the object doesn’t exist. If you want to check whether a temporary table already exists, make sure you specify the tempdb database prefix; otherwise, SQL Server looks for it in the current database, doesn’t find it, and always returns NULL. For example, to check whether #T1 exists, use OBJECT_ID(‘tempdb..#T1’) and not OBJECT_ID(‘#T1’).

Also, SQL Server supports a second argument for OBJECT_ID, where you can specify the object type you’re looking for (for example, ‘U’ for user table). The second argument’s value must match the type column in sys.objects.

Scope and Visibility

Temporary table names are prefixed with a number symbol (#). A temporary table is owned by the creating session and visible only to it. However, SQL Server allows different sessions to create a temporary table with the same name. Internally, SQL Server adds underscores and a unique numeric suffix to the table name to distinguish between temporary tables with the same name across sessions. For example, suppose that you created a temporary table called #T1. If you query the view sys.objects in tempdb looking for a table with the name LIKE ‘#T1%’, you will find a table with a name similar to the following (the suffix will vary):

#T1__________________________________________________________________________________________________________00000000001E. Although this is the table’s internal name, you refer to it in your code by the name you used when you created it—#T1.

Within the session, the temporary table is visible only to the creating level in the call stack and also inner levels, not to outer ones. For example, if you create a temp table in the session’s outermost level, it’s available anywhere within the session, across batches, and even in inner levels—for example, dynamic batch, stored procedure, and trigger. As long as you don’t close the connection, you can access the temporary table. If it’s created within a stored procedure, it’s visible to the stored procedure and inner levels invoked by that procedure (for example, a nested procedure or a trigger). You can rely on the visibility behavior of temporary tables—for example, when you want to pass data between different levels in your session, or even just signal something to an inner level and that inner level doesn’t support input parameters (for example, a trigger). However, in some cases, you can pass such information through the context_info feature, which is visible across the session. (See SET CONTEXT_INFO in SQL Server Books Online for details.)

When its creating level gets out of scope (terminates), a temporary table is automatically destroyed. If a temporary table was created in the outermost level, it is destroyed when the session is terminated. If it’s created within a stored procedure, it is automatically dropped as soon as the stored procedure is finished.

Remember that a temporary table is not visible to levels outside of the creating one in the call stack. That’s why, for example, you can’t use a temporary table created in a dynamic batch in the calling batch. When the dynamic batch is out of scope, the temporary table is gone. Later in the chapter, I’ll suggest alternatives to use when such a need occurs. The next part, regarding the scope, is a bit tricky. You can, in fact, create multiple temporary tables with the same name within the same session, as long as you create them in different levels—although doing so might lead to trouble. I’ll elaborate on this point in the Temporary Table Name Resolution section later in the chapter.

The scope and visibility of a temporary table are very different than they are with both permanent tables and table variables and can be major factors in choosing one type of temporary object over another.

Transaction Context

A temporary table is an integral part of an outer transaction if it’s manipulated in one (with DML or DDL). This fact has consequences for logging and locking. Logging has to support rollback operations only, not roll-forward ones. (Remember, there is no recovery process in tempdb.) As for locking, because the temporary table is visible only to the creating session, less locking is involved than with permanent tables, which can be accessed from multiple sessions.

Therefore, one of the factors you should consider when choosing a temporary object type is whether you want manipulation against it to be part of an outer transaction.

Statistics

The optimizer creates and maintains distribution statistics (column value histograms) for temporary tables and keeps track of their cardinality, much as it does for permanent ones. This capability is especially important when you index the temporary table. Distribution information is available to the optimizer when it needs to estimate selectivity, and you will get optimized plans that were generated based on this information. This is one of the main areas in which temporary tables differ from table variables in terms of performance.

Also, because statistics are maintained for temporary tables, queries against your temporary tables will be recompiled because of plan optimality reasons (recompilation threshold reached, statistics refreshed, and so on). The recompilation threshold is reached when a sufficient number of rows of a referenced table have changed since the last compilation. The recompilation threshold (RT) is based on the table type and the number of rows. For permanent tables, if n ≤ 500, then RT = 500 (n = table’s cardinality when a query plan is compiled). If n > 500, then RT = 500 + 0.20 × n. For temporary tables, if n < 6, then RT = 6. If 6 ≤ n ≤ 500, then RT = 500.If n > 500, then RT = 500 + 0.20 × n. You realize that, for example, after inserting six rows into a temporary table, adding a seventh will trigger a recompile, whereas with permanent tables the first trigger will occur much later. If you want queries against temporary tables to use the same recompilation thresholds as against permanent ones, use the KEEP PLAN query hint.

The fact that the optimizer maintains distribution statistics for temporary tables and the aforementioned implications are the most crucial aspects of choosing a temporary object type. These factors are especially important when choosing between temporary tables and table variables, for which the optimizer doesn’t create or maintain distribution statistics. Rowcount information is maintained for table variables (in sys.partitions) but this information is often inaccurate. Table variables themselves do not trigger recompiles because of plan optimality reasons, and recompiles are required to update the rowcount information. You can force a recompile for a query involving table variables using the RECOMPILE query hint.

You must ask yourself two main questions when considering which type of temporary object to use:

Does the optimizer need distribution statistics or accurate cardinality estimations to generate an efficient plan, and if so, what’s the cost of using an inefficient plan when statistics are not available?

What’s the cost of recompilations if you do use temporary tables?

In some cases the optimizer doesn’t need statistics to figure out an optimal plan—for example, given a query requesting all rows from a table, a point query filtering a column on which a unique index is defined, a range query that utilizes a clustered or covering index, and so on. In such cases, regardless of the table’s size, there’s no benefit in having statistics because you will only suffer from the cost of recompilations. In such cases, consider using a table variable.

Also, if the table is tiny (say, a couple of pages), the alternatives are 1) using a table variable resulting in complete scans and few or no recompilations; or 2) using a temporary table resulting in index seeks and more recompilations. The advantage of seeks versus scans may be outweighed by the disadvantage of recompiles. That’s another case for which you should consider using table variables.

On the other hand, if the optimizer does need statistics to generate an efficient plan and you’re not dealing with tiny tables, the cost of using an inefficient plan might well be substantially higher than the cost of the recompilations involved. That’s a case in which you should consider using temporary tables. In the Table Variables section, I’ll provide examples related to these scenarios in which I’ll also demonstrate execution plans.

Temporary Table Name Resolution

As I mentioned earlier, technically you’re allowed to create multiple local temporary tables with the same name within the same session, as long as you create them in different levels. However, you should avoid doing this because of name-resolution considerations that might cause your code to break.

When a batch is resolved, the schema of a temporary table that is created within that batch is not available. So resolution of code that refers to the temporary table is deferred to run time. However, if a temporary table name you refer to already exists within the session (for example, it has been created by a higher level in the call stack), that table name will resolve to the existing temporary table. However, the code will always run against the innermost temporary table with the referenced name.

This resolution architecture can cause your code to break when you least expect it; this can happen when temporary tables with the same name exist in different levels with different schemas.

This part is very tricky and is probably best explained by using an example. Run the following code to create the stored procedures proc1 and proc2:

proc1 creates a temporary table called #T1 with a single integer column, inserts a row with the value 1, returns #T1’s contents, and invokes proc2, which also creates a temporary table called #T1 with a single integer column, inserts a row with the value 2, and returns #T1’s contents. Both #T1 tables have the same schema. Now, invoke proc1:

EXEC dbo.proc1;

The output is what you probably expected:

col1
-----------
1
col1
-----------
2

Both procedures returned the contents of the #T1 table they created. Being oblivious to the resolution process I described earlier doesn’t really affect you in this case. After all, you did get the expected result, and the code ran without errors. However, things change if you alter proc2 in such a way that it creates #T1 with a different schema than in proc1:

Can you explain the error? Admittedly, the problem in the resolution process I described is very elusive, and you might not have realized it after the first read. Try to read the paragraph describing the resolution process again, and then see whether you can explain the error. Essentially, when proc2 was invoked by proc1, a table called #T1 already existed. So even though proc2’s code creates a table called #T1 with two columns and inserts a row with two values, when the INSERT statement is resolved, proc2’s #T1 does not exist yet, but proc1’s does. Therefore, SQL Server reports a resolution error—you attempt to insert a row with two values to a table with one column (as if).

If you invoke proc2alone, the code has no reason to fail because no other #T1 table exists in the session—and it doesn’t fail:

EXEC dbo.proc2;

You get an output with the row loaded to proc2’s #T1:

col1 col2
----------- -----------
2 2

The execution plan for proc2 now resides in cache. Ironically, if you now run proc1 again, the code will complete without errors. proc2 will not go through a resolution process again (neither will it go through parsing or optimization); rather, SQL Server simply reuses the plan from cache:

EXEC dbo.proc1;

And now you get the output you probably expected to begin with:

col1
-----------
1
col1 col2
----------- -----------
2 2

However, if proc2’s plan is removed from cache and you run proc1, your code will break:

EXEC sp_recompile 'dbo.proc2';
EXEC dbo.proc1;

This generates the following output:

Object 'dbo.proc2' was successfully marked for recompilation.
col1
-----------
1
Msg 213, Level 16, State 1, Procedure proc2, Line 5
Column name or number of supplied values does not match table definition.

In short, I hope that you realize it’s wise to avoid naming temporary tables the same in different stored procedures/levels. A way to avoid such issues is to add a unique proc identifier to the names of temporary tables. For example, you could name the temporary table in proc1 #T1_proc1, and in proc2 name the temporary table #T1_proc2.

When you’re done, run the following code for cleanup:

IF OBJECT_ID('dbo.proc1', 'P') IS NOT NULL DROP PROC dbo.proc1;
IF OBJECT_ID('dbo.proc2', 'P') IS NOT NULL DROP PROC dbo.proc2;

Schema Changes to Temporary Tables in Dynamic Batches

Remember that a local temporary table created in a certain level is not visible to outer levels in the call stack. Occasionally, programmers look for ways around this limitation, especially when working with dynamic execution. That is, you want to construct the schema of the temporary table dynamically and populate it based on some user input, and then access it from an outer level. Frankly, insisting on using local temporary tables in such a scenario is very problematic. The solution involves ugly code, as is the nature of dynamic SQL in general, plus recompilations resulting from schema changes and data modifications. You should consider other alternatives to provide for the original need. Still, I want to show you a way around the limitations.

Here’s an initial algorithm that attempts to provide a solution for this request:

In the outer level, create temporary table #T with a single dummy column.

Within a dynamic batch, perform the following tasks:

Alter #T, adding the columns you need.

Alter #T, dropping the dummy column.

Populate #T.

Back in the outer level, access #T in a new batch.

The problem with this algorithm lies in the last item within the dynamic batch. References to #T will be resolved against the outer #T’s schema. Remember that when the batch is resolved, #T’s new schema is not available yet. The solution is to populate #T within another dynamic batch, in a level inner to the dynamic batch that alters #T’s schema. You do this by performing the following tasks:

In the outer level, create temporary table #T with a single dummy column.

Caching of Temporary Objects

SQL Server 2008 supports the caching of temporary objects across repeated calls of routines. This feature is applicable to local temporary tables, table variables, and table-valued functions used within routines such as stored procedures, triggers, and user-defined functions. When the routine finishes, SQL Server keeps the catalog entry. If the object is smaller than 8 MB, SQL Server keeps one data page and one IAM page, and uses those instead of allocating new ones when the object is created again. If the object is larger than 8 MB, SQL Server uses deferred drop, and immediately returns control to the application. This feature results in reduction of contention against system catalog tables and allocation pages, and in faster creating and dropping of temporary objects.

I’ll demonstrate caching of temporary objects (across repeated calls of routines) through an example. Run the following code to create a stored procedure called TestCaching that creates a temporary table called #T1 and populates it with a few rows:

Run the following query to determine which entries representing temporary tables exist in the system catalog:

SELECT name FROM tempdb.sys.objects WHERE name LIKE '#%';

At this point there are no entries in the system catalog representing temporary tables; therefore, this query returns an empty set.

Execute the TestCaching procedure:

EXEC dbo.TestCaching;

The stored procedure terminated, but the temporary table was cached—or more specifically, SQL Server kept its entry in the system catalog, an IAM page, and a data page. Query tempdb.sys.objects again:

SELECT name FROM tempdb.sys.objects WHERE name LIKE '#%';

This time you get an entry back representing the temporary table that was cached. I got the following output (but of course you will get a different table name):

name
-----------
#2DE6D218

If the procedure’s execution plan is recompiled or removed from cache, SQL Server removes the cached temporary objects that were created by the stored procedure from cache as well. SQL Server also removes cached temporary objects when tempdb has little free space.

Run the following code to mark the stored procedure for recompile, causing the associated cached temporary object to be removed from cache:

EXEC sp_recompile 'dbo.TestCaching';

Query sys.objects:

SELECT name FROM tempdb.sys.objects WHERE name LIKE '#%';

The query should return an empty result set. If not, try again in a few seconds, because the table is dropped in the background.

Note that in the following cases SQL Server will not cache temporary objects across procedure calls:

When you issue a DDL statement against the temporary table after it was created “e.g., CREATE INDEX”.

When you define a named constraint.

When you create the temporary object in a dynamic batch within the routine.

When you create the temporary object in an ad-hoc batch (not within a routine).

I’ll first demonstrate the effect of applying DDL changes post-creation of the temporary table. Run the following code to alter the procedure TestCaching, adding an index to the temporary table after it was created:

EXEC dbo.TestCaching;
SELECT name FROM tempdb.sys.objects WHERE name LIKE '#%';

This returns an empty result set, indicating that the temporary table wasn’t cached.

As a workaround, you can include an unnamed UNIQUE or PRIMARY KEY constraint as part of the temporary table definition. The constraint implicitly creates a unique index on the constraint keys. Run the following code to test this approach:

This time the query against sys.objects should report one temporary table. I got the following output:

name
-----------
#3A4CA8FD

Note that you can create composite indexes implicitly without sacrificing caching by including a composite UNIQUE or PRIMARY KEY constraint in your table definition, as in UNIQUE(col1, col2, col3).

As for named constraints, you might find this restriction odd, but naming a constraint prevents SQL Server from caching your temporary objects. You just saw in the last example that when the UNIQUE constraint was not named, SQL Server cached the temporary table. Now try the same example, but this time name the constraint:

This time the temporary object wasn’t cached (again, it may take a few seconds for the temporary object that was cached previously to be removed from cache). So even though naming constraints is in general a good practice, bear in mind that if you want to benefit from caching of temporary objects, you shouldn’t name them.

Global Temporary Tables

Global temporary tables differ from local ones mainly in their scope and visibility. They are accessible by all sessions, with no security limitations whatsoever. Any session can even drop the table. So when you design your application, you should factor in security and consider whether you really want temporary tables or just permanent ones. You create global temporary tables by prefixing their names with two number signs (##), and like local temporary tables, they are created in tempdb. However, because global temporary tables are accessible to all sessions, you cannot create multiple ones with the same name; neither in the same session nor across sessions. So typical scenarios for using global temporary tables are when you want to share temporary data among sessions and don’t care about security.

Unlike local temporary tables, global ones persist until the creating session—not the creating level—terminates. For example, if you create such a table in a stored procedure and the stored procedure goes out of scope, the table is not destroyed. SQL Server will automatically attempt to drop the table when the creating session terminates, all statements issued against it from other sessions finish, and any locks they hold are released.

I’ll walk you through a simple example to demonstrate the accessibility and termination of a global temporary table. Open two connections to SQL Server (call them Connection 1 and Connection 2). In Connection 1, create and populate the table ##T1:

CREATE TABLE ##T1(col1 INT);
INSERT INTO ##T1 VALUES(1);

In Connection 2, open a transaction and modify the table:

BEGIN TRAN
UPDATE ##T1 SET col1 = col1 + 1;

Then close Connection 1. If not for the open transaction that still holds locks against the table, SQL Server would have dropped the table at this point. However, because Connection 2 still holds locks against the table, it’s not dropped yet. Next, in Connection 2, query the table and commit the transaction:

SELECT * FROM ##T1;
COMMIT

At this point, SQL Server drops the table because no active statements are accessing it, and no locks are held against it. If you try to query it again from any session, you will get an error saying that the table doesn’t exist:

SELECT * FROM ##T1;

In one special case you might want to have a global temporary table available but not owned by any session. In this case, it will always exist, regardless of which sessions are open or closed, and eliminated only if someone explicitly drops it. To achieve this, you create the table within a procedure, and mark the stored procedure with the startup procedure option. SQL Server invokes a startup procedure every time it starts. Furthermore, SQL Server always maintains a reference counter greater than zero for a global temporary table created within a startup procedure. This ensures that SQL Server will not attempt to drop it automatically.

Here’s some sample code that creates a startup procedure called CreateGlobals, which in turn creates a global temporary table called ##Globals.

After restarting SQL Server, the global temporary table will be created automatically and persist until someone explicitly drops it. To test the procedure, restart SQL Server and then run the following code:

You probably guessed already that ##Globals is a shared global temporary table where you can logically maintain cross-session global variables. This can be useful, for example, when you need to maintain temporary counters or other “variables” that are globally accessible by all sessions. The preceding code creates a new global variable called var1, initializes it with the character string ‘abc’, and queries the table. Here’s the output of this code: