Create table as select (CTAS) with “not null” column

CTAS (Create Table As Select) is a common way on the Parallel Data Warehouse (PDW) to transform one table into another table for example for calculations or for ELT (Extract Load Transform) processes.

The general syntax is quite simple and explained in the PDW help file:

CREATETABLE[ database_name .[ dbo ].| dbo.] table_name

[({ column_name }[,...n])]

WITH(

DISTRIBUTION ={HASH( distribution_column_name )|REPLICATE}

[,<CTAS_table_option>[,...n]]

)

AS<select_statement>

[;]

<CTAS_table_option>::=

LOCATION = USER_DB

|CLUSTEREDCOLUMNSTOREINDEX

|CLUSTEREDINDEX({ index_column_name [ASC|DESC]}[,...n])

|PARTITION( partition_column_name RANGE[LEFT|RIGHT

FORVALUES([ boundary_value [,...n]]))]

<select_statement>::=

[WITH<common_table_expression>[,...n]]

SELECT<select_criteria>

For example, a simple CTAS statement to copy the contents of one table (FactSales) to another table (FactSalesTmp) may look like this:

createtable FactSalesTmp

with(distribution =hash(DateKey))

asselect*from FactSales

As you can see from the CTAS syntax definition above it’s not possible specify column properties like NULL/NOT NULL or constraints. In the help file there is a note about this:

You cannot specify NULL | NOT NULL for the columns in the CTAS statement; the nullability property is derived from the columns and expressions in the SELECT results.

However, if you’re using CTAS to fill an intermediate stage table used for partition switching operations the switch out table has to have exactly the same definition as the target table and this also means that the NULL/NOT NULL setting on the column has to be identical. So how is the nullability property derived from the expressions in the select results? To show this behavior, I’m using a simple fact table with the following definition:

createtable FactSales (

DateKey intnotnull

, ProductKey intnotnull

, StoreKey intnotnull

, Quantity intnotnull

, Amount decimal(13,4)notnull

, Costs decimal(13,4)

)with(distribution =hash(DateKey))

Using the simple CTAS statement from above (the one copying FactSales to FactSalesTmp) preserves the nullability of all columns. As long as you refer to existing table columns, the nullability is preserved from that columns.

But what about calculations? Let’s try the following CTAS statement:

createtable FactSalesTmp

with(distribution =hash(DateKey))

asselect*

, Amount*0.8AS StandardCosts

from FactSales

I simply added another column here using a simple calculation. You can check the resulting table structure by choosing the context menu ‘View Code’ in Data Tools or by running the following query:

What you see, is that the calculation (StandardCosts) is understood to be nullable by the PDW. How can we mark this column as not null the CTAS statement?

Rewriting the CTAS from above using coalesce doesn’t solve the problem:

createtable FactSalesTmp

with(distribution =hash(DateKey))

asselect*

,coalesce(Amount*0.8,0)AS StandardCosts

from FactSales

However, using isnull does the trick:

createtable FactSalesTmp

with(distribution =hash(DateKey))

asselect*

,isnull(Amount*0.8,0)AS StandardCosts

from FactSales

The reason for this behavior is the different handling of the data type for coalesce and isnull. So, if you want to have an expression being marked as ‘not null’ in a CTAS statement, use the isnull-function.