As you can see, SQL Server split the records across your CPUs, and as a result you still have records in the Delta Store.
There are two ways to overcome that situation:
1. Make the batch size bigger than the number of CPUs used by your SQL Server multiplied by 102,400. In my case that number is 4 x 102,400 = 409,600;
2. Lower the degree of parallelism. Yes, if you load data in smaller chunks, the process will be slower.
So, let's test the second scenario. Delete all records again from the "tbl_Test_Columnstore" table and run the following modified script:

Script #5:
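A minimal sketch of such a script, assuming the single-column test table from the earlier scripts; the exact row source is just an illustration:

-- Load 409,600 rows on a single thread: with MAXDOP 1 the batch is not split
-- across CPUs, so the rowgroup exceeds the 102,400-row threshold and bypasses the Delta Store
INSERT INTO dbo.tbl_Test_Columnstore (ID)
SELECT TOP (409600) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM sys.all_columns AS c1
CROSS JOIN sys.all_columns AS c2
OPTION (MAXDOP 1);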

Sunday, June 19, 2016

In SQL Server 2016, Columnstore Indexes have really matured, and it is time to start using them.
To use something, you need simple guidance, a map that shows you The Big Picture.

And here is a little "pocket" guide with all the major highlights:

Columnstore Indexes in SQL Server 2016.

Description

Columnstore Indexes (CSI) are data structures that aggregate stored data not by rows, but by columns. They are not very good for Index Seek operations (OLTP), but very good for Index Scans (OLAP). CSI are modifiable, but CRUD operations behave differently:

Bulk Insert: When inserting more than 102,400 rows, the data goes directly to the Columnstore;

Insert: When the number of inserted rows is less than 102,400, new rows are inserted into a Delta Store;

Delete: Rows are not actually deleted from the CSI; they are just marked as deleted;

Update: Rows are not actually updated. Updated rows are marked as deleted, and then the new, modified rows are inserted into the Delta Store;
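You can verify where inserted rows actually landed by querying the rowgroup DMV; a minimal sketch, assuming the test table from above:

-- OPEN/CLOSED rowgroups live in the Delta Store; COMPRESSED ones are in the Columnstore
SELECT state_desc, total_rows, deleted_rows
FROM sys.dm_db_column_store_row_group_physical_stats
WHERE object_id = OBJECT_ID('dbo.tbl_Test_Columnstore');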

Categorizing

Updatable CSI can be Clustered or Non-Clustered;

CSI can be built on a Disk-Based table or on an In-Memory optimized table, as sketched below.
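A minimal sketch of both flavors (table and index names are assumptions; the In-Memory example also requires a memory-optimized filegroup):

-- Disk-based table: the clustered CSI becomes the table's primary storage
CREATE CLUSTERED COLUMNSTORE INDEX CCI_Test
ON dbo.tbl_Test_Columnstore;

-- In-Memory optimized table: the CSI is declared inline at creation time
CREATE TABLE dbo.tbl_Test_InMemory
(
    ID INT NOT NULL PRIMARY KEY NONCLUSTERED,
    INDEX CSI_InMemory CLUSTERED COLUMNSTORE
)
WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);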

Main Features

It can be Filtered;

Can include all or just specified columns;

For Disk-Based tables it can have a Compression Delay (in minutes), which is good for OLTP and might decrease fragmentation.

Max Degree of Parallelism can be controlled for the index usage;

CSI can be partitioned;

"COLUMNSTORE" is the default compression level. Old data can be compressed further with "COLUMNSTORE_ARCHIVE";
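Several of these features combined, as a sketch (the table, index, and column names are assumptions):

-- Filtered Non-Clustered CSI over selected columns, with a 60-minute compression delay
CREATE NONCLUSTERED COLUMNSTORE INDEX NCCI_Orders
ON dbo.tbl_Orders (OrderID, OrderDate, Amount)
WHERE OrderStatus = 'Closed'
WITH (COMPRESSION_DELAY = 60);

-- Re-compress an old partition with archival compression
ALTER INDEX NCCI_Orders ON dbo.tbl_Orders
REBUILD PARTITION = 1
WITH (DATA_COMPRESSION = COLUMNSTORE_ARCHIVE);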

What is "Sparsing"?
"Sparsing" is the way SQL Server optimizes spacing for NULL values at the cost of overhead for non-NULL values.
In other words, if you expect having in your column more nulls than not nulls you can SPARSE that column to optimize the space.

I've seen situations where a lot of columns in Data Mart tables were almost completely filled with NULLs, and I started wondering whether "SPARSE" could be a good tool to gain some space.

I've tested it on a small table:

CREATE TABLE tbl_TestSparse (
    ID INT IDENTITY(1,1)
        CONSTRAINT PK_ID PRIMARY KEY WITH (FILLFACTOR = 100),
    Sparsed_Column INT SPARSE NULL
);

I inserted a million rows into that table and started my testing by changing the "Sparsed_Column" data type and the percentage of NULLs in that column.
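Each combination can be measured roughly like this; a sketch of the 50% NULL case (the exact UPDATE pattern and the rebuild step are assumptions about my test harness):

-- Fill roughly half of the column with NULLs, compact the pages, then measure
UPDATE tbl_TestSparse
SET Sparsed_Column = CASE WHEN ID % 2 = 0 THEN NULL ELSE ID END;

ALTER INDEX PK_ID ON tbl_TestSparse REBUILD;
EXEC sp_spaceused 'tbl_TestSparse';

Here are the results, in megabytes, that I've got: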

As you can see, SPARSE became profitable only for the NVARCHAR(10) column when 50% of its values were NULLs, and I'd argue even that gain. If you know you'll have many NULLs in text columns, make them VARCHAR/NVARCHAR, because as you probably noticed, SQL Server handles those data types so well that SPARSE does not have any positive impact on them.

The only use case I see for SPARSE is really large tables where fixed-size columns, such as INT, BIGINT, MONEY, or DATETIME, mostly contain NULL values.

Say our table had a billion records; since a non-sparse fixed-size column stores its full width even for NULLs (4 bytes per INT), sparsing an empty INT column could potentially save 3 GB - 3.5 GB. That might not be a big percentage of the entire table, but it is still a pretty big number from the disk-space perspective.

If you are not sure how many NULLs you'll have in a column, do not use "SPARSE". It will add complexity and might negatively impact performance in the future.

In any case, if you see a column where 99% of the values are NULLs, it could be a sign of bad design or bad data.

Both Inserts A and B have the very same result and produce data of the same size, but surprisingly, even though Approach 3.A looks uglier and requires an Index Rebuild, it works much faster than Approach 3.B because of the Window function "ROW_NUMBER" and its associated tempdb spill during the Sort operation.

Thursday, June 16, 2016

At an SSIS presentation recently, I realized that a lot of people who have been working with SSIS for years still do not know what "Parameterizing" is and how to do it.

SSIS changed a lot in SQL Server 2012, where Microsoft announced the "Project Deployment Model". Since then you can deploy a Project and assign Parameters to it, which can be passed in for execution. Before that, developers used Configurations to supply values for internal variables and connections.
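For instance, once a project is deployed to the SSIS Catalog, a parameter value can be supplied at execution time from T-SQL; a sketch, with hypothetical folder, project, and parameter names:

-- Create an execution, override a project parameter, and start it
DECLARE @exec_id BIGINT;
EXEC SSISDB.catalog.create_execution
     @folder_name = N'MyFolder',
     @project_name = N'MyProject',
     @package_name = N'Package.dtsx',
     @execution_id = @exec_id OUTPUT;

EXEC SSISDB.catalog.set_execution_parameter_value
     @exec_id,
     @object_type = 20, -- 20 = project parameter, 30 = package parameter
     @parameter_name = N'TestParameter',
     @parameter_value = N'Test Value';

EXEC SSISDB.catalog.start_execution @exec_id;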

For this test I've created one Project-scoped database connection, "AdventureWorks2016", which has to be available to all packages within my project, and a Package-scoped variable, "TestVariable":

Parameterizing a Connection

1. Right-click the "AdventureWorks2016" connection and choose "Parameterize...":

2. Then you'll see the "Parameterize" window. There you can change the name of your new parameter, or even assign an existing parameter instead of creating a new one. You can also choose which property of your connection you want to parameterize. By default it offers "ConnectionString".

3. Click "OK", and the parameter will be created.

4. To see your newly created parameter, just double-click "Project.Params" in the "Solution Explorer":

It will open a new "Project.Params" window with our new parameter:

Parameterizing a Variable

That procedure is even easier.
1. In your opened "Project.Params" window, click the "New" icon:

2. Rename it to "TestParameter", change the type to "String", and assign the value "Test Value":

3. Then switch back to your "Package" window and click the "Expression Builder" button for "TestVariable" in the "Variables" window:

4. You'll have the "Expression Builder" window opened:

5. Expand "Variables and Parameters" section and "Drag-n-Drop" "TestParameter" to the Expression small window. Please note the difference between Variables and Parameters: Parameters have "$Project" signature.

6. Then click OK and you will see the parameterized value for our "TestVariable":
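For reference, in the Expression Builder the project parameter ends up referenced like this (assuming the names above):

@[$Project::TestParameter]

whereas a plain package variable would appear as @[User::TestVariable].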