Thursday, February 27, 2014

This is not a new topic. If you search, you will surely find many posts on this, mostly with traditional techniques but not using newest capabilities such as SQL Windowing. Since I wrote a post on Analysis Services for the same, thought to write the same on T-SQL too. Here is the way of calculating running totals using Window components and functions which provides an efficient way of calculating and simple code structure.

The following query shows the way of calculating. The first code creates a window based on SalesOrderId (which is unique) and get the running totals over SubTotal for a given year. The second code creates a window on OrderDate (which is not unique). This will show the totals for the date instead of Running-Totals for the date unless the range is specified using boundaries. That is the reason for adding upper and lower boundaries using ROW, UNBOUNEDPRECEDING and CURRENT ROW inside the window for restricting rows to be participated for the calculation.

-- Window based on OrderId which is unique

SELECTOrderDate,SalesOrderID,SalesOrderNumber,CustomerID

,SubTotalTotal

,SUM(SubTotal)OVER(ORDERBYSalesOrderID)RunningTotal

FROMSales.SalesOrderHeader

ORDERBYOrderDate,SalesOrderID

-- Window based on OrderDate which is not unique

SELECTOrderDate,SalesOrderID,SalesOrderNumber,CustomerID

,SubTotalTotal

,SUM(SubTotal)OVER(ORDERBYOrderDate

ROWSBETWEENUNBOUNDEDPRECEDINGANDCURRENTROW)RunningTotal

FROMSales.SalesOrderHeader

ORDERBYOrderDate,SalesOrderID

Both queries produce same result;

Here is a comparison on the same using traditional techniques. Though it shows that the query uses window is faster than other queries, always check and pick the best.

Wednesday, February 26, 2014

Calculating running totals against a numeric is not an easy task with both relational databases using T-SQL and multidimensional databases using MDX. In a way, calculation on relational databases is little bit easier using new window functions than using multi-dimensional queries. But it does not mean that the calculation using MDX is much more complex and difficult to write. However it depends on the familiarity. Comparatively we do not write MDX statements as much as we write T-SQLs. That is the reason for tagging “complex” on MDX, that is the reason for spending much time even on a simple query than the time spend with T-SQL for similar implementations.

I had to write a similar query today for calculating running totals mixing with YTD and previous year values. As usual I searched for best practices, there were many posts on this based on either calendar year or financial year but could not find a specific one for calculating running totals for a given period. Wrote it, and thought to share it. Here is the MDX, it is written on AdventureWorksDW. You may find it interesting and helpful.

Sunday, February 23, 2014

SQL Server Management Studio is an integrated management, development and querying application that we use for working with SQL Server instances and databases. It is based on on Visual Studio shell and it is the key application for both developers and administrators for working with SQL Server.

Just like other Microsoft applications, Management Studio allows you to adjust/customize the environment as you wish. It offers many settings, mostly unknown to us, that can be used for making the application more user-friendly and more flexible. Thought to explore some of important ones via number of notes. Here is the first one, this note explores settings listed under Text Editing for T-SQL .

The general options related to T-SQL can be found under Tools Menu –> Options menu –> Text Editor –> Transact-SQL –> General;

There are 10 options that can be set under T-SQL –> General. However some of them are not related to T-SQL. Let’s explore one by one.

Auto list membersSelecting this option lists columns, functions, tables, etc. out on a pop-up menu based on the statement you write, making intelliSense enabled. By selecting the relevant item, the code can be completed without typing the whole word avoiding mistakes caused by misspelling and increasing the typing time.

Hide advanced membersI believe that this option is not applicable for SQL Server but yet to be confirmed. As per BOL, this hides members marked as “advanced” limiting items loaded to the pop-up. This option is disabled when “Auto list members” is cleared or no members are marked as “advanced”.

Parameter informationIf this option is selected, the complete syntax of current declaration or procedure is displayed with its parameters. The parameter which is bold shows the one needs to be set as the next parameter.

Enable virtual spaceSelecting this makes the position of cursor consistence with all the lines in the code regardless of the length of the line. By default, this is disable. Therefore the position of the cursor is not consistence when moving up and down. Have a look on below image;

Now, if the cursor is moved down, it will be positioned to column 31.

If the option is selected, position of the cursor will be remained in same column.

The reason for this is, when the option is selected, tabs or spaces are automatically added to complete the line.

Word wrapThis makes the entire line you have typed visible in viewable editor area even though it has extended beyond the area horizontally.

Editor with option cleared.

Editor with option selected.

Show visual glyphs for word wrapThis option comes as a sub option of “Word wrap” hence it is only enabled when “Word wrap” is selected. Selecting this makes a glyph (a graphical symbol that shows a returned-arrow) appeared on wrapped lines indicating that the lines are wrapped.

Apply Cut or Copy commands to blank line when there is no selection This setting allows us to cut or copy blank lines and paste without selecting anything. Look at the below image. It has a blank line and the cursor is positioned in it. Now press Ctrl+C for copying;

If the option is selected, Ctrl+C will copy the blank line and Ctrl+V will insert a new blank line.

Line numbersSelecting this options displays line numbers for each line;

Enable single-click URL navigationSelecting this option makes URL in the editor clickable for opening the web page. If the option is cleared, there will be no change on the cursor when passing over the URL but if it is selected, URL will be shown as a hyperlink and can be click on it while holding the Ctrl key. The first image shows the editor with the option cleared and the second shows with the option selected.

Navigation barThis option is for getting all objects and procedures displayed in drop-downs at the top of the editor for easy navigation. However this option is not enabled for SQL Server.

Friday, February 21, 2014

How can we access other rows in a set while accessing one particular row? In other words, can we access values in other rows other than current row while the current row is being processed? There were no built-in functions for supporting this functionality with previous versions of SQL Server but there were many ways of getting the required result generated. One common way was linking the same table to itself either using as a derived table or CTE. Microsoft SQL Server offers four offset functions for supporting this requirement. Here is note on it;

SQL Server 2012 Offset Functions: LAG, LEAD, FIRST_VALUE, LAST_VALUEOffset functions allow to access values located in other rows while accessing the current row;

Function

Description

LAG

LAG works on window partition and window order clauses. It allows to access a value of a row at a certain offset from the current row which appears before the current row based on the order specified. It accepts three parameters; value (or column) which needs to be returned, offset as optional (1 is default), and default value to be returned in case of no row at the specified offset (null is default).

LEAD

LEAD works on same manner, just like LAG. Only different is, while LAG is looking for records before the current row, LEAD is looking for records after the current row.

FIRST_VALUE

This allows to access values from the first row in the window frame. The first value of the first row is accessed with a window frame extent ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW.

LAST_VALUE

This allows to access values from the last row in the windows frame. The extent ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING should be used with LAST_VALUE.

Here is an example for offset functions;

USEAdventureWorks2012

GO

-- Create a view for forming a set that contains sales amount for years and months.

Monday, February 17, 2014

Not expecting duplicates but a significant improvement in performance, we tend to use UNION ALL when no duplicates are guaranteed in combining two sets. The reason is, UNION ALL does not performing an additional task for filtering (or removing) duplicates, hence gives a better performance than UNION. However, the purpose of UNION ALL is not for improving the performance but for producing the result with duplicates if exist. While this was being taught during my classes, a thought came into my mind: Why other set operators such as INTERSECT and EXCEPT do not offer the same built-in functionality like INTERSECT ALL and EXCEPT all?

A usual quick search did not give me the required answer but proved that it is not available as a built-in functionality. There were few possible solutions but the one given by T-SQL expert Itzik Ben-Gan was quite interesting. Here is a small note on INTERSECT and EXCEPT and implementation suggested by Itzik Ben-Gan for INTERSECT ALL and EXCEPT ALL.

INTERSECT and INTERSECT ALLINTERSECT allows us to retrieve only rows that are available in both two sets. This removes duplicates if found. It basically behaves as DISTINCT INTERSECT. If duplicates to be included to the result, a workaround is required and it can be easily done with ROW_NUMBER ranking function. All required columns need to be added under PARTITION BY clause and SELECT < constant> needs to be added for ODER BY clause for instructing SQL Server to not consider the order of the window. This generates aN unique number of all records, making them available in the final resultset of INTERSECT. Here is a sample code for this;

USEtempdb

GO

-- creating table 1

CREATETABLEdbo.InternetSales

(

SaleDatedatetimeNOTNULL

,Productvarchar(100)NOTNULL

,AmountmoneyNOTNULL

)

GO

-- creating table 2

CREATETABLEdbo.ResellerSales

(

SaleDatedatetimeNOTNULL

,Productvarchar(100)NOTNULL

,AmountmoneyNOTNULL

)

GO

-- inserting values for table 1

INSERTINTOdbo.InternetSales

(SaleDate,Product,Amount)

VALUES

('01/01/2014','A', 100)

,('02/01/2014','B', 100)

,('03/01/2014','B', 100)

,('04/01/2014','B', 100)

,('05/01/2014','C', 100)

,('05/01/2014','C', 100)

-- inserting values for table 2

INSERTINTOdbo.ResellerSales

(SaleDate,Product,Amount)

VALUES

('01/01/2014','A', 100)

,('02/01/2014','B', 100)

,('03/01/2014','B', 100)

,('04/01/2014','D', 100)

,('05/01/2014','D', 100)

-- checking INTERSECT, will return only A, B

-- If duplicates were included, should return A, B, B

SELECTProductFROMdbo.InternetSales

INTERSECT

SELECTProductFROMdbo.ResellerSales

-- Adding ROW_NUMBER for creating unique numbers

-- And using CTE for removing the number

-- This returns A, B, B

WITHcte

AS

(

SELECT

ROW_NUMBER()OVER (PARTITIONBYProductORDERBY (SELECT 0))ASnumber

,Product

FROMdbo.InternetSales

INTERSECT

SELECT

ROW_NUMBER()OVER (PARTITIONBYProductORDERBY (SELECT 0))ASnumber

,Product

FROMdbo.ResellerSales

)

SELECTProductFROMcte;

EXCEPT and EXCEPT ALLEXCEPT returns all distinct rows found in “left” (or first) set that are not found in “right” (or second) set. This discards duplicates too. If duplicates are required, same workaround can be applied. Here is the code for it;

Sunday, February 16, 2014

Finding records exist only in one set that do not exist in another is a common requirement in database developments. Finding customers who have not placed orders, finding products that have not been purchased by any customers are general examples for it. There are many difference ways of obtaining the required record set and the mentioned methods/operators in the title are the commonly used ones for achieving this. However which gives the better performance is questionable, hence let’s analyze them and see.

Let’s try to get all products that have not been purchased by customers from AdventureWorks database. Have a look on all three SELECTs. They return product ids from Production.Product table which are not exist in Sales.SalesOrderDetail table.

USEAdventureWorks2012

GO

-- Finding unsold products using NOT IN

SELECTp.ProductID

FROMProduction.Productp

WHEREp.ProductIDNOTIN(SELECTProductIDfromSales.SalesOrderDetail)

ORDERBY 1

-- Finding unsold products using LEFT OUTER JOIN

SELECTp.ProductID

FROMProduction.Productp

LEFTOUTERJOINSales.SalesOrderDetaild

ONp.ProductID=d.ProductID

WHEREd.ProductIDISNULL

ORDERBY 1

-- Finding unsold products using EXCEPT

SELECTp.ProductID

FROMProduction.Productp

EXCEPT

SELECTProductID

FROMSales.SalesOrderDetail

ORDERBY 1

All three produce the same result;

Let’s analyze the execution plans of all three. Note the “Query Cost – Relative to the batch” too.

As you see, plans for NOT IN and EXCEPT are same and the performance of them are good but LEFT OUTER is different. LEFT OUTER has used “Merge Join” whereas other two have used “Nested Loops” which is low cost join. In this scenario, LEFT OUTER does not offer much benefits but NOT IN and EXCEPT give better performance.

However, this behavior is not guaranteed with all scenario hence we cannot conclude that NOT IN and EXCEPT provide better performance rather than LEFT OUTER. This is totally depend on factors such as index availability and number of records. Therefore, best way is, trying with all ways and pick the best for the situation. In addition to the mentioned methods, there are few more popular ways such as NOT EXISTS and OUTER APPLY. All these can be used for retrieving the required result, however, as mentioned above, best way can be determined only by trying the same with all the ways.

Saturday, February 15, 2014

Processing rows in one set using another set is a common coding pattern used mostly with combining or comparing rows from sets. SQL Server offers three set operators: UNION, INTERSECT and EXCEPT, for handling scenario which compares rows from one set to another and completes the return set. In some specific cases, an alternative operator which is APPLY can be used for handling similar scenario. Here is post on APPLY operator;

The APPLY operator not exactly a set operator. It is a table operator which evaluates rows in one set based on an expression set with another set, NOT combining two sets in similar manner used by other set operators but like a JOIN. It is used with FROM clause and just like JOINs, two sets are marked as “left” set and “right” set. The “right” set is always either a table-valued function or a derived table which gets processed for each row returning from the “left” set. The syntax for APPLY is as follows;

CROSS APPLYCROSS APPLY processes the “right” set for each row found in the “left” set in a similar CROSS-JOIN manner. However, if an empty result is generated by the “right” set for the correlated row given by “left”, the row will NOT be included in the resultset, in a similar INNER-JOIN fashion. Here is an example for CROSS APPLY.

USEAdventureWorks2012

GO

-- This returns all products

-- There are 504 products

SELECTProductID,Name

FROMProduction.Product

ORDERBYProductID

GO

-- Create a table-valued function that returns top orders related to given product

-- this does not return any records as there are no order for the product id 1

SELECT*FROMdbo.GetTopOrdersForTheProduct(1)

-- this returns records as there are orders for the product 707

SELECT*FROMdbo.GetTopOrdersForTheProduct(707)

-- Joining SELECT with TVF using CROSS APPLY

-- This does not return products like 1, 2

SELECTProductID,Name,o.SalesOrderNumber,o.OrderDate,o.OrderAmount

FROMProduction.Product

CROSSAPPLY

dbo.GetTopOrdersForTheProduct(ProductID)o

ORDERBYProductID

OUTER APPLY The behavior of OUTER APPLY is same as CROSS APPLY except one which is the only difference between CROSS APPLY and OUTER APPLY. In the presence of an empty result from “right” set, CROSS APPLY excludes the row found in “left” from the returned result but OUTER APPLY includes it. This behavior is conceptually similar to LEFT OUTER JOIN. Here is the code describing it;

Friday, February 14, 2014

A Sub query is a SELECT statement that is embedded to another query. Or in other words, a SELECT statement that is nested to another SELECT. Or in a simplest way, it is a query within a query. This posts speaks about types related and terms used with sub queries.

What are Inner queries and Outer queries?Once a SELECT is written within another SELECT, the one written inside becomes the INNER QUERY. The query that holds the inner query is called as OUTER QUERY. See the below query. The query refers the Sales.Customer table is the Outer Query. The query written on Sales.SalesOrderHeader is the Inner Query.

SELECT*

FROMSales.Customer

WHERECustomerID=(SELECTTOP (1)

CustomerID

FROMSales.SalesOrderHeader

ORDERBYSubTotalDESC)

What are Scalar, Multi-valued and Table-valued Sub Queries? Sub queries can be categorized based on their return type. If the query returns a single value, it becomes a scalar sub query. Scalar sub query behaves as an expression for the outer query and it can be used with clauses like SELECT and WHERE. Sub query produces NULL value if the result of it is empty and equality operators (=, != , <, etc.) are used with predicates when they are used with WHERE clauses.

Multi-valued sub query still return a single column but it may produce multiple values (can be considered as multiple records) for the column. This is mainly used with IN predicate and based on matching values, predicate either returns TRUE or FALSE.

Sub query always has an outer query which it is nested with. If the sub query is completely independent and do not require any input from outer query, it is called as a Self-Contained Sub Query. Self-Contained sub query is evaluated once for the outer query and result is used with all records produced by outer query.

Correlated sub query is a query that requires an input from its outer query. This sub query is fully dependent on the outer query and cannot be executed without required attributes from outer query. This behavior increases the cost of the execution of the sub query as it needs to be executed for each of row of outer query.

Monday, February 10, 2014

SQL Server offers many number of built-in functions for getting commonly required operations done, minimizing the time and resources need to implement them for handling business logics. These functions can be categorized with various classifications. One way of categorizing them is, understanding their scope of input and return type. Under this categorization, they can be organized into four groups: Scalar, Grouped Aggregate, Window, and Rowset. The organization of functions under this categorization makes some of functions member of more than one group.

Here is a small note on this categorization.

Scalar FunctionsThese functions accept zero or one or more than one value (or a row), process them as per its intended functionalities, and return a single value. These functions can be further categorized as string, conversion, mathematical, etc.. Some of these functions are deterministic and some are nondeterministic. (Deterministic functions always return same result any time they are called with same input and same database state. Nondeterministic functions may return different result each time they are called with same input and same database state) Here are some examples for Scalar Functions;

-- accepts no arguments

SELECTGETDATE()

-- accepts one argument

SELECTLEN('Hello World')

-- accepts multiple argument

SELECTSTUFF('accepts one argument', 9, 3,'multiple')

Grouped Aggregate FunctionsThese functions perform their intended operations on a set of rows defined in a GROUP BY clause and return a single value. If GROUP BY is not provided, all rows are considered as one set and operation is performed on all rows. All aggregate functions are deterministic and ignore NULLs except the COUNT(*).

-- Aggregate functions without GROUP BY

SELECTCOUNT(*),MIN(ListPrice),MAX(ListPrice)

FROMProduction.Product

-- Aggregate functions with GROUP BY

SELECTYEAR(OrderDate)OrderYear,SUM(SubTotal)Total

FROMSales.SalesOrderHeader

GROUPBYYEAR(OrderDate)

Window FunctionsThe operation of Window Functions are bit different from other functions. These functions produce a scalar value based on a calculation done on a subset of the main recordset. The subset used for producing the result is called as the Window. A different order can be defined for the window for performing the calculation without affecting the order of input rows or output rows. In addition to that, partitioning the window is also allowed.

Window is specified using OVER clause with its specification. SQL Server offers number of Window function for handling ranking, aggregation and offset comparisons between rows.

Here are few example on Window Functions;

-- This uses SUM over Territory subsets (window)

SELECTt.Territory,t.OrderYear,t.Total

,SUM(t.Total)OVER (PARTITIONBYMONTH(t.Territory))TotalByTerritory

FROM (SELECTTerritoryIdTerritory,YEAR(OrderDate)OrderYear,SUM(SubTotal)Total

FROMSales.SalesOrderHeader

GROUPBYTerritoryId,YEAR(OrderDate))ast

ORDERBY 1, 2

-- This uses ROW_NUMBER over all records

SELECTSalesOrderID,SubTotal

,ROW_NUMBER()OVER (ORDERBYSubTotal)ASOrderValuePosition

FROMSales.SalesOrderHeader

WHEREYEAR(OrderDate)= 2008

ORDERBYSalesOrderID

Rowset FunctionsRowset functions accept input parameters and return objects that can be used as tables in TSQL statements. SQL Server offers four Rowset functions (OPENDATASOURCE, OPENQUERY, OPENROWSET, OPENXML) and they are nondeterministic functions.

Search This Blog

About Me

Dinesh Priyankara (MSc IT) is an MVP – Data Platform (Microsoft Most Valuable Professional) in Sri Lanka with 16 years’ experience in various aspects of database technologies including business intelligence. He has done many local and foreign business intelligence implementations and has worked as a subject matter expert on various database and business intelligence related projects. He is the Founder/Principal Architect of dinesQL (Pvt) Ltd and he consults, teaches and runs training programs on data related solutions and subjects.