Answers by Dmitry Kharlap (aka Docker)

In the 3rd and 4th queries N'objectid' is a Unicode string literal, as it is prefixed with the N character. Because the column data is non-Unicode, SQL Server has to apply an implicit conversion to bring both sides of the comparison to the same type. SQL Server applies strict data-type precedence here, and since varchar has lower precedence than nvarchar, the data in the table columns is converted to Unicode. This adds processing time and affects query execution plans, as described in the answer to the next question below.
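A minimal sketch of the effect (the table and column names here are hypothetical, not taken from the original queries):

```sql
-- Hypothetical table with a non-Unicode column
CREATE TABLE dbo.demo (objectid varchar(20) NOT NULL);

-- The literal matches the column type: no conversion is needed
SELECT objectid FROM dbo.demo WHERE objectid = 'A123';

-- The nvarchar literal has higher precedence, so every objectid value
-- is implicitly converted to nvarchar before the comparison
SELECT objectid FROM dbo.demo WHERE objectid = N'A123';
```

Comparing the execution plans of the two SELECT statements shows the effect of the implicit conversion on the column side of the predicate.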

colname1 has a Windows collation. In this case the character set order for varchar is a subset of Unicode, so SQL Server can apply a so-called range seek and still use an index (we still see an Index Seek in the query execution plan). However, it has some overhead compared to the Index Seek used in queries 1 and 2.

colname2 has a SQL collation. Its character set order is considerably different from Unicode. That's why the name2_ix index can no longer be used for a seek: roughly speaking, an index is an ordered list of colname2 values (with "pointers" to table rows), and that order, determined by SQL_Latin1_General_CP1_CI_AS, is different from the Unicode order required after the implicit conversion.
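The two cases can be reproduced with a sketch like the following (a hypothetical setup built around the colname1/colname2 and name2_ix names used above; the specific collation names are illustrative):

```sql
-- colname1 uses a Windows collation, colname2 a SQL collation
CREATE TABLE dbo.t
(
    colname1 varchar(50) COLLATE Latin1_General_CI_AS,
    colname2 varchar(50) COLLATE SQL_Latin1_General_CP1_CI_AS
);
CREATE INDEX name1_ix ON dbo.t (colname1);
CREATE INDEX name2_ix ON dbo.t (colname2);

-- Windows collation: the optimizer can turn the nvarchar comparison
-- into a range seek, so the plan still shows an Index Seek
SELECT colname1 FROM dbo.t WHERE colname1 = N'abc';

-- SQL collation: the index order no longer matches, so expect the
-- plan to scan name2_ix instead of seeking it
SELECT colname2 FROM dbo.t WHERE colname2 = N'abc';
```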

The answer to this question lies in the logical processing of the query. According to "Inside SQL Server 2005: T-SQL Querying" by Itzik Ben-Gan, the order is:

1. FROM

2. ON

3. OUTER (join)

4. WHERE

5. GROUP BY

6. CUBE | ROLLUP

7. HAVING

8. SELECT

9. DISTINCT

10. ORDER BY

11. TOP

As we can see, WHERE is executed before SELECT, where all aliases are defined. That's why we cannot use aliases in the WHERE clause: when the WHERE clause is evaluated, the alias does not exist yet.
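For illustration (the dbo.orders table and its price/qty columns are hypothetical):

```sql
-- Fails: the alias is defined in SELECT, which logically runs after WHERE
-- SELECT price * qty AS total
-- FROM dbo.orders
-- WHERE total > 100;          -- error: invalid column name 'total'

-- Works: repeat the expression (or wrap the query in a derived table)
SELECT price * qty AS total
FROM dbo.orders
WHERE price * qty > 100;
```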

It's necessary to add, though, that the physical order may differ from the logical one due to SQL Server optimization. An example is the TOP and ORDER BY co-execution mentioned by Pinal Dave in his blog. However, the optimizer makes sure that the result is the same as if the query were processed in the logical order.

Not allowing aliases in the WHERE clause is perfectly logical, so it is also forbidden in other database engines, for example MySQL or Oracle.

By default, the Lookup transformation treats rows without matching entries as errors. There are two options to configure a Lookup transformation to return rows when the lookup operation fails to find a match:

1. Configure the Lookup transformation to redirect such rows to a separate output

a. Configure the Lookup transformation to redirect such rows to a no match output.

On the General page of the Lookup Transformation Editor dialog box, set the option "Specify how to handle rows with no matching entries" to "Redirect rows to no match output".

b. Configure the Lookup transformation to redirect such rows to an error output.

On the Error Output page of the Lookup Transformation Editor dialog box, set the "Error" option to "Redirect the rows to an error output".

2. Configure the Lookup transformation to return matching and non-matching rows to the same output

On the Error Output page of the Lookup Transformation Editor dialog box, set the "Error" option to "Ignore the failure and direct the rows to an output".

These are very basic requirements, so in real life you'd have to clarify a few things with a domain expert. However, even now we can spot some problems.

Some items are not serving the purpose of modeling

Creating a software system is almost always creating a model of some domain. Ideally, we would prefer a single, unified model. While this is a noble goal, in reality it is either impossible or too costly for non-trivial domains. It is more useful to recognize this fact and build your model to serve the required purpose. Certainly the model should be built with future extension in mind, and predicting the future has never been an easy task, but that is what makes a good architect great. The main goal of the desired model is the following:

“…A major part of our desired solution is to get the ability to calculate our costs per customer versus the amount they spend so we can focus on customers that are higher profit so we want cost information per oz of product and the cost of packaging…”

None of the following is necessary for that goal:

sales.salesOrder: the salesOrderNumber column is not required; besides, in some systems it could simply be calculated from salesorderid

sales.customerproduct: the packagingNotes column is not required

sales.salesorderlineitem: the lineItemNumber column is not required

sales.customer: the name column is not explicitly required; besides, we have a customer number. But it could be useful for our goal, so it's a question for a domain expert.

sales.product: the name column is not explicitly required. But it could be useful for our goal, so it's a question for a domain expert.

sales.salesorder: the orderdate column is not explicitly required. But it could be useful for our goal, so it's a question for a domain expert.

Some of them could be useful when we expand the model in the future (e.g. salesOrderNumber) or for a "minor part" not mentioned in the requirements, so we might consider adding them after talking to a domain expert.

Based on assumptions

Some items are created based on assumptions not clearly stated in the requirements. While this yields the most general solution, one that would apply to many such systems, clarifying those requirements could simplify the model.

It is assumed that a customer can order multiple items per order. If this is not true, then the SaleLineItem table is not needed.

It is assumed that a customer orders the same product with the same name and packaging quite often. If this is not true, then the CustomerProduct table is not needed and all its data can be moved to SaleLineItem.

Let’s assume it’s correct as is.

Incorrect or missing definitions

All columns in all tables (except primary keys) are nullable. That definitely has to be revised.

sales.product

Name is not unique by the requirements. But if this field is required (read the beginning of the post), uniqueness makes sense. This is a question for a domain expert.

sales.customer

The customerNumber definition is not correct, as NNANNNN is the format for the customer number, so:

the type should be char(7)

the constraint should be either [0-9][0-9][a-zA-Z][0-9][0-9][0-9][0-9] or we need to make sure the column has a case-insensitive collation.
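A sketch of the corrected definition (the constraint name and the surrounding columns are illustrative, not part of the original schema):

```sql
-- NNANNNN format: 2 digits, 1 letter, 4 digits
CREATE TABLE sales.customer
(
    customerId     int     NOT NULL PRIMARY KEY,
    customerNumber char(7) NOT NULL
        CONSTRAINT CK_customer_customerNumber
        CHECK (customerNumber LIKE '[0-9][0-9][a-zA-Z][0-9][0-9][0-9][0-9]')
);
```

With a case-insensitive collation on the column, the simpler pattern `[0-9][0-9][a-z][0-9][0-9][0-9][0-9]` would also accept both cases of the letter.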

sales.customerproduct

Constraint AKsalescustomerProductCustomerIdProductId is not correct. The same item can be sold a few times with different names.

Constraint AKsalescustomerProductCustomerIdProductName is not correct. There is no requirement that a customer always buys a product with the same name and packaging. It could be a normal pack and a family pack for the same product. But I would ask a domain expert.

The productPerOunceCost column should be moved to the sales.Product table, as "…Current cost information can be used for all profit projections…".

The packagingCostPerUnit column might be moved to sales.product if the packaging cost does not depend on the size of a unit. This is a question for a domain expert.

sales.salesOrder

timeShipped (type: time or datetime) and datePaid (type: date) should be added

customerid is missing a foreign key constraint

constraint AKsalessalesOrderCustomerIdSalesMonth is not correct as “…though there are with exceptions requiring a special form to be filed”

The SalesMonth field is not required as per item 3.

sales.salesOrderLineItem

soldToPricePerUnit column: there is nothing in the requirements stating that a price is negotiated for every sale and can differ for the same customerproduct. If it is always the same, then this column should be moved to sales.customerproduct. This is a question for a domain expert.

The answer to your question lies in understanding how a query plan is executed.

Every icon in a graphical execution plan represents a fundamental building block called an operator (or iterator). All operators implement the same set of methods:

Open - tells an operator to prepare to produce output rows

GetRow - requests that an operator produce a new output row

Close - indicates that the operator's parent is through requesting rows

Because of this architecture, operators are independent of one another and can easily be combined in many different ways. The arrowheads on the connecting lines indicate the direction of data flow; however, operator execution order is quite different. When SQL Server executes a query plan, control flows down the query tree. Query execution always begins with the root operator: SQL Server calls the Open and GetRow methods of the operator at the root of the query tree. If the root operator has children, it calls Open and then, when necessary, GetRow and Close on each of its children; those children do the same with their own children, and so on. So when a query plan is executed, these method calls propagate down through the tree to the leaf operators, while data flows (or, more accurately, is pulled) up the tree as one operator calls another operator's GetRow method.

So, the answer to your question is: a query execution plan is presented in a "seemingly backward manner" because it shows the actual operator execution order (i.e. from left to right), while the arrows represent the data flow between the operators.

There are a few excellent sources that explain the query execution process in detail:

The biggest limitation of the SQL Server Destination adapter is that you cannot use it to access a SQL Server database on a remote server. So even if your destination database is currently on the same server as your package, if that can change in the future you should consider using the OLE DB Destination instead.

The reason is that the SQL Server Destination adapter creates an in-memory object in the memory space of the destination server (according to BOL, to be able to do this the user executing the SSIS package requires the "Create global objects" Windows permission). This is why it has to run on the same server as the destination: BULK INSERT cannot access the memory space of a remote machine.

As for physical query processing: as you can see in Pinal's blog (the ORDER BY / TOP sample), clauses can physically be executed in a different order (I believe mostly for optimization purposes). However, thinking logically, first we need to filter out rows, then aggregate data from the remaining rows, and then filter the aggregated results. So I believe the WHERE clause is executed first, and HAVING is executed after WHERE.

It's interesting that the logical order also clearly shows that if you can apply the same filtering condition in either WHERE or HAVING, you'd better do it in WHERE to reduce the number of rows processed by GROUP BY, CUBE | ROLLUP, and HAVING.
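For example (a hypothetical dbo.sale table with customerId and amount columns):

```sql
-- The WHERE predicate removes rows before grouping;
-- the HAVING predicate filters the already-aggregated groups
SELECT customerId, SUM(amount) AS total
FROM dbo.sale
WHERE amount > 0            -- row filter: runs before GROUP BY
GROUP BY customerId
HAVING SUM(amount) > 1000;  -- group filter: runs after aggregation
```

Moving the `amount > 0` test into HAVING (as `HAVING SUM(...) > 1000 AND MIN(amount) > 0`, say) would force GROUP BY to process rows that could have been discarded earlier.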

Besides, having just WHERE and HAVING is not enough anymore. Itzik Ben-Gan submitted a suggestion to the SQL Server team to introduce one more clause, QUALIFY, that would be executed after SELECT (i.e. after HAVING). You can see the details of his request here.

2.

Similarities

Both WHERE and HAVING are used to filter the result

Both WHERE and HAVING use the same syntax for comparison

Both WHERE and HAVING can operate on variables and constants

Differences

WHERE can operate on column values from a single row. HAVING can operate on aggregates of column values and on values of the columns specified in the GROUP BY clause.

HAVING can be used only with the SELECT statement

The text, image, and ntext data types cannot be used in a HAVING clause

WHERE is executed as the 4th step in logical query processing, HAVING is executed as the 7th step (GROUP BY and CUBE | ROLLUP are in between)

HAVING forces the grouping functionality when GROUP BY is not specified
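The last point can be illustrated with a sketch (dbo.sale is a hypothetical table):

```sql
-- No GROUP BY: HAVING treats the whole table as a single group,
-- so the query returns either one aggregated row or no rows
SELECT COUNT(*) AS salesCount
FROM dbo.sale
HAVING COUNT(*) > 10;
```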

The reason for this behavior is as follows. In the first query SQL Server checks whether UNOIN is a reserved keyword. Because it is not, the SQL Server syntax parser assumes it to be a table alias for @t1 and processes the SELECT Dates FROM @t1 statement as a separate query.

In the second query SQL Server does the same; however, since UNOIN is taken to be a table alias, the syntax parser cannot process the ALL keyword and throws a syntax error.
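Reconstructing the two queries from the fragments above (the @t1 table variable's definition is assumed):

```sql
DECLARE @t1 TABLE (Dates date);

-- Query 1: UNOIN is parsed as a table alias for @t1, so this batch
-- is treated as two independent SELECT statements
SELECT Dates FROM @t1 UNOIN
SELECT Dates FROM @t1;

-- Query 2: UNOIN is again taken as an alias, and the parser then
-- fails on the ALL keyword with a syntax error
SELECT Dates FROM @t1 UNOIN ALL
SELECT Dates FROM @t1;
```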

The described behavior is specific to T-SQL, though (and perhaps some other implementations), and would not be possible if T-SQL followed the ANSI SQL standard rule regarding the semicolon character that marks the end of a statement. Here is a quote from the ANSI SQL 2003 standard, part 2 (the semicolon was introduced in ANSI SQL-92):