Monday, December 21, 2009

I recently had a requirement to capture periodic snapshots of an incident/activity type OLTP table into a fact table. I already had an accumulating snapshot type fact table where I was capturing the same facts on a daily basis. But, in addition to this, the client wanted the ability to perform period over period (weekly, monthly, quarterly, yearly) reporting.

My approach was simple enough and in case anyone was thinking about an overly complicated approach I figured I'd share . . .

After my daily load of the fact table I check to see if snapshot data exists for the prior week, month, quarter, and year. If any of these snapshots do not exist I simply do an INSERT selecting all data from the accumulating snapshot table and adding the As-Of date along with the level of periodicity (W,M,Q,Y).

I used four T-SQL IF statements to determine whether the prior period data exists and, if it doesn't, call the INSERT.
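As a sketch, each of the four checks looks something like the following (the table and column names here are hypothetical placeholders for your own schema):

```sql
-- Hypothetical table/column names; 'M' marks the monthly snapshot level
IF NOT EXISTS (SELECT 1
               FROM dbo.FactIncidentSnapshot
               WHERE AsOfDateKey = @EndOfMonthKey
                 AND PeriodLevel = 'M')
BEGIN
    INSERT INTO dbo.FactIncidentSnapshot (IncidentKey, StatusKey, AsOfDateKey, PeriodLevel)
    SELECT IncidentKey, StatusKey, @EndOfMonthKey, 'M'
    FROM dbo.FactIncidentAccumulating
END
```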

Below are the statements used to capture the prior period as-of dates in an INT key format, the same format you would get by converting a date to a VARCHAR using the 112 style (e.g. 20091231 for December 31, 2009). I'll leave it up to you to apply this logic to your own INSERT statements.

--Previous end of month
DECLARE @EndOfMonthKey AS INT
SET @EndOfMonthKey =
    (SELECT CAST(CONVERT(VARCHAR, DATEADD(d, -1, CAST(CAST(MONTH(GETDATE()) AS VARCHAR) + '/1/' + CAST(YEAR(GETDATE()) AS VARCHAR) AS DATETIME)), 112) AS INT))

--Previous end of quarter
DECLARE @EndOfQuarterKey AS INT
SET @EndOfQuarterKey =
    (SELECT CASE
        WHEN MONTH(GETDATE()) IN (1,2,3)    THEN CAST(YEAR(GETDATE()) - 1 AS VARCHAR) + '1231'
        WHEN MONTH(GETDATE()) IN (4,5,6)    THEN CAST(YEAR(GETDATE()) AS VARCHAR) + '0331'
        WHEN MONTH(GETDATE()) IN (7,8,9)    THEN CAST(YEAR(GETDATE()) AS VARCHAR) + '0630'
        WHEN MONTH(GETDATE()) IN (10,11,12) THEN CAST(YEAR(GETDATE()) AS VARCHAR) + '0930'
    END)
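The weekly and yearly keys follow the same pattern; here is a sketch (the weekly logic assumes the default US DATEFIRST setting of 7, so "end of week" means the most recent Saturday):

```sql
--Previous end of year
DECLARE @EndOfYearKey AS INT
SET @EndOfYearKey = CAST(CAST(YEAR(GETDATE()) - 1 AS VARCHAR) + '1231' AS INT)

--Previous end of week (most recent Saturday, assuming DATEFIRST = 7)
DECLARE @EndOfWeekKey AS INT
SET @EndOfWeekKey = CAST(CONVERT(VARCHAR, DATEADD(d, -DATEPART(dw, GETDATE()), GETDATE()), 112) AS INT)
```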

I was recently tasked with creating a relational model in Cognos Framework Manager and got a little confused when deciding what kind of objects to use. I found myself asking if I should use a Query Subject or a Shortcut. Furthermore, it seemed there had been some changes to the Shortcut functionality since I had last used Cognos. I found the following that made it rather clear which object(s) I should use when creating my model.

Taken from the Cognos documentation . . .

The main difference between model objects and shortcuts is that model objects give you the freedom to include or exclude items and to rename them. You may choose to use model objects instead of shortcuts if you need to limit the query items included or to change the names of items.

Shortcuts are less flexible from a presentation perspective than model objects, but they require much less maintenance because they are automatically updated when the target object is updated. If maintenance is a key concern and there is no need to customize the appearance of the query subject, use shortcuts.

Framework Manager has two types of shortcuts:
● regular shortcuts, which are a simple reference to the target object.

● alias shortcuts, which behave as if they were a copy of the original object with completely independent behavior.

Alias shortcuts are available only for query subjects and dimensions. Regular shortcuts are typically used as conformed dimensions with star schema groups, creating multiple references with the exact same name and appearance in multiple places. Alias shortcuts are typically used in role-playing dimensions or shared tables.

Being able to specify the behavior of shortcuts is new to Cognos 8.3. When you open a model from a previous release, the Shortcut Processing governor is set to Automatic. When Automatic is used, shortcuts work the same as in previous releases; that is, a shortcut that exists in the same folder as its target behaves as an alias, or independent instance, whereas a shortcut existing elsewhere in the model behaves as a reference to the original. To take advantage of the Treat As property, it is recommended that you verify the model and, when repairing, change the governor to Explicit. The repair operation sets each shortcut's Treat As property to the correct value based on the rules followed by the Automatic setting. This means there should be no change in the behavior of your model unless you choose to make one or more changes to the Treat As properties of your shortcuts.

When you create a new model, the Shortcut Processing governor is always set to Explicit. When the governor is set to Explicit, the shortcut behavior is taken from the Treat As property and you have complete control over how shortcuts behave without being concerned about where in the model they are located.

Monday, December 14, 2009

Ever get an HRESULT error code and have no idea what it meant or how to figure out what it's related to?

The SSIS run-time engine and the data flow engine are written in native code. Therefore, many errors are returned as HRESULTs. In case you're unfamiliar with what an HRESULT is, below is a Wikipedia definition.

In the field of computer programming, the HRESULT is a data type used in many types of Microsoft technology. HRESULTs are used as function parameters and return values to describe errors and warnings in a program.

An HRESULT value has 32 bits divided into three fields: a severity code, a facility code, and an error code. The severity code indicates whether the return value represents information, warning, or error. The facility code identifies the area of the system responsible for the error. The error code is a unique number that is assigned to represent the exception. Each exception is mapped to a distinct HRESULT.
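Those three fields can be pulled apart with simple arithmetic. Below is a T-SQL sketch using 0x80070005 (the common Win32 "access denied" error) purely as an illustrative value:

```sql
DECLARE @hr INT
SET @hr = 0x80070005

-- Map the signed INT onto its unsigned 32-bit value first
DECLARE @u BIGINT
SET @u = CASE WHEN @hr < 0 THEN CAST(@hr AS BIGINT) + 4294967296 ELSE @hr END

SELECT
    Severity  = @u / 2147483648,     -- high bit: 1 = failure
    Facility  = (@u / 65536) % 2048, -- facility code (7 = FACILITY_WIN32)
    ErrorCode = @u % 65536           -- error code (5 = access denied)
```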

Tuesday, December 8, 2009

While questions like this don't have black or white answers there is some simple guidance here . . .

You probably want to change it from the default configuration of 2,147,483,647 MB, which is roughly two petabytes. The idea behind the default is to allow SQL Server to use as much memory as it wants. That's all well and good until it begins to starve the OS of memory, which will in turn bring SQL Server to its knees.

There are implications related to OS editions, SQL Server editions, 32 versus 64 bit, AWE, etc. I won't get into that here, but I will say that it's a good idea to set a maximum value and leave at least half a GB, and if possible 1+ GB, for the operating system.

For example, if you're on a budget and running 32-bit Windows Server 2003 Enterprise with 4 GB of RAM (the most a 32-bit OS can address without PAE), I would suggest setting this value to somewhere between 3,072 and 3,584 MB (3 - 3.5 GB).
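The setting itself is a one-time sp_configure change; the 3,072 MB value below matches the example above:

```sql
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'max server memory (MB)', 3072;
RECONFIGURE;
```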

Monday, December 7, 2009

It's a rare day that I concede defeat but today was one of those rare days. I threw in the towel on migrating existing Cognos Report Studio reports from a Framework Manager model that used a SQL Server relational data source to a new model based on a SQL Server Analysis Services 2005 data source.

There were two major reasons:

1. The inability to make any modifications to a Framework Manager model using objects from the SSAS cubes. This includes the creation of filters based on an SSAS cube.

A couple of years ago I took on a job to create a reporting solution for a large CRM implementation at a financial services firm. What I found was that the CRM solution that they were having customized had a plug-in module for Cognos-based "analytics". What this meant was that they had a Cognos model built on top of their OLTP system and some reports to go with it. Additionally, they provided an authentication module that allowed for users that were authenticated on the CRM system (using a combination of AD and forms based login) to have their credentials passed on to this CRM application. This was important because some of the users were on a different domain and were not able to authenticate using AD. I tried to sell them on creating a custom authentication provider for SSRS and scrap Cognos. Not because I refuse to work with non-Microsoft technology but because Cognos didn't seem like the right solution for the problem (I won't get into that here).

I couldn't sell them, so I proceeded to build Cognos reports using Report Studio against a Framework Manager model that sat on top of the live OLTP database. The OLTP schema was complex, and the model was buggy and full of gaps. It was not pretty. In the end we had 30+ reports, some of which were very complex and written against a very complex and cumbersome model. The queries being produced by Cognos were unbelievably complex and at times rather difficult to tune and troubleshoot.

Fast forward a couple of years and the maker of the CRM software is now using SQL Server as their reporting and analytics platform and my client is struggling with supporting their existing Cognos implementation as they don't quite have the expertise to extend (modify existing and create new reports) or tune their solution and they would like to incorporate cubes.

Initially I thought I would build them a data mart. From that I would create a relational Cognos model as well as build out an SSAS database and cube(s). Then I thought, wouldn't it be great if I could simply build the cubes and have Framework Manager, Report Studio, and Analysis Studio use the cube(s) instead of having to maintain both the cubes and the relational model.

I got as far as building the mart and the ETL, configuring an SSAS 2005 data source, and creating a model on the SSAS cube. Then I began to port the existing reports. Well, these reports relied on some complex filters defined in the model. I attempted to recreate these filters in the new model. No can do. It simply isn't supported. Then I started playing around with basic reports using the new model. At times the report was unresponsive. I then began looking around the message boards only to find that there were quite a number of people attempting to resolve performance issues with Cognos 8.3 and Analysis Services cubes with no positive responses. This was the point at which I threw in the towel.

So tomorrow I move on. I have some nice cubes created that will be used with Excel 2007 and hopefully a little bit with Analysis Studio for those that can't authenticate using AD. But, I'll have to define a relational model for the report conversion process.

An alternative would be to implement SSRS and recreate all the reports as well as create a custom authentication extension, but that would be too costly at this time. Another alternative would be to create Cognos Transformer cubes, but the client does not have a license for this product, does not want to pay for it, and does not have the expertise to support it.

It's not an ideal solution (a relational FM model and AS cubes) and will require additional maintenance (two metadata models) but it will provide a much higher performing system with analytics capabilities and a much simpler and higher performing model from which to write reports.

Sunday, December 6, 2009

Durrr . . . sometimes I discover something so simple all I can say is "durrrrrr". I've been working with SSAS for over 2 years now and never knew that you could process more than one SSAS object at a time using SQL Server Management Studio. In case anyone else is as dense as me . . .

Simply open the Object Explorer Details window if it's not already open. Click on the folder containing the objects to be processed. Select multiple objects in the browser and initiate processing. That simple. I'm not sure why SSMS doesn't allow the selection of multiple objects in the Object Explorer tree control.

I was on my way back from the kitchen at my client's site when I passed the office of one of their developers. We made momentary eye contact and I could see that look of desperation. She immediately called out my name and said something like, "Dave! I need your help. You're smart." I can't resist that kind of flattery so I immediately put down my yogurt and pulled up a chair.

It seemed like a simple problem. She just wanted to take part of a string from a larger one - a substring. Below are examples of the complete strings. They were created from some kind of a flat file extract from a legacy system. Assume the column name is PRODVAL.

She simply wanted the first part of the string - up to the "space". I don't recall her logic at this point but it used the T-SQL SUBSTRING and PATINDEX functions.

I began to re-write her query. I came up with the following.
SELECT SUBSTRING(PRODVAL, 1, CHARINDEX(' ', PRODVAL) - 1) FROM table

Well, it didn't work. To troubleshoot the problem I rewrote the above query as follows.
SELECT CHARINDEX(' ', PRODVAL) FROM table

The query returned a whole long list of zeros. Something was obviously wrong. I could see the space. So why couldn't the CHARINDEX function? Next step: figure out what the "space" character actually is. The next query helped determine that.
SELECT ASCII(SUBSTRING(PRODVAL, 6, 1)) FROM table

The first row in the table had a PRODVAL value with a "space" in the 6th position so the above query returned the ASCII value of this 6th character. It returned a 9 - Tab.

Ok great. So now I knew we were looking for a tab, not a space. So, going back to our original query, we can modify it to use the CHAR function to look for the tab.
SELECT SUBSTRING(PRODVAL, 1, CHARINDEX(CHAR(9), PRODVAL) - 1) FROM table

Voila! Using the ASCII function we identified the mysterious character and then used the CHAR function to convert the ASCII code for Tab (9) to a character so that we could use it in a SUBSTRING function.
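If you can't be sure whether the delimiter is a space or a tab, PATINDEX with a character class will catch either. A sketch (it assumes every row actually contains one of the two delimiters, since a miss would return 0 and make the length argument negative):

```sql
SELECT SUBSTRING(PRODVAL, 1,
       PATINDEX('%[' + CHAR(9) + ' ]%', PRODVAL) - 1)
FROM table
```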

Friday, December 4, 2009

Is it just me or is there still a bit of a lack of an overall SQL Server BI ecosystem? Don't get me wrong, I am a big proponent of the SQL Server BI stack and there are a lot of nice integration points, but I think there is still a lot of work to be done to tie it all together.

One of my biggest gripes is related to where to look when there seems to be an issue with a BI solution that uses SSIS, SSAS, and SSRS. And what if you just want to know whether things have been humming along? We can implement all the email notifications in the world, but what if your email system is unreliable (as it was with a recent client)? Hence the question, "Is everything OK?"

I'm pretty well versed in exception handling and logging and can probably set my brain to autopilot and find an error message in a log on one of 10 servers without actually thinking about it but what about everyone else? There are other stakeholders such as more junior developers or even senior level people that are unfamiliar with SQL Server. How about the business? An analyst might see something fishy in the data and want to know if last night's load was successful. Do they have to call me or someone else on the development team? In many cases yes. But why? That kind of a dependency is unhealthy.

That's where the SQL Server BI Health Monitor Dashboard comes in. The SQL Server BI Health what? Don't fret if you haven't heard of this. I'd be surprised, no maybe disturbed, if you had. The SQL Server BI Health Monitor Dashboard was something I recently put together for a client that was unable to obtain a seasoned SQL Server BI developer but needed to continue supporting the solution I built for them without having to rely on getting hold of me.

The dashboard uses all out-of-the-box data sources, including some system tables, the SSIS log table created by the SQL Server log provider, and the SSRS tables found in the ReportServer database, to tie together the majority of the data needed to determine where in the SQL Server BI ecosystem your problem lives, helping you fix it that much faster.

I've posted the source code on CodePlex. You can find it @ http://bihealthdashboard.codeplex.com/. It's not exactly plug and play at this point. Connection strings need to be modified and you might have to create the DimDate table if you don't already have a table of dates accessible. It's also geared towards a BI solution that has a single SQL Server Agent job step that calls a master SSIS package that in turn calls all other SSIS packages, and a single SQL Server Agent job step that processes your SSAS database. If that doesn't sound like your system, the solution can still be valuable in that it shows you how to extract very useful process metadata from your SQL Server BI solution.

Wednesday, December 2, 2009

I know, I know . . . this isn't a Cognos blog. I apologize. However, I do think this is a good little tidbit, and it can also be applied to SQL Server based on some (very high level) similarities in their architecture.

The Cognos scheduler is not extremely robust (nor is SQL Job Agent at times) and even if it was, there are times when it is appropriate to run a job using an enterprise scheduler application such as AppWorx or Tidal Enterprise Scheduler.

By exposing most functionality through web services, Cognos (and SSRS) allow you to do things such as execute a report. So, if your scheduler can make web service calls or call executables (wrap the web service calls in a console app), you can bypass the Cognos scheduler and have your enterprise scheduler of choice run the report directly, or through a proxy app such as a .NET console executable that wraps the web service calls.

Below is an example of this using .NET and Cognos.

using System;
using System.IO;
using System.Web.Services.Protocols;
using cognosdotnet_2_0;
using System.Configuration;

// jobService is an instance of the jobService proxy class generated from
// the Cognos web services WSDL (the cognosdotnet_2_0 SDK namespace above).
// Set up a new biBusHeader for the "logon" action.
jobService.biBusHeaderValue = new biBusHeader();
jobService.biBusHeaderValue.CAM = new CAM();
jobService.biBusHeaderValue.CAM.action = "logonAs";
jobService.biBusHeaderValue.hdrSession = new hdrSession();

Monday, November 30, 2009

While I typically stick with Microsoft products there are times that I must drift. I'm currently working with a client that is using Cognos for their metadata and presentation (reporting and analysis) layers. I ran into an issue this morning while trying to upgrade their version of Cognos. Unfortunately there is no automated upgrade process between the version they have and the one they want to upgrade to. This meant uninstalling the old and installing the new. In the process I could not start the Cognos service because the previous version already had a service registered with the same name. I tried uninstalling and reinstalling a number of times but to no avail. Eventually I decided to manually delete the service. I did this with the following commands (entered at a command prompt):

This command lists all the registered services on the machine:

sc query state= all | findstr "SERVICE_NAME"

This command deletes the service:

sc delete service_name

If necessary you can also delete the service from the registry. You should be able to find the key in the following location:

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services

Sunday, November 29, 2009

It's very easy to leverage the existing Microsoft.ReportingServices.Rendering.CsvRenderer.CsvReport,Microsoft.ReportingServices.CsvRendering type to "create" a new rendering option in SSRS that uses a delimiter other than a comma. Below is an example of adding a "CSV" renderer that uses a pipe ( | ).

Simply add the following to the Render section of your rsreportserver.config file to add the option for rendering a pipe-delimited file.

<Extension Name="PIPE" Type="Microsoft.ReportingServices.Rendering.CsvRenderer.CsvReport,Microsoft.ReportingServices.CsvRendering">
<OverrideNames>
<Name Language="en-US">CSV (pipe delimited)</Name>
</OverrideNames>
<Configuration>
<DeviceInfo>
<FieldDelimiter>|</FieldDelimiter>
<NoHeader>false</NoHeader>
</DeviceInfo>
</Configuration>
</Extension>
Note the FieldDelimiter tag. This is where you can set your delimiter to the character of your choice.

Be sure to include the OverrideNames tag but feel free to change the value. Once you have completed the change, restart the Reporting Services service to reload the newly modified config file.
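The same approach yields a tab-delimited renderer. Since a literal tab is awkward to read inside an XML file, the character-reference form can be used instead; this is a sketch, not a configuration I have verified on every SSRS version:

```xml
<Extension Name="TAB" Type="Microsoft.ReportingServices.Rendering.CsvRenderer.CsvReport,Microsoft.ReportingServices.CsvRendering">
  <OverrideNames>
    <Name Language="en-US">CSV (tab delimited)</Name>
  </OverrideNames>
  <Configuration>
    <DeviceInfo>
      <FieldDelimiter>&#9;</FieldDelimiter>
      <NoHeader>false</NoHeader>
    </DeviceInfo>
  </Configuration>
</Extension>
```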

Thursday, October 8, 2009

Many unsophisticated and/or underfunded organizations have people executing ad hoc queries against production SQL Server instances with little or no oversight. I am currently in an organization where employees on the operations team as well as the marketing and business intelligence teams have the ability to execute queries via SQL Server Management Studio, directly against production OLTP databases. If a query takes an hour to run, so be it. They need their data and they will wait. These users know nothing about locks, blocking, resource contention, IO bottlenecks, etc., and therefore have no idea what problems they might be causing.

Unfortunately, the default behavior for SQL Server Management Studio is to default each connection's Isolation Level to Read Committed. This can cause blocking under a number of different scenarios. I won't detail them here but believe me, it can be a problem.
However, this problem can be avoided. SSMS provides the ability to set the default Isolation Level for all queries. The option can be found by navigating to Tools -> Options. From there, Query Execution -> SQL Server -> Advanced.

I suggest setting the TRANSACTION ISOLATION LEVEL to READ UNCOMMITTED and the DEADLOCK_PRIORITY to Low. Setting the DEADLOCK_PRIORITY to Low will mean that, in the case of a deadlock, the SSMS query will most likely be aborted. Also note the LOCK TIMEOUT value of 30,000 milliseconds. This is the equivalent of 30 seconds and means that if an SSMS query encounters a lock for more than 30 seconds, the query will be aborted.
Note that you can also set the Isolation Level for the active connection by issuing the SET TRANSACTION ISOLATION LEVEL command. For example:SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED

This command can also be used in stored procedures instead of using NOLOCK hints on all tables.
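For example, a reporting procedure might set the level once at the top instead of sprinkling NOLOCK hints throughout (the table and column names below are hypothetical):

```sql
CREATE PROCEDURE dbo.usp_GetOpenOrders
AS
BEGIN
    -- One statement replaces a NOLOCK hint on every table below
    SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;

    SELECT o.OrderID, o.OrderDate, c.CustomerName
    FROM dbo.Orders o
    JOIN dbo.Customers c ON c.CustomerID = o.CustomerID
    WHERE o.Status = 'OPEN';
END
```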

**Keep in mind that using this transaction isolation level can lead to inconsistent data in your result set since you can read uncommitted data.

Below are the descriptions of the different Isolation Levels from SQL Server BOL.

READ COMMITTED
Specifies that shared locks are held while the data is being read to avoid dirty reads, but the data can be changed before the end of the transaction, resulting in nonrepeatable reads or phantom data. This option is the SQL Server default.

READ UNCOMMITTED
Implements dirty read, or isolation level 0 locking, which means that no shared locks are issued and no exclusive locks are honored. When this option is set, it is possible to read uncommitted or dirty data; values in the data can be changed and rows can appear or disappear in the data set before the end of the transaction. This option has the same effect as setting NOLOCK on all tables in all SELECT statements in a transaction. This is the least restrictive of the four isolation levels.

REPEATABLE READ
Locks are placed on all data that is used in a query, preventing other users from updating the data, but new phantom rows can be inserted into the data set by another user and are included in later reads in the current transaction. Because concurrency is lower than the default isolation level, use this option only when necessary.

SERIALIZABLE
Places a range lock on the data set, preventing other users from updating or inserting rows into the data set until the transaction is complete. This is the most restrictive of the four isolation levels. Because concurrency is lower, use this option only when necessary. This option has the same effect as setting HOLDLOCK on all tables in all SELECT statements in a transaction.

Wednesday, October 7, 2009

I was recently struggling with a couple of linked servers. Every time I tried testing them or running a query that referenced the linked server I received the following error:

Login failed for user 'NT AUTHORITY\ANONYMOUS LOGON'
The environment I was operating in had an overly complex network configuration that I won't begin to attempt to describe. Anyhow, I was convinced that this had something to do with Kerberos authentication. It turns out it does although not the way I had originally thought.

The SQL Server had recently been migrated to a new domain. In the process, an SPN (Service Principal Name) had not been modified accordingly. By using setspn -L servername I was able to see that the SPN referenced the old domain name.
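Once the stale SPN is identified, a typical fix is to drop it and register one for the new domain; the host and account names below are placeholders:

```
setspn -D MSSQLSvc/sqlhost.olddomain.com:1433 OLDDOMAIN\sqlsvcaccount
setspn -A MSSQLSvc/sqlhost.newdomain.com:1433 NEWDOMAIN\sqlsvcaccount
```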

Thursday, September 3, 2009

In SSRS for SQL Server 2005, the matrix component does not allow for out of the box row group column headers. The default behavior is for the matrix to have one cell that spans all row group columns.

Often, when we have only one row group it is obvious to the end user what the data represents (e.g. store name, state, etc.). However, when we have multiple row groups, it is less obvious what is represented in each column. Hence, the need for column headers. Don't fret. It is possible to create column headers.

Here's how . . .
In the empty cell above the rows add a Rectangle and size it to fit the entire cell. Now add Textboxes for each column you want a header for. Size the Textboxes to be aligned to your data cells. You now have placeholders for your column header text. The key to making this work is properly sizing your columns and headers and setting the CanGrow and CanShrink properties to false.

Tuesday, August 25, 2009

Prior to SQL Server 2005, the value in the TOP clause was limited to literal values which meant that variables could not be used. Hence, the number of rows returned could not be variable/dynamic. Below is an example:

SELECT TOP 10 * FROM table1

New to SQL Server 2005 is the ability to use a variable in the TOP clause.

DECLARE @top INT
SET @top = 10
SELECT TOP (@top) *
FROM table1

This is pretty useful when there is a requirement to have a user determine the number of rows returned in a result set. Note that the variable must be enclosed in parentheses.

The book starts by describing what the Kimball Group has determined to be the best way to execute the requirements gathering phase; it then moves on to designing the dimensional model, including the physical relational data warehouse, creating the ETL infrastructure (SSIS), an OLAP database (SSAS), and standard reports (SSRS), and dealing with partitioning, security, maintenance, etc.

Someone with a few years of SQL Server experience can build a full-blown, mid-sized BI solution with this book alone.

The book also has a great, although not physically attractive, companion website. Check out the tools section for some invaluable tools including a dimensional modeling spreadsheet that is perfect for the first few iterations of the relational data warehouse. While not extremely fancy it will generate quality "create" scripts for the initial development phase of your project.

I recently conducted technical interviews for one of my clients to help find a mid-level SQL Server BI developer. Below are the questions I came up with for what tended to be 30-60 minute telephone interviews. The answers were written in such a way that a non-technical person could follow along and gauge whether or not the candidate was on the right track.

Data warehousing/Dimensional Modeling
1. Q: What is the Kimball Methodology/Dimensional Modeling?
A: Dimensional modeling always uses the concepts of facts, AKA 'measures', and dimensions. Facts are typically (but not always) numeric values that can be aggregated, and dimensions are groups of hierarchies and descriptors that define the facts.

2. Q: What is meant by level of granularity in a fact table?
A: At what level of detail are we storing data? We can store sales data by store by day by product or by day by region by product or by receipt item. Receipt item level would be the finest level of grain and is the favored approach assuming there are the resources (disk, processing power, etc.) to support it.

3. Q: Explain the concept of Conformed Dimensions.
A: At the most basic level, a conformed dimension means the exact same thing with every possible fact table to which it is joined. For example, an earn site dimension table might be related to both an earn reward and a burn reward fact table. This way, we can consistently look at both the earn and burn data as it relates to earn site information.

4. Q: What is 1) a degenerate dimension and 2) a junk dimension? Provide an example of each.
A1: A degenerate dimension is a single-attribute dimension typically consisting of a transaction-level identifier that does not warrant its own relational table or OLAP dimension. Examples are PO number, order number, invoice number, etc.
A2: A junk dimension is a way of organizing miscellaneous, mostly unrelated indicator/flag type attributes that would not otherwise belong to a dimension. It’s a method of reducing the clutter that can result from having many small and/or single attribute dimensions. Junk dimensions typically consist of the Cartesian product of all values. A junk dimension could include such values as order status, new customer indicator, shipping method, etc.

5. Q: What is a star-schema and why do we use it for data warehousing?
A: A star schema has a normalized fact table at the center, joined to denormalized dimension tables whose primary keys are referenced by foreign keys in the fact table. We use the star schema because queries against it are simple and highly efficient.

6. Q: What are 1) non-additive facts 2) semi-additive facts and provide an example of each?
A: Non-additive facts are measures in a fact table that cannot be added. Any average would be an example of this. Semi-additive facts are facts that can be summed up for some of the dimensions in the fact table, but not the others. Account balance can be added up across all accounts but not over time.

7. Q: What is a slowly changing dimension and what are the different types?
A: A slowly changing dimension is one in which the attributes change over time.
Type 1 – Only one record is stored. Old data is overwritten with new data.
Type 2 – Tracks historical data by adding new records each time an attribute changes.
Type 3 – Tracks history via additional columns within a single record.

8. Q: What is a surrogate key and why do we use them?
A: It’s a substitute for a natural/business key and is typically an integer. In SQL Server we often use an identity column of type INT or BIGINT. We use them because we can’t rely on the business key as they can change or overlap between multiple source systems and they are the most efficient in terms of joins.

General Relational Database
9. Q: Have you designed and built OLTP (transactional) and/or OLAP (analytical/reporting) databases. Have you used any data modeling tools? How many tables did these databases consist of?
A: A qualified candidate will have designed and built numerous OLTP and OLAP databases. They might mention ERwin or Rational. While having used a data modeling tool is not very important, if they have, it reinforces the fact that they have performed database design work. They should have designed some databases with 20+ tables.

10. Q: What size databases (MB/GB/TB) have you worked with?
A: They shouldn’t have to think too hard about this. Someone that can’t come up with an answer immediately has probably never “owned” a database. A qualified candidate should have worked with at least a 100GB database. 10GB+ could be acceptable. Not good if they’re talking in MBs.

11. Q: What is 3rd normal form and what is the main benefit of having a normalized database? Why would you want to denormalize?
A: Every non-key column must provide a fact about the key, the whole key, and nothing but the key (all columns are directly dependent on the primary key). Having a normalized database increases data integrity. Denormalization can be used to increase performance.

13. Q: What is an outer join? How is it different from an inner join?
A: It’s a join that includes all the rows from one table and only those rows from the other that have corresponding join values. An inner join only includes those rows that have corresponding values in both tables of the join.

Microsoft SQL Server
15. Q: What are the differences between clustered and non-clustered indexes and how many of each can you have on a table?
A: The leaf level of a clustered index contains the actual data pages. It determines the physical order in which the table data is stored. The row locator in a clustered index is the clustered index key. There can be only one clustered index on a table. In a non-clustered index the row locator is a pointer to the row. There can be up to 249 non-clustered indexes on a table in SQL Server 2005 and earlier (999 as of SQL Server 2008).
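A brief sketch of creating both index types (the table here is hypothetical): declaring a clustered primary key stores the rows in that key's order, and the second statement adds a non-clustered index whose leaf level points back to the clustered key.

```sql
-- Hypothetical table: the PRIMARY KEY creates the (single) clustered index;
-- the rows themselves are stored in SalesOrderID order.
CREATE TABLE dbo.SalesOrder (
    SalesOrderID INT      NOT NULL PRIMARY KEY CLUSTERED,
    OrderDate    DATETIME NOT NULL,
    CustomerID   INT      NOT NULL,
    TotalDue     MONEY    NOT NULL
);

-- One of many possible non-clustered indexes; its row locator
-- is the clustered key (SalesOrderID), not a physical pointer.
CREATE NONCLUSTERED INDEX IX_SalesOrder_CustomerID
ON dbo.SalesOrder (CustomerID);
```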

16. Q: What are Included Columns when referring to a non-clustered index in SQL Server and why would you use them?
A: Included Columns are non-key columns added to the leaf level of a non-clustered index. This functionality allows us to store data alongside the index’s key column(s). We use this feature to increase performance by decreasing disk I/O because we don’t need to access the clustered index if all columns requested in a query are present as either key columns or Included Columns. When all columns are present in an index it is referred to as a covering index.
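As an illustration (hypothetical table and query), the index below can satisfy a query that filters on OrderDate and returns CustomerID and TotalDue without ever touching the clustered index, i.e. it "covers" the query:

```sql
-- OrderDate is the key column; CustomerID and TotalDue are stored
-- only at the leaf level via INCLUDE, keeping the key small.
CREATE NONCLUSTERED INDEX IX_SalesOrder_OrderDate
ON dbo.SalesOrder (OrderDate)
INCLUDE (CustomerID, TotalDue);

-- Covered query: every requested column is in the index.
SELECT CustomerID, TotalDue
FROM dbo.SalesOrder
WHERE OrderDate >= '20091201';
```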

17. Q: What tools can be used to performance tune a SQL Server database and how do you use them?
A: Tools include Profiler, Database Engine Tuning Advisor (DETA), Query Plans in SQL Server Management Studio, Dynamic Management Views (DMVs), Windows System Monitor/Performance Counters.
Use Profiler to identify problem queries as well as locking and blocking. Use Profiler to generate a trace file that can be analyzed by DETA, which will suggest indexes and statistics.

18. Q: What does the ISNULL function do and what is the syntax?
A: When applied to an expression, it returns the specified replacement value if the expression is NULL; otherwise it returns the expression itself. Syntax: ISNULL(mycolumn, 'NO VALUE')
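For example (hypothetical table and column names):

```sql
-- Rows where MiddleName is NULL show 'NO VALUE' instead.
SELECT FirstName,
       ISNULL(MiddleName, 'NO VALUE') AS MiddleName,
       LastName
FROM dbo.Person;
```

Note that the ANSI-standard COALESCE function does the same job and accepts more than two arguments.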

19. Q: How would you create a pivot/crosstab using only SQL?
A: Common table expressions (CTE), the SQL PIVOT operator, CASE statements
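The CASE-statement approach can be sketched as follows (hypothetical table; only three months shown to keep it short): each CASE expression routes a row's value into one output column, and the aggregate collapses the rows.

```sql
-- Pivot monthly sales totals into columns, one row per year.
SELECT YEAR(OrderDate) AS OrderYear,
       SUM(CASE WHEN MONTH(OrderDate) = 1 THEN TotalDue ELSE 0 END) AS Jan,
       SUM(CASE WHEN MONTH(OrderDate) = 2 THEN TotalDue ELSE 0 END) AS Feb,
       SUM(CASE WHEN MONTH(OrderDate) = 3 THEN TotalDue ELSE 0 END) AS Mar
FROM dbo.SalesOrder
GROUP BY YEAR(OrderDate);
```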

20. Q: Have you had any experience with partitioning or replication and if so, describe?
A: Any exposure to these technologies is useful. No need to dive too deep.

SSIS (SQL Server Integration Services)
21. Q: What are the different methods available for providing configuration information to SSIS packages and which do you consider to be the best and why?
A: XML file, Environment Variable, SQL Server, registry, parent package variable. XML files don’t work for package promotion through environments (dev, test, prod). Preferred approach is to use an environment variable that points to a configuration database and then use SQL Server configuration for all other configuration values.

22. Q: What is your approach to debugging SSIS packages?
A: Inserting breakpoints in the control flow, turning on detailed logging, using Profiler

23. Q: How do you version your SSIS packages?
A: There are version properties/attributes on SSIS packages that can be set by the developer.
Another valid answer would be by using a source control tool.

24. Q: Name the transformations you use most in SSIS.
A: Should be able to quickly name some of the following: lookup, merge join, union all, multicast, derived column, aggregate, OLE DB command, pivot, unpivot, slowly changing dimension

25. Q: How have you deployed packages?
A: Packages can be deployed to SQL Server or the file system using Visual Studio, an installation utility generated by Visual Studio, and by simply copying and pasting files. Generating an install utility and copy/paste are the preferred methods.

SSRS (SQL Server Reporting Services)
26. Q: What kinds of data sources have you used when writing reports and which do you prefer and why?
A: Stored procedures, SQL directly in report, MDX directly in report

27. Q: How do you implement cascading parameters in SSRS? (A cascading parameter is one that drives the values of another dependent parameter e.g. country -> state/province)
A: Include the parent parameter (country) in the WHERE clause of the child parameter’s (state/province) query.
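A minimal sketch of the two parameter datasets (table names and the @Country parameter are hypothetical): the child query references the parent's report parameter, so SSRS refreshes the state/province list whenever the country selection changes.

```sql
-- Dataset for the parent parameter (Country):
SELECT CountryID, CountryName
FROM dbo.Country;

-- Dataset for the child parameter (StateProvince), filtered by
-- the parent's value; @Country maps to the Country report parameter.
SELECT StateProvinceID, StateProvinceName
FROM dbo.StateProvince
WHERE CountryID = @Country;
```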

28. Q: How have you made your SSRS reports available to users?
A: There is an out of the box report portal referred to as Report Manager. This is what Override uses. SharePoint and custom applications can also be used to host reports.

29. Q: Have you had any experience with Report Builder and Report Builder models?
A: Any experience is a plus. If they really have worked with it they’ll probably say that Report Builder was not a great tool.

During an interview today I asked a candidate if she had experience using SSIS Checkpoint functionality. Although I haven't confirmed what she told me, I suspect that it is true. She said that she has experienced performance issues related to using Checkpoint files in packages containing loop containers. My guess is that SSIS tracks the value of the variable being used for the loop counter in the checkpoint file. Writing a value to disk on every iteration of a loop probably has a very significant performance impact.

I was just troubleshooting a long-running query that happened to be an UPDATE statement and found that a Clustered Index Update operation was a good portion of the total query execution plan. Unfortunately, the table I'm updating has a composite primary key that is also the clustered index. However, the statement was not updating any columns in the key.

So why would there be a Clustered Index Update operation? Simply put, the clustered index is the table. What I mean by this is that table data is stored alongside the clustered key with this key dictating the order in which the rows are stored. Hence, any update to a table is an update to its clustered index.

Also, keep in mind that an update that changes the clustered key also has to update all non-clustered indexes, since every non-clustered index references the clustered key as its row locator. However, this is not applicable to the above scenario, as my UPDATE statement was not changing any clustered index/key columns.