My Links

Advertisement

News

Welcome to my weblog. My name is Jeff Smith, I am software developer in Boston, MA and I was recently named a 2009 SQL Server MVP. Check in frequently for tips, tricks, commentary and ideas on SQL Server and .NET programming.

Little Monster -- don't forget, the client isn't doing any summarization at all; only fully summarized rows are returned to the client which simply pivots them.

For example, if you have a 10,000 row transaction table which will be summarized into data for 10 clients over 12 months, only 120 fully aggregated rows should be returned to the client which are then pivoted into the final 10 by the client-side code. A T-SQL pivot would indeed only return 10 rows (with 11 more columns of course), but at the huge expense (as shown) of having SQL Server do all the work and forcing presentation code into your database layer.

So the key point is that the client is not doing any pivot calculations or aggregating at all; it is simply changing the format of data that is returned. Bandwith isn't really a consideration, since we are working with raw *data* in its native, summarized form.

>>The above code inserts the data in a datatable. How do i insert it into an excel sheet?

You don't. Excel has pivoting capabilities built in. Otherwise, it is an exercise on taking a DataTable and exporting it to Excel which isn't really the scope of this post. If there is a popular need for this, let me know.

Jeff,
Thanks for the code above. I'm writing a SQL Mobile app and need to transform row into cols to display in a datgrid. Only challenge I found with your example is that it gives a cartesian product. (e.g. 10 rows x 8 pivot cols always gives 80 rows....even some of the rows actually 'match'). So, I hacked a version to use the KeyColumn passed in and update the row if the key already existed. Hope you don't mind. Please pass along if this helps anyone else. (Also tweaked your version slightly based on having a key). Hasn't been extensively tested but gives the expected result. One other note, I did use 2.0 DataTable object but could easily be modified to not need it....
Tim
---------------------------
public static DataTable Pivot(IDataReader dataValues, string keyColumn, string pivotNameColumn, string pivotValueColumn)
{
DataTable tmp = new DataTable();
DataRow r;
int i, pValIndex, pNameIndex;
string s;

// now, fill up the table with the data
while (dataValues.Read())
{
// assign the pivot values to the proper column; add new columns if needed:
s = dataValues[pNameIndex].ToString();
if (!tmp.Columns.Contains(s))
tmp.Columns.Add(s, dataValues.GetFieldType(pValIndex));

Thanks for the code -- of course I don't mind at all if anyone makes modifications or takes the general idea I've presented and improves upon it. The whole point isn't really to provide a function that is perfect, but to show how easy and more efficient it is to have your presentation layer do formatting and NOT yoru database layer.

I am not sure, however, what you mean by the code "generates a cartesian product". Remember, as written, you should passed in a fully summarized and sorted datareader -- grouped by the columns you are pivotting, so that for a particular value that row or column shouldn't already exist in
your data. Remember, the key idea is that SQL does the summarizing, sorting and calculations, and we are just taking the result and formatting it into columns -- we shouldn't need to do any summarizing or sorting or checks to see if more than 1 value needs to be aggregated for a particular set of key values since that should have been done already by SQL.

Double-check the article for more info on this. If you can provide some sample data and the SELECT statement you are using and what is happening specifically using the original function, it will be much easier to determine what might be going wrong but right not I am not 100% sure what exactly is going on based on how you described the situation.

Sorry for my ambiguity. Good points about letting the query do any summarizing and grouping. I'm basically trying to build an Excel like-list in the datagrid. My basic row element is a company and I'm trying to show a list of all or select categories and whether, for the select companies, those categories are appropriate. A company can have multiple categories but only but each CompanyID ~ CategoryID should be unique.
Tables:
tblCompany, tblCategory, and tblCompanyCategories.
---------------------
The other hitch is that I want to show not just matching categories but all or select ones. So some form of outer join is necessary. I then want to transform the result into a list of unique companies, catgories now as columns, and a true or false if a particular category is relevant. Here's the query -- one of the ugliest I have ever written but can't think of a better way due to SQL Server Mobile's lack of support for SPROCs.
SQL:
SELECT tblCompany.CompanyID, tblCompany.Company, tblCompany.Ticker, CONVERT(bit, tblCategory.CategoryID) AS HasCategory, tblCategory.Category
FROM tblCompany INNER JOIN tblCompanyCategories ON tblCompany.CompanyID = tblCompanyCategories.CompanyID
LEFT OUTER JOIN tblCategory ON tblCompanyCategories.CategoryID = tblCategory.CategoryID
WHERE ( tblCompanyCategories.CategoryID IN (1, 2, 3, 4, 5, 6, 13, 14, 15, 16, 17)
AND tblCompany.CompanyID IN (SELECT CompanyID FROM tblCompanyCategories AS tblTemp
WHERE CategoryID = ?)) ORDER BY tblCategory.SortOrder
---------------------
The '?' is a shorthand notation for a parameter that SQL Mobile does thankfully support.

Now, when I pivot this using the standard approach (as I'm most people normally require). I get a unque row for each set of companies and categories. So, if I selected categoryid = 1, I'd have one row where perhaps it matched, but another row for each other category...in addition to the categories for each already holding a column place. I'm sure there are technical terms for these concepts but I don't know them. I think your default approach provides a single column transform whereas I was looking for a multi-column. i.e.

CompanyID Col1 Col2 Col3
1 x
1 x
1 x

vs.

CompanyID Col1 Col2 Col3
1 x x x

...or...maybe I just did it wrong! Either way, please don't mistake me. I have you to thank for getting me started. It took very little effort, using your example, to tweak it to what I wanted. I onle posted my example because I have seen people asking in a lot of different forums how to do it. You gave the best, cleanest example, and I thought I had a way of doing a multi-col pivot that I have also seen others asking about.

It appears the problem is that you have not sorted your resultset by CompanyID. To quote from the article:

>>All you need to do is open up an IDataReader object that returns the data you wish to Pivot. The data should be fully summarized by SQL Server already -- i.e., you should GROUP BY your pivot column plus any other columns you wish to return, and you should aggregate your pivot value accordingly. The DataReader should also be sorted so that all rows which will be pivoted into one are sorted together.

>>For example, consider the Orders table in the Northwind database. Suppose you want to display CompanyNames as rows, ProductNames as columns, and the total quantity of orders per CompanyName per ProductName in your crosstab. The DataReader's recordset should be grouped by CompanyName and ProductName, it should include a SUM(Qty) calculation, and it should be sorted by CompanyName. (see the example for this exact scenario using Northwind).

Again, it may pay to re-read the article one more time and double-check all parts of the example using northwind (i.e., look at the tables, the view that I use, the SELECT statement, and so on). It does seems that adding proper grouping and a sort to your SQL is all you would need to make the existing function work perfectly.

Uncle. I re-ran it, after re-reading the article and sorting on my key of CompanyID and then used the Queryperformancecounter from kernel32 using p/Invoke to time the runs. Your version runs many times faster. I'd give a better ratio but for some reason the timer doesn't give me a reliable answer. My return datatable as something like 152 rows, using the same categoryid, which sort to 47 unique companies. On your algorthym the first run is usually on the order of 3 ms. And subquent runs are at or below 1 ms. My version reports, using the same timer algorthym, something like 800000 ms....but I say it isn't reliable because that'd be like 13 mins. but there's actually no perceptible difference in performance in terms of responsiveness. Seems to be a glitch in the Queryperformancecounter. But I don't don't yours is faster. Of course, this is on the desktop. I'll also test them both in Compact Framework to see if I can't get at least a better metric on mine....

Ok...Last word on this, promise. Just tested on CF and realized why I was getting a bad result from my testing. I was forgetting to set the start ref using the test on my version. After repeated testing the two approaches seem run about the same length on the desktop. Avg. of 2-4 ms. On CF they both take a bit longer but are also comparable. Its been fun....In case anyone is interested, I'm using an excellent example by Dan Fox and Jon Box for the timer that I found at MSDN. The link is at:
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnnetcomp/html/netcfperf.asp

This was designed for CF but you can simpley replace 'coredll.dll' import reference to 'kernel32.dll' to use it from the desktop.

As per the requirement of Pivot the keycolumn has to be sorted, where as incase of Month in the pivotcolumn the result can not be availed in ascending order. Please let me know if you have any solution to sort the pivotcolumn as well.

Your problem is the one of ordering things by month names or other date formats. Don't do this. Order your results by VALUES -- i.e., month numbers or actual datetime datatypes.

that is, don't do this:

select ...
from ...
order by MonthName(SomeDate)

this returns a VARCHAR, which puts December before January (D comes before J). Makes sense, right? SQL doesn't know or care if your varchar strings are the names of months, it will order them alphabetically.

always do this:

select ...
from ...
order by Month(SomeDate)

Here, the Month() function returns a numeric value from 1-12 for each month. This orders correctly. In your SELECT portion, feel free to return the monthname (or better yet, derive + format it and display it at the client layer, NOT in SQL) but do not sort by month names.

And, of course, whever you do ORDER or GROUP by a month, make sure you GROUP and/or ORDER by a Year as well if your data spans multiple years otherwise Jan2004 gets lumped in with Jan2005.

Thanks for the routine. I made a couple small changes. First, I changed this to accept a reference to an existing DataTable, instead of a DataReader. Second, I added the following if statment to the code below to accept null data points in the pivotNameColumn: if (s.Length > 0)

The reason I needed to support null data points in the pivotNameColumn is b/c the query to populate the source DataTable used a left outer join for the name and value columns, so my source DataTable and destination DataTable looked like this:

// now, fill up the table with the data: foreach (DataRow row in dataValues.Rows) { // see if we need to start a new row if (row[keyColumn].ToString() != LastKey) { // if this isn't the very first row, we need to add the last one to the table if (!FirstRow) { tmp.Rows.Add(r); } r = tmp.NewRow(); FirstRow = false;

You are saying that formatting and such should be done on the presentation layer? All formatting? How are you defining formatting for a statement like that? I am curious because I am about to impliment a solution on a project that requires a pivot and I am considering whether to do it in SQL or my C# app.Thanks! Love your site!

Yes, Richard, all formatting should be done in your presentation layer -- and this usually includes pivots. A relational database should not be returning results with varying column names; the structure of the data returned should be consistent. If your presentation layer is capable, it is just about always easier, shorter and more efficient to simply let your presentation layer pivot data. This includes Reporting Services, Crystal Reports, Access, even Excel via pivot tables. see:

http://weblogs.sqlteam.com/jeffs/archive/2005/05/12/5127.aspx

for a performance comparison for doing this in C# versus at the database layer.

Good discussion, however once I have a the pivot results from your function or a SQL pivot, how would I consolidate the results (squash the rows) so that the following NULLS would be removed to look like:

Mike -- those results make no sense .. I have no idea what you are trying to do there. Can you try to be more specific? As I mentioned several times, you need to do all summarizing in SQL *BEFORE* the function comes into play; if you don't want to return 1 row per PartCodeID, don't group by that column in your SELECT.

This has been a wonderful posting. I agree that the performance is much better in the presentation layer. Further, the implementation using a datareader as opposed to a dataadapter increases the efficiency. As I have been using the code, I started to wonder why we could not simply use an inheritance of the ObjectDataSource in ASP 2.0 and create a custom PivotDataSource using the functions listed here.

I have looked into the implementation of an ICollection to support composite primary key fields. However, I don't think this will work for multiple column fields. I just don't see how the standard GridView control could support it. A PivotView control would need to be developed and with AJAX and LINQ the possibility of this control seems bery promising.