

I have an “all in one” virtual machine for demonstrating the Business Intelligence features of SQL Server 2012 in combination with SharePoint 2010. My VM is based on the “Kiwi Build” setup documented in the TechNet Wiki article here (although mine uses VMware Workstation instead of Hyper-V). As well as all varieties of Analysis Services (Tabular, Multidimensional, PowerPivot), my VM also runs SharePoint, Master Data Services and Data Quality Services.

Since the initial release of SQL Server 2012 there have so far been three Cumulative Updates and, this week, a very large security update. I generally check these patches out using my VM, and each time I’ve come across specific post-patch issues which need to be addressed. I thought it would be worth sharing these here, both for my own future reference and hopefully to assist others. Note that these issues reflect my experience on my VM; most of them are caused by the updated assembly versions that the patch installs. You may encounter very different problems when you apply the patches – I probably can’t help you with those.

So this week, as usual, I first took a snapshot of the VM, then applied the security update. This installed without problems, and following the update I could see that all of the expected SQL Server services were running, as was SharePoint Server. The post-patch fixing up is only needed once you start using the various SQL Server services –

PowerPivot Service in SharePoint

First I went to a PowerPivot Gallery in SharePoint, opened a workbook to display a pivot table with slicers. When I clicked on a slicer to filter the data, I immediately got a message saying “an error occurred during an attempt to establish a connection to the external data source” –

To fix this and get PowerPivot working again, you need to re-run the PowerPivot Configuration Tool. When this loads it checks the current farm status and in my case returned the following message telling me “newer versions of the PowerPivot solution files have been detected” –

And as you can see, the “Upgrade” option is already selected –

I did as I was told, ran the “upgrade features” option and once this completed, the error message had gone from SharePoint and I could once again use my published PowerPivot models.

Data Quality Services

Next I opened the Data Quality Services Client and attempted to make a connection to the server. I got a very verbose error message essentially telling me “an error occurred in the Microsoft .NET Framework while trying to load assembly id 65627” –

A quick online search led me to this document which discusses installing Cumulative Updates or patches on Data Quality Services. It doesn’t explicitly mention assembly ID 65627 (although other online articles do) but, as it says, you do have to run DQSInstaller.exe -upgrade to fix the issue above, which did the job for me.

Master Data Services

Finally (for me anyway) Master Data Services. When I opened the Master Data Manager web client, the message I got was “the client version is incompatible with the database version” and a request to update components using Master Data Configuration Manager –

So, off I went to Master Data Services Configuration Manager, connected to my MDS database, and got the same advice about upgrading the database –

Which I duly did, and once complete Master Data Services and Manager were back up and running (actually I think this is a little misleading because the original MDS error suggests that the client version is out of step, but the database upgrade fixes it).

So there you have it. From what I’ve seen so far, some or all of the above steps will be needed after patching SQL Server 2012, depending upon which features you have installed. At the time of writing I haven’t tried applying Service Pack 1, which is currently at CTP level, but if anyone has, please leave a comment.

Update December 2012 – I can confirm that after applying SQL Server 2012 Service Pack 1 it was again necessary to complete the steps outlined above to fix the issues. As Cumulative Update 1 for Service Pack 1 arrived fairly soon after, I also tested this and found that SP1 CU1 didn’t cause anything to break.

If you take a look at the Guardian site you will see that there have already been some interesting visualisations done using the data, like this one on Tableau Public. The source download contains over 1.4 million rows; for each ride you can see the starting and finishing locations, the dates and times, which bike was used and the duration of the hire. So I thought, how well would this work in PowerPivot in Excel?

I downloaded the full ZIP file of data – a text file of over 156 MB once unzipped, with a separate file describing the column names. If you want the column titles in place before loading into PowerPivot (so they can be used as is, without the need for any renaming) you will need a text editor capable of loading the raw source data file. Something like Notepad++ works well here.

Firing up Excel and the PowerPivot window, the first step is to get the data loaded. The data uses a tab character as the column separator, and if you have added the column headers into the top row, make sure you tick the “Use first row as column headers” box – otherwise just rename the columns once imported –

It’s worth bearing in mind, of course, that 1.4 million rows is too big to load into an Excel worksheet directly, where you have a limit of 1,048,576 rows. But 1.4 million rows is fairly trivial for PowerPivot, which can handle much higher data volumes.

First thing I like to do once the data has been imported is take a roam around it to understand what’s on offer and whether any adjustments or cleaning needs to happen. You will see that the start and end time columns need to have their format changed. On load they show as dates (30/12/1899) and we need to set them to a time format to make them useful.

You will also see Start and Finish Station values in formats such as “Queen Street,Bank”. I thought it might be more interesting if the start and finish stations could be split into the area and the individual bike docking station. Area and docking station are in most (but not all) cases separated by a comma. So we can create calculated columns to split the data to the left or right of the comma, with the addition of some conditional logic to handle any data which doesn’t contain a comma. Most of this works pretty much the same way as it does in Excel, and there is a good article on splitting data into separate columns here.
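As a sketch of that conditional split (the [StartStation] column name, and the “docking station before the comma, area after it” layout, are my assumptions here), the two calculated columns might look something like this:

```dax
// DockingStation: the text before the comma, or the whole value when there is no comma
=IF(ISERROR(FIND(",", [StartStation])),
    [StartStation],
    LEFT([StartStation], FIND(",", [StartStation]) - 1))

// StartArea: the text after the comma, again falling back to the whole value
=IF(ISERROR(FIND(",", [StartStation])),
    [StartStation],
    TRIM(MID([StartStation], FIND(",", [StartStation]) + 1, LEN([StartStation]))))
```

FIND raises an error when the comma is missing, so wrapping it in ISERROR provides the conditional fallback mentioned above.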

I’m already getting some useful data, but let’s take things a little further –

We have Start and End dates for each ride, so we can follow PowerPivot best practice and create Date dimensions for each, as linked Excel tables. You will find some very useful articles online on how to do this – I would particularly recommend this excerpt from the invaluable book by Marco Russo and Alberto Ferrari (or buy the actual book – well worth it). Once I have my date dimension joined to the source data I can easily get to the day of the week, calculate weekdays or weekends etc., and if I want I can use DAX Time Intelligence functions. I also created calculated columns to show the hour of the day, between 0 and 23, and a simple AM or PM column based on that hour.
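Those last two columns are simple one-liners – a sketch, assuming the ride start date and time live together in a [StartDate] column:

```dax
// HourOfDay: 0 to 23, taken from the ride start time
=HOUR([StartDate])

// AMPM: a simple flag derived from that hour
=IF(HOUR([StartDate]) < 12, "AM", "PM")
```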

Next I took a look at adding a linked table which could be used to “band” the journey times. The table I created in Excel looked like this:
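The banding table itself isn’t reproduced here, but as an illustration, a linked table along these lines (the names and boundaries are my own) plus a DAX calculated column in the ride table would do the banding. In this version of PowerPivot you can’t join tables on a range, so the lookup uses CALCULATE and FILTER against the disconnected Bands table:

```dax
// Assumed Bands linked table:  BandName    | LowMinutes | HighMinutes
//                              "0-15 mins" |     0      |     15
//                              "15-30 mins"|    15      |     30
//                              "Over 30"   |    30      |   9999

// JourneyBand: the band whose range contains this ride's duration in minutes
=CALCULATE(VALUES(Bands[BandName]),
    FILTER(Bands,
        [DurationMinutes] >= Bands[LowMinutes]
     && [DurationMinutes] < Bands[HighMinutes]))
```

Because each duration falls into exactly one band, VALUES returns a single row and the calculated column gets one band name per ride.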

Interesting point to note here – take a look at the size of my saved Excel file containing the loaded PowerPivot data and compare it to the size of the source text file I loaded:

43,885 KB in PowerPivot compared to 159,653 KB for the original text file – roughly a 3.6:1 reduction!

Now I can start exploring the data by creating some pivot tables and charts. This Pivot Chart shows total bike hires by hour of the day with a “starting area” slicer and the table below shows the area in which the bike hires ended up, also linked to the same filter:

And this one uses the calculated “banding” column to group journey times by month and starting location:

At this point you can easily find out a lot more – average journey times, which bikes got used the most, maybe some “ratio to parent” calculations to see which docking stations are used most in each area.

To sum up, for me this reinforced how quickly you can get some interesting analyses out of large data sets with one of the most widely used desktop tools out there. If you are interested in trying this for yourself the first thing I would suggest is getting hold of this book (or if you are not a technical book person, maybe get yourself booked on one of the PowerPivot Workshops).

If you try working with the bike data yourself, either in PowerPivot or in one of the other analysis or visualisation tools, leave a comment on the blog. And hopefully Transport for London will make even more data available – maybe I can try a full year’s worth next time.

I’ve been spending a good bit of time with PowerPivot in recent months, primarily as part of a proof of concept exercise for a client. Like many others, I have to say I started looking at it with a high degree of scepticism. Surely, I thought, this thing goes against all of the principles we’ve been trying to drum into users – particularly the importance of building a well designed Unified Dimensional Model, with calculated members in our Analysis Services cube, so that we get the “one version of the truth” we’re all after when we build a Business Intelligence solution. PowerPivot, on the other hand, is a free add-in for Microsoft Excel 2010 which lets you load very large amounts of data into memory and work with it in a new desktop version of Analysis Services called VertiPaq. So you build your models, including all the calculations, inside Excel.

The thing is, the vast majority of analysts who need to pull together and work with large sets of data often need to do so in response to rapidly changing requirements (and, frequently, changing data sources). And of course they spend most of their day using Microsoft Excel. Show PowerPivot to someone who spends their time creating complicated Excel VLOOKUPs to build reports and pivot tables against data loaded from database extracts and you will quickly see their eyes light up. And in all of the time I’ve spent working with Microsoft Analysis Services (since the first release in 1998) I’ve never seen as much excitement online about a BI tool as I’ve seen recently for PowerPivot. Do a search on Twitter for the #PowerPivot hashtag to see what I mean.

PowerPivot in Excel 2010

So if you have Excel 2010 you can download the PowerPivot add-in from Microsoft’s PowerPivot site here. Note that there are two different versions of the download, one each for the 32-bit and 64-bit versions of Office 2010. Right now most users are likely to be running the 32-bit version I suspect. There are very good reasons for this – if you install 64-bit Office 2010 you may have problems with existing add-ins which are 32-bit only, and there is also an issue with SharePoint datasheet views with 64-bit Office. On the other hand, if you are planning to use PowerPivot with very large data sets you will almost certainly need to install the 64-bit version of Office 2010 – as it says in this TechNet Installation Note – “the 64-bit version of PowerPivot enables you to work with up to 4GB of data in memory, and the 32-bit version enables you to work with up to 2GB of data in memory”. There is some guidance from Microsoft on 64-bit Office 2010 usage here.

Once you are up and running, you’re going to be impressed with the performance you get, even on large datasets. Data can be loaded into PowerPivot from a good variety of sources – SQL Server, Analysis Services cubes, other OLEDB/ODBC data sources, text files, Excel data, and also, interestingly, SQL Server Reporting Services, Atom data feeds and the Azure DataMarket. Again, this is where lightbulbs start coming on over Excel users’ heads, because PowerPivot lets you create joins between disparate data sources. So, providing there is a matching column, it’s quite easy to create a relationship between line of business data, maybe residing on SQL Server, and other data you might have acquired from an online service such as the Azure DataMarket.

To get the best out of PowerPivot, you need to learn how to use Data Analysis Expressions (or DAX for short). One of the reasons Excel users are likely to warm to PowerPivot more than they have to Analysis Services cubes is that DAX calculations are closer to the type of functions they may already be using in Excel. That’s not to say DAX doesn’t also require some effort to learn, it definitely does if you want to master it. DAX has some great date calculation features built-in by the way, so very good for year-to-date, or same-period-last-year type reports.
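As a flavour of those built-in date calculations (the Sales and Dates table names here are my own, and this assumes a separate date table with a contiguous [Date] column related to the fact table), a year-to-date measure and its prior-year equivalent might look like:

```dax
// Total sales for the year up to the latest date in the current filter context
Sales YTD := TOTALYTD(SUM(Sales[Amount]), Dates[Date])

// The same period one year earlier, for easy year-on-year comparison
Sales SPLY := CALCULATE(SUM(Sales[Amount]),
    SAMEPERIODLASTYEAR(Dates[Date]))
```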

So what are the downsides when you’re using PowerPivot with Excel? For me, the same issues have cropped up whenever I’ve shown it to clients. First, there is no easy way to automate the refreshing of your data from within Excel. Right now the only option you have is to open the PowerPivot window and manually refresh one or more of your connections. Because PowerPivot is a separate add-in, you can’t automate it via VBA right now either. If you publish your model to PowerPivot on SharePoint 2010 then you can get the server to schedule refreshes for you, but if you don’t have this luxury then right now manual refresh is the only option. There is an interesting post from Ken Puls on some of the bits he would like in the next release.

Something else that concerns me slightly is the need for governance when defining and using complex DAX calculations in Excel models. As I’ve already said, a key benefit of the dimensional model, with all of your custom calculations defined in the cube, is that there is a single definition. So if your company has a complex ratio calculation, or way of defining utilisation percentages or whatever then the possibility that two different analysts use different calculations (and arrive at different results) is significantly reduced. DAX formulae can get pretty complex so ideally you would want to store “templated” definitions somewhere. Over on PowerPivot Pro there is an interesting post on how you might use NotePad++ to create and troubleshoot the more complicated formulae. Perhaps something similar could be used as a reference guide for sharing complex DAX calculations.

PowerPivot in SharePoint 2010

So, as I’ve mentioned above, as well as being an add-in for Excel 2010, PowerPivot is also an option for SharePoint 2010 in conjunction with SQL Server 2008 R2. Once you have PowerPivot running on a server in your SharePoint 2010 farm, you can publish your models to “galleries” and your users can view them in their browser, without even needing Excel. Again, a lot of work has gone into the SharePoint bits here, making good use of Silverlight to browse through published documents. SharePoint administrators can also monitor usage of each model, identifying resource utilisation, which items have been opened most frequently and so on. This can then be used to see which PowerPivot models might be suitable candidates for re-development as a true dimensional model in an Analysis Services cube.

PowerPivot on SharePoint 2010 can be an expensive proposition though. You will need SQL Server 2008 R2 Enterprise Edition on the PowerPivot Server. Your SharePoint 2010 farm will also need Enterprise Client Access licences. And the server running PowerPivot will need to be pretty substantial (lots of memory, multi-core etc). Here’s an MSDN document on recommended configurations. I know we keep hearing the phrase “BI for the Masses” but these masses need to have pretty deep pockets. Also note that the PowerPivot Analysis Services “VertiPaq” engine will be a separate instance to any existing traditional Analysis Services instance you might be running, almost certainly requiring its own server.

If you want an in-depth understanding of how to use PowerPivot and DAX in Excel 2010 I would wholeheartedly recommend PowerPivot for Excel 2010 by Marco Russo and Alberto Ferrari. Excellent explanations of how it all works, best practices and lots of examples. You can download sample chapters here.

This is already a long post and I haven’t even mentioned the next release of SQL Server (currently code-named Denali, although my spell checker keeps suggesting “denial”). From the information released so far, understanding how to use PowerPivot is going to stand you in good stead when the new release is available. Here’s a good overview of the “Denali” roadmap. I’ll try to follow up this post with news about Denali BI features when more information is available.

For my own part, I’m very excited about the possibilities that PowerPivot opens up. Given that we are only at version 1, most of what I’ve seen is very impressive and I would definitely recommend taking it for a test drive. And your Excel “power users” will definitely like what they see.

Let me know what you think – can you see a use for PowerPivot in your organisation? Is “self service BI” a step in the right direction?