Archive for the ‘Analysis Services’ Category

There is a new Analysis Services (SSAS) Tabular book available to own and add to your library. The book is written by Paul te Braak, a lead business intelligence consultant in Australia and one of the developers on the DAX Studio project, and he has put together an outstanding cookbook. When the book was released I was surprised and excited: surprised because I did not know that Paul was working on this (he is the sole author, big kudos to Paul), and excited because I knew it was going to be a good one. I had this one on my radar list of books to add to my collection; I am definitely a big fan of the Packt Publishing Cookbook series style of books. What I like about them is that they introduce a topic or situation and then go over the solution in a very simple and easy-to-understand format – Getting Ready, How to do it, How it Works, There’s more. Paul adds a lot of great insights in this book explaining how the solutions work, and he includes a bunch of ‘Tips’ along the way as well.

Here is when I first became aware that the book was released last month and posted the O’Reilly media deal:

So when I was asked by the publisher this past month to provide a review of the book I was more than happy to say ‘yes’.

Paul does a great job of slowly working you into the Tabular modeling concepts, and the only tools you need to get going are Excel and the Power Pivot add-in. Paul’s examples use Excel workbooks and flat files for the most part, so it is really easy to get started and get your learn on.

What is amazing is that this book is just over 300 pages and it is loaded with great content covering items such as how to use Power Pivot, hierarchies, drilldown, parent-child hierarchies (including how to emulate HideMemberIf behavior in DAX), smart measures, smart keys, programmatic access in Excel – cube functions and VBA – querying with DAX, Power View, and more! Simply amazing; Paul does a fabulous job, and this is a great intro book that progresses into advanced topics with great examples, tips, and insights that are a big-time value add.

The book has been updated for the 2012 edition of SSAS and the examples have also been updated against the 2012 version of the Adventure Works database. The other difference from the 2008 R2 release of the book is that some of the content has been broken up into a couple of new chapters.

Well, this past week I was prepping for my pre-con for the SQLSaturday #149 event and I noticed something odd. When I was connecting to my tabular models with Power View I was not seeing my hierarchies or key performance indicators (KPIs). I was a bit confused. I had upgraded my environment to SP1, so what was going on? I thought the upgrade provided support for these model features now, right? I knew I had seen them already, but maybe I was just imagining things. When I was doing my tabular project deployments I was also seeing a new dialog box that I was just clicking ‘Yes’ on without really understanding what it was talking about…

I completed the pre-con still confused, and then last weekend I did a bit more digging into this issue. I went back to my Excel 2013 SQLSaturday model that I had upgraded and reviewed the property settings on the database.

The item that I noticed that seemed like it might be important was the Compatibility Level setting. This database was showing SQL Server 2012 SP1 (1103). I then took a look at the database I was using for some of my demos.

Ahhh, now the light bulb was starting to get bright:) The database that I was deploying was still in the SQL Server 2012 (1100) setting even though I had upgraded my environment to SP1. Interesting. Apparently you have the ability to control this setting now if you want to run in different levels or do some testing in the same environment before upgrading all of your databases.

I didn’t realize that I now had to make a change on my model property settings before I deployed my database.

Once this setting is changed you will be prompted with the following:

Once again I click ‘Yes’ and redeploy to upgrade the database on the server. Now when I connect to my model with Power View I see my hierarchies and KPIs, perfect!

So just something to be aware of when working with tabular model projects now. There is a property setting that we have to take a look at, Compatibility Level, and it is a pretty important one for the users of our data models:)

Here are a couple of additional reference links to take a look at (some of this documentation and instructions might differ from what you see with CTP4 of SP1, but they should be available in the RTM of SP1):

This is by no means an introduction (101) book to MDX, but it is an MDX reference book that anyone using Analysis Services should have on their desk (or device – it is available in eBook form as well). I have not read a cookbook-style technical book before and I simply couldn’t put it down. I really enjoyed the style of: here is a situation, here’s how you do it, and now let’s see how this actually works. But the author doesn’t just stop there – the ‘There’s more’ section provides additional information about each recipe along with reference links to check out and pointers to other spots in the book to look at as well. After I finished reading one recipe I continued on to the next because I was interested in reading the solution, finding out more, and seeing how it compared to how I would approach the situation.

I found the book very easy to read and the information is extremely valuable for anyone working with Analysis Services and solving problems with MDX. This is a must-have book for any SSAS professional; if you don’t believe me, check it out for yourself and you will definitely agree, just like others already have, such as Derek Goodridge (@WorkerThread) and Jason Thomas (@de_unparagoned).

Well, just this week I received my very own eBook version for review from the publisher and I have already started diving into it. This is a cookbook-style book that goes over different scenarios and/or issues and then provides the solution, explains it, and then provides even more information about the item. Very slick, and I love the format. Not only is Microsoft SQL Server MVP Tomislav Piasevoli a pro when it comes to this content, but he also got an all-star MVP lineup to review and edit the book, including Greg Galloway, Darren Gosbell, Deepak Puri, Marco Russo, and Chris Webb.

I plan on providing a full review on the book in a couple of weeks, but just wanted to put another shout out in regards to the book in case you haven’t heard about it yet and are looking for a technical reference cookbook in regards to MDX. Stay tuned for my complete review.

In working on a business intelligence project a common situation you will find yourself in is having to come up with a cumulative value (running value) calculation. The challenge that you will find is that depending on what product you are working with the calculation that you need is going to be different. In this post I will compare how this can be done with T-SQL, Excel, SSIS, SSRS, MDX, and DAX. Along the way I will provide some additional reference links to other options, content, and I will point out some differences in how the totals are being calculated so you will know what to expect with the results.

In these examples I will be working with the Contoso Retail Data Warehouse data sample. These examples below are not necessarily going to be the optimal approaches, just showing how you can go about getting at the desired result.

SQL Server T-SQL

So let’s start out by taking a quick look at how to calculate the sales over each of the years directly against the data in the database through SQL Server Management Studio.

Not too bad. We have an inner query that is used to return the previous year(s) values and append them onto the current year (if they exist). If no value is returned then a zero is used in its place. For additional examples and options take a look at this posting by Garth Wells – Calculating Running Totals.
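The inner-query approach described above can be sketched along these lines (a hedged sketch only – the Contoso table and column names such as FactSales, DimDate, and CalendarYear are assumptions, and your joins may differ):

```sql
-- Running total by year: correlated inner query sums all prior years,
-- ISNULL substitutes zero when no previous year exists.
SELECT d.CalendarYear,
       SUM(f.SalesAmount) AS SalesAmount,
       SUM(f.SalesAmount)
         + ISNULL((SELECT SUM(f2.SalesAmount)
                   FROM FactSales f2
                   INNER JOIN DimDate d2 ON f2.DateKey = d2.Datekey
                   WHERE d2.CalendarYear < d.CalendarYear), 0) AS CumulativeSales
FROM FactSales f
INNER JOIN DimDate d ON f.DateKey = d.Datekey
GROUP BY d.CalendarYear
ORDER BY d.CalendarYear;
```

The correlated subquery runs once per output year, which is fine for a handful of years but worth rethinking for long series.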

Microsoft Excel

So now let’s take a look at how this would be done if we were using Excel.

There are a few different options here; in this example I enter a formula in the C2 cell of =SUM($B$2:B2) and then copy that formula down into cells C3 and C4.

SQL Server Integration Services (SSIS)

Now we will take a quick look at how this could be done in the ETL process. As we are pulling data from a source we can evaluate the values in the data flow and accumulate the values inside a script component. In the source we will simply get the sales for each year and then append the values in the flow to each other to get at the same results displayed above.

The cumulative value is being performed in the script component with some code. We have a variable setup that we are using to append the sales amount of each row in the data flow and then sending this value back out in a new output column called CumulativeSales.
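A minimal sketch of that script component logic (C#, SSIS 2008 style; the input column SalesAmount and the output column name CumulativeSales are taken from the description above, everything else is boilerplate the designer generates):

```csharp
// Hedged sketch of the Script Component (transformation) code.
// A class-level variable holds the running total across all rows in the flow.
public class ScriptMain : UserComponent
{
    private decimal cumulativeSales = 0m;

    public override void Input0_ProcessInputRow(Input0Buffer Row)
    {
        // Accumulate this row's sales and emit the total in the new output column
        cumulativeSales += Row.SalesAmount;
        Row.CumulativeSales = cumulativeSales;
    }
}
```

Because the variable lives for the lifetime of the data flow, the order of the incoming rows matters – sort by year in the source query.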

Nothing too fancy going on here. I don’t have a destination in the flow, just added a derived column to get the data viewer included so that I could run the data flow to show the results.

SQL Server Reporting Services (SSRS)

Now let’s switch over to Reporting Services. In this example we will not do the cumulative value in the dataset query; we will do this calculation in the report.

In this example we leverage the RunningValue aggregate function that is available in Reporting Services. The expression that we use in the Cumulative Sales column is =RunningValue(Fields!SalesAmount.Value, Sum, "DataSet1").

This function returns a running aggregate of the values in our specified dataset using the SUM function which is also specified in the expression.

SQL Server Analysis Services (SSAS) MDX

Now how about taking a look at Analysis Services. How would we go about doing this calculation using MDX?

There are additional ways to go about this calculation; for more options take a look at this recent posting by Amish Manubhai Shah – Various way to calculate running total from All Period to Currentmember. Now I want to show how this solution differs a bit from the other ones we have seen. What happens if I apply a filter to the query to only display 2008 and 2009? What would you expect?

Might not be what you would want, but then again maybe it is. The Cumulative Sales calculation is summing up the values from the beginning of time (the [All] level). So the first value that we see in 2008 is including the Sales Amount from 2007 as well (if there was sales in 2006 it would include that as well and so on).
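That beginning-of-time behavior comes from anchoring the sum at the first member of the hierarchy. A sketch of such a calculation (the cube, hierarchy, and measure names here are assumptions for a Contoso cube, not the exact names from the post):

```mdx
WITH MEMBER [Measures].[Cumulative Sales] AS
    SUM(
        NULL : [Date].[Calendar Year].CurrentMember,  -- from the very first member up to the current year
        [Measures].[Sales Amount]
    )
SELECT
    { [Measures].[Sales Amount], [Measures].[Cumulative Sales] } ON COLUMNS,
    [Date].[Calendar Year].[Calendar Year].MEMBERS ON ROWS
FROM [Contoso];
```

The NULL : CurrentMember range always starts at the first member of the attribute, so a WHERE clause or subselect on 2008–2009 does not move the starting point – exactly the effect described above.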

You can make a few modifications to the calculation and setup a named set to reference the items and come up with something along these lines.

Here a dynamic set is used to get the references of the members that are in the slicer (WHERE clause) and feed this information into the calculation for the cumulative sales to grab the first item to come up with the reference to the 2008 year. This will be the starting point for the cumulative total instead of from the beginning of time.

You might just want to look at year-to-date cumulative totals and be able to drilldown into a user defined hierarchy if you have one setup like Year-Quarter-Month-Date. If we modify our calculation a bit we might end up with something like the following:

Here we can see that our cumulative total stops at the year level and we can see at the lower levels that the values are accumulating as expected, so the cumulative total at Q4 is same as the Yearly value. What we have now is a YTD calculation.
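The YTD variant can be sketched like this (again, the cube and hierarchy names are assumptions; the YTD() function also requires the year level of the hierarchy to be typed as Years in the dimension design):

```mdx
WITH MEMBER [Measures].[Cumulative Sales YTD] AS
    SUM(
        YTD([Date].[Calendar].CurrentMember),  -- periods from the start of the year to the current member
        [Measures].[Sales Amount]
    )
SELECT
    { [Measures].[Sales Amount], [Measures].[Cumulative Sales YTD] } ON COLUMNS,
    DESCENDANTS(
        [Date].[Calendar].[Calendar Year].MEMBERS,
        [Date].[Calendar].[Calendar Quarter],
        SELF_AND_BEFORE) ON ROWS
FROM [Contoso];
```

Because YTD() resets at each year boundary, Q4 lines up with the yearly total and the accumulation restarts in the following year.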

PowerPivot for Excel 2010 DAX

We have one more product that I want to take a look at and that would be PowerPivot. This is going to be fairly similar to the Analysis Services solution since it actually is Analysis Services behind the scenes, the only difference here is that we will be using DAX instead of MDX.

So for the first example we will setup a cumulative sales calculation just like we did with the first MDX example. The calculation we use is as follows:

=CALCULATE(SUM(FactSales[SalesAmount]),
    DATESBETWEEN(DimDate[Datekey],
        FIRSTDATE(ALL(DimDate[Datekey])),
        LASTDATE(DimDate[Datekey])))

So we end up with the following:

You could go ahead and add some additional logic to evaluate the Sales Amount to determine if a value exists, but you get the general idea here.

Once again this is calculating the cumulative total from the beginning of all time and that is done by using the ALL reference in the calculation for the FIRSTDATE. If we filter the years and only display 2008 and 2009 we will see the similar results like we did with MDX where the cumulative sales amount for 2008 includes the 2007 sales.

The one nice thing about PowerPivot is that this is in Excel, so if you want to do anything a little more custom or make any references to cells you can do that and use Excel functions as well. As far as setting up a cumulative sales amount value like the second MDX example I can’t determine an equivalent in DAX. Maybe someone else has an idea if this can be done so that it only does this based on the Date values being evaluated. If you do, please leave a comment.

You can do the YTD calculation in DAX, and here is the formula for that:

=TOTALYTD(SUM(FactSales[SalesAmount]),DimDate[Datekey])

And the results look like this if we break out the years by the quarters:

Conclusion

Cumulative total (running value) calculations are calculated differently within each of the Microsoft products. We took a look at some examples of how this can be done in T-SQL, Excel, SSIS, SSRS, MDX, and DAX. These are not the only options, so if you want to see more take a look at the additional posts that I included as reference points in the different sections. It is important to understand how the formulas need to be configured and also what the results are once the calculations and logic are put into place. I think that one of the easiest ones, besides the Excel formula, would have to be SSRS with the RunningValue aggregate function. I like how this is set up and how it evaluates everything based on the context that is being referenced. So if we go back to the SSRS report and apply a filter to the tablix to remove 2007 from what is displayed, we would end up with the following:

Very simple and easy to understand, but that is just my personal opinion.

I get a lot of questions about this and see quite a few postings about Analysis Services processing in the MSDN forums. So you created an SSAS database solution and have deployed it to a production environment. Now the data has accumulated and you need to look at moving to an incremental processing approach, because you are either loading the data more regularly or the volume of data and processing no longer fits within your maintenance window. So what do you do?

The solution is to look at doing a more incremental processing approach instead of just a Process Full on the database. You have a few options in regards to this and I am going to discuss what happens during a dimension update process and also go over an SSIS solution that I put together.

In order to update and refresh the data in your dimension table you can perform a Process Update on the dimension (or a Process Add if you are only adding new members and not updating any records, but that typically is not the case). So what happens to the cube data when you update a dimension? Well if nothing changed then you are okay, but if you have flexible relationships and the data within these relationships changed then the aggregation designs for these are dropped from the partitions. Here is a walkthrough showing this:

Aggregation Advanced View Display in SSAS 2008

Query the partition information to view the aggregation information

Process Update with no changes – everything will stay the same

Process Update and there were data changes to dimension attributes that had aggregation designs

Aggregation designs that referenced those dimension attributes are dropped (A0 and A2 were dropped). How do you get them back?

Run a Process Default on the affected partitions (you could also choose the option to process affected objects during your dimension processing, this will do the same thing)

Aggregation designs are now back

So that was just a quick overview of what happens when you do a refresh of a dimension in your SSAS database. You can run this; just be aware of what is going on behind the scenes. If there were changes and relationships were updated, then you will also need to rebuild the aggregations and indexes if those attributes were included in any aggregation designs.
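The Process Update plus Process Default sequence above can also be issued as a single XMLA batch (the database, dimension, cube, measure group, and partition IDs below are placeholders, not names from the post):

```xml
<Batch xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
  <!-- 1) Refresh the dimension data; flexible aggregations on affected
          partitions get dropped as described above -->
  <Process>
    <Object>
      <DatabaseID>SSAS_DB</DatabaseID>
      <DimensionID>Dim Product</DimensionID>
    </Object>
    <Type>ProcessUpdate</Type>
  </Process>
  <!-- 2) Rebuild whatever is now missing (aggregations/indexes)
          without re-reading the fact data -->
  <Process>
    <Object>
      <DatabaseID>SSAS_DB</DatabaseID>
      <CubeID>Sales</CubeID>
      <MeasureGroupID>Fact Sales</MeasureGroupID>
      <PartitionID>Fact Sales 2009</PartitionID>
    </Object>
    <Type>ProcessDefault</Type>
  </Process>
</Batch>
```

Process Default only builds what is missing, which is why it is safe to run broadly against affected partitions.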

So if you want to incorporate this logic along with an incremental update of your partitions, what can you do? For this I leveraged SSIS and set up a package that allows for either a full or incremental update process. In my example here I actually do this for a specific cube in an SSAS database, so I only process the dimensions that are associated with that particular cube.

I have to give credit to Vidas Matelis here for providing some of the AMO code for the processing – SSIS package that process all partitions/measure groups/cubes in one database. I modified it a bit to be able to limit the scope to a particular cube, along with adding a Try Catch block in the code, but it is what I used as a starting point. Using AMO allows the process to be more flexible and dynamic. As with all things, as you expand on concepts and build them out they can start to get a bit complex, so I will try to explain this as best I can in a brief sort of way for this posting.

I added a script task as a starting point so that I could evaluate the ProcessFull variable, and I also use this to set the number of months that will be processed (or this could be years depending on your volume). I use a naming convention in the partition names so that it is like partition_name YYYY or partition_name YYYYMM. In this process it will be for the latter YYYYMM format. So if IncrementalMonthsProcess is set to 2 then at the time of this posting the value would be set to 200909, and this will be used later in the data flow task for evaluation where the names of the partitions are parsed.
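The YYYYMM cutoff computed in that first script task amounts to simple month arithmetic. Sketched here in standalone Python rather than the package’s actual .NET script code, and the convention of counting the current month as month 1 is an assumption inferred from the 200909 example above:

```python
from datetime import date

def incremental_cutoff(today: date, months_back: int) -> str:
    """Oldest YYYYMM partition suffix to reprocess, counting the current
    month as month 1 (so months_back=2 in October 2009 -> '200909')."""
    # Work in total months since year 0 to avoid year-boundary bugs
    months = today.year * 12 + (today.month - 1) - (months_back - 1)
    return f"{months // 12}{months % 12 + 1:02d}"

# Partitions whose parsed YYYYMM suffix is >= this cutoff get a Process Data
print(incremental_cutoff(date(2009, 10, 15), 2))  # -> 200909
```

The same value is then compared against the suffix parsed from each partition name in the data flow.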

Now at this point you move on to either the Full or Incremental processing. The Full path is fairly self-explanatory: I have it broken into a full dimension process script task and then another script task for the partition processing. I broke it out into two tasks so that I could get some additional duration information for logging and history. The dimension processing only processes the dimensions for the specified cube, and this is the same for the incremental approach as well. So if you have multiple cubes with more dimensions, or some unused dimensions, then these would not be touched.

The incremental process starts out with a script task that does a process update on all of the dimensions in the specified cube. Same script task code as the full with the exception of the type of processing that is performed.
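The AMO logic inside that script task can be sketched roughly as follows (a sketch under assumptions: the server, database, and cube names are placeholders, and the real package adds Try Catch handling and logging around this):

```csharp
// Process Update only the dimensions used by one cube via AMO.
using Microsoft.AnalysisServices;

class DimensionRefresh
{
    static void Main()
    {
        var server = new Server();
        server.Connect(@"Data Source=localhost");
        try
        {
            Database db = server.Databases.FindByName("SSAS_DB");
            Cube cube = db.Cubes.FindByName("Sales");

            // Only dimensions associated with this cube are touched;
            // unused dimensions in the database are skipped.
            foreach (CubeDimension cubeDim in cube.Dimensions)
            {
                cubeDim.Dimension.Process(ProcessType.ProcessUpdate);
            }
        }
        finally
        {
            server.Disconnect();
        }
    }
}
```

The full-processing branch is the same loop with ProcessType.ProcessFull instead of ProcessUpdate.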

The data flow task is used to create three recordsets of data that will be used for the partition processing – a full, data, and default processing list. I setup a script task as a source to create a recordset of values and to parse out the date information in the partitions if it exists.

The first list that is created is the full list of partitions and these get placed in the default processing list (object variable) that will be used in the last step of the package.

The split transform is used to determine which partitions will have a full process performed on them (no date information included in the partition names, so yearinfo was set to zero) along with the partitions that will be part of the incremental update. These partitions are typically part of a large measure group where the data has been partitioned by year or month (in this case month), and we are only going to process the last 2 months based on step 1. We are just going to do a Process Data on these partitions (this reduces the impact on the server and, once it is done, makes the data accessible to end-users – the final step in the package will build out the aggregations and indexes). The remaining partitions will be left untouched (at least in regards to processing the data; aggregations might be rebuilt if they were affected).

Now back to the control flow and to finish off the process we first do the process full partition list, then move on to the process data partition list, and finally the process default processing list. As shown above in the screenshots the last step will do nothing to the partitions that were fully processed, but it will build the aggregations and indexes for the process data partitions and also for any additional partitions that might have been affected by the dimension updates.

So that was a quick overview of the process. Now there are definitely other ways you could go about this; this is simply one solution that I came up with and it has worked well. I have made a few custom modifications as needed, like doing incremental processing by year or month along with limiting it to a particular cube, but these can quickly be modified and put to use in pretty much any environment. Just set the variable values (ASServer, ASDatabase, ASCube, ProcessFull, IncrementalMonthsProcess) and you are off to the races. If you want to turn on the SSIS logging, that is up to you. I have used the SQL Server table logging and then leveraged the SSIS report pack for reporting purposes, but that is entirely up to you.

I have uploaded the package to my SkyDrive account if you want to check it out. I have both SSIS 2005 and 2008 versions available in the download, and I have annotations included in the packages.

Hope you find this useful

(Click on the image above that contains link to download. You should be able to do a right-click and do a save target)

UPDATE (6/2/2010): Package download has been updated to reflect a fix to handle the partitions with a yearly naming convention so that they do not get included in the full processing. This was discovered during my May 2010 presentation preparation – PASSMN May 2010 Follow-up.

*** This has been updated and is the official release for this fix now ***

So for those of you who have had to put the upgrade to Windows Server 2008 on hold because of this you can now download the patch and continue on with your infrastructure upgrades. Just make sure that you properly test the patch before deploying this into your production environments.

Just wanted to touch base on a couple of items regarding partitions, referencing the ever popular Adventure Works 2008 sample Analysis Services project available for download from CodePlex here – SQL Server 2008 SR1 downloads. It appears that they have bundled the databases together now (they used to be broken out into the operational and data warehouse downloads). The sample project file for Analysis Services is part of this download and will be located in the following directory if you go with the default setup – C:\Program Files\Microsoft SQL Server\100\Tools\Samples\AdventureWorks 2008 Analysis Services Project.

Now if you open up the Enterprise version you will see that the measure groups have multiple partitions defined for them. One thing to note is that the counts are not properly updated and that aggregations have not been defined for all partitions in the cube.

If you have BIDS Helper installed you can go ahead and use the add-in to perform an update on the estimated counts for all of your objects.

Just a warning, if you are going against a very large database you might not want to perform this operation.

As a workaround you could simply update the estimated row count in the properties for the partition to provide Analysis Services an estimate when you define and generate the aggregations for the partitions (there is also a Partition Count setting that you will see when you go through the aggregation design wizard that gets used by the algorithm when creating aggregations, so it is important to set these values).

After performing the update you will see the new estimated counts, but you would still need to design aggregations for the partitions where these have not been defined yet (you might need to save the cube file, close it, and reopen it to see the updated counts).

To create the aggregations in SSAS 2008 you have to switch over to the new Aggregations tab in the cube. You can then select the group that you want to design the aggregates for and walk through the wizard.

And you can generate the aggregates for all of the partitions at once.

Ok, now let’s get back to the partitions portion. I am going to make a modification to the Internet Sales partition for 2004 and break this out into Q1 and then place Q2 in a separate partition. This really doesn’t need to be done for the Adventure Works data since the volume of data is extremely small, but in a real world scenario this could definitely improve query performance (plus reduce processing time if you are just processing the latest quarter instead of the entire year).

I went ahead and modified the query of the existing 2004 partition so that the cutoff was less than 20040401 instead of 20041231 for OrderDatekey. You need to be careful that you do not overlap the ranges, because there is no validation going on, so you could potentially include data that is already in an existing partition. After I had modified the existing partition for 2004 and added the new partition I went ahead and updated the estimated counts.
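The resulting partition source queries end up with non-overlapping half-open ranges along these lines (a sketch: the fact table name and column list are assumptions from Adventure Works, and the boundary values are the important part since nothing validates them for you):

```sql
-- Existing 2004 partition, now trimmed to Q1
SELECT *
FROM dbo.FactInternetSales
WHERE OrderDateKey >= 20040101 AND OrderDateKey < 20040401;

-- New Q2 partition; if the ranges overlap, rows are double-counted
SELECT *
FROM dbo.FactInternetSales
WHERE OrderDateKey >= 20040401 AND OrderDateKey < 20040701;
```

Using >= on the lower bound and < on the upper bound makes the seam between partitions easy to audit.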

Now that this is done let’s run a query against the cube and take a look at what is going on. Here is the query that I will execute against the cube:

So the query executed, and if we look at the Profiler trace we can see that the query actually touched multiple partitions for the measure group we were querying. The reason for this is that we have modified the basic partitioning scheme from a single year. If we had left it at the year level it would have been fine, but since we are dividing the year up into multiple parts, Analysis Services no longer knows which partition holds the data needed to satisfy the query.

Let’s go back into the partitions in the cube and set the ‘Slice’ property for the partitions. This is where you specify the tuple that defines the partition.

Now that we have this setup we will redeploy and run the query again. You will need to define the ‘Slice’ property on each of the partitions in the measure group.
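For the quarterly split, the Slice property values would look something along these lines (the hierarchy and member keys here are assumptions for the Adventure Works Date dimension):

```mdx
-- Slice for the Q1 2004 partition
( [Date].[Calendar].[Calendar Quarter].&[2004]&[1] )

-- Slice for the Q2 2004 partition
( [Date].[Calendar].[Calendar Quarter].&[2004]&[2] )

-- The untouched yearly partitions would be sliced at the year level, e.g.
( [Date].[Calendar].[Calendar Year].&[2003] )
```

With the slices in place the storage engine can eliminate partitions up front instead of scanning all of them.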

UPDATE (5/16/2009): fixed this picture (before I had highlighted the aggregation, not the partition)

Now that we have defined the ‘Slice’ for the partitions, we see that our query only touches the partition that we are querying against, and the response time was faster. Granted, this is a small set of data and using this doesn’t really make too much of a difference, but you can imagine what this would do with a very large dataset. And if we switched over to reference the Fiscal hierarchy instead, we would see the same results.

That is it for now, hope you enjoyed this little tip and I want to thank Siva Harinath and Howie Dickerman from Microsoft for their presentation at last year’s Microsoft BI Conference Designing High Performance Cubes in SQL Server 2008 Analysis Services where they pointed out this item.

A couple of weeks back I experienced an issue with a somewhat large dimension. The dimension had around 15.5MM rows and it continued to error while processing the key attribute within the dimension in Analysis Services (SSAS) 2008. I don’t believe this is a version-specific issue, and I was able to come up with a solution that resolved the processing problem, which I will explain.

What was happening was that all of the other attribute hierarchies were processing fine, but every time the key attribute was being processed it would continue to error out in the same spot (around 60K rows) and it would return this error message (I swapped out a few names to protect the innocent):

File system error: While attempting to read information from disk, a read error occurred for physical file: \\?\D:\Program Files\Microsoft SQL Server\MSAS10.MSSQLSERVER\OLAP\Temp\MSMDCacheRowsetBadRows_1932_16629_jm313.tmp, logical file: . Errors in the OLAP storage engine: An error occurred while the ‘DimensionName Key’ attribute of the ‘DimensionName’ dimension from the ‘SSAS_DB’ database was being processed.

This processing was going against an Oracle data store, not that I feel that the issue was related to that, but just a disclaimer.

I started by verifying the drive space for the Temp directory that gets utilized during processing to ensure that enough space was allocated. I also reviewed the design of the dimension to make sure that options like the AttributeHierarchyOptimizedState and AttributeHierarchyOrdered properties were being set properly. I also verified the SSAS server properties to make sure that the ExternalCommandTimeout was adequate so that the queries would not time out during processing. Unfortunately none of this seemed to make a difference and I continued to get the same error message. I even verified the size of the asstore file for the dimension attribute, thinking that maybe I was bumping up against the 4GB limit, but it was only around 355MB.

After searching the web and forums I came across an item that seemed like a potential game winner – Using ProcessingGroup Dimension property option ByTable vs. ByAttribute may error with string keys. At this point I had nothing to lose, and based on the hardware – Dell 2950s with 32GB of RAM and two quad-core processors, plus we were running x64 software – I felt that this was a good solution. I went ahead and switched the ProcessingGroup property on the dimension from the default ByAttribute to ByTable. So basically one massive query gets executed and the results get cached, which is then used to process all of the attribute hierarchies. In running this process not only did it complete successfully, but it actually shaved about 20% off the time for processing the dimension. You definitely need to read up on this option, make sure that you have the hardware in place, and do some proper testing.
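Flipping the property can be done in BIDS, or sketched here with AMO (a hedged sketch: the server, database, and dimension names are placeholders, not the real ones from this incident):

```csharp
// Switch a dimension's ProcessingGroup from ByAttribute to ByTable,
// persist the change, and reprocess.
using Microsoft.AnalysisServices;

class SwitchProcessingGroup
{
    static void Main()
    {
        var server = new Server();
        server.Connect(@"Data Source=localhost");
        try
        {
            Dimension dim = server.Databases.FindByName("SSAS_DB")
                                  .Dimensions.FindByName("DimensionName");

            dim.ProcessingGroup = ProcessingGroup.ByTable;  // default is ByAttribute
            dim.Update();                                   // save the change on the server
            dim.Process(ProcessType.ProcessFull);           // one big query, cached for all attributes
        }
        finally
        {
            server.Disconnect();
        }
    }
}
```

ByTable trades memory for fewer source queries, which is why the hardware sizing caveat above matters.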

Some of the other references that I looked at during this to fix the processing issue were the following: