OBIEE, OBI Apps, ODI, Informatica, Essbase, BI Publisher...you name it. The OBI suite entails them all and more.
I will talk mainly about its technical aspects, interesting findings and noteworthy solutions, but will definitely throw in some random weirdness.

Friday, March 27, 2009

Ranking on Essbase from OBIEE

Greetings, programs!

So I was doing some ranking reports on Essbase sources. From a rather simple request, this developed into a long quest for the best solution in terms of performance, usability, flexibility and making the thing dynamic. For anyone who faces this…here’s a quick overview of the steps I went through and my findings. You might need to scroll somewhat to get to the final solution.

The report was basically about getting the top N products by “actual versus forecast” “gross sales and backlog” (i.e. two measure hierarchies are taking part in this) on the lowest level of the product dimension with N being a prompted value.The product dimension itself included two hierarchies: business and marketing. The split beginning at Generation2, the query should only look at the marketing hierarchy and then filter more detailed on the marketing units (Gen3) in a prompt.

Fair enough. So let’s create this using the normal functionalities in Answers.

The important bit being the “Rank” column:

topn(Scenarios."Act vs Forecast",@{vPromptedRanking}{10})

Firing this off against the datasource results in…a long while of me staring at “Retrieving data”. Performance is quite bad with this simple example taking 35 seconds. In my other use cases with more members and especially some nasty calculated members in the account hierarchy we are hitting up to 5 minutes in the dashboard with all the prompts and whatnot. Unacceptable. To the log(cave)!

First thing we see: there’s no ranking. No restriction whatsoever with regards to the “Act vs Forecast”. What does this mean for the query? It will actually retrieve me all the members I’ve chosen for all rows in the query. Unconstrained except for the filters in the “With” statement. Not very performing to start with, but that full data stream is then received by the BI server, treated again and everything after number 10 cut off:

Basically the TopN function isn’t translated into its Essbase cousin TopCount which would have neatly given us 10 records right from the start. To cut the story short for this bit: I’ve gotten a confirmation from Metalink: “1. Yes none of the Answer based functions are translated into MDX functions for 10G. This may be implemented in 11g.” Bugger.

While the SR was pending I already had started with my alternative approach. Do whatever it takes to constrain the data set with the obvious choice being the usage of TopCount. To test faster I shrink the report. The prompts are kicked out and only generic filters kept.

Major change is the usage of an EVALUATE function which retrieves me the dimension members I want:

Ok. 10 rows queried from Essbase. Performance is at 10 seconds for the whole report Next step: I bring back the prompts constraining on the marketing branch of the hierarchy and the marketing units in particular:

Cute. If I combine TopCount with OBIEE filters and both of them are based on the same dimension, the aggregation level is changed to the one of the filter criteria. Since I use SMU, all numbers are retrieved for this level. Lesson learned.

This one’s a bit more complex to read, but we basically force an intersection of the top 10 descendants of the marketing hierarchy starting member called SMU, 6 generations down (since SMU is generation 2) with the lowest level product.

The results are nothing but staggering in terms of performance increase.

And for switching the actual units:EVALUATE('TopCount(Intersect(Descendants([@{vPromptedSMU}{Business Markets}],5,SELF),%1.dimension.members), @{vPromptedRanking}{10},[Act vs Forecast])',Product."Gen8,Product")

Seems perfect. It does everything I need, is dynamic and flexible enough and is quick. So basically you could stop reading here.Alas, checking the actual result sets I see some differences between the OBIEE TopN based report and the MDX TopCount based one:

TopCount

TopN

The highest in TopCount is only fourth in the TopN while 915-000036, 984-000061 and 984-000061 have been dropped from the TopCount. Finding the reason for this was a bit tedious, but those three members are actually shared members.My whole product dimension is basically an unbalanced hierarchy. The business hierarchy using the products as its lowest level in generation 7 while the marketing hierarchy uses the products as lowest level in generation 8. Logically inconsistent levels (since the groupings etc. differ) – CHECK. Inconsistent depth (7 vs. 8 generations) - CHECK. With most products being shared members on one branch or the other, this kills TopCount.

I have the feeling that transforming the hierarchy into a ragged one would solve the issue since both branches would end on generation 8 (the target of my MDX).But let’s continue down the road where we have control over the bits and pieces: I’m waiting on input on the MDX itself from a new service request I raised. But since I still need a viable solution that supports all my requirements and is fast (everything beyond 10 seconds is slow for me).

It now retrieves descendant products of the prompted marketing unit, ordered by actual vs. forecast. This will still retrieve quite some records, but we have the certainty that even shared members are included.

My search for the TopN now happens again over the filters.

rank(Scenarios.Act vs Forecast) is less than or equal to @{vPromptedRanking}{10}

I left all additional calculated members etc. in, so you can see that even with the added payload, this is a way to go. So let’s have a look at the log:

6 seconds is an acceptable performance for an interim solution. What’s left now is to replace the normal navigation on dimension members retrieved through the rpd with goURL commands on the column populated by the MDX. Oh yeah…and to hope the service request will give a solution for the Topcount issue with shared members.

Thanks to Pradeesh for sharing the workload and also trying out some of the weirder options I came up with.

8 comments:

1. Product hierarchy upto 5 level. Product Family Product Category Product Sub-category Parent Product Product2. I have filters are on Product Family and Product Sub-category and on year from date dimension3. I have got Cost as measure4. I need to find top 25 Products based on Cost

I am trying to use the evaluate function as you have suggested here but somehow the MDX is not generated properly and I am getting an error like -

There's one crucial thing you need to watch out for when using TopCount through an EVALUATE wrapper: if you use - in your example - the Product dimension for doing the TopCount, then the Answers request must not contain any other instance of the product dimension! Neither as a displayed column nor as a filter.

Problem with a second display column from Products: the MDX breaks like in your case above.

Problem with a filter: unfortunately it's a little "trap". I've ntoed this in my post above:"If I combine TopCount with OBIEE filters and both of them are based on the same dimension, the aggregation level is changed to the one of the filter criteria."

So basically if you want the top 25 Products (lowest level of hierarchy) and are filtering for the Product Sub-Category "Semi-Solid", then you will get the cost numbers for the Sub-Category level, not the product level.

Now since you have to work with these two cosntraints the EVALUATE for you must become more complex to handle both the ranking and the filtering at the same time and would read somehtinglike this:

NOTE: If your Product dimension contains shared members, the results will be wrong (see an exmaple in my post above). If this is the case you will need to go for a solution which is halfway between MDX and OBIEE native functions as outlined in my last example above: MDX to get the Products and OBIEE "Rank" to make the ranking.

I have one more queestion. Not related to this but some what linked to it.

This is kind of really questionable question, but its required by business and hence I need to do it this way.

Here is the question, I have two different cube and now I have to use above mentioned result, i.e. top 25 products as input in another cube and find out few values from other dimensions. Product dimension is common between this two cube. Other cube holds more of business information than measures.

Assume we have customers for given product and for each customer we have different terms and conditions based on contract.

I would like to find those terms and coditions for top 25 products based on sales. I can't attach it as attribute or UDA in other cube's product dimension as this will change from time to time. So for same customer and product I can have one set of T&C for 6 months and different for next six month.

The questions is there anyway I can pass the result of first query to second cube and find the result.

It seems you are really hitting the limits of what OBIEE 10g can do against that specific cube. Back when I faced this problem for the first time (3 years ago...so bear with me if I can't remember all the details), performance was ok-ish. In the meantime, it will have grown and undergone structure changes as well, so you may need to pull tricks to get it to work.

Things I see as a problem with how you've written your query in general:

- The Gen2 selection for products of type SMU is done in a filter rather than in the EVALUATE to retrieve only descendants of Gen2 SMUs. You must put this into the EVALUATE and lose the filter. I.e. there shouldn't be a Product2 and a Product8, but only the latter containing the SMU constraint: Descendants([Product].[SMU], [Product].Generations(8),SELF)

- Why is there a TimePeriod4 group?

Hint:Include the "Overdue Backlog" in the analysis and filter for "Overdue Backlog" IS NOT NULL - this will constrain the data stream even more. Oftentimes contraining what's actually asked of the cube and hence cutting off anything not necessary is the best way of getting rid of such timeouts. In any case it's a best practice to only bring back data that's actually relevant for the analysis.