Look’s like Star Schema is back on the menu!

As the release date for Excel 2013 gets nearer (in fact Home Premium has already arrived) the pricing and package offering are becoming clear(er). For many of us the emerging picture is not pretty, mainly due to a u-turn on Microsoft’s part i.e. removing the PowerPivot add-in as an option for most low-end Office 365 packages and ALL retail versions.

Rob Collie goes into the likely back story (and politics) behind such an obviously (well to those of us at the Excel coal face) stupid move. MS not only led some of her most ardent evangelists up the proverbial “garden path” with PowerPivot under Excel 2010 (never before has a major feature been so restricted in a subsequent version) but also gave a false and misleading impression of the new Office 365 packages during the current trial period (i.e. where all versions have the PP add-in (and PowerView!) as an option). Being lied to is the expression the keeps coming to mind.

There is, however, a silver lining, Excel 2010’s Data Model. If I had never seen the power of DAX Measures (which is, in essence, what lack of the add-in deprives you of) I would be completely bowled over by the data handling power of the Data Model (it is, in effect, the Vertipaq Columnar Engine with all its power and glory – minus the ability to create (but can consume) explicit DAX measures & columns ). But I have, so hence my disappointment 😦

But even without the add-in, the Data Model opens up a whole world of data munging potential.

At its simplest, it makes “VLookup hell” a thing of the past. This alone, as any who have laboured under the tyranny of VLookup (and her cousins Index & Match) will agree, is a huge reason to upgrade to Excel 2013.

Also, all versions of Excel (OK, not all: RT and basic Web Apps do not support the DataModel) can consume a DAX-rich model produced by somebody else who has access to the add-in. Now before you get too excited, models produced by the free and widely available 2010 version must still be upgraded by the 2013 version before they can be used by 2013’s Data Model. UPDATE: Previous statement not quite true, it is possible to transfer the binary data file holding the model from 2010 to 2013 (but must be a V2 generated model, V1s or V2s upgraded from V1, will not work). Have done a POC and will likely add this facility as a public command to HAMMER in the near future.

And there’s more! You also use DAX Queries to generate tabular output from a plain Data Model, see Chris Webb’s example. Again, a PP-add-in-free-zone

But why the title, why are Star Schema’s back on the menu and were they ever off the menu?

In this post (and the related posts linked to within) I’ve argued that the strict “laws and practices” of dimensional modelling (as popularised by Kimball et al.) can be relaxed (and often ignored). The world that spawned such ‘best practices’ was not a world of DAX powered, or even MDX powered, data models, it was one where the RDBMS’s version of a hyper-cube – the start schema – ruled.

The only tool was SQL (with its limited aggregate functions) and the single-issue star-schema was the only viable data structure. But sophisticated summary tables techniques (requiring an army of DBAs) and new RDBMS optimizers were needed to get a reasonable response time. Plus, much of the work (and most of the cost) was in the ETL phase. That’s the world I cut my BI teeth in. How we would have wished for a modern columnar engine! And now, 20 or so years later, such a thing exists, in a spreadsheet! So dust off your Kimball Dimensional modelling books, the plain-old ETL’d star-schema is back in town!

But what about the ETL required to build such pure stars? Again, add-in-free DAX Query to the rescue! Like this, to create a Time Dimension. Dimensional data could easily be enhanced (e.g. adding ABC type attributes) using similar techniques.

Fact data could be more of a problem (due to the 1M row limit of an Excel table – the output medium of Excel-based DAX Queries). But again, if you have access to even the most basic of relational databases (like the inbuilt SQLite engine in HAMMER) even the shaping of very large fact tables can be handled. For more demanding requirements maybe use AWS’s Data Pipeline “ETL-in-the-cloud”?

Anybody with experience of using pure stars will of course be aware of the downside i.e. the restrictive nature of the end result. Such data models tend to deal with a limited number of “questions” especially compared to DAX-destined data models (which can be very close to their OLTP sources in structure).

But that can also be an upside, depending on the likely end-users of the “cube”. The simple single-issue pure star-schema is often better suited to an environment where most are “consumers” rather than “explorers” of data.

In any case, it would be nice to have the option to service both end-user types. So please Microsoft, at the very minimum, revert back to the 2010 scenario i.e. PowerPivot as downloadable add-in. If you do, the phrase “Oh, it’s just Excel” will for the history books.

It is unlikely that anyone capable of “data munging” will not acquire the PP-enabled version of Excel 2013. It is the Excel pros that have partially learned DAX and the future ones that have not that will be burned by this decision by Microsoft. Remember the bully that would take a book of matches and place it between the toes of someone and light it? I’ve never heard of a bully that was stupid enough to do that to himself.

I know of a lot individuals in very small professional firms (accountants, economists, other specialists) who are very capable of DAX like data analysis and who’ve pushed PP within their orgs. They have now, in many cases, been left with egg on their faces as many of these companies are simply not ready to commit to the “cloud” and wish to stick to retail versions of Office. It’s one thing to get your boss to upgrade a version of Office another thing entirely to move to something very “new” like O365. It’ll be a long time before many of these “evangelists” trust MS again.

Let alone all the organisations who are not allowed to put their sensitive data into the cloud, such as governments. Luckily though, those organisations usually also take their time to upgrade, so they will probably stay on 2010 for quite a while.

No, I know exactly what o365 is, I’ve trialed nearly every version and have actually subscribed to the Home Office Premium version. O365 is excellent, it really is, and in any case is the way forward for Office whether we like it or not. It’s not really the problem area although it’s still difficult for SME’s to get hold of full PP functionality.

The real problem is with the retail versions, there’s NO way for SMEs who wish to continue buying via retail to get access to full PP.

Just to make sure: 365 allows me do dowload/install from the web, and then use it locally, without constant internet, and I can keep all my (sensitive) data on my own machines (i.e. not in the cloud), without hassle?
Then the only drawbacks are
1) deployment issues and
2) yearly instead of one-time fee?

Then for me personally, it doesn’t suck quite so bad, but of course, being that my current clients are government-like organisations, it still is very bad for them and my business.