Description

Christian Wade joins Scott Hanselman to show you how to unlock petabyte-scale datasets in Azure in a way that was not previously possible. Learn how to use the aggregations feature in Power BI to enable interactive analysis over big data.

The Discussion

kbaig

Hi Guys,

Thanks for the great demo and a great feature. As mentioned, queries that are not cached get processed by Spark, but can you share more details about how a 23-node Spark cluster fits into this ecosystem?

@kbaig: Thanks for the feedback. The Spark cluster is optional. From the Power BI side it works the same way whether the source is HDI Spark, Azure SQL Data Warehouse, Databricks, or any of the other sources in Azure that support DirectQuery. Setup and optimization depend on the system itself and follow the standard query performance tuning for that system. Nothing about setting up or tuning these systems is different when using aggregations.
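To make the hit/miss behavior a bit more concrete, here is a minimal Python sketch of the idea, not Power BI's actual implementation. The table contents, the column names (`date`, `city`, `driver`, `fare`), and the `query` helper are all made up for illustration. A query whose group-by columns are covered by the cached aggregation is answered from the cache; anything else falls through to the detail rows, which stand in for the DirectQuery source (Spark or otherwise).

```python
from collections import defaultdict

# Hypothetical detail rows standing in for the large source table
# (in the real setup this would live in Spark, SQL DW, etc.).
DETAIL_ROWS = [
    {"date": "2018-01-01", "city": "Seattle", "driver": "d1", "fare": 10.0},
    {"date": "2018-01-01", "city": "Seattle", "driver": "d2", "fare": 5.0},
    {"date": "2018-01-01", "city": "Boston", "driver": "d3", "fare": 7.5},
    {"date": "2018-01-02", "city": "Seattle", "driver": "d1", "fare": 2.5},
]

# Assumed grain of the cached aggregation table.
AGG_COLUMNS = ("date", "city")


def sum_fare(rows, group_by):
    """Sum the 'fare' column of rows, grouped by the given columns."""
    totals = defaultdict(float)
    for row in rows:
        key = tuple(row[col] for col in group_by)
        totals[key] += row["fare"]
    return dict(totals)


# Build the small aggregation table once, pre-summed at the (date, city) grain.
AGG_ROWS = [
    dict(zip(AGG_COLUMNS, key), fare=total)
    for key, total in sum_fare(DETAIL_ROWS, AGG_COLUMNS).items()
]


def query(group_by):
    """Answer from the aggregation if it covers the request, else from the source."""
    if set(group_by) <= set(AGG_COLUMNS):
        # Hit: sums can be re-aggregated from the pre-summed rows.
        return "aggregation", sum_fare(AGG_ROWS, group_by)
    # Miss: the request needs a column the aggregation does not keep.
    return "source", sum_fare(DETAIL_ROWS, group_by)


if __name__ == "__main__":
    print(query(("city",)))    # served from the cached aggregation
    print(query(("driver",)))  # falls through to the detail rows
```

The point of the sketch is that the source system only sees the misses, which is why its sizing and tuning stay a matter for that system rather than for the aggregation layer.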

Christian! You are brilliant. We just need to figure out how to travel to Mars and back by combining all of NASA's data. I can set up that appointment if need be, as I know a few smart people there. All the best! I will be using this for a few of our companies. Ezra Gabay

Re: Spark query. For this query to complete in a reasonable time over big data, the data has to be partitioned. But there are limited ways you can partition data in Spark (not more than 100 partitions).