Power Query, Power BI And The “Allow Data Preview To Download In The Background” Option

Recently I was asked by a customer to do some tuning on an Excel workbook with a lot of Power Query queries in it. Although all of the data used in the queries was coming from tables in the workbook itself and the data volumes were small, there were fifty Power Query queries and clicking Refresh All resulted in a large, prolonged spike in CPU and memory usage by Excel.

Only a small number of these fifty queries were being loaded into the workbook and none were being loaded into the Excel Data Model. The queries that were being loaded into the workbook were referencing several other queries that in turn referenced several other queries, and indeed there were some reference chains that were over ten queries long. To give you an idea of the complexity here’s what the Query Dependencies view looked like:

I’m a big fan of using references to split complex logic up into separate queries, and in this case it was absolutely the right thing to do because otherwise the workbook would have been unmaintainable. That said, there was clearly something going wrong with the refresh in this case.

On further investigation I found that if I individually refreshed the small number of queries that actually loaded data into the workbook, they all refreshed very quickly and with none of the ill-effects seen with a Refresh All. So if it wasn’t the queries that were being loaded into the workbook, what was the problem? It turns out it was the queries that weren’t being loaded into the workbook.

Both Power Query and Power BI load previews of the data returned by a query for display in the Query Editor; clicking Refresh All in the workbook was obviously triggering a refresh of these previews and this was what was using all the memory and CPU. I found that to prevent this happening I had to use an option that was introduced in Power BI in January 2016 and is also now present in Power Query/Get & Transform in Excel: Allow Data Preview To Download In The Background.

You can find this option in Excel by going to the Query Options dialog:

…and then going to Current Workbook/Data Load. If you then deselect “Allow data preview to download in the background”:

…you will find that the background refresh of query previews stops.

In my case, after I had done this – and with a bit of other tuning using Table.Buffer() – the workbook refreshed very quickly indeed and there was no spike in CPU or memory after a Refresh All.

Other people have run into the same problem in Excel and also in Power BI Desktop (see here and here), so it looks like this is an important property to change if you have a large number of queries in a single workbook or pbix file.

18 responses

Thank you for the hint re the background processing. Power Query is really cool, but when it comes to cascading more complex data processing logic it can sometimes become a nightmare compared to having the job done in SQL 😉

Further tuning options:
Turning off the privacy check on the workbook, or even globally, reduced the processing time in my case by more than 50%. With a source like Salesforce Objects (non-SQL), table buffering is not much of an advantage for avoiding unnecessary I/O, because there is no query folding at all (I guess). But you can first store the extracted source data with Power Query in a second Excel workbook, and then do the rest in your primary workbook by referring to the data in the “data” workbook.
Unfortunately I haven’t yet figured out how to refer from Power Query to data stored in the same Excel workbook itself.
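The staging approach described above can be sketched in M as follows. This is only an illustration, not the commenter's actual code: the file path, sheet structure and the table name "Accounts" are all assumptions.

```m
// Sketch: read staged Salesforce data from a separate "data" workbook.
// The path and the table name "Accounts" are hypothetical.
let
    // Open the staging workbook; the second argument tells Power Query
    // to use the first row of each sheet/table as headers
    Source = Excel.Workbook(File.Contents("C:\Data\StagedSalesforce.xlsx"), true),
    // Navigate to the table called "Accounts" inside that workbook
    Accounts = Source{[Item = "Accounts", Kind = "Table"]}[Data]
in
    Accounts
```

All further transformations in the primary workbook can then reference this query instead of hitting Salesforce again.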

Another question: how did you visualize the dependencies within Excel Power Query? It would be cool to have a visualization feature similar to the one in Power BI.

I *think* Power Query gets data from Salesforce through some kind of OData connection, which means there *could* be some query folding going on here. Certainly if you find turning off the privacy check speeds things up that suggests that doing so allows query folding to take place. However you have to be extremely careful turning off the privacy check because it can allow some of your data to be sent to external data sources as a side-effect of query folding – a breach of privacy and possibly illegal.

It’s easy to get Power Query to read data from the current workbook: you can use the From Table button on the Excel ribbon to get data from an Excel table, and you can also read data from named ranges (see https://blog.crossjoin.co.uk/2014/07/22/working-with-excel-named-ranges-in-power-query/ ). You can view query dependencies by clicking the Query Dependencies button on the View tab of the Query Editor, and that’s present in both Power BI Desktop and Power Query.
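The From Table button generates M along these lines; as a minimal sketch (the table name "Sales" and the "Amount" column are assumptions for illustration):

```m
// Sketch: read the table named "Sales" from the current workbook.
// Excel.CurrentWorkbook() returns a table of all tables/named ranges
// in the workbook, with Name and Content columns.
let
    Source = Excel.CurrentWorkbook(){[Name = "Sales"]}[Content],
    // Data from the current workbook is untyped by default,
    // so set column types explicitly
    Typed = Table.TransformColumnTypes(Source, {{"Amount", type number}})
in
    Typed
```

The same `Excel.CurrentWorkbook()` call also surfaces named ranges, which is how the blog post linked above reads them.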

Just perfect. I really hadn’t noticed these two important buttons. Thank you very much!

I get the best overall performance with Salesforce source data when I first load the initial source data, with some minor data preparation, into an Excel worksheet. The more complex transformations into the Power Pivot target model are then sourced from these embedded worksheets. This helps to avoid Salesforce data being loaded over and over again through the different target objects that refer to the same underlying source data (e.g. record type lookups, customer contacts).

Regarding the privacy option, I don’t want to do anything illegal – I’m not a criminal 😉
But in the context of this client solution, the client is accessing its own CRM data in its Salesforce cloud instance. If Salesforce itself is tracking these queries by its client users, I can’t see a problem there. But it could be that in the future some other client data sources (e.g. finance or HR applications) are integrated into the Power Query model.
The queries running against Salesforce objects only relate to Salesforce objects and may contain some parameter information from a static Power Query table. Before I combine this data with any other source data, the table is buffered and/or read from the intermediate Excel worksheets (see above).

Question: is there still a risk of this being “illegal” – that in this scenario some of this other data is still sent to Salesforce?

If you are always buffering data in Excel and never getting data directly from Salesforce then I guess no laws will be broken, unless there are regulations stating that some kinds of data must always be kept in encrypted form. I’m not a lawyer though!

Hi Chris,
Having made the change to the Background Data setting you have to Save, exit and reload the Application before the new setting takes effect.
I have a .pbix with 125 objects across queries, parameters (lots of them) and user-defined functions. None of them load to the workbook. Excel had ground to a halt and I was toggling the Background Data setting with no impact. Ken Puls got me an introduction to someone on the Power Query team who put me straight.
There is still a heavy system load and I have looked at Table.Buffer … but I have not a clear vision of where or how to deploy it properly. I have since migrated the App to Power BI Desktop as new developments are available there a couple of months before they hit Excel … and the irony is that Excel 2013 has the changes before the in-built version in 2016!
In my case ALL of the input is coming from Excel workbooks. There is a client data file with 12+ worksheets from which input is taken. I do an initial table load and subsequently refer to the staging query for ongoing manipulation and merging. All of the tables have fewer than 10 rows of data. Of course the name and location of this client workbook is read from one of the Power Query parameters!
I have a recollection of reading that Table.Buffer was ineffective with Excel files.
Would be interested in your experience with the function.
Keep up the Sunday night / Monday blog!

Did you find you had to save/exit/reopen to get this to work in Power BI Desktop? I did not have to do this in Excel 2016.

The speed at which you get new features in Excel 2016 depends on the way you installed Excel 2016. I’ve got the Office 365 click-to-run version and I’m also on the “First Release for Current Channel” track which means I get the new features as soon as they appear. If you’re on a different track or if you’ve installed Excel in the traditional way then yes, you have to wait for the new features 😦

Table.Buffer() can be useful with any data source, and it certainly had an impact in this case with all of the data coming from the same Excel workbook. It’s hard to say exactly when you should use it, but I often find that it has a big impact if you use it before a Merge or a complex calculation (see https://blog.crossjoin.co.uk/2015/05/05/improving-power-query-calculation-performance-with-list-buffer/ for example). If the first step of most of your queries is just a reference to the staging query, try wrapping the name of the staging query in that first step with Table.Buffer().
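As a minimal sketch of that last suggestion (the query names "Staging" and "Lookup" and the "Key" column are hypothetical), wrapping the staging reference in Table.Buffer() before a Merge looks like this:

```m
// Sketch: buffer the staging query so it is evaluated once and held
// in memory, instead of being re-evaluated by each downstream step.
let
    Source = Table.Buffer(Staging),
    // A subsequent Merge (left outer join) against another query;
    // "Lookup" and the "Key" column are assumptions for illustration
    Merged = Table.NestedJoin(
        Source, {"Key"},
        Lookup, {"Key"},
        "LookupData",
        JoinKind.LeftOuter)
in
    Merged
```

Note that Table.Buffer() only helps within a single query evaluation; each query that references "Staging" still triggers its own evaluation unless buffering happens further upstream.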

I have a fairly simple model. I load the top 250 movie list from IMDB, then use a web service to get more information about each movie. This gives me a base query with 250 rows and about 15 columns. Some columns contain concatenated values; for example, the genre column can contain all the genres a movie belongs to: “action, drama, romance”. Since this is a basic M2M relationship, I create a bridge table (unpivoted genre values for each movie) and a genre query with the distinct list of genres.

In the queries for the bridge tables I reference my original query (in total this query gets referenced 4 times). In the queries for the “dimensions” I reference my bridge table and just do a distinct.
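A bridge table like the one described can be built in M roughly as follows. This is a sketch under assumptions: the base query is called "Movies", and it has "Title" and "Genre" columns where Genre holds comma-separated values.

```m
// Sketch: one row per (movie, genre) pair from a comma-separated column.
// "Movies", "Title" and "Genre" are hypothetical names for illustration.
let
    Source = Movies,
    KeepCols = Table.SelectColumns(Source, {"Title", "Genre"}),
    // Turn the Genre text into a list of genres, then expand the list
    // so each genre gets its own row
    Split = Table.ExpandListColumn(
        Table.TransformColumns(KeepCols, {{"Genre", each Text.Split(_, ",")}}),
        "Genre"),
    // Remove stray spaces left over from the ", " separators
    Trimmed = Table.TransformColumns(Split, {{"Genre", Text.Trim}})
in
    Trimmed
```

The genre dimension can then be a reference to this bridge query with `Table.Distinct(Table.SelectColumns(Bridge, {"Genre"}))`, where "Bridge" stands for whatever the bridge query is named.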

This all seems to load quite slow. Probably because of all the references to the source query, which apparently results in way too many calls to the web service. As in your reply, I tried to solve this by using Table.Buffer() in the first step of the referenced query.

The good news: the data seems to load more quickly.
The bad news: when I hit refresh, all queries take a very long time “Evaluating…”

Do you have any idea why PBI Desktop takes such a long time just to evaluate queries?

ps: I tried to solve this by moving Table.Buffer to the end of the source query. The result is the same: long time evaluating, faster data load.

It’s hard to say. Power Query will hit all web services/sites at least twice, even if buffering is working correctly (it’s a feature…). If you have four queries then that means your original query will be hit eight times. Also many web services and sites have throttling in place to stop people scraping data, so it could be that you are being throttled here; the Function.InvokeAfter() function can be useful to introduce a delay in between requests and therefore avoid the throttling: https://blog.crossjoin.co.uk/2015/04/30/using-function-invokeafter-in-power-query/
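Following the linked post, Function.InvokeAfter() can wrap the per-row web call like this. The URL, query parameter and function name here are hypothetical, purely to show the pattern:

```m
// Sketch: delay each web service call by two seconds to avoid throttling.
// "GetMovieDetails", the URL and the "t" parameter are assumptions.
let
    GetMovieDetails = (title as text) =>
        Function.InvokeAfter(
            // The function to invoke once the delay has elapsed
            () => Json.Document(
                Web.Contents("https://example.com/api", [Query = [t = title]])),
            // Wait 2 seconds (days, hours, minutes, seconds)
            #duration(0, 0, 0, 2))
in
    GetMovieDetails
```

Invoking this function in an Add Custom Column step spaces the 250 requests out, at the cost of a slower overall refresh.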

If I read this correctly, the web service might be hit 10 times (one source query + 4 referenced). Since the web service is called for each row, this results in 2500 calls. Maybe one minute refresh time is not that bad after all 😀

Hi Chris,
Does this only happen when you open the query in the Query Editor? Does that mean it will slow things down a little for the “builders” of the queries when they are building/debugging them, while the refresh (if the query is loaded to a table) will be unaffected?
Thanks
