Microsoft Power BI, Analysis Services, MDX, DAX, M, Power Pivot and Power Query

If there’s one feature of Power Query that’s guaranteed to get Excel users very, very excited indeed it’s the ability to combine data from multiple workbooks into a single table. The bad news is that this is something that Power Query can’t do through the user interface (although so many people have asked for it I wouldn’t be surprised if it gets added to the product soon) and it’s not obvious how to do it.

This is a topic that has been blogged about many times over the past year or so (see DutchDataDude, Mike Alexander, Ken Puls, Miguel Escobar – apologies to anyone I’ve missed) so why should I write about it? Well, all these other posts show you the steps you have to go through to build your own function and then use that function inside a query, which is fine, but it involves a lot of clicking and typing code each time you want to do it. It’s all very time-consuming if you don’t know Power Query that well, though, and not something a regular Excel user would want to do. I’ve got an easier way though: a generic function that can combine data from workbooks in any folder you point it at. Once you’ve created it it’s very easy for anyone to use, can be reused over and over, and of course you can share this function through the Power BI Data Catalog if you have a Power BI for Office 365 subscription.

Steps to add the Power Query function to your workbook

You can either follow the steps below to add the function to your workbook, or instead just download the sample workbook containing the function here – which is a lot quicker!

1) Copy the following code onto the clipboard

//Define function parameters
(#"Directory containing Excel files to combine" as text,
optional #"Name of each Excel object to combine" as text,
optional #"Use first rows as headers" as logical) =>
let
//If the optional Excel object name parameter is not set, then default to Sheet1
ExcelName = if #"Name of each Excel object to combine" = null
then "Sheet1"
else #"Name of each Excel object to combine",
//If the optional Use first rows as headers parameter is not set, then default to true
UseFirstRowsAsHeaders = if #"Use first rows as headers"= null
then true
else #"Use first rows as headers",
//Get a list of all the files in the folder specified
Source = Folder.Files(#"Directory containing Excel files to combine"),
//Filter these to only get Excel files
OnlyGetExcelFiles = Table.SelectRows(Source,
each ([Extension] = ".xlsx")
or ([Extension] = ".xls")),
//Find the full path of each file
FullPath = Table.CombineColumns(
OnlyGetExcelFiles ,
{"Folder Path", "Name"},
Combiner.CombineTextByDelimiter("", QuoteStyle.None),"Merged"),
//Get a list containing each file path
ExcelFiles = Table.Column(FullPath, "Merged"),
//Define a function to get the data from the specified name in each Excel workbook
GetExcelContents = (FileName as text) =>
let
//Connect to the workbook
Source = Excel.Workbook(File.Contents(FileName), UseFirstRowsAsHeaders),
//Get a table of data from the name specified
//If the name doesn't exist catch the error and return null
ExcelData = try Source{[Item=ExcelName]}[Data]
otherwise try Source{[Name=ExcelName]}[Data]
otherwise null
in
ExcelData,
//Call the above function for each Excel file
ReadAllWorkbooks = List.Transform(ExcelFiles, each GetExcelContents(_)),
//Remove any null values resulting from errors
IgnoreNulls = List.RemoveNulls(ReadAllWorkbooks),
//Combine the data from each workbook into a single table
CombineData = Table.Combine(IgnoreNulls)
in
CombineData

2) Open Excel and go to the Power Query tab on the ribbon. Click on the From Other Sources button and then click Blank Query.

3) The Power Query query editor window will open. Go to the View tab and click on the Advanced Editor button.

4) The Advanced Editor will open. Delete all the code in the main textbox and replace it with the code above. Click OK to close the Advanced Editor.

5) In the Query Settings pane on the right-hand side of the Query Editor, change the name of the query to CombineExcel, then go to the Home tab on the ribbon and click the Close & Load button. The Query Editor will close.

6) You can now see your function in the Workbook Queries pane in Excel! It should look like this:

Using the function to combine data from multiple workbooks

To use the function, double-click on it in the Workbook Queries pane or right-click and select Invoke. The following dialog will appear:

You can enter three parameters here:

The path of the directory containing the Excel workbooks that you want to read data from. The function can read from xlsx and xls files (though for the latter to work you need the Access 2010 engine installed, a free download if you only have Excel 2013) and will ignore any other files in the folder. The function will also read any Excel files in any subfolders.

Where you want to get data from in each Excel workbook. This can be the name of a worksheet (for example you could enter Sheet2 here) or a named range or a table. It’s an optional parameter so if you leave it blank it will get data from the worksheet Sheet1. If a workbook doesn’t contain the name you enter here it will be ignored. If the format of the data in each worksheet is not consistent (for example if you have different column names) then be warned: you may get some strange results.

Whether data on your worksheet (if you’re getting data from a worksheet) contains headers. Enter true here if your data does have a header row in every worksheet; false otherwise. This is also an optional parameter and if you leave this box empty the default value is true.

When you click OK, a new Power Query query will be created, the Query Editor window will open and you’ll see all the data from all of the Excel workbooks combined. The first step of this query is a call to the CombineExcel() function and you can carry on working with your data in Power Query as normal.

Disclaimer: I’ve done a reasonable amount of testing on this and I’m pretty sure it works well, but of course there will be bugs. Please leave a comment if you find a bug or can suggest any other improvements.

40 responses

In terms of reusable processes and automation – is there some kind of template that can be created and reused from client to client with Power View’s? Say we want to offer some PowerViews of typical indicators (top 10 customers, gross profit by region, etc), would they have to be built every time? I know we can reuse the DAX/MDX query when connecting to a particular cube in PowerPivot.

Hey, Chris
Thanks so much for this great post. Very strait forward way of handling functions.
If there’s one feature of Power Query that’s guaranteed to get Excel users very, very excited indeed it’s the ability to combine data from multiple workbooks into a single table.

I would love to see Power Query being able to call VBA functions as well. Sometimes we don´t have time to port some VBAs functions to the M language. Therefore, being able to call the functions directly from Power Query would be really very very exciting as well… What do you think Chriss?

Here is my take on it that I developed some time ago. My approach dynamically gets the sheet names from the files – so you could have each file with different number of sheets and named differently. It also takes care of .xlsx/.xlsb/.xls file formats

First Define a Function – GetPath- This gets the Path from a cell named File_Path on a sheet called SET_PARAMETERS

I saw this blog when you first posted it and made a note for future reference. Today I had a need – WOW! this is a fabulous, easy to use solution. I have now saved a master Excel template with this code loaded. In future when I need to do this, I will just start with that Workbook rather than doing the whole copy/paste thing again.

Is it possible to combine the data models between worksheets? I have 2 worksheets, one with 24 million, one with 5 million rows loaded into their respective data models through Power Query. I need to merge/join these on a column. I was hoping not to have to re-download 5 million rows since it is already sitting in Power Pivot. Any ideas on this?

I suppose I can live with that for now. Then, a more serious concern is how refreshing this much data will work. I’m pretty sure it always refreshes the entire data set, right? It’d be great if it could just load changes/new records.

Chris, is there a way that you can pass a more meaningful prompt to the user when you invoke the function? Ie because the input parameters are set as text, the prompt says ‘Example: ABC’. I would like to change this default prompt to something more meaningful. Is this possible?

Chris,
I’ve been using this function since you published it, with great success. One thing I would like to add for debugging is the ability to add a column which contains the file path of the source file as it is combined. Is that possible using this approach?

Also….
Often the raw source files that I want to combine have variations in sheet names (I don’t control file creation), but only one sheet in the file. In the combine function, is there a way to specify the first sheet in the file, as opposed to a default sheet name “Sheet1”?

Here’s a workbook containing a customised version of the function that (should) default to the first sheet in the worksheet if you don’t specify a name, and also adds the name of the file the data was taken from: http://1drv.ms/1MPn39d

I’m newbie, please excuse me for stupid question. Is it possible to manipulate data (for example transpose it) before combining it?
When I work on file and use Source = Excel.CurrentWorkbook(){[Name=”Table1″]}[Content], it’s work. But when I try to use same manipulation for all files in folder, I failed to change code.

Chris – this is awesome. I am using this to load ancient xls workbooks from a client still using Excel 2003. It worked a treat – just had to change the file extension parameter to xls.
Note that tables don’t work – you need to use sheet names instead. Also you need to ensure that the Power BI Desktop, Office 365 (or whatever version) and Access Database Engine are all the same bit – either 32 bit or 64 bit. You can test this by using Get Data and loading an xls workbook before trying this function.

A couple of observations on the function Excel.Workbook(): it can’t read Hidden or xlVeryHidden worksheets. However, it can read ranges that exist on a hidden sheet – a possible workaround. It also only reads ranges that are more than one cell, i.e. single cell ranges are ignored.

Hi Chris – this is great! A colleague of mine discovered your post about 2 years ago and we’ve been using this “magic formula” quite a lot!
I’ve come to face an issue to which you’ll probably answer “why would you want to do that?”. It would be a fair question, but please ignore it and help me out with the solution (if there’s one, that is!) 🙂

Is it possible to ask the query to check some subfolders only, and not all of them? For example, add a line to the function saying “only look into folders that contain “xxxxx” in their folder name”.

Yes, it would be possible – but there’s no need to use this function now that the latest versions of Power BI and Power Query allow you to combine data from multiple Excel workbooks out of the box. Since this is the case, you just need to use the From Folder data source, apply whatever filter you need to get your list of Excel workbooks, then click the Combine button.

Follow Blog via Email

Social

Need some help?

As well as being a blogger, I'm an independent consultant specialising in Analysis Services, MDX, DAX, Power BI, Power Query and Power Pivot. I work with customers from all round the world solving design problems, performance tuning queries and delivering training courses, and I am happy to work on short-term engagements. For more details see http://www.crossjoin.co.uk