Power Query and Fixed-width M(agic)!

Intro by Avi: We know that Power Pivot and Power Query are extremely useful tools in the Finance and Accounting world. Today we have a guest post by Darren Leitzke outlining a very useful scenario and sharing some good tips and tricks. Take it away Darren…

Working in accounting reminds me of all of those movies that take place 40 years later, but when that year comes around, the movie grossly exaggerated the technological advances of the future. Likewise, I thought that once I made it to a large corporation that they would have all of the cool tools: the newest version of every software, the most up-to-date computers, the fastest databases, and of course more help desk support than I could ever use. When I worked behind the scenes on-campus, I thought to myself “This is a Public University. Of course, it is going to have older software.”

You want me to use Excel 2003?

When I got to a large corporation, I thought that using Excel was just for the departments that were “behind”. It took awhile to realize that it was the most valuable tool I would use in my career. Excel skills weren’t just a nice-to-have, they were a need-to-have and the more that I learned about Excel, the more valuable I felt in the organization. Give me a wireframe and some time and I could create an application that turned data into valuable information for the managers running the business — and it didn’t take me months. Programming was now the nice-to-have.

Excel was still lacking some capabilities, so I was very excited to hear about Power Pivot. That led me to this site and this site led me to Power Query. Power Query is a game-changer and has saved me from hours of repetitive copying and pasting as well as macro recording and writing. But you may not be using it because you’re thinking, “Power Query is only for clean databases or table data… We don’t even have CSVs yet!”

Door #1 or Door #2?

You have two options when dealing with legacy reporting:

Get IT to help you with the source data and figure out how to re-create the information

Get Power Query and turn your text file into a Pivot dream

This post is going to address Option 2 for anyone that has fixed-width legacy reports lying around the organization. (Either way, you probably need IT, so be nice to those guys!)

Enough chit chat… Let’s light this candle!

1…2…3…4…Don’t want manual work no more!

You need to start by opening the text file in Notepad (or some other text editor) and count the spacing between words for the columns. For example, in the following sentence, you would count 0, 10, 10…

Darren Wrote This post

There are 10 characters from the start of Darren to the start of Wrote and there are 10 characters from the start of Wrote to the start of This post.

We can use Power Query to clean up the data, so you could also do something like 8, 10 (as long as you put the splits on white space instead of in the middle of data). For the sake of this example, I already did the counting for you (just think of it as a very, very early Christmas present).

The columns for this example should be split at:

0, 13, 32, 60, 74.

Interested in Learning How to Do this Kind of Thing?

Our New Online Power Query Course just launched this year. You’ll learn Power Query from the best in the business, two Excel MVPs Ken Puls and Miguel Escobar. We’ve included 7 Hours of Self-Paced Video over 31 Courses, with lots of example files, and a Digital Badge powered by Acclaim.

Opening the source

We will start with opening Power BI Desktop, choosing my sample file (you’ll need to extract it from the zip archive because the server won’t let me upload a PRN file), and editing the data in the Query Editor

Open Power BI Desktop and choose Get Data

Choose More to get to the text options

Choose Text and then Connect

Open the file and choose Edit

All of that work will be worth it soon…

Next, we’ll split the columns using the wizard and the Advanced Editor. This is probably the most value-added feature that we’ll look at, especially if you have more than a few columns.

Choose Split Column and then By Number of Characters

The first one we will do with the wizard and then I will show you a fancy trick to split as many columns as you want and rename the headers at the same time. Enter in 13, choose Once, as far left as possible, and choose OK.

Power Query automatically tries to figure out the type which is really handy in most cases, but not this case.

Choose the X next to Changed Type to remove that step

Advanced Mode… Activate!

This is the fun part that will save you a lot of mouse clicks in the future.

Open the Advanced Editor

Find the line called #”Split Column by Position” and copy the entire line

Now, close the Advanced Editor by choosing Done and you should have 5 split columns with the headers that we want!

Totally worth it!

You just learned how to split columns by number of characters, rename multiple columns all at once, use the advance editor, and you’ll never have to use that Text to Columns wizard again!

Let’s keep going!

One of the main annoyances of fixed-width files is that they repeat the header information. Really useful for static reports, not so useful for data. We’ll take care of that now by adding a Custom Column and revisiting the Advanced Editor.

Go to the Add Column tab and choose Add Custom Column

In the Add Custom Column window, we are going to use the following code:

This code in plain English says, “If the value in the Account column beings with “Division:” then return any text after the first space. Otherwise, return a null.”

Here is the result:

You can use the same logic with the Page or the Location, you just need to change where it says

Text.StartsWith([Account],”Division:”)

To

Text.StartsWith([Account],”Location:“)

Here are the results (if you go this route, you might want to change the column header):

Three questions might be popping into your head:

What if there isn’t a space? Use a colon or some other character (or group of characters) instead.

if Text.StartsWith([Account],”Division:”) = true then Text.Range([Account], Text.PositionOf([Account],”put the symbol here and keep the double-quotes“)) else null

Why nulls? Because of the next step which is really powerful.

This seems like a lot of work. Is it really worth it? If you ever have to do this report twice, it is worth it. You’re also learning a lot of skills that may be valuable to you in the future. A good data monkey is good to have anywhere.

Time to use the null

As far as I can tell, you need a null to use Fill Down. Fortunately, we took care of that with the last step. Next, we will Fill Down the Division.

Go to the Transform tab, choose Fill and then Down

This transformation will fill in the division to the bottom of the data. Here are the results:

Get rid of useless data

It’s time to finally remove all of the formatting and whitespace. Luckily, there is an account number next to every transaction that starts with 1. We will use that to filter out the data.

Choose the Filter button next to Account, Text Filters, and then Begins With…

Enter 1 and choose OK

Here are the (beautiful) results:

Last steps before analysis. You’re almost there!

Now, we just need to put on the finishing touches. After changing the types and combining the debits/credits, we should be ready for analysis

.

Select both the Debit and Credit columns. Choose Data Type: Text and then choose Decimal Number.

Go to the Add Column tab, choose Standard, and then Add. You’ll have a new column called Sum. Feel free to delete the Debit and Credit columns.

You can multiply the new Sum column by -1, so that Sales Revenue is positive (instead of a negative to represent the credit balance).

Go to the Transform tab, select the Sum column, choose Standard, and then Multiply.

Type in -1 and click OK

Sweet, sweet victory!

You made it! Close & Apply from the Home tab.

Well… Almost!

If you want to move up in your analytical career, you should learn this mantra: Reporting the numbers is not enough. After you get that fancy, beautiful report and walk up to the manager to say, “Look how pretty this is!” The first thing he or she will probably ask is, “OK — What does it tell me?”

Be ready to answer that question BEFORE you ever show someone the report. You don’t have to find some amazing insight that no one has ever thought of before, but you should at least have one or two basic results that the report lets you know about.

The good news is that all of our book categories have revenue.

The bad news is that the random number generator I used to generate the revenue and returns shows that more books were returned than sold! For even more credit with your manager, try to come up with a couple of suggestions (but that’s for another guest post).

Final Tips

Power BI Desktop and Power Query have the same advanced editor (and engine). If you ever need to reuse this code, just go into the advanced editor, copy the entire code, and paste it into the Power Query advanced editor.

Copy and paste the code from the advanced editor into OneNote, a text editor, or maybe even GitHub! This will allow you to share the code and create a backup incase the file gets corrupted.

Instead of saving this report out of your legacy system every time you want to use the report, work with IT to have it automatically saved in a network folder as a Month-to-Date or Quarter-to-Date file. That way, you can look at trends in the data just by refreshing!

If you have too much white space before or after the data, you can use Format > Trim in Power Query from the Transform tab to clean it up.

The split column by position is very useful to be able to do in one instruction, however, this only worked if the splits increased (as they do in your example) when I used a split that decreased it did not work – i.e. I used 0, 15, 20, 9, 16 – the query fell over due to the split 9.
Do you know if there is a way to overcome this?

Great info on using multiple columns in the splitter function. One question for that:

Does anyone know if it is possible to import just a few columns / fixed widths ranges with room in between them?

We have some text files that are 6000 or more characters wide and a few million long. They store a few hundred rows.
We currently use a tool that is outdated but can do stuff like “import positions 12:20 as customer, 21:28 as country, 34:40 as employee, 240:246 as amount, etc) without caring about the data between those specified columns. While Power Query always gets the whole line first and then splits them up continously, it would be better to just choose the ones you need.

It would be possible to write the few hundred rows down but when you only need a handful, this seems like wasted time.

1) Do the splits as usual and remove the extra columns
2) Split out the first column using a text extraction function (such as MID() in Excel)) and just change the starting point/number of characters for each new column

Last steps before analysis doesn’t appear to work anymore. The Standard Add step creates = Table.AddColumn(#”Changed Type”, “Inserted Addition”, each [Debit] + [Credit], type number) with no Sum column and only returns a column of null.

I have a text file that I am trying to apply your query. It all works except that my adding custom column part that goes to specified column and test for specific text and returns anything after first space in that column. Well it happens that my specific text begins with a space so when I run the return anything after space it returns the entire text but I only need the text after the specified text. Any assistance is appreciated.
Sample: Period to date for APR-17…there is a space in front of “Period ” so when query instructs to return all text after first space if returns “Period to date for APR-17” but I only need APR-17.