Working with Variable-Row-Count Adobe Report Builder Queries

SEARCH BLOG

I use Adobe Report Builder a lot. It’s getting to the point where I have to periodically reassure my wife that my relationship with the tool is purely platonic.

One of the situations I often run into with the tool is that I have a query built that will have a variable number of rows, and I then want to have a pivot table that references the data returned from that query. For instance, if I want to put start/end dates for the query in a couple of cells in Excel, and then plot time-series data, the number of rows returned will vary based on the specific start and end dates returned. This can present some challenges when it comes to getting from a raw query to a clean visualization of the returned data. Fortunately, with some crafty use of COUNTA(), pivot tables, and named ranges, none of these challenges are insurmountable.

The example I’m walking through below gets fairly involved, in that it works from a single Report Builder query all the way through the visualization of multiple sparklines (trends) and totals. I chose this example for that reason, even though there are many situations that only use one or two of the techniques described below. As noted at the end of the post, this entire exercise takes less than 10 minutes once you are comfortable with the approach, and the various techniques described are useful in their own right — just steroid-boosted when used in conjunction with each other.

The Example: Channel Breakdown of Orders

Let’s say that we want to look at a channel breakdown of orders (it would be easy enough to have this be a channel breakdown of visits, orders, revenue, and other metrics and still work with a single Report Builder query, but this post gets crazy enough with just a single metric). Our requirements:

The user (with Report Builder installed) can specify start and end dates for the report; OR the start and end dates are dynamically calculated so that the report can be scheduled and sent from within Report Builder

For each of the top 4 channels (by orders), we want a sparkline that shows the daily order amount

We want to call out the maximum and minimum daily values for orders during the period

We want to show the total orders (per channel) for the period

Basically, we want to show something that looks like this, but which will update correctly and cleanly regardless of the start and end data, and regardless of which channels wind up as the top 4 channels:

So, how do we do that?

A Single Report Builder Query

The Report Builder query for this is pretty easy. We just want to use Day and Last Touch Channel as dimensions and Orders as a metric. For the dates, we’ll use cells on the worksheet (not shown) designated as the start and end dates for the query. Pretty basic stuff, but it returns data that looks something like this:

This query goes on a worksheet that gets hidden (or even xlVeryHidden if you want to get fancy).

A Dynamic Named Range that Covers the Results

We’re going to want to make a pivot table from the results of the query. The wrinkle is that the query will have a variable number of rows depending on the start/end dates specified. So, we can’t simply highlight the range and create a pivot table. That may work with the initial range of data, but it will not cover the full set of data if the query gets updated to return more rows (and, if the query returns fewer rows, we’ll wind up with a “(blank)” value in our pivot table, which is messy).

To work around this is a two-step process:

Use the COUNTA() function to dynamically determine the number of rows in the query

Define a named range that uses that dynamic value to vary the scope of the cells included

For the first step, simply enter the following formula in a cell (this can also be entered in a named range directly, but that requires including the sheet name in the column reference):

=COUNTA($A:$A)

The COUNTA() function counts the number of non-blank cells in a range. By referring to $A:$A (or, really, A:A, would work in this case), we will get a count of the number of rows in the Report Builder query. If the query gets refreshed and the number of rows changes, the value in this cell will automatically update.

Now, let’s name that cell rowCount, because we’re going to want to refer to that cell when we make our main data range.

Now, here’s where the magic really starts to happen:

Select Formula >> Name Manager

Click New

Let’s name the new named range rawData

Enter the following formula:

=OFFSET(Sheet1!$A$1,0,0,rowCount,3)

Click OK. If you click in the formula box of the newly created range, you should see a dashed line light up around your Report Builder query.

Do you see what we did here? The OFFSET() function specifies the top left corner of the query (which will always be fixed), tells Excel to start with that cell (the “0,0” says to not move any rows or columns from that point), then specifies a height for the range equal to our count of the rows (rowCount), and a width of the range of 3, since that, too, will not vary unless we update the Report Builder query definition to add more dimensions or metrics.

IMPORTANT: Be sure to use $s to make the first parameter in the OFFSET() formula an absolute reference. There is a bug in most versions of Excel such that, if you use a non-absolute reference (i.e., Sheet1!A1), that “A1” value will pretty quickly change to some whackadoo number that is nowhere near the Report Builder data.

Make Two Pivot Tables from the Named Range

The next step is to make a couple of pivot tables using our rawData named range:

Select Insert >> Pivot Table

Enter rawData for the Table/Range

Specify where you want the pivot table to be located (if you’re working with multiple queries, you may want to put the pivot tables on a separate worksheet, but, for this example, we’re just going to put it next to the query results)

Click OK

You should now have a blank pivot table:

We’re just going to use this first pivot table to sort the channels in descending order (if you want to specify the order of the channels in a fixed manner, you can skip this step), so let’s just use Last Touch Marketing Channel for the rows and Orders for the values. We can then sort the pivot table descending by Sum of Orders. This sort criteria will persist with future refreshes of the the table. Go ahead and remove the Grand Total while you’re at it, and, if you agree that Excel’s default pivot table is hideous…go ahead and change the style. Mine now looks like this:

Tip: If your report is going to be scheduled in Report Builder, then you want to make sure the pivot table gets refreshed after the Report Builder query runs. We can (sort of) do this by right-clicking on the pivot table and select Pivot Table Options. Then, click on the Data tab and check the box next to Refresh data when opening the file.

Now, there are lots of different ways to tackle things from here on out. We’ve covered the basics of what prompted this post, but then I figured I might as well carry it all the way through to the visualization.

For the way I like to do this, we want another pivot table:

Select the initial pivot table and copy it

Paste the pivot table a few cells to the right of the initial pivot table

Add Days as an additional row value, which should make the new pivot table now look something like this:

This second pivot table is where we’ll be getting our data in the next step. In a lot of ways, it looks really similar to the initial raw data, but, by having it in a pivot table, we can now start using the power of GETPIVOTDATA() to dynamically access specific values.

Build a Clean Set of Data for Trending

So, we know the order we want our channels to appear in (descending by total orders). And, let’s say we just want to show the top 4 channels in our report. So, we know we want a “table” (not a true Excel table in this case) that is 5 columns wide (a Date column plus one column for each included channel). We don’t know exactly how many rows we’ll want in it, though, which introduces a little bit of messiness. Here’s one approach:

To the right of our second pivot table, click in a cell and enter Date. This is the heading for the first column.

In the cell immediately to the right of the Data column, enter a cell reference for the first row in the first pivot table we created. If you simply enter “=” and then click in that cell, depending on your version of Excel, a GETPIVOTDATA() formula will appear, which we don’t want. I sometimes just click in the cell immediately to the left of the cell I actually want, and then change the cell reference manually.

Repeat this for three additional columns. Ultimately, you will have something that looks like this:

Are you clear on what we’re doing here? We could just enter column headings for each channel manually, but, with this approach, if the top channels changes in a future run of the report, these headings (and the data — more to come on that) will automatically update such that the four channels included are the top 4 — in descending order — by total orders from the channel.

Now, let’s enter our dates. IF the spreadsheet is such that there is a cell with the start date specified, then enter a reference to that cell in the cell immediately below the Date heading. If not, though, then we can use a similar trick to what we did with COUNTA() at the beginning of this post. That’s the approach described below:

In the cell immediately below the Date heading, enter the following formula

=MIN($A:$A)

This formula finds the earliest date returned from the Report Builder query. If a 5-digit number gets displayed, simply select the entire column and change it to a date format.

Now, in the cell immediately below that cell, enter the following formula:

=IF(OR(N3="",N3>=MAX($A:$A)),"",N3+1)

The N3 in this formula refers to the cell immediately above the one where the formula is being entered. Essentially, this formula just says, “Add one to the date above and put that date here,” and the OR() statement makes sure that a value is returned only if the date that would be entered is within the range of the available data.

Drag the formula entered in step 2 down for as many rows as you might allow in the query. The cells the formula get added to will be blank after the date range hits the maximum date in the raw data. This is, admittedly, a little messy, as you have to determine a “max dates allowed” when deciding how many rows to drag this formula down on.

At this point, you should have a table that looks something like the following:

Now, we want to fill in the data for each of the channels. This simply requires getting one formula set up correctly, and then extending it across rows and columns:

Click in the first cell under the first channel heading and enter an “=”

Click on any (non-subtotal) value in the second pivot table created earlier. A GETPIVOTDATA() formula will appear (in Windows Excel — that won’t happen for Mac Excel, which just means you need to decipher GETPIVOTDATA() a bit, or use the formula example below and modify accordingly) that looks something like this:

That’s messy! But, if you look at it, you’ll realize that all we need to do is replace the DATE() section with a reference to the Date cell for that row, and the “Direct” value with a reference to the column heading. The trick is to lock the column with a “$” for the Date reference, and lock the row for the channel reference. That will get us something like this:

Now, we only want this formula to evaluate if there’s actually data for that day, so let’s wrap it in an IF() statement that checks the Date column for a value and only performs the GETPIVOTDATA() if a date exists:

And, finally, just to be safe (and, this will come in handy if there’s a date where there is no data for the channel), let’s wrap the entire formula in an IFERROR() such that the cell will be blank if there is an error anywhere in the formula:

Now, we’ve got a formula that we can simply extend to cover all four channel columns and all of the possible date rows:

One More Set of Named Ranges

We’re getting close to having everything we need for a dynamically updating visualization of this data. But, the last thin we need to do is define dynamic named ranges for the channel data itself.

First, we’ll need to calculate how many rows of data are in the table we built in the last step. We can calculate this based on the start and end dates that were entered in our worksheet (if that’s how it was set up), or, we can use the same approach that we took to figure out the number of rows in our main query. For the latter, we can simply count the number of number cells in the Date column using the COUNT() function (COUNTA will not work here, because it will count the cells that look blank, but that actually have a formula in them):

Again, we could simply put this formula in as the definition for trendLength rather than putting the value in a cell, but it’s easier to trace it when it’s in a cell.

For the last set of named ranges, we want to define a named range for each of the four channels we’re including. Because the specific channel may vary as data refreshes, it makes sense to simply call these something like: channel1_trend, channel2_trend, channel3_trend, channel4_trend.

We again use the OFFSET() function — this time in conjunction with the trendLength value we just calculated. For each range, we know where the first cell will always be — we know the column and where the first row is — and then the OFFSET() function will let us define how tall the range is:

Select Formulas >> Name Manager

Click New

Enter the name for the range (channel1_trend, channel2_trend, etc.)

Enter a formula like the following:

=OFFSET(Sheet1!$O$3,0,0,trendLength,1)

The “1” at the end is the width of the range, which is only one column. This is a little different from the first range we created, which was 3 columns wide.

Click OK

Repeat steps 2 through 5 for each of the four channels, simply updating the cell reference in the OFFSET() function for each range ($P:$3, $Q:$3, etc.) (named ranges can be created with a macro; depending on how many and how involved I need to create, I sometimes write a macro rather than creating these one-by-one; but, even creating them one-by-one is worth it, in my experience).

Now, we’re ready to actually create our visualization of the data.

The Easy Part: Creating the Visualization

On a new worksheet, set up a basic structure (typically, I would actually have many width=1 columns, as described in this post, but, for the sake of keeping things simple here, I’m using variable-width columns).

Then, it’s just a matter of filling in the rows:

For the channel, enter a formula that references the first pivot table (similar to how we created the column headings for the last table we created on the background sheet)

For the sparkline, select Insert >> Line and enter channel1_trend, channel2_trend, etc.

For the total, use GETPIVOTDATA() to look up the total for the channel from the first pivot table — similar to what we did when looking up the daily detail for each channel:

=GETPIVOTDATA("Orders",Sheet1!$H$2,"Last Touch Marketing Channel",B3)

The B3 reference points to the cell with the channel name in it. Slick, right?

For the maximum value, simply use the MAX() function with channel1_trend, channel2_trend, etc.:

=MAX(channel1_trend)

For the minimum value, simply use the MIN() function with channel1_trend, channel2_trend, etc.:

=MIN(channel2_trend)

When you’re done, you should have a visual that looks something like this:

Obviously, the MIN() and MAX() are just two possibilities, you could also use AVERAGE() or STDEV() or any of a range of other functions. And, there’s no requirement that the trend be a sparkline. It could just as easily be a single chart with all channels on it, or individual charts for each channel.

More importantly, whenever you refresh the Report Builder query, a simple Data >> Refresh All (or a re-opening of the workbook) will refresh the visualization.

Some Parting Thoughts

Hopefully, this doesn’t seem overwhelming. Once you’re well-versed in the underlying mechanics, creating something like this — or something similar — can be done in less than 10 minutes. It’s robust, and is a one-time setup that can then not only let the basic visualization (report, dashboard, etc.) be fully automated, but also provides an underlying structure that can be extended to quickly augment the initial report. For instance, adding an average for each channel, or even providing how the last point in the range compares to the average.

A consolidated list of the Excel functionality and concepts that were applied in this post:

Dynamic named ranges using COUNTA, COUNT, and OFFSET()

Using named ranges as the source for both pivot tables and sparklines

Using GETPIVOTDATA() with the “$” to quickly populate an entire table of data

Using IF() and IFERROR() to ensure values that should remain blank do remain blank

Each of these concepts is powerful in its own right. They become triply so when combined with each other!

Tim has moved on from Analytics Demystified effective 12/31/2017 but his content lives on. If you have questions for Tim please send them to eric@analyticsdemystified.com directly and they will get routed.