Gilligan on Data by Tim Wilson

Thoughts, musings, and, hopefully, not too many redundancies on the world of business data. If you missed the irony in the previous sentence, you may struggle with my writing style.

5 Reasons Columbus is Great for Digital Analytics (which you can use to justify your trip to ACCELERATE)

I'm closing in on six full years since I moved from Austin to Columbus. Because I'm a native Texan, I'll never truly "go native" in Columbus (there's a Natural Law of Hillbilliness that dictates that), but, with ACCELERATE 2013 rapidly approaching, it seemed like a good time to rattle off why the town is a great place to be for digital analytics.

The town itself, like me, has long struggled with a bit of an inferiority complex:

On more than one occasion, a long-time local has pointed out to me that, when talking to people from other large cities, the city name itself suffices: LA, Houston, San Francisco, New York, Chicago, Cleveland, Cincinnati. But, Columbus residents always feel like they have to provide a little more detail: “Columbus, Ohio” (it’s true!).

A first-time visitor to the town recently flew in from Austin and assumed he was just flying over an “actual Ohio city” as he descended into Columbus: “It’s a real city! Bigger than I expected!”

So, with my tongue occasionally inserted into my cheek as I type, below are five reasons that Columbus is actually a great town for digital analytics!

#1: Birthplace of Presidents…and Digital Analytics?

Virginia is the "Mother of Presidents," in that 8 U.S. Presidents were born there. That makes sense — it was a hotbed of colonial activism. 4 of the first 5 presidents, actually, were from Virginia. After that, though, things tapered off a bit for the state. Ohio, though, wasn't even a colony, and, yet, is the birthplace of 7 presidents. Not bad!

On the digital analytics front, did you know that:

Eric Peterson was born in Ohio, just like those 7 presidents (well, presumably, not “just like,” as some of them were born when medical techniques were more primitive).

Avinash Kaushik got his MBA at The Ohio State University, so he spent some seriously formative business years in the town.

Jim Sterne put his wife through law school selling lots of software to GE and Wright Patterson AFB in Ohio.

Super-compelling anecdotes like this can’t be sheer coincidences, can they? (Don’t answer that.)

#2: Big Brands with <groan>Big Data</groan>

A number of major brands were founded in Columbus (and stayed), relocated to Columbus as they grew, or have a major presence here. Those companies have a wealth of consumer and digital data…and they rely on sharp analysts to help them put that data to profitable use.

Some logos you might recognize of brands that were founded and continue to be based here (or have been headquartered here long enough that they might as well have been):

That doesn’t include the fact that JPMorgan Chase has a massive presence in Columbus, as does Abbott Labs. And Thirty-One Gifts is now based here and growing like a weed. And P&G is just down the road…

You get the idea.

#3: Same Right Size as Austin… but Definitely Cooler

Austin and Columbus are just about the same size. They’re both easily in the top 20 cities in the U.S. by population. Everyone thinks of Austin as being a cool town — hipsters abound, keepin’ it weird, and so on. And it is a cool town. It just turns out that Columbus is cooler:

At the same time, the frigidness of Columbus in the winter tends to be exaggerated. Notice in the chart above that the temperature didn’t drop below freezing and just stay there for a long period of time. Columbus is in central Ohio, which means it’s far enough from Lake Erie that the “lake effect” that dumps snow early and often on cities like Cleveland and Detroit actually tapers out before getting down to Columbus.

Climate-wise, it’s actually pretty mild.

And, no, this doesn’t really directly have anything to do with digital analytics…but it did include a chart with real data (courtesy of NOAA)!

#4: When Big Blue Does <groan…again>Big Data</groan…again> They Do It In Columbus

IBM. Ever heard of 'em? Well, let me tell you a little story: when they decided to open a client center devoted to advanced analytics, they looked high, then they looked low, then someone said, "Why don't you look in Ohio?" In the end, they landed in Columbus. The combination of talent, brands, and local government support made it a no-brainer (being cooler than Austin may or may not have factored in).

#5: Our Analysts Like to Hang Out and Drink Beer

Digital analysts in Columbus get together about once a month to hang out, eat good food, drink good beer (except for Liz Smalls — she drinks Budweiser), and swap tips and ideas at Web Analytics Wednesday. We’ve had over 50 WAWs in Columbus in the last 5 years, and those will keep on keeping on!

So, what are you waiting for?

Seriously. What are you waiting for? Sign up now for ACCELERATE 2013 so you can check out a bit of this analytically awesome spot!

EXACTLY Where I've Wanted to Be

Ask a career coach how to land your dream job and they'll tell you: 1) figure out what your dream job is, and 2) develop a plan to get there.* I never consciously did either one, but I realized several years ago that I have the most fun when I'm helping companies figure out how to "do" digital analytics effectively. I even caught myself with a pretty tight description of what that looked like in my ideal form: it looked like what Eric, John, and Adam (just the three of them at the time) were doing over at Web Analytics Demystified. The fact that I got to know them personally, both at conferences and through social media, as well as the rest of the stellar crew of talent they've added since then, did nothing but reinforce my beliefs.

And now…I’ll be joining that team in a couple of weeks! As someone that Eric Peterson once referred to as “The Grandmaster of Grump,” I’ve been dealing with an unfamiliar emotion: giddiness. I’m looking forward to joining a fantastic and talented team, including having them all in my adopted home town of Columbus in a few months for ACCELERATE!

* I’ve never actually had a career coach. I tried to read What Color Is Your Parachute? years ago and didn’t make it past the second chapter.

Excel Dynamic Named Ranges (with Tables) = Never Manually Updating Your Charts

The single post on this blog that has, for several years now, consistently driven the most traffic to this site is this one that I wrote almost three years ago. Apparently, through sheer volume of content on the page and some dumb luck with the post title, I consistently do well for searches for "Excel dynamic named ranges" (long live the long tail of SEO!).

The kicker is that I wrote that post before I’d discovered the awesomeness of Excel tables, and before Excel 2010 had really gone mainstream. I’ve been meaning to redo the original post with an example that uses tables, because it simplifies things a bit.

This is that post — 100% plagiarized from the original when it makes sense to do so. The content was created in Excel 2010 for Windows. However, it should work fine on Excel 2007 for Windows, too. Macs are a bit of a crap shoot, unfortunately (but you can always run Parallels, so I hear, and use Excel for Windows!).

This post describes (and includes a downloadable file of the example) a technique that I’ve used extensively to make short work of updating recurring reports. Here are the criteria I was working against when I initially implemented this approach:

User-selectable report date

User-selectable range of data to include in the chart

Single date/range selection to update multiple charts at once

No need to touch the chart itself

Reporting of the most recent value (think sparklines, where you want to show the last x data values in a small chart, and then report the last value explicitly as a number)

No use of third-party plug-ins

No macros — I don't have anything against macros, but they introduce privacy concerns, version compatibility issues, and odd little warnings, and, in this case, they aren't needed

The example shown here is pretty basic, but the approach scales really well.

Sound like fun?

Setting Up the Basics

One key here is to separate the presentation layer from the data layer. I like to just have the first worksheet as the presentation layer — let's name it Dashboard — and the second worksheet as the data layer — let's call that Data. (Note: I abhor many, many things about Excel's default settings, but, to keep the example as familiar as possible, I'm going to leave those alone.) This basic approach is one of the core components in the dashboards I work on every day, and it can be applied to a much more robust visualization of data than is represented here.

Data Tab Setup — Part 1

This is a slightly iterative process that starts with the setup of the Data tab. On that worksheet, we'll use the first column to list our dates — these could be days, weeks, months, whatever (they can be changed at any time and the whole approach still works). For the purposes of this example, we'll go with months. Let's leave the first row alone — this is where we will populate the "current value," which we'll get to later. I like to use a simple shading schema to clearly denote which cells will get updated with data and which ones never really need to be touched. And, in this example, let's say we've got three different metrics that we're updating: Revenue, Orders, and Web Traffic. This approach can be scaled to include dozens of metrics, but three should illustrate the point. That leaves us with a Data tab that looks like this:
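
Roughly, the layout is this (the specific months are just placeholders; any contiguous run of dates works, and the actual values get dropped in later):

        A               B                  C                C                D
  1     Current->       [current Revenue]  [current Orders]  [current Web Traffic]
  2     Report Period   Revenue            Orders            Web Traffic
  3     Jan-12          ...                ...               ...
  4     Feb-12          ...                ...               ...
  ...
  19    May-13          ...                ...               ...

Row 1 is where the "current" value for each metric will be calculated, row 2 becomes the header row of the table, and rows 3 through 19 hold the data itself.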

Now, turn that range of data into a table by selecting the area from A2 to D19 and choosing Insert » Table. Then, click over to the Table Tools » Design tab and change the table name from "Table1" to "Main_Data" (this isn't required, but I always like to give my tables somewhat descriptive names). The sheet should now look like this:

Because this is now a table, as you add data in additional rows, as long as they are on the rows immediately below the table, the table will automatically expand (and that new data will be included in references to Main_Data, which is critical to this whole exercise).

While we're on this tab, we should go ahead and define some named cells and some named ranges. We'll name the cells in the first row of each metric column (the row labeled "Current->") as the "current" value for that metric (the cells don't have to be named cells, but it makes for easier, safer updating of the dashboard as the complexity grows). Name each cell by clicking on the cell, then clicking in the cell address box at the top left (the Name Box) and typing in the cell name. It's important to have consistent naming conventions, so we'll go with <metric>_Current for this (it works out to have the metric identified first, with the qualifier/type after — just trust me!). The screen capture below shows this being done for the cell where the current value for Orders will go, but this needs to be done for Revenue and Web Traffic as well (I just remove the space for Web Traffic — WebTraffic_Current).

And, of course, we’ll actually need data — this would come later, but I’ve gone ahead and dropped some fictitious stuff in there:

That’s it for the Data tab for now…but we’ll be back!

Dashboard Tab Setup — Part 1

Now we jump over to the Dashboard worksheet and set up a couple of dropdowns — one is the report period selector, and the other is the report range (how many months to include in the chart) selector. Start by setting up some labels with dropdowns (I normally put these off to the side and outside the print range…but that doesn’t sit nice with the screen resolution I like to work with on this blog):

Then, set up the dropdowns using Excel data validation:

First, the report period. Click in cell C1, select Data » Data Validation, choose List, and then reference the first column in the Main_Data table (see the “Referencing Tables and Parts of Tables” section in this post for an explanation of the specific syntax used here, including the use of the INDIRECT function):
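
The Source for the list winds up being something along these lines (data validation won't take a table reference directly, which is why it gets wrapped in INDIRECT):

=INDIRECT("Main_Data[Report Period]")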

When you click OK, you will have a dropdown in cell C1 that contains all of the available months. This is a critical cell — it’s what we’ll use to select the date we want to key off of for reporting, and it’s what we’ll use to look up the data. So, we need to make it a named cell — ReportPeriod:

Now, let’s do a similar operation for the report range — this tells the spreadsheet how many months to include in each chart. Click in cell C3, select Data » Data Validation, choose List, and then enter the different values you want as options (I’ve used 3, 6, 9, and 12 here, but any list of integers will work):

And, let’s name that cell ReportRange:

Does this seem like a lot of work? It can be a bit of a hassle on the initial setup, but it will pay huge dividends as the report gets updated each day, week, or month. Trust me!

Before we leave this tab, go ahead and select a value in each dropdown — this will make it easier to check the formulas in the next step.

Data Tab Setup — Part 2

Now is where the fun begins. We’re going to go back over to the Data worksheet and start setting up some additional named ranges. We’ve got Main_Data, which is the table that includes the full range of data. We want to look at the currently selected Report Period (a named range called ReportPeriod) and find the value for each metric that is in the same row as that report period. That will give us the “Current” value for each metric. All you need to do is put the exact same formula in each of the three “Current” cells:

=VLOOKUP(ReportPeriod,Main_Data,COLUMN())

In this example, these are the values for each of the three arguments:

ReportPeriod — Jul-12, the value we selected on the Dashboard tab

Main_Data — this is the full table of data

COLUMN() — this is 2, the column that the current metric is listed in (this function resolves to “3” for Orders and to “4” for Web Traffic)

So, the formula simply takes the currently selected month, finds the row with that value in the data array, and then moves over to the column that matches the current column of the formula:

Slick, huh? And, because the ReportPeriod data validation dropdown on the Dashboard worksheet is referencing the first column of the data table on the Data tab, the VLOOKUP will always be able to find a matching value. (Read that last sentence again if it didn't sink in — it's a nifty little way of ensuring the robustness of the report.)

This little bit of cleverness is really just a setup for the next step, which is setting up the data ranges that we're going to chart. Conceptually, it's very similar to what we did to find the current metric value, but we want to select the range of data that ends with that value and goes backwards by the number of months specified by ReportRange. So, with the values we selected above, Jul-12 and "6," we basically want to be able to chart the following range of data:

We’ll do this by defining a named range called Revenue_Range (note how this has a similar naming convention to Revenue_Current, the name we gave the cell with the single value — this comes in handy for keeping track of things when setting up the dashboard). We can’t use VLOOKUP, because that function doesn’t really work with arrays and ranges of data. Instead, we’ll use a combination of the MATCH function (which is sort of like VLOOKUP on steroids) and the INDEX function (which is a handy way to grab a range of cells). Pull your hat down and fasten your seatbelt, as this one gets a little scary. Ultimately, the formula looks like this:
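
Built from the pieces described below (and assuming the Main_Data table and the ReportPeriod, ReportRange, and Revenue_Current names set up earlier), it comes out to something like:

=INDEX(Main_Data,MATCH(ReportPeriod,Main_Data[Report Period])-ReportRange+1,COLUMN(Revenue_Current)):INDEX(Main_Data,MATCH(ReportPeriod,Main_Data[Report Period]),COLUMN(Revenue_Current))

(When the time comes, Orders_Range and WebTraffic_Range get the same formula, with Orders_Current and WebTraffic_Current swapped in for Revenue_Current.)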

Working from the outside in, you’ve got a couple of INDEX() functions. Think of those as being INDEX(First Cell) and INDEX(Last Cell).

The range is defined, in pseudocode, as simply:

=INDEX(First Cell):INDEX(Last Cell)

The Last Cell calculation is slightly simpler to understand. As a matter of fact, this is really just trying to identify the cell location (not the value in the cell) of the current value for revenue — very similar to what we did with the VLOOKUP function earlier. The INDEX function has three arguments: INDEX(array,row_num,column_num). Here’s how those are getting populated:

array — this is simply set to Main_Data, the full data table

row_num — this is the row number within the array that we want to use; we’ll come back to that in just a minute

column_num — we use a similar trick that we used on the Revenue_Current function, in that we use the COLUMN() formula; but, since we set up this range simply as a named range (as opposed to being a value in a cell), we can’t leave the value of the function blank; so, we populate the function with the argument of Revenue_Current — we want to grab the column that is the same column as where the current revenue value is populated in the top row.

Now, back to how we determine the row_num value. We do this using the MATCH function, which we need to use on a 1-dimensional array rather than a 2-dimensional array (Main_Data is a multi-column table, which makes it a 2-dimensional array). All we want this function to return is the number of the row in the Main_Data table for the currently selected report period, which, as it turns out, is the same row as the currently selected report period in the first column (“Report Period”). The formula is pretty simple:

MATCH(ReportPeriod,Main_Data[Report Period])

The formula looks in the first column of the Main_Data table for the ReportPeriod value and finds it…in the seventh row of the table. So, row_num is set to 7.

INDEX(First Cell) is almost identical to INDEX(Last Cell), except the row_num value needs to be set to 2 instead of 7 — that will make the full range match the ReportRange value of 6. So, row_num is calculated as:

MATCH(ReportPeriod,Main_Data[Report Period])-ReportRange+1

(The "+1" is needed so that the total number of cells included in the range equals ReportRange.)

Now, that’s not all that scary, is it? We just need to drop the full formula into a named range called Revenue_Range by selecting Formulas » Name Manager » New, naming the range Revenue_Range, and inserting the formula:

Tip: After creating one of these named ranges, while still in the Name Manager, you can select the range and click into the formula box, and the current range of cells defined by the formula will show up with a blinking dotted line around them.

You're getting sooooooo close, so hang in there! In order for the chart labels to show up correctly, we need to make one more named range. We'll call it Date_Range and define it with the following formula (this is just like the earlier _Range formulas, but we know we want to pull the dates from the first column, so, rather than using the COLUMN() formula, we simply use a constant, "1"):
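
Again assuming the names set up earlier, that is roughly:

=INDEX(Main_Data,MATCH(ReportPeriod,Main_Data[Report Period])-ReportRange+1,1):INDEX(Main_Data,MATCH(ReportPeriod,Main_Data[Report Period]),1)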

If you want, you can fiddle around with the different settings on the Dashboard tab and watch how both the “Current” values and (if you get into Name Manager) the _Range areas change.

OR…you can move on to the final step, where it all comes together!

Dashboard Tab Setup — Part 2 (the final step)

It’s back over to the Dashboard worksheet to wrap things up.

Insert a 2-D Line chart and resize it to be less than totally obnoxious. It will just be a blank box initially:

Right-click on the chart and select Select Data. Click to Add a new series and enter “Revenue” (without the quotes — Excel will add those for you) as the series name and the following formula for the series values:

=DynamicChartsWithTables_Example.xlsx!Revenue_Range

(Change the name of the workbook if that’s not what your workbook is named)

Click to edit the axis labels and enter a similar formula:

=DynamicChartsWithTables_Example.xlsx!Date_Range

You will now have an absolutely horrid looking chart (thank you, Excel!):

Tighten it up with some level of formatting (if you just can’t stand to wait, you can go ahead and start flipping the dropdowns to different settings), drop “=ReportPeriod” into cell E6 and “=Revenue_Current” into cell E7, and you will wind up with something that looks like this:

Okay, so that still looks pretty horrid…but this isn’t a post about data visualization, and I’m trying to make the example as illustrative as possible. In practice, we use this technique to populate a slew of sparklines (no x-axis labels) and a couple of bar charts, as well as some additional calculated values for each metric.

Adding charts for orders and web traffic is a little easier than creating the initial chart. Just copy the Revenue chart a couple of times (if you hold down <Ctrl>-<Shift> and then click and drag the chart, it will make a copy and keep that copy aligned with the original chart).

Then, simply click on the data line in the chart and look up at the formula box. You will see a formula that looks something like this:
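
With the workbook and range names used above, it will be something along the lines of:

=SERIES("Revenue",DynamicChartsWithTables_Example.xlsx!Date_Range,DynamicChartsWithTables_Example.xlsx!Revenue_Range,1)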

Change "Revenue" to "Orders" (both in the series name and in the Revenue_Range reference, making it Orders_Range) and the chart will update.

Repeat for a Web Traffic chart, and you’ll wind up with something like this:

And…for the magic…

<drum rollllllllllll>

Change the dropdowns and watch the charts update!

So, is it worth it? Not if you’re going to produce one report a couple of times and move on. But, if you’re in a situation where you have a lot of recurring, standardized reports (not as mindless report monkeys — these should be well-structured, well-validated, actionable performance measurement tools), then the payoff will hit pretty quickly. Updating the report is simply a matter of updating the data on the Data tab (some of which can even be done automatically, depending on the data source and the API availability), then the Report Period dropdown on the Dashboard tab can be changed to the new report period, and the charts get automatically updated! You can then spend your time analyzing and interpreting the results. Often, this means going back and digging for more data to supplement the report…but I’m teetering on the verge of much larger topic, so I’ll stop…

As an added bonus, you can hide the Data tab and distribute the spreadsheet itself, enabling your end users to flip back and forth between different date ranges — a poor man’s BI tool, if ever there was one (in practice, there will seldom be any real insight gleaned from this limited number of adjustable dropdowns, and that’s not the reason to set them up in the first place).

I was curious as to what it would take to create this example from scratch and document it as I went. As it’s turned out, this is a lonnnnnnngggg post. But, if you’ve skimmed it, get the gist, and want to start fiddling around with the example used here, feel free to download it!

Some Practical eCommerce Google Analytics Tips

Post No. 1: Feras Alhlou's 3 Key GA Reports

Feras Alhlou of E-Nor recently wrote an article for Practical eCommerce that describes three Google Analytics reports with which he recommends eCommerce site owners become familiar. The third one in his list — Funnel Segments — is particularly intriguing (breaking down your funnels by Medium).

“Hey…look out for…sampling (with conversion rates)”

Sampling in Google Analytics is one of those weird things that people either totally freak out about (especially people who currently or previously worked for the green-themed-vendor-that-has-been-red-for-a-few-years-now) or totally poo-poo as not a big deal at all. Once Google Analytics Premium came out, Google actually started talking about sampling more…because its impact diminishes with Premium.

I actually fell in the “poo-poo” camp for years. The fact was, every time I dug into a metric in a sampled report — when I jumped through hoops to get unsampled data — the result was similar enough for the difference to be immaterial. I patted myself on the back for being a sharp enough analyst to know that an appropriately chosen sample of data can provide a pretty accurate estimate of the total population.

And that’s true.

But, if you start segmenting your traffic and have segments that represent a relatively small percentage of your site's overall traffic, and if you combine that with a metric like Ecommerce conversion rate (which is a fraction that relies on two metrics: Visits and Transactions), things can start to get pretty wonky. Ryan at Blast Analytics wrote a post that I found really helpful when I was digging into this on behalf of a client a couple of months back.

Obviously, if you’re running the free Google Analytics and you never see the yellow “your data is sampled” box, then this isn’t an issue. Even if you do see the box, you may be able to slide the sampling slider all the way to the right and get unsampled data. If that doesn’t work, you may want to pull your data using shorter timeframes to remove sampling (which throws Unique Visitors out the window as a metric you can use, of course).

Be aware of sampling! It can take a nice hunk of meat out of your tush if you blithely disregard it.

QA: It's for Analysts, Too (and I'm not talking about tagging)

There is not an analyst on the planet with more than a couple of weeks of experience who has not delivered an analysis that is flawed due to a mistake he made in pulling or analyzing the data. I'm not talking about messy or incomplete data. I'm talking about that sinking feeling when, following your delivery of analysis results, someone-somewhere-somehow points out that you made a mistake.

Now, it’s been a while since I experienced that feeling for something I had produced. <Hold on for a second while I find a piece of wood to knock on… Okay. I’m back.> I think that’s because it’s an ugly enough feeling that I’ve developed techniques to minimize the chance that I experience it!

As a blogger…I now feel compelled to write those down.

I get it. There is a strong urge to skip QA’ing your analysis!

No one truly enjoys quality assurance work. Just look at the number of bugs that QA teams find that would have easily been caught in proper unit testing by the developer. Or, for that matter, look at the number of typos that occur in blog posts (proofreading is a form of QA).

Analysis QA isn’t sexy or exciting work (although it can be mildly stimulating), and, when under the gun to “get an answer,” it can be tempting to hasten to the finish by skipping past a step of QA, but it’s not a wise step to skip.

I mean it. Skipping Analysis QA is bad, bad, BAD!

9 times out of 10, QA’ing my own analysis yields “nothing” – the data I pulled and the way I crunched it holds up to a second level of scrutiny. But, that’s a “nothing” in quotes because “9 times everything checked out” is the wrong perspective. That one time in ten when I catch something pays for itself and the other nine analyses many times over.

You see, there are two costs of pushing out the results of an analysis that have errors in them:

It can lead to a bad business decision. And, once an analysis is presented or delivered, it is almost impossible to truly “take it back.” Especially if that (flawed) analysis represents something wonderful and exciting, or if it makes a strong case for a particular viewpoint, it will not go away. It will sit in inboxes, on shared drives, and in printouts just waiting to be erroneously presented as a truth days and weeks after the error was discovered and the analysis was retracted.

It undermines the credibility of the analyst (or, even worse, the entire analytics team). It takes 20 pristine analyses* that hold up to rigorous scrutiny to recover the trust lost when a single erroneous analysis is delivered. This is fair! If the marketer makes a decision (or advocates for a decision) based on bad data from the analyst, they wind up taking bullets on your behalf.

Analysis QA is important!

With that lengthy preamble, below are my four strategies for QA’ing my own analysis work before it goes out the door.

1. Plausibility Check

Like it or not, most analyses don’t turn up wildly surprising and dramatic insights. When they do – or, when they appear to – my immediate reaction is one of deep suspicion.

My favorite anecdote on this front goes back almost a decade, to when a product marcom who had been digging into SEO popped his head into my cubicle one day and asked me if I'd seen "what he'd done." He'd been making minor — and appropriate — updates to his product line's main landing page to try to improve its SEO. When he looked at a traffic report for the page, he saw a sudden and dramatic increase in visits starting one day in the middle of the prior month. He immediately took a printout of the traffic chart and told everyone he could find — including the VP of marketing — that he'd achieved a massive and dramatic success by updating some metadata and page copy!

Of course…he hadn’t.

I dug into the data and pretty quickly found that a Gomez (uptime/load time monitoring software) user agent was the source of the increased traffic. It turned out that Gomez was pitching my company’s web admins, and they’d turned on a couple of monitors to have data to show to the people in the company to whom they were pitching. (The way their monitors worked, each check of the site recorded a new visit, and none of those monitors were filtered out as bots…until I discovered the issue and updated our bots configuration.)

In other words, “Doh!!!”

That’s a dramatic example, but, to adjust the “if it seems too good to be true…” axiom:

If the data looks too surprising or too counterintuitive to be true…it probably is!

Considering the plausibility of the results is not, in and of itself, actual QA, but it’s a way to get the hairs on your back standing up to help you focus on the other QA strategies!

2. Proofread

Proofreading is tedious in writing, and it’s not much less tedious in analytics. But, it’s valuable!

Here’s how I proofread my analyses for QA purposes:

I pull up each query and segment in the tool I created it in and literally walk back through what’s included.

I re-pull the data using those queries/segments and do a spot-check comparison with wherever I wound up putting the data to do the analysis.

I actually proofread the analysis report – no need to have poor grammar, typos, or inadvertently backwards labeling.

That’s really all there is to it for proofreading. It takes some conscious thought and focus, but it’s worth the effort.

3. Triangulation

This is one of my favorite – and most reliable – techniques. When it comes to digital data and the increasing flexibility of digital analytics platforms, there are almost always multiple ways to come at any given analysis. Some examples:

In Google Analytics, you looked at the Ecommerce tab in an events report to check the Ecommerce conversion rate for visits that fired a specific event. To check the data, build a quick segment for visits based on that event and check the overall Ecommerce conversion rate for that segment. It should be pretty close!

In SiteCatalyst, you have a prop and an eVar populated with the same value, and you are looking at products ordered by subrelating the eVar with Products and using Orders as the metric. For a few of the eVar values, build a Visit-container-based segment using the prop value and then look at the Products report. The numbers should be pretty close.

If you’ve used the eCommerce conversion rate for a certain timeframe in your analysis, pull the visits by day and the orders by day for that timeframe, add them both up, and divide to see if you get the same conversion rate.

Use flow visualization (Google Analytics) or pathing (SiteCatalyst) to compare results that you see in a funnel or fallout report – they won't match, but you should be able to easily explain why the steps differ when they do.

Pull up a clickmap to see what it reports when you’ve got a specific link tracked as an event (GA) or a custom link (SiteCatalyst).

If you have a specific internal link tracked as an event or custom link, compare the totals for that event to the value from the Previous Page report for the page it links to.

You get the idea. These are all web analytics examples, but the same approach applies for other types of digital analysis as well (if your Twitter analytics platform says there were 247 tweets yesterday that included a certain keyword, go to search.twitter.com, search for the term, and see how many tweets you get back).

Quite often, the initial triangulation will turn up wildly different results. That will force you to stop and think about why, which, most of the time, will result in you realizing why that wasn’t the primary way you chose to access the data. The more ass-backwards of a triangulation that you can come up with to get to a similar result, the more confidence you will have that your data is solid (and, when a business user decides to pull the data themselves to check your work and gets wildly different results, you may already be armed to explain exactly why…because that was your triangulation technique!).

4. Phone a friend

Granted, for this one, you have to tap into other resources. But, a fresh set of eyes is invaluable (there’s a reason that development teams generally split developers out from the QA team, and there’s a reason that even professional writers have an editor review their work).

When phoning a friend, you actually can request any or all of the three prior tips:

Ask them if the results you are seeing pass the “sniff test” – do they seem plausible?

Ask them to look at the actual segment or query definitions you used – get them to proofread your work.

Ask them to spot-check your work by trying to recreate the results – this may or may not be triangulation (even if they approach the question exactly as you did, they’re still checking your work).

To be clear, you’re not asking that they completely replicate your analysis. Rather, you’re handing them a proverbial napkin and asking them to quickly and messily put a pen to that napkin to see if anything emerges that calls your analysis into question.

This Is Not As Time-Consuming As It Sounds

I positively cringe when someone excitedly tells me that they “just looked at the data and saw something really interesting!”

If it’s a business user, I shake my head and gently probe for details (“Really? That’s interesting. Let me see if I’m seeing the same thing. How is it that you got this data?…”)

If it’s an analyst, I say a silent prayer that they really have found something really interesting that holds up as interesting under deeper scrutiny. The more surprising and powerful the result, the stronger I push for a deep breath and a second look.

So, obviously, there is a lot of judgment involved when it comes to determining the extent of QA to perform. The more complex the project, and the more surprising the results, the more time it’s worth investing in QA. The more you get used to doing QA, the earlier in the analysis you will be thinking about it (and doing it), and the less incremental time it takes.

Tiger Woods Is Batting .260 Lifetime

Tiger Woods won his 78th career PGA event on Sunday at The Players Championship. The commentators were tireless in their mentions of the fact that this was Woods's 300th PGA event start.

I’m a bad golfer and a worse baseball player, but I found myself wanting to combine the two sports by calculating Woods’s “batting average” for PGA tour events. This required two major definitional leaps:

An “at bat” was a tournament

A “hit” was a win

This is a whopper of a stretch, I realize, but stick with me, anyway.

The batting average math is now simply: with Woods’s win, his career batting average in tour events was 78/300, or .260! In baseball, a “good” hitter bats over .300. Of course, for my definitions to hold up, in real baseball, a player would only get credited with a hit if he hit a game-winning walkoff home run every time he got a hit!

This led me to wonder how Woods's batting average has trended over the course of his career. So, using data from Woods's profile on pgatour.com, I plotted it out (even though Woods was an amateur until 1996, the tournaments he played in before that still counted as PGA tour starts):

As the end of the chart shows, it does look like he is on his way back. Keep in mind that, like a real batting average, the fewer tournaments he'd played in, the more a win would increase his cumulative average and the more a non-win would drop it. That's one reason that, in baseball, there is more focus on the batting average for the season than on the career batting average.

So, that got me wondering how this tour season compares to Woods’s past seasons. The gray in the chart below shows his average as of the end of each season:

To date, this is his highest win percentage of any year other than 2008, which was severely shortened by a knee injury. In 2008, he won 4 out of 6 PGA events before his season ended. In 2013, he has won 4 out of 7 so far!

#eMetrics Reflection: Privacy Is Getting More Tangible

I'm chunking up my reflections on last month's eMetrics conference in San Francisco into several posts. I had a list of eight possible topics, and this is the fourth and (probably) final one that I'll actually get to.

I’ve attended the “privacy” session at a number of recent eMetrics, and the San Francisco one represented a big step forward in terms of specificity. “Privacy” seems to be a powerful word in the #measure industry — it’s a single word that seems to magically turn many people and companies into ostriches! It’s not that we want to avoid the topic, but there is so much complexity and uncertainty that putting our heads in the sand and kicking the can down the road (everyone loves a good mixed metaphor, right?) seems to be the default course of action.

In the session sardonically titled “Attend this Session or Pay €1 Million,” René Dechamps Otamendi of Mind Your Privacy covered European privacy regulations and Joanne McNabb of the California Department of Justice covered California and US privacy regulations.

When Pop Culture Picks It Up…

I was a West Wing fan, but had no memory of this clip that René shared:

When you’ve got mainstream network television referencing a topic, it’s a topic that is at least on the periphery of the mainstream.

“Fundamental Right” vs. “Business/Consumer Negotiation”

René pointed out that many Americans miss the point when it comes to the European privacy regulations — in typical America-centric fashion, we ignore history. We see privacy as a topic that is up for debate: how do we protect consumers with minimal regulation so that businesses can capitalize on as much personal data as possible?

In Europe…there was the Holocaust. René described how, in The Netherlands prior to WWII, the government maintained detailed and accurate records on every citizen. When the Nazis invaded, this data made it very easy for them to identify and persecute Jews. Of the 140,000 Jews who lived in The Netherlands prior to 1940, only 30,000 survived the war, and historians point to the availability of this data as one of the main reasons for this. Yikes! For many Europeans, this sort of history is both deeply embedded and strongly linked to the topic of personal and online privacy.

Thinking of privacy as an undisputed, fundamental right is somewhat eye-opening.

It Doesn’t Matter Where Your Company Is Based

This isn’t exactly news, but it seems to be one of the excuses marketers use for burying their heads in the sand: “We’re based in Ohio — not California or Europe. So, how much do we have to worry about privacy regulations there?”

The answer comes down to where your customers are. The European Directive, as well as California regulations, do not care where a company is based. They're focused on where the consumers interacting with those companies are. Pull up your visitor geography reports in your web analytics platform and look at where your traffic is coming from — anywhere that accounts for a non-negligible percentage of your traffic is likely somewhere whose privacy regulations you need to understand.

Why California instead of “the U.S.?”

Joanne pointed out that California is clearly in the forefront when it comes to developing, implementing, and enforcing privacy regulations in the U.S. The California Online Privacy Protection Act (CalOPPA) has been in effect since 2004 (although it was not widely understood for the first few years). That's closing in on a decade!

To me, this sounded a lot like fuel economy standards in the auto industry — California is a large enough market that businesses can’t afford to ignore the state’s residents. At the same time, other states, and the federal government (because the U.S. has a long — and checkered — history of using the states as laboratories for testing ideas), are watching California to see what they figure out. There is a very good chance that what works for California will be a basis for other states and for federal regulations.

Is California the Same As Europe?

Yes and no. They’re the same in that they have a similar orientation towards “individuals’ rights.” They’re the same in that they are increasingly starting to enforce their regulations (with very real fines levied on companies).

They’re different…in that the U.S. and Europe are different — both culturally and structurally.

They follow developments in each others’ worlds, but they’re not actively marching towards a single, unified regulation.

So, Where Should Companies Start?

Step 1: Check your privacy policy. Really. Read it. Read it for your country-specific sites (simply translating your U.S. privacy policy into German doesn’t work!). If you give it a really close read, are you even complying with what you say you are?

Step 2: Learn some details. For Europe, reach out to René at the email address in the image below. He’s got a document that explains the ins and outs of EU privacy regulations (if the number “27” doesn’t mean anything to you, you haven’t learned enough):

For California, one resource is the California Attorney General’s site for online privacy. Unfortunately, it is a bureaucratically built site, so be ready for some heavy document-wading.

Step 3: Educate your company. This one is no small task: when the presenters were asked who to include in that discussion, it seemed like a shorter answer would have come from asking who not to include. The web team, marketing, legal, and IT are a good start. The best hook is "We could be fined 1,000,000 euros…"

In Short: It’s Still Messy, but Things Are Getting Clearer

The heading says it all. “We” all need to take our heads out of the sand and get smarter on this. If a regulatory agency comes calling, the worst response is, “Tell me who you are again?” The best (but not currently possible) response is, “We’re totally compliant.” A good response is, “We’re working on it, here’s what we’ve done, and here’s our roadmap to do more.”

#eMetrics Reflection: Data Visualization (Still!) Matters

I'm chunking up my reflections on last week's eMetrics conference in San Francisco into several posts. I've got a list of eight possible topics, but I seriously doubt I'll manage to cover all of them.

On Tuesday, I attended Ian Lurie's presentation: "Data That Persuades: How to Prove Your Point." This session was a "fist pumper" for me, as Ian is as frustrated by crappy data visualization as I am (he led off the presentation by showing a mouth guard, sharing that he wears one at night because he grinds his teeth, and then noting that seeing data poorly presented is a big source of the stress driving that grinding!).

One of the ways Ian illustrated the importance of putting care into the way data gets presented was with this image:

I think it's fair to say this is a representation of the three types of memory:

The "lizard brain" represents iconic memory — the "visual sensory register." It's where preattentive cognitive processing occurs. If we don't put something forth that is clear and instantaneously perceptible, then the information won't get past the lizard brain.

In between sits short-term (working) memory: the limited buffer where only a handful of chunks of information can be held and compared at a time.

The "human brain" represents longer-term memory — where we actually need to digest the information and develop and implement a response.

Ian also spent a lot of time on Tufte’s data-ink ratio — imploring the audience to be heavily reductionist in the visualization of data by removing extraneous words, lines, tick marks, etc. so that “the data” really comes through.

#eMetrics Reflection: Self-Service Analysis in 2 Minutes or Less

I'm chunking up my reflections on last week's eMetrics conference in San Francisco into several posts. I've got a list of eight possible topics, but I seriously doubt I'll manage to cover all of them.

The closing keynote at eMetrics was Matt Wilson and Andrew Janis talking about how they’ve been evolving the role of digital (including social) analytics at General Mills.

Almost as a throwaway aside, Matt noted that one of the ways he has gone about increasing the use of their web analytics platform by internal users is with video:

He keeps a running list of common use cases (types of data requests)

He periodically makes 2-minute (or less) videos of how to complete these use cases

Specifically:

He uses Snagit Pro to do a video capture of his screen while he records a voiceover

If a video lasts more than 120 seconds, he scraps it and starts over

Outside of basic screen caps with annotations, the “video with a voiceover” is my favorite use of Snagit. When I need to “show several people what is happening,” it’s a lot more efficient than trying to find a time for everyone to jump into GoToMeeting or a Google Hangout. I just record my screen with my voiceover, push the resulting video to YouTube (in a non-public way — usually “anyone with the link” mode), and shoot off an email.

I've never tried this with analytics demos — as a way to efficiently build a catalog of accessible tutorials — but I suspect I'm going to start!

One of the first sessions I attended at last week’s eMetrics was Jim Novo’s session titled “The Evolution of an Attribution Resolution.” We’ll (maybe) get to the “attribution” piece in a separate post (because Jim turned on a light bulb for me there), but, for now, we’ll set that aside and focus on a sub-theme of his talk.

Later at the conference, Jennifer Veesenmeyer from Merkle hooked me up with a teaser copy of an upcoming book that she co-authored with others at Merkle called It Only Looks Like Magic: The Power of Big Data and Customer-Centric Digital Analytics. (It wasn’t like I got some sort of super-special hookup. They had a table set up in the exhibit hall and were handing copies out to anyone who was interested. But I still made Jennifer sign my copy!) Due to timing and (lack of) internet availability on one of the legs of my trip, I managed to read the book before landing back in Columbus.

A Long-Coming Shift Is About to Hit

We've been talking about being "customer-centric" for years. It seems like eons, really. But, almost always, when I've heard marketers bandy about the phrase, they mean, "We need to stop thinking about 'our campaigns' and 'our site' and 'our content' and, instead, start focusing on the customer's needs, interests, and experiences." That's all well and good. Lots of marketers still struggle to actually do this, but it's a good start.

What I took away from Jim’s points, the book, and a number of experiences with clients over the past couple of years is this:

Customer-centricity can be made much more tangible…and much more tactically applicable when it comes to effective and business-impacting analytics.

This post covers a lot of concepts that, I think, are all different sides of the same coin.

Visitors Trump Visits

Cross-session tracking matters. A visitor who did nothing of apparent importance on their first visit to the site may do nothing of apparent importance across multiple visits over multiple weeks or months. But…that doesn’t mean what they do and when they do it isn’t leading to something of high value to the company.

Caveat (defended) to that:

Does this mean visits are dead? No. Really, unless you're prepared to answer every new analytics question with, "I'll have an answer in 3-6 months once I see how visitors play out," you still need to look at intra-session results.

When I asked Jim about this, his response totally made sense. Paraphrasing heavily: “Answering a question with a visit-driven response is fine. But, if there’s a chance that things may play out differently from a visitor view, make sure you check back in later and see if your analysis still holds over the longer term.”

Cohort Analysis

Cohort analysis is nothing more than a visitor-based segment. Now, a crap-ton of marketers have been smoking the Lean Startup Hookah Pipe, and, in the feel-good haze that filled the room, have gotten pretty enamored with the concept. Many analysts, myself included, have asked, “Isn’t that just a cross-session segment?” But “cross-session segment” isn’t nearly as fun to say.

Here’s the deal with cohort analysis:

It is nothing more than an analysis based around segments that span multiple sessions

It’s a visitor-based concept

It’s something that we should be doing more (because it’s more customer-centric!)

The problem? Mainstream web analytics tools capture visitors cross-session, and they report cross-session "unique visitors," but this is only in aggregate. You can dig into Adobe Discover to get cross-session detail, or, I imagine, into Adobe Insight, but that is unsatisfactory. Google has been hinting that this is a fundamental pivot they're making — to get more foundationally visitor-based in their interface. But, Jim asked the same question many analysts are asking.

Having started using and recommending visitor-scope custom variables more and more often, I’m starting to salivate at the prospect of “visitor” criteria coming to GA segments!

Surely, You’ve Heard of “Customer Lifetime Value?”

“Customer Lifetime Value” is another topic that gets tossed around with reckless abandon. Successful retailers, actually, have tackled the data challenges behind this for years. Both Jim and the Merkle book brought the concept back to the forefront of my brain.

It's part and parcel of everything else in this post: getting beyond, "What value did you (the customer) deliver to me today?" to "What value have you (or will you) deliver to me over the entire duration of our relationship?" (with an eye to the time value of money so that we're not just "hoping for a payoff wayyyy down the road" and congratulating ourselves on a win every time we get an eyeball).

Digital data is actually becoming more “lifetime-capable:”

Web traffic — web analytics platforms are evolving to be more visitor-based than visit-based, enabling cross-session tracking and analysis

Social media — we may not know much about a user (see the next section), but, on Twitter, we can watch a username’s activity over time, and even the most locked down Facebook account still exposes a Facebook ID (and, I think, a name)…which also allows tracking (available/public) behavior over time

Mobile — mobile devices have a fixed ID. There are privacy concerns (and regulations) with using this to actually track a user over time, but the data is there. So, with appropriate permissions, the trick is just handling the handoff when a user replaces their device

Intriguing, no?

And…Finally…Customer Data Integration

Another "something old is new again" is customer data integration — the "customer" angle of the world of Master Data Management. In the Merkle book, the authors pointed out that the elusive "master key" that is the Achilles heel of many customer data integration efforts is getting both easier and more complicated to work around.

One obvious-once-I-read-it concept was that there are fundamentally two different classes of “user IDs:”

A strong identifier is “specifically identifiable to a customer and is easily available for matching within the marketing database.”

A weak identifier is “critical in linking online activity to the same user, although they cannot be used to directly identify the user.”

Cookie IDs are a great example of a weak identifier. As is a Twitter username. And a Facebook user ID.

The idea here is that a sophisticated map of IDs — strong identifiers augmented with a slew of weak identifiers — starts to get us to a much richer view of “the customer.” It holds the promise of enabling us to be more customer-centric. As an example:

An email or marketing automation system has a strong identifier for each user

Those platforms can attach a subscriber ID to every link back to the site in the emails they send

That subscriber ID can be picked up by the web analytics platform (as a weak identifier) and linked to the visitor ID (cookie-based — also a weak identifier)

Now, you have the ability to link the email database to on-site visitor behavior

This example is not a new concept by any means. But, in my experience, the way each of the platforms involved in a scenario like this has preferred to work is that they set their own strong and weak identifiers. What I took away from the Merkle book is that we’re getting a lot closer to being able to have those identifiers flow between systems.

Again…privacy concerns cannot be ignored. They have to be faced head on, and permission has to be granted where permission would be expected.

Lotta’ Buzzwords…All the Same Thing?

Nothing in this post is really "new." None of it is even "new to me." The dots I hadn't connected: these are all largely the same thing.