
The Site Topology analytics techniques that we’ve been developing are, I believe, foundational to site analytics. The concept behind these techniques is really pretty simple. Traditional statistical analysis of site behaviors is largely defeated by the impact of site structure. You can’t measure the correlation between content views and site outcomes because the correlation is nearly always a product of (or overwhelmingly influenced by) the structure of your site. That’s why so much basic statistical analysis of site behavior is banal and uninteresting. Fortunately, with fairly simple algorithmic techniques, it’s possible to create a topology that maps the Website structure. This topology is useful in-and-of-itself, but it also enables a wide range of statistical analysis methods on site behavior.
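
To make the idea concrete, here’s a minimal sketch (my own illustration, not the specific algorithm from the webinar or the book chapter) of one way a site topology can be derived algorithmically from behavioral data: count page-to-page transitions and keep the strong ones as structural links.

```python
from collections import Counter

def build_topology(visits, min_share=0.05):
    """visits: iterable of page-name sequences (one per visit).
    Returns {page: set(next pages)}, keeping only transitions that account
    for at least `min_share` of all exits from the source page."""
    transitions, exits = Counter(), Counter()
    for pages in visits:
        for a, b in zip(pages, pages[1:]):
            if a != b:
                transitions[(a, b)] += 1
                exits[a] += 1
    topology = {}
    for (a, b), n in transitions.items():
        if n / exits[a] >= min_share:
            topology.setdefault(a, set()).add(b)
    return topology

# Hypothetical example data: three visits to a small site.
sample_visits = [
    ["/home", "/products", "/products/widget"],
    ["/home", "/support", "/support/faq"],
    ["/home", "/products", "/checkout"],
]
print(build_topology(sample_visits))
# e.g. {'/home': {'/products', '/support'}, '/products': {'/checkout', '/products/widget'}, ...}
```

Any method that yields a defensible page-to-page adjacency map will do; the point is that the map becomes an objective artifact you can then analyze against.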

I spoke on Site Topology techniques at the DAA Symposium and
then again with Barry Parshall (iJento) and Kelly Wortham (Dell) in my webinar
last week. I don’t think you’ll ever see a webinar with more content. If you
didn’t get a chance to check it out, I strongly recommend it. Believe me,
there’s ZERO fluff in this hour.

I first elaborated the Site Topology methods in a chapter I
wrote for inclusion in a book that Ralf Haberich is writing. As a follow-up to
that, I did a podcast with Ralf that just got released. It covers some background of both Semphonic and me (that most readers of this blog probably know), but it also provides some discussion of Web analytics futures and, in the section I think is most interesting, a discussion of these topology techniques.

And don’t miss the next chapter of Webinarmeggedon Tuesday
morning with a look at the state of enterprise reporting and some in-depth
discussion of why barometers are better than thermometers and how, when it
comes to digital measurement, you can start building barometers.

Webinarmeggedon Update

A while back I posted on the "Perfect Storm" of Whitepapers Semphonic is releasing. Over the next month, that’s resulted in a Perfect Storm of Webinars. So I’ve just put together a handy little list with links to everything…enjoy!

I spent most of this past week on the road – first in
Indianapolis for ExactTarget’s Connections 2012 Conference and then in
Philadelphia for the annual DAA Symposium. Unless I’m missing something on my
calendar, it’s pretty much the end (for me at least) of the fall Conference
season.

It was my first time at ExactTarget’s Connections and I was
surprised at its scope, scale and ambition. It seems about as big as the NEW,
BIG Adobe Summit – I think around 4,000 people attend. It’s parked in a city
chosen, obviously, because of the parent company’s location – not to cater to a
traditional convention audience. But it makes up for that with a whole lot of
fun stuff and glitz. Michael J. Fox and David Blaine were both keynoting –
which is pretty cool – and it seemed like the parties were pretty extensive. I have to go by word of mouth here since there’s pretty much no chance of me ever going to a party larger than two dozen without a compelling reason (like my wife insisting), but they sure sounded pretty epic. Indy may not have Vegas beat for party opportunities, but I have to believe (though maybe this is just my native Hoosier pride showing) that it beats Salt Lake City hands-down.

For me, it was pretty nice to be at a Conference that’s not measurement focused. The attendees at ExactTarget's Connections Conference are almost all trying to build real programs, with analytics just another tool to help out. It’s kind of refreshing to have an audience whose only interest in measurement is what it can accomplish, and I thought it made for a more content-rich vendor Conference than I’ve come to expect.

Segmentation for Targeted Marketing

There’s a great deal of buzz these days about integrating
Web analytics data with eMail Marketing Systems to do better targeting. There’s
nothing wrong with that; it’s a great idea. But most email marketers struggle
to know what to do with a Web analytics data feed. To simplify their lives, they
tend to adopt one of two targeting strategies with the data: funnel
re-marketing or last-product viewed targeting. There’s nothing wrong with
either of these strategies. Both can actually work pretty well.

However, they both suffer from the same problem. With funnel
re-marketing and last-product viewed targeting strategies, the quality of your
targeting ages remarkably quickly. It’s great to market to funnel fall-out on a next-day basis. It’s not so useful a month later. The same thing is
true of product view information. The utility of this data for targeting ages
very rapidly. That makes this type of targeting ideal for trigger campaigns but
almost useless for personalizing full-list, regular drops.

With broader behavioral techniques like Two-Tiered Segmentation, you get a much richer and more comprehensive segmentation. Not only does this allow you to personalize messaging to segments that lack recent product-view or funnel-abandonment information, it allows you to keep personalizing effectively on a continuing basis over a considerable period of time.

I showed several examples of visit-based segmentations along
with case-studies on how they were used to target out-bound email including
this example from the travel industry:

The grid on the left is the behavioral segmentation. One
segment (Snake-Eyes – which happened to be very Las Vegas focused) is detailed
in the upper right. These propensities show the index-weighted likelihood of
visitors in this segment having the characteristic. So, for example, the Snake
Eyes cluster is highly likely to purchase air-travel (Plane tickets) but quite unlikely to
rent a car.

By coding offers along the same propensity dimensions,
it’s fairly easy to build rules matching segment to offer even when there is a
large set of possible offers.
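
Here’s a toy sketch of what that rule-building can look like. The segments, propensity indexes, and offer codings below are invented for illustration (indexes read the usual way: 100 = site average, 200 = twice as likely); the idea is simply that once offers are coded on the same dimensions as the segment propensities, picking an offer for a segment reduces to a scoring rule.

```python
# Hypothetical segment propensities (index-weighted, 100 = average).
segments = {
    "Snake-Eyes":    {"air_travel": 230, "hotel": 150, "car_rental": 40, "cruise": 60},
    "Road-Trippers": {"air_travel": 55,  "hotel": 120, "car_rental": 210, "cruise": 30},
}

# Hypothetical offers coded on the same propensity dimensions (0-1 weights).
offers = {
    "flight_sale":    {"air_travel": 1.0, "hotel": 0.3},
    "rental_weekend": {"car_rental": 1.0, "hotel": 0.5},
    "cruise_promo":   {"cruise": 1.0},
}

def best_offer(segment_propensities, offers):
    """Score each offer by summing (index - 100) over the dimensions it's
    coded on, weighted by how strongly the offer expresses that dimension."""
    def score(coding):
        return sum(weight * (segment_propensities.get(dim, 100) - 100)
                   for dim, weight in coding.items())
    return max(offers, key=lambda name: score(offers[name]))

for seg, props in segments.items():
    print(seg, "->", best_offer(props, offers))
# Snake-Eyes -> flight_sale, Road-Trippers -> rental_weekend
```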

This type of visit-based behavioral segmentation is deeper,
more robust, and has a much slower decay cycle than simple remarketing.

If you’d like a copy of the complete presentation, just drop
me a line.

Website Topology for Advanced Analysis

At the DAA Symposium in Philadelphia (A terrific program, by
the way, and if you weren’t there I STRONGLY SUGGEST you check out Dr. Peter
Fader’s truly excellent presentation on a methodology for optimizing allocation
decisions within a test. Another fantastic example of the work the Wharton
Customer Analytics Initiative is producing.), I presented a short (20-minute) talk called
“The Taming of the Shrew – The Marriage of Web analytics and Statistical
Analysis”.

Have you ever
wondered why statistical analysis
techniques aren’t more common in classic site analytics? It’s something I’ve
been thinking about quite a bit lately. Lots of folks tend to assume that it’s
a problem with Web analysts and our tools. And it’s true that Web analytics
tools don’t have built-in statistical capabilities and that very few Web
analysts have any statistical training.

Still, I think it begs the question. Because if statistical
analysis techniques worked well when applied to site analytics, I think our
tools and practitioners would reflect that.

What’s more, I find that when you do give Web behavioral
data to statisticians, they often come back with the most banal analyses
imaginable.

The problem, I believe, is that Web sites have a strong
built-in structure. This structure has a profound influence on navigation
patterns and tends to swamp other correlations.

There are, fortunately, techniques for mapping Website
structure so that you can create objective measures of distance between pages
and control for structure when doing statistical analysis.
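
As a rough illustration of what I mean (an assumed, simplified approach, not the full methodology from the talk), you can compute each page’s structural distance from an outcome page over a topology graph like the one sketched earlier, and then make behavioral comparisons only within a distance band so that structure isn’t doing all the work:

```python
from collections import deque

def distances_from(topology, target):
    """Shortest-path distance (in clicks) from every page to `target`,
    following links in the direction they are navigated.
    topology: {page: set(next pages)}."""
    # Reverse the graph so we can BFS backwards from the target page.
    reverse = {}
    for page, neighbors in topology.items():
        for n in neighbors:
            reverse.setdefault(n, set()).add(page)
    dist, queue = {target: 0}, deque([target])
    while queue:
        page = queue.popleft()
        for prev in reverse.get(page, ()):
            if prev not in dist:
                dist[prev] = dist[page] + 1
                queue.append(prev)
    return dist

def compare_within_distance(page_stats, dist):
    """Group pages by structural distance before comparing view->outcome
    rates, so that pages one click from conversion aren't compared directly
    against pages five clicks away."""
    bands = {}
    for page, stats in page_stats.items():
        bands.setdefault(dist.get(page), []).append((page, stats))
    return bands
```

The banding step is the crude version of the idea; more formally, the distance measure can be carried into the statistics as a covariate.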

I have a webinar this Thursday with Barry Parshall of iJento
and Kelly Wortham of Dell in which I’m going to be showing much of the same
material. I’m going to cover the underlying problem and the techniques we’ve
been developing to do topological analysis. Barry is going to show how you can
extend those techniques to do deep content attribution (particularly if you
have a platform like iJento), and Kelly is going to show how some of the same
intellectual concepts have driven successful testing paradigms at Dell.

IBM and Semphonic just partnered on a new Whitepaper
tackling one of the hottest and most challenging topics in digital analytics –
choosing the right big data technology stack. I finished it a couple of weeks back and it’s now gone into
general release. In addition, I’m going to be doing a webinar about it with IBM’s CTO of
Big Data Solutions, Krishnan Parasuraman.

I’m very excited about both.

In the Whitepaper, I got to combine some of the big themes
that have been emerging in our practice: the unique challenges of digital
analytics for traditional statistical and database methods, the impact of those challenges on the selection
of a technology stack, and the best ways to structure a digital analytics
technology initiative to address the issues and build an effective digital big
data solution.

Over the last twelve months, Semphonic has been incredibly
active in this area. We never used to focus that much on strategic
measurement engagements. But the confluence of Big Data and Digital Analytics
has changed that. With our extensive background in database marketing, we’re
comfortable (indeed, eager) to get our hands on the detailed customer data and
the database, BI, and statistical tools that support that deep access. We’ve
had fifteen hard years trying to figure out how to measure, segment, and use
digital data effectively. We’ve also seen first-hand how easy it is to break
traditional technology stacks with digital data, having done it repeatedly! That combination of big data technology and digital measurement chops is pretty unique, and I think that’s why we’ve been getting asked so
often to help large enterprises craft a strategy that blends these elements
effectively.

In the Whitepaper, I've tried to distill that experience down
into a useful framework for thinking about digital marketing analytics
in a big data world.

So Just What is Big Data?

The Whitepaper starts with a pretty deep discussion of the
challenges of digital and why digital is a paradigm case of big data. I know
people are already starting to hate the term big data, and I don’t really blame
them. In the broader market, it doesn’t have a specific meaning. It’s lots of data. We get
that. But how much data is big data? And
why does having lots of data really change anything?

I try to tackle this definitional morass in the Whitepaper.
At Semphonic we’ve come to have a pretty specific view about what big data
means and why it really is somewhat different – not just “more rows than
normal.” We believe that big data is really about a drive to “detail” data and
to algorithmic analytics techniques that don’t work off of aggregates. Yes, volume does
count. But big data isn’t just big, it’s big because we’ve shifted the level of
analysis.

This shift to detail-level analysis has a much bigger impact
than you might suppose. From a technology standpoint, it does drive more row
volume. But from an analysis perspective, it makes many traditional BI
techniques (that depend on cube-based aggregates) impossible or irrelevant.

In digital, it has even deeper implications. Which brings me
to the part of the Whitepaper that I think is the most interesting and
important.

The Challenge of Stream Data

You’ll often hear digital data described as “unstructured.”
I think that’s wrong (at least in part). Yes, social media data is truly
unstructured. But analytics data collected from the Web and Mobile channels is
certainly structured. The SiteCatalyst data-feed (our most common source of
this information) is just a classic, big, comma-delimited flat file with 400 or
so fields per row. Structure!

In fact, almost every digital data source except social is
structured data.

So why this persistent description of digital data as
unstructured?

Well, digital data does drive IT folks and data architects
crazy. But it’s not the lack of structure that does it, it’s the level of
meaning.

In most digital data, there’s no meaning inherent in a single detailed row. The server call (or page view) is not, on its own, the unit of analysis. Worse, digital data doesn’t aggregate cleanly. Adding up server calls to create page view counts or time on site isn’t, in most cases, the path to meaning. Meaning comes from interpreting a stream of server calls (on the Web, this is a Visit or Path). So digital data is (mostly) semi-structured. Each row is structured just fine, but getting to anything interesting requires interpretation (effectively the addition of structure).
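
To make the point concrete, here’s a minimal sketch of that interpretive step – sessionizing raw server calls into Visits. The field names and the 30-minute timeout are generic assumptions for illustration, not the actual SiteCatalyst data-feed layout:

```python
from datetime import timedelta

SESSION_TIMEOUT = timedelta(minutes=30)

def sessionize(hits):
    """hits: list of dicts with 'visitor_id', 'timestamp' (datetime), 'page',
    sorted by visitor and time. Returns a list of visits, each a list of hits."""
    visits, current = [], []
    last_visitor, last_time = None, None
    for hit in hits:
        new_visit = (hit["visitor_id"] != last_visitor or
                     last_time is None or
                     hit["timestamp"] - last_time > SESSION_TIMEOUT)
        if new_visit and current:
            visits.append(current)
            current = []
        current.append(hit)
        last_visitor, last_time = hit["visitor_id"], hit["timestamp"]
    if current:
        visits.append(current)
    return visits

# Only after this interpretive step can you ask visit-level questions like
# "which paths ended in checkout?" -- a question no single row can answer.
```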

Why is this important?

The vast majority of ETL, query and statistical analysis
techniques have been built to operate on individual rows. That doesn’t work in
digital. In digital, meaning exists only in the combination of multiple rows
(paths) and that combination isn't a straightforward aggregation.

Stream data creates a second big problem: it defeats classic join strategies. One-to-One and One-to-Many joins are almost the only types of joins ever used in classic database work. With streams, you get Many-to-Many joins. Many-to-Many joins don’t work well.
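
A toy example (mine, not the Whitepaper’s) of what goes wrong: join raw Web hits to raw email events on a shared visitor key and the rows multiply instead of lining up.

```python
web_hits = [  # one row per server call (hypothetical data)
    {"visitor_id": "v1", "page": "/home"},
    {"visitor_id": "v1", "page": "/products"},
    {"visitor_id": "v1", "page": "/checkout"},
]
email_events = [  # one row per email event (hypothetical data)
    {"visitor_id": "v1", "campaign": "spring_sale", "event": "open"},
    {"visitor_id": "v1", "campaign": "loyalty", "event": "open"},
]

# Naive join on visitor_id: 3 hits x 2 email events = 6 rows for one visitor,
# none of which is the real unit of analysis.
naive = [dict(h, **e) for h in web_hits for e in email_events
         if h["visitor_id"] == e["visitor_id"]]
print(len(naive))  # 6

# One common fix: collapse the stream side to a meaningful unit first (one row
# per visit, as in the sessionization sketch above), so the join becomes
# one-to-many again.
```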

We’ve seen a number of cases where our clients dump digital
data streams into a warehouse, find join keys, and think they are done. In a
traditional world, putting two data sources on the same box with a join key
makes it easy for an analyst to put them together. In a stream world, it doesn’t quite solve
the problem.

In the Whitepaper, I take a real deep-dive into this topic
because I think it is, quite simply, the key to understanding the challenge of
digital big data warehousing.

Translating Problems into Solutions

It’s nice to have a good definition of big data. It’s
certainly interesting to know why digital data is such a challenge. But how
does that knowledge translate into a useful framework for moving forward?

Well, that’s the third part of the Whitepaper. Because once
you understand some of the unique challenges of big data analysis and digital,
you can start to map different applications of digital to specific attributes
of different technology stacks.

In the Whitepaper, I look at a whole set of different
decision factors (from handling very large row counts, to supporting
algorithmic queries, to real-time analytics, to the availability of expertise) and
match them to another set of digital marketing use-cases (things like email
Targeting, Personalization, Customer Analytics and Attribution).

Not every digital marketing application has the same
requirements or puts the same stress on the technology decision factors. So if you know what
types of digital marketing applications you have, the Whitepaper gives you a
great framework for evaluating what types of technology capabilities you need.
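
As a crude sketch of how such a framework can be operationalized (the factors, use-cases, and scores below are invented for illustration, not the Whitepaper’s actual matrix), you can score each use-case’s requirements against what a candidate stack delivers and flag the gaps:

```python
use_case_requirements = {  # 0 = unimportant, 3 = critical (hypothetical values)
    "email_targeting": {"row_volume": 2, "algorithmic_queries": 2, "real_time": 1, "expertise_available": 2},
    "personalization": {"row_volume": 2, "algorithmic_queries": 2, "real_time": 3, "expertise_available": 1},
    "attribution":     {"row_volume": 3, "algorithmic_queries": 3, "real_time": 0, "expertise_available": 2},
}

candidate_stack = {  # how well a given stack handles each factor, 0-3 (hypothetical)
    "row_volume": 3, "algorithmic_queries": 1, "real_time": 2, "expertise_available": 2,
}

def gaps(requirements, stack):
    """Return factors where the use-case needs more than the stack delivers."""
    return {f: (need, stack.get(f, 0)) for f, need in requirements.items()
            if need > stack.get(f, 0)}

for use_case, reqs in use_case_requirements.items():
    print(use_case, "gaps:", gaps(reqs, candidate_stack) or "none")
```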

If you’re at the point a lot of our clients are, you know that the range of new technologies and big data capabilities, while welcome, makes choosing the right approach harder, not easier. There can simply be too many choices. Without a way to think about which trade-offs are appropriate (and believe me, EVERY technology has trade-offs), making a decision can feel random.

Yes, IBM has put together a pretty comprehensive big data
solution set. It will probably be on just about any enterprise short-list for
big data. But our (both Semphonic and IBM’s) goal in this Whitepaper wasn’t to
evaluate the IBM solution or even to talk much about it. It was to lay out a
way for ANY organization evaluating ANY big data technology stack to think more
clearly about what’s needed and why.

Something I’ve learned from years of attending Conferences on Web analytics is that if there’s one topic you won’t hear much about, it’s how to do analysis. That isn’t necessarily a knock on Conferences –
it’s a pretty hard topic to handle in a
presentation. Kind of like going to a Conference on baseball – you probably
won’t see a presentation on how to swing a bat! But in X Change’s
conversational format, you can tackle some topics that wouldn’t necessarily
work in a more formal presentation.

In “Getting the Data to Tell Its Secrets”, I was filling in
for Matt Fryer who ran a very successful session on this topic last year. My
plan for the session was fairly straightforward – I wanted the group to work through the major steps
in analysis starting with how analysts choose topics for research and
the extent to which open-ended data exploration is possible and effective.
Next, I wanted to see how and whether you can regularize the process of analysis. Are there methods for making analysts more productive in exploring data and answering questions? I also wanted to discuss exactly how analysis works – what are the core techniques we use when we actually deep-dive into a problem? I knew this part of the discussion might be problematic, but I think
it’s critically important to reflect on. Next, I wanted to talk about the
process of putting an analysis together and presenting it to stakeholders.
Socialization is almost as important as actual analysis in the effective use of
information. Finally, I hoped to talk about what to do when an analysis goes
right and you find something big and, conversely, how to handle it when you
screw up an analysis. If you’ve never messed up an analysis, pulled the wrong
data, used the wrong math, or drawn the wrong conclusion it can only be because
you’ve never done an analysis. Mistakes are inevitable, so handling them right
is always on the agenda.

Any researcher will tell you that the single most important
step in the process is deciding what question to answer. How do you know where
to start?

Not surprisingly, our group at X Change had a wide array of answers.
Some analysts seem to prefer a customer-driven approach. Start with Voice of
Customer (VoC) data – whether from feedback systems like OpinionLab or online
survey solutions. Look for places where customers express dissatisfaction or
give the site poor ratings and then focus analysis on broader behaviors to see
if the problems are widespread or unique. I’ve certainly seen this technique
used very effectively and I’m confident that it’s a part of any complete answer
on how to steer a research programme. I’m also fairly certain that it’s not the
whole enchilada. There are too many critical business questions that simply
aren’t going to be surfaced by VoC. At a fundamental level, customers don’t care whether your Website is accomplishing your business goals. They don’t care
if you’re persuading them. They don’t care if you’re strengthening your brand.
They don’t care if you’re maximizing your AOS or putting the right mix of
merchandising levers on a page. It’s critical to find ways to surface research
questions that are business driven not just customer driven.

Another common driver of analysis is variation in reporting.
Since reporting mostly focuses on business drivers (traffic, conversion,
margin, revenue), this seems, intuitively, like a nice counterpart to
customer-driven research. When an important business metric changes
significantly, it almost always requires an analysis to figure out why. So reporting is a natural and important table-setter for analysis.

Even combining these two methods, however, I think you’re
well short of a complete research programme. Both customer and report based
methods are event-driven. They’re great methods for surfacing unforeseen and
important questions. What they won’t necessarily do is find and focus attention
on big business problems that aren’t captured by key metrics or that exist
within the current status quo.

One of the unique pieces we like to create when we do
digital strategy engagements at Semphonic is a data science roadmap. The idea is to identify key analysis opportunities and turn them into a formal plan for the analytics team. To create the plan, we don’t use ANY event-driven techniques.
Instead, the plan evolves in three stages.

First, we create a high-level model of the client’s
business. This isn’t some fancy predictive model or anything – it’s just a
conceptual model of how the business works. In the second step, we identify any
gaps in understanding that make it hard for the business to measure success or
optimize the process. These gaps become target areas for analysis. In the third
step, we match these target areas to an existing library of internal analysis
methods to see if we have practical methods for answering the research problem
and prioritize methods based on potential business impact.

I’d describe this approach to creating a research programme
as methodological. It forces the analyst to start with the business and guides
research toward areas that are of highest value. It’s a great way to map out a
high-level programme. It also ties in very well with our approach to reporting.
When building segmented reporting systems, we step clients through a formal
process of audience identification, visit intent segmentation, matching
business goals to visit types, and then defining success metrics for each
business goal with respect to the audience and visit type. Sometimes, there are
straightforward metrics for making that tie. Often, there aren’t. If you decide
that the measure of success for a Facebook Campaign is the percentage of
“engaged” Facebook Fans you acquire within your target audience, you’re faced
with a research problem – how do you measure engagement and audience? If you’re
doing your report definition right, you should expect additions to your
research programme.

Of course, the methodological approach has its own limitations.
If you build a research programme but don’t pay attention to customer and
report driven questions, you’ve lost a critical element of responsiveness.
These first two methods are much better at surfacing NEW and unforeseeable
business problems.

Wrapping up this part of the Huddle, we explored the role of
data visualization and unguided exploration in generating research questions.
Can an analyst really just sit down with a tool and explore the data to find
interesting points or important problems?

I’ll admit to deep skepticism on this point. It’s never
really worked for me. However, there was a pretty strong consensus in our
Huddle that, in fact, it does work for some analysts. Given the right tool
(something like Spotfire or Tableau seemed to be the consensus), unguided data exploration was forcefully supported as a valid method of generating interesting avenues of exploration.

I’m going to be a recidivist here and claim (with no real
evidence) that what’s probably going on here is sub-conscious application by an
experienced analyst of something like our methodological approach.

Whatever you believe about that, what’s more important from
a practical standpoint is that unguided exploration – with the right tool – can
and does work. Perhaps my attitudes here are colored by the fact that we’re a
consulting company. Nobody pays us (rightly in my view) to do unguided
exploration of the data. As an enterprise, however, it probably makes sense to
create enough space in your research programme to allow your analysts some
unguided time to explore the data.

Which brings us to the next stage in getting the data to
tell its secrets – actually doing an analysis. In my next post, I’m going to digress
briefly to introduce a new Whitepaper that Semphonic created for IBM on
Choosing a Big Data Technology Stack. This is one of the hottest topics in our
practice and it’s a piece I’m very happy with. Then I’ll return with more on
what I think was a fascinating discussion of how to tackle an analysis project.

[And since I'm on the topic of Web analytics methodologies, if you're in the Philadelphia area, please join me at the DAA Philadelphia Symposium on October 18th. My topic is on the application of Statistical Analysis Techniques to Digital Analytics and why it's much more challenging than people think. This is the subject of yet another upcoming Whitepaper and I think it's groundbreaking work. If you're in the Midwest (I'm an Indiana boy), I'm also going to be in Indianapolis for ExactTarget's Connections 2012 Conference on the 17th. Michael J. Fox and David Blaine are both Keynoting there - how cool is that!]