
In my last post, I described four huge deficiencies in the current generation of in-store tracking solutions. The inability to track full customer journeys, do real segmentation, or properly contextualize data to the store makes life very hard for a retail analyst trying to do interesting work. And over-reliance on non-analytic heatmaps – a tool that looks nice but is analytically unrewarding – just makes everything worse.

Of course, you don’t need to use one of these solutions. You can build an analytics warehouse and use some combination of extraordinarily powerful general purpose tools like Tableau, Datameer, Watson, and R to solve your problems.

Or can you?

Here are three more problems endemic to the current generation of in-store tracking solutions that limit your ability to integrate them into a broader analytics program.

Too Much or Too Little Associate Data

In retail, the human factor is often a critical part of the customer journey. As such, it needs to be measured. In-store counting solutions have tended toward two bad extremes when it comes to Associate data. Really, really bad solutions have just tracked Associates as customers. That’s a disaster. In the online world, we worked to screen out the IP addresses of employees from our web site counts even though they were a tiny fraction of the overall measurement total. In the store world, it’s not a tiny fraction – especially given the flaws of zone-counting solutions. We’ve seen cases where a small number of associates can look like hundreds of customers. So including associate data in store customer counts is pretty much a guarantee that your data will be garbage.

On the other hand, tracking associates just so you can throw their data away isn’t the right answer either. Those interactions are important – and they are important at the journey level. Solutions that throw this data away or aggregate it up to levels like hour or day counts are missing the point. Your solution needs to be able to identify which visits had interactions, which didn’t, and which were successful. If it can’t do that, it’s not going to solve any real-world problems.
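To make that concrete, here’s a minimal sketch of what answering “which visits had an interaction” might look like once you have event-level position data. The file name, the column names (visit_id, visitor_type, zone, ts), and the crude “same zone within two minutes” rule are all illustrative assumptions, not any vendor’s actual schema or logic.

```python
# Sketch: flag which customer visits included a likely associate interaction,
# given a (hypothetical) event-level feed of positioning pings.
import pandas as pd

events = pd.read_csv("store_positions.csv", parse_dates=["ts"])

customers = events[events.visitor_type == "customer"]
associates = events[events.visitor_type == "associate"]

# Pair customer and associate pings in the same zone, then keep pairs that
# occurred within two minutes of each other (a naive stand-in for "interaction").
paired = customers.merge(associates, on="zone", suffixes=("_cust", "_assoc"))
paired = paired[(paired.ts_cust - paired.ts_assoc).abs() <= pd.Timedelta(minutes=2)]

# Roll up to the visit level: which visits had at least one interaction?
visits = customers.groupby("visit_id").size().rename("pings").reset_index()
visits["had_interaction"] = visits.visit_id.isin(paired.visit_id_cust.unique())
print(visits.had_interaction.value_counts())
```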

Which brings me to…

Lack of Bespoke Analytics

One of the obvious truths about analytics in the modern world is that no bespoke analytics solution is going to deliver everything you need. Even mature, enterprise solutions like Adobe Analytics don’t deliver all of the visualization and analytics you need. What bespoke analytics tools should deliver is analytics uniquely contextualized to the business problem. This business contextualization is hard to get out of general purpose tools, so it’s the real lifeblood of industry- and application-targeted solutions. If a solution doesn’t deliver this, it’s ripe for replacement by general purpose analytic platforms. But by going exclusively to general purpose solutions, the organization will lose the shorter time to value that targeted analytics can provide.

Unfortunately, the vast majority of in-store customer tracking tools seem to deliver the sort of generic reports and charts that you might expect from an offshore outfit doing $10/hour Tableau reports. The whole point of bespoke solutions is to deliver analytics contextualized to the problem. If they are just doing a bad job of replicating general purpose OLAP tools, you have to ask why you wouldn’t just pipe the data into an analytic warehouse.

Which brings me to my final point…

Lack of a True Event Level Data Feed

No matter how good your bespoke analytics solution is, it won’t solve every problem. It isn’t going to visualize data better than Tableau. It won’t be as cognitive as Watson. Or as good a platform for integration as Datameer. And its analytics capabilities are not going to equal SAS or R. Part of being a good analytics solution in today’s world is recognizing that custom-fit solutions need to integrate into a broader data science world. For in-store customer journey tracking, this is especially important because the solution and the data collection mechanism are often bound together (much as they are in most digital analytics). So if your solution doesn’t open up the data, you CAN’T use that data in other tools.

That should be a deal killer. Any tool that doesn’t provide a true, event level data feed (not aggregated report-level data which is useless in most of those other solutions) to your analytics warehouse doesn’t deserve to be on an enterprise short-list of customer journey tracking tools.

Open integration and enterprise data ownership should be table stakes in today’s world.

Summing it Up

There’s a lot not to like about the current generation of in-store customer journey solutions. For the most part, they haven’t delivered the capabilities necessary to solve real-world problems in retail. They lack adequate journey tracking, real segmentation, proper store contextualization, bespoke analytics, and open data feeds. These are the capabilities essential to solving real-world problems. Not surprisingly, the widespread perception among those who’ve tried these solutions is that they simply don’t add much value.

For us at Digital Mortar, the challenge isn’t being better than these solutions. That’s not how we’re measuring ourselves, because being better isn’t enough. We have to be good enough to drive real-world improvement.

That’s much harder.

In my next post(s), I’ll show how we’ve engineered our new platform, DM1, to include these capabilities and how that, in turn, can help drive real-world improvement.

One of my last speaking gigs of the spring season was, for me, both the least typical and one of the most interesting. Space 2.0 was a brief glimpse into a world that is both exotic and fascinating. It’s a gathering of high-tech, high-science companies driving commercialization of space.

Great stuff, but what the heck did they want with me?

Well, one of the many new frontiers in the space industry is the commercialization of geo-spatial data. For years now, the primary consumer of satellite data has been the government. But the uses for satellite imagery are hardly limited to intel and defense. For the array of Space startups and aggressive tech companies, intel and defense are relatively mature markets – slow moving and difficult to crack if you’re not an established player. You ever tried selling to the government? It’s not easy.

So the big opportunity is finding ways to open up the information potential in geo-spatial data and satellite imagery to the commercial marketplace. Now I may not know HyperSpectral from IR but I do see a lot of the challenges that companies face both provisioning and using big data. So I guess I was their doom-and-gloom guy – in my usual role of explaining why everything always turns out to be harder than we expect when it comes to using or selling big data.

For me, though, attending Space 2.0 was more about learning than educating. I’ve never had an opportunity to really delve into this kind of data, and hearing (and seeing) some of what is available is fascinating.

Let’s start with what’s available (and keep in mind you’re not hearing an expert view here – just a fanboy with a day’s exposure). Most commercial capture is visual (other bands are available and used primarily for environmental and weather-related research). Reliance on the visual spectrum has implications that are probably second nature to folks in the industry but take some thought if you’re outside it. One speaker described their industry as “outside” and “daytime” focused. It’s also very weather dependent. Europe, with its abundant cloudiness, is much more challenging than much of the U.S. (though I suppose Portland and Seattle must be no picnic).

Images are either panchromatic (black and white), multi-spectral (like the RGB we’re used to but with an IR band as well and sometimes additional bands) or hyperspectral (lots of narrow bands on the spectrum). Perhaps even more important than color, though, is resolution. As you’d probably expect, black and white images tend to have the highest resolution – down to something like a 30-40cm square. Color and multi-band images might be more in the meter range, but the newest generation takes the resolution down to the 40-50cm range in full color. That’s pretty fine-grained.

How fine-grained? Well, with a top-down 40cm square per pixel, it’s not terribly useful for things like people. But here’s an example one of the speakers gave of how they are using the data. They pick selected restaurant locations (Chipotle was the example) and count cars in the parking lot during the day. They then compare this data to previous periods to create estimates of how the location is doing. They can also compare competitor locations (e.g. Panera) to see if the trends are brand specific or consistent.

Now, if you’re Chipotle, this data isn’t all that interesting. There are easier ways to measure your business than counting cars in satellite images. But if you’re a fund manager looking to buy or sell Chipotle stock in advance of earnings reports, this type of intelligence is extremely valuable. You have hard data on how a restaurant or store is performing before everyone else. That’s the type of data that traders live for.

Of course, that’s not the only way to get that information. You may have heard about the recent Foursquare prediction targeted to exactly the same problem. Foursquare was able to predict Chipotle’s sales decline almost to the percentage point. As one of the day’s panelists remarked, there are always other options, and the key to market success is being cheaper, faster, easier, and more accurate than the alternative mechanisms.

You can see how using Foursquare data for this kind of problem might be better than commercial satellite. You don’t have weather limitations, the data is easier to process, it covers walk-in and auto traffic, and it covers a 24hr time band. But you can also see plenty of situations where satellite imagery might have advantages too. After all, it’s easily available, relatively inexpensive, has no sampling bias, has deep historical data and is global in reach.

So how easy is satellite data to use?

I think the answer is a big “it depends”. This is, first of all, big data. Those multi- and hyper-band images at high resolution are really, really big. And while the providers have made it quite easy to find what you want and get it, it didn’t seem to me that they had done much to solve the real big data analytics problem.

I’ve described what I think the real big data problem is before (you can check out this video if you want a big data primer). Big data analytics is hard because it requires finding patterns in the data and our traditional analytics tools aren’t good at that. This need for pattern recognition is true in my particular field (digital analytics), but it’s even more obviously true when it comes to big data applications like facial recognition, image processing, and text analytics.

On the plus side, unlike digital analytics, the need for image (and linguistic) processing is well understood and the tooling relatively well developed. There are a lot of tools and libraries you can use to make the job easier. It’s also a space where deep learning has been consistently successful, so companies like Microsoft and Google have made high-quality deep-learning libraries – often tailor-made for processing image data – available for free.
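As a concrete (and heavily simplified) illustration of leaning on those libraries, here’s a sketch that scores small tiles cut from a larger image with an off-the-shelf Keras model. The file name is hypothetical, and in practice you would fine-tune a network on labeled overhead tiles (car / no car) rather than rely on ImageNet classes, so treat this as a shape-of-the-problem sketch rather than a working car counter.

```python
# Sketch: score image tiles with a pretrained convolutional network (TF 2.x).
import tensorflow as tf

model = tf.keras.applications.MobileNetV2(weights="imagenet")

scene = tf.keras.utils.img_to_array(tf.keras.utils.load_img("parking_lot.png"))

tile = 224  # MobileNetV2's expected input size
hits = 0
for y in range(0, scene.shape[0] - tile, tile):
    for x in range(0, scene.shape[1] - tile, tile):
        patch = scene[y:y + tile, x:x + tile][None, ...]
        patch = tf.keras.applications.mobilenet_v2.preprocess_input(patch)
        top = tf.keras.applications.mobilenet_v2.decode_predictions(
            model.predict(patch), top=1)[0][0]
        if top[1] in {"sports_car", "minivan", "pickup", "jeep", "cab"}:  # crude proxy
            hits += 1

print(f"tiles that look like they contain a vehicle: {hits}")
```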

It’s still not easy. What’s more, the way you process these images is highly likely to be dependent on your business application. Counting cars is different than understanding crop growth which is different than understanding storm damage. My guess is that market providers of this data are going to have to develop very industry-specific solutions if they want to make the data reasonably usable.

That doesn’t necessarily mean that they’ll have to provide full-on applications. The critical enabler is providing the ability to extract the business-specific patterns in the data – things like identifying cars. In effect, solving the hard part of the pattern recognition problem so that end-users can focus on solving the business interpretation problem.

Being at Space 2.0 reminded me a lot of going to a big data conference. There are a lot of technologies (some of them amazingly cool) in search of killer business applications. In this industry, particularly, the companies are incredibly sophisticated technically. And it’s not that there aren’t real applications. Intelligence, environment and agriculture are mature and profitable markets with extensive use of commercial satellite imagery. The golden goose, though, is opening up new opportunities in other areas. Do those opportunities exist? I’m sure they do. For most of us, though, we aren’t thinking satellite imagery when we look to solve our problems. And if we do think satellite, we’re likely intimidated by the difficulty of solving the big data problem inherent in getting value from the imagery for almost any new business application.

That’s why, as I described it to the audience there, I suspect that progress with the use and adoption of commercial satellite imagery will seem quite fast to those of us on the outside – but agonizingly slow to the people in the industry.

Forecasting is a foundational activity in analytics and is a fundamental part of everyone’s personal mental calculus. At the simplest level, we live and work constantly using the most basic forecasting assumption – that everything will stay the same. And even though people will throw around aphorisms of the “one constant is change” sort, the assumption that things will stay largely the same is far more often true. The keyword in that sentence, though, is “largely”. Because if things mostly do stay the same, they almost never stay exactly the same. Hence the art and science of forecasting lies in figuring out what will change.

There are two macro approaches to forecasting: trending and modelling. With trending, we forecast future measurements by projecting trends of past measurements. And because so many trends have significant variation and cyclical behaviors (seasonal, time-of-day, business, geological), trending techniques often incorporate smoothing.
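Here’s a deliberately tiny sketch of the trending approach, using Holt-Winters exponential smoothing from statsmodels to project an invented monthly sales series forward. The numbers are made up; the point is just that the forecast is driven entirely by the history of the measurement itself.

```python
# Trending: smooth the historical series and project it forward.
import pandas as pd
from statsmodels.tsa.holtwinters import ExponentialSmoothing

sales = pd.Series(
    [120, 118, 115, 117, 110, 108, 105, 107, 101, 98, 96, 97, 92, 90],
    index=pd.date_range("2015-01-01", periods=14, freq="MS"),
)

# Additive trend, no seasonality (the toy series is too short to fit a cycle).
fit = ExponentialSmoothing(sales, trend="add").fit()
print(fit.forecast(3))  # months 15-17, projected purely from past measurements
```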

Though trending can often create very reliable forecasts, particularly when smoothed to reduce variation and cycles, there’s one thing it doesn’t do well – it doesn’t handle significant changes to the system dynamics.

When things change, trends can be broken (or accelerated). When you have significant change (or the likelihood of significant change) in a system, then modelling is often a better and more reliable technique for forecasting. A model of the system is designed to capture an understanding of the true system dynamics.

Suppose our sales have declined for the past 14 months. With a trend, the expectation will be that sales decline again in the 15th month. But if we decide to cut our prices or dramatically increase our marketing budget, that trend may not continue. A model could capture the impact of price or marketing on sales and potentially generate a much better prediction when one of the key system drivers is changed.
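And here’s the modelling counterpart, continuing the same invented series. A simple regression with price as a driver produces very different forecasts depending on what we assume about price, which is exactly what a pure trend can’t do. The single-driver linear model is purely illustrative.

```python
# Modelling: sales as a function of price. Changing the assumed price changes
# the forecast, which a trend projection cannot capture. All numbers invented.
import numpy as np
from sklearn.linear_model import LinearRegression

price = np.array([10.0, 10.0, 10.2, 10.2, 10.5, 10.5, 10.8,
                  10.8, 11.0, 11.2, 11.2, 11.2, 11.5, 11.5]).reshape(-1, 1)
sales = np.array([120, 118, 115, 117, 110, 108, 105, 107, 101, 98, 96, 97, 92, 90])

model = LinearRegression().fit(price, sales)

print(model.predict([[11.5]]))  # month 15 if price stays where it is
print(model.predict([[10.0]]))  # month 15 if we cut price back to where we started
```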

This weekend, I added a third video to my series on big data – discussion of the changes to forecasting methodology when using big data.

[I’ve been working this year to build a legitimate YouTube channel on digital analytics. I love doing the videos (webinars, really, since they are just slide-shows with a voice-over), but they are a lot of work. I think they add something that’s different from either a blog or a Powerpoint and I’m definitely hoping to keep knocking them out. So far, I have three video series going: one on measuring the digital world, one on digital transformation in the enterprise, and one on big data.]

The new video is a redux of a couple recent speaking gigs – one on big data and predictive analytics and one on big data and forecasting. The video focuses more on the forecasting side of things and it explains how big data concepts impact forecasting – particularly from a modelling perspective.

Like each of my big data videos, it begins with a discussion of what big data is. If you’ve watched (or watch) either of the first two videos in the series (Big Data Beyond the Hype or Big Data and SQL), you don’t need to watch me reprise my definition of big data in the first half of Big Data and Forecasting. Just skip the first eight minutes. If you haven’t, I’d actually encourage you to check out one of those videos first as they provide a deeper dive into the definition of big data and why getting the right definition matters.

In the second half of the video, I walk through how “real” big data impacts forecasting and predictive problems. The video lays out three common big data forecasting scenarios: integrating textual data into prediction and forecasting systems, building forecasts at the individual level and then aggregating those predictions, and pattern-matching IoT and similar types of data sources as a prelude to analysis.

Each of these is interesting in its own right, though I think only the middle case truly adds anything to the discipline of forecasting. Text and IoT-type analytics are genuine big data problems that involve significant pattern-matching and that challenge traditional IT and statistical paradigms. But neither really generates new forecasting techniques.

However, building forecasts from individual patterns is a fairly fundamental change in the way forecasts get built. Instead of applying smoothing techniques or building models against aggregated data, big data approaches use individual patterns to generate a forecast for each record (customer/account/etc.). These forecasts can then be added up (or treated probabilistically) to generate macro-forecasts or forecasting ranges.
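A toy sketch of what that looks like mechanically: fit some per-customer pattern (here just a naive linear trend, standing in for a real individual-level model), forecast each customer, then sum the forecasts or look at their distribution. All of the data is simulated.

```python
# Individual-level forecasting, then aggregation. Data and model are toys.
import numpy as np

rng = np.random.default_rng(0)
history = rng.gamma(shape=2.0, scale=50.0, size=(1000, 12))  # 1,000 customers x 12 months

months = np.arange(12)
forecasts = []
for customer in history:
    slope, intercept = np.polyfit(months, customer, deg=1)  # per-customer trend
    forecasts.append(intercept + slope * 12)                # month-13 forecast

forecasts = np.clip(np.array(forecasts), 0, None)           # spend can't be negative

print("macro forecast (sum of individual forecasts):", round(float(forecasts.sum()), 2))
print("middle 80% of per-customer forecasts:", np.percentile(forecasts, [10, 90]).round(2))
```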

If you’ve got an interest in big data and forecasting problems, give it a listen. The full video is about 16 minutes split into two pretty equal halves (big data definition, big data forecasting).

We’ve spent our spare time in the last six weeks participating in the 538 Academy Awards Prediction Challenge. On Sunday, we’ll find out how we did. But even though we expect to crash and burn on the acting awards and are probably no better than 1-in-3 in a very close movie race, we ended up quite satisfied with our unique process and the model that emerged. You can get a full and deep description of our culture-matching model, with its combination of linguistic analysis and machine learning, in this previous post.

What I love about projects like this is that they give people a glimpse into how analytics actually works. Analysis doesn’t get done at all the way people think, and in most cases there is far more human intuition and direction than people realize or than anyone reading screeds on big data and predictive analytics would believe. Our culture-matching analysis pushes the envelope more than most work we do in the for-pay world, so it’s probably an exaggerated case. But think about the places where this analysis relied on human judgment:

Deciding on the overall approach: Obviously, the approach was pretty much created whole-cloth. What’s more, we lacked any data to show that culture matching might be an effective technique for predicting the Oscars. We may have used some machine learning, but this approach didn’t and wouldn’t have come from throwing a lot of data into a machine learning system.

Choosing potentially relevant corpora for Hollywood and each movie: This process was wholly subjective in the initial selection of possible corpora, was partly driven by practical concerns (ease of access to archival stories), and was largely subjective in the analyst review stage. In addition to selecting our sources, we further rejected categories like “local”, “crime” and “sports”. Might we have chosen otherwise? Certainly. In some cases, we tuned the corpora by running the full analysis and judging whether the themes were interesting. That may be circular, but it’s not wrong. Nearly every complex analysis has elements of circularity.

Tuning themes: Our corpora had both obvious and subtle biases. To get crisp themes, we had to eliminate words we thought were too common or were used in different senses. I’m pretty confident we missed lots of these. I hope we caught most. Maybe we eliminated something important. Likely, we’ll never know.

Choosing our model: If you only build one model, you don’t have this issue. But when you have multiple models, it’s not always easy to tell which one is better. With more time and more data, we could try each approach against past years. But lots of analytic techniques don’t even generate predictions (clustering, for example). The analyst has to decide which clustering scheme looks better, and the answer isn’t always obvious. Even within a single approach (text analytics/linguistics), we generated two predictions based on which direction we used to match themes. Which one was better? That was a topic of considerable internal debate with no “right” answer except to test against the real world (which in this case will be a very long test).

Deciding on Black-Box Validity: This one is surprisingly hard. When you have a black-box system, you generally rely on being able to measure its predictions against a set of fairly well known decisions before you apply it to the real world. We didn’t have that, and it was HARD to decide how and whether our brute force machine-learning system was working at all. But even in cases where external measurement comparisons exist, it’s the unexpected predictions that cause political problems with analytics adoption. If you’ve ever tried to convince a skeptical organization that a black-box result is right, you know how hard this is.

Explaining the model: There’s an old saying in philosophy (from James) that a difference that makes no difference is no difference. If a model has an interesting result but nobody believes it, does it matter? A big part of how interesting, important and valid we think a model is comes from how well it’s explained.

This long litany is why, in the end, the quality of your analysis is always about the quality of your people. We had access to some great tools (Sysomos, Boilerpipe, Java, SPSS, R and Crimson Hexagon), but interesting approaches and interesting results don’t come from tools.

That being said, I can’t resist special call-outs to Boilerpipe which did a really nice job of text extraction and SPSS Text Analytics which did a great job facilitating our thematic analysis and matching.

Thoughts on the Method and Results

So is culture matching a good way to predict the Oscars?

It might be a useful variable, but I’m sure it’s not a complete prediction system. That’s really no different than we hoped going into this exercise. And we’ll learn a little (but not much) more on Awards night. It would be better if we got the full vote so we could see how close our rank ordering was.

Either way, the culture-matching approach is promising as a technique. Looking through the results, I’m confident that it passes the analyst sniff test – there’s something real here. There are a number of extensions to the system we haven’t (and probably won’t) try – at least for this little challenge. We’d like to incorporate sentiment around themes, not just matching. We generated a number of analyst-driven cultural dimensions for machine training that we haven’t used. We’d like to try some different machine-learning techniques that might be better suited to our source material. There is a great deal of taxonomic tuning around themes that might drive better results. It’s rare that an ambitious analytics project is ever really finished, though the world often says otherwise.

In this case, I was pleased with the themes we were able to extract by movie. A little less pleased with the themes in our Hollywood corpus. Why? I suspect because long-form movie reviews are unusually rich in elaborating the types of cultural themes we were interested in. In addition, a lot of the themes that we pulled out of the culture corpus are topical. It’s (kind of) interesting to know that terrorism or the presidential campaign were hot topics this last year, but that isn’t the type of theme we’re looking for. I’m particularly interested in whether and how successful we can be in deepening themes beyond the obvious ones. Themes around race, inequality and wealth are fairly easy to pick out. But if The Martian scores poorly because Hollywood isn’t much about engineering and science (and I’m pretty sure that’s true), what about its human themes around exploration, courage and loneliness? Those topics emerged as key themes from the movie reviews, but they are hard to discover in the Hollywood corpus. That might be because they aren’t very important in the culture – that’s certainly plausible – but it also seems possible that our analysis wasn’t rich enough to find their implicit representations.

Regardless, I’m happy with the outcome. It seems clear to me that this type of culture matching can be successful and brings analytic rigor to a topic that is otherwise mostly hot air. What’s more, it can be successful in a reasonable timeframe and for a reasonable amount of money (which is critical for non-academic use-cases). From start to finish, we spent about four weeks on this problem – and while we had a large team, it was all part-timers.

This was definitely a problem to fall in love with and we’d kill to do more, expand the method, and prove it out on more substantial and testable data. If you have a potential use for culture matching, give us a call. We probably can’t do it for free, but we will do it for less than cost. And, of course, if you just need an incredible team of analysts who can dream up a creative solution to a hard, real-world problem, pull data from almost anything, bring to bear world-class tools across traditional stats, machine-learning and text analytics, and deliver interesting and useful results…well, that’s fine too.

My Analytics Counseling Family here at EY has been participating in the 538 Academy Award Challenge. Our project involved creating a culture-matching engine – a way to look at pieces of content (in this case, obviously, movies) and determine how well they match a specific community’s worldview. The hypothesis is that the more a movie matches the current Hollywood zeitgeist, the more likely it is to win. In my last post, I described in some detail the way we did that and our results for predicting the Best Movie (The Big Short). We were pretty happy with the way the model worked and the intuitive fit between the movies and our culture-matching engine. Of course, nothing in what we’ve done proves that culture matching is a great way to predict the Oscars (and even if we’re right it won’t prove much in a single year), but that wasn’t really the point. Culture-matching is a general technique with an interesting analytics method behind it, and if the results are promising in terms of our ability to make a match, we think that’s pretty great.

The second part of our task, however, was to predict the Best Actor and Actress awards. Our method for doing this was similar to our method for predicting the best movie award but there were a few wrinkles. First, we extracted language specific to each character in the nominated movie. This is important to understand. We aren’t looking at how Hollywood talks about DiCaprio or Cranston or Lawrence as people and actors. We aren’t looking at how they are reviewed. We’re entirely focused on how their character is described.

This is the closest analogue we could think of to culture matching movies. However, this was a point of considerable debate internal to our team. To me, it seems intuitively less likely that people will prefer an actor or actress because their character matches our worldview than when discussing a movie as a whole. We all understood that and agreed that our approach was less compelling when it came to ANY of the secondary awards. However, our goal was to focus on culture-matching more than it was to find the best method for predicting acting awards. We could have predicted screenplay, I suppose, but there’s no reason to think the analysis would deviate in the slightest from our prediction around movie.

Once we had key themes around each nominated role, we matched those themes to our Hollywood corpus. In our first go-round, we matched to the entire corpus, comparing actor themes to broad cultural themes. This didn’t work well. It turned out that we were conflating themes about people with themes about other things in ways that didn’t make much sense. So for our second pass, we tightened the themes in the Hollywood corpus to only those which were associated with people.

In essence, we’re saying which roles best correspond to the way Hollywood talks about people and picking the actor/actress who played that role.
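For intuition, here’s a toy sketch of that kind of matching: represent each nominated role and the “people” slice of the Hollywood corpus as weighted theme vectors and score them by cosine similarity. The themes, weights, and scoring rule are invented for illustration; this is not the actual SPSS Text Analytics output or matching logic we used.

```python
# Toy theme matching via cosine similarity of theme-weight vectors.
# Theme order: humor, optimism, science, forceful, family, courage (all invented).
import numpy as np

hollywood_people = np.array([0.30, 0.15, 0.05, 0.25, 0.20, 0.05])

characters = {
    "Mark Watney (The Martian)": np.array([0.35, 0.30, 0.25, 0.00, 0.00, 0.10]),
    "Joy Mangano (Joy)":         np.array([0.05, 0.20, 0.00, 0.40, 0.30, 0.05]),
}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

for name, vec in sorted(characters.items(),
                        key=lambda kv: -cosine(kv[1], hollywood_people)):
    print(f"{name}: {cosine(vec, hollywood_people):.3f}")
```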

So here’s how it came out:

Best Actor (ranked):

1. Bryan Cranston
2. Michael Fassbender
3. Leonardo DiCaprio
4. Eddie Redmayne
5. Matt Damon

And

Best Actress (ranked):

1. Jennifer Lawrence
2. Brie Larson
3. Cate Blanchett
4. Saoirse Ronan
5. Charlotte Rampling

Do I think we’re going to be right? Not a chance.

But that doesn’t mean the method isn’t working pretty well. In fact, I think it worked about as well as we could have hoped. Here, for example, are the themes we extracted for some of the key actors and actresses (by which I mean their nominated roles):

For Matt Damon in The Martian: Humor, Optimism, Engineer, Scientist, and Leadership.

If you’ve seen these movies, I think you can agree that the thematic pulls are reasonable. And is it any surprise, as you read the list, that Cranston is our predicted winner? I think not. To me, this says more about whether our method is applicable to this kind of prediction – and the answer is probably not – than whether the method itself is working well. Take away what we know about the actors and the process, and I think you’d probably agree that the model has done the best possible job of culture matching to Hollywood.

I was a bit concerned about the Jennifer Lawrence prediction. I saw the logic of Cranston’s character immediately, but Joy didn’t immediately strike me as an obvious fit to Hollywood’s view of people. When I studied the themes that emerged around her character, though, I thought it made reasonable sense:

WDYT? There are other themes I might have expected to emerge that didn’t, but these seem like a fairly decent set and you can see where something like forceful, in particular, might match well (it did).

In the end, it didn’t make me think the model was broken.

We tried tuning these models, but while different predictions can be forced from the model, nothing we did convinced us that, when it came to culture matching, we’d really improved our result. When you start torturing your model to get the conclusions you think are right, it’s probably time to stop.

It’s all about understanding two critical items: what your model is for and whether or not you think the prediction could be better. In this case, we never expected our model to be able to predict the Academy Awards exactly. If we understand why our prediction isn’t aligned to likely outcomes, that may well be good enough. And, of course, even the best model won’t predict most events with anything like 100% accuracy. If you try too hard to fit your model to the data or – even worse – to your expectations, you remove the value of having a model in the first place.

Just like in the real world, with enough pain you can make your model say anything. That doesn’t make it reliable.

So we’re going down with this particular ship!

Machine Learning

We’ve been experimenting with a second method that focuses on machine learning. Essentially, we’re training a machine learning system with reviews about each movie and then categorizing the Hollywood corpus and seeing which movie gets the most hits. Unfortunately, real work has gotten in the way of some of our brute-force machine learning work and we haven’t progressed as much on this as we hoped.
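For the curious, the general shape of that approach looks something like the sketch below: train a text classifier on review text labeled by movie, run it over documents from the Hollywood corpus, and count which movie gets the most assignments. The scikit-learn pipeline and the placeholder documents are illustrative assumptions, not the actual system we used.

```python
# Sketch: "train on reviews, classify the culture corpus, count hits".
# The documents are placeholders; the real corpora were thousands of articles.
from collections import Counter

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

reviews = [
    ("The Big Short", "a furious, funny indictment of wall street greed and fraud"),
    ("The Martian",   "an optimistic, science-driven survival story about ingenuity"),
    ("Spotlight",     "a sober procedural about journalists exposing institutional abuse"),
]
texts = [text for _, text in reviews]
labels = [movie for movie, _ in reviews]

clf = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
clf.fit(texts, labels)

hollywood_corpus = [
    "another story about greed and the people who profit from a rigged system",
    "a profile celebrating investigative reporters and the institutions they challenge",
]
print(Counter(clf.predict(hollywood_corpus)))  # which movie gets the most "hits"?
```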

To date, it hasn’t done a great job. Well, that’s being kind. Really it kind of sucks. Our results look pretty random and where we’ve been able to understand the non-random results, they haven’t captured real themes but only passing similarities (like a tendency to mention New York). With all due respect to Ted Cruz, we don’t think that’s a good enough cultural theme to hang our hat on.

As of right now, our best conclusion is that the method doesn’t work well.

We probably won’t have time to push this work further, but right now I’d say that if I was doing this work again I’d concentrate on the linguistic approach. I think our documents were too long and complex and our themes too abstract to work well with the machine learning systems we were using.

In my next post, I have some reflections on the process and what it tells us about how analytics works.

People have struggled with this (big) data provider model but Factual feels like it’s found a real (and valuable) niche. Would love to see more of this grow since external data is a huge miss in most big data systems.

Targeted VoC is a powerful (and totally neglected) tool for personalization. Facebook’s experience is entirely relevant to ANY content producer. I don’t know if I can take credit for this, but I suggested this to folks at Facebook a couple of years back!

An interesting discussion of the problems in identifying “likely” voters and the benefits of behavioral data integration. Food for thought in the enterprise world as well where the equivalent is often possible but rarely done.