On Building with Data

I walked into the movie theater last Sunday evening intending to see Catching Fire. At the touchscreen kiosk, I hit “Buy Tickets” only to find on the next screen that the button labeled Catching Fire had a red background because it was sold out. I hit “Cancel”, and left the theater.

The theater will never know if it should have allocated another auditorium at this time to play Catching Fire to maximize profit, and it won’t be able to adapt tomorrow. The movie ticket kiosk was not designed to understand Win/Loss.

Understanding Win/Loss and its Value

Marketers and salesmen conduct Win/Loss analyses to understand the reasons why their firm earned, or did not earn, a sale or other action from a customer. For example, as a skier, I may choose to buy a flight on airline A rather than airline B because it does not charge $100 to check a pair of skis. After interviewing a number of customers, airline A’s Win/Loss analysis might yield the insight that the airline wins sales in part because it does not assess checked-baggage fees, whereas airline B’s analysis may yield the insight that the airline loses sales because customers do not like checked-baggage fees.

Performing an analysis can be a very lightweight process, and it isn’t necessarily different from good market-driven product development practices. A proper analysis involves retrospectively understanding customer purchase criteria and considerations, whether the criteria were met and why, and how the considerations impacted the outcome. Businesses collect this data via interviews or other means.

Understanding Win/Loss is valuable because it allows businesses to make data-driven decisions that ultimately drive more profit. Depending on the business and the context, understanding Win may be more important than understanding Loss, or vice versa, but solid insight is crucial. In online retail, for example, where far more customers leave without purchasing than purchase, understanding Loss may be both more important and more difficult. If an online retailer sees a 5% conversion rate and understands Loss well enough to convert 1% of the sales losses, it will increase its business by nearly 20%.
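The arithmetic behind that last claim is worth making explicit. A quick sketch, using the illustrative numbers from the example above:

```python
# Illustrative arithmetic: a 5% conversion rate, plus converting 1% of the
# losses, grows the business by roughly 20%.
conversion_rate = 0.05
loss_rate = 1 - conversion_rate       # 95% of visitors do not buy

recovered = 0.01 * loss_rate          # convert 1% of losses: +0.95 points
new_conversion_rate = conversion_rate + recovered

growth = recovered / conversion_rate  # relative increase in sales
print(f"new conversion rate: {new_conversion_rate:.2%}")  # 5.95%
print(f"business growth:     {growth:.0%}")               # 19%, roughly 20%
```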

Tweak the Movie Kiosk User Interface

Even though using Win/Loss insights to optimize a business can be extremely valuable, not all businesses design to facilitate Win/Loss analysis. On my trek to the theater, a slightly redesigned kiosk user interface would have yielded significantly more valuable data for a Win/Loss analysis, one that might help the theater understand how it allocates auditoriums, one of its main levers.

For a simplified, illustrative example, suppose I walked into the movie theater Sunday evening intending to see Catching Fire, and at the touchscreen kiosk, I hit “Buy Tickets”. On the next screen, I still find a button labeled “Catching Fire”, but this time, the button is designed so I don’t know the movie is sold out until after I select it. When I hit the “Catching Fire” button, the theater captures my intention to see the movie independent of whether I can actually see it; because the theater has captured customer intent prior to losing the sale, it is aware of the lost sale, a requisite first step in capturing it the next time!
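One way to implement that tweak is to record the intent event before checking availability, so every loss leaves a trace. A minimal sketch; the event names and kiosk flow are hypothetical, not the theater's actual system:

```python
from datetime import datetime, timezone

# In-memory event log standing in for whatever store the kiosk would use.
events = []

def log_event(kind, movie, showtime):
    """Persist an intent or outcome event for later Win/Loss aggregation."""
    events.append({
        "kind": kind,
        "movie": movie,
        "showtime": showtime,
        "at": datetime.now(timezone.utc).isoformat(),
    })

def select_movie(movie, showtime, seats_available):
    # Capture intent FIRST, before we know whether the sale can happen.
    log_event("intent", movie, showtime)
    if seats_available(movie, showtime):
        return "seat-selection-screen"
    # The loss is now visible in the data: an intent with no matching win.
    log_event("loss:sold_out", movie, showtime)
    return "sold-out-screen"

# Usage: Catching Fire at 7pm is sold out, yet the intent is still recorded.
screen = select_movie("Catching Fire", "19:00", lambda m, t: False)
```

Aggregating these intent events over time tells the theater exactly how much demand each sold-out showing turned away.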

Design to Facilitate Win/Loss Analysis

Design products and processes to engage and extract data—purchase intent, purchase criteria, etc.—from the user as early and often as possible to facilitate Win/Loss analysis. In the checked-baggage example, if the airline collects the fact that the customer intends to travel with skis, its Win/Loss analysis will be more specific and yield more fruitful results. In the end, designing to collect purchase intent and purchase criteria to facilitate Win/Loss is no different from basic market-driven business best practices, only with the added consideration, “Thou shalt not win or lose a sale without proper data for conducting business-improving analysis.”

The data that drive Win/Loss analyses are functions of current business processes or products. Therefore, better design for Win/Loss means tweaking current processes or products to engage and extract data, then designing and implementing a process to persist the data for aggregate analysis and actionable results. For example, learn the customer’s purchase intent and purchase criteria before requiring a lengthy sign-up process or a captcha that may push the customer out of the funnel. Or, as a car salesman, use the test-drive time to learn about the customer’s family, background, and job, and from that his purchase intent and purchase considerations.

Problems ought to have a few basic properties before they’re worth solving in the context of building a business.

I’ve often heard from folks smarter than I that a problem worth solving has at least the following three properties:

The problem should exude urgency: your customer shouldn’t be able to live without a solution.

Customers should be willing to pay you money for your solution to the problem.

The problem should be widespread and pervasive in the market.

Equally important, but sometimes neglected in the Problem-Worthiness Calculus, is this property:

The problem should be complex and diverse enough that you are excited and motivated each day to solve it, whether it takes days, months or years to solve.

Hard problems take time to solve, and businesses take time to build. If you’re not working on something that excites you and motivates you each day, you’re going to find yourself yearning for a different problem that you hope is worth solving.

After spending time building and maintaining the run-time models central to our business at www.intentmedia.com, here are a few observations about communicating to bridge the gap between data and engineering teams, and the rest of the business:

Availability of Data

The business needs to know what data your models rely on. With the help of the business, validate that you’re avoiding privacy issues and that the data is generally kosher. If you don’t directly control all of your model input data, make sure the folks talking to your partners about data availability understand what data is important and what battles are worth fighting. A business team that actively works to prevent “data deprecation” and encourage “data exposure” is a tremendous asset to any data team. Refresh, disseminate and validate data dependencies each time your production models change.
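One lightweight way to keep those dependencies visible is to maintain a machine-readable manifest alongside the models and validate incoming data against it. A sketch; the model and field names are invented for illustration:

```python
# Hypothetical manifest: the fields each production model depends on,
# refreshed whenever the models change and shared with the business team.
MODEL_DATA_DEPENDENCIES = {
    "conversion_model_v3": {"referrer", "session_length", "device_type"},
}

def missing_fields(model, record):
    """Return the dependencies absent (or null) in an incoming record."""
    required = MODEL_DATA_DEPENDENCIES[model]
    return {f for f in required if record.get(f) is None}

# A partner feed that silently dropped the referrer shows up immediately.
gaps = missing_fields("conversion_model_v3",
                      {"session_length": 42, "device_type": "kiosk"})
```

Handing the business this manifest, rather than a verbal summary, makes "data deprecation" conversations with partners concrete.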

If, on the other hand, your models rely on providers’ feeds, you’re generally in decent shape from an availability perspective: the structure and quality of the supplied data should remain relatively constant. In these cases, one of the biggest risks to data availability is getting and maintaining access to the data at a reasonable price. The better the business understands and is able to articulate the value of a certain piece of data to your models, the more likely it is to take care of the feed’s commercials. Articulate why it is important that the data you rely on be made available and remain available.

Available Data Changes, Especially when Modeling Internet Traffic at Large

Traffic and software constantly change. As a result, available data can vary drastically over time. For example, a browser vendor may introduce a privacy setting that blocks cookies by default, a traffic source may change the way a referrer header is set, or seasonality may alter the traffic’s composition and behavior. Set the expectation that your models require continuous investment and monitoring to maintain and understand quality, even if you’ve built an adaptive system, and help the business appreciate the resource implications of that fact.
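A simple, concrete form of that monitoring is tracking per-field missing rates over time and flagging drift, which would catch, say, a browser starting to block cookies by default. A sketch with hypothetical data and thresholds:

```python
def null_rate(records, field):
    """Fraction of records where a model input is missing."""
    if not records:
        return 0.0
    return sum(1 for r in records if r.get(field) is None) / len(records)

def drifted(baseline, current, threshold=0.10):
    """Flag drift when the missing-rate moves more than `threshold`."""
    return abs(current - baseline) > threshold

# Example: cookie IDs were missing in 5% of last month's traffic, but
# 60% of this week's sample - a privacy-setting change worth investigating.
baseline = 0.05
current = null_rate(
    [{"cookie_id": None}, {"cookie_id": None}, {"cookie_id": "abc"},
     {"cookie_id": None}, {"cookie_id": "def"}],
    "cookie_id",
)
alert = drifted(baseline, current)
```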

Setting Performance Expectations

One of your behavioral models likely controls a decision that influences a key operational performance indicator of your business. (For example, at an ad-serving company, a predicted click-through rate could greatly affect top-line revenue.) Given that available data and traffic change constantly, set the expectation with the business that small fluctuations and variance from target KPIs as a result of model behavior are perfectly normal: depending on the application, it may be difficult (and costly) to manage a predictive model to an arbitrary precision on a daily basis. Like all monitoring, visibility into performance (of either KPIs or models) provides value only until it constantly sends a team into panic over KPI changes that result from normal operating circumstances.
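One way to operationalize "small fluctuations are normal" is to alert only when a KPI leaves a band of k standard deviations around its recent mean, rather than on any movement at all. A sketch; the click-through-rate numbers are invented:

```python
from statistics import mean, stdev

def out_of_band(history, today, k=3.0):
    """True only when today's KPI falls outside mean +/- k*stdev of
    history, so normal day-to-day variance never pages the team."""
    mu, sigma = mean(history), stdev(history)
    return abs(today - mu) > k * sigma

# Daily click-through rates (hypothetical): noisy but stable around 2%.
ctr_history = [0.021, 0.019, 0.020, 0.022, 0.018, 0.020, 0.021]
normal_day = out_of_band(ctr_history, 0.019)  # within the band: no alarm
bad_day = out_of_band(ctr_history, 0.010)     # well outside: investigate
```

The choice of k encodes the expectation-setting: a wider band means fewer panics over normal operating variance, at the cost of slower detection of real shifts.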

If refreshing or changing the model definitions can introduce a second-order effect on the performance of the business, work to understand that effect and communicate it when you refresh or change your model: not all business stakeholders will share your data and engineering teams’ intuition about how your model affects the bigger picture.

Prototyping and Ideas

One of your partners or peers likely has experience with an analogous problem that may help you solve your own. Don’t be shy: encourage the business to talk to the market about what data they might examine to get your modeling to the next level. Make the business and your market your partners in brainstorming how to incorporate more data.

Set the expectation that most modeling experimentation results in failure. Convince the business that providing adequate resources to minimize the cost and cycle time of experimentation is a good strategy. Foster a sense of progress by celebrating and communicating quick cycles resulting in either success or failure.

Time and the Value of Data

If your business is like ours, there is a significant time component to the value of data. If we hypothesize that a certain piece of data will be valuable, we need to start collecting it now so we have enough of it to be useful in the future. Working with the business to prioritize resources for data collection before its value has been proven is difficult: the catch is that you can’t prove the data is valuable until you’ve collected it and prototyped with it! Set the expectation with the business that spending resources to collect data is worthwhile: knowing that a certain piece of data is not valuable is often just as useful as knowing which data is.

The Math and Context

Explaining the math and the approaches behind the modeling to the business is an efficient use of time only if it helps the business appreciate the context of a problem or an outcome it must resolve. For example, if there are constraints on how well a model can perform given certain input data, and it is important for the business to understand that fact, it may be worth providing context with a layman’s math explanation. In most cases, though, communicating math or detailed mechanics to the business won’t be a productive use of anyone’s time. Instead, regularly communicate context to the business through visualization in your dashboards – business folk love dashboards.

The benefits of clean, tested code cannot be overstated, and writing tests is an essential part of that process. But when those benefits compete with other considerations, the choice to develop clean, tested code does not always seem so simple. Perhaps you’re weighing the tradeoff between writing tests and pumping out the next batch of features to prove a set of hypotheses that, if validated, could indicate your business has legs. Why write tests or keep code clean in a resource-scarce environment before knowing whether there will even be a business?

If you believe investing in clean, testable code forces a difficult tradeoff between moving fast and writing tests, consider this: moving fast doesn’t necessarily preclude writing clean, tested code, but failing to write clean, tested code can introduce significant risk to velocity the moment you need to add engineers to the team.

So, the next time you’re deciding whether to invest the marginal effort to produce clean, tested code, consider the following in the context of growing your engineering team:

Just like smart engineers want to work with other smart engineers, engineers who have worked in a clean, tested codebase before will prefer to work in one again.

Convincing talented engineers to join your team is already a hard sell given the risk they assume – untested spaghetti code is more likely to increase than decrease that risk.

Time is a precious resource, and onboarding additional engineers quickly and efficiently is clearly preferable to onboarding engineers slowly and inefficiently. Clean, well-tested code significantly reduces ramp-up time: new team members use code as behavior documentation instead of soaking up the time and interrupting the flow of existing team members.

When an assumption or a choice you made proves to be wrong and your codebase needs refactoring, new team members should have the confidence to make the necessary changes. When new members shy away from audacious changes because they’re more worried about breaking things than improving them, your team is far less likely than it should be to elicit their true creative and technical potential.
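The "code as behavior documentation" point is concrete: a new engineer can read a small, well-named test and learn the intended behavior without interrupting anyone. A trivial sketch, with a pricing rule invented purely for illustration:

```python
import unittest

def ticket_price(age, matinee=False):
    """Invented pricing rule: under-12s pay half price; matinees are $2 off."""
    base = 12.00
    if age < 12:
        base /= 2
    if matinee:
        base -= 2.00
    return base

class TicketPriceTest(unittest.TestCase):
    # Each test name documents a behavior a new teammate can rely on,
    # and gives them the confidence to refactor without breaking it.
    def test_children_pay_half_price(self):
        self.assertEqual(ticket_price(age=8), 6.00)

    def test_matinee_discount_applies_after_child_discount(self):
        self.assertEqual(ticket_price(age=8, matinee=True), 4.00)

    def test_adults_pay_full_price(self):
        self.assertEqual(ticket_price(age=30), 12.00)
```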

Every problem and set of circumstances is different, and it is ridiculous to make assertions like “everyone ought to achieve 100% coverage” or “everyone ought to use unit tests”. But while projects and circumstances are generally different, most share at least one commonality: successful projects, businesses and teams do eventually need to grow. As you make decisions, keep considerations like the implications of clean, tested code in the back of your mind so you can efficiently grow your engineering team when the time comes.