The motivation for this blog post is simple: I had trouble finding, via a Google search, a simple formula for the confidence interval of lift. Lift is a very important metric in our industry, and after all the work I put into researching it, I want to make sure the next person to google ‘confidence interval of lift’ has an easier time.
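For readers who just want the formula: if lift is estimated as the ratio of two proportions (x1 successes out of n1 trials in one group against x2 out of n2 in the other), the textbook delta-method interval for a ratio of proportions gives the result below. To be clear, this is the standard relative-risk interval; the exact form derived in the post may differ.

```latex
% Lift estimated as a ratio of two proportions:
%   x_1 successes out of n_1 trials vs. x_2 out of n_2.
\[
  \widehat{L} \;=\; \frac{x_1 / n_1}{x_2 / n_2},
  \qquad
  \mathrm{SE}\bigl(\ln \widehat{L}\bigr)
  \;=\; \sqrt{\frac{1}{x_1} - \frac{1}{n_1} + \frac{1}{x_2} - \frac{1}{n_2}}
\]
% A 95% confidence interval is computed on the log scale and exponentiated:
\[
  \exp\Bigl(\ln \widehat{L} \;\pm\; 1.96\,\mathrm{SE}\bigl(\ln \widehat{L}\bigr)\Bigr)
\]
```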

At Magnetic we use logistic regression and Vowpal Wabbit to determine the probability that a given impression will result in a click or a conversion.
In order to decide which variables to include in our models, we need objective metrics to determine if we are doing a good job.
Of these metrics, only the computation of lift quality (in its exact form) is not easily parallelizable.
In this post, I will show how the computation of lift quality can be re-ordered to make it distributable.

When we work on modeling projects, we often need to compute the cumulative sum of a given quantity.
At Magnetic, we are especially interested in making sure that our advertising campaigns spend their daily budgets evenly throughout the day.
To do this, we need to compute cumulative sums of dollars spent throughout the day in order to identify the moment at which a given campaign has delivered half of its daily budget.
Another example where being able to compute a cumulative sum comes in handy is transforming a probability density function into a cumulative distribution function.
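As a concrete illustration of the half-budget case, here is a minimal single-machine sketch in Python. The hourly granularity and the spend numbers are made up for the example:

```python
from itertools import accumulate

# Hypothetical hourly spend for one campaign, in dollars (hours 0 through 23).
hourly_spend = [10, 12, 30, 55, 40, 25, 18, 9, 14, 22, 31, 44,
                50, 47, 38, 29, 21, 16, 12, 10, 8, 6, 5, 4]

cumulative = list(accumulate(hourly_spend))   # running total of spend
half_budget = cumulative[-1] / 2.0            # half of the daily total

# First hour at which the campaign has delivered half of its daily budget.
half_spent_hour = next(h for h, total in enumerate(cumulative)
                       if total >= half_budget)
print(half_spent_hour)
```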

Because we deal with large quantities of data, we need to be able to compute cumulative sums in a distributed fashion.
Unfortunately, most of the algorithms described in online resources do not work well when groups are either large (in which case we can run out of memory) or unevenly distributed (in which case the largest group becomes the bottleneck).
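For reference, the usual starting point is the classic two-phase prefix sum, sketched here in plain Python rather than in any particular framework: compute each partition's total in parallel, prefix-sum those totals centrally to get per-partition offsets, then let each partition finish its local cumulative sum independently.

```python
from itertools import accumulate

def distributed_cumsum(partitions):
    """Cumulative sum across ordered partitions.

    Phase 1 (parallelizable): each partition reports its local sum.
    Phase 2 (tiny, on the driver): prefix-sum the partition totals
    to get each partition's starting offset.
    Phase 3 (parallelizable): each partition adds its offset to its
    own local cumulative sum.
    """
    partial_sums = [sum(p) for p in partitions]            # phase 1
    offsets = [0] + list(accumulate(partial_sums))[:-1]    # phase 2
    return [[off + c for c in accumulate(p)]               # phase 3
            for off, p in zip(offsets, partitions)]

# Example: three partitions of an ordered data set.
print(distributed_cumsum([[1, 2], [3, 4, 5], [6]]))
# [[1, 3], [6, 10, 15], [21]]
```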

For our hackathon this week, I, along with several co-workers, decided to
re-implement Vowpal Wabbit
(aka “VW”) in Go as a chance to learn more about how
logistic regression, a
common machine learning approach, works, and to gain some practical
programming experience with Go.

Though our hackathon project focused on learning Go, in this post I want to
spotlight logistic regression, which is far simpler in practice than I had
previously thought. I’ll use a very simple (perhaps simplistic?)
implementation in pure Python to explain how to train and use a logistic
regression model.
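To give a taste of how simple it can be, here is a minimal sketch of logistic regression trained by stochastic gradient descent in pure Python. This is an illustration in the same spirit, not the exact code from the project, and the feature names are made up:

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(weights, features):
    """Probability of the positive class for a sparse feature dict."""
    z = sum(weights.get(f, 0.0) * v for f, v in features.items())
    return sigmoid(z)

def train(examples, passes=10, learning_rate=0.1):
    """Stochastic gradient descent on (features, label) pairs,
    where label is 0 or 1 and features is a dict of name -> value."""
    weights = {}
    for _ in range(passes):
        random.shuffle(examples)
        for features, label in examples:
            error = predict(weights, features) - label  # gradient of log loss
            for f, v in features.items():
                weights[f] = weights.get(f, 0.0) - learning_rate * error * v
    return weights

# Toy example: learn that the "clicked_before" feature predicts a click.
data = [({"bias": 1.0, "clicked_before": 1.0}, 1),
        ({"bias": 1.0, "clicked_before": 0.0}, 0)] * 50
w = train(data)
print(predict(w, {"bias": 1.0, "clicked_before": 1.0}))  # close to 1.0
```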

VIRB (Variable Incoming Rate Biased) reservoir sampling is a streaming sampling algorithm that
stores a representative fixed-size sample of events from the recent past (the user
specifies the desired mean age of samples), even when the incoming rate varies. It is heavily
inspired by reservoir sampling.
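The details of VIRB are beyond this introduction, but to show the underlying idea, here is a minimal sketch of a simpler time-biased reservoir in Python. This is in the spirit of Aggarwal's biased reservoir sampling, not the VIRB algorithm itself:

```python
import random

class BiasedReservoir(object):
    """Fixed-size sample biased toward recent items.

    A simplified scheme: with probability fill/capacity a new item
    overwrites a uniformly random resident item; otherwise it is
    appended. Older items therefore decay roughly exponentially,
    with a mean age on the order of `capacity` arrivals.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = []

    def add(self, item):
        if random.random() < len(self.items) / float(self.capacity):
            # Evict a uniformly random resident item.
            self.items[random.randrange(len(self.items))] = item
        else:
            self.items.append(item)

# Usage: keep a 100-event sample of a stream of 1,000,000 events.
reservoir = BiasedReservoir(100)
for event in range(1000000):
    reservoir.add(event)
print(sorted(reservoir.items)[:5])  # mostly recent event ids
```

A plain reservoir keeps a uniform sample over the entire stream; biasing the eviction as above is what shifts the sample toward the recent past.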

Using PySpark to process large amounts of data in a distributed fashion is a great way to gain business insights.
However, the machine from which tasks are launched can quickly become overwhelmed.
This article will show you how to run PySpark jobs so that the Spark driver runs on the cluster, rather than on the submission node.

Capturing user intent with brands can be valuable, especially in online advertising.
In the online advertising domain, brand detection can help capture
user interests and improve user modeling, which, in turn, can lead to
more precise targeting of users with ads relevant to their interests and needs.

One of the most attractive features of Spark is the fine-grained control
over what you can broadcast to every executor with very simple
code. When I first studied broadcast variables, my thought process centered
around map-side joins and other obvious candidates. I’ve since expanded my
understanding of just how much flexibility broadcast variables can offer.
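The map-side join is still the canonical starting point. Here is a minimal PySpark sketch; the lookup table and keywords are made up for the example:

```python
from pyspark import SparkContext

sc = SparkContext(appName="broadcast-example")

# Small lookup table, shipped once to every executor rather than
# shuffled around as part of a join.
categories = sc.broadcast({"bmw": "Automotive", "gucci": "Fashion"})

keywords = sc.parallelize(["bmw", "gucci", "bmw"])
labeled = keywords.map(lambda kw: (kw, categories.value.get(kw, "Unknown")))
print(labeled.collect())
# [('bmw', 'Automotive'), ('gucci', 'Fashion'), ('bmw', 'Automotive')]
```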

How do we align all our ideas into a vision that is easy to understand?

How do we turn that vision into something long-lived and actionable
that can be used to drive our cultural growth?

To address these questions, we recently released a Magnetic Engineering Manifesto that we believe
will help us along this path. We’re sharing it here in the hope that it will
inspire other companies to take the time to write down their own thoughts on culture.

The past decade has seen a surge in technologies around “big data,” claiming
to make it easy to process large data sets quickly, or at least scalably, by
distributing work across a cluster of machines. This is not a story of
success with a big data framework. This is a story of a small data set
suffering at the hands of big data assumptions, and a warning to developers
to check what their big data tools are doing for them.

If you have not tried processing data with Spark yet, you should. It is a fast-rising framework, built around processing data up to 100x faster than Hadoop MapReduce while leveraging existing Hadoop components (HDFS and YARN). Since Spark is evolving rapidly, in most cases you will want to run the latest version released by the Spark community, rather than the version packaged with your Hadoop distribution. This guide will walk you through what it takes to get the latest version of Spark running on your cluster.

One of the most popular features of the Magnetic
Insight platform is our category rankings for
an advertiser’s audience of page visitors.
The rankings give an unbiased look into which search categories
are the most popular among the users who visit a customer’s different web pages.

How do you decide if a predictive model you have built is any good?
How do you compare the performance of two models?
As time goes on, data changes and you have to rebuild your models —
how do you compare the new model’s behavior on the new data with
the old model’s behavior on the old data?

One of the important factors that affects the efficiency of our predictive
models is their recency. The sooner our bidders get a new version of the prediction model,
the better decisions they can make. Delays in producing the model result in lost money due
to incorrect predictions.

The slowest steps in our modeling pipeline are those that require manipulating the full data set —
multiple weeks’ worth of data. Our sampling process has historically required two full passes over
the data set, and so was an obvious target for optimization.
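One common way to collapse a fixed-size uniform sample into a single pass is the bottom-k-by-hash trick, sketched below in Python. This is a general technique, not necessarily the exact optimization we ended up with: each partition keeps the k records with the smallest hash values, and per-partition samples merge by taking the bottom k of their union, so it distributes naturally.

```python
import heapq
import zlib

def bottom_k_sample(records, k):
    """One-pass fixed-size uniform sample: keep the k records whose
    hash values are smallest. Deterministic given the input, so
    per-partition samples merge by taking bottom-k of their union."""
    return heapq.nsmallest(k, records,
                           key=lambda r: zlib.crc32(r.encode("utf-8")))

# Per-partition sampling followed by a cheap merge on the driver.
part1 = bottom_k_sample(["user-%d" % i for i in range(0, 5000)], 10)
part2 = bottom_k_sample(["user-%d" % i for i in range(5000, 10000)], 10)
merged = bottom_k_sample(part1 + part2, 10)  # global sample of size 10
print(merged)
```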

At the core of our automated campaign optimization algorithms lies a difficult
problem: predicting the outcome of an event before it happens. With a good
predictor, we can craft algorithms to maximize campaign performance, minimize
campaign cost, or balance the two in some way. Without a good predictor, all we
can do is hope for the best.

Here at Magnetic, a search-retargeting company, our core business is to serve relevant ads to viewers. Our platform performs this task well, matching viewers with related ads through various methods, including page visits, search queries, and analytics of each. It currently takes us about 15 minutes on average to react to new events in our core targeting infrastructure. If we could reduce this time, we could make our engineers, product management, ad operations, and our CEO really happy.

Magnetic’s real-time bidding system, written in pure Python, needs to keep up
with a tremendous volume of incoming requests. On an ordinary weekday, our
application handles about 300,000 requests per second at peak volumes, and
responds in under 10 milliseconds. It should be obvious that at this scale
optimizing the performance of the hottest sections of our code is of utmost
importance. This is the story of the evolution of one such hot section over
several performance-improving revisions.

Magnetic specializes in search retargeting, so we really need to understand our
users’ searches — they are our bread and butter. We need to recognize what a user’s
search means in a way that both humans and computers can understand. This is why we
map each search to a category (e.g. “Automotive”), brand (e.g. “BMW”), or other intent
data. Our keyword categorization service, the Search Keyword Intent Predictor (SKIP),
is the core technology that addresses this need.

The idea for the Hackathon was simple.
We all got together on a Wednesday morning and the bravest among us pitched their ideas for great new products.
The rest of us jumped on board with those projects that seemed most worthy or fun and we were off.

For our project, we decided to predict the future — or, more precisely, one specific aspect of the future: the expected number of users an advertising campaign will target.

A good test suite is a developer’s best friend — it tells you what your
code does and what it’s supposed to do. It’s your second set of eyes as
you’re working, and your safety net before you go to production.

By contrast, a bad test suite stands in the way of progress — whenever you
make a small change, suddenly fifty tests are failing, and it’s not clear
how or why the failing cases are related to your change.

Classification of short text into a predefined hierarchy of categories is a
challenge. The need to categorize short texts arises in multiple domains:
page keywords and search queries in online advertising, improvement of
search engine results, analysis of tweets or messages in social networks, etc.