Oracle Blog

Thoughts, Tips, Rationale

Predicting Likelihood of Click with Multiple Presentations

When using predictive models to
predict the likelihood that an ad or a banner will be clicked, it is common to
ignore the fact that the same content may have been presented in the past to
the same visitor. While the resulting error may be small if visitors rarely see
repeated content, it can be very significant for sites that visitors return to
repeatedly.

This is a well-recognized
problem that is usually handled with presentation thresholds – for example, do
not present the same content more than six times.

Observations and measurements
of visitor behavior provide evidence that something better is needed.

Observations

For a specific visitor, during
a single session, for a banner in a not-too-prominent space, the second
presentation of the same content is more likely to be clicked on than the first.
The likelihood for the second presentation can be 30% to 100% higher than for
the first.

That is, for example, if the
first presentation has an average click-through rate (CTR) of 1%, the second
presentation may have an average CTR between 1.3% and 2%.

After the second presentation
the CTR stays more or less constant for a few more presentations. The number of
presentations in this plateau seems to vary with the location of the content on
the page and with the visual attractiveness of the content.

After these few presentations
the CTR starts decaying along a curve that is very well approximated by an
exponential decay. For example, the 13th presentation may have 90% of
the likelihood of the 12th, and the 14th 90% of the
likelihood of the 13th. The decay constant also seems to depend on
the visibility of the content.
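The observed curve – a jump at the second presentation, a plateau, then exponential decay – can be sketched as a simple function. The parameter values below (1% base CTR, 1.5× lift, a plateau through the 12th presentation, 0.9 decay constant) are illustrative numbers taken from the examples above, not fitted parameters:

```python
def expected_ctr(n, base_ctr=0.01, lift=1.5, plateau_end=12, decay=0.9):
    """Illustrative CTR for the n-th presentation (n >= 1).

    base_ctr: CTR of the first presentation.
    lift: multiplier for the second presentation (observed 1.3x to 2x).
    plateau_end: last presentation still at the plateau level.
    decay: per-presentation multiplier after the plateau (~0.9).
    """
    if n <= 1:
        return base_ctr
    ctr = base_ctr * lift              # jump at the second presentation
    if n <= plateau_end:
        return ctr                     # plateau
    return ctr * decay ** (n - plateau_end)  # exponential decay
```

With these defaults, `expected_ctr(13)` is 90% of `expected_ctr(12)`, matching the decay described above.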

Modeling Options

Now that we have the empirical
data, we can propose modeling techniques that will correctly predict the
likelihood of a click.

Use presentation number as an input to the predictive model

Probably the most straightforward approach is to add the presentation number as
an input to the predictive model. While this is certainly a simple solution, it
carries with it several problems, among them:

- If the model learns on each case, repeated non-clicks for the same content
will disproportionately reinforce the model's belief about the non-clicker.
That is, the weight of one person who does not click through 200 presentations
of an offer may equal that of 100 other people who, on average, click on the
second presentation.

- The effect of the presentation number is not a customer characteristic or a
piece of contextual data about the interaction with the customer; it is
contextual data about the content presented.

- Models tend to underestimate the effect of the presentation number.

For these reasons it is not
advisable to use this approach when the average number of presentations of the
same content to the same person is above three, or when the presentation number
can become very large, in the tens or hundreds.
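The first of the problems above can be made concrete with a quick count of training events, assuming a learner that updates once per presentation (the numbers are the ones from the example):

```python
# One visitor who never clicks but sees the offer 200 times produces
# 200 training events, all with label 0. One hundred visitors who each
# see it twice and click on the second presentation produce the same
# number of events. A per-event learner weighs the two groups equally.
non_clicker = [(n, 0) for n in range(1, 201)]               # (presentation, clicked)
clickers = [(n, int(n == 2)) for _ in range(100) for n in (1, 2)]

print(len(non_clicker), len(clickers))   # 200 200
print(sum(c for _, c in clickers))       # 100 clicks in the second group
```

Both groups contribute 200 events, even though one represents a single persistent non-clicker and the other represents 100 people who all clicked.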

Use presentation number as a partitioning attribute to the predictive model

In this approach we essentially
build a separate predictive model for each presentation number. This approach
overcomes all of the problems of the previous one; nevertheless, it can be
applied only when the volume of data is large enough for these very specific
sub-models to converge.
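A minimal sketch of such partitioning, keyed by presentation number. The `ClickModel` class (a toy running-average click estimator) and the `learn`/`predict` helpers are hypothetical names for illustration, not a product API:

```python
from collections import defaultdict

class ClickModel:
    """Toy click model: a running average click rate."""
    def __init__(self):
        self.clicks = 0
        self.events = 0

    def learn(self, clicked):
        self.events += 1
        self.clicks += int(clicked)

    def predict(self):
        return self.clicks / self.events if self.events else 0.0

# One sub-model per presentation number; created on first use.
models = defaultdict(ClickModel)

def learn(presentation_number, clicked):
    models[presentation_number].learn(clicked)

def predict(presentation_number):
    return models[presentation_number].predict()
```

Each sub-model only ever sees events for its own presentation number, so the persistent non-clicker's 200 events are spread across 200 partitions instead of piling onto one model – which is also why each partition needs enough data of its own to converge.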

In the next couple of entries we will explore other solutions and a proposed modeling framework.

Comments

Great post! Using presentation number as a partitioning attribute sounds like a good idea, but I see two possible snags.

Firstly, as with using this data as input to the predictive model, this number "is not a customer characteristic or a piece of contextual data about the interaction with the customer, but it is contextual data about the content presented." Can we partition the models based on attributes of a choice, or would this mean we would need to partition the models for each observed value of the number for each choice available? The number of partitions might be enormous in the latter case.

Secondly, apart from the volumes of data required for the models to converge, I think we should also consider the effects of this partitioning on memory usage.

You can partition by an attribute of a choice, but then the value has to be different for each choice, so what you need to do is copy the attribute of the choice into the session and call "learn" explicitly. Then change it for the next choice and call learn again.
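As a hypothetical illustration of this workaround – the session dict, `Choice` class, and `learn` callback below are invented for the sketch and are not the actual product API:

```python
class Choice:
    """Illustrative choice with a per-choice partitioning attribute."""
    def __init__(self, name, presentation_number):
        self.name = name
        self.presentation_number = presentation_number

def learn_choices(session, choices, learn):
    """Copy each choice's attribute into the session, then call learn
    explicitly for that choice before moving on to the next one."""
    for choice in choices:
        session["presentation_number"] = choice.presentation_number
        learn(dict(session), choice)   # snapshot of the session state

calls = []
learn_choices({"visitor": "v1"},
              [Choice("offer_a", 3), Choice("offer_b", 7)],
              lambda s, c: calls.append((c.name, s["presentation_number"])))
print(calls)  # [('offer_a', 3), ('offer_b', 7)]
```

Each learn call sees the session with the attribute value of its own choice, which is the effect described above.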

I agree with you regarding the memory usage, and there is the additional problem of statistical coverage. So overall it is not a good idea. In the next entries we will explore other options to solve the problem.