An ensemble method combines the predictions of different algorithms
(the ensemble) to obtain a final prediction. The combination of
different predictions into a final prediction is also referred to as
"blending".
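As a minimal sketch of blending, the snippet below combines the predictions of several models with a weighted average. The weighted average is only one simple way to blend, and the predictions and weights are made-up values for illustration; in practice the weights would typically be fitted on held-out data.

```python
def blend(predictions, weights):
    """Weighted average of per-model predictions for a single item."""
    if len(predictions) != len(weights):
        raise ValueError("one weight per model is required")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(p * w for p, w in zip(predictions, weights)) / total


# Hypothetical predicted ratings from three models for one (user, item) pair.
model_predictions = [3.8, 4.2, 4.0]
# Hypothetical blend weights (assumed to be tuned on a held-out set).
blend_weights = [0.5, 0.3, 0.2]

final_prediction = blend(model_predictions, blend_weights)
print(round(final_prediction, 2))
```

With equal weights the blend reduces to the plain mean of the model predictions.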

We develop two mechanisms to solve the data sparsity problem:
demographic clustering and the demographic-based complement.

(3) Real-time filtering mechanism

Method 1: use a time window and filter the data by session.

Method 2: use the user's most recent behavior as recommendation seeds.
Besides the sliding window mechanism, we propose a real-time
personalized filtering technique to serve the individual users'
real-time demands. For each user, we record the recent k items that he
is interested in.
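The "recent k items" bookkeeping can be sketched with a bounded queue per user. This is an illustrative sketch, not the authors' implementation: the class name `RecentItems` is hypothetical, and moving a repeated item back to the front is an assumption the notes do not state.

```python
from collections import defaultdict, deque


class RecentItems:
    """Keeps the k most recent items each user showed interest in."""

    def __init__(self, k):
        self.k = k
        # deque(maxlen=k) silently drops the oldest item when full.
        self._recent = defaultdict(lambda: deque(maxlen=k))

    def record(self, user_id, item_id):
        items = self._recent[user_id]
        if item_id in items:
            # Assumption: a repeated item counts as the most recent one.
            items.remove(item_id)
        items.append(item_id)

    def seeds(self, user_id):
        """Recommendation seeds for the user, most recent first."""
        return list(reversed(self._recent.get(user_id, ())))


recent = RecentItems(k=3)
for item in ["a", "b", "c", "d"]:
    recent.record("user-1", item)
print(recent.seeds("user-1"))
```

The seeds can then be passed to any item-to-item recommender to fetch related items.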

Based on user click data, a CNN model is built to predict, among other
things, the user's click-through rate on pins. We trained a CNN to
learn a mapping from images to the probability of a user bringing up
the close-up view or clicking through to the content.

Both CUR and CTR are helpful for applications like search ranking,
recommendation systems and ads targeting, since we often need to know
which images are more likely to get attention from users based on
their visual content.

Personalized ranking

Incremental computation scheme for image features

Features are computed incrementally, keyed by version and date.

It incrementally updates the collection of features under two main
change scenarios: new images uploaded to Pinterest, and feature
evolution (features added/modified by engineers).
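One way to picture the two change scenarios is a planner that lists only the missing (image, feature, version) combinations. This is a hypothetical sketch: the function name, the key layout, and the use of an in-memory set as the "store" are assumptions for illustration, and the date partitioning mentioned above is left out for brevity.

```python
def plan_incremental_work(store, image_ids, feature_versions):
    """Return the (image_id, feature_name, version) keys still to compute.

    store: set of keys that have already been computed.
    feature_versions: current version of each feature definition.
    """
    todo = []
    for image_id in image_ids:
        for name, version in feature_versions.items():
            key = (image_id, name, version)
            if key not in store:
                todo.append(key)
    return todo


# Already computed: both features at version 1 for img1.
store = {("img1", "embedding", 1), ("img1", "color", 1)}

# Scenario 1: img2 is newly uploaded.
# Scenario 2: the "color" feature was modified, so its version is now 2.
todo = plan_incremental_work(
    store, ["img1", "img2"], {"embedding": 1, "color": 2}
)
print(todo)
```

Only the changed feature is recomputed for the existing image, while the new image gets every feature.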

Practical Lessons from Predicting Clicks on Ads at Facebook

The paper introduces a model that combines decision trees with
logistic regression, outperforming either of these methods on its own
by over 3%, an improvement with significant impact on the overall
system performance.

Right feature + Right model

The click prediction system needs to be robust and
adaptive, and capable of learning from massive volumes of
data.

At Facebook, ads are not associated with a query,
but instead specify demographic and interest targeting.

Evaluation metrics

Normalized Entropy (NE), or more precisely Normalized Cross-Entropy,
is equivalent to the average log loss per impression divided by what
the average log loss per impression would be if a model predicted the
background click-through rate (CTR) for every impression.
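The definition above translates directly into a few lines of code. The sketch below assumes labels in {0, 1} (1 for a click), predictions strictly between 0 and 1, and a data set that contains both clicks and non-clicks; the function name is illustrative.

```python
import math


def normalized_entropy(labels, predictions):
    """Average log loss divided by the log loss of the background CTR."""
    n = len(labels)
    # Average log loss per impression for the model's predictions.
    log_loss = -sum(
        y * math.log(p) + (1 - y) * math.log(1 - p)
        for y, p in zip(labels, predictions)
    ) / n
    # Average log loss when predicting the empirical CTR everywhere.
    background_ctr = sum(labels) / n
    background_loss = -(
        background_ctr * math.log(background_ctr)
        + (1 - background_ctr) * math.log(1 - background_ctr)
    )
    return log_loss / background_loss


labels = [1, 0, 0, 0]
baseline = normalized_entropy(labels, [0.25, 0.25, 0.25, 0.25])
better = normalized_entropy(labels, [0.9, 0.1, 0.1, 0.1])
print(baseline, better)
```

A model that always predicts the background CTR scores 1, and lower values are better, which makes the metric comparable across data sets with different base rates.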

Calibration is the ratio of the average estimated CTR and empirical
CTR.
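Since both averages are taken over the same impressions, the ratio equals the number of expected clicks divided by the number of observed clicks. A minimal sketch, assuming labels in {0, 1} and at least one observed click:

```python
def calibration(labels, predictions):
    """Expected clicks divided by observed clicks (1.0 is ideal)."""
    expected_clicks = sum(predictions)
    observed_clicks = sum(labels)
    if observed_clicks == 0:
        raise ValueError("calibration is undefined without any clicks")
    return expected_clicks / observed_clicks


# One click in four impressions, but the model predicts 0.5 everywhere,
# so it over-predicts by a factor of two.
over_predicting = calibration([1, 0, 0, 0], [0.5, 0.5, 0.5, 0.5])
print(over_predicting)
```

Values above 1 indicate over-prediction and values below 1 indicate under-prediction.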

Area-Under-ROC (AUC) is also a pretty good metric for measuring
ranking quality without considering calibration.
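To illustrate why AUC ignores calibration, the sketch below computes AUC as the fraction of (click, non-click) pairs that the model orders correctly, then shows that halving every score leaves it unchanged. The pairwise loop is quadratic and meant only for small examples; the data is made up.

```python
def auc(labels, scores):
    """Probability that a random positive is scored above a random negative."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))


labels = [1, 0, 1, 0]
scores = [0.8, 0.3, 0.4, 0.5]
halved = [s * 0.5 for s in scores]  # badly calibrated, same ordering

print(auc(labels, scores), auc(labels, halved))
```

Both calls return the same value because only the ordering of the scores matters, so AUC alone cannot detect systematic over- or under-prediction.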

Prediction model structure

Hybrid model structure. Input features are
transformed by means of boosted decision trees. The output of each
individual tree is treated as a categorical input feature to a
sparse linear classifier. Boosted decision trees prove to be very
powerful feature transforms.
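The hybrid structure can be sketched without any ML library by hand-writing two tiny "trees". Everything below is illustrative: the feature names, split thresholds, weights, and bias are hypothetical, and in a real system the trees would come from a trained boosted-tree model and the weights from a trained linear classifier.

```python
import math


# Hand-written stand-ins for two boosted trees. Each maps a raw feature
# dict to the index of the leaf the instance falls into.
def tree_1(x):  # 3 leaves
    if x["age_days"] < 7:
        return 0
    return 1 if x["past_ctr"] < 0.05 else 2


def tree_2(x):  # 2 leaves
    return 0 if x["past_ctr"] < 0.02 else 1


TREES = [(tree_1, 3), (tree_2, 2)]


def leaf_encode(x):
    """One-hot encode the leaf index of every tree and concatenate."""
    encoded = []
    for tree, num_leaves in TREES:
        one_hot = [0] * num_leaves
        one_hot[tree(x)] = 1
        encoded.extend(one_hot)
    return encoded


def predict_ctr(x, weights, bias):
    """Sparse linear classifier (logistic) on top of the leaf encoding."""
    z = bias + sum(w * f for w, f in zip(weights, leaf_encode(x)))
    return 1.0 / (1.0 + math.exp(-z))


ad = {"age_days": 30, "past_ctr": 0.01}
print(leaf_encode(ad))
print(predict_ctr(ad, weights=[0.2, -0.4, 0.9, -0.3, 0.6], bias=-2.0))
```

The instance lands in the second leaf of the first tree and the first leaf of the second tree, so exactly one entry per tree is set in the binary vector that feeds the linear model.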

Feature transformation

1 Discretize continuous variables by binning.

2 Cross features: for categorical features, the brute-force approach
consists in taking the Cartesian product; if the input features are
continuous, one can do joint binning, using for example a k-d tree.
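The two transformations above can be sketched as follows. The bin boundaries and vocabularies are made-up values, the helper names are hypothetical, and the joint binning shown is a fixed grid over per-feature bins, which is a simpler stand-in for the k-d tree mentioned in the text.

```python
from bisect import bisect_right
from itertools import product


def bin_index(value, boundaries):
    """Index of the bin that value falls into; boundaries must be sorted."""
    return bisect_right(boundaries, value)


def build_cross_vocabulary(values_a, values_b):
    """Map every pair in the Cartesian product to one categorical ID."""
    return {pair: i for i, pair in enumerate(product(values_a, values_b))}


# 1. Binning: three boundaries give four bins:
#    (-inf, 18), [18, 30), [30, 50), [50, +inf)
AGE_BOUNDARIES = [18, 30, 50]
age_bin = bin_index(25, AGE_BOUNDARIES)

# 2a. Crossing two categorical features via the Cartesian product.
cross_vocab = build_cross_vocabulary(["US", "DE"], ["mobile", "desktop"])
cross_id = cross_vocab[("DE", "mobile")]

# 2b. Crossing two continuous features: bin each one, then pair the bins.
CTR_BOUNDARIES = [0.01, 0.05]
joint_bin = (bin_index(25, AGE_BOUNDARIES), bin_index(0.03, CTR_BOUNDARIES))

print(age_bin, cross_id, joint_bin)
```

The Cartesian product grows multiplicatively with the vocabulary sizes, which is why the brute-force approach quickly produces many crosses that are not useful.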

The boosted decision tree based transformation can be understood as a
supervised feature encoding that converts a real-valued vector into
a compact binary-valued vector.

1. Introduction

In the impression discounting problem we aim to maximize conversion
of recommended items generated by a recommender system by applying
a discounting factor, derived from past impressions, on top of
scores generated by the recommender system.
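The idea of applying a discounting factor on top of recommender scores can be sketched as below. The exponential form of the discount and the `decay` value are assumptions chosen purely for illustration; the notes do not specify the functional form, and a real system would fit it from impression and conversion data.

```python
import math


def discount(impression_count, decay=0.3):
    """Hypothetical discount factor in (0, 1]; 1.0 for unseen items."""
    return math.exp(-decay * impression_count)


def rerank(scores, impression_counts, decay=0.3):
    """Multiply each score by its discount and sort items by the result."""
    discounted = {
        item: score * discount(impression_counts.get(item, 0), decay)
        for item, score in scores.items()
    }
    return sorted(discounted, key=discounted.get, reverse=True)


# Item "a" has the best raw score but was already shown five times
# without a conversion, so it drops below the unseen items.
scores = {"a": 0.9, "b": 0.8, "c": 0.5}
impression_counts = {"a": 5}
ranking = rerank(scores, impression_counts)
print(ranking)
```

The same structure extends to other impression signals, such as the time since the last impression, by making the discount a function of several inputs.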

Two challenges

(1) How to combine user impression and feedback data to build an
effective response model.

(2) How can the model be applied to improve the performance of
existing recommender systems? The signals available include the number
of times an item is impressed or recommended to a user, when the item
was impressed, and the frequency of user visits to the site or of the
user seeing any of the recommended items.

(3) Evaluate these regression models on real-world recommendation
systems such as “People You May Know” and “Suggested Skills
Endorsements” to demonstrate their effectiveness both in offline
analysis and in online systems by A/B testing.

Recommending the most relevant search keywords set
to users not only enhances the search engine’s hit rate, but also
helps the user to find the desired information more
quickly.

Computing query relevance

1 similarity graph

Similarity Graph for 1 session (left) and 2
sessions (right)
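A similarity graph over queries can be sketched by linking queries that appear consecutively in the same session. The notes do not give the exact edge weighting, so the choices here are assumptions: edges are undirected and weighted by how many times the pair occurs consecutively across sessions. The sample queries are made up.

```python
from collections import Counter


def build_similarity_graph(sessions):
    """Count consecutive query pairs across sessions as weighted edges."""
    edges = Counter()
    for queries in sessions:
        for first, second in zip(queries, queries[1:]):
            if first != second:  # ignore immediate repeats of a query
                edges[frozenset((first, second))] += 1
    return edges


one_session = [["shoes", "red shoes", "boots"]]
two_sessions = one_session + [["red shoes", "shoes"]]

graph_1 = build_similarity_graph(one_session)
graph_2 = build_similarity_graph(two_sessions)
print(graph_2[frozenset(("shoes", "red shoes"))])
```

Adding the second session strengthens the edge between the two queries that both sessions share, while the other edge keeps its weight.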

Computation method

2 CONTENT BASED SIMILARITY

cw_i(p) and cw_i(q) are the weights of the i-th common keyword in
the queries p and q respectively, and w_i(p) and w_i(q) are the
weights of the i-th keywords in the queries p and q respectively.

SF-IDF, which is the search frequency multiplied
by the inverse document frequency.
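As a small sketch of the SF-IDF weight, the function below multiplies a keyword's search frequency by an inverse document frequency. The logarithmic form log(N / df) is the common TF-IDF convention and is an assumption here, since the notes do not state which IDF variant is used; the numbers are made up.

```python
import math


def sf_idf(search_frequency, num_documents, document_frequency):
    """Search frequency times inverse document frequency (log form assumed)."""
    if document_frequency <= 0 or document_frequency > num_documents:
        raise ValueError("document_frequency must be in 1..num_documents")
    return search_frequency * math.log(num_documents / document_frequency)


# A keyword searched 10 times that appears in 10 of 1000 documents.
rare_keyword = sf_idf(10, 1000, 10)
# A keyword that appears in every document carries no weight.
common_keyword = sf_idf(10, 1000, 1000)
print(rare_keyword, common_keyword)
```

Frequently searched but rarely occurring keywords receive the largest weights, which is what makes them useful for comparing queries.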

3 merge

The first method, based on consecutive queries, yields clusters that
reflect all users’ consecutive search behavior, in the spirit of
collaborative filtering. The second, content-based method groups
together queries that have similar composition.