In this paper, we focus on recency search and study a number of algorithms to improve ranking results by leveraging user click feedback. Our contributions are three-fold. First, we use real search sessions collected in a random exploration bucket for \emph{reliable} offline evaluation of these algorithms, which provides an unbiased comparison across algorithms without online bucket tests. Second, we propose a re-ranking approach to improve search results for recency queries using user clicks. Third, our empirical comparison of a dozen algorithms on real-life search data suggests the importance of a few algorithmic choices in these applications, including generalization across different query-document pairs, specialization to popular queries, and real-time adaptation to user clicks. [pdf]

In this paper, we propose two new support vector
formulations
for ordinal regression, which optimize multiple thresholds to define
parallel discriminant hyperplanes for the ordinal scales. Both
approaches guarantee that the thresholds are properly ordered at the
optimal solution.
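Schematically, a formulation of this kind introduces one threshold b_j per ordinal boundary and a pair of slacks per example x_i^j of rank j (boundary cases j = 1 and j = r are omitted here for brevity; this is a sketch of the general shape, not the paper's exact notation):

```latex
\min_{w,\,b,\,\xi}\;\; \frac{1}{2}\|w\|^2
  + C \sum_{j} \sum_{i} \left( \xi_{ji} + \xi^{*}_{ji} \right)
\quad \text{s.t.} \quad
  w \cdot \phi(x^{j}_{i}) - b_{j} \le -1 + \xi_{ji},\qquad
  w \cdot \phi(x^{j}_{i}) - b_{j-1} \ge 1 - \xi^{*}_{ji},\qquad
  \xi_{ji},\, \xi^{*}_{ji} \ge 0,
```

with the thresholds b_1 <= ... <= b_{r-1} either constrained explicitly or, in the second formulation, ordered implicitly at the optimal solution.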

In this paper, we develop a segmental semi-Markov model
(SSMM) for
protein secondary structure prediction which incorporates multiple
sequence alignment profiles with the purpose of improving the
predictive performance. By
incorporating the information from long range interactions in
beta-sheets, this model is also capable of carrying out
inference on contact maps.
[ps][supplement]

In this paper, we describe a gene selection algorithm
based on Gaussian processes to discover consistent gene expression
patterns associated with ordinal clinical phenotypes. The
technique of automatic relevance determination is applied to
represent the significance level of the genes in a Bayesian framework.
[pdf]
[ps]
[code]

In this paper, we present a probabilistic approach to
ordinal
regression in Gaussian processes. In the Bayesian framework of
Gaussian processes, we propose a likelihood function for ordinal
variables that is a generalization of the probit function.
Two inference techniques, based on Laplace approximation and
expectation propagation respectively, are applied for model
selection.
[pdf]
[ps]
[zip]
[code]
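A likelihood of the kind described, with thresholds b_1 < ... < b_{r-1} (and b_0 = -infinity, b_r = +infinity), latent value f(x), noise level sigma, and standard normal CDF Phi, takes the form:

```latex
P(y = i \mid f(x)) \;=\; \Phi\!\left(\frac{b_i - f(x)}{\sigma}\right)
 \;-\; \Phi\!\left(\frac{b_{i-1} - f(x)}{\sigma}\right),
\qquad b_0 = -\infty,\quad b_r = +\infty .
```

With two ordinal categories this reduces to the binary probit likelihood, which is the sense in which it generalizes the probit function.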

In this paper, we use a soft insensitive loss function in likelihood evaluation, and describe a Bayesian framework based on stationary Gaussian processes. Bayesian methods are used to implement model adaptation, while keeping the merits of support vector regression, such as quadratic programming and sparseness. Moreover, confidence intervals are provided in prediction.
[pdf]
[ps]
[zip]
[code]

In this paper, we propose a Bayesian support vector classifier by introducing a novel likelihood function, known as the trigonometric likelihood function. Model adaptation and ARD feature selection can be implemented intrinsically in hyperparameter inference. Another benefit is that class probabilities are available when making predictions. [pdf][code]

User behavior provides many cues to improve the relevance of search results through personalization. One aspect of user behavior that provides especially strong signals for delivering better relevance is an individual’s history of queries and clicked documents. Previous studies have explored how short-term behavior or long-term behavior can be predictive of relevance. Ours is the first study to assess how short-term (session) behavior and long-term (historic) behavior interact, and how each may be used in isolation or in combination to optimally contribute to gains in relevance through search personalization.[pdf]

Contextual bandit algorithms have become popular tools in online recommendation and advertising systems. Offline evaluation of the effectiveness of new algorithms in these applications is critical for protecting online user experiences but very challenging due to their ``partial-label'' nature. The purpose of this paper is two-fold. First, we review a recently proposed offline evaluation technique. Different from simulator-based approaches, the method is completely data-driven, is easy to adapt to different applications, and more importantly, provides provably unbiased evaluations. We argue for the wide use of this technique as standard practice when comparing bandit algorithms in real-life problems. Second, as an application of this technique, we compare and validate a number of new algorithms based on generalized linear models. Experiments using real Yahoo! data suggest substantial improvement over algorithms with linear models when the rewards are binary. [pdf]
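The replay idea behind such data-driven evaluation can be sketched in a few lines; the event format and the policy interface below are illustrative assumptions, not the paper's actual implementation:

```python
def replay_evaluate(policy, logged_events):
    """Offline evaluation of a bandit policy on uniform-random exploration logs.

    Each logged event is (context, shown_arm, reward, candidate_arms), where
    shown_arm was chosen uniformly at random at logging time. Only events on
    which the evaluated policy agrees with the log are counted; because the
    logging policy was uniform, the matched events form an unbiased sample,
    so the average matched reward estimates the policy's online reward.
    """
    total_reward, matched = 0.0, 0
    for context, shown_arm, reward, candidate_arms in logged_events:
        if policy(context, candidate_arms) == shown_arm:
            total_reward += reward
            matched += 1
    return total_reward / matched if matched else 0.0
```

Because an event only "counts" when the policy's choice matches the logged choice, roughly a 1/K fraction of the log is used for a K-armed problem, which is the price paid for unbiasedness.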

Unlabeled samples can be intelligently selected for labeling
to minimize classification error. In many real-world applications,
a large number of unlabeled samples arrive in a
streaming manner, making it impossible to maintain all the
data in a candidate pool. In this work, we consider the unbiasedness property in the
sampling process, and design optimal instrumental distributions
to minimize the variance in the stochastic process.
Meanwhile, Bayesian linear classifiers with weighted maximum
likelihood are optimized online to estimate parameters. [pdf]
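The role of the instrumental distribution can be illustrated with a small importance-weighting sketch: query a label with probability p(x) and weight the example by 1/p(x), so stream-level estimates stay unbiased. The sampling rule below is a toy choice for illustration, not the optimal-variance design studied in the paper:

```python
import random

def stream_active_sample(stream, query_prob, rng):
    """Query labels from a stream with instrumental probability query_prob(x).

    Each queried example carries importance weight 1/query_prob(x), which
    corrects for the non-uniform sampling and keeps estimates unbiased.
    """
    sample = []
    for x in stream:
        p = query_prob(x)
        if rng.random() < p:
            sample.append((x, 1.0 / p))
    return sample

def unbiased_stream_mean(sample, stream_len):
    """Horvitz-Thompson estimate of the stream mean from a weighted sample."""
    return sum(x * w for x, w in sample) / stream_len
```

Averaged over many runs, the weighted estimate recovers the true stream mean even though harder examples are queried far more often than easy ones.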

Online auction and shopping are gaining popularity with the growth
of web-based eCommerce. Criminals are also taking advantage of
these opportunities to conduct fraudulent activities against honest
parties with the purpose of deception and illegal profit. In practice,
proactive moderation systems are deployed to detect suspicious
events for further inspection by human experts. Motivated by
real-world applications in commercial auction sites in Asia, we develop
various advanced machine learning techniques in the proactive
moderation system. [pdf]

In this paper we study the contextual bandit problem (also known as the multi-armed bandit problem with expert advice) for linear payoff functions. We prove a high-probability regret upper bound. We also prove a lower bound for this setting, matching the upper bound up to logarithmic factors. [pdf]

In this paper, we propose an online learning algorithm that can quickly learn the best re-ranking of the top portion of the original ranked list based on real-time users' click feedback. In order to devise our algorithm and evaluate it accurately, we collected exploration bucket data that removes positional biases on clicks on the documents for recency-classified queries. Our initial experimental results show that our scheme is more capable of quickly adjusting the ranking to track the varying relevance of documents reflected in the click feedback, compared to batch-trained ranking functions.
[pdf]
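As a toy illustration of click-driven re-ranking (not the algorithm in the paper), one can keep exponentially decayed click and impression counters per document and sort the top results by smoothed click-through rate; the decay and smoothing priors below are illustrative choices:

```python
class ClickRerank:
    """Re-rank the top documents of a query by decayed click-through rate.

    Counters decay exponentially so the ranking can track relevance that
    drifts over time, as with recency queries.
    """
    def __init__(self, decay=0.99, prior_clicks=1.0, prior_views=10.0):
        self.decay = decay
        self.prior_clicks = prior_clicks    # Beta-like smoothing priors
        self.prior_views = prior_views
        self.clicks = {}
        self.views = {}

    def observe(self, doc, clicked):
        # Forget stale feedback before crediting the new observation.
        for table in (self.clicks, self.views):
            for d in table:
                table[d] *= self.decay
        self.views[doc] = self.views.get(doc, 0.0) + 1.0
        self.clicks[doc] = self.clicks.get(doc, 0.0) + (1.0 if clicked else 0.0)

    def ctr(self, doc):
        c = self.clicks.get(doc, 0.0) + self.prior_clicks
        v = self.views.get(doc, 0.0) + self.prior_views
        return c / v

    def rerank(self, docs):
        return sorted(docs, key=self.ctr, reverse=True)
```

A freshly clicked document overtakes an unclicked one after a handful of observations, while the decay lets a formerly popular document fall back once clicks dry up.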

Personalized web services strive to adapt their services (advertisements, news articles, etc.) to individual users by making use of both content and user information. Despite a few recent advances, this problem remains challenging for at least two reasons. First, web services feature dynamically changing pools of content, rendering traditional collaborative filtering methods inapplicable. Second, the scale of most web services of practical interest calls for solutions that are both fast in learning and computation. In this work, we model personalized recommendation of news articles
as a contextual bandit problem, a principled approach in which a learning algorithm sequentially selects articles to serve users based on contextual information about the users and articles, while simultaneously adapting its article-selection strategy based on user-click feedback to maximize total user clicks. [pdf]
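A minimal sketch of a linear-payoff contextual bandit of this flavor, maintaining per-arm ridge-regression estimates and selecting by upper confidence bound (the class structure, dimensions, and exploration weight alpha are illustrative assumptions, not the paper's exact algorithm):

```python
import numpy as np

class LinearArm:
    """Per-arm ridge-regression state for a linear-payoff contextual bandit."""
    def __init__(self, d, alpha=1.0):
        self.A = np.eye(d)        # regularized d x d design matrix
        self.b = np.zeros(d)      # reward-weighted feature sum
        self.alpha = alpha        # width of the confidence bound

    def ucb(self, x):
        """Upper confidence bound on the expected reward for context x."""
        A_inv = np.linalg.inv(self.A)
        theta = A_inv @ self.b    # ridge estimate of the arm's coefficients
        return theta @ x + self.alpha * np.sqrt(x @ A_inv @ x)

    def update(self, x, reward):
        """Rank-one update after observing the click/no-click reward."""
        self.A += np.outer(x, x)
        self.b += reward * x

def select_arm(arms, x):
    """Serve the arm (e.g. article) with the highest optimistic estimate."""
    return max(range(len(arms)), key=lambda i: arms[i].ucb(x))
```

The optimism term shrinks as an arm accumulates observations in a given direction of the context space, so exploration concentrates on arms whose payoff is still uncertain.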

Recommender systems are widely used in online e-commerce applications to improve user engagement and thereby increase revenue. A key challenge for recommender systems is providing high-quality recommendations to users in ``cold-start'' situations. We consider three types of cold-start problems: 1) recommendation on existing items for new users; 2) recommendation on new items for existing users; 3) recommendation on new items for new users. We propose predictive feature-based regression models that leverage all available information of users and items, such as user demographic information and item content features, to tackle cold-start problems. The resulting algorithms scale efficiently as a linear function of the number of observations. We verify the usefulness of our approach in three cold-start settings on the MovieLens and EachMovie datasets, by comparing with five alternatives including random, most popular, segmented most popular, and two variations of the Vibes affinity algorithm widely used at Yahoo! for recommendation.

In multiway data, each sample is measured by multiple sets of
correlated attributes. We develop a probabilistic framework for
modeling structural dependency from partially observed
multi-dimensional array data, known as pTucker. Latent components
associated with individual array dimensions are jointly retrieved
while the core tensor is integrated out. The resulting algorithm
is capable of handling large-scale data sets. We verify the
usefulness of this approach by comparing against classical models
on applications to modeling amino acid fluorescence, collaborative
filtering and a number of benchmark multiway array data.
[pdf][third-party pTucker code]

In Web-based services of dynamic content (such as news articles),
recommender systems face the difficulty of timely identifying new
items of high-quality and providing recommendations for new users.
We propose a feature-based machine learning approach to
personalized recommendation that is capable of handling the
cold-start issue effectively. We maintain profiles of content of
interest, in which temporal characteristics of the content, e.g.
popularity and freshness, are updated in real time. We also
maintain profiles of users including demographic information and a
summary of user activities within Yahoo! properties. Based on all
features in user and content profiles, we develop predictive
bilinear regression models to provide accurate personalized
recommendations of new items for both existing and new users. This
approach results in an offline model with light computational
overhead compared with other recommender systems that require
online re-training. The proposed framework is general and flexible
for other personalized tasks. The superior performance of our
approach is verified on a large-scale data set collected from the
Today-Module on Yahoo! Front Page, with comparison against six
competitive approaches.
[pdf][slides]
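The bilinear score underlying models of this kind can be written as s(u, j) = x_u^T W z_j for user features x_u and item features z_j, which is linear in the weight matrix W; the ridge-regression sketch below is an illustration of that structure, not the production model:

```python
import numpy as np

def fit_bilinear(X_user, Z_item, pairs, ratings, reg=1e-3):
    """Fit W in the bilinear model score(u, j) = x_u^T W z_j by ridge regression.

    pairs is a list of (user_index, item_index); each pair's feature vector is
    the flattened outer product x_u z_j^T, so fitting W reduces to ordinary
    linear regression in d_user * d_item dimensions.
    """
    feats = np.array([np.outer(X_user[u], Z_item[j]).ravel() for u, j in pairs])
    y = np.array(ratings)
    d = feats.shape[1]
    w = np.linalg.solve(feats.T @ feats + reg * np.eye(d), feats.T @ y)
    return w.reshape(X_user.shape[1], Z_item.shape[1])

def score(W, x_user, z_item):
    """Predicted affinity of a (possibly brand-new) user-item pair."""
    return x_user @ W @ z_item
```

Because the model lives entirely in feature space, it can score a new item for a new user with no retraining, which is the crux of its cold-start behavior.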

In this paper, we report a successful large-scale case study of conjoint analysis on click-through streams in a real-world application at Yahoo!. We consider identifying users' heterogeneous preferences from millions of click/view events and building predictive models to classify new users into segments of distinct behavior patterns. A scalable conjoint analysis technique, known as tensor segmentation, is developed by utilizing logistic tensor regression in a standard partworth framework. [pdf]

We consider the case when relationships are postulated to exist due to hidden common
causes. We discuss how the resulting graphical model differs from Markov
networks, and how it describes different types of real-world relational processes.
A Bayesian nonparametric classification model is built upon this graphical representation
and evaluated with several empirical studies.
See Ricardo Silva's homepage for [pdf], [data] and [code]

In this paper we model relational random variables on the edges of a network using
Gaussian processes (GPs). We describe appropriate GP priors, i.e., covariance
functions, for directed and undirected networks connecting homogeneous or heterogeneous
nodes. The framework suggests an intimate connection between link
prediction and transfer learning, which were traditionally two separate topics. [pdf]

Censored targets, such as the time to events in survival
analysis,
can generally be represented by intervals on the real line. In
this paper, we propose a novel support vector technique (named SVCR)
for
regression on censored targets. Interestingly,
this approach provides a general formulation for both standard
regression and binary classification tasks.
[pdf][longer
version]

We consider the problem of utilizing unlabeled data for
Gaussian process inference. Using a geometrically motivated
data-dependent prior, we propose a graph-based construction of
semi-supervised Gaussian processes. We demonstrate this approach
empirically on several classification problems. [pdf]

Correlation between instances is often modelled via a
kernel function using input attributes of the instances. Relational
knowledge can further reveal additional pairwise correlations
between variables of interest. In this paper, we develop a class
of models which incorporates both reciprocal relational
information and input attributes using Gaussian process
techniques. This approach provides a novel non-parametric Bayesian
framework with a data-dependent prior for supervised learning
tasks. We also apply this framework to semi-supervised learning.
Experimental results on several real world data sets verify the
usefulness of this algorithm.
[pdf]

We introduce a Gaussian process (GP) framework,
stochastic relational models
(SRM), for learning social, physical, and other relational phenomena
where interactions
between entities are observed. The key idea is to model the stochastic
structure of entity relationships (i.e., links) via an interplay of
multiple GPs, each
defined on one type of entities.
[pdf]

In this paper, we propose a new basis
selection criterion for building sparse GP regression
models that provides promising gains in accuracy as well as
efficiency over previous methods.
Our algorithm is much faster than that of Smola and Bartlett while, in generalization, it greatly outperforms the information gain approach proposed by Seeger et al., especially in the quality of predictive distributions.
[ps]
[code]

In this paper, we propose a probabilistic kernel
approach to
preference learning based on Gaussian processes. A new likelihood
function is proposed to capture the preference relations in the
Bayesian framework. The generalized formulation is also applicable to
tackle many multiclass problems. [pdf][ps][zip]
[code]
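For a single preference relation u ≻ v, a likelihood of this kind, with latent values f(u), f(v), Gaussian noise of variance sigma^2 on each latent evaluation, and standard normal CDF Phi, can be written as:

```latex
P(u \succ v \mid f) \;=\; \Phi\!\left( \frac{f(u) - f(v)}{\sqrt{2}\,\sigma} \right),
```

which expresses the probability that the noisy evaluation of u exceeds that of v, and reduces multiclass problems to sets of pairwise preferences.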

In this paper, we propose two new support vector
formulations
for ordinal regression, which optimize multiple thresholds to define
parallel discriminant hyperplanes for the ordinal scales. Both
approaches guarantee that the thresholds are properly ordered at the
optimal solution.
[pdf]
[ps]
[zip]
[code]

In this paper, we present a graphical model that extends
segmental semi-Markov
models (SSMM) to exploit multiple sequence alignment profiles for
protein structure
prediction. A novel parameterized model is proposed as the likelihood
function
for the SSMM. By incorporating the information from long range
interactions in
beta-sheets, this model is capable of carrying out inference on contact
maps.
[pdf]
[ps]
[zip]
[webserver]

Refereed Workshop Papers

Unlabelled examples in supervised learning tasks can be
optimally
exploited using semi-supervised methods and active learning. We
focus on ranking learning from pairwise instance preferences to
discuss these important extensions, semi-supervised learning and
active learning, in the probabilistic framework of Gaussian
processes.
[ps]