We’re looking for a strong candidate for a fully
funded four-year PhD position on a collaborative
project with Microsoft Research Cambridge. The research
will focus on the development of new algorithms for
leveraging data reuse in order to efficiently evaluate
and optimize the behavior of information retrieval
systems. See this page for the advertisement,
further requirements, and conditions. The deadline
for applications is March 22, 2015.

José
van Dijck, Johan Oomen and I obtained an NWO Creative
Industries grant to work on next-generation search
engine technologies for exploring large multimedia
archives. The target users are media professionals. The
proposed innovations at the interface of computer
science and media studies come in three kinds. First,
we will develop, test and release self-learning search
algorithms that adapt and improve their behavior while
being used. Second, we will create robust methods for
semantically analyzing content in media archives.
Third, we will develop new search engine result page
presentations that provide automatically generated
storylines as narratives for professionals in the
creative industries. The algorithmic solutions will be
implemented in the research environment of the
Netherlands Institute for Sound and Vision and released
as open source search solutions.

A result page of a modern search engine often goes
beyond a simple list of “ten blue links.” Many
specific user needs (e.g., News, Image, Video) are
addressed by so-called aggregated or vertical search
solutions: specially presented documents, often
retrieved from specific sources, that stand out from
the regular organic web search results. When it comes
to evaluating ranking systems, such complex result
layouts raise their own challenges. This is especially
true for so-called interleaving methods that have
arisen as an important type of online evaluation: by
mixing results from two different result pages,
interleaving can easily break the desired web layout in
which vertical documents are grouped together, and
hence hurt the user experience.

We conduct an analysis of different interleaving
methods as applied to aggregated search engine result
pages. Apart from conventional interleaving methods, we
propose two vertical-aware methods: one derived from
the widely used Team-Draft Interleaving method by
adjusting it in such a way that it respects vertical
document groupings, and another based on the recently
introduced Optimized Interleaving framework. We show
that our proposed methods are better at preserving the
user experience than existing interleaving methods
while still performing well as a tool for comparing
ranking systems. To evaluate our proposed
vertical-aware interleaving methods, we use real-world
click data as well as simulated clicks and simulated
ranking systems.
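
For readers unfamiliar with interleaving, the following is a minimal sketch
of conventional Team-Draft Interleaving, the baseline that one of the
proposed methods adapts. It is not the vertical-aware variant from the paper:
it ignores vertical groupings entirely, which is exactly why it can break a
grouped layout. The function names, the `length` parameter and the click
crediting helper are illustrative assumptions.

```python
import random


def team_draft_interleave(ranking_a, ranking_b, length=10, rng=None):
    """Mix two rankings; return the mixed list and each document's team."""
    rng = rng or random.Random()
    rankings = {"A": ranking_a, "B": ranking_b}
    picks = {"A": 0, "B": 0}
    teams = {}          # document -> team ("A" or "B") that contributed it
    interleaved = []

    while len(interleaved) < length:
        # Documents each ranker could still contribute, in its own order.
        remaining = {
            team: [doc for doc in docs if doc not in teams]
            for team, docs in rankings.items()
        }
        if not remaining["A"] and not remaining["B"]:
            break

        # The team with fewer picks goes next; a coin flip breaks ties.
        if picks["A"] < picks["B"]:
            team = "A"
        elif picks["B"] < picks["A"]:
            team = "B"
        else:
            team = rng.choice(["A", "B"])
        # Assumption: if that team has nothing left, the other one picks.
        if not remaining[team]:
            team = "B" if team == "A" else "A"

        doc = remaining[team][0]
        interleaved.append(doc)
        teams[doc] = team
        picks[team] += 1

    return interleaved, teams


def credit_clicks(teams, clicked_docs):
    """Count clicks per team; the team with more clicks wins the comparison."""
    wins = {"A": 0, "B": 0}
    for doc in clicked_docs:
        if doc in teams:
            wins[teams[doc]] += 1
    return wins
```

A vertical-aware variant would additionally constrain the picking step so
that documents from the same vertical end up adjacent in the mixed list.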

An Information Processing & Management paper on
burst-aware data fusion for microblog search by
Shangsong Liang and Maarten de Rijke is online now.

We consider the problem of searching posts in microblog
environments. We frame this microblog post search
problem as a late data fusion problem. Previous work on
data fusion has mainly focused on aggregating document
lists based on retrieval status values or ranks of
documents without fully utilizing temporal features of
the set of documents being fused. Additionally,
previous work on data fusion has often worked on the
assumption that only documents that are highly ranked
in many of the lists are likely to be of relevance. We
propose BurstFuseX, a fusion model that not only
utilizes a microblog post’s ranking information
but also exploits its publication time. BurstFuseX
builds on an existing fusion method and rewards posts
that are published in or near a burst of posts that are
highly ranked in many of the lists being aggregated. We
experimentally verify the effectiveness of the proposed
late data fusion algorithm, and demonstrate that in
terms of mean average precision it significantly
outperforms the standard, state-of-the-art fusion
approaches as well as burst or time-sensitive retrieval
methods.
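
The general idea of burst-aware fusion can be illustrated with a small
sketch. This is not BurstFuseX itself: the base fusion method (a rank-based
CombSUM), the burst detector (time bins whose score mass exceeds the mean by
one standard deviation), the proximity bonus and the parameters `bin_size`
and `mu` are all assumptions chosen for illustration.

```python
from collections import defaultdict
from statistics import mean, pstdev


def comb_sum(ranked_lists):
    """Rank-based CombSUM: a post earns more from higher positions."""
    scores = defaultdict(float)
    for ranking in ranked_lists:
        n = len(ranking)
        for rank, post in enumerate(ranking):
            scores[post] += (n - rank) / n
    return dict(scores)


def burst_bins(scores, timestamps, bin_size):
    """Return time bins whose summed fusion score is unusually high."""
    mass = defaultdict(float)
    for post, score in scores.items():
        mass[timestamps[post] // bin_size] += score
    values = list(mass.values())
    threshold = mean(values) + pstdev(values)
    return {b for b, m in mass.items() if m > threshold}


def burst_aware_fuse(ranked_lists, timestamps, bin_size=3600, mu=0.3):
    """Blend the base fusion score with a bonus for posts near a burst."""
    base = comb_sum(ranked_lists)
    if not base:
        return []
    bursts = burst_bins(base, timestamps, bin_size)
    top = max(base.values())
    fused = {}
    for post, score in base.items():
        bonus = 0.0
        if bursts:
            own_bin = timestamps[post] // bin_size
            distance = min(abs(own_bin - b) for b in bursts)
            bonus = 1.0 / (1 + distance)  # decays with distance to a burst
        fused[post] = (1 - mu) * score / top + mu * bonus
    return sorted(fused, key=fused.get, reverse=True)
```

In this sketch, a post with a middling fused score that was published inside
a burst of highly ranked posts moves ahead of equally scored posts published
far from any burst.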

We consider the problem of automatically assessing
Wikipedia article quality. We develop several models to
rank articles by using the editing relations between
articles and editors. First, we create a basic model by
modeling the article-editor network. Then we design
measures of an editor's contribution and build weighted
models that improve the ranking performance. Finally,
we use a combination of featured article information
and the weighted models to obtain the best performance.
We find that using manual evaluation to assist
automatic evaluation is a viable solution for the
article quality assessment task on Wikipedia.

Expressions of emotion abound in user-generated
content, whether it be in blogs, reviews, or on social
media. Much work has been devoted to detecting and
classifying these emotions, but little of it has
acknowledged the fact that emotionally charged text may
express multiple emotions at the same time. We describe
a new dataset of user-generated movie reviews annotated
for emotional expressions, and experimentally validate
two algorithms that can detect multiple emotions in
each sentence of these reviews.

Location search engines are an important part of
GPS-enabled devices such as mobile phones and tablet
computers. In this paper, we study how users behave
when they interact with a location search engine by
analyzing logs from a popular GPS-navigation service to
find out whether mobile users' location search
characteristics differ from those of regular web
search. In particular, we analyze query- and
session-based characteristics and the temporal
distribution of location searches performed on
smartphones and tablet computers. Our findings may be used
to improve the design of search interfaces in order to
help users perform location search more effectively and
improve the overall experience on GPS-enabled mobile
devices.

In the run-up to the Buma Music meets Tech Award at
Noorderslag 2015, an update to our music discovery
demonstrator Streamwatchr has gone live. An
improved interface that is easier on your
device’s battery life, some new
functionality and a Twitter bot called
@lyricswatchr are the most important ingredients
of the update.