SIRIP 2014 Program

Updated: 02 Jul 2014

The goal of SIRIP is to bring together IR researchers,
practitioners, analysts, and consumers and
to achieve knowledge transfer across the
boundaries. Ideally, everyone goes away with new
understanding and at least one new idea to think about or
work on. SIRIP is being held on a separate day from the main
conference and industry people are able to register just for
it.

SIRIP 2014 is a one-day event starting at 0900 on Monday, 07 Jul. The
information below is now organized in program order and
includes start times for individual talks. Speakers,
please check your presentation during the break before your
session, and make yourself known to the session chair.

[09.00] Welcome and Introduction to SIRIP 2014

Isabelle Moulinier & David Hawking

Chair Session 1: Mark Sanderson, RMIT University

[09.15] “OK Glass…Google…Why Do I Need Your Search?”

The rise of wearable technology signifies yet another shift
in the content consumption and generation habits of consumers and
professionals alike. Such devices and solutions have limited user
interaction touch points as a result of smaller form factors and/or
have relatively few gestures available given the limited surface
area. Even solutions like Google Glass utilize less than perfect voice
functionality to stretch that functionality into capabilities such as
navigation and search. In fact, one could argue whether user directed
search is even worth creating as a primary use case for these types of
devices. Instead, the explosion in wearable technology is the perfect
inflection point for the rise of anticipatory computing and zero-query
search, where in the near future our preferences, search history, and expectations (taken
from our behaviors on PCs, laptops, mobile phones, tablets, etc.) will be
largely known ahead of time and pushed to us on these wearable devices
without our need to ask, hence rendering user-initiated search
completely irrelevant.

Bob Schukai is the Head of Advanced Product Innovation for Thomson Reuters. In addition to overseeing the development and execution of the mobile growth strategy across the organization, his remit also includes the development of new capabilities and experiences around data visualization and predictive analytics for desktop, mobile, wearable devices, and other products. He serves as an ambassador to New York City and the east coast of the United States for the London Tech City initiative, driven by the British government and the United Kingdom Trade and Investment group. This ‘extraordinary’ second job, as he calls it, allows him to share the story of innovation in Britain and why he advocates that ambitious global companies should operate here to benefit from the right environment for success. In the Queen’s New Year Honours list in 2014, Schukai was awarded an MBE for outstanding community service. He leads Thomson Reuters’ efforts in their headline sponsorship of the Apps for Good programme, which is designed to transform the way technology is taught in schools and to grow future leaders in mobile technology development.

[10.00] Chinese Search Engine - Baidu's Practice

Over the past decade, Baidu has grown from a startup into the
world’s biggest Chinese search engine, serving over 600
million users. Over the same period, Baidu has witnessed a dramatic
transformation in how people surf the internet and
interact with search engines. The essence of Baidu’s success
is its know-how about Chinese users’ behavior. A typical Chinese
internet user spends about the same amount of time online per
week as a typical US internet user does. But when it comes to
search engines, Chinese internet users have some unique
characteristics that set them apart from their global
counterparts. To provide the best way for people to find
information, Baidu is trying to better understand Chinese
users, Chinese queries and Chinese web pages. In this talk, I
will share our findings and discuss challenges in Chinese
search. I will also describe Baidu's practice in
developing a Chinese search engine.

Dr WANG Haifeng is a vice president of Baidu, the chair of
Department of Language Information Engineering of Peking University,
and a visiting professor of Harbin Institute of Technology. In 1999,
he received his PhD in Computer Science from Harbin Institute of
Technology. Soon after, he worked as an associate researcher at
Microsoft Research China from 1999 to 2000, a research scientist at
iSilk.com (Hong Kong) from 2000 to 2002, and the chief research
scientist and deputy director at Toshiba (China) R&D Center till
Jan. 2010. Dr. Wang is also the immediate past president of the
Association for Computational Linguistics (ACL). He has served as
program chair, workshop chair, tutorial chair, area chair, industry
chair, and sponsorship chair for several top conferences including
SIGIR, ACL, IJCAI, KDD, COLING, IJCNLP etc., as well as associate
editor, guest editor and reviewer for some academic journals.

[10.45] Coffee

Chair Session 2: Vanessa Murdock, Bing USA

[11.15] “Computer says no.”

We live in the information age where individuals
have instant access to large volumes of information. This
increased accessibility is leading to a significant
transformation that is affecting how the public sector
operates and in particular, information overload is
becoming increasingly common. As a result,
decision-support technology is being employed to decipher
value from amassed information and to better support
decision-makers. Administrative law provides a framework
for ensuring that government decisions are lawful, free
from bias, rational, effective, efficient, open and fair,
and that decision-makers are held
accountable. Administrative law must keep up with changing
technology but equally, fit-for-purpose technology also
needs to be developed for the administrative
environment. In particular, the fairness of decisions
needs to be considered when building increasingly complex
technology for use by administrative decision-makers. For
example, a recommendation made by a machine needs to be
balanced with the decision-maker’s own interpretation of
evidence. A more flexible approach is needed in this
dynamic environment and academics and technologists need
to better understand the constraints of an administrative
environment in order to assist government in maintaining
effective decision-making under the rule of law. What is
required is improved partnerships between government and
the information research and development community to make
sure that government information is exploited for the
benefit of the public, and that systems are developed in a
way that preserves the integrity and fairness of
decision-making.

Dr Maria Milosavljevic is the Chief Information
Officer at the Australian Crime Commission. Her experience
includes senior roles across government, industry and
academia with responsibility for delivering innovative
solutions. As the Program Manager for the Crime
Commission’s Fusion Centre, Dr Milosavljevic delivered new
capabilities across the organisation that resulted in
significantly improved business practices. This included
establishing advanced information exploitation systems and
teams including a new analytics unit. Dr Milosavljevic has
more than 20 years of experience, has published widely and
has created several world-firsts. She completed a PhD in
Language Technology with a scholarship from Microsoft
Research Institute, and is currently completing the ANZSOG
Executive Masters of Public Administration at ANU.

[11.45] Describe, Discover and Deliver — Challenges in making content available in the digital age.

The State Library of New South Wales has recently embarked on an ambitious programme to digitise 52 of its most significant collections and make them available to all. This expansion of collection content available online brings with it many challenges around how we can manage to describe these collections, make them discoverable and to deliver them in ways that make sense to our users. This paper is about the development of a discovery platform for one part of our collections and the lessons we learnt in that process.

Kate Curr is currently Manager, Online Information Services at
the State Library of NSW where she is responsible for
managing the Library's websites and for the Library Management
Systems. She has had a career in special, academic, and
Parliamentary libraries before joining the State Library in 2008.
She has been involved in Library systems, database design and
development and online services since the late 1980s. She has a particular interest in search and discovery for Library
collections and is currently involved with the State Library’s
Digital Excellence Programme, a programme of work that will
create over 20 million digital objects from the Library’s vast
collections over the next 10 years. The programme is also
replacing the technology infrastructure used to store these
collections and make them accessible to the world.

[12.15] This ain't your father's search engine

In just a few short years, search has quickly evolved from being a
small text box in the nether regions of a website to being front and
center in our lives. Increasingly, however, search engine technology
is also being used for practical, real-time recommendations, events
processing, complex spatial functionality and time series analysis,
capable of not only matching users' queries in text, but also driving
real-time decision making and analytics. In fact, open source Apache
Lucene can do all of this and more by taking advantage of new data
structures and algorithms that complement more traditional IR
approaches. In this demo-driven talk, Lucene committer Grant
Ingersoll will take a look at some of the new and exciting ways users
are leveraging Lucene and related technology to drive deeper insight
into information needs that go beyond keywords in a text box.

Grant Ingersoll is the CTO and co-founder of LucidWorks as well as an
active member of the Lucene community – a Lucene and Solr committer,
co-founder of the Apache Mahout machine learning project and a long
standing member of the Apache Software Foundation. Grant’s prior
experience includes work at the Center for Natural Language Processing
at Syracuse University in natural language processing and information
retrieval. Grant earned his B.S. from Amherst College in Math and
Computer Science and his M.S. in Computer Science from Syracuse
University. Grant is also the co-author of “Taming Text” from Manning
Publications.

[12.45] Lunch

Chair Session 3: David Harper, Google Europe

[14.00] The Evolution of WTF: Follower Recommendation Services at Twitter

WTF (Who to Follow) is Twitter's user recommendation service, which is
responsible for creating millions of connections daily between users
based on shared interests, common connections, and other related
factors. In this talk I will discuss the evolution of the WTF service:
the first generation architecture depended on a system called
Cassovary, an open-source in-memory graph processing engine built from
scratch by Twitter specifically for WTF. This approach gave way to a
Hadoop-based machine learning framework, which has recently been
supplemented by a custom architecture for generating real-time
recommendations. I will discuss the tradeoffs between different
architectures, provide a general overview of algorithms, and share
lessons learned in running a large-scale production service.

Jimmy Lin is an Associate Professor in the College of Information
Studies (The iSchool) at the University of Maryland, with a joint
appointment in the Institute for Advanced Computer Studies (UMIACS)
and an affiliate appointment in the Department of Computer Science. He
graduated with a Ph.D. in Electrical Engineering and Computer Science
from MIT in 2004. Lin's research lies at the intersection of
information retrieval and natural language processing; his current
work focuses on large-scale distributed algorithms and infrastructure
for data analytics. From 2010 to 2012, Lin spent an extended sabbatical
at Twitter, where he worked on services designed to surface relevant
content to users and analytics infrastructure to support data
science. He continues to engage with Twitter on various aspects of big
data and data science.

[14.30] Refereed Papers

This year we have solicited scientific papers relating to
industry applications of IR and will run a session of short
papers. This potentially allows for communication of
impactful work to an industry audience, dissemination of
late-breaking news, interesting work on closed data sets,
and scientific evaluation of theories in practice. We hope
to have the accepted papers and invited talk abstracts included in the ACM
Digital Library. In the interim, you can find them
here.

Accepted Papers

[14.30] Web-Scale Semantic Ranking

Jing Bai, Jan Pedersen and Mao Yang

Bing, Microsoft

[14.45] A Visual Analytics Approach to Summarizing Tweets

Ramik Sadana, Bongwon Suh, Eunyee Koh, and Yekyung Kim

Georgia Tech, Seoul National Uni, Adobe

[15.00] Product Name Recognition and Normalization in Internet Forums

Yangjie Yao and Aixin Sun

Nanyang Tech Uni, Singapore

[15.15] On the Interaction between Query Language and Query Domain in Cross-Lingual Web Search

Ahmed Tawfik and Ahmed Kamel

Microsoft Egypt

[15.30] Coffee

Chair Session 4: Chengxiang Zhai, U. Illinois

[16.00] Bing Dialog — Toward richer interactions with Web search

As the SIGIR community celebrates the 21st birthday
of web search, the traditional gateway to adulthood,
we are witnessing dramatic changes in how people
interact with search engines. A multi-year
initiative at Microsoft (called Bing Dialog) aims to
support much richer forms of interaction. It aims
to match a user’s search intents to the knowledge
harvested from the web at the semantic level. Aside
from reactively retrieving information and answering
questions, the Bing Dialog model includes additional
dialog acts, such as confirmation, disambiguation,
refinement and digression, that the search engine
can execute proactively to expedite the process of
getting users the knowledge they
need. Essentially, the search engine becomes a
collaborative dialog agent, such as those explored
in the AI community but with the scale extended to
the entire web. In this talk, we will share our
findings based on the deployment data collected at
Bing EN-US for over 14 months, and discuss the
web-scale engineering challenges and technically
unsolved problems in knowledge acquisition, user
intent inference, behavioral modeling, interaction
management and metric development.

Kuansan Wang is a Principal Researcher and manager
of the Internet Service Research Center (ISRC) at
Microsoft Research (MSR), Redmond. He joined MSR
Speech Technology Group in 1998, conducting research
in the areas of speech recognition, spoken language
understanding and multimodal dialog. From 2004 to
2007, he was a software architect at speech product
and business incubation groups, helping create and
commercialize a wide range of award winning speech
products for Microsoft. Since 2007, he has been with
MSR ISRC conducting research on web search and
machine learning. Dr. Wang is an active member in
both academic and industrial communities. He has
published more than 50 peer-reviewed articles and
140 patents. He is also the author of 6 ISO and 3
W3C standards in the area of speech processing and
voice communications.

[16.30] Panel: Billionaire or Bust? Commercializing IR Research

Many academic researchers have been involved in efforts to
commercialize their research through start-ups, spin-offs,
or licensing. We are lining up a sample of them to discuss
topics such as: What prompted them to make this move?
(Have you thought of doing it yourself?) How did they
choose between investment and organic growth? Were they
able to secure funding without risking their house? What
are the hurdles, traps and opportunities? What lessons did
they learn? Are they rich now?

Stuart Beil will moderate the panel. He is Senior Policy
Advisor to the Hon. Ian Walker MP, Qld Minister for Science,
Information Technology, Innovation and the Arts. Stuart is an
experienced company director and senior executive, having worked in
both industry and government. He was previously Chairman and Executive
Director of Funnelback Pty Ltd, a web and enterprise search engine
company he founded and sold. Stuart was also General Manager,
Commercialisation at Australia's premier science agency CSIRO where he
was responsible for commercialising CSIRO intellectual property. He
has worked in the financial markets, including at the Sydney Futures
Exchange.

Confirmed Panelists

Michael Cameron

Australia

Michael Cameron is the co-founder of Rome2rio, based in Melbourne, Australia. Rome2rio is organising the world's transport information and offers a multi-modal, door-to-door travel search engine. It returns itineraries for air, train, coach, ferry, mass transit and driving options to and from any location. Michael has a PhD from RMIT University and worked for three years as a senior engineer on Microsoft's Bing search engine.

Arjen de Vries

Netherlands

Arjen P. de Vries leads the Information Access research group at the Centrum Wiskunde & Informatica (CWI) in Amsterdam. He also holds a part-time full professor position at Delft University of Technology. In November 2009, he co-founded CWI spin-off company Spinque to pursue his interest in the integration of information retrieval and databases. Spinque develops novel search solutions based on "Search by Strategy", an iterative 2-stage search process that separates search strategy definition (the how) from actual searching and browsing the collection (the what). This way, information specialists can reclaim their expertise in a time dominated by a "do-it-yourself" attitude to search. The technology builds on research in information retrieval (probabilistic relational algebra) and database architecture (column-stores), to turn the engineering of tailored search engines into a simple, flexible and efficient process.

David Lewis

USA

Dave Lewis, Ph.D. is a consulting computer scientist and expert witness working in the areas of information retrieval, data mining, natural language processing, and the evaluation of complex information systems. He formerly held research positions at AT&T Labs, Bell Labs, and the University of Chicago. He was the co-founder of Ornarose, Inc., a data mining software company, and has served as a consultant or advisor to a number of start-up companies. He is a Fellow of the American Association for the Advancement of Science.

Tetsuya Sakai

Japan

1993-2007: Researcher at the Toshiba R&D Center (2000-2001: postdoctoral visiting researcher at the University of Cambridge)

Jaime Teevan and Hang Li are the chairs for next year. This is
their opportunity to introduce themselves and to hear any
thoughts you have on the format of this year's
track.
For example, did you like:

Holding it on a separate day to the main conference and
encouraging participation by local industry

Including presentations from IR consumers / practitioners
rather than just researchers and technologists

Including a refereed papers section

Including a panel

Of course you're very welcome to mail comments and suggestions to the email
alias below and we'll pass them on.