Workshop on Data, Text, Web, and Social Network Mining

Friday, April 23, 2010

9:30 AM - 6 PM

Sponsored by Yahoo!, CSE, and SI

Announcements

April 11, 2010: The complete workshop program is now available in MS Word format and PDF.

April 7, 2010: The workshop is full. Please register to be added to the waiting list. You will be notified by April 16 whether we have a space for you.

March 31, 2010: The deadline to register as an attendee is April 16. However, space is limited and we already have 70 registrants. We may need to close registration earlier, so please register as soon as possible using the registration page at http://eecs.umich.edu/dm10/register.php.

The deadline to register as a speaker is March 31. Please follow
the instructions below.

All times listed are still tentative. The final schedule will be
announced by April 16.

Introduction

Over the last several years, the research community at the University
of Michigan focused on mining large amounts of data (whether
structured, semi-structured, textual, or multimedia) has grown
significantly. Faculty interested in developing new data mining
techniques are now based in several units, including Computer Science
and Engineering, Information, Statistics, Linguistics, and
Mathematics. Several domain units in the natural sciences, medical
sciences, social sciences, and humanities also host faculty
interested in using data mining techniques to advance science in
their domain.
The goal of this workshop is to bring this group of people together
and to set the agenda for research in the next 10 years and beyond.

Who is invited?

All UM faculty and graduate students working in the fields of text and
data mining, broadly construed to include models and technologies for
statistical data analysis, Web search technology, analysis of user
behavior, social network analysis, data visualization, etc., as well as
related areas. External visitors are also welcome to attend.

How to participate

All faculty doing research in data and text mining and related
areas get an automatic lab overview slot to describe their work and
their interests in the field. Email your lab's name and/or talk title
to dm2010@umich.edu by March 31, 2010 to reserve your spot.

In addition to overview slots, we offer faculty and graduate
students the opportunity to present other work in a range of formats.
Email dm2010@umich.edu by March 31, 2010 and indicate the type of slot
you are interested in: technical presentation, poster, or demo. We
will need a title and abstract, a list of authors, and a short
introduction specifying whether this talk was presented elsewhere
(e.g., your most recent SIGMOD or SIGIR talk). If the talk is based on
an existing paper (whether published or not), attach the paper as
well.

Additional graduate student demos and posters will be presented
during the afternoon reception. Email a list of poster titles and the
persons presenting them to dm2010@umich.edu by March 31, 2010.

Invited Speaker

Raghu Ramakrishnan, Chief Scientist for Audience & Cloud Computing,
and Fellow, Yahoo!: Building and Searching a Web of Concepts

Workshop Program

The workshop consists of an invited talk, faculty presentations, discussions, and a poster session. The full program is accessible in MS Word format and PDF.

Invited Talk

Raghu Ramakrishnan: Building and Searching a Web of Concepts

Abstract

Search engines are increasingly offering results that are based on a
semantically rich interpretation of the user's intent and the content
available to satisfy that intent. A natural question is to ask how far
along we are in understanding content on the web. The Semantic Web seeks
to enable publication of data with rich markups that facilitate
automated interpretation; Yahoo!'s Search Monkey is an example of a
service in this spirit. However, there is much useful data that is not
semantically marked up, and many domains in which the coverage of
existing structured data feeds is low. In this talk, I will discuss the
goal of constructing a web of "concepts" (a term I use to denote
entities, categories of entities, and relationships) by starting with
the current view of the web (as a collection of hyperlinked pages, or
documents, each seen as a bag of words).

We need to extract concept-centric metadata for a broad and deep set of
important concepts, and stitch it together to create a semantically rich
aggregate view of all the information available on the web for each
concept instance. The goal of building and maintaining such a web of
concepts presents many challenges, but also offers the promise of
enabling many powerful applications, including novel search and
information discovery paradigms. In this talk, I will describe a
research agenda towards this goal and discuss related work, including
the PSOX project at Yahoo!.

Bio

Raghu Ramakrishnan is Chief Scientist for Audience and Cloud Computing
at Yahoo!, and a Yahoo! Fellow. His work has influenced query
optimization in commercial
database systems and the design of window functions in SQL:1999. His
paper on the BIRCH clustering algorithm received the SIGMOD 10-Year
Test-of-Time award, and he has written the widely used text "Database
Management Systems" (with Johannes Gehrke). Ramakrishnan is a Fellow
of the ACM and IEEE, and has received several awards, including the
ACM SIGKDD Innovations Award, the ACM SIGMOD Contributions Award, a
Distinguished Alumnus Award from IIT Madras, a Packard Foundation
Fellowship in Science and Engineering, and an NSF Presidential Young
Investigator Award. He is Chair of ACM SIGMOD, on the Board of
Directors of ACM SIGKDD and the Board of Trustees of the VLDB
Endowment. Ramakrishnan was Professor of Computer Sciences at the
University of Wisconsin-Madison, and founder and CTO of QUIQ, a
company that pioneered question-answering communities, powering Ask
Jeeves' AnswerPoint as well as customer support for companies such as
Compaq. Raghu Ramakrishnan received his B.Tech. from IIT Madras in 1983 and
his Ph.D. from the University of Texas at Austin in 1987.