Low-cost Training and Evaluation of Search Engines

Search engines often require large-scale human labor to improve and evaluate their services. In this talk, we introduce research toward reducing such human labor requirements. We describe a low-cost technique for training search engines called similarity-based distant supervision (SDS), which can automatically generate relevance judgments: the labels needed to train search result ranking models. The SDS method first locates a relevant query-document pair in well-established knowledge bases such as Wikipedia, and then uses the located document as a "distant" example to assess the relevance of search results in a target corpus. We examined both unsupervised and supervised SDS techniques: the former generates training labels from scratch without any human labor, while the latter learns from small-scale human judgments to generate more accurate automatic judgments. We also introduce low-cost methods for evaluating user experience with search engines, which normally requires users' explicit feedback. These techniques predict user experience measures such as satisfaction from search logs and user modeling, agreeing better with real user experience than standard evaluation measures such as average precision and NDCG while requiring little extra human labor.
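To make the unsupervised SDS idea more concrete, here is a minimal sketch under assumptions the abstract does not state: similarity is TF-IDF cosine over lowercased bag-of-words tokens with a tiny stopword list, the "distant" example is the single knowledge-base article most similar to the query, and graded labels (0, 1, 2) come from two hypothetical similarity thresholds. The names `sds_labels`, `tf_idf_vectors`, and `STOPWORDS`, the threshold values, and the toy documents are all illustrative, not the speaker's actual method or data.

```python
import math
from collections import Counter

# Hypothetical minimal stopword list; a real system would use a proper analyzer.
STOPWORDS = {"a", "an", "the", "is", "of", "in", "to", "for", "and", "that"}

def tokenize(text: str) -> list[str]:
    # Lowercase, split on whitespace, and drop stopwords.
    return [t for t in text.lower().split() if t not in STOPWORDS]

def tf_idf_vectors(docs: list[str]) -> list[dict[str, float]]:
    # Build sparse TF-IDF vectors over all docs, using smoothed IDF.
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    df = Counter(term for toks in tokenized for term in set(toks))
    return [
        {t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in Counter(toks).items()}
        for toks in tokenized
    ]

def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def sds_labels(query: str, kb_docs: list[str], target_docs: list[str],
               thresholds: tuple[float, ...] = (0.1, 0.3)) -> list[int]:
    # Vectorize query, knowledge base, and target corpus in one shared space.
    vecs = tf_idf_vectors([query] + kb_docs + target_docs)
    q_vec = vecs[0]
    kb_vecs = vecs[1:1 + len(kb_docs)]
    target_vecs = vecs[1 + len(kb_docs):]
    # Step 1: the KB document most similar to the query is the "distant" example.
    distant = max(kb_vecs, key=lambda v: cosine(q_vec, v))
    # Step 2: grade each target document by how many thresholds its
    # similarity to the distant example reaches (0, 1, or 2 here).
    return [sum(cosine(distant, v) >= t for t in thresholds) for v in target_vecs]

# Toy usage: a two-article "knowledge base" and a three-document target corpus.
kb = ["The jaguar is a large cat native to the Americas",
      "Jaguar is a British manufacturer of luxury cars"]
corpus = ["A jaguar is a wild cat that hunts in the Americas",
          "The new Jaguar sedan is a British luxury car",
          "Stock market news for Tuesday"]
print(sds_labels("jaguar cat", kb, corpus))
```

Here the animal article is picked as the distant example for the query "jaguar cat", so the wildlife document receives the highest grade while the car and finance documents receive low grades. The supervised variant mentioned in the abstract would, presumably, learn the similarity or thresholds from a small set of human judgments instead of fixing them by hand.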

Bio

Jiepu Jiang is a Ph.D. candidate at the University of Massachusetts Amherst. His research interests lie at the intersection of information retrieval (IR), human-computer interaction (HCI), and natural language processing. Jiepu has published over 30 articles in conferences such as SIGIR, WWW, and WSDM. He has served as a PC member and reviewer for many IR and HCI conferences.