Archer: Query-Driven Machine Learning

In the Archer project, we develop techniques that adapt analytics to a specific query rather than performing general computation over an entire dataset. Typical machine learning problems effectively run SELECT * FROM Table. Instead, we integrate selection-style queries, such as SELECT * FROM Table WHERE X, into the analytics themselves.
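The sketch below is a minimal illustration of the difference. It trains once on the whole table and once on only the rows that a WHERE clause selects. It assumes a hypothetical `readings` table and uses ordinary least squares as a stand-in analytic. It shows only the naive baseline of applying the selection before training, not the query-adaptive techniques Archer develops.

```python
import sqlite3


def fit_line(rows):
    """Ordinary least-squares fit of y = a + b*x over (x, y) pairs."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in rows)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    b = sxy / sxx
    return mean_y - b * mean_x, b


# Hypothetical example table, held in memory.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (region TEXT, x REAL, y REAL)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?, ?)",
    [("east", 1, 2), ("east", 2, 4), ("east", 3, 6),
     ("west", 1, 10), ("west", 2, 9), ("west", 3, 8)],
)

# General computation: train on every row (SELECT * FROM Table).
all_rows = conn.execute("SELECT x, y FROM readings").fetchall()
# Query-driven: train only on the rows the query selects (SELECT ... WHERE X).
east_rows = conn.execute(
    "SELECT x, y FROM readings WHERE region = ?", ("east",)
).fetchall()

print(fit_line(all_rows))   # model of the whole table
print(fit_line(east_rows))  # model of the subset the user actually asked about
```

The two fits differ because the whole-table model blends the "east" and "west" trends. The query-driven model describes only the data the query selects.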

Knowledge Base Acceleration

Wikipedia is the go-to knowledge base for information on events, people, and scores of other topics. It is edited collaboratively, but the number of editors is far smaller than the number of entities, so important information often takes a long time to be added.

Query-Driven Entity Resolution

Entity resolution (ER) is the process of determining which records (mentions) in a database correspond to the same real-world entity. Leading ER systems solve this problem by resolving every record in the database, which is expensive for large datasets. Such approaches are also wasteful, because in practice users are interested in only one entity or a small subset of the entities mentioned in the database. In this work, we introduce two new classes of SQL queries involving ER operators: selection-driven ER and join-driven ER. To support these two classes of ER queries, we develop novel variations of the Metropolis-Hastings algorithm and introduce selectivity-based scheduling algorithms.
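The sketch below gives a rough, hypothetical picture of selection-driven ER; it is not the project's algorithm. It first discards mentions that are not similar enough to the query. It then runs a plain Metropolis-Hastings sampler (with a symmetric proposal) over the surviving mentions only. It omits join-driven ER and selectivity-based scheduling entirely. All of the following are assumptions chosen for illustration:

- the function names;
- the `difflib` string similarity, used in place of a learned pairwise model;
- the fixed pool of entity labels;
- the parameters `relevance_cutoff`, `threshold`, and `beta`.

```python
import math
import random
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1]; a stand-in for a learned pairwise model."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def selection_driven_er(mentions, query, steps=2000, relevance_cutoff=0.5,
                        threshold=0.6, beta=5.0, seed=0):
    """Hypothetical sketch: resolve only the mentions relevant to `query`.

    Returns the sorted mentions assigned to the same entity as the mention
    most similar to the query.
    """
    rng = random.Random(seed)

    # Selection step: mentions that cannot plausibly match the query never enter the sampler.
    relevant = [m for m in mentions if similarity(m, query) >= relevance_cutoff]
    n = len(relevant)
    if n == 0:
        return []

    # Pairwise affinity: positive if two mentions look alike, negative otherwise.
    aff = [[similarity(relevant[i], relevant[j]) - threshold for j in range(n)]
           for i in range(n)]

    # Fixed pool of n entity labels; start with every mention in its own entity.
    labels = list(range(n))
    score = 0.0  # sum of affinities over co-clustered pairs (zero when all are separate)
    best_score, best_labels = score, labels[:]

    for _ in range(steps):
        i = rng.randrange(n)    # mention to move, chosen uniformly
        new = rng.randrange(n)  # target label, uniform over the fixed pool (symmetric proposal)
        old = labels[i]
        if new == old:
            continue
        # Score change from moving mention i out of entity `old` into entity `new`.
        delta = (sum(aff[i][j] for j in range(n) if j != i and labels[j] == new)
                 - sum(aff[i][j] for j in range(n) if j != i and labels[j] == old))
        # Metropolis-Hastings acceptance for the target density proportional to exp(beta * score).
        if delta >= 0 or rng.random() < math.exp(beta * delta):
            labels[i] = new
            score += delta
            if score > best_score:  # keep the highest-scoring configuration seen
                best_score, best_labels = score, labels[:]

    # Answer: the entity containing the mention closest to the query.
    anchor = max(range(n), key=lambda k: similarity(relevant[k], query))
    return sorted(relevant[k] for k in range(n) if best_labels[k] == best_labels[anchor])


mentions = ["John Smith", "J. Smith", "Jon Smith", "Mary Jones", "M. Jones", "Acme Corp"]
print(selection_driven_er(mentions, "John Smith"))
```

In this example, "Mary Jones", "M. Jones", and "Acme Corp" fall below the relevance cutoff for the query "John Smith", so the sampler never touches them. Each sampling step therefore scales with the number of query-relevant mentions rather than the size of the whole database. How the real system achieves this saving may differ.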