Abstract:

While there are powerful keyword search systems that index all kinds of resources,
including emails and web pages, people have trouble recalling the semantic facts, such as
the name, location, edit dates, and keywords, that uniquely identify resources in their
personal repositories. Reusing information exacerbates this problem. A rarely used
approach is to leverage users' episodic memory of file provenance. Provenance is
traditionally defined as "the history of ownership of a valued object". For
documents, we consider not only ownership but also the operations performed on
a document, especially those that relate it to other people, events, or resources. This
thesis investigates the potential advantages of using provenance data in desktop
search and consists of two manuscripts. First, a numerical analysis of field data
from a longitudinal study shows that provenance information can effectively be used
to identify files and resources in realistic repositories. We introduce the Leyline, the
first provenance-based search system that supports dynamic relations between files
and resources, such as copy/paste, "save as", and file renaming. The Leyline allows users to
search by drawing search queries as graphs in a sketchpad. The Leyline overlays
provenance information that may help users identify targets or explore information
flow. A limited controlled experiment showed that this approach is feasible in terms of
time and effort. Second, we examine the design of the Leyline and compare it with previous
provenance-based desktop search systems in terms of their underlying assumptions and
focus, search coverage and flexibility, and features and limitations.