Video organizes paper

By Kimberly Patch, Technology Research News

With the notion of the paperless office fading into history, researchers from
the University of Washington are working to more closely integrate the paper
world -- still on the rise -- with the world of electronic data.

The researchers' system uses a computer and overhead video camera
to track physical documents on a desk and automatically link them to appropriate
electronic documents. The researchers have constructed a pair of prototypes
that track paper documents and sort photos.

One advantage of the system is that it doesn't require any special
tags, paper or marks, said Jiwon Kim, a researcher at the University of
Washington. "Our work... allows the user to keep using the old paper while
only adding a single video camera to the environment," she said.

The paper-tracking system allows users to pinpoint the location
of a given document within a stack of documents on the desktop. Users can
find documents by keyword, by appearance, or by how recently a paper
was moved. The photo-sorting application allows users to sort digital photographs
using printouts of the photos.

The use of paper is ever on the increase because the user interfaces
of paper and electronic documents complement each other, said Kim. "Paper
enables tangible interactions and therefore is more intuitive to use, while
electronic documents are convenient for editing, sharing [and] indexing,"
she said. Currently, however, "these two formats are decoupled from each
other, making it hard to take advantage of the conveniences of both media."

The researchers were able to better integrate the two formats by
taking advantage of recent breakthroughs in computer vision techniques that
allow for fast and reliable object recognition, said Kim.

The researchers' system uses a combination of computer vision techniques
to infer the structure of a stack of papers. "Such a video sequence usually
differs from regular video in that the scene changes infrequently," said
Kim. The user moves paper X from stack A to stack B, and after a pause moves
paper Y from stack A to stack C, and so forth, she said. "Therefore, we
first split the video into these individual movements... then we interpret
each event to figure out which document moved."
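
The splitting step Kim describes can be illustrated with a minimal sketch.
It assumes each video frame has already been reduced to a single motion
score (for example, the fraction of pixels that changed since the previous
frame). The function name, the threshold and the pause length are
hypothetical and are not taken from the researchers' system.

```python
from typing import List, Tuple


def split_into_events(
    motion: List[float],
    threshold: float = 0.1,   # assumed cutoff between "still" and "moving"
    min_pause: int = 3,       # assumed number of quiet frames that ends an event
) -> List[Tuple[int, int]]:
    """Return inclusive (start, end) frame ranges, one per burst of motion."""
    events: List[Tuple[int, int]] = []
    start = None        # first frame of the event in progress, if any
    last_active = None  # most recent frame whose score exceeded the threshold
    for i, score in enumerate(motion):
        if score > threshold:
            if start is None:
                start = i
            last_active = i
        elif start is not None and i - last_active >= min_pause:
            # The scene has been quiet long enough: close the event.
            events.append((start, last_active))
            start = None
    if start is not None:
        # The video ended while an event was still in progress.
        events.append((start, last_active))
    return events
```

Each returned range would then be handed to a separate step that decides
which document moved, which this sketch does not attempt.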

After such events are processed, the system reconstructs the structure
of the paper stacks and is ready to answer such queries as "Where is my
W-2 form?", said Kim.

The user can interact with the system in several ways, said Kim.
Users can choose a document from a group of thumbnails on the computer screen
or can perform keyword searches on the title or author of the document.
The system shows the user the location of the document by showing the desktop
stack containing the document, then expanding the stack and highlighting
the document in red.

The system can also begin with a desk that is already full of documents,
said Kim. "We didn't want to force the user to start with an empty desk,"
she said. "Instead, the system gradually discovers the paper documents on
the desk over time, as the user moves them around."

Users can also browse desktops in remote locations by clicking and
dragging on an image of the remote desk, said Kim.

The video image resolution is not high enough for a human observer to read
the text, which makes it difficult to distinguish documents with similar
layouts. The researchers used an existing feature-based object recognition
technique, the Scale-Invariant Feature Transform (SIFT), that was able to
differentiate them, said Kim.

The researchers developed another algorithm that models the evolution
of paper stacks as a sequence of graphs.
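
One way to picture a sequence of graphs is sketched below. It is an
illustration under stated assumptions, not the researchers' algorithm: each
graph is a set of (upper, lower) edges meaning one document lies directly on
another, every event produces a new graph, and only an uncovered document
can move. The names apply_move and documents_above are hypothetical.

```python
from typing import FrozenSet, Iterable, Set, Tuple

Edge = Tuple[str, str]  # (upper, lower): upper lies directly on top of lower


def apply_move(
    graph: FrozenSet[Edge], doc: str, onto: Iterable[str] = ()
) -> FrozenSet[Edge]:
    """Return the next graph after doc is moved onto the given documents.

    An empty onto means the document lands on the bare desk.
    Assumption: a document can be moved only if nothing lies on top of it.
    """
    onto = list(onto)
    if doc in onto:
        raise ValueError(f"{doc!r} cannot be placed on itself")
    if any(lower == doc for _, lower in graph):
        raise ValueError(f"{doc!r} is covered by another document")
    # Drop the edges that described where doc used to rest.
    edges = {edge for edge in graph if edge[0] != doc}
    edges.update((doc, lower) for lower in onto)
    return frozenset(edges)


def documents_above(graph: FrozenSet[Edge], doc: str) -> Set[str]:
    """Return every document stacked directly or indirectly on top of doc."""
    above: Set[str] = set()
    frontier = [doc]
    while frontier:
        current = frontier.pop()
        for upper, lower in graph:
            if lower == current and upper not in above:
                above.add(upper)
                frontier.append(upper)
    return above
```

Keeping every graph in a list, starting from an empty frozenset, preserves
the full history, so a query can be answered against the latest graph or
against any earlier state of the desk.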

The researchers' first prototype application allows users to find
physical documents buried in stacks of documents on a desk. "The user can
issue queries like 'where is the paper written by John Smith?', or 'this
looks like the thumbnail of my tax form. Where is it?'," said Kim.

The researchers' second prototype application allows users to more
easily sort digital photographs. "Sorting a large number of digital photographs
is not an easy task [because it entails] having to drag-and-drop each file
into different folders," said Kim. "In contrast, people are adept at hand-sorting
printed photographs into piles on a desk."

The researchers printed out digital photographs on paper and recorded
a video of the user sorting them into physical stacks on the desk. The system
automatically organizes the corresponding digital photographs into folders
on the computer corresponding to the physical grouping on the desk.
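
The last step, turning physical piles into folders, can be sketched as
follows. The sketch assumes the video analysis has already produced a
mapping from each pile to the photo files it contains; the function name,
the pile labels and the "sorted" root folder are hypothetical. It only
plans destination paths and does not touch the file system.

```python
from pathlib import PurePosixPath
from typing import Dict, List


def plan_photo_folders(
    stacks: Dict[str, List[str]], root: str = "sorted"
) -> Dict[str, PurePosixPath]:
    """Map each photo file name to a folder named after its physical pile."""
    plan: Dict[str, PurePosixPath] = {}
    for stack_name, photos in stacks.items():
        for photo in photos:
            if photo in plan:
                # A printout can sit in only one pile at a time.
                raise ValueError(f"{photo} appears in more than one stack")
            plan[photo] = PurePosixPath(root) / stack_name / photo
    return plan
```

A caller could then move or copy each file to its planned path.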

This way of merging the digital and physical worlds has the potential
to prove useful beyond just tracking the locations of physical objects,
said Kim. The system could be taken further to allow for querying the history
of documents, lifting written annotations from document surfaces, recognizing
text, and attaching reminders to documents, she said.

A similar tracking and recognition system could be applied to objects
other than documents, said Kim. "Applications may range from finding lost
objects [like a] key, pen or wallet, to indexing books or CDs on the shelf,
to keeping track of items in supermarkets or warehouses," she said. "In
many of these cases computer vision-based tracking could be used in combination
with other dedicated technologies like radio frequency ID tags."

Radio frequency ID tags are small computer chips that, when hit
with a radio wave of a certain frequency, use the energy from the radio wave
to emit a unique identification number.

The researchers are working on improving the existing system to
handle a wider range of user interactions with documents, to optimize the
performance of the video analysis engine so the input video can be processed
in real-time, and to build other applications, said Kim. "We can imagine
supporting queries like 'find me all documents I haven't used for the past
three weeks so I can clean them off the desk' or 'find me all documents
that look similar to this credit card bill so I can file them together',"
she said.
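
The first of those imagined queries amounts to a filter on the time each
document was last moved. The sketch below assumes the system keeps such a
timestamp per document; the function name and the record layout are
hypothetical, and the query itself is something Kim describes as a future
possibility rather than an existing feature.

```python
from datetime import datetime, timedelta
from typing import Dict, List


def unused_since(
    last_moved: Dict[str, datetime],
    now: datetime,
    idle: timedelta = timedelta(weeks=3),
) -> List[str]:
    """Return, in sorted order, the documents not moved within the idle period."""
    cutoff = now - idle
    # A document moved exactly at the cutoff still counts as recently used.
    return sorted(doc for doc, moved in last_moved.items() if moved < cutoff)
```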

The system can be used now for applications similar to the researchers'
prototype applications. The system could be ready for practical use on general
desktops in three to four years, said Kim.

Kim's research colleagues were Steven M. Seitz and Maneesh Agrawala.
The researchers presented their work at User Interface Software and Technology
2004 (UIST '04), held in Santa Fe, New Mexico, October 24 to 27, 2004. The
research was funded by the National Science Foundation (NSF) and Intel Corporation.