Abstract

We previously created the HyperMeeting system to support a chain of geographically and temporally distributed meetings in the form of a hypervideo. This paper focuses on playback plans that guide users through the recorded meeting content by automatically following available hyperlinks. Our system generates playback plans based on users' interests or prior meeting attendance and presents a dialog that lets users select the most appropriate plan. Prior experience with playback plans revealed users' confusion with automatic link following within a sequence of meetings. To address this issue, we designed three timeline visualizations of playback plans. A user study comparing the timeline designs indicated that different visualizations are preferred for different tasks, making switching among them important. The study also provided insights that will guide research on personalized hypervideo, both inside and outside a meeting context.

Abstract

While synchronous meetings are an important part of collaboration, it is not always possible for all stakeholders to meet at the same time. We created the concept of hypermeetings to support meetings with asynchronous attendance. Hypermeetings consist of a chain of video-recorded meetings with hyperlinks for navigating through the video content. HyperMeeting supports the synchronized viewing of prior meetings during a videoconference. Natural viewing behavior such as pausing generates hyperlinks between the previously recorded meetings and the current video recording. During playback, automatic link-following guided by playback plans presents the relevant content to users. Playback plans take into account the user's meeting attendance and viewing history and match them with features such as speaker segmentation. A user study showed that participants found hyperlinks useful but did not always understand where they would take them. The study results provide a good basis for future system improvements.

Abstract

Distributed teams must coordinate a variety of tasks. To do so, they need to be able to create, share, and annotate documents as well as discuss plans and goals. Many workflow tools support document sharing, while other tools support videoconferencing; however, there is little support for connecting the two. In this work, we describe a system that allows users to share and mark up content during web meetings. This shared content can provide important conversational props within the context of a meeting; it can also help users review archived meetings. Users can also extract shared content from meetings directly into other workflow tools.

Abstract

A Media Embedded Target, or MET, is an iconic mark printed in a blank margin of a page that indicates a media link is associated with a nearby region of the page. It guides the user to capture the region and thus retrieve the associated link through visual search within indexed content. The target also serves to separate page regions with media links from other regions of the page. The capture application on the cell phone displays a sight having the same shape as the target near the edge of a camera-view display. The user moves the phone to align the sight with the target printed on the page. Once the system detects correct sight-target alignment, the region in the camera view is captured and sent to the recognition engine, which identifies the image and causes the associated media to be displayed on the phone. Since target and sight alignment defines a capture region, this approach saves storage by only indexing visual features in the predefined capture region, rather than indexing the entire page. Target-sight alignment assures that the indexed region is fully captured. We compare the use of MET for guiding capture with two standard methods: one that uses a logo to indicate that media content is available and text to define the capture region, and another that explicitly indicates the capture region using a visible boundary mark.

Abstract

When searching or browsing documents, the genre of a document is an important consideration that complements topical characterization. We examine design considerations for automatic tagging of office document pages with genre membership. These include selecting features that characterize genre-related information in office documents, examining the utility of text-based features and image-based features, and proposing a simple ensemble method to improve genre identification performance. In the open-set identification of four office document genres, our experiments show that when combined with image-based features, text-based features do not significantly influence performance. These results provide support for a topic-independent approach to genre identification of office documents. Experiments also show that our simple ensemble method significantly improves performance relative to using a support vector machine (SVM) classifier alone. We demonstrate the utility of our approach by integrating our automatic genre tags in a faceted search and browsing application for office document collections.

Abstract

Modern office work practices increasingly breach traditional boundaries of time and place, making it difficult to interact with colleagues. To address these problems, we developed myUnity, a software and sensor platform that enables rich workplace awareness and coordination. myUnity is an integrated platform that collects information from a set of independent sensors and external data aggregators to report user location, availability, tasks, and communication channels. myUnity's sensing architecture is component-based, allowing channels of awareness information to be added, updated, or removed at any time. Multiple channels of input are combined and composited into a single, high-level presence state. Early studies of a myUnity deployment have demonstrated that the platform allows quick access to core awareness information and show that it has become a useful tool for supporting communication and collaboration in the modern workplace.

Abstract

Embedded Media Markers (EMMs) are nearly transparent icons printed on paper documents that link to associated digital media. By using the document content for retrieval, EMMs are less visually intrusive than barcodes and other glyphs while still indicating the presence of links. An initial implementation demonstrated good overall performance but exposed difficulties in guaranteeing the creation of unambiguous EMMs. We developed an EMM authoring tool that supports the interactive authoring of EMMs via visualizations that show the user which areas on a page may cause recognition errors and automatic feedback that moves the authored EMM away from those areas. The authoring tool and the techniques it relies on have been applied to corpora with different visual characteristics to explore the generality of our approach.

Abstract

Modern office work practices increasingly breach traditional boundaries of time and place, making it difficult to interact with colleagues. To address these problems, we developed myUnity, a software and sensor platform that enables rich workplace awareness and coordination. myUnity is an integrated platform that collects information from a set of independent sensors and external data aggregators to report user location, availability, tasks, and communication channels. myUnity's sensing architecture is component-based, allowing channels of awareness information to be added, updated, or removed at any time. Our current system includes a variety of sensor and data input, including camera-based activity classification, wireless location trilateration, and network activity monitoring. These and other input channels are combined and composited into a single, high-level presence state. Early studies of a myUnity deployment have demonstrated that use of the platform allows quick access to core awareness information and show it has become a useful tool supporting communication and collaboration in the modern workplace.

Abstract

Modern office work practices increasingly breach traditional boundaries of time and place, increasing the breakdowns workers encounter when coordinating interaction with colleagues. We conducted interviews with 12 workers and identified key problems introduced by these practices. To address these problems, we developed myUnity, a fully functional platform enabling rich workplace awareness and coordination. myUnity is one of the first integrated platforms to span mobile and desktop environments, both in terms of access and sensing. It uses multiple sources to report user location, availability, tasks, and communication channels. A pilot field study of myUnity demonstrated the significant value of pervasive access to workplace awareness and communication facilities, as well as positive behavioral change in day-to-day communication practices for most users. We present resulting insights about the utility of awareness technology in flexible work environments.

Abstract

User-generated video from mobile phones, digital cameras, and other devices is increasing, yet people rarely want to watch all the captured video. More commonly, users want a single still image for printing or a short clip from the video for creating a panorama or for sharing. Our interface aims to help users search through video for these images or clips more efficiently than fast-forwarding or "scrubbing" through a video by dragging through locations on a slider. It is based on a hierarchical structure of keyframes in the video, and combines a novel user interface design for browsing a video segment tree with new algorithms for keyframe selection, segment identification, and clustering. These algorithms take into account the need for quality keyframes and balance the desire for short navigation paths and similarity-based clusters. Our user interface presents keyframe hierarchies and displays visual cues for keeping the user oriented while browsing the video. The system adapts to the task by using a non-temporal clustering algorithm when the user wants a single image. When the user wants a video clip, the system selects one of two temporal clustering algorithms based on a measure of the repetitiveness of the video. User feedback provided us with valuable suggestions for improvements to our system.

Abstract

The Embedded Media Marker (EMM) identification system allows users to retrieve relevant dynamic media associated with a static paper document via camera phones. The user supplies a query image by capturing an EMM-signified patch of a paper document through a camera phone; the system recognizes the query and in turn retrieves and plays the corresponding media on the phone. Accurate image matching is crucial for a positive user experience in this application. To address the challenges posed by large datasets and variations in camera-phone-captured query images, we introduce a novel image matching scheme based on geometrically consistent correspondences. Two matching constraints, "injection" and "approximate global geometric consistency" (AGGC), which are unique to EMM identification, are presented. A hierarchical scheme, combined with two constraining functions, is designed to detect the "injective-AGGC" correspondences between images. A spatial neighborhood search approach is further proposed to address challenging cases with large translational shift. Experimental results on a 100k+ dataset show that our solution achieves high accuracy with low memory and time complexity and outperforms the standard bag-of-words approach.

Abstract

This paper describes research activities at FX Palo Alto Laboratory (FXPAL) in the area of multimedia browsing, search, and retrieval. We first consider interfaces for organization and management of personal photo collections. We then survey our work on interactive video search and retrieval. Throughout we discuss the evolution of both the research challenges in these areas and our proposed solutions.

Abstract

Photo libraries are growing in quantity and size, requiring better support for locating desired photographs. MediaGLOW is an interactive visual workspace designed to address this concern. It uses attributes such as visual appearance, GPS locations, user-assigned tags, and dates to filter and group photos. An automatic layout algorithm positions photos with similar attributes near each other to support users in serendipitously finding multiple relevant photos. In addition, the system can explicitly select photos similar to specified photos. We conducted a user evaluation to determine the benefit provided by similarity layout and the relative advantages offered by the different layout similarity criteria and attribute filters. Study participants had to locate photos matching probe statements. In some tasks, participants were restricted to a single layout similarity criterion and filter option. Participants used multiple attributes to filter photos. Layout by similarity without additional filters turned out to be one of the most used strategies and was especially beneficial for geographical similarity. Lastly, the relative appropriateness of the single similarity criterion to the probe significantly affected retrieval performance.

Abstract

Browsing and searching for documents in large, online enterprise document repositories are common activities. While internet search produces satisfying results for most user queries, enterprise search has not been as successful because of differences in document types and user requirements. To support users in finding the information they need in their online enterprise repository, we created DocuBrowse, a faceted document browsing and search system. Search results are presented within the user-created document hierarchy, showing only directories and documents matching selected facets and containing text query terms. In addition to file properties such as date and file size, automatically detected document types, or genres, serve as one of the search facets. Highlighting draws the user’s attention to the most promising directories and documents while thumbnail images and automatically identified keyphrases help select appropriate documents. DocuBrowse utilizes document similarities, browsing histories, and recommender system techniques to suggest additional promising documents for the current facet and content filters.

Abstract

Browsing and searching for documents in large, online enterprise document repositories is an increasingly common problem. While users are familiar and usually satisfied with Internet search results for information, enterprise search has not been as successful because of differences in data types and user requirements. To support users in finding the information they need from electronic and scanned documents in their online enterprise repository, we created an automatic detector for genres such as papers, slides, tables, and photos. Several of those genres correspond roughly to file name extensions but are identified automatically using features of the document. This genre identifier plays an important role in our faceted document browsing and search system. The system presents documents in a hierarchy as typically found in enterprise document collections. Documents and directories are filtered to show only documents matching selected facets and containing optional query terms and to highlight promising directories. Thumbnail images and automatically identified keyphrases help select desired documents.

Abstract

Hyper-Hitchcock consists of three components for creating and viewing a form of interactive video called detail-on-demand video: a hypervideo editor, a hypervideo player, and algorithms for automatically generating hypervideo summaries. Detail-on-demand video is a form of hypervideo that supports one hyperlink at a time for navigating between video sequences. The Hyper-Hitchcock editor enables authoring of detail-on-demand video without programming and uses video processing to aid in the authoring process. The Hyper-Hitchcock player uses labels and keyframes to support navigation through and back hyperlinks. Hyper-Hitchcock includes techniques for automatically generating hypervideo summaries of one or more videos that take the form of multiple linear summaries of different lengths with links from the shorter to the longer summaries. User studies on authoring and viewing provided insight into the various roles of links in hypervideo and found that player interface design greatly affects people's understanding of hypervideo structure and the video they access.
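
The "one hyperlink at a time" model with return navigation described above can be illustrated with a small data-structure sketch. This is not the Hyper-Hitchcock implementation: the names `Link`, `Sequence`, and `Player`, the non-overlapping link intervals, and the choice to start a destination sequence at time zero are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Link:
    start: float   # link becomes active at this time (seconds, inclusive)
    end: float     # link stops being active at this time (exclusive)
    target: str    # id of the destination video sequence
    label: str = ""


@dataclass
class Sequence:
    duration: float
    # Assumed invariant: link intervals never overlap, so at most one
    # link is available at any playback position.
    links: list = field(default_factory=list)


class Player:
    """Hypothetical player state: current sequence, time, and return stack."""

    def __init__(self, sequences, start_id):
        self.sequences = sequences
        self.current = start_id
        self.time = 0.0
        self.return_stack = []

    def active_link(self):
        # Return the single link covering the current time, if any.
        for link in self.sequences[self.current].links:
            if link.start <= self.time < link.end:
                return link
        return None

    def follow(self):
        # Follow the active link and remember where to return to.
        link = self.active_link()
        if link is None:
            return False
        self.return_stack.append((self.current, self.time))
        self.current, self.time = link.target, 0.0
        return True

    def back(self):
        # Return to the position from which the last link was followed.
        if not self.return_stack:
            return False
        self.current, self.time = self.return_stack.pop()
        return True
```

In this sketch a viewer at 12 seconds into a sequence whose only link spans 10 to 20 seconds can call `follow()` to jump to the detail sequence and `back()` to resume at 12 seconds.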

Abstract

WebNC is a browser plugin that leverages the Document Object Model for efficiently sharing web browser windows or recording web browsing sessions to be replayed later. Unlike existing screen-sharing or screencasting tools, WebNC is optimized to work with web pages where a lot of scrolling happens. Rendered pages are captured as image tiles and transmitted to a central server through HTTP POST. Viewers can watch the webcasts in real time or asynchronously using a standard web browser: WebNC relies only on HTML and JavaScript to reproduce the captured web content. Along with the visual content of web pages, WebNC also captures their layout and textual content for later retrieval. The resulting webcasts require very little bandwidth, are viewable on any modern web browser including the iPhone and Android phones, and are searchable by keyword.

Abstract

In 2008 FXPAL submitted results for two tasks: rushes summarization and interactive search. The rushes summarization task has been described at the ACM Multimedia workshop [1]. Interested readers are referred to that publication for details. We describe our interactive search experiments in this notebook paper.

Abstract

We designed an interactive visual workspace, MediaGLOW, that supports users in organizing personal and shared photo collections. The system interactively places photos with a spring layout algorithm using similarity measures based on visual, temporal, and geographic features. These similarity measures are also used for the retrieval of additional photos. Unlike traditional spring-based algorithms, our approach provides users with several means to adapt the layout to their tasks. Users can group photos in stacks that in turn attract neighborhoods of similar photos. Neighborhoods partition the workspace by severing connections outside the neighborhood. By placing photos into the same stack, users can express a desired organization that the system can use to learn a neighborhood-specific combination of distances.
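
As a rough illustration of a similarity-driven spring layout, the sketch below relaxes pairwise springs whose rest length shrinks as similarity grows. It is a generic textbook-style relaxation, not MediaGLOW's algorithm: the function name `spring_layout`, the rest-length rule `scale * (1 - similarity)`, and the parameters `steps`, `rate`, and `scale` are assumptions, and stacks and neighborhoods are not modeled.

```python
import math


def spring_layout(positions, similarity, steps=200, rate=0.05, scale=10.0):
    """Relax pairwise springs between items.

    positions:  dict mapping item id -> (x, y) starting position
    similarity: function (a, b) -> value in [0, 1]; 1 means very similar
    Returns a dict mapping item id -> (x, y) after relaxation.
    """
    pos = {k: [float(v[0]), float(v[1])] for k, v in positions.items()}
    ids = list(pos)
    for _ in range(steps):
        for i, a in enumerate(ids):
            for b in ids[i + 1:]:
                dx = pos[b][0] - pos[a][0]
                dy = pos[b][1] - pos[a][1]
                # Guard against division by zero for coincident items.
                dist = math.hypot(dx, dy) or 1e-9
                # Assumed rule: similar items get a short rest length.
                rest = scale * (1.0 - similarity(a, b))
                # Positive when the spring is stretched (pull together),
                # negative when it is compressed (push apart).
                shift = rate * (dist - rest) / dist
                pos[a][0] += shift * dx
                pos[a][1] += shift * dy
                pos[b][0] -= shift * dx
                pos[b][1] -= shift * dy
    return {k: (v[0], v[1]) for k, v in pos.items()}
```

A combined similarity, for example a weighted sum of visual, temporal, and geographic similarities, could be passed in as the `similarity` function.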

Abstract

We have developed an interactive video search system that allows the searcher to rapidly assess query results and easily pivot off those results to form new queries. The system is intended to maximize the use of the discriminative power of the human searcher. The typical video search scenario we consider has a single searcher with the ability to search with text and content-based queries. In this paper, we evaluate a new collaborative modification of our search system. Using our system, two or more users with a common information need search together, simultaneously. The collaborative system provides tools, user interfaces and, most importantly, algorithmically-mediated retrieval to focus, enhance and augment the team's search and communication activities. In our evaluations, algorithmic mediation improved the collaborative performance of both retrieval (allowing a team of searchers to find relevant information more efficiently and effectively), and exploration (allowing the searchers to find relevant information that cannot be found while working individually). We present analysis and conclusions from comparative evaluations of the search system.

Abstract

Retail establishments want to know about traffic flow and patterns of activity in order to better arrange and staff their business. A large number of fixed video cameras are commonly installed at these locations. While they can be used to observe activity in the retail environment, assigning personnel to this is too time consuming to be valuable for retail analysis. We have developed video processing and visualization techniques that generate presentations appropriate for examining traffic flow and changes in activity at different times of the day. Taking the results of video tracking software as input, our system aggregates activity in different regions of the area being analyzed, determines the average speed of moving objects in the region, and segments time based on significant changes in the quantity and/or location of activity. Visualizations present the results as heat maps to show activity and object counts and average velocities overlaid on the map of the space.
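
The aggregation step described above can be sketched as binning tracker output into a grid and accumulating per-cell counts and speeds. This is a minimal sketch under assumed inputs, not the system's actual pipeline: the track format (lists of `(t, x, y)` samples per object), the uniform grid, and the name `aggregate_activity` are all hypothetical.

```python
from collections import defaultdict
from math import hypot


def aggregate_activity(tracks, cell_size=1.0):
    """Aggregate tracked motion into grid cells.

    tracks: dict mapping object id -> list of (t, x, y) samples sorted by time
    Returns (counts, avg_speed), two dicts keyed by grid cell (col, row):
    counts holds the number of observations that ended in the cell, and
    avg_speed holds the mean speed of those observations.
    """
    counts = defaultdict(int)
    speed_sum = defaultdict(float)
    for samples in tracks.values():
        # Walk consecutive sample pairs of one object.
        for (t0, x0, y0), (t1, x1, y1) in zip(samples, samples[1:]):
            if t1 <= t0:
                continue  # skip duplicate or out-of-order timestamps
            cell = (int(x1 // cell_size), int(y1 // cell_size))
            counts[cell] += 1
            speed_sum[cell] += hypot(x1 - x0, y1 - y0) / (t1 - t0)
    avg_speed = {cell: speed_sum[cell] / counts[cell] for cell in counts}
    return dict(counts), avg_speed
```

The `counts` dictionary can be rendered as a heat map over the floor plan, and running the function separately on the tracks of each time segment gives one map per segment.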

Abstract

DOTS (Dynamic Object Tracking System) is an indoor, real-time, multi-camera surveillance system, deployed in a real office setting. DOTS combines video analysis and user interface components to enable security personnel to effectively monitor views of interest and to perform tasks such as tracking a person. The video analysis component performs feature-level foreground segmentation with reliable results even under complex conditions. It incorporates an efficient greedy-search approach for tracking multiple people through occlusion and combines results from individual cameras into multi-camera trajectories. The user interface draws the users' attention to important events that are indexed for easy reference. Different views within the user interface provide spatial information for easier navigation. DOTS, with over twenty video cameras installed in hallways and other public spaces in our office building, has been in constant use for a year. Our experiences led to many changes that improved performance in all system components.

Abstract

Our analysis and visualization tools use 3D building geometry to support surveillance tasks. These tools are part of DOTS, our multi-camera surveillance system, which has over 20 cameras spread throughout the public spaces of our building. The geometric input to DOTS is a floor plan and information such as cubicle wall heights. From this input we construct a 3D model and an enhanced 2D floor plan that are the bases for more specific visualization and analysis tools. Foreground objects of interest can be placed within these models and dynamically updated in real time across camera views. Alternatively, a virtual first-person view suggests what a tracked person can see as she moves about. Interactive visualization tools support complex camera-placement tasks. Extrinsic camera calibration is supported both by visualizations of parameter adjustment results and by methods for establishing correspondences between image features and the 3D model.

Abstract

We describe a new interaction technique that allows users to control nonlinear video playback by directly manipulating objects seen in the video. This interaction technique is similar to video "scrubbing" where the user adjusts the playback time by moving the mouse along a slider. Our approach is superior to variable-scale scrubbing in that the user can concentrate on interesting objects and does not have to guess how long the objects will stay in view. Our method relies on a video tracking system that tracks objects in fixed cameras, maps them into 3D space, and handles hand-offs between cameras. In addition to dragging objects visible in video windows, users may also drag iconic object representations on a floor plan. In that case, the best video views are selected for the dragged objects.
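
One simple way to turn a drag position into a playback time is to project the pointer onto the tracked object's trajectory and read off the interpolated timestamp. The sketch below assumes a single view, a trajectory given as `(t, x, y)` samples, and a hypothetical function name `time_for_drag`; the technique described above may resolve ambiguous or self-crossing paths and camera hand-offs differently.

```python
def time_for_drag(trajectory, pointer):
    """Map a pointer position to a playback time.

    trajectory: list of (t, x, y) samples for one tracked object, sorted by t
    pointer:    (x, y) position of the mouse in the same coordinates
    Returns the time at the point of the piecewise-linear path that is
    closest to the pointer.
    """
    px, py = pointer
    best_t = trajectory[0][0]
    best_d2 = float("inf")
    for (t0, x0, y0), (t1, x1, y1) in zip(trajectory, trajectory[1:]):
        dx, dy = x1 - x0, y1 - y0
        seg2 = dx * dx + dy * dy
        if seg2 == 0:
            u = 0.0  # object did not move during this segment
        else:
            # Projection parameter along the segment, clamped to [0, 1].
            u = ((px - x0) * dx + (py - y0) * dy) / seg2
            u = max(0.0, min(1.0, u))
        cx, cy = x0 + u * dx, y0 + u * dy
        d2 = (px - cx) ** 2 + (py - cy) ** 2
        if d2 < best_d2:
            best_d2 = d2
            best_t = t0 + u * (t1 - t0)
    return best_t
```

Because the result depends only on where the object is dragged, the user does not need to know how long the object stays in view, which matches the motivation given above.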