Abstract

As video-mediated communication reaches broad adoption, improving immersion and social interaction has become an important focus in the design of tools for exploration and work-based communication. Here we present three threads of research focused on developing new ways to explore a remote environment and to interact with the people and artifacts therein.

Abstract

Tutorials are one of the most fundamental means of conveying knowledge. In this paper, we present a suite of applications that allow users to combine different types of media captured from handheld, standalone, or wearable devices to create multimedia tutorials. We conducted a study comparing standalone (camera on tripod) versus wearable capture (Google Glass). The results show that tutorial authors have a slight preference for wearable capture devices, especially when recording activities involving larger objects.

Abstract

Video Text Retouch is a technique for retouching the textual content found in many online videos, such as screencasts, recorded presentations, and e-learning videos. Viewed through our special HTML5-based player, users can edit the textual content of video frames in real time, for example correcting typos or inserting new words between existing characters. Edits are overlaid and tracked at the desired position for as long as the original video content remains similar. We describe the interaction techniques and image processing algorithms, and give implementation details of the system.
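The "remains similar" rule above can be made concrete with a simple patch-matching test. The sketch below is a minimal illustration, not the paper's actual tracker: the function names, the mean-absolute-difference metric, and the threshold are all assumptions. An overlay stays visible only while the grayscale patch beneath it stays close to the patch sampled when the edit was made:

```python
def patch_similar(ref_patch, cur_patch, threshold=10.0):
    """True if the mean absolute per-pixel difference between two
    grayscale patches (flat lists of 0-255 values) is within threshold."""
    mad = sum(abs(a - b) for a, b in zip(ref_patch, cur_patch)) / len(ref_patch)
    return mad <= threshold

def visible_overlays(edits, frame_patches):
    """Keep only the edits whose anchor patch still matches this frame.

    edits: list of (region_id, ref_patch) pairs sampled at edit time.
    frame_patches: dict mapping region_id to the current frame's patch.
    """
    return [e for e in edits if patch_similar(e[1], frame_patches[e[0]])]
```

When the underlying content scrolls away or changes scene, the difference exceeds the threshold and the overlay is hidden rather than drawn over unrelated pixels.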

Abstract

Distributed teams must coordinate a variety of tasks. To do so, they need to be able to create, share, and annotate documents as well as discuss plans and goals. Many workflow tools support document sharing, and other tools support videoconferencing, but there is little support for connecting the two. In this work we describe a system that allows users to share and mark up content during web meetings. This shared content can provide important conversational props within the context of a meeting; it can also help users review archived meetings. Users can also extract shared content from meetings directly into other workflow tools.

Abstract

Online video is incredibly rich. A 15-minute home improvement YouTube tutorial might include 1500 words of narration, 100 or more significant keyframes showing a visual change from multiple perspectives, several animated objects, references to other examples, a tool list, comments from viewers, and a host of other metadata. Furthermore, video accounts for 90% of worldwide Internet traffic. However, we observe that video is not widely seen as a full-fledged document; it is dismissed as a medium that, at worst, gilds over substance and, at best, simply augments text-based communication. In this piece, we suggest that negative attitudes toward multimedia documents that include audio and video are largely unfounded and arise mostly because we lack the tools necessary to treat video content as a first-order medium or to support seamless mixing of media.

Abstract

Video content creators invest enormous effort creating work that is in turn typically viewed passively. However, learning tasks using video require users not only to consume the content but also to engage with, interact with, and repurpose it. Furthermore, to promote learning with video in domains where content creators are not necessarily videographers, it is important that capture tools facilitate the creation of interactive content. In this paper, we describe some early experiments toward this goal. A literature review coupled with formative field studies led to a system design that can incorporate a broad set of video-creation and interaction styles.

Abstract

Video tends to be imbalanced as a medium. Typically, content creators invest enormous effort creating work that is then watched passively. However, learning tasks require that users not only consume video but also engage, interact with, and repurpose content. Furthermore, to promote learning across domains where content creators are not necessarily videographers, it is important that capture tools facilitate creation of interactive content. In this paper, we describe some early experiments toward this goal. Specifically, we describe a needfinding study involving interviews with amateur video creators as well as our experience with an early prototype to support expository capture and access. Our findings led to a system redesign that can incorporate a broad set of video-creation and interaction styles.

Abstract

Unlike with text, copying and pasting parts of video documents is challenging. Yet the huge number of video documents now available in the form of how-to tutorials begs for simpler techniques that allow users to easily copy and paste fragments of video material into new documents. We describe new direct video manipulation techniques that allow users to quickly copy and paste content from video documents, such as how-to tutorials, into a new document. While the video plays, users interact with the video canvas to select text regions, scrollable regions, slide sequences built up across many frames, or semantically meaningful regions such as dialog boxes. Instead of relying on the timeline to accurately select sub-parts of the video document, users navigate using familiar selection techniques: the mouse wheel to scroll back and forward over a region where content scrolls, a double-click over a rectangular region to select it, or a click-and-drag over textual regions of the video canvas to select them. We describe the video processing techniques, which run in real time in modern web browsers using HTML5 and JavaScript, and show how they help users quickly copy and paste video fragments into new documents, allowing them to efficiently reuse video documents for authoring or note-taking.
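The scroll-to-navigate interaction above presumes that, for each frame, the system knows how far the on-screen content had scrolled. Assuming such per-frame offsets have been precomputed (the video analysis itself is not shown, and the function name is ours), mapping a mouse-wheel position to a frame reduces to a nearest-neighbour lookup over the sorted offsets. A hypothetical sketch:

```python
import bisect

def frame_for_scroll(offsets, target):
    """Return the index of the frame whose recorded scroll offset is
    closest to the requested offset.

    offsets: ascending per-frame scroll offsets in pixels.
    target: the scroll position the user has wheeled to.
    """
    i = bisect.bisect_left(offsets, target)
    if i == 0:
        return 0
    if i == len(offsets):
        return len(offsets) - 1
    # Pick whichever neighbouring frame is nearer to the target offset.
    return i if offsets[i] - target < target - offsets[i - 1] else i - 1
```

Each wheel event then simply adjusts `target` and seeks the player to the returned frame, so scrolling the canvas feels like scrolling the original application.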

Abstract

People frequently capture photos with their smartphones, and some are starting to capture images of documents. However, the quality of captured document images is often lower than expected, even when using applications that post-process the image to improve it. To improve the quality of captured images before post-processing, we developed a Smart Document Capture (SmartDCap) application that provides real-time feedback to users about the likely quality of a captured image. The quality measures capture the sharpness and framing of a page or regions on a page, such as a set of one or more columns, a part of a column, a figure, or a table. Using our approach, while users adjust the camera position, the application automatically determines when to take a picture of a document to produce a good-quality result. We performed a subjective evaluation comparing SmartDCap and the Android Ice Cream Sandwich (ICS) camera application; we also used raters to evaluate the quality of the captured images. Our results indicate that users find SmartDCap to be as easy to use as the standard ICS camera application. Additionally, images captured using SmartDCap are on average sharper and better framed than images captured with the ICS camera application.
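A common way to score the kind of sharpness that capture feedback like SmartDCap's depends on is the variance of a Laplacian filter response; whether SmartDCap uses exactly this measure is an assumption here, and the sketch below is a pure-Python illustration rather than the application's implementation:

```python
def sharpness(img):
    """Variance of a 4-neighbour Laplacian over a 2D grayscale image
    (list of rows of 0-255 values); higher values mean stronger edges,
    i.e. a sharper, better-focused capture."""
    h, w = len(img), len(img[0])
    vals = [4 * img[y][x] - img[y - 1][x] - img[y + 1][x]
            - img[y][x - 1] - img[y][x + 1]
            for y in range(1, h - 1) for x in range(1, w - 1)]
    mean = sum(vals) / len(vals)
    return sum((v - mean) ** 2 for v in vals) / len(vals)
```

A blurry frame flattens neighbouring pixel differences, driving the score toward zero, so the capture loop can simply wait until the score stabilizes above a threshold before triggering the shutter.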

Abstract

We describe direct video manipulation interactions applied to screen-based tutorials. In addition to using the video timeline, users of our system can quickly navigate within the video with the mouse wheel, double-click a rectangular region to zoom in and out, or drag a box over the video canvas to select text and scrub the video to the end of a text line, even if it is not shown in the current frame. We describe the video processing techniques developed to implement these direct video manipulation techniques, and show how they are implemented to run in most modern web browsers using the HTML5 canvas and JavaScript.

Abstract

Faithful sharing of screen contents is an important collaboration feature. Prior systems were designed to operate over constrained networks, and they performed poorly even without such bottlenecks. To build a high-performance screen sharing system, we empirically analyzed screen contents for a variety of scenarios. We showed that screen updates were sporadic, with long periods of inactivity. When active, screens were updated at far higher rates than earlier systems supported. The mismatch was pronounced for interactive scenarios. Even during active screen updates, the number of updated pixels was frequently small. We showed that crucial information can be lost if individual updates are merged. For cases where the available system resources cannot support high capture rates, we showed ways in which updates can be effectively collapsed. We showed that Zlib lossless compression performed poorly for screen updates. By analyzing the screen pixels, we developed a practical transformation that significantly improved compression rates. Our system captured 240 updates per second while using only 4.6 Mbps for interactive scenarios. Still, while playing movies in fullscreen mode, our approach could not achieve higher capture rates than prior systems; the CPU remains the bottleneck. A system that incorporates our findings is deployed within the lab.
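The paper's specific pixel transformation is not reproduced here, but a standard illustration of why transforming screen updates helps Zlib is inter-frame differencing: XOR-ing each capture against the previous one turns unchanged pixels into runs of zeros, which deflate far better than raw pixel data. The sketch below (function names are ours, and this is an assumed stand-in, not the system's actual transform) round-trips a capture through this scheme:

```python
import zlib

def delta_compress(prev, cur):
    """XOR the current capture against the previous one, then
    Zlib-compress; unchanged pixels become zero runs that deflate well."""
    delta = bytes(a ^ b for a, b in zip(prev, cur))
    return zlib.compress(delta)

def delta_decompress(prev, payload):
    """Invert delta_compress given the same previous capture."""
    delta = zlib.decompress(payload)
    return bytes(a ^ b for a, b in zip(prev, delta))
```

Because screen updates typically touch only a small fraction of pixels, the XOR-ed buffer is mostly zeros and the compressed payload can be orders of magnitude smaller than compressing the raw frame.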

Abstract

DisplayCast is a many-to-many screen sharing system targeted at Intranet scenarios. The capture software runs on all computers whose screens need to be shared. It uses an application-agnostic screen capture mechanism that creates a sequence of pixmap images of the screen updates, and it transforms these pixmaps to vastly improve lossless Zlib compression performance. These algorithms were developed after an extensive analysis of typical screen contents. DisplayCast shares the processor and network resources required for screen capture, compression, and transmission with the host applications whose output is being shared, balancing the need for high-performance screen capture against resource interference with user applications. DisplayCast uses Zeroconf for naming and asynchronous location, supports Cisco WiFi and Bluetooth based localization, and includes an HTTP/REST-based controller for remote session initiation and control. DisplayCast supports screen capture and playback on computers running Windows 7 and Mac OS X. Remote screens can be archived into an H.264-encoded movie on a Mac, and can also be played back in real time on Apple iPhones and iPads. The software is released under a New BSD license.

Abstract

The ways in which we come to know and share what we know with others are deeply entwined with the technologies that enable us to capture and share information. As face-to-face communication has been supplemented with ever-richer media (textual books, illustrations and photographs, audio, film and video, and more), the possibilities for knowledge transfer have only expanded. One of the latest trends to emerge amidst the growth of Internet sharing and pervasive mobile devices is the mass creation of online instructional videos. We are interested in exploring how smartphones shape this sort of mobile, rich-media documentation and sharing.

Abstract

The Active Reading Application (ARA) brings the familiar experience of writing on paper to the tablet. The application augments paper-based practices with audio, the ability to review annotations, and sharing. It is designed to make it easier for individuals and groups to review, annotate, and comment on documents. ARA incorporates several patented technologies and draws on several years of research and experimentation.

Abstract

Mobile media applications need to balance user and group goals, attentional constraints, and limited screen real estate. In this paper, we describe the iterative development and testing of an application that explores these tradeoffs. We developed early prototypes of a retrospective, time-based system as well as a prospective, space-based system. Our experiences with the prototypes led us to focus on the prospective system. We argue that attentional demands dominate, and that mobile media applications should be lightweight and hands-free as much as possible.

Abstract

A variety of applications are emerging to support streaming video from mobile devices. However, many tasks can benefit from streaming specific content rather than the full video feed, which may include irrelevant, private, or distracting content. We describe a system that allows users to capture and stream targeted video content captured with a mobile device. The application incorporates a variety of automatic and interactive techniques to identify and segment desired content, allowing the user to publish a more focused video stream.

Abstract

While there are many commercial systems designed to help people browse and compare products, these interfaces are typically product-centric. To help users more efficiently identify products that match their needs, we instead focus on building a task-centric interface and system. With this approach, users initially answer questions about the types of situations in which they expect to use the product. The interface reveals the types of products that match their needs and exposes high-level product features related to the kinds of tasks in which they have expressed an interest. As users explore the interface, they can reveal how those high-level features are linked to actual product data, including customer reviews and product specifications. We developed semi-automatic methods to extract the high-level features used by the system from online product data. These methods identify and group product features, mine and summarize opinions about those features, and identify product uses. User studies verified our focus on high-level features for browsing and low-level features and specifications for comparison.

Abstract

NudgeCam is a mobile application that can help users capture more relevant, higher-quality media. To guide users to capture media more relevant to a particular project, third-party template creators can show users media that demonstrates relevant content and can tell users what content should be present in each captured item using tags and other metadata such as location and camera orientation. To encourage higher-quality media capture, NudgeCam provides real-time feedback based on standard media capture heuristics, including face positioning, pan speed, audio quality, and many others. We describe an implementation of NudgeCam on the Android platform as well as field deployments of the application.
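As one hypothetical illustration of the pan-speed heuristic mentioned above (the function names, the motion-estimation input, and the threshold are assumptions, not NudgeCam's actual values), feedback can be triggered when the estimated global motion between frames implies the camera is panning too fast to yield watchable footage:

```python
def pan_feedback(frame_offsets, fps=30, max_speed=120.0):
    """Return a feedback message from per-frame global motion estimates.

    frame_offsets: estimated horizontal motion in pixels between
    consecutive frames (sign gives direction).
    max_speed: heuristic limit on pan speed, in pixels per second.
    """
    if not frame_offsets:
        return "ok"
    # Average pan magnitude per frame, scaled to pixels per second.
    speed = sum(abs(o) for o in frame_offsets) / len(frame_offsets) * fps
    return "pan slower" if speed > max_speed else "ok"
```

Real-time checks like face positioning or audio level fit the same pattern: a cheap per-frame measurement compared against a heuristic threshold, surfaced to the user as an immediate nudge rather than a post-hoc critique.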

Abstract

The use of whiteboards is pervasive across a wide range of work domains. But some of the qualities that make them successful—an intuitive interface, physical working space, and easy erasure—inherently make them poor tools for archival and reuse. If whiteboard content could be made available in times and spaces beyond those supported by the whiteboard alone, how might it be appropriated? We explore this question via ReBoard, a system that automatically captures whiteboard images and makes them accessible through a novel set of user-centered access tools. Through the lens of a seven week workplace field study, we found that by enabling new workflows, ReBoard increased the value of whiteboard content for collaboration.

Abstract

Paper is static but it is also light, flexible, robust, and has high resolution for reading documents in various scenarios. Digital devices will likely never match the flexibility of paper, but come with all of the benefits of computation and networking. Tags provide a simple means of bridging the gap between the two media to get the most out of both. In this paper, we explore the tradeoffs between two different types of tagging technologies – marker-based and content-based – through the lens of four systems we have developed and evaluated at our lab. From our experiences, we extrapolate issues for designers to consider when developing systems that transition between paper and digital content in a variety of different scenarios.

Abstract

Reading documents on mobile devices is challenging. Not only are screens small and difficult to read, but also navigating an environment using limited visual attention can be difficult and potentially dangerous. Reading content aloud using text-to-speech (TTS) processing can mitigate these problems, but only for content that does not include rich visual information. In this paper, we introduce a new technique, SeeReader, that combines TTS with automatic content recognition and document presentation control that allows users to listen to documents while also being notified of important visual content. Together, these services allow users to read rich documents on mobile devices while maintaining awareness of their visual environment.

Abstract

The Usable Smart Environment (USE) project aims to design easy-to-use, highly functional next-generation conference rooms. Our first design prototype focuses on creating a "no wizards" room for an American executive; that is, a room the executive can walk into and use by himself, without help from a technologist. A key idea in the USE framework is that customization is one of the best ways to create a smooth user experience. Since the system needs to fit both the personal leadership style of the executive and the corporation's meeting culture, we began the design process by exploring the workflow in and around meetings attended by the executive. Based on our workflow analysis and the scenarios we developed from it, USE developed a flexible, extensible architecture specifically designed to enhance ease of use in smart environment technologies. The architecture allows customization and personalization of smart environments for particular people and groups, types of work, and specific physical spaces. The first USE room was designed for FXPAL's executive "Ian" and installed in Niji, a small executive conference room at FXPAL. Niji currently contains two large interactive whiteboards for projecting presentation material, annotating via a digital whiteboard, or teleconferencing; a Tandberg teleconferencing system; an RFID authentication plus biometric identification system; network printing; a simple PDA-based controller; and a tabletop touch-screen console. The console runs the USE room control interface, which controls and switches between all of the equipment mentioned above.

Abstract

Most mobile navigation systems focus on answering the question, “I know where I want to go; now can you show me exactly how to get there?” While this approach works well for many tasks, it is not as useful for unconstrained situations in which user goals and spatial landscapes are more fluid, such as festivals or conferences. In this paper we describe the design and iteration of the Kartta system, which we developed to answer a slightly different question: “What are the most interesting areas here, and how do I find them?”