Abstract

Tutorials are one of the most fundamental means of conveying knowledge. Ideally, when a task involves physical or digital objects, tutorials not only describe each step with text or audio narration but also show it using photos or animation. In most cases, online tutorial authors capture media with handheld mobile devices to compose these documents, but increasingly they use wearable devices as well. In this work, we explore the full life-cycle of online tutorial creation and viewing using head-mounted capture and displays. We developed a media-capture tool for Google Glass that requires minimal attention to the capture device, allowing the author to focus on creating the tutorial's content rather than its capture. The capture tool is coupled with web-based authoring tools for creating annotatable videos and multimedia documents. In a study comparing standalone capture (camera on tripod) with wearable capture (Google Glass), as well as two types of multimedia representation for authoring tutorials (video-based or document-based), we show that tutorial authors prefer wearable capture devices, especially when recording activities involving larger objects in non-desktop environments. Authors preferred document-based multimedia tutorials because they are more straightforward to compose and their step-based structure translates more directly to explaining a procedure. In addition, we explored using head-mounted displays (Google Glass) for accessing tutorials, in comparison to lightweight computing devices such as tablets. This access study used tutorials recorded with the same capture methods as in our authoring study. We found that although authors preferred head-mounted capture, tutorial consumers preferred video recorded by a camera on a tripod, which provides a more stable image of the workspace.
Head-mounted displays are good for glanceable information; video, however, demands more attention, and our participants made more errors using Glass than when using a tablet, which was easier to ignore. Our findings point to several design implications for online tutorial authoring and access methods.

Abstract

Video telehealth is growing, allowing more clinicians to see patients from afar. As a result, clinicians, typically trained for in-person visits, must learn to communicate both health information and non-verbal affective signals to patients through a digital medium. We introduce a system called ReflectLive that senses clinicians' non-verbal communication behaviors and provides real-time feedback so they can improve them. A user evaluation with 10 clinicians showed that the real-time feedback helped clinicians maintain better eye contact with patients and was not overly distracting. Clinicians reported being more aware of their non-verbal communication behaviors and reacted positively to summaries of their conversational metrics, which motivated them to want to improve. Using ReflectLive as a probe, we also discuss the benefits of and concerns around automatically quantifying the "soft skills" and complexities of clinician-patient communication, the controllability of behaviors, and design considerations for how to present real-time and summative feedback to clinicians.

Abstract

For tourists, interactions with digital public displays often depend on specific technologies that users may not be familiar with (QR codes, NFC, Bluetooth); may not have access to because of networking issues (SMS); may lack a required app (QR codes) or device technology (NFC); may not want to use because of time constraints (WiFi, Bluetooth); or may not want to use because they are worried about sharing their data with a third-party service (text, WiFi). In this demonstration, we introduce ItineraryScanner, a system that allows users to seamlessly share content with a public travel kiosk system.

Abstract

In this paper, we describe DocHandles, a novel system that allows users to link to specific document parts in their chat applications. As users type a message, they can invoke the tool by referring to a specific part of a document, e.g., "@fig1 needs revision". By combining text parsing and document layout analysis, DocHandles finds and presents all figures labeled "1" inside previously shared documents, allowing users to explicitly link to the relevant "document handle". Documents thus become first-class citizens inside the conversation stream, seamlessly integrated into users' text-centric messaging applications.
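The text-parsing half of such a pipeline can be sketched as follows. The handle grammar (`@fig1`, `@table2`) and the shape of the layout-analysis output are assumptions made for illustration, not DocHandles' actual implementation:

```python
import re

# Pattern for document-handle mentions like "@fig1" or "@Table 2".
# The handle vocabulary (fig/figure/table) is an illustrative guess.
HANDLE_RE = re.compile(r"@(fig(?:ure)?|table)\s*(\d+)", re.IGNORECASE)

def extract_handles(message):
    """Return (kind, number) pairs for each handle mentioned in a message."""
    kinds = {"fig": "figure", "figure": "figure", "table": "table"}
    return [(kinds[m.group(1).lower()], int(m.group(2)))
            for m in HANDLE_RE.finditer(message)]

def match_candidates(handle, shared_docs):
    """Find candidate targets for a handle among previously shared documents.

    `shared_docs` maps a document name to the labeled regions that layout
    analysis extracted from it, e.g. {"paper.pdf": [("figure", 1), ...]}.
    """
    kind, number = handle
    return [(doc, region) for doc, regions in shared_docs.items()
            for region in regions if region == (kind, number)]
```

For example, `extract_handles("@fig1 needs revision")` yields `[("figure", 1)]`, which can then be matched against every figure "1" found in previously shared documents.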

Abstract

The proliferation of workplace multimedia collaboration applications has meant, on the one hand, more opportunities for group work but, on the other, more data locked away in proprietary interfaces. We are developing new tools to capture and access multimedia content from any source. In this demo, we focus primarily on new methods that allow users to rapidly reconstitute, enhance, and share document-based information.

Abstract

In this paper we describe DocuGram, a novel tool to capture and share documents from any application. As users scroll through pages of their document inside its native application (Word, Google Docs, a web browser), the system captures and analyzes the video frames in real time and reconstitutes the original document pages into an easy-to-view HTML-based representation. In addition to regenerating the document pages, a DocuGram also includes the interactions users had over them, e.g., mouse motions and voice comments. A DocuGram acts as a modern copy machine, allowing users to copy and share any document from any application.

Abstract

Most teleconferencing tools treat users in distributed meetings monolithically: all participants are meant to be connected to one another in more or less the same manner. In reality, though, people connect to meetings in all manner of different contexts, sometimes sitting in front of a laptop or tablet giving their full attention, but at other times mobile and involved in other tasks, or a liminal participant in a larger group meeting. In this paper we present the design and evaluation of two applications, Penny and MeetingMate, designed to help users in non-standard contexts participate in meetings.

Abstract

We present MixMeetWear, a smartwatch application that allows users to maintain awareness of the audio and visual content of a meeting while completing other tasks. Users of the system can listen to the audio of a meeting and also view, zoom, and pan webcam and shared content keyframes of other meeting participants' live streams in real time. Users can also provide input to the meeting via speech-to-text or predefined responses. A study showed that the system is useful for peripheral awareness of some meetings.

Abstract

Remote meetings are messy. There is an ever-increasing number of support tools available, and, as past work has shown, people tend to select a subset of those tools to satisfy their own institutional, social, and personal needs. While video tools make it relatively easy to have conversations at a distance, they are less well adapted to sharing and archiving multimedia content. In this paper we take a deeper look at how sharing multimedia content occurs before, during, and after distributed meetings. Our findings shed light on the decisions and rationales people use to select from the vast set of tools available to them to prepare for, conduct, and reconcile the results of a remote meeting.

Abstract

Establishing common ground is one of the key problems for any form of communication. The problem is particularly pronounced in remote meetings, in which participants can easily lose track of the details of dialogue for any number of reasons. In this demo we present a web-based tool, MixMeet, that allows teleconferencing participants to search the contents of live meetings so they can rapidly retrieve previously shared content to get on the same page, correct a misunderstanding, or discuss a new idea.

Abstract

Web-based tools for remote collaboration are quickly becoming an established element of the modern workplace. During live meetings, people share web sites, edit presentation slides, and share code editors. It is common for participants to refer to previously spoken or shared content in the course of synchronous distributed collaboration. A simple approach is to index the shared video frames, or key-frames, with Optical Character Recognition (OCR) and let users retrieve them with text queries. Here we show that a complementary approach is to look at the actions users take inside the live document streams. Based on observations of real meetings, we focus on two important signals: text editing and mouse cursor motion. We describe the detection of text and cursor motion, their implementation in our WebRTC-based system, and how users are better able to search live documents during a meeting based on these detected and indexed actions.
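As a rough illustration of how action signals can complement OCR, the toy index below boosts terms that occur near a detected edit or cursor position. The class name, weights, and API are invented for this sketch and do not reflect the system's real implementation:

```python
from collections import defaultdict

class MeetingIndex:
    """Toy inverted index over shared-screen keyframes: OCR terms get a
    base score, and terms near a detected user action get a boost."""

    # Hypothetical weights: an edit is a stronger relevance signal than
    # mere cursor proximity.
    ACTION_BOOST = {"edit": 3.0, "cursor": 1.5}

    def __init__(self):
        self.postings = defaultdict(dict)  # term -> {frame_id: score}

    def add_keyframe(self, frame_id, ocr_words, action_words=(), action="cursor"):
        # Base score for every word OCR found in the frame.
        for w in ocr_words:
            term = self.postings[w.lower()]
            term[frame_id] = max(term.get(frame_id, 0.0), 1.0)
        # Boosted score for words near the edit or cursor location.
        boost = self.ACTION_BOOST.get(action, 1.0)
        for w in action_words:
            self.postings[w.lower()][frame_id] = boost

    def search(self, query):
        """Return frame ids ranked by summed term scores."""
        scores = defaultdict(float)
        for term in query.lower().split():
            for frame_id, s in self.postings.get(term, {}).items():
                scores[frame_id] += s
        return sorted(scores, key=scores.get, reverse=True)
```

With this weighting, a keyframe where the query term was actively being edited ranks above one where it merely appeared on screen.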

Abstract

Tutorials are one of the most fundamental means of conveying knowledge. In this paper, we present a suite of applications that allow users to combine different types of media captured from handheld, standalone, or wearable devices to create multimedia tutorials. We conducted a study comparing standalone (camera on tripod) versus wearable capture (Google Glass). The results show that tutorial authors have a slight preference for wearable capture devices, especially when recording activities involving larger objects.

Abstract

As video-mediated communication reaches broad adoption, improving immersion and social interaction are important areas of focus in the design of tools for exploration and work-based communication. Here we present three threads of research focused on developing new ways of enabling exploration of a remote environment and interacting with the people and artifacts therein.

Abstract

Video Text Retouch is a technique for retouching the textual content found in many online videos, such as screencasts, recorded presentations, and e-learning videos. Using our custom HTML5-based player, users can edit the textual content of video frames in real time, for example correcting typos or inserting new words between existing characters. Edits are overlaid and tracked at the desired position for as long as the original video content remains similar. We describe the interaction techniques and image processing algorithms, and give implementation details of the system.
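One simple way to decide how long an overlaid edit should persist is to compare the video region under the edit against the frame where the edit was made. The sketch below uses mean absolute pixel difference as the similarity test; this is only an illustrative stand-in for the player's actual tracking:

```python
def overlay_visible(ref_patch, cur_patch, threshold=10.0):
    """Keep a text-edit overlay visible as long as the video region under
    it still resembles the reference frame where the edit was made.

    `ref_patch` and `cur_patch` are flat lists of grayscale pixel values
    for the same region; `threshold` is an assumed tuning parameter.
    """
    diff = sum(abs(a - b) for a, b in zip(ref_patch, cur_patch))
    return diff / len(ref_patch) < threshold
```

When the underlying content changes (for example, the screencast scrolls away), the mean difference exceeds the threshold and the overlay is hidden.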

Abstract

Distributed teams must coordinate a variety of tasks. To do so, they need to be able to create, share, and annotate documents, as well as discuss plans and goals. Many workflow tools support document sharing, and others support videoconferencing, but there is little support for connecting the two. In this work we describe a system that allows users to share and mark up content during web meetings. This shared content can provide important conversational props within the context of a meeting; it can also help users review archived meetings. Users can also extract shared content from meetings directly into other workflow tools.

Abstract

Online video is incredibly rich. A 15-minute home improvement YouTube tutorial might include 1500 words of narration, 100 or more significant keyframes showing a visual change from multiple perspectives, several animated objects, references to other examples, a tool list, comments from viewers, and a host of other metadata. Furthermore, video accounts for 90% of worldwide Internet traffic. However, we observe that video is not widely seen as a full-fledged document: it is dismissed as a medium that, at worst, gilds over substance and, at best, simply augments text-based communications. In this piece, we suggest that negative attitudes toward multimedia documents that include audio and video are largely unfounded and arise mostly because we lack the necessary tools to treat video content as first-order media or to support seamlessly mixing media.

Abstract

Video content creators invest enormous effort creating work that is in turn typically viewed passively. However, learning tasks using video require users not only to consume the content but also to engage with, interact with, and repurpose it. Furthermore, to promote learning with video in domains where content creators are not necessarily videographers, it is important that capture tools facilitate creation of interactive content. In this paper, we describe some early experiments toward this goal. A literature review coupled with formative field studies led to a system design that can incorporate a broad set of video-creation and interaction styles.

Abstract

Video tends to be imbalanced as a medium. Typically, content creators invest enormous effort creating work that is then watched passively. However, learning tasks require that users not only consume video but also engage, interact with, and repurpose content. Furthermore, to promote learning across domains where content creators are not necessarily videographers, it is important that capture tools facilitate creation of interactive content. In this paper, we describe some early experiments toward this goal. Specifically, we describe a needfinding study involving interviews with amateur video creators as well as our experience with an early prototype to support expository capture and access. Our findings led to a system redesign that can incorporate a broad set of video-creation and interaction styles.

Abstract

Unlike with text, copying and pasting parts of video documents is challenging. Yet the huge number of video documents now available in the form of how-to tutorials calls for simpler techniques that allow users to easily copy and paste fragments of video material into new documents. We describe new direct video manipulation techniques that allow users to quickly copy and paste content from video documents, such as how-to tutorials, into a new document. While the video plays, users interact with the video canvas to select text regions, scrollable regions, slide sequences built up across many frames, or semantically meaningful regions such as dialog boxes. Instead of relying on the timeline to accurately select sub-parts of the video document, users navigate using familiar selection techniques, such as using the mouse wheel to scroll back and forth over a region where content scrolls, double-clicking on rectangular regions to select them, or clicking and dragging over textual regions of the video canvas to select them. We describe the video processing techniques that run in real time in modern web browsers using HTML5 and JavaScript, and show how they help users quickly copy and paste video fragments into new documents, allowing them to efficiently reuse video documents for authoring or note-taking.
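The wheel-driven navigation over a scrolling region can be sketched as a mapping from detected per-frame scroll offsets back to frame indices. The function below is a minimal illustration under the assumption that detected offsets are monotonically non-decreasing (as when a tutorial scrolls steadily through a page); it is not the system's actual code:

```python
import bisect

def frame_for_scroll(scroll_offsets, current_frame, wheel_delta):
    """Map a mouse-wheel delta over a scrolling region to a video frame.

    `scroll_offsets[i]` is the detected vertical scroll position of the
    region in frame i, assumed sorted in non-decreasing order. A wheel
    event becomes a target scroll position, and we jump to the frame
    whose detected offset is closest from above.
    """
    target = scroll_offsets[current_frame] + wheel_delta
    i = bisect.bisect_left(scroll_offsets, target)
    return max(0, min(i, len(scroll_offsets) - 1))
```

Scrolling the wheel thus scrubs the video to the frame where the page content sits at the requested position, which feels like scrolling a live document rather than seeking a timeline.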

Abstract

People frequently capture photos with their smartphones, and some are starting to capture images of documents. However, the quality of captured document images is often lower than expected, even when applications that perform post-processing to improve the image are used. To improve the quality of captured images before post-processing, we developed a Smart Document Capture (SmartDCap) application that provides real-time feedback to users about the likely quality of a captured image. The quality measures capture the sharpness and framing of a page or regions on a page, such as a set of one or more columns, a part of a column, a figure, or a table. Using our approach, while users adjust the camera position, the application automatically determines when to take a picture of a document to produce a good quality result. We performed a subjective evaluation comparing SmartDCap and the Android Ice Cream Sandwich (ICS) camera application; we also used raters to evaluate the quality of the captured images. Our results indicate that users find SmartDCap to be as easy to use as the standard ICS camera application. Additionally, images captured using SmartDCap are sharper and better framed on average than images using the ICS camera application.
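A minimal stand-in for the kind of sharpness measure such real-time feedback relies on is mean squared intensity gradient over a grayscale region. This is a generic focus metric shown for illustration, not SmartDCap's actual quality measure:

```python
def sharpness(gray):
    """Crude focus score: mean squared horizontal and vertical intensity
    gradient. `gray` is a 2D list of grayscale pixel values. Sharp edges
    concentrate intensity change into few pixels, so squaring the
    gradients rewards in-focus captures over blurred ones.
    """
    h, w = len(gray), len(gray[0])
    total = 0
    for y in range(h):                 # horizontal gradients
        for x in range(w - 1):
            total += (gray[y][x + 1] - gray[y][x]) ** 2
    for y in range(h - 1):             # vertical gradients
        for x in range(w):
            total += (gray[y + 1][x] - gray[y][x]) ** 2
    return total / (2 * h * w)
```

A live-feedback loop can poll a score like this while the user adjusts the camera and trigger capture once the score clears a threshold.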

Abstract

We describe direct video manipulation interactions applied to screen-based tutorials. In addition to using the video timeline, users of our system can quickly navigate within the video with the mouse wheel, double-click on a rectangular region to zoom in and out, or drag a box over the video canvas to select text and scrub the video to the end of a text line even if it is not shown in the current frame. We describe the video processing techniques developed to implement these direct video manipulation techniques, and show how they are implemented to run in most modern web browsers using HTML5's canvas and JavaScript.

Abstract

Faithful sharing of screen contents is an important collaboration feature. Prior systems were designed to operate over constrained networks, yet they performed poorly even without such bottlenecks. To build a high-performance screen sharing system, we empirically analyzed screen contents for a variety of scenarios. We showed that screen updates were sporadic, with long periods of inactivity. When active, screens were updated at far higher rates than earlier systems supported. The mismatch was pronounced for interactive scenarios. Even during active screen updates, the number of updated pixels was frequently small. We showed that crucial information can be lost if individual updates are merged. When the available system resources could not support high capture rates, we showed ways in which updates can be effectively collapsed. We showed that Zlib lossless compression performed poorly for screen updates. By analyzing the screen pixels, we developed a practical transformation that significantly improved compression rates. Our system captured 240 updates per second while using only 4.6 Mbps for interactive scenarios. Still, while playing movies in fullscreen mode, our approach could not achieve higher capture rates than prior systems; the CPU remains the bottleneck. A system that incorporates our findings is deployed within the lab.
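A standard example of the kind of pixel transformation that helps Zlib on screen content is an XOR delta against the previous capture, which turns unchanged pixels into runs of zero bytes. This is shown for illustration only; it is not the paper's actual transformation:

```python
import os
import zlib

def xor_delta(prev, curr):
    """XOR each byte of the current capture with the previous one.
    Unchanged pixels become zeros, which Zlib compresses extremely well."""
    return bytes(a ^ b for a, b in zip(prev, curr))

# Simulate a mostly static screen: two 16 KB captures differing in a
# small region (the update sizes here are arbitrary for the demo).
frame0 = os.urandom(16384)
frame1 = bytearray(frame0)
frame1[100:110] = b"\xff" * 10          # a tiny on-screen update
frame1 = bytes(frame1)

raw = zlib.compress(frame1)              # compress the full frame
delta = zlib.compress(xor_delta(frame0, frame1))  # compress only the change
# The delta stream is far smaller because it is almost all zeros.
```

Because interactive screen updates typically touch few pixels, sending the compressed delta instead of the compressed frame cuts bandwidth dramatically, consistent with the sporadic, small-update behavior the analysis describes.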

Abstract

DisplayCast is a many-to-many screen sharing system targeted at Intranet scenarios. The capture software runs on all computers whose screens need to be shared. It uses an application-agnostic screen capture mechanism that creates a sequence of pixmap images of the screen updates. It transforms these pixmaps to vastly improve lossless Zlib compression performance. These algorithms were developed after an extensive analysis of typical screen contents. DisplayCast shares the processor and network resources required for screen capture, compression, and transmission with the host applications whose output needs to be shared. It balances the need for high-performance screen capture against reducing its resource interference with user applications. DisplayCast uses Zeroconf for naming and asynchronous location. It provides support for Cisco WiFi- and Bluetooth-based localization. It also includes an HTTP/REST-based controller for remote session initiation and control. DisplayCast supports screen capture and playback on computers running Windows 7 and Mac OS X. Remote screens can be archived into an H.264-encoded movie on a Mac. They can also be played back in real time on Apple iPhones and iPads. The software is released under a New BSD license.

Abstract

The ways in which we come to know and share what we know with others are deeply entwined with the technologies that enable us to capture and share information. As face-to-face communication has been supplemented with ever-richer media (textual books, illustrations and photographs, audio, film and video, and more), the possibilities for knowledge transfer have only expanded. One of the latest trends to emerge amid the growth of Internet sharing and pervasive mobile devices is the mass creation of online instructional videos. We are interested in exploring how smartphones shape this sort of mobile, rich-media documentation and sharing.