Users of complex software applications often learn concepts and skills through step-by-step tutorials. Today, these tutorials are published in two dominant forms: static tutorials composed of images and text that are easy to scan, but cannot effectively describe dynamic interactions; and video tutorials that show all manipulations in detail, but are hard to navigate. We hypothesize that a mixed tutorial with static instructions and per-step videos can combine the benefits of both formats. We describe a comparative study of static, video, and mixed image manipulation tutorials with 12 participants and distill design guidelines for mixed tutorials. We present MixT, a system that automatically generates step-by-step mixed media tutorials from user demonstrations. MixT segments screencapture video into steps using logs of application commands and input events, applies video compositing techniques to focus on salient information, and highlights interactions through mouse trails. Informal evaluation suggests that automatically generated mixed media tutorials were as effective in helping users complete tasks as tutorials that were created manually.

Powerful image editing software like Adobe Photoshop and GIMP have complex interfaces that can be hard to master. To help users perform image editing tasks, we introduce tutorial-based applications (tapps) that retain the step-by-step structure and descriptive text of tutorials but can also automatically apply tutorial steps to new images. Thus, tapps can be used to batch process many images automatically, similar to traditional macros. Tapps also support interactive exploration of parameters, automatic variations, and direct manipulation (e.g., selection, brushing). Another key feature of tapps is that they execute on remote instances of Photoshop, which allows users to edit their images on any Web-enabled device. We demonstrate a working prototype system called TappCloud for creating, managing and using tapps. Initial user feedback indicates support for both the interactive features of tapps and their ability to automate image editing. We conclude with a discussion of approaches and challenges of pushing monolithic direct-manipulation GUIs to the cloud.

Audio producers often use musical underlays to emphasize key moments in spoken content and give listeners time to reflect on what was said. Yet, creating such underlays is time-consuming as producers must carefully (1) mark an emphasis point in the speech (2) select music with the appropriate style, (3) align the music with the emphasis point, and (4) adjust dynamics to produce a harmonious composition. We present UnderScore, a set of semi-automated tools designed to facilitate the creation of such underlays. The producer simply marks an emphasis point in the speech and selects a music track. UnderScore automatically refines, aligns and adjusts the speech and music to generate a high-quality underlay. UnderScore allows producers to focus on the high-level design of the underlay; they can quickly try out a variety of music and test different points of emphasis in the story. Amateur producers, who may lack the time or skills necessary to author underlays, can quickly add music to their stories. An informal evaluation of UnderScore suggests that it can produce high-quality underlays for a variety of examples while significantly reducing the time and effort required of radio producers.

Video tutorials provide a convenient means for novices to learn new software applications. Unfortunately, staying in sync with a video while trying to use the target application at the same time requires users to repeatedly switch from the application to the video to pause or scrub backwards to replay missed steps. We present Pause-and-Play, a system that helps users work along with existing video tutorials. Pause-and-Play detects important events in the video and links them with corresponding events in the target application as the user tries to replicate the depicted procedure. This linking allows our system to automatically pause and play the video to stay in sync with the user. Pause-and-Play also supports convenient video navigation controls that are accessible from within the target application and allow the user to easily replay portions of the video without switching focus out of the application. Finally, since our system uses computer vision to detect events in existing videos and leverages application scripting APIs to obtain real time usage traces, our approach is largely independent of the specific target application and does not require access or modifications to application source code. We have implemented Pause-and-Play for two target applications, Google SketchUp and Adobe Photoshop, and we report on a user study that shows our system improves the user experience of working with video tutorials.

To create collections, like music playlists from personal media libraries, users today typically do one of two things. They either manually select items one-by-one, which can be time consuming, or they use an example-based recommendation system to automatically generate a collection. While such automatic engines are convenient, they offer the user limited control over how items are selected. Based on prior research and our own observations of existing practices, we propose a semi-automatic interface for creating collections that combines automatic suggestions with manual refinement tools. Our system includes a keyword query interface for specifying high-level collection preferences (e.g., "some rock, no Madonna, lots of U2,") as well as three example-based collection refinement techniques: 1) a suggestion widget for adding new items in-place in the context of the collection; 2) a mechanism for exploring alternatives for one or more collection items; and 3) a two-pane linked interface that helps users browse their libraries based on any selected collection item. We demonstrate our approach with two applications. SongSelect helps users create music playlists, and PhotoSelect helps users select photos for sharing. Initial user feedback is positive and confirms the need for semi-automated tools that give users control over automatically created collections.

We present a system for creating interactive exploded view diagrams using 2D images as input. This image-based approach enables us to directly support arbitrary rendering styles, eliminates the need for building 3D models, and allows us to leverage the abundance of existing static diagrams of complex objects. We have developed a set of semi-automatic authoring tools for quickly creating layered diagrams that allow the user to specify how the parts of an object expand, collapse, and occlude one another. We also present a viewing system that lets users dynamically filter the information presented in the diagram by directly expanding and collapsing the exploded view and searching for individual parts. Our results demonstrate that a simple 2.5D diagram representation is powerful enough to enable a useful set of interactions and that, with the right authoring tools, effective interactive diagrams in this format can be created from existing static illustrations with a small amount of effort.