Tutorials are vital for helping people perform complex software-based tasks in domains such as programming, data science, system administration, and computational research. However, it is tedious to create detailed step-by-step tutorials for tasks that span multiple interrelated GUI and command-line applications. To address this challenge, we created Torta, an end-to-end system that automatically generates step-by-step GUI and command-line app tutorials by demonstration, provides an editor to trim, organize, and add validation criteria to these tutorials, and provides a web-based viewer that can validate step-level progress and automatically run certain steps. The core technical insight that underpins Torta is that combining operating-system-wide activity tracing and screencast recording makes it easier to generate mixed-media (text+video) tutorials that span multiple GUI and command-line apps. An exploratory study on 10 computer science teaching assistants (TAs) found that they all preferred the experience and results of using Torta to record programming and sysadmin tutorials relevant to classes they teach rather than manually writing tutorials. A follow-up study on 6 students found that they all preferred following the Torta tutorials created by those TAs over the manually-written versions.

Complex software tasks often require intricate
coordination across multiple GUI and command-line tools. For instance,
if you want to start building a modern full-stack web app, you may need
to first install Node.js and the npm package manager, run a slew of npm
commands to configure a custom toolchain with a CSS preprocessor and a
JavaScript code bundler, adjust OS environment variables to detect all
required library dependencies and execution paths, customize your IDE
to hook up to that toolchain, install and configure web browser
extensions for debugging, and set up a pipeline to deploy code to
production servers. All these acrobatic contortions must be done before
you can even write a simple “hello world” web app! (This
example was made in 2017. I'm sure that details will drastically change
in the future, but the underlying complexity will undoubtedly remain.)

The same kinds of arcane command-line (and GUI)
BS afflict data scientists,
computational researchers, system administrators, and anyone else who
has to work with computers in a non-trivial way.

To help novices learn to do these tasks, experts create step-by-step
tutorials in one of two ways:

Hand-written: They can create a written tutorial by painstakingly
enumerating all steps, describing shell commands, expected outputs,
and side effects, and taking and annotating screenshots to demonstrate
GUI-based actions. This process is very tedious and time-consuming for
the creator (e.g., it's easy to forget or gloss over certain steps!)
but can lead to high-quality tutorials for learners if done well.
However, it has the drawback of not capturing motion-based actions
that are helpful for GUI apps. Which brings us to ...

Screencast videos: They can create a video tutorial by simply
demonstrating the requisite actions on their computer while narrating
by voice. This has the advantage of being much easier for creators and
also capturing motion-based actions. But videos are much harder for
learners to navigate and search, and learners can't copy-and-paste
shell commands and filesystem metadata from video clips like they
can with hand-written tutorials. Videos are also much harder to edit
later.

What if we could combine the best of both worlds? To try to do so, we
created a macOS app called Torta (Transparent
Operating-system Recording for Tutorial
Acquisition) that makes it easy to create and consume
mixed-media tutorials that contain the best properties of hand-written
and screencast video formats. Here's how Torta works:

The tutorial creator first demonstrates the intended actions on their
computer by running shell commands, launching GUI applications, and
interacting with application windows just like they would normally do.
Torta automatically records a screencast video of their desktop along
with a timestamped trace of OS-level activity that includes filesystem
modifications, shell commands, window positions, and keystrokes. From
this single demonstration, Torta generates a mixed-media tutorial that
hierarchically segments the screencast video by foreground GUI windows,
executed commands, and versions of saved files. It displays each segment
as an individual step on a tutorial webpage.
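
The segmentation idea above can be sketched in a few lines of Python. This is only an illustrative sketch, not Torta's actual implementation: the `Event` and `Step` types, the three event kinds, and the `segment` function are hypothetical names made up for this example. It assumes the trace has already been reduced to timestamped focus changes, shell commands, and file saves, and it groups commands and saves as sub-steps under whichever window was in the foreground at the time.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    """One entry in a (hypothetical) simplified activity trace."""
    t: float      # seconds since the recording started
    kind: str     # "focus", "command", or "save"
    detail: str   # window title, shell command, or file path


@dataclass
class Step:
    """A top-level tutorial step: one stretch of time in one foreground window."""
    window: str
    start: float
    end: float = 0.0
    substeps: list = field(default_factory=list)


def segment(events, total_duration):
    """Group trace events into steps by foreground window.

    Commands and file saves become sub-steps of the step that is open
    when they occur. Events seen before the first focus event are dropped.
    """
    steps = []
    for ev in sorted(events, key=lambda e: e.t):
        if ev.kind == "focus":
            if steps and steps[-1].window == ev.detail:
                continue  # same window regained focus; keep the current step
            if steps:
                steps[-1].end = ev.t  # close the previous step
            steps.append(Step(window=ev.detail, start=ev.t))
        elif steps:
            steps[-1].substeps.append((ev.t, ev.kind, ev.detail))
    if steps:
        steps[-1].end = total_duration  # last step runs to the end of the video
    return steps


# Made-up trace: browser, then terminal, then text editor.
events = [
    Event(0.0, "focus", "Web Browser"),
    Event(12.5, "focus", "Terminal"),
    Event(15.0, "command", "npm install"),
    Event(40.0, "command", "npm run build"),
    Event(55.0, "focus", "Text Editor"),
    Event(70.0, "save", "app.js"),
]
steps = segment(events, total_duration=90.0)
for step in steps:
    print(step.window, step.start, step.end, len(step.substeps))
```

Because each step carries a start and end time, the same boundaries could in principle be used to cut the screencast video into per-step clips.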

However, it is hard to record a pristine video demo in one take, so this
initial demonstration likely contains redundancies or errors. Torta
therefore provides a user interface for editing tutorials prior to
publishing. The tutorial editor UI uses data
from both the recorded screencast video and OS-level activity traces to
allow creators to compress and summarize portions of the tutorial, add
textual annotations, insert file path templates that generalize the
tutorial's contents across machines, and add checkpoints for viewers to
validate their progress.
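
To make the file path template idea concrete, here is a minimal sketch of one way such templates could work. It is an assumption for illustration only, not Torta's documented format: the `$HOME` placeholder and the `to_template` and `from_template` helpers are hypothetical. The creator's home directory is swapped for a placeholder when the tutorial is saved, and the placeholder is expanded again on the viewer's machine.

```python
def to_template(path, home):
    """Replace a leading home directory with a $HOME placeholder.

    Paths outside the home directory are returned unchanged.
    """
    home = home.rstrip("/")
    if path == home or path.startswith(home + "/"):
        return "$HOME" + path[len(home):]
    return path


def from_template(template, home):
    """Expand a leading $HOME placeholder for the viewer's machine."""
    home = home.rstrip("/")
    if template == "$HOME" or template.startswith("$HOME/"):
        return home + template[len("$HOME"):]
    return template


# A path recorded on the creator's machine (made-up example) ...
recorded = "/Users/alice/project/app.js"
template = to_template(recorded, home="/Users/alice")

# ... and the same path as it would appear for a viewer with a different home.
print(template)
print(from_template(template, home="/home/bob"))
```

Matching on `home + "/"` rather than on the bare prefix keeps an unrelated directory such as `/Users/alicette` from being rewritten by mistake.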

Torta-generated tutorials (“Tortorials”) are simply
ordinary webpages that mix text and video, so people can consume them
just like any web-based tutorial. Tortorials are also hierarchical, so
users can zoom in to view more details on demand. If someone wants
interactive feedback as they are following along, they can optionally
install a Torta viewer app on their computer. Doing so enables them to
use an augmented tutorial viewer that provides checkpoints to validate
their progress at each step. The viewer app can also automatically run
certain steps for the user.
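
A checkpoint can be thought of as a small predicate that the viewer app evaluates on the learner's machine. The sketch below is a hypothetical illustration of that idea, not the Torta viewer's real API: `file_exists`, `command_succeeds`, and `validate` are names invented here, and the two checks (a file is present, a command exits with status 0) are assumed examples of validation criteria.

```python
import os
import subprocess
import sys


def file_exists(path):
    """Checkpoint: passes if a regular file exists at the given path."""
    return lambda: os.path.isfile(path)


def command_succeeds(argv):
    """Checkpoint: passes if the command runs and exits with status 0."""
    def check():
        try:
            result = subprocess.run(argv, capture_output=True)
        except OSError:
            return False  # command not found or not executable
        return result.returncode == 0
    return check


def validate(checkpoints):
    """Run (name, check) pairs and return the names of the checks that failed."""
    return [name for name, check in checkpoints if not check()]


# Made-up checkpoints for one tutorial step.
checkpoints = [
    ("Python interpreter runs", command_succeeds([sys.executable, "--version"])),
    ("app.js was saved", file_exists("app.js")),
]
failed = validate(checkpoints)
print("failed checkpoints:", failed)
```

Returning the names of failed checks, rather than a single pass/fail flag, lets a viewer tell the learner which part of a step still needs attention.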

This schematic shows the step-by-step structure of a hypothetical
Tortorial demonstrating a sequence of web browser, terminal, and text
editor actions. The sub-steps within each step represent shell command
invocations and file save events:

In sum, Torta points toward a future where making complex software
tutorials becomes as simple as interacting normally with the desired
applications and adding some annotations afterward. For tutorial
creators, Torta provides the best of both modalities—the fluid
ease of demonstrating a set of computer actions in-situ, and the
detailed rigor of writing text-based tutorials. Tutorial consumers, in
turn, can browse hierarchically at the level of detail suitable for
their needs and get step-by-step feedback on their incremental
progress.
