Abstract

Of all narrative textual forms, the motion picture screenplay may be the most
perfectly pre-disposed for computational analysis. Screenplays contain
capitalized character names, indented dialogue, and other formatting conventions
that enable an algorithmic approach to analyzing and visualizing film
narratives. In this article, the authors introduce their new tool, ScripThreads,
which parses screenplays, outputs statistical values that can be analyzed, and
offers four different types of visualization, each with its own utility. The
visualizations represent character interactions across time as a single 3D or 2D
graph. The authors model the utility of the tool for the close analysis of a
single film (Lawrence Kasdan’s Grand Canyon
[1991]). They also model how the tool can be used for "distant reading" by identifying patterns of character presence
across a dataset of 674 screenplays.

1. Introduction

In November 2009, the website xkcd.com published a series of infographics that
visualized character interactions in movies such as the Lord of the Rings trilogy (2001-2003), the Star Wars trilogy (1977-1983), and Jurassic
Park (1993). Exhibiting both attention to detail and a nice sense of
humor, the xkcd charts allowed time to play out across the x-axis and showed how
the characters exist in different spaces within the narrative world (the
difference between storylines occurring on the Death Star and Tatooine, in the
Star Wars example). These infographics became
the motivation for Tanahashi and Ma’s "Design Considerations
for Optimizing Storyline Visualizations," a computer science paper
published in IEEE Transactions on Visualization and
Computer Graphics (2012). Tanahashi and Ma propose an algorithmic
approach to generating the design of these types of visualizations. While these
visualizations interested scholars and researchers, they came
with several shortcomings, most notably that all of the information to
be visualized had to be gathered externally. A human would
need to watch a film and manually record when and where
certain characters appeared and interacted. This requirement greatly diminished the utility of
these visualizations as a tool of data exploration; the analysis more or less
needed to be completed before the tool could be used.

Figure 1.

xkcd’s 2009 charts of character interactions in movies proved to be
popular on the Internet and influential to visualization designers.

The challenge of conveying complex information (character appearances, relations
with other characters, and absences) as it changes over time — along with our
shared interests in the Digital Humanities and our more singular scholarly
focuses on cinema and television (Hoyt), visualization and storytelling (Roy),
and computational approaches (Ponto) — spurred us to develop a tool that
algorithmically analyzes and visualizes screenplays. In building a tool to aid
Humanists, we sought to heed Abello, Broadwell, and Tangherlini’s call for
computational approaches that "combine distant reading and
close reading"
[Abello, Broadwell, and Tangherlini 2012]. One appeal of a fully algorithmic approach is that
it scales nicely — enabling the "distant reading"
that Franco Moretti first proposed and, in our case, allowing a researcher to
study 1,000 screenplays rather than 10 or 20 [Moretti 2000].
Keeping in mind the value of close reading, though, we also sought to create a
narrative analysis tool that offered direct access to every line of a screenplay
— similar to the "Reading View" window in the
Watching the Script software prototype developed for the visualization of
theatrical text [Roberts-Smith et al. 2013]. In our case, we wanted to
connect this "Reading View" window that displays
lines from the screenplay to a series of visualizations that account for scenes,
pacing, and character interactions. Rather than reducing a screenplay simply to
statistical aggregates, we wanted to map the way a screenplay unfolds as it
moves from page to page. By generating several hundred of these narrative
profiles, we can compare and contrast large numbers of screenplays across
decades, by author, by genre, by narrative structure, and more.

In this article, we introduce our tool, ScripThreads, and discuss some of our
initial research findings from using ScripThreads to analyze and visualize
hundreds of screenplays from the American Film Scripts Online collection. We
model how the tool can be productively used in film analysis as a tool for close
reading by analyzing and comparing two screenplays co-written and directed by
Lawrence Kasdan, The Big Chill (1983) and Grand Canyon (1991). We also model how the tool can be
used for distant reading by searching across hundreds of screenplays for the
pattern of the "hyper-present protagonist" — movies that
place a main character in every scene or nearly every scene.

In the process of building the tool’s prototype and writing this article, we have
come to appreciate the many ways that a computer reads a screenplay differently
than you or I do. Humans gather insight from watching and experiencing the
emotion, tension, and dynamics of a movie or screenplay. Rather than attempting
to train a computer to understand a film in the exact same way we do, we would
prefer to ask a computer to do tasks that it is designed to do well and that we
humans struggle with. Humans have memory limitations when it comes to matters of
sequential timing and the entrances and exits of dozens of characters. In
contrast, computers are excellent at gathering and recording these sorts of
details from structured texts. Lev Manovich has suggested that one of the most
valuable things that comes from combining computational analysis and the
visualization of vast amounts of information in a single image is that it
defamiliarizes our understanding of the works that we study in the Humanities
[Manovich 2012]. As we hope to demonstrate, ScripThreads is a
powerful framework for defamiliarization, provoking new questions, and producing
new answers through the combined strengths of the human analyst and the
computer.

2. Researching the Screenplay: Opportunities and Challenges

Of all narrative textual forms, the motion picture screenplay may be the most
perfectly pre-disposed for computational analysis. As Murtagh, Ganz, and
Reddington explain, "a filmscript is a semi-structured
textual document in that it is subdivided into scenes and sometimes other
structural units. Both character dialog and also descriptive and background
metadata is provided in a filmscript. The metadata is partially
formalized"
[Murtagh, Ganz, and Reddington 2011]. As semi-structured documents with formatting
conventions analogous to a metadata schema, screenplays are ideally suited for
automated computer parsing. There is no need for laborious TEI encoding to
detect character dialogue exchanges and interactions. When a character speaks,
his or her name is capitalized and centered on the page. The character’s
dialogue generally appears one line below. As we discuss below, there are
variations within this general format that create parsing challenges.
Nevertheless, the extraction of character interactions in a screenplay is far
easier to automate than in a novel or epic poem.

Our research and software development contributes to a lively and growing area of
research about screenwriting history, form, theory, and practice. As a
discipline, Film Studies has long lived in the shadow of the "auteur theory," which views the director as the author
of a film. Film scholars and critics seem unable to let go of an "auteur desire," in the words of Dana Polan [Polan 2001]. Even scholars who acknowledge that film is a highly
collaborative medium will refer to "Scorsese’s Taxi Driver," elevating the director to
the status of author over the studio (Columbia Pictures), screenwriter
(Paul Schrader), or cinematographer (Michael Chapman). Over the last decade,
however, screenwriting studies has emerged as its own sub-field within the
discipline of film and media studies. The Screenwriting Research Network held
its first conference in 2008; the Journal of
Screenwriting published its first issue in 2010.

A small contingent of screenwriting researchers are, like us, pursuing the
computational analysis of screenplays [Marinov 2011]
[Marinov and Stitts 2013a]
[McKie 2014]
[Murtagh, Ganz, and Reddington 2011]. The 2013 launch of Samuel Marinov and Brock
Stitts’ Screenplay Owl analysis tool marked an especially exciting development.
With its emphasis on dialogic exchange frequencies, Screenplay Owl differs
significantly from ScripThreads. However, both ScripThreads and Screenplay Owl
show that the initial speculations about computational screenplay analysis are
becoming realities. As we worked on revising this article for DHQ in early 2014, we witnessed the launch of another
promising screenwriting analytics tool: ScriptFAQ, developed by Stewart McKie.
McKie’s tool is designed for practicing screenwriters who want to ask questions
about a screenplay they are working on (for example, how many scenes is one
character in compared to another?). For ScriptFAQ to answer these questions, the
screenplay must be imported in the XML-based Final Draft file format and the
user must enter a substantial number of additional metadata fields, which
require detailed knowledge of the screenplay. For a writer working on a
screenplay, these metadata fields are quite easy to complete and the results are
well worth the effort. For a researcher wanting to quickly compare hundreds of
screenplays across history, however, the need for many additional metadata fields,
prior knowledge about each screenplay, and the Final Draft file format (in which
most older screenplays are not readily available) make ScripThreads better
suited than ScriptFAQ for distant reading. Ultimately, for a researcher
interested in computer-enhanced close reading, we believe the combination of
analyzing a screenplay in ScripThreads, Screenplay Owl, and ScriptFAQ may
generate the richest results of all.

Regardless of the software platform, there are some inherent limitations to an
automated approach to screenplay analysis. First, a screenplay nearly always
differs to some degree from the completed film version, which may include
improvised dialogue from the film’s production or lack scenes or characters that
were cut in post-production. Screenplay analysis requires us to qualify
arguments we may want to make about the entire film. Another major challenge to
this line of research is access — specifically, digital access to the
authoritative versions of screenplays. There are several websites offering free,
downloadable screenplays of contemporary Hollywood movies. The provenance,
authoritativeness, and legal status of these screenplays, though, are not clear.
Moreover, these websites generally have very few screenplays for movies produced
prior to the 1980s.

American Film Scripts Online (AFSO) is the digital resource that we believe
provides access to the largest number of authoritative, digitized screenplays.
The resource’s creator, Alexander Street Press, licensed 1,009 screenplays from
Warner Bros., Universal, and a number of other rights holders. Roughly half of
the screenplays are available as PDF facsimiles of the original documents, and
all of the screenplays are available as HTML documents, which have been
re-keyed, eliminating most of the problems that come from uncorrected OCR text.
The HTML mark-up is not consistently or semantically structured, but this
productively forced us to code ScripThreads’ parsing algorithm so that it could
handle a wider variety of screenplays and not depend on rigid mark-up standards.
As for AFSO’s selection of screenplays, the collection is stronger in some areas
than others. Over half of the 1,009 screenplays derive from 1930s and 1940s
Hollywood movies — primarily, productions from Warner Bros., RKO, and MGM (it’s
no coincidence that Time Warner holds the rights to all three of these studio
film libraries). AFSO is well suited, then, for research questions focusing on
Hollywood’s "Golden Age." Other strength areas of
AFSO include 1990s American films (both studio and independent) and large script
collections from certain contemporary screenwriters, including Paul Schrader,
Lawrence Kasdan, and John Sayles. As a result, AFSO is also well suited for
researching questions involving screenplay authorship, a topic we explore later
in this essay.

3. ScripThreads: Parsing Method and Forms of Visualization

The ScripThreads software prototype is a cross-platform tool for the analysis and
visualization of screenplays. The tool is written in C++ and uses the Qt
toolkit for its graphical user interface, making it easy to port to multiple
systems. Figure 2 shows a screenshot of the
graphical user interface, which offers separate windows for the Reading View,
Visualization View, and Character and Settings View. By September 2014, users
will be able to download the ScripThreads prototype at http://scripthreads.org. ScripThreads
takes screenplays in plain-text or HTML format as input, parses these files, and
generates data for visualization and analysis. The features of the tool are
described below and showcased with visualizations from The
Big Chill (1983) and other Hollywood screenplays.

Figure 2.

The ScripThreads graphical user interface, with separate windows for the Reading
View, Visualization View, and Character and Settings View.

3.1. Parsing Method

While screenplays have a far more defined structure than most other textual
forms, we still found substantial variation between different works. For
instance, while most works indented the character name and set it in
full capitals before each paragraph of dialogue, the amount of
indentation varied greatly between different works. Furthermore, some
authors used indentation and capitalization to indicate other screenplay
attributes such as sound effects, locations, and times.

For these reasons, we used a two-step process to automatically find characters in
a screenplay. In the first pass, each line was analyzed to determine whether it
potentially indicated a character. Lines that were considered
candidates were pushed to a list. After the first pass, the list was analyzed to
determine the most likely amount of indentation before a character name. For
instance, a given screenplay may have 12 spaces of indentation before a
character name with the spoken lines included in the paragraph below.
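
The first pass can be illustrated with a short Python sketch. ScripThreads itself is written in C++, and the article does not give its exact candidate test, so the regular expression and the function names below (find_candidates, likely_cue_indent, character_names) are hypothetical. The sketch assumes a candidate is an indented, all-capitals line, optionally followed by a parenthetical such as "(O.S.)", and takes the most common indentation among candidates as the character-cue column.

```python
import re
from collections import Counter

# Hypothetical cue pattern: leading spaces, then an all-caps name, optionally
# followed by a parenthetical extension such as "(V.O.)" or "(O.S.)".
CUE_PATTERN = re.compile(r"^( +)([A-Z][A-Z0-9 .'\-]*?)\s*(\(.*\))?\s*$")


def find_candidates(lines):
    """First pass: collect (indent, name) for every line that might be a character cue."""
    candidates = []
    for line in lines:
        match = CUE_PATTERN.match(line)
        if match:
            candidates.append((len(match.group(1)), match.group(2).strip()))
    return candidates


def likely_cue_indent(candidates):
    """Treat the most common indentation among candidates as the character-cue column."""
    counts = Counter(indent for indent, _ in candidates)
    if not counts:
        return None
    return counts.most_common(1)[0][0]


def character_names(lines):
    """Keep only the candidates found at the most likely cue indentation."""
    candidates = find_candidates(lines)
    indent = likely_cue_indent(candidates)
    return sorted({name for ind, name in candidates if ind == indent})
```

On a toy script with cues at 12 spaces and a right-aligned "FADE OUT." at 40 spaces, character_names returns ["HAROLD", "SAM"]. The transition line also matches the capitals pattern, but the indentation filter discards it, which is why the list of candidates is analyzed before any names are accepted.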

A second pass was then undertaken to determine which characters
were in which scene. This pass parsed for three different items:
character names, scene breaks, and meta-information. Character names were
gathered from the first pass, and users could also enter additional names.
Scenes were determined by looking for a user-defined set of keywords, such
as "int." or "ext.". In practice, a list of
10-12 keywords captured scene changes well. Finally, meta-information, such
as page numbers, was found using simple matching techniques.
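
A minimal sketch of the second pass follows, again in Python rather than the tool's C++. The four scene keywords, the page-number pattern, and the Scene record are illustrative assumptions: the article reports using 10-12 keywords but does not list them. Given the character names and cue indentation found in the first pass, the function splits the script at scene headings, tracks the current page from lines that contain only a number, and records which characters have a cue in each scene.

```python
import re
from dataclasses import dataclass, field

# Hypothetical keyword list: the article reports that 10-12 keywords were enough,
# but the exact list is not given, so these four are illustrative only.
SCENE_KEYWORDS = ("INT.", "EXT.", "INT/EXT.", "I/E.")
# Meta-information: a line containing only a page number such as "2." or "  14".
PAGE_PATTERN = re.compile(r"^\s*(\d+)\.?\s*$")
# Parenthetical extensions after a cue, e.g. "HAROLD (O.S.)".
EXTENSION = re.compile(r"\s*\(.*\)\s*$")


@dataclass
class Scene:
    heading: str
    start_page: int
    end_page: int
    line_count: int = 0
    characters: set = field(default_factory=set)


def split_scenes(lines, names, cue_indent):
    """Second pass: split on scene keywords and record who has a cue in each scene."""
    scenes, page = [], 1
    for line in lines:
        stripped = line.strip()
        page_match = PAGE_PATTERN.match(line)
        if page_match:
            page = int(page_match.group(1))
            continue
        if stripped.startswith(SCENE_KEYWORDS):
            scenes.append(Scene(heading=stripped, start_page=page, end_page=page))
            continue
        if not scenes or not stripped:
            continue  # skip material before the first heading, and blank lines
        scene = scenes[-1]
        scene.line_count += 1
        scene.end_page = page
        indent = len(line) - len(line.lstrip(" "))
        if indent == cue_indent:
            name = EXTENSION.sub("", stripped)
            if name in names:
                scene.characters.add(name)
    return scenes
```

Matching headings case-sensitively avoids treating a dialogue line such as "Interior decorating..." as a scene break. The page heuristic is deliberately crude: a line of dialogue consisting only of a number would be misread as a page number.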

This second pass generated data for each scene as to how long the scene lasted,
which characters were involved, and for how many pages in the original
screenplay the scene encompassed. From this data, four different types of
visualization are available to users, each with its own utility.

3.2. Force Directed Graph

After the characters are detected, ScripThreads applies the framework of network
theory by drawing a relationship (edge) between characters (nodes) who share the
same scene. Force directed graphs are a common method for visualizing this type of
information. Digital Humanities scholars have used force directed graphs to
represent networks of characters, literary authors, and topics [Simeone 2012]. In a force directed graph, edges are represented as
virtual springs and nodes are represented as virtual masses. During runtime, a
virtual physics simulation is run which attempts to find an equilibrium position
for all nodes. Figure 3 is an example of a typical
force directed graph. In this case, we used Gephi to graph the network of
characters from the 1983 film, The Big Chill.
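
The spring-and-mass idea can be made concrete with a small Python sketch. This is a generic spring embedder written for illustration, not the algorithm used by Gephi or ScripThreads, and every parameter value (rest length, spring and repulsion constants, damping, step cap) is an assumption. Edges are weighted by the number of scenes two characters share. Springs pull connected characters toward a rest length, every pair of nodes repels, and a damped velocity update lets the system settle toward equilibrium.

```python
import itertools
import math
import random


def cooccurrence_edges(scenes):
    """Network step: weight each character pair by the number of scenes they share."""
    weights = {}
    for cast in scenes:
        for a, b in itertools.combinations(sorted(cast), 2):
            weights[(a, b)] = weights.get((a, b), 0) + 1
    return weights


def spring_layout(edges, steps=500, rest=1.0, k_spring=0.05, k_repel=0.05,
                  damping=0.85, max_step=0.1, seed=0):
    """Spring-mass simulation: edges act as springs, nodes are unit masses that repel."""
    if not edges:
        return {}
    nodes = sorted({n for pair in edges for n in pair})
    heaviest = max(edges.values())
    rng = random.Random(seed)
    pos = {n: [rng.uniform(-1, 1), rng.uniform(-1, 1)] for n in nodes}
    vel = {n: [0.0, 0.0] for n in nodes}
    for _ in range(steps):
        force = {n: [0.0, 0.0] for n in nodes}
        for a, b in itertools.combinations(nodes, 2):
            # Inverse-square repulsion pushes every pair apart.
            dx, dy = pos[a][0] - pos[b][0], pos[a][1] - pos[b][1]
            dist = max(math.hypot(dx, dy), 0.01)
            f = k_repel / dist ** 2
            force[a][0] += f * dx / dist
            force[a][1] += f * dy / dist
            force[b][0] -= f * dx / dist
            force[b][1] -= f * dy / dist
        for (a, b), w in edges.items():
            # Hooke's-law spring, stiffer for pairs that share more scenes.
            dx, dy = pos[b][0] - pos[a][0], pos[b][1] - pos[a][1]
            dist = max(math.hypot(dx, dy), 0.01)
            f = k_spring * (w / heaviest) * (dist - rest)
            force[a][0] += f * dx / dist
            force[a][1] += f * dy / dist
            force[b][0] -= f * dx / dist
            force[b][1] -= f * dy / dist
        for n in nodes:
            for i in range(2):
                # Damped velocity update, capped so no node jumps too far in one step.
                vel[n][i] = max(-max_step, min(max_step, (vel[n][i] + force[n][i]) * damping))
                pos[n][i] += vel[n][i]
    return pos
```

Characters who never share a scene with anyone have no edges and are left out of this layout. Normalizing spring stiffness by the heaviest edge keeps the simulation stable even when two characters share dozens of scenes.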

Figure 3.

A typical force directed graph of a character network. This graph shows
the network of characters from The Big Chill
(1983).

Unfortunately, this approach is only designed for representing connectivity
information, and is not designed for a time-based approach. One approach to
encode temporal information along with connectivity information is to treat each
time-step as an individual force directed graph. These graphs can then be
converted into a 3D data structure which can be analyzed from arbitrary
positions. The limitation of this approach on its own is that each time-step is
handled on an individual basis; there is no guarantee that the nodes will appear
as temporally connected threads.

To overcome this limitation, ScripThreads accounts for the temporal dimension
through use of a single 3D data structure. Each character’s node is placed on a series of
time-step planes along the z-axis and is connected not only to the other relevant
nodes in its time-step but also to its previous and future
states. This enables each character to be viewed as a virtual thread that can
become entangled in other threads when relationships occur. An added value of
the vertical alignment of character threads, as opposed to the horizontal
arrangement of the xkcd examples, is that the entire script is visible and can
be scrolled alongside the visualization, enabling the integrated close and
distant reading noted in the introduction. The rendering system also simulates
the idea of a virtual thread by rendering each character as a continuous
thread: thick when the character is active, thin when not active.
This enables unbiased color blending to occur on a model level, often referred
to as "color weaving"
[Hagh-Shenas et al. 2007]. Lines of dialogue between the characters can
be seen as interconnects between the threads, with the color representing the
speaker.
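
The single 3D structure described above can be sketched as a time-layered graph. This is an illustrative Python data structure, not ScripThreads' internal representation, and the function name and the plane_gap parameter are hypothetical. Each character receives one node per scene, placed on that scene's z-plane. Temporal edges chain each character's consecutive states into a thread, and social edges join characters who share a scene. A force directed layout would then solve only the x and y positions, with the temporal edges acting as springs that keep each thread continuous.

```python
import itertools


def thread_graph(scenes, characters, plane_gap=1.0):
    """Build a time-layered graph: one node per (character, scene index).

    Returns z positions for every node, temporal edges that chain each character's
    states into a continuous thread, and social edges between characters who share
    a scene on the same time-step plane.
    """
    z = {}
    temporal, social = [], []
    for t, cast in enumerate(scenes):
        for c in characters:
            z[(c, t)] = t * plane_gap  # each scene is a plane along the z-axis
            if t > 0:
                temporal.append(((c, t - 1), (c, t)))
        for a, b in itertools.combinations(sorted(cast), 2):
            social.append(((a, t), (b, t)))
    return z, temporal, social
```

Every character keeps a node on every plane, even when absent, so that its thread never breaks. Absence is then expressed through rendering (a thin thread) rather than through a gap in the structure.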

The force directed visualizations reveal insights into a screenplay’s narrative
structure, especially for films featuring episodic segments or parallel
narratives, parallel protagonists, or parallel lines of action. In screenwriting
manuals, "narrative structure" is often synonymous with "three-act
structure" and the goal-oriented protagonist whose pursuit of some goal
pushes the story from one act to the next. The goal-oriented protagonist is
fundamental to Hollywood storytelling, and we acknowledge that ScripThreads does
not capture this important dimension of screenplay structure.[1] However, we would also argue
that there is more to narrative structure than simply the pursuit of goals
across three acts. Supporting characters will be introduced along the way, but
will they return and, if so, how and when? Are there parallel lines of action as
other characters pursue their own goals or serve as thematic foils for the
protagonist? The force directed graph of Paul Schrader’s Mishima: A Life in Four Chapters (1985) clearly captures the
specific character threads contained in four sections of the script (see Figure 4). Similar visual patterns in other force
directed graphs would suggest the screenplay may be episodic in its overall
structure and the way in which it uses supporting characters.

Figure 4.

Force directed graph of Mishima: A Life in Four
Chapters (1985) by Paul Schrader. Circle annotations mark the
episodic segment breaks.

Another recurring visual pattern we have detected is alternating convergences
of differently colored character threads. Such alternating clusters strongly suggest
the screenplay features parallel protagonists, parallel narratives, or parallel
lines of action that play out across different spaces. The force directed graph
of The Lord of the Rings: The Return of the King
(2003), for instance, reveals the parallel lines of action as the hobbits and
their allies pursue their goals across different spaces (Figure 5). The different lines of action and character threads
converge for the climax and lengthy epilogue.

Figure 5.

Parallel lines of action (their intersections circled for emphasis)
revealed in the force directed graph of Lord of the
Rings: The Return of the King (2003).[2]

In contrast to Mishima’s identifiable episodes
and the alternating lines of action in The Return of the
King, the force directed graph of The Big
Chill (1983) is an example of a film that clusters its characters
together in the same spaces and scenes throughout the movie (Figure 6). The narrative of The Big Chill, directed by Lawrence Kasdan and written by Kasdan
and Barbara Benedek, centers on a group of seven college friends who reunite
after a shared friend, Alex, dies. Most of the film’s action occurs after the
funeral at a spacious vacation house — a setting that facilitates many different
types of character interactions, ranging from two characters speaking privately
to scenes that bring the entire group together. The frequency of these
interactions between the same group of characters is visibly evident in the
tight clustering of the threads colored red (Sam), pink (Harold), orange
(Michael), light orange (Nick), green (Meg), and light green (Sarah) toward the
center of the force directed graph. The character whose thread veers in the
largest arc from the central group is Chloe, Alex’s girlfriend who did not
attend college with the rest of the group and speaks far less than the other
characters.

Figure 6.

ScripThreads force directed graph of The Big
Chill (1983), written by Lawrence Kasdan and Barbara
Benedek.

3.3. Absence Graph

In ScripThreads’ "absence graph," the x-axis measures presence and absence.
A thread’s distance from the center of the x-axis conveys the length of a
character’s absence, measured forward and backward in time. The resulting visualization
can be read like a bus map: characters run parallel routes when they both appear
in a scene. When a character is not in a scene, his or her bus route splits off.
The Big Chill’s absence graph (Figure 7) calls our attention to the purple thread
of Richard, who is Karen’s husband and an outsider from the core group of
college friends. Karen’s sub-plot in the film centers on her decision about
whether to stay with Richard or leave him for her college boyfriend, Sam. As the
purple thread’s wide arcs reveal, Richard is absent for most of the film.
However, the graph also shows that he is more important to the narrative than a
character who only matters to a single scene.
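
The article does not spell out how absence "measured forward and backward in time" is converted into a horizontal offset. One plausible reading, assumed in the Python sketch below, is that the offset is the distance in scenes to the character's nearest appearance in either direction. Under that reading, a long absence bulges into a wide arc that returns to the center when the character reappears. The function name and the fallback value for a character who never appears are hypothetical.

```python
def absence_offsets(scenes, character):
    """Offset per scene: 0 when present, else scenes to the nearest appearance either way."""
    present = [t for t, cast in enumerate(scenes) if character in cast]
    if not present:
        return [len(scenes)] * len(scenes)  # assumption: never-present threads sit at the edge
    offsets = []
    for t in range(len(scenes)):
        back = [t - p for p in present if p <= t]  # distance back to the last appearance
        fwd = [p - t for p in present if p >= t]   # distance forward to the next appearance
        offsets.append(min(back + fwd))
    return offsets
```

For a character such as Richard, present in scenes 0 and 4 of a six-scene toy example, the offsets are [0, 1, 2, 1, 0, 1]: a symmetric arc between two appearances, then a small drift after the last one.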

Figure 7.

ScripThreads absence graph of The Big
Chill.

3.4. Presence Graph

The "presence graph" shows at a glance when a character is
active in a scene. A character’s thread is wider when the character is active
and narrower when the character is not active. Time is shown on the y-axis
from top to bottom. Horizontal lines indicate dialogue between the characters.
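
The data behind such a graph are simple, as the following Python sketch shows. The per-scene record format (a "cast" set plus a "speeches" list of speakers in order) and the two width values are assumptions made for illustration. Each character receives one width per scene, and each speech in a scene with other characters present yields a horizontal connector from the speaker to the listeners, to be drawn in the speaker's color.

```python
def presence_rows(scenes, characters, thick=3.0, thin=0.5):
    """Thread widths per scene, plus (scene, speaker, listeners) dialogue connectors."""
    widths = {c: [thick if c in s["cast"] else thin for s in scenes] for c in characters}
    connectors = []
    for t, s in enumerate(scenes):
        for speaker in s["speeches"]:
            listeners = sorted(s["cast"] - {speaker})
            if listeners:  # a speech with nobody else present draws no connector
                connectors.append((t, speaker, listeners))
    return widths, connectors
```

Because the widths are indexed by scene, the y position of each row is simply the scene index, which keeps the graph aligned with the scrolling Reading View.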

If we return to The Big Chill, this screenplay’s
presence graph (Figure 8) helps us see that the
storytellers do not treat the core group of friends with equal emphasis. The
male characters of Sam, Harold, Michael, and Nick speak in more scenes and
appear with greater frequency than any of the female characters (the character
statistics CSV supports this claim). The character of Sam (red thread) is
integral to advancing the plotlines of multiple characters and helps motivate
transitions between scenes. However, his love interest, Karen (blue), is largely
absent unless the entire group comes together for a scene or the focus turns to
her plotline with Sam.

Figure 8.

ScripThreads presence graph of The Big
Chill.

3.5. Increasing Graph

The increasing graph is useful for communicating, in a single image, character
activity and storytelling techniques across the course of a narrative. Unlike
the force directed and convergence graphs, ScripThreads’ increasing graph is not
rooted in network theory. Perhaps for this reason, though, we’ve found that
Humanities researchers unfamiliar with networks tend to find the increasing
graph the fastest to grasp and interpret.

The increasing graph rotates the axes from the convergence graph: the x-axis
becomes time and the y-axis becomes character presence. If a character is
present in a scene, then his or her colored thread rises vertically. If a
character is not present in a scene, then his or her thread remains flat.
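
In other words, each thread in the increasing graph is a running count of the scenes in which the character has appeared so far. A minimal Python sketch, with a hypothetical function name and scenes given as sets of character names:

```python
from itertools import accumulate


def increasing_series(scenes, characters):
    """Cumulative scene count per character: x is the scene index, y the running total."""
    return {
        c: list(accumulate(1 if c in cast else 0 for cast in scenes))
        for c in characters
    }
```

For scenes [{SAM, HAROLD}, {SAM}, {MEG}, {SAM, MEG}], SAM's thread is [1, 2, 2, 3], HAROLD's is [1, 1, 1, 1], and MEG's is [0, 0, 1, 2]. A protagonist present in every scene produces a straight diagonal, while a character who returns late stays flat and then climbs.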

The Big Chill’s increasing graph (Figure 9) shows that the Sam and Harold characters
(red and pink) have roughly an equal level of presence throughout the
screenplay. Slightly less present are Michael and Nick (orange and light
orange), who also appear roughly equally, and they, in turn, are followed very
closely by Meg and Sarah (green and light green). The increasing graph
reinforces our earlier observation that the male characters play active roles in
more scenes than the female characters. The Big
Chill’s gender imbalance is small, though, compared to the vast
majority of Hollywood screenplays. Additionally, the proximity of the seven
threads representing the seven college friends is, relatively speaking,
extremely tight. We have yet to find another "ensemble" or
"multi-character" screenplay with such a tight range of presence levels
for this many characters.

Figure 9.

Increasing graphs of three films directed and written or co-written by
Lawrence Kasdan: from left to right, Body Heat
(1981), The Big Chill (1983), and Silverado (1985).

The screenplays that Kasdan wrote and directed immediately before and after
The Big Chill are more typical of American
screenwriting (Figure 9). In most Hollywood
movies, the story centers on the goals of one or two protagonists. These
protagonists — antiheroes, in the case of the neo-noir Body
Heat — are the red and pink threads. The orange threads in Body Heat and Silverado
reveal another common storytelling technique that our graphs make visible:
introducing a character in the first ten pages who will re-emerge in the
second act to increase conflict and complicate the protagonist’s pursuit of his
goals. Interestingly, and fittingly for the western and crime noir genres,
Kasdan made this character an authority figure in both screenplays: the sheriff
in Silverado and the district attorney
friend-turned-threat memorably played by Ted Danson in Body
Heat.

When we looked at the increasing graphs of hundreds of screenplays, we noticed a
sub-group in which the red thread shoots up diagonally in a straight line, far
exceeding any other thread (Figure 13). These
graphs are indicative of screenplays that focus on a single protagonist and in
which the protagonist appears in every scene or nearly every scene. We analyze
this pattern in greater depth in Section 5 of this article.
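
A simple corpus-level filter for this pattern is sketched below in Python. The article does not state a numeric cutoff for "nearly every scene". The 0.9 threshold and the function name are therefore assumptions, to be tuned against the graphs themselves.

```python
from collections import Counter


def hyper_present(scenes, threshold=0.9):
    """Return (name, share) if the most present character is in >= threshold of scenes."""
    if not scenes:
        return None
    counts = Counter(c for cast in scenes for c in cast)
    if not counts:
        return None
    name, n = counts.most_common(1)[0]
    share = n / len(scenes)
    return (name, share) if share >= threshold else None
```

Applied across a collection stored as a dictionary from title to scene list (a hypothetical structure), a filter such as `[title for title, s in corpus.items() if hyper_present(s)]` returns the candidate screenplays for closer inspection.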

3.6. Scene Stats and Character Stats

ScripThreads also gives users the option to export two different types of data:
scene statistics and character statistics. In the "Scene
Stats" CSV, each row represents one scene, arranged in the order they
occur within the script. The fields (columns) indicate the scene’s number of
lines, number of characters, starting page, ending page, and location (interior
or exterior). In the "Character Stats" CSV, the rows
represent the screenplay’s characters, arranged in descending order of scene
activity. The fields here indicate the character’s number of active scenes,
number of dialogue lines, and percentage of involvement across the film.
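
The two exports can be sketched with Python's standard csv module. The column names, the per-scene record fields ("heading", "lines", "cast", "start_page", "end_page", "dialogue"), and the rule that classifies location from the heading prefix are assumptions for illustration, not ScripThreads' actual file layout. Percentage of involvement is computed here as active scenes divided by total scenes.

```python
import csv
import io


def location_of(heading):
    """Hypothetical rule: classify a scene by its heading prefix."""
    if heading.startswith("INT."):
        return "interior"
    if heading.startswith("EXT."):
        return "exterior"
    return "other"


def scene_stats_csv(scenes):
    """One row per scene, in script order."""
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["scene", "lines", "characters", "start_page", "end_page", "location"])
    for i, s in enumerate(scenes, start=1):
        writer.writerow([i, s["lines"], len(s["cast"]), s["start_page"],
                         s["end_page"], location_of(s["heading"])])
    return out.getvalue()


def character_stats_csv(scenes):
    """One row per character, in descending order of active scenes."""
    active, dialogue = {}, {}
    for s in scenes:
        for c in s["cast"]:
            active[c] = active.get(c, 0) + 1
        for c, n in s["dialogue"].items():
            dialogue[c] = dialogue.get(c, 0) + n
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["character", "active_scenes", "dialogue_lines", "percent_involvement"])
    for c in sorted(active, key=lambda name: (-active[name], name)):
        writer.writerow([c, active[c], dialogue.get(c, 0),
                         round(100 * active[c] / len(scenes), 1)])
    return out.getvalue()
```

Returning strings keeps the sketch free of file handling. To save the output, write the returned text to a file opened with newline="".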

4. Close Reading Case Study: Lawrence Kasdan’s Grand
Canyon

Thus far, we have demonstrated that ScripThreads generates visualizations that
reveal storytelling techniques, character interactions, and character activity
within a screenplay. In sharing our work, though, we have been asked: how does
this tool yield knowledge that couldn’t be gained simply through reading the
screenplay, watching the film closely, or turning to the existing body of
scholarship on narratology, cognitivism, and Hollywood storytelling? While we
are enthusiastic about the potential of ScripThreads for distant reading, we
also recognize that the close analysis of individual films will always be an
important activity of film criticism and scholarship. In this section, we model
how ScripThreads can be used as an interpretative tool that enhances — rather
than replaces — the use of narrative theory and the method of close reading. To
model how a scholar’s engagement with ScripThreads can enrich an understanding
of a film’s narrative structure, we will continue our focus on Lawrence Kasdan’s
work and analyze the screenplay for Grand Canyon
(1991).

Figure 10.

The marketing of Grand Canyon (1991), written
by Lawrence Kasdan and Meg Kasdan, emphasizes the film’s ensemble of actors
and invites us to think of the film in relationship to The Big Chill (1983).

"In the 80’s he brought us 'The Big
Chill.' Welcome to the 90’s." So reads the tag line on the
movie poster for Grand Canyon (1991), directed by
Lawrence Kasdan and written by Lawrence Kasdan and Meg Kasdan (Figure 10). Grand
Canyon’s marketing suggests a close relationship between it and
The Big Chill, a comparison emphasized further
by both films’ casting of Kevin Kline and their emphasis on an ensemble of actors. The
question, then, is: how similar or different are the two films?

The ScripThreads visualizations for Grand Canyon
(1991) show that its scene structure and character interactions are
significantly different from The Big Chill (1983).
The wider thread arcs and non-intersecting threads of Grand
Canyon’s force directed graph (Figure
11) show that there are characters who appear numerous times in the
film but never share the same scene. Davis (green thread) and Deborah (purple
thread), for example, are never present in the same scene. Other characters,
such as Dee (light green) and Claire (pink), are only present once in the same
scene. Whereas The Big Chill is about a network of
old friends who physically reunite in the same space, Grand
Canyon is an example of what some film scholars have referred to as
a "network narrative" — a multi-protagonist film that
follows numerous characters whose lives intersect at different moments [Bordwell 2006, 94–103]. Of course, one does not need a
computer visualization to recognize this difference between Grand Canyon and The Big Chill. From
viewing both films, the difference between the characters gathered spatially in
The Big Chill’s house and the characters
dispersed across the city of Los Angeles in Grand Canyon
is quite apparent. The racially diverse cast of Grand Canyon and the film’s overriding interests in relations
across race and social class are also starkly different from The Big Chill, a film in which no non-white character
holds narrative significance. So, the question remains — what does ScripThreads
offer that simply viewing the films does not?

If we put aside the comparative question and instead focus on the details of
Grand Canyon’s multi-character structure, then
more interesting insights begin to emerge. In The Way
Hollywood Tells It, film scholar David Bordwell writes:

In Lawrence Kasdan’s Grand Canyon
(1991), the married couple Mack and Claire and the brother-sister pair
of Simon and [Deborah] are given roughly equal emphasis… other plotlines
show Mack’s son falling in love with a girl he meets at camp,
[Deborah]’s son being alienated, and Mack’s friend Davis vowing to stop
making ultraviolent movies. The subsidiary characters don’t encounter
all the customary obstacles and setbacks, yet their wants are developed
beyond the limits of a traditional subplot, providing thematic echoes or
counterpoints.
[Bordwell 2006, 96]

When we look at the increasing graph for Grand
Canyon, it is striking to note how uneven the distribution of character
presence is across the film (see Figure 12). The
Kevin Kline character of Mack (red thread) appears in 50% more scenes than any
other character. The next two most active threads are those of Mack’s wife,
Claire (pink), and Simon (orange), the tow truck driver who helps Mack after a
car breakdown in one of L.A.’s worst neighborhoods. Simon’s sister, Deborah
(purple), appears as one of the many subsidiary characters clustered toward
the bottom. She is present in fewer scenes than either Davis (green) or Mack’s
son Roberto (light orange).

Figure 12.

Increasing graph of Grand Canyon (1991),
written by Lawrence Kasdan and Meg Kasdan.

Does this mean that David Bordwell’s analysis of the film is incorrect? No.
Bordwell never claims that Mack, Claire, Simon, and Deborah are given equal
screen time. Instead, he’s suggesting that the storytelling techniques invite
the audience to think of the characters as equally important. In fact,
Bordwell’s book offers insights that explain the discrepancy between Mack’s
on-screen involvement and the audience’s understanding of Mack as one of
multiple roughly equal characters. Bordwell describes Grand
Canyon in the context of contemporary "ensemble
films" in which "several protagonists are given
equal emphasis, based on screen time, star wattage, control over events, or
other spotlighting maneuvers." The star wattage and spotlighting
maneuvers are especially significant to our interpretation of Grand Canyon as an ensemble drama. As already
discussed, the film’s marketing emphasized the ensemble of actors. In terms of
star power, Danny Glover and Steve Martin had both headlined more commercially
successful movies than Kevin Kline prior to the movie’s release.

The screenplay’s two major spotlighting maneuvers, which occur at the beginning
and end, further encourage us to perceive Mack (Kevin Kline) and Simon (Danny
Glover) as equal in narrative importance. The script’s most important
spotlighting maneuver occurs from pages 6 to 16: the white lawyer Mack’s car
breaks down at night in South Central Los Angeles; armed young black men
approach him and tell him to get out of the car; Simon, a black tow truck
driver, pulls up, tells the armed men that Mack is his responsibility, and takes
Mack back to his house in an affluent neighborhood. The screenplay interweaves
scenes of Mack waiting for the tow truck and scenes of his family back in
Brentwood. When Simon and Mack are leaving South Central, Simon’s line, "My sister and her kids live near here," motivates a
transition to a scene that introduces Simon’s sister and her troubled son. The
Mack and Simon characters are contrasted by their race and social class, yet
treated as equals through the attention to each one’s family and their shared
sense that, in the words of Simon, "the world ain't supposed
to work like this... Everything's supposed to be different than it
is." This loss of faith in the social and moral order —
counterbalanced by the possibility for human kindness, growth, and, even,
miracles — provides the thematic glue for the entire film. The final scene of
Mack, Simon, and their loved ones gazing in wonder at the Grand Canyon
reestablishes our understanding of the narrative parity between the lives of
Mack and Simon. Simon’s question, "What do you
think?" and Mack’s response, "I think… it’s not
all bad. Not at all," provide an affirmative, glass-half-full answer
to the existential questions that have run throughout the 136-page screenplay.

The many scenes that occur between the car breakdown and visit to the Grand
Canyon further encourage the audience to think of "the
married couple Mack and Claire and the brother-sister pair of Simon and
[Deborah]" as "roughly equal [in] emphasis"
[Bordwell 2006]. Bordwell points out that “their lines of action follow Thompson’s
four-part template,” referring to the four-act structure that Kristin Thompson
identifies as running through most Hollywood movies, including films with
multiple protagonists [Thompson 1999]. In Grand Canyon, Mack and Simon both move through the four-act cycle
of setup, complicating action, development, and climax and epilogue. As Thompson
argues, it is characters in pursuit of goals that define classical Hollywood
storytelling more than any other feature. The goals often change, and characters
generally have both long-term and short-term goals. Mack and Simon are both
seeking to restore their faith in the universe and humanity. Yet there are a
series of more concrete short-term goals (Mack wanting to return a favor to
Simon), appointments (Simon’s date with Jane), and deadlines (Mack and Claire’s
need to make a decision about the baby) that move these characters through the
four-act structure and make Grand Canyon a
Hollywood film rather than one of the existential dramas of Ingmar Bergman.

As our analysis has shown, the ScripThreads graphs can help scholars, critics,
and practitioners better appreciate how storytelling techniques shape the
audience’s perception of a narrative. In the case of Grand
Canyon, the force-directed graph (Figure
11) offers an additional insight: the character of Mack functions
structurally as the film’s key bridge node. Mack (red thread) is active in
scenes with nearly all the major characters. He introduces Simon (orange thread)
to Jane (blue thread), with whom Simon becomes romantically linked. More
importantly, Mack motivates our introductions to his co-worker and one-time
lover Dee (light green thread) and movie producer friend Davis (green thread).
Searching for happiness and answers in their own lives, Dee and Davis advance
the central themes of the film. Yet unlike some modern films, these thematically
linked storylines do not occur in isolation from one another. Mack motivates our
introductions to these characters and appears in subsequent scenes with them.
When Mack visits the wounded Davis, the focus is squarely on Davis and his
storyline — encouraging us to think of it as Davis’s scene, despite Mack’s
presence, and furthering the overall notion of Grand Canyon
as an ensemble film. The ScripThreads force-directed graph is useful,
then, for reminding us that Mack, as a bridge node, is still vital to the
structuring of multiple storylines that are not his own.

5. Distant Reading Case Study: Locating the Hyper-Present Protagonist

Some research in the Digital Humanities begins with a fixed research question
and a clear process for gathering evidence. But as Sinclair, Ruecker, and
Radzikowska suggest, another important task for the Humanities is "to locate (or discover) new material, with no prior
knowledge of the kinds of details used for retrieval"
[Sinclair, Ruecker, and Radzikowska 2013]. In our case, we tested the distant reading possibilities of
ScripThreads by using the tool’s "Automate" function to export four types of
graphs (force-directed, absence, presence, and increasing) and the statistical
CSV files for the screenplays in the corpus that were produced between 1930 and
2006. We then looked through the graph images for patterns that stood out visually
across numerous screenplays. The first pattern that came to our attention was a
sub-group of increasing graphs in which the red thread moves diagonally in a
straight line, rising higher and more steadily than any other thread (Figure 13). These graphs pointed us toward the
storytelling pattern of what we call the "hyper-present
protagonist" — screenplays featuring a main character who appears
in every scene or nearly every scene.

Figure 13.

Single character increasing graphs for four films with a protagonist who
appears in every scene or almost every scene: from left to right, I Am a Fugitive from a Chain Gang (1932), Across the Pacific (1942), On
Dangerous Ground (1952), and Pi
(1998).

After identifying the pattern, we began searching for all instances of the
pattern both visually and mathematically. We used R to write and execute a
simple algorithm that targeted the exported Character Stats CSV files and
extracted information on the character from each screenplay with the highest
percentage of involvement. ScripThreads’ Character Stats function calculates the
percentage of character involvement by: A) identifying whether a character
appears in a scene — yes or no; B) calculating how much of the screenplay any
given scene takes up as a percentage; C) adding all of the percentage points for
the instances when a character is present. In conducting this analysis, we found that roughly
one quarter of the 935 screenplays did not parse properly, and we chose to
disregard their results. We could have opted to use ScripThreads’ Advanced
Settings and gone one-by-one through the screenplays that returned inaccurate
results, adjusting the settings to more clearly identify the way a particular
screenplay notes characters and scene breaks. And if we were using ScripThreads
for close reading, this is exactly what we would have done. However, because we
wanted to test how the ScripThreads prototype performed at scale, we moved
forward in analyzing the reduced corpus of 674 screenplays. As we continue to
improve the tool, we anticipate that the proportion of screenplays that parse accurately on
the computer’s first pass will increase.
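The three-step Character Stats calculation described above can be illustrated with a short sketch. The authors' R script is not reproduced in this article, so the Python below is a hypothetical illustration rather than ScripThreads' own code. It assumes that each scene is represented as a pair of (scene length, characters present), that scene length is measured in some consistent unit such as script lines (the article does not specify the unit), and that the function names character_involvement and most_involved are invented for this example.

```python
from collections import defaultdict

def character_involvement(scenes):
    """Return {character: percentage of the screenplay in which they are present}.

    `scenes` is a list of (scene_length, characters_present) pairs. The length
    unit (e.g., script lines) is an assumption; any consistent unit works.
    """
    total = sum(length for length, _ in scenes)
    involvement = defaultdict(float)
    for length, present in scenes:
        # Step B: the share of the whole screenplay this scene occupies.
        share = 100.0 * length / total
        # Step A: presence is yes/no, so each character counts once per scene.
        for name in set(present):
            # Step C: add the scene's share for every scene the character is in.
            involvement[name] += share
    return dict(involvement)

def most_involved(scenes):
    """Return (character, percentage) for the most-present character."""
    involvement = character_involvement(scenes)
    return max(involvement.items(), key=lambda item: item[1])
```

For a toy script of three scenes with lengths 40, 25, and 35, in which character "A" appears in the first and third scenes, A's involvement is 40 + 35 = 75%. Because presence is counted per scene, a character marked as present in a long scene receives credit for the whole scene, however briefly that character appears in it.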

Out of the 674 screenplays, the median percentage of maximum character
involvement was 80% and the mean percentage was 78%. Only 70 of the screenplays
(roughly one-tenth) had a main character present in 94% or more of the
screenplay. This group of 70 screenplays formed the sub-set of screenplays
featuring a hyper-present protagonist that we analyzed in more detail.
Specifically, we were interested in three variables: historical era, genre, and
author. Which, if any, of these variables held the most significance for stories
featuring the lead character in every scene?
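The summary statistics and the selection of the hyper-present sub-set can likewise be sketched. This is again a hypothetical Python illustration, not the authors' R code: it assumes a mapping from each screenplay title to the involvement percentage of its most-present character has already been built from the Character Stats CSV files. The 94% cutoff is the one used in the text; the function and constant names are invented for the example.

```python
import statistics

# Cutoff used in the text: a protagonist present in 94% or more of the script.
HYPER_PRESENT_THRESHOLD = 94.0

def summarize_max_involvement(max_involvement, threshold=HYPER_PRESENT_THRESHOLD):
    """Summarize per-screenplay maximum character involvement.

    `max_involvement` maps a screenplay title to the involvement percentage of
    its most-present character (a hypothetical input format).
    """
    values = list(max_involvement.values())
    # Screenplays whose top character meets or exceeds the cutoff.
    hyper_present = sorted(
        title for title, pct in max_involvement.items() if pct >= threshold
    )
    return {
        "median": statistics.median(values),
        "mean": statistics.mean(values),
        "hyper_present": hyper_present,
    }
```

With four invented values (96.5, 80.0, 71.0, and 94.0, not data from the corpus), the median is 87.0, the mean is 85.375, and the two screenplays at 96.5% and 94.0% form the hyper-present sub-set. Reporting the median alongside the mean guards against a handful of badly parsed scripts skewing the picture.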

5.1 Historical Era

Our data indicate that the hyper-present protagonist did not become
prominent in American cinema until the 1940s. Out of the 189 screenplays from
the 1930s that we analyzed, we found only two that clearly featured a
hyper-present protagonist — I Am a Fugitive from a Chain
Gang (1932) and 20,000 Years in Sing
Sing (1932).[3] These two outliers are both social problem films produced by
Warner Bros. during the pre-Production Code era. Both screenplays are also set
in prison contexts and co-written by Brown Holmes, a point we will return to in
our discussion of authorship. Our historical analysis suggests that the
hyper-present protagonist grew far more common in Hollywood movies produced
during and after World War II. Our findings confirm David Bordwell’s argument
that Hollywood writers, directors, and producers innovated new storytelling
techniques during the 1940s that filmmakers have used ever since [Bordwell 2006]
[Bordwell 2013]. Historically, this rise can be understood as part
of the effort of filmmakers and writers during the 1940s to produce films of
greater psychological realism, moral ambiguity, and experimentation in
storytelling.

Ultimately, our historical analysis was a case in which distant reading
confirmed what the leading historians of film style and narrative have argued.
But as Matthew Jockers points out, we should not expect distant reading and
computational literary analysis to always overturn previous understandings.
There is value to "bring[ing] a new type of evidence and a new perspective
to the matter and in so doing fortify…the existing hypothesis"
[Jockers 2014]. In this case, our analysis also suggests something
interesting about I Am a Fugitive from a Chain
Gang, a remarkable film about a man whose life is ruined when he is
falsely convicted of a crime and sentenced to hard labor on a chain gang in the
American South. Few would dispute that this film was unusual for its day, with
its strong social critique and bleak ending. Our analysis suggests yet another
unusual element that contributes to the film’s power — the protagonist (played
by Paul Muni) is in nearly every scene, increasing the film’s psychological
intensity and sense of claustrophobia.

5.2 Genre

In our analysis of genre, we confirmed certain existing assumptions about film
storytelling and challenged others. Screenwriting guidebooks generally discuss
the detective movie as the genre most likely to be presented as a "closed story," a storytelling strategy that keeps the
audience aligned with the protagonist’s point of view and reveals information to
the audience only at the moments when information is revealed to the
protagonist. In contrast, an "open story" is one in
which the audience learns information that the character does not know [Field 2009]
[Hunter 2004]. As we expected, several examples of the
hyper-present protagonist were detective films, such as Murder, My Sweet (1944), On Dangerous
Ground (1952), and 8MM (1999). Our
analysis turned up other examples of films that were essentially detective
stories, even though the protagonist was not officially a detective, such as
Across the Pacific (1942). And, in the case of
The Siege (1998), our analysis allowed us to
more clearly see the film’s lead investigator character and detective structure,
even though the film was marketed on the basis of its action sequences and
premise (What if the U.S. responded to a terrorist attack by placing New York
City under martial law?).

Interestingly, though, most of the hyper-present protagonist films we found did
not belong to the detective genre. Numerous dramas used the "closed story" and hyper-present protagonist technique to place the
audience firmly in the perspective of a main character and amplify the film’s
psychological intensity (examples include Now
Voyager [1942], Light Sleeper [1992],
Pi [1998], and the previously mentioned I Am a Fugitive from a Chain Gang [1932]). We also
found dramas that were "open stories" yet still
featured the protagonist in nearly every scene. To provide an example familiar
to many readers, the character of George Bailey in It’s a
Wonderful Life (1946) appears in nearly every scene of the film.
When he goes absent, though, the audience learns vital information that creates
a sense of dramatic irony (for example, only the audience and Mr. Potter know
that Uncle Billy accidentally put the bank deposit in Potter’s lap). Moreover,
the first half of the film contains a great deal of voice-over narration that
provides information that George doesn’t know — not the least of which being
that angels in heaven are discussing his plight and reviewing his life. Yet the
film also pivots to a "closed" storytelling mode as we learn
dramatic news at the same time as George, such as the stroke suffered by George’s
father or the community’s run on the bank. Our distant reading analysis of genre
helped us distinguish between the presence of a hyper-present protagonist and
the question of how a story conveys information. Additionally, the analysis
reveals how films can move much more fluidly between "open"
and "closed" modes than most screenwriting manuals would
suggest.

If the detective genre is especially fertile ground for the hyper-present
protagonist, then are there other genres in which such a character is rarely
found? We have yet to find any occurrences in the romantic comedy or musical (a
genre that frequently sets a romantic comedy story to song). The romantic
comedy, by its nature, depends on multiple characters and obstacles to delay
their happy union until the end. These conflicts and obstacles are often
rooted in misunderstandings, which require the audience to know information that
one of the characters does not know. Beyond the romantic comedy, we found that
comedy screenplays, in general, almost never feature a hyper-present
protagonist. One explanation might be that writers depend on the protagonist’s
absence to create situations that serve as set-ups for the jokes later delivered
verbally or physically by the protagonist. The very title of A Night at the Opera (1935), one of the comedy
screenplays in our dataset, is premised on the incongruity between the
high-class form of the opera and the low-class antics of its stars, the Marx
Brothers. Numerous sequences are structured to play up this incongruity — moving
between scenes in which the brothers are absent to set up our expectations,
followed by one or more of the brothers entering to disrupt the status quo and
deliver the joke.

Only one comedy screenplay in our dataset featured a hyper-present protagonist.
This outlier was the Jim Carrey comedy Liar Liar
(1997), an example of what is known in contemporary Hollywood as “high concept”
(a movie with a story that can be distilled and marketed to audiences in 25
words or less). In the case of Liar Liar, the high
concept is, "What if a hotshot lawyer could not tell a lie
for 24 hours due to his son’s birthday wish?" The premise enables
screenwriters Paul Guay and Stephen Mazur to generate nearly all the jokes with
the protagonist present. This comedy depends on incongruity, but it is different
from the incongruity derived from placing the Marx Brothers at fancy restaurants
and the opera. Instead, the incongruity comes from the difference between
Fletcher’s (Jim Carrey’s) compulsive lying before his son’s wish and how he must
adapt after he can no longer tell a lie. This example demonstrates that a film’s
concept (and perhaps the desire of producers to fully capitalize on their highly
paid star) can override the pattern of character presence typical for a
particular genre.

5.3 Authorship

Finally, we explored the question of authorship as it relates to patterns of the
protagonist’s presence. Do some screenwriters have a tendency to write films
that focus on one protagonist and place that character in nearly every scene?
The answer, we found, was yes. As noted earlier, screenwriter Brown Holmes
co-wrote the two outliers we identified from the 1930s, I
Am a Fugitive from a Chain Gang and 20,000
Years in Sing Sing. This finding suggests that Holmes may have had
an especially important role in dramatically structuring and writing the two
films, despite working as a contract writer for Warner Bros. and sharing the
writing credit with co-authors on both screenplays. As we continue our research,
we plan to integrate ScripThreads with other modes of computational analysis and
examine more co-authored Warner Bros. screenplays from the 1930s and 40s. We
believe the result might offer a better sense of the individual contributions of
creative artists to the highly collaborative medium of film. We may find that
some writers contributed especially strongly to a film’s dialogue, while others,
like Brown Holmes, left major contributions to a film’s dramatic structure.

As we examined the AFSO corpus for hyper-present protagonists, one author leapt
out at us for his tendency to structure films with a main character present in
nearly every scene. The AFSO corpus contains the screenplays for fifteen films
that were either written or co-written by Paul Schrader, who is best known for
writing and directing dark, character-oriented dramas, such as American Gigolo (1980) and Affliction (1997), and writing some of the best films directed by
Martin Scorsese, including Taxi Driver (1976),
Raging Bull (1980), and The Last Temptation of Christ (1988). Out of this group of fifteen
films, eight films feature a main character present in 94% or more of the
screenplay, and only two films place the main character in less than 80% of the
screenplay (the two outliers are Obsession [1976],
a Hitchcockian thriller directed by Brian De Palma, and Blue
Collar [1978], which Schrader directed, about three auto workers who
rob their corrupt union).

Figure 14.

Increasing graphs of fifteen produced screenplays either written or
co-written by Paul Schrader. The graphs were superimposed onto one another
using ImageMagick.

Schrader’s tendency to frame stories around single characters who are almost
always present can be seen in Figure 14, which
superimposes the fifteen increasing graphs of Schrader’s screenplays onto one
another. The consistency of Schrader’s approach is clear from the cluster of red
lines (each one representing the most present character from a different
Schrader film) thrusting diagonally in a nearly straight line. For a point of
comparison, we can turn back to Lawrence Kasdan. Figure
15 superimposes the seven Kasdan screenplays that are available in the
AFSO corpus. The Kasdan image shows a screenwriter who works across numerous
genres and utilizes a wide variety of storytelling approaches — one film
featuring a hyper-present protagonist (Mumford
[1999]), one multi-character drama with no singular protagonist (The Big Chill [1983]), and several films that fall
in-between.

Figure 15.

Increasing graphs of seven screenplays written or co-written by Lawrence
Kasdan. The graphs were superimposed onto one another using
ImageMagick.

In some ways, Figure 14 provides an illustration
for what film critics and scholars already assume about Schrader: he is a
filmmaker who writes character studies about men who are psychologically and/or
existentially anguished. American Gigolo (1980),
Light Sleeper (1992), and Affliction (1997) are all films that Schrader wrote and directed
that fit this paradigm. Schrader calls his protagonists "existential
heroes" and, in a widely quoted interview with Garry Wills,
remarked, "all my life has been dedicated to the existential
hero, and the existential hero seems to have come to the end of his path,
replaced by the ironic hero"
[Schrader and Wills 2006]. However, there is no inherent reason why an
existential hero needs to appear in nearly every single scene of a movie. As
noted earlier, Lawrence Kasdan’s Grand Canyon keeps
its existential protagonist, Mack, absent for numerous scenes, even though he
appears in far more scenes than any other character. Similarly, the existential
heroes in films directed by Ingmar Bergman and Woody Allen generally exist
within an ensemble of characters. In Bergman and Allen films, the protagonist
will go absent to facilitate scenes focusing on supporting characters, who may
serve as foils to the existential hero, address the story’s major themes, and/or
advance the plot. There is a long history of this practice in literature, as
well. Tolstoy’s Anna Karenina would have far less
to say about love, society, politics, and life’s meaning without the narrative
of Levin and Kitty playing out in parallel to Anna’s story. The ScripThreads
increasing graphs of Schrader’s work reveal that a defining
characteristic of Schrader’s existential hero is his hyper-presence: the
audience stays with the existential hero throughout the entire film. Ingmar
Bergman made films that address the existential themes of Sartre and Camus, but
it is Schrader who structures his narratives more like their
novels, following the "closed story" model that
keeps the audience aligned with the protagonist’s point of view.

Schrader’s best known screenplays follow the hyper-present, existential
protagonist model we have described. However, a more heterogeneous portrait of
Schrader as an author emerges when we examine his twelve unproduced screenplays
in the AFSO corpus. To be clear, these are screenplays written by Schrader that
were never made into films. Figure 16 shows the
superimposed increasing graphs of these twelve screenplays. This group of
screenplays includes four hyper-present protagonist scripts: three
music-oriented biopics (Dream Lover: The Bobby Darin
Story, Eight Scenes from the Life of Hank
Williams, and Gershwin); and one
detective film (The Investigator), a genre that we
know is comparatively likely to have a hyper-present protagonist. What is more
surprising is that seven out of the twelve unproduced screenplays place the
protagonist in less than 80% of the script, and three place the
protagonist in less than 70% (Schrader’s two produced outliers, Obsession and Blue
Collar, are both higher, at 72% and 75% respectively).

Figure 16.

Increasing graphs of twelve unproduced screenplays written or co-written
by Paul Schrader. The graphs were superimposed onto one another using
ImageMagick.

Schrader’s unproduced project in which the protagonist is least present (59%) is
his script for a retelling of Snow White (Figure 17). Here, we see an interesting dynamic
between an author’s general tendencies and the demands of a particular project.
The Snow White tale depends on dramatic irony — the
audience knows the apple is poisonous, but the protagonist does not. Moreover,
the antagonist needs scenes without the protagonist present to interact with the
magic mirror, one of the most iconic elements of the tale.

The increasing graphs of Schrader’s unproduced work open up a series of
questions that we plan to investigate further. What is the relationship between
genre and authorship in how narratives are structured? How do certain genres or
stories override a screenwriter’s established narrative techniques? Finally, how
do industrial and cultural assumptions about a particular screenwriter shape the
types of projects the writer is offered and that make it into production? By
examining Schrader’s unproduced works at a distance, we can speculate that
industry assumptions about what constitutes a "Paul Schrader
film" — namely, a dark, existential character study — may have
made studios and financiers more reluctant to produce screenplays by Schrader
that fell outside of this paradigm.

6. Conclusion

In this article, we have introduced the ScripThreads tool and demonstrated how
it can be used for closely analyzing one screenplay (Grand
Canyon) and analyzing a much larger group of screenplays to look for
a pattern. ScripThreads parses screenplays and extracts data about character
presence and interactions, then offers users statistical CSV files and a series
of 3D and 2D visualizations that present the data in a time-based manner. The
tool is not without its limitations. ScripThreads does not visualize act breaks,
shifts in spatial location, or a protagonist’s journey toward a particular goal.
Yet the data the tool offers can be quite telling. It can reveal details of
character presence and character co-occurrences that humans are prone to forget
or never notice in the first place.

In using ScripThreads to closely analyze a single film, the continuities and
differences between the viewer’s perception and the computer’s visualizations are
a powerful starting point for uncovering storytelling techniques and better
understanding cognitive reception. In using ScripThreads to analyze a large
group of screenplays, the visualizations and CSV output files allow researchers
to recognize patterns without having prior knowledge of the films. To draw
accurate and meaningful conclusions, though, some domain expertise in film
history is essential (just as a researcher would want some knowledge of 19th-century literature before making arguments about
the century based on topic modeling 100 Victorian novels). Whether applied
toward close reading or distant reading, ScripThreads is meant to help researchers
gain a richer understanding of the text or texts they are studying. This is a
tool to aid Humanities scholars in analysis and interpretation, not a substitute
for screenwriting and criticism.

ScripThreads offers one additional affordance — the ability to quickly visualize
the narrative structure of an unproduced screenplay. Films and television
programs play out their stories visually in sequences of photographed and edited
action; a produced screenplay has already been visualized for the screen.
However, due to the difficulty and expense of making a film, the vast majority
of screenplays are never produced, never visualized in this fashion. The
increasing graphs of Schrader’s unproduced screenplays (Figure 16) provide what may be the first transformative
visualizations of these twelve works. The graphs allow us to quickly recognize
one way (the protagonist’s level of presence) that some of these stories differ
from Schrader’s better known screenplays. What if we could apply a similar
analysis to the screenplay libraries of Hollywood’s studios, producers, and
talent agencies? The results would yield not simply graphs of character
presences, co-occurrences, and absences, but transformative renderings of
thousands of stories that have remained absent from audiences. We would no
longer be visualizing American film history; we would be visualizing a history
that might have been.

Notes

[1] For a brief
overview of the three act structure and Kristin Thompson’s richer four act
structure, see Eric Hoyt’s online Prezi, "Hollywood
Storytelling: 3 Act or 4 Act Structure," 3 October 2013, http://prezi.com/x7fhnbeofobw/hollywood-storytelling-3-act-or-4-act-structure/.
Readers are also strongly encouraged to read Kristin Thompson’s Storytelling in the New Hollywood: Understanding Classical
Narrative Technique (1999).

[3] We should acknowledge that more screenplays from
the 1930s failed to parse properly than screenplays from any other decade.
Nevertheless, we stand by our analysis and claim that the hyper-present
protagonist became more common in American cinema beginning in the
1940s.