Machines are not “cool” enough… yet!

Machines have beaten humans at chess, Jeopardy!, Go, and many other tasks, and they are steadily coming for our jobs. Is there a place where we can hide from them? Yes, for the time being, there is one: the “gutters” between panels in comics(1). Let me explain myself.

Comics tell stories using a sequence of juxtaposed panels of images, often annotated with text. Speech balloons, captions, and onomatopoeia indicate dialogue, narration, sound effects, or other complementary information. The size and arrangement of panels contribute to narrative pacing. Text and image are often intricately woven together to tell a story that neither could tell on its own. In comics, most movements in time and space are hidden in the “gutters” between panels. In summary, comics are the realm of ellipsis:

A comics creator can condense anything from a centuries-long intergalactic war to an ordinary family dinner into a single panel. But it is what the creator hides from their pages that makes comics truly interesting: the unspoken conversations and unseen actions that lurk in the spaces (or gutters) between adjacent panels.

To follow the story, readers logically connect panels together by inferring unseen actions through a process called “closure”, the “phenomenon of observing the parts but perceiving the whole”(2).

The reader observes two separate panels and mentally pieces together what happens between them, even though no panel shows what actually happened. Closure is why comics fall under the category of cool media: comics require the reader to interact constantly with the visual elements and fill in the gaps between them, whereas in film (a hot medium) two actions are connected visually by the medium itself, rather than mentally by the viewer, creating a seamless effect.

Can machines do the same? Not yet.

Computers can describe what is explicitly depicted in natural images, but they cannot understand the closure-driven narratives conveyed by stylised artwork and dialogue in comic book panels. To reach that conclusion, Mohit Iyyer and collaborators(1) assembled a dataset, COMICS, with roughly 1.2 million panels drawn from almost 4,000 publicly available comic books published during the “Golden Age” of American comics (1938–1954). An in-depth analysis of COMICS shows that neither text nor image alone can tell a comic book story, so a computer must understand both modalities to keep up with the plot.

To test machines’ ability to perform closure and to understand narratives and characters, they devised three tasks and used four neural architectures to examine the impact of multimodality and of contextual understanding via closure. All of these models perform significantly worse than humans on the three tasks.

In short, for computers to reach human performance, they will need to become better at leveraging context.

One final reflection. If our ability to understand “cool” media is what still gives us an edge over machines, and if we think specifically about how humans and machines can best collaborate, then our present emphasis on video and other “hot” media, especially among young people, does not seem the best way to leverage our “natural” strengths. Quite the contrary. Maybe you should consider reading more manga and switching off YouTube!