Visualizing Procgen Text

Lately I’ve been aggressively telling everyone I know to do more visualization of the systems they’re building, and thinking about what that might mean for the procedural text experiments I’ve been up to.

If you’ve played with The Mary Jane of Tomorrow, you’ve probably noticed that some of the poems it generates show a lot more variation and richness than others. That’s easy to observe, but can we quantify it? Can we draw observations about what level of complexity is the most satisfying, and identify portions of the game where more or less complexity would improve the experience? If we’re procedural text artists, how do we decide where to direct our attention?

One of the fun/horrifying things about this particular medium is that it pretty much never feels like you’re done: it would always be possible to add more corpora and more variant templates. Both Mary Jane and Annals of the Parrigues came to an end because I needed to move on to other work, not because I couldn’t think of anything further to add. But one thing I might want from a procgen text tool is help discerning where the biggest blank spots are currently.

The first step towards visualization is of course figuring out what aspects of the system you might want to look at, and in fact I often find “how would I draw a picture of this?” to be a good way of making myself think about the salient qualities of a particular system. Here are the things I decided I wanted to know about the text in Mary Jane:

Size of component phrases: how long is the smallest atom of text in a given composition? When you see something in the text, was that produced by a human or is it the juxtaposition of several pieces selected algorithmically? This is very varied in Mary Jane, with some poems or conversation options picking entire sentences, and other selections being just a word or two long. (Parrigues goes even further and composites town names from constituent syllables, but Mary Jane isn’t going that far.)

Number of alternatives: if a phrase was picked from a list, how many other options were there? Number of options is going to determine how unique a particular element is in your experience of the text.

Salience of selected phrase: why did we pick this piece? How many pieces of information about the world model did it have to match in order to be selected? (And to combine those two points, if we picked a highly salient phrase, how many other options were there of equal salience?)
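To make those three measurements concrete, here is a minimal sketch (in Python rather than Inform, with hypothetical names) of the kind of record you might keep for each algorithmic pick:

```python
from dataclasses import dataclass

@dataclass
class PhrasePick:
    """Record of one algorithmic text selection (hypothetical structure)."""
    text: str       # the phrase that was chosen
    length: int     # size of the atom, in words
    eligible: int   # how many options matched the current world state
    total: int      # how many options the table holds overall
    salience: int   # number of world-model tags the pick had to match

    def markup(self) -> str:
        """Render a {salience/eligible/total} annotation around the phrase."""
        return f"({self.text} {{{self.salience}/{self.eligible}/{self.total}}})"

pick = PhrasePick(text="Divinity", length=1, eligible=4, total=6, salience=1)
print(pick.markup())  # (Divinity {1/4/6})
```

Nothing here is specific to one engine; the point is just that each selection event can carry its own statistics along with the text it produces.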

For Mary Jane, this is slightly tricky. There are variants that are handled with [or] in Inform, as in “[one of]Lime[or]Orange[or]Banana[at random]”.

Introducing a way to count variants like Lime/Orange/Banana would have involved tinkering with Inform at a deeper level than I was willing to do. There are also text filters that convert “the” to “le” to get that aggressively bad franglais effect, for instance. However, a lot of Mary Jane’s text generation — and all of the generation that connects text with a world model — goes through routines where I am able to attach some tags. So I started out by adding markup that puts each sub-piece in parentheses, understanding that it will still miss some nuances.

In {} I have { how many tags this piece of text matched / how many other pieces of text were also eligible to be selected / how many pieces were available in total, including those that were ineligible due to their tags not matching }.
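Here is one plausible way the tagged selection and markup could work, sketched in Python; the table, the tags, and the function name are invented for illustration and are not the actual Inform routines:

```python
import random

# Hypothetical phrase table: each entry pairs a text snippet with the
# world-model tags it requires in order to be eligible.
GESTURES = [
    ("sounds exasperated", {("ennui", "pos")}),
    ("smiles brightly",    {("cheer", "pos")}),
    ("hums a show tune",   {("music", "pos")}),
    ("stands impassively", set()),  # untagged: always eligible
]

def pick_with_markup(table, world_state, rng=random):
    """Choose the most salient eligible phrase and annotate it with
    {tags matched / eligible options / total options}."""
    eligible = [(text, tags) for text, tags in table if tags <= world_state]
    best_salience = max(len(tags) for _, tags in eligible)
    finalists = [(t, tags) for t, tags in eligible if len(tags) == best_salience]
    text, tags = rng.choice(finalists)
    return f"({text} {{{len(tags)}/{len(finalists)}/{len(table)}}})"

print(pick_with_markup(GESTURES, {("ennui", "pos")}))
# (sounds exasperated {1/1/4})
```

Under this reading, a highly salient tag match crowds out the untagged fallbacks, which is why a mood like ennui can produce a {1/1/10}-style annotation: one tag matched, one phrase left standing.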

With that markup, here’s what happens when you ask the robot to bake Snickerdoodles and she knows some recipes, but not baking safety:

(The Pine Nut Queen of Tomorrow sounds exasperated. {1/1/10})”I’m not sure that’s a good idea. I’m not very familiar with the stove. ((What would you say to {0/1/2}) ((((Tuna {0/1/5})-(Ham {0/1/5}) {0/6/6}) Divinity {1/4/6}) {1/1/4})? {0/2/6})”

She’s exasperated because she’s feeling ennui currently — { ennui, pos } is the single tag matched in her table of gestures, but when she does feel ennui, that rules out all other options.

“What would you say to…” is a random selection from several phrasings of the same question. “Tuna” and “Ham” are generic meats picked off a meat list, and their creepy juxtaposition into Tuna-Ham is a random option from 6 ways of describing meats. “Divinity” is a 1950s dish name that can be selected only when she has no snobbery, so there’s one point of salience there.

There’s a lot of variation in this example, but no salience on most of the selections except the outer one, tagged {3/1/8}: There are eight possible poem forms for the robot to use, and only one of those forms (beat poetry) matches her current world state, with three tags matched. (She has to know about poetry; she has to dislike rhyme; she has to be feeling ennui.)

The tree names, on the other hand, are picked totally at random from a list of some 170 timber trees. They could be anything, and it’s pretty likely that you won’t see their names recur much or at all in the course of play, but they also don’t resonate all that much.

Now a limerick:

“(((((I once knew {0/5/5}) (a young gars {2/3/6}) {0/3/3}) de Dijon {2/27/27})
(who in matters of wit was far gone {2/2/42})
(He was careless with fire
dynamite and barbed wire {2/5/57})
(And he took out a loan for a lawn. {2/3/86}) {0/1/4}) {2/3/8})”

That first line has loads of free randomness in it. The “young gars” and “Dijon” are selected because of our Frenchness, but mostly these are layered, low-salience picks with many nested templates.

By the time we get to the second line, though, we’re suddenly very constrained. We’ve selected both our rhyme scheme and our subject-matter, and that means that even though the tables have lots of possible options, only a handful of those options are applicable in the current circumstances.

The final {2/3/8} means that doing a limerick at all meets two world-model conditions (we know rhymes but we’re actually not much good at poetry) and that there were a total of three poem styles we could have invoked in the current situation.

“Taffy” comes from a list of sweet foods: it has no special salience, but only 2 of the 4 options in the table are currently available, which means that there are some tagged sweet foods but that they don’t match the current conditions. “Glorious” comes from a massive list of adjectives, but there is no tagging on that list at all (453 of 453 adjectives are available). (My favorite generation from this template, btw, is “Roses are red / Violets are blue / Honey is spineless / and so are you,” a post-breakup poem for the ages.)

“Seraphic” comes off a huge, untagged list of positive adjectives for personalities. “Braised butterflied canard,” on the other hand, is (believe it or not) a meticulously curated recipe choice based on the robot’s level of cooking and heating skill, her current snobbery, and her familiarity with French meats.

And here is a cynical poem about the hypocrisy of communists, informed by the robot’s interest in France and advertising:

Of the bits we’ve looked at so far, I think this one comes the closest to an aesthetically pleasing level of variation. There are a couple of variant elements per line, but not more; a slight majority of these variant elements are salient rather than random.

“Diced Spam Pie” has an okay level of variation if a procedurally generated recipe name is all we’re looking for: it requires just enough thought to be interesting. However, it’s overkill as a metaphor for a heart: does it matter that it’s a pie as well as diced? “Heart like diced spam” might sort of work; “heart like a diced spam pie,” not so much. (I talk more about variation levels in the “Venom” section of the Parrigues appendix.)

As for the “Kill fire” line, that’s salient only in that the robot knows about fire safety; it has no thematic ties to what came before.

This introduces the possibility of a procedural heuristic that would generate large numbers of poems and auto-detect ones that best conform to my aesthetic preferences — something lots of computational creativity approaches include, but one that I haven’t particularly attempted in my own procgen work so far.
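A toy version of such a heuristic might take the {a/b/c} markup as its input; the scoring below is an illustrative guess at the preferences described above (about two variant elements per line, a slight majority of them salient), not anything tuned:

```python
import re

MARKUP = re.compile(r"\{(\d+)/(\d+)/(\d+)\}")

def aesthetic_score(annotated_text):
    """Toy heuristic: reward a high proportion of salient picks, and
    penalize drifting away from roughly two variant elements per line.
    (Hypothetical scoring; the weights are illustrative.)"""
    picks = [tuple(map(int, m)) for m in MARKUP.findall(annotated_text)]
    if not picks:
        return 0.0
    salient = sum(1 for matched, _, _ in picks if matched > 0)
    salient_ratio = salient / len(picks)
    lines = annotated_text.count("\n") + 1
    density_penalty = abs(len(picks) / lines - 2)
    return salient_ratio - 0.25 * density_penalty

poem = "(a young gars {2/3/6}) de Dijon {2/27/27}\n(who in matters of wit was far gone {2/2/42})"
print(round(aesthetic_score(poem), 3))  # 0.875
```

Generate a few hundred poems, score each, and keep the top handful: even a crude scorer like this would surface candidates for a human to curate.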

Of course, I promised visualization, and what I’ve actually done is produce an eye-watering markup that is annoying to read. I have more thoughts about how to convert this into something pleasurable to look at, but those will have to wait for another post. Still, we’ve done step one: figuring out what data to extract in the first place, and verifying that it tells us something about whether the generated content is hitting my goals for it.
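As a sketch of that extraction step, here is a hypothetical parser that walks the parenthesized markup and flattens it into rows a later visualization could consume:

```python
import re

# Match an opening paren, a closing paren, or an {a/b/c} annotation.
TOKEN = re.compile(r"\(|\)|\{(\d+)/(\d+)/(\d+)\}")

def extract_rows(annotated):
    """Emit one (depth, matched, eligible, total) row per annotation,
    where depth is the nesting level of the enclosing parentheses.
    (Hypothetical parser for the markup shown above.)"""
    depth, rows = 0, []
    for m in TOKEN.finditer(annotated):
        tok = m.group(0)
        if tok == "(":
            depth += 1
        elif tok == ")":
            depth -= 1
        else:
            rows.append((depth, int(m.group(1)), int(m.group(2)), int(m.group(3))))
    return rows

sample = "((Tuna {0/1/5}) Divinity {1/4/6})"
print(extract_rows(sample))
# [(2, 0, 1, 5), (1, 1, 4, 6)]
```

Depth stands in for the "size of component phrases" question: deeply nested annotations mark small atoms composited from many layers, which is exactly the sort of column a chart could be drawn from.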

Elsewhere: VisLing has some neat visualizations of language and linguistics phenomena, though none that seem to overlap what we’re doing here.