For the past week, I’ve been running a port of the Wavenet algorithm to generate poems. A reasonable training result emerges in about 24 hours, — a trained model that can generate immense amounts of text relatively quickly. On a laptop. (Code:github). By reasonable I mean the poems do not have any real sense, no sentient self, no coherent narrative, nor epic structure. But they do have cadence, they do not repeat, new words are plausible, and they have adopted a scattered open line style characteristic of the late twentieth century corpus on which they were trained. Much more lucid than Schwitters’ Ursonate, output is reminiscent of Beckett’s Not I : ranting incandescent perpetual voice.

Results

Remember, these are evolutionary amoebas, toddlers just learning to babble. The amazing thing is that without being given any syntax rules, they are speaking, generating a kind of prototypical glossolalia poem, character by character. Note: models are like wines, idiosyncratic reservoirs, the output of each has a distinct taste, — some have mastered open lines, others mutter densely, many mangle words to make neologisms — each has obsessions. The Wavenet algorithm is analogous to a winery: its processes ensure that all of the models are similar. Tensorflow is the local region; recursive neural nets form the ecosystem. The corpus is the grapes.

Edited micro poems

Code

Python source code + a few trained models, corpus and some sample txt: on github which will be updated with new samples and code as it emerges.

Details

The Model number refers to how many steps it trained for. Skip channels weave material from different contexts. On this corpus, larger skip channels produce more coherent output. Dilations refer to the size of the tensors of the encoder-decoder: eg. [ 1, 2, 4, 8, 16, 32, 64, 128, 256, etc… ] Higher values up to 1024 seem to be of benefit, but take longer to train. Loss is the mathematical calculation of the distance between the goal and the model; it is a measure of how tightly the model fits the topological shape of the corpus; as models are trained, they are supposed to learn to minimize loss; low loss is supposed to be good. For artistic purposes this is questionable (I describe why in see next section). For best results, in general, on this corpus: 10k to 50k steps, 1024 dilations, a skip channel of 512 or more, and (most crucial) loss less than 0.6.

Learned

Loss is not everything. An early iteration model with low loss will generate cruft with immense spelling errors. Thousands of runs later, a model with the same loss value will usually produce more sophisticated variations, less errors. So there is more going on inside the system than is captured by the simple metric of loss optimization. Moreover if the system is about to undergo a catastrophic blowout of loss values, during which the loss ceases to descend toward the gradient and exponentially oscillates (this occasionally occurs after approx 60k steps). Generated text from poems just before that (with good loss values below 1.0 or even excellent loss values below 0.6) will produce some ok stuff interspersed with long periods of nonsense or —— repeated **** symbols. These repetitive stretches are symptoms of the imminent collapse. So loss is not everything.Nonsense can be a muse. Mutating small elements, editing, flowing, falling across the suggestive force of words in raw tumult provides a viable medium for finding voice. Continue reading →

Many of these words are new and never-seen-before. It’s like having Kurt Schwitters on tap.

mengaporal concents
typhinal voivat

the dusial prespirals
of Inimated dootion

Here is my bill for over 11 hours of GPU cluster time:

Neural nets learn through exposure to words just like babies. The more they hear, the better they get. For preliminary testing of the code, a 2-layer 256-cell LSTM neural net was trained on a source text of moderate size: a draft of my book, in a 430kb text file format. So it’s as if a baby was exposed to only critical literary theory, no kids books, no conversation, just theory.

The results can be very very strange:

in a way of stumalized scabes occurs forms. and then paradigm for some control implicit and surjace of need)

And they evolve over time from simplicity:

and conceptual poetry and conceptual poetry in the same that the same that the same that the same that has been are conceptual poetry and conceptual process of the static entity and conceptual sourd subjects that in the same of the same that digital poetry in the static and conceptual sourd source of the static entity and conceptual sourd source of the static entity and conceptual poetry in the station of the static environment for the station of a section of a section of a section of a section of a section of a section of a section of a section of a section of a section of a section of a section of a section of a section of a

… to complexity

u archetypht]mopapoph wrud mouses occurbess, the disavil centory that inturainment ephios into the reputiting the sinctions or this suinncage encour. That language, Y. untiletterforms, bear. and matter a nalasist relues words. this remagming in nearogra`mer struce. (, things that digital entibles offorms to converaction difficued harknors complex the sprict but the use of procomemically mediate the cunture from succohs at eyerned that is cason, other continuity. As a discubating elanted intulication action, these tisting as sourdage. Fore?is pobegria, prighuint, take sculptural digital cogial into computers to Audiomraphic ergeption in the hybping the language. /Ay it bodies to between as if your this may evorv: and or all was be as unityle/disity 0poeliar stance that shy. in from this ke

It is important to recognize, the machine is generating words character by character, — it has never seen a dictionary; it is not given any knowledge of grammar rules; it is not preloaded with words or spelling. This is not random replace or shuffling, it is a recursive groping toward sense, articulate math, cyber-toddler lit. And it is based on a 9 hour exposure to a single 200 page book with an intellectual idiom. More complex nets based on vast corpuses trained over weeks will surely produce results that are astonishing.

Yet I somehow feel that there is a limit to this (LSTM) architecture’s capacity to replicate thought. Accuracy is measured through a loss function that began at 2.87 and over 46 runs (known as epochs) loss descended to 0.2541. A kind of sinuous sense emerges from the babble but it is like listening to an impersonation of thought done in an imaginary language by an echolaliac. Coherency is scarce.

E-lit Celebrity Shout-Out

Of course as the machine begins to learn words (character by character assembling sense from matrices of data), familiar names (of theorists or poets cited in my book) crop up among the incoherence.

in Iteration 15 at diversity level 0.2:

of the material and and the strickless that in the specificient of the proposed in a strickles and subjective to the Strickland of the text to the proposing the proposed that seem of the stricklingers and seence and the poetry is the strickland to a text and the term the consider to the stricklinger

mules and the technology, the created and the tractions and the tractional artically of the traction of the tractical processe of the prectional and and and structured the entional the eractions of the the tractions and the tractions of the termore the the creative of the ention of the metach of the interallention of the termore and the entions of the created the tractions and structured the media and contempler the tractically, and in the eranted the tractions of the metach of the contempler in

The network knows nothing of language.

Sense emerges from feedback.

Using Theano backend.
Source text on which the neural net is trained: MITPressDRAFT

Technical process: the following poems were produced using a 10,000+ corpus of poems used as templates. Each poem has been sent to Alchemy API to produce entity-recognition, POS, and sentiment reports. That analysis influences replacement algorithms. Replacement uses NLTK synsets and Pattern.en and a reservoir of words found in the corpus that do not have synonyms.

In short, 10000 poems are
transformed by algorithms
into millions of poems
at an extremely rapid rate.

The reader must then
find a way to convert this
spew into spoken word.

Once again: there is much magic in the math. The era of numeration discloses a field of stippled language. Songlines, meridians, tectonics, the soft shelled crab, a manta ray, a flock of starlings.

In the image below, each dot is a poem. It’s position is calculated based on an algorithm called t-SNE (Distributed Stochastic Neighbour Embedding)

The image above is beautiful, but it’s impossible to know what is actually going on. So i built a interactive version (it’s a bit slow, but, functions…) where rollover of a dot reveal all the poems by that author.

Screengrabs (below) of the patterns suggest that poets do have characteristic forms discernible by algorithms. Position is far from random; note, the algorithm did not know the author of any of the poems; the algorithm was fed the poems; this is the equivalent of blind-taste-testing.

Screen Shot 2014-08-24 at 3.33.31 pm

Screen Shot 2014-08-24 at 3.30.48 pm

Screen Shot 2014-08-24 at 3.32.46 pm

Screen Shot 2014-08-24 at 3.33.07 pm

Screen Shot 2014-08-24 at 3.33.31 pm

Screen Shot 2014-08-24 at 3.33.57 pm

Screen Shot 2014-08-24 at 3.34.16 pm

Screen Shot 2014-08-24 at 3.34.30 pm

Screen Shot 2014-08-24 at 3.34.54 pm

Screen Shot 2014-08-24 at 3.35.23 pm

Screen Shot 2014-08-24 at 3.35.33 pm

Screen Shot 2014-08-24 at 3.36.40 pm

Screen Shot 2014-08-24 at 3.37.02 pm

Screen Shot 2014-08-24 at 3.52.44 pm

Screen Shot 2014-08-24 at 3.53.37 pm

Screen Shot 2014-08-24 at 3.53.55 pm

Screen Shot 2014-08-24 at 3.53.27 pm

Still these images don’t tell us much about the poems themselves, except that they exist in communities. That the core of poetry is a spine. That some poets migrate across styles, while others define themselve by a style. The real insights will emerge as algorithms like t-SNE are applied to larger corpus, and allow nuanced investigation of the features extracted: on what criteria exactly did the probabilities grow? What are the 2 or 3 core dimensions?

What is t-SNE

My very basic non-math-poet comprehension of how it works: t-SNE performs dimensionality reduction: it reduces the numbers of parameters considered. Dimensionality reduction is useful when visualizing data; think about graphing 20 different parameters (dimensions). Another technique that does this is PCA: principal component analysis. Dimensionality reduction is in a sense a distillation process, it simplifies. In this case, it converts ‘pairwise similarities’ between poems into probability distributions. Then it decreases the ‘entropy’ using a process of gradient descent to minimize the (mysterious) Kullback-Leibler divergence.

To know more about the Python version of t-SNE bundled into sklearn, read Alexander Fabisch

One of the few parameters I bothered tweaking over numerous runs is appropriately named) perplexity. In the FAQ, LJP van der Maaten (who created t-SNE) wrote:

What is perplexity anyway?

Perplexity is a measure for information that is defined as 2 to the power of the Shannon entropy. The perplexity of a fair die with k sides is equal to k. In t-SNE, the perplexity may be viewed as a knob that sets the number of effective nearest neighbors. It is comparable with the number of nearest neighbors-k that is employed in many manifold learners.

Using Python (Anaconda), NLTK, WordNet, Alchemy, pattern.en, and pyenchant
to analyze and perform word replacement
on a corpus of 10,119 poems scraped from the PoetryFoundation
and generate 7,769 poems in approx. 2 hours and 30 minutes.

This is a real-time hour-long screen-grab output
of the trace window in SublimeText
as the poetry-gen program runs.

Markov chains are one of the traditional tricks in the NLP playbook. They are the apple pie-chart of text-generation.

Basic process: given a source text, find words that are neighbours, if you know the neighbours of a word, you can form a chain if you wish. [(“you”),(“know”,”can”,”wish”)] and reconstruct a text which contains pairs (bigrams) from the source.

So I did that using as source texts: Charles Bernstein Dark City and Rough Trades. (Found on Bernstein’s EPC author page).

The result is an example of what Charles Hartman might refer to as newbie-augmented-cyborg-poet (dead simple technically, but satisfying artistically since the volume of generated texts from which new verses can be hand-crafted is massive). This sort of auto-suggest based-upon-a-corpus technique radically shifts the dimensions of creativity: in the first ‘modified’ example I edited the output, adding words, disguising some obvious quotations from Bernstein, truncating verses, changing lines, modulating rhythms. In the raw output below, it’s just the computer (fueled by Bernstein’s berning phrases), it could go on infinitely given a large enough corpus.

Poetry is both the easiest and the hardest to generate. Since non-linear deflections and word-riffs are an aspect of contemporary poetry, slamming together ripe fertile conjunctions is easy. Migrating toward a sensitive, complex, experiential and contextual lived poetry is the real challenge (I didn’t even begin to touch it here).