Clojure with Carin Meier

Updated: April 7, 2015

We are going to explore Clojure by creating a fun project together. In particular, we will create a Twitter bot that generates its text from a mashup of Edward Lear’s poetry and a goodly selection of functional programming text taken from Wikipedia.

Why Edward Lear and Functional Programming? First, because I really enjoy his poetry. I fondly remember reading his poetry to my children. Some of my favorite poems are The Pobble Who Has No Toes, The Quangle Wangle’s Hat, and The Jumblies. The whimsical nature of his poetry, like that of his contemporary Lewis Carroll, has great appeal to me. It is only natural that I should want to combine it with my other love, functional programming. In fact, I feel that some of the terms in functional programming, like monad and functor, could fit right in with Edward Lear’s Nonsense Songs. This humble bot aims to unite the spheres of functional programming and nonsense poetry.

This tutorial will start by setting up a basic Clojure project and editor. Then, we’ll build up our tweet generator with a Markov Chain. Finally, we will deploy our code to Heroku and hook it up to a Twitter account, where it will live and tweet all on its own.

Since this walkthrough is geared to explain how I work in particular, we will start with my essential ingredient for any coding project … tea. I brew myself a cup of PG Tips tea with a splash of milk, then I sit down and fire up my trusty editor Emacs.

Emacs is a lifestyle

Emacs is more than an editor, it is a lifestyle. I also admit that the learning curve is steep. I actually only know about 4% of Emacs. This is completely normal given that the learning curve for the editor looks like a squiggly curlicue.

Nevertheless, once I started using Emacs for Clojure and experienced the interactive nature of the code and the REPL (Read Eval Print Loop), I was hooked. I use a customized version of Emacs Starter Kit. I also find the Solarized Color-scheme a must for my eyes. For Clojure code, I use Cider for Emacs, which gives me the incredible interactive code experience that I was mentioning. If you are looking to try out Emacs, I would recommend getting the starter kit and grabbing a good tutorial like this one.

Now that we have our tea and Emacs editor open, it is time to actually get our Clojure project created. For this I use Leiningen.

Getting the basic project setup

Leiningen helps you create, manage, and automate your Clojure project. If you don’t already have Leiningen installed, follow the install instructions and download it. We are going to call our project markov-elear, so to create a project we just type the lein new command at our prompt:

lein new markov-elear

This will create a basic project skeleton for us to work with. Next, cd into the directory.

cd markov-elear

The default src file that it creates is src/markov_elear/core.clj. This is the first thing to change. We want a more meaningful file name. For our purposes, let’s rename it to src/markov_elear/generator.clj.

mv src/markov_elear/core.clj src/markov_elear/generator.clj

There is also a skeleton test file that is created in test/markov_elear/core_test.clj. We will want to do the same thing to it as well, renaming it to test/markov_elear/generator_test.clj.

Fantastic. Our project is all set up. We are ready to jack-in with Emacs and Cider and start coding.

Cider Jack In and Experiment

Here is where we start to use the interactive nature of Clojure and Emacs in earnest. With the generator.clj file open in Emacs, type M-x cider-jack-in. This will start an nREPL server for our project, so we can actively start to experiment with our code. This early stage is a bit like playing with putty before sculpting. It allows us to quickly try out different approaches and get a feel for data constructs to use. For example, first put your cursor after the namespace form and hit C-x C-e to evaluate the form.
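For a first experiment, any small expression typed below the namespace form will do. The form here is only an assumed example, chosen because it evaluates to 2.

```clojure
;; Assumed example form for a first REPL experiment; it evaluates to 2.
(+ 1 1)
```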

At this point, you can put your cursor at the end of the form and again hit C-x C-e. You will see the result 2 appear in the mini-buffer at the bottom of the screen.

Now, we are ready to experiment with Markov Chains. The first thing we need is some small example to play with. Consider the following text.

"And the Golden Grouse And the Pobble who"

To construct a Markov Chain, we need to transform this text into a chain of prefixes and suffixes. In Markov chains, the length of the prefix can vary. The larger the prefix, the more predictable the text becomes, while the smaller the prefix size, the more random. In this case, we are going to use a prefix size of 2. We want to break up the original text into chunks of two words. The suffix is the next word that comes after. For our example text, that gives the following table.

Prefix          Suffix
And the         Golden
the Golden      Grouse
Golden Grouse   And
Grouse And      the
And the         Pobble
the Pobble      who
Pobble who      nil

This table becomes a guide for us in walking the chain to generate new text. If we start at a random place in the table, we can generate some text by following some simple rules.

Choose a prefix to start. Your result string starts as this prefix.

Take the suffix that goes with the prefix. Add the suffix to your result string. Then combine the last word of the prefix with the suffix; this is your new prefix.

Look up your new prefix in the table and continue until there is no suffix.

The result string is your generated text.

From our table, let’s start with the prefix the Pobble.

Our starting prefix is the Pobble. Our result string will be initialized to it.

Look up the prefix in the table. The suffix that goes with it is who. Add the suffix to the result string. The new prefix is the last word from the prefix and the suffix. So the new prefix is Pobble who.

Look up the new prefix in the table. The suffix is nil. This means we have reached the end of the chain. Our resulting text is the Pobble who.

Things get interesting when there is more than one entry for a prefix. Notice that And the is in the table twice. This means that there is a choice of what entry to use and what suffix. We can randomly choose which one to use in our Markov Chain walk. As a result, our text will be randomly generated. If we start with the prefix And the, we have different possibilities for the resulting text. It could be

And the Pobble who

And the Golden Grouse And the Pobble who

And the Golden Grouse And the Golden Grouse And the Pobble who

And the Golden Grouse And the Golden Grouse And the Golden Grouse And the Pobble who

etc…

Since we could get into repeating chains, we should also put a terminating condition of the total length of our resulting text as well.

Now that we know the general idea of what we want to do, let’s start small and start experimenting.

Baby steps

First, let’s take our example text and put it into code to play with in the REPL.
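A minimal sketch of this first experiment, assuming we split on single spaces and slide a three-word window over the result with partition-all. The names example, words and word-transitions are illustrative choices.

```clojure
(require '[clojure.string :as str])

(def example "And the Golden Grouse And the Pobble who")

;; Split the text into individual words.
(def words (str/split example #" "))
;; => ["And" "the" "Golden" "Grouse" "And" "the" "Pobble" "who"]

;; Slide a three-word window over the words, advancing one word at a time.
;; partition-all keeps the shorter windows at the end of the text.
(def word-transitions (partition-all 3 1 words))
;; => (("And" "the" "Golden") ("the" "Golden" "Grouse") ("Golden" "Grouse" "And")
;;     ("Grouse" "And" "the") ("And" "the" "Pobble") ("the" "Pobble" "who")
;;     ("Pobble" "who") ("who"))
```

Each window holds a two-word prefix followed by its suffix, which is why the windows at the end of the text come up short.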

This is nice, but we really need to get it into a word-chain format. Ideally it would be a map with the prefixes as the keys and a set of suffixes to choose from as the values. The prefix And the would then map to a set containing both Golden and Pobble.
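As a sketch of that target shape, and of one way to reach it, merge-with together with clojure.set/union can fold several single-suffix maps into one entry per prefix. The names target and merged are only for illustration.

```clojure
(require '[clojure.set :as set])

;; Target shape: a prefix vector mapped to the set of possible suffixes.
(def target {["And" "the"] #{"Golden" "Pobble"}})

;; merge-with combines the values of duplicate keys with the given function,
;; so two single-suffix maps for the same prefix collapse into one entry.
(def merged
  (merge-with set/union
              {["And" "the"] #{"Golden"}}
              {["And" "the"] #{"Pobble"}}))
;; merged is equal to target
```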

Tangible turn to tests

We have been experimenting in the REPL, but now that we have a feel for where we are going, it is time to write some tests. I really like to use the lein-test-refresh plugin. It will continually rerun the tests whenever we change something in our files. I find the feedback loop is much faster than running lein test alone. It also takes care of reloading all the namespaces for you, so I don’t run into problems where my REPL environment gets out of sync with my code. To add it to your project, simply add the following to your project.clj file.
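A sketch of the project.clj entry, assuming the plugin is added under the :dev profile. The version number is an assumption, so check the plugin's README for the current release.

```clojure
;; project.clj (fragment). The version number is an assumption.
:profiles {:dev {:plugins [[com.jakemccrary/lein-test-refresh "0.7.0"]]}}
```

With the plugin in place, lein test-refresh in a terminal window keeps the tests running as files change.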

As you save the file, you will notice the test failing in your lein test-refresh window. This is because we haven’t written the word-chain function yet. After all of our experimentation, we know exactly what we need to do. Add the following function to your generator.clj file.
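One possible shape for word-chain, assuming it receives the three-word windows produced by partition-all and uses merge-with to collect the suffixes for each prefix. This is a sketch under those assumptions.

```clojure
(require '[clojure.set :as set])

(defn word-chain
  "Turns a sequence of [word word next-word] windows into a map of
  two-word prefixes to the set of words that can follow them."
  [word-transitions]
  (reduce (fn [chain [a b c]]
            ;; A window with no third word contributes an empty suffix set.
            (merge-with set/union chain {[a b] (if c #{c} #{})}))
          {}
          word-transitions))

(word-chain '(("And" "the" "Golden")
              ("And" "the" "Pobble")
              ("the" "Pobble" "who")
              ("Pobble" "who")))
;; => {["And" "the"] #{"Golden" "Pobble"}
;;     ["the" "Pobble"] #{"who"}
;;     ["Pobble" "who"] #{}}
```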

What about generating the word chain from a string of text? When we were experimenting in the REPL, we saw that using partition-all was going to be useful. Let’s add a test for that now in generator_test.clj. We want to parse an input string that has spaces or new lines.
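A sketch of a text->word-chain function along these lines, assuming the text is split on any run of whitespace. The chain-building reduce is inlined here so the snippet runs on its own; in the project it could simply call word-chain.

```clojure
(require '[clojure.set :as set]
         '[clojure.string :as str])

(defn text->word-chain
  "Splits text on whitespace (spaces or newlines) and builds the word chain."
  [text]
  (let [words (str/split text #"\s+")
        word-transitions (partition-all 3 1 words)]
    (reduce (fn [chain [a b c]]
              (merge-with set/union chain {[a b] (if c #{c} #{})}))
            {}
            word-transitions)))

(text->word-chain "And the Golden Grouse\nAnd the Pobble who")
;; => {["And" "the"] #{"Golden" "Pobble"}
;;     ["the" "Golden"] #{"Grouse"}
;;     ["Golden" "Grouse"] #{"And"}
;;     ["Grouse" "And"] #{"the"}
;;     ["the" "Pobble"] #{"who"}
;;     ["Pobble" "who"] #{}
;;     ["who" nil] #{}}
```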

The walk-chain function takes the prefix and gets the suffixes associated with it. If there are no suffixes, it terminates and returns the result. Otherwise, it uses shuffle to pick a random suffix. Then it constructs the new prefix from the last part of the current prefix and the suffix. Finally, it recurs into the function using the new-prefix and adding the suffix to the result.

We have another passing test, but we still need to consider the other case, where walking the chain involves a choice. Go ahead and add a test for that now too.
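The walk described above can be sketched as follows. The argument order (prefix, chain, result) is an assumption, and this version has no length limit yet, so it only stops when a prefix has no suffixes.

```clojure
(defn walk-chain
  "Follows the chain from prefix, accumulating words in result.
  Unbounded: it only stops when a prefix has no suffixes."
  [prefix chain result]
  (let [suffixes (get chain prefix)]
    (if (empty? suffixes)
      result
      (let [suffix (first (shuffle suffixes))   ; pick a random suffix
            new-prefix [(last prefix) suffix]]
        (recur new-prefix chain (conj result suffix))))))

(def chain {["the" "Pobble"] #{"who"}
            ["Pobble" "who"] #{}})

(walk-chain ["the" "Pobble"] chain ["the" "Pobble"])
;; => ["the" "Pobble" "who"]
```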

Because we have randomness to deal with, we can use with-redefs to redefine shuffle to always return the original collection for us. We also need to deal with repeating chains. We will have to give it another termination condition, like a word or character length for termination. Since our bot is destined for Twitter, a 140-character limit seems reasonable.

Note: The test will actually run forever since it is stuck in an endless loop. You will have to restart your test-refresh session after you implement the solution.

Adjusting our generator.clj, we first need a helper function that will turn our result chain into a string with spaces, so that we can count the chars and make sure that they are under the limit. We will call it chain->text.
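A minimal sketch of such a helper, joining the words of the result chain with single spaces:

```clojure
(defn chain->text
  "Joins the words of a result chain into one space-separated string."
  [chain]
  (apply str (interpose " " chain)))

(chain->text ["And" "the" "Pobble" "who"])
;; => "And the Pobble who"
```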

We check the result-char-count and the chosen suffix-char-count before we recur, so that we can ensure that it doesn’t go over 140 chars. If it is going to go over the limit, we return the result and do not recur.
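A sketch of the length check, assuming the limit is applied to the space-joined text and that the joining space is counted with each suffix. chain->text is repeated so the snippet runs on its own, and looping-chain is an illustrative chain that would never terminate without the limit.

```clojure
(defn chain->text [chain] (apply str (interpose " " chain)))

(defn walk-chain
  "Walks the chain like before, but stops before the text reaches 140 chars."
  [prefix chain result]
  (let [suffixes (get chain prefix)]
    (if (empty? suffixes)
      result
      (let [suffix (first (shuffle suffixes))
            new-prefix [(last prefix) suffix]
            result-char-count (count (chain->text result))
            suffix-char-count (inc (count suffix))   ; +1 for the joining space
            new-result-char-count (+ result-char-count suffix-char-count)]
        (if (>= new-result-char-count 140)
          result
          (recur new-prefix chain (conj result suffix)))))))

;; A chain that loops forever without the limit.
(def looping-chain {["And" "the"] #{"Golden"}
                    ["the" "Golden"] #{"Grouse"}
                    ["Golden" "Grouse"] #{"And"}
                    ["Grouse" "And"] #{"the"}})

(def walked (walk-chain ["And" "the"] looping-chain ["And" "the"]))

(count (chain->text walked))   ; always below 140
```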

What we need now is another higher level function that, when given a prefix and a word chain, will return the resulting text.

Taking A Start Text Phrase, Walking the Chain, and Returning Text.

Going back to the generator_test.clj file, let’s go ahead and write the test. We will use with-redefs again to control our randomness.

To make the test pass in our generator.clj file, we create the function that will take a start-phrase as a prefix and a word chain. Then it will split the start-phrase by spaces, so that it will match up to our prefix keys. Next, it will use walk-chain to get the resulting text chain. Finally, it will turn the result text chain into plain text with chain->text.
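The steps above can be sketched as a generate-text function. The function name and argument order are assumptions, and the helpers are repeated in compact form so the snippet runs on its own.

```clojure
(require '[clojure.string :as str])

;; Helpers repeated from the earlier steps so the snippet runs on its own.
(defn chain->text [chain] (str/join " " chain))

(defn walk-chain [prefix chain result]
  (let [suffixes (get chain prefix)]
    (if (empty? suffixes)
      result
      (let [suffix (first (shuffle suffixes))
            new-count (+ (count (chain->text result)) 1 (count suffix))]
        (if (>= new-count 140)
          result
          (recur [(last prefix) suffix] chain (conj result suffix)))))))

(defn generate-text
  "Splits start-phrase into a prefix, walks the chain, and returns plain text."
  [start-phrase word-chain]
  (let [prefix (str/split start-phrase #" ")
        result-chain (walk-chain prefix word-chain prefix)]
    (chain->text result-chain)))

(def chain {["And" "the"] #{"Pobble"}
            ["the" "Pobble"] #{"who"}
            ["Pobble" "who"] #{}})

;; with-redefs pins shuffle so that the walk is deterministic in a test.
(with-redefs [shuffle (fn [coll] (vec coll))]
  (generate-text "And the" chain))
;; => "And the Pobble who"
```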

We can take an input phrase and word chain and generate some new text by taking a random walk in the chain.

What we are missing is a way to train our bot, by reading in some files of text and building out the chain that it will walk.

Training the bot by reading input files

To train our bot, we need to be able to give it a text file and have it turn it into a word chain. Our first text selection will be from The Quangle Wangle’s Hat.

Making it easier on ourselves, we will do some slight formatting of the text. Save it in a file called resources/quangle-wangle.txt.

On the top of the Crumpetty Tree
The Quangle Wangle sat,
But his face you could not see,
On account of his Beaver Hat.
For his Hat was a hundred and two feet wide,
With ribbons and bibbons on every side,
And bells, and buttons, and loops, and lace,
So that nobody ever could see the face
Of the Quangle Wangle Quee.
The Quangle Wangle said
To himself on the Crumpetty Tree,
"Jam, and jelly, and bread
Are the best of food for me!
But the longer I live on this Crumpetty Tree
The plainer than ever it seems to me
That very few people come this way
And that life on the whole is far from gay!"
Said the Quangle Wangle Quee.
But there came to the Crumpetty Tree
Mr. and Mrs. Canary;
And they said, "Did ever you see
Any spot so charmingly airy?
May we build a nest on your lovely Hat?
Mr. Quangle Wangle, grant us that!
O please let us come and build a nest
Of whatever material suits you best,
Mr. Quangle Wangle Quee!"
And besides, to the Crumpetty Tree
Came the Stork, the Duck, and the Owl;
The Snail and the Bumble-Bee,
The Frog and the Fimble Fowl
(The Fimble Fowl, with a Corkscrew leg);
And all of them said, "We humbly beg
We may build our homes on your lovely Hat,--
Mr. Quangle Wangle, grant us that!
Mr. Quangle Wangle Quee!"
And the Golden Grouse came there,
And the Pobble who has no toes,
And the small Olympian bear,
And the Dong with a luminous nose.
And the Blue Baboon who played the flute,
And the Orient Calf from the Land of Tute,
And the Attery Squash, and the Bisky Bat,--
All came and built on the lovely Hat
Of the Quangle Wangle Quee.
And the Quangle Wangle said
To himself on the Crumpetty Tree,
"When all these creatures move
What a wonderful noise there'll be!"
And at night by the light of the Mulberry moon
They danced to the Flute of the Blue Baboon,
On the broad green leaves of the Crumpetty Tree,
And all were as happy as happy could be,
With the Quangle Wangle Quee.

We can now use clojure.java.io/resource to open the file and slurp to turn it into a string. From there, we can simply use our text->word-chain function to transform it into the word chain that we need. Add the process-file function to the generator.clj file and give it a try in the REPL.
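A sketch of process-file, assuming the text file sits in the resources directory so that it is on the classpath. text->word-chain is repeated in compact form so the snippet runs on its own.

```clojure
(require '[clojure.java.io :as io]
         '[clojure.set :as set]
         '[clojure.string :as str])

(defn text->word-chain [text]
  (reduce (fn [chain [a b c]]
            (merge-with set/union chain {[a b] (if c #{c} #{})}))
          {}
          (partition-all 3 1 (str/split text #"\s+"))))

(defn process-file
  "Reads a file from the classpath and turns its text into a word chain."
  [fname]
  (text->word-chain (slurp (io/resource fname))))

(comment
  ;; Try this in the REPL once resources/quangle-wangle.txt exists.
  (process-file "quangle-wangle.txt"))
```

The call is wrapped in a comment form because it only works when the file is present on the classpath.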

Also, I want to fix a bit of the punctuation of the generated text. In particular, I want to trim the text to the last punctuation in the text. Then, if it ends in a comma, I want to replace it with a period. If there is no punctuation, I want to drop the last word and add a period. I also want to clean up any quotes that get escaped in the text.
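One way these rules might be sketched is shown below. The function name end-at-last-punctuation and the regular expressions are assumptions, and replacing double quotes with single quotes is just one possible reading of the quote cleanup.

```clojure
(require '[clojure.string :as str])

(defn end-at-last-punctuation
  "Trims text to its last punctuation mark and makes it end cleanly."
  [text]
  (let [;; Greedy match: everything up to the last . ! ? or comma.
        to-punct (re-find #"(?s).*[.!?,]" text)
        ;; No punctuation at all: drop the last word instead.
        base (or to-punct (str/replace text #"\s*\S+\s*$" ""))
        ;; A trailing comma becomes a period; bare text gets a period added.
        ended (if (re-find #"[.!?,]$" base)
                (str/replace base #",$" ".")
                (str base "."))]
    ;; Assumed cleanup: swap double quotes for single quotes.
    (str/replace ended "\"" "'")))

(end-at-last-punctuation "In a pea-green boat, they took some honey")
;; => "In a pea-green boat."

(end-at-last-punctuation "The Pobble who has no toes")
;; => "The Pobble who has no."
```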

We now have a function that will generate tweets for us. The next step is to hook it up to a Twitter account so that we can share our smiles with the world.

Hooking the bot up to Twitter

To hook up our bot to Twitter, you need to create a Twitter account. Once you do that, you need to do the following:

Go to https://apps.twitter.com/ to create a new Twitter application. You will want to set the permission so that it can post to the Twitter account. This will give you a Consumer Key (API Key) and a Consumer Secret (API Secret).

Go to the Keys and Access Tokens section of the application. On the bottom half there is a button that says Create my access token. Click it. It will generate two more key pieces of information for you: Access Token and Access Token Secret.

Please note that these settings are sensitive and should not be checked into GitHub or shared publicly. To access our Twitter account, we are going to need the help of two libraries. The first is twitter-api, which will help us make our API calls. The second is environ, which will help us keep our login information safe.

The environ plugin allows us to pass configuration information from environment settings or a profiles.clj file that can be ignored and not checked in. Let’s go ahead and add a profiles.clj file to the root of our project and put in all our twitter account info.
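A sketch of what such a profiles.clj could look like, assuming the lein-environ plugin is enabled so that the :env map is picked up. The key names are assumptions chosen for illustration and the values are placeholders. In code, each setting can then be read with environ's env function, for example (env :app-consumer-key).

```clojure
;; profiles.clj (keep this file out of version control).
;; The key names are assumptions; the values are placeholders.
{:dev {:env {:app-consumer-key         "your-consumer-key"
             :app-consumer-secret      "your-consumer-secret"
             :user-access-token        "your-access-token"
             :user-access-token-secret "your-access-token-secret"}}}
```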

Hooray! We are almost there. We next need a way to run this status update on a periodic basis, having it post automatically for us.

Automating our tweets

To have this run from the command line in an automated fashion, we are going to do two things. The first is to use the Overtone at-at library for scheduling. The second is to add a main function to the generator.clj file and to set up the project so that it can run with lein trampoline run.

So first, modify the project.clj file to have the at-at library, as well as the main function for the namespace.
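A sketch of the relevant project.clj entries. The version numbers are assumptions, and the dependencies the project already lists stay in place alongside at-at.

```clojure
;; project.clj (sketch). Version numbers are assumptions; keep the
;; dependencies the project already lists and add at-at alongside them.
(defproject markov-elear "0.1.0-SNAPSHOT"
  :dependencies [[org.clojure/clojure "1.6.0"]
                 [overtone/at-at "1.2.0"]]
  ;; Namespace whose -main function lein run should call.
  :main markov-elear.generator)
```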

Then, going back to the generator.clj file, first add the overtone/at-at library to the namespace. Then, define a pool for the scheduling process, and add in a -main function to tweet for us every 8 hours.
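A sketch of the scheduling part, assuming the overtone/at-at dependency is on the classpath. post-tweet! is a hypothetical stand-in for whatever function generates and posts the tweet. In the project, the require would live in the ns form of generator.clj.

```clojure
(require '[overtone.at-at :as at])

;; Hypothetical stand-in for the function that generates and posts a tweet.
(defn post-tweet! []
  (println "tweeting..."))

;; at-at periods are given in milliseconds.
(def eight-hours-ms (* 1000 60 60 8))

;; Thread pool that at-at uses to run the scheduled function.
(def my-pool (at/mk-pool))

(defn -main [& args]
  (println "Started up")
  ;; Call post-tweet! repeatedly, once per eight-hour period.
  (at/every eight-hours-ms post-tweet! my-pool))
```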

Started up
Are the best of food for me!
generated tweet is : With only a beautiful pea-green veil Tied with a flumpy sound.
char count is: 62
{... :text With only a beautiful pea-green veil Tied with a flumpy sound.}

At this point our program is complete. We could happily leave it running locally. It is much better, though, to deploy it somewhere. http://heroku.com/ is a fantastic place for this. It provides free hosting and has nice Clojure support.

Deploying to Heroku

The first thing you will need to do is to create an account on Heroku. It is free of charge. You can create your login at https://signup.heroku.com/dc.

Once you have downloaded and installed the Heroku command line tool, you will need to configure it with your username and password. You can do this at the command line by typing heroku login. You will be prompted for your email and password.

-> heroku login
Enter your Heroku credentials.
Email:
Password:

Now you are all set to configure your project.

If you haven’t initialized it yet as a git repo, do so with

git init

After that, we need to tell Heroku how to start up our app. We do this with a Procfile in the main project directory. Go ahead and add the file with the following contents.

worker: lein trampoline run

This will tell Heroku to run our program as a background worker (rather than a web app) and start it up with lein trampoline run.

The next step is to create an app on Heroku for it. This will get Heroku ready to receive your code for deployment. Type heroku create into your command prompt at the root of the project. Heroku will print the name and URL of the newly created app.

It created a random application name for you (which you can rename later through the console). It also added a repository called heroku to our git config. Once we push our code here, it will automatically deploy.

You will also need to set up your Twitter credentials on the Heroku account so it will be able to talk to Twitter. You can do this with heroku config.

Run heroku config:set on the command line once for each of our four settings: the consumer key, the consumer secret, the access token, and the access token secret.

Carin Meier

Clojure

Author of Living Clojure, Engineer, Entrepreneur

Carin studied physics in college, and ended up as a software developer. Her passions led her to developing home automation and control libraries for drones. She helps lead the Cincinnati Functional Programmers and is a frequent conference speaker, keynoting OSCON and Strange Loop. To top it off, Carin is the author of the upcoming book Living Clojure from O'Reilly.