Making a Markov Chain Twitter Bot in Python

The study of Markov Chains is an interesting topic that has many applications. Such techniques can be used to model the progression of diseases, the weather, or even board games. We are going to introduce and motivate the concept mathematically, and then build a “Markov bot” for Twitter in Python. The resulting bot is available on GitHub.

Introduction to Markov Chains

A Markov Chain is used to model events whose outcomes depend only on the current state. That is to say, what the system does next depends only on where it currently is. In many ways, a Markov Chain can describe how I play most strategic games… Anyways, if our state at time $t$ is represented by $X_t$, then this is the same as writing
\begin{equation}
P(X_t = x | \text{Where we are + where we’ve been}) = P(X_t = x | \text{Where we are now}).
\end{equation}

This can of course be written more formally, but it gets the point across. When you have this situation, you can arrange these probabilities into a matrix $P$. To construct such a matrix, the rows and columns are all the possible states, and the entry $(i, j)$ is the probability of moving from state $i$ to state $j$.

Let’s pretend our mood is governed by a Markov Chain each day. If we are happy, there’s a 75% chance we’ll be happy the next day; otherwise we’ll be sad. If we’re sad, there’s a 10% chance we’ll become happy the next day, and a 90% chance we’ll stay sad. Putting this into a matrix (first row and column for “happy”, second for “sad”) we have
\begin{equation}
P = \begin{pmatrix} 0.75 & 0.25 \\ 0.10 & 0.90 \end{pmatrix}.
\end{equation}

If we start happy, we can represent that as the row vector $v = (1, 0)$. If we want to know how we’ll feel tomorrow, we can find the probability distribution like this:
\begin{equation}
v P = (1, 0) \begin{pmatrix} 0.75 & 0.25 \\ 0.10 & 0.90 \end{pmatrix} = (0.75, 0.25),
\end{equation}

so we have a 75% chance to still be happy, and a 25% chance otherwise. After another day we get $(0.75, 0.25) P = (0.5875, 0.4125)$, and if we keep applying $P$, the distribution settles down to approximately $(0.286, 0.714)$ and then stops changing.

Wait, the distribution stops changing? We’ve stumbled onto what is called a limiting distribution. For special transition matrices, these limiting distributions exist, which means no matter what state you start in, you’ll end up at the same distribution eventually. In this case, the interpretation is that this person will be sad roughly 71% of the time (exactly $5/7$).

Where does this distribution come from? It’s the eigenvector of $P^T$ associated with the eigenvalue $1$, scaled to sum to 1!

import numpy as np

P = np.array([[0.75, 0.25],
              [0.10, 0.90]])

P_t = np.transpose(P)
D, S = np.linalg.eig(P_t)
largest_eigenvector = S[:, np.argmax(D)]  # Eigenvector for the eigenvalue 1
largest_eigenvector = np.abs(largest_eigenvector)  # Make positive
largest_eigenvector /= np.sum(largest_eigenvector)  # Scale to sum to 1
print(largest_eigenvector)

[ 0.28571429 0.71428571]

Limiting distributions are related to stationary distributions. It turns out that as long as your transition matrix $P$ satisfies a few qualities, its largest eigenvalue will equal 1, the (left) eigenvector associated with 1 will have entries of the same sign, and all other eigenvalues will be strictly smaller than 1 in absolute value. The mathematical underpinning of this guarantee is a special case of the Perron-Frobenius Theorem. We call this left eigenvector, scaled to sum to 1, the stationary distribution $\pi$. Of course, it’s clear that for a left eigenvector $\pi$ associated with the eigenvalue $1$ we have $\pi P = \pi$, but how does a limiting distribution come about from it? The limiting distribution is a vector $\pi$ such that for an arbitrary probability vector $v$,
\begin{equation}
\lim_{n\to\infty} v P^n = \pi.
\end{equation}
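We can check this numerically with the mood example from earlier: repeatedly applying $P$ to a starting distribution settles on the same vector no matter where we begin.

```python
import numpy as np

P = np.array([[0.75, 0.25],
              [0.10, 0.90]])

v = np.array([1.0, 0.0])  # start definitely happy
for _ in range(200):      # apply the transition matrix many times
    v = v @ P

print(v)  # approaches the stationary distribution [2/7, 5/7]
```

Starting from $(0, 1)$ (definitely sad) instead gives the same answer, which is exactly what the limit statement above promises.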

Now we will look at an example of this behavior with right eigenvectors to accustom ourselves to the idea of limiting distributions. Let $M$ be a 2-by-2 matrix which has eigenvalues $\lambda_1 = 1$ and $\lambda_2$ with $|\lambda_2| < 1$, and eigenvectors $v_1, v_2$ which span $\mathbb{R}^2$. This means any vector $x$ in $\mathbb{R}^2$ can be written $x = c_1 v_1 + c_2 v_2$. Another way to put it is that the eigenvectors form a basis for the space. This isn’t always true, but assume we have a matrix where it is.

Then $M^n x = c_1 \lambda_1^n v_1 + c_2 \lambda_2^n v_2$. As $n \to \infty$, the term $c_2 \lambda_2^n v_2 \to 0$ and the other term, $c_1 v_1$, will be the only term left. So,
\begin{equation}
\lim_{n\to \infty} M^n x = c_1 v_1,
\end{equation}
which, after scaling to sum to 1, is the same vector $\pi$ for every $x$ with $c_1 \neq 0$. This gives us an idea of how limiting distributions can come about in the wild, but this is not the whole story!
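We can verify the decomposition numerically, using the transpose of the mood matrix so we can work with right eigenvectors (the starting vector $x$ below is an arbitrary choice):

```python
import numpy as np

M = np.array([[0.75, 0.10],
              [0.25, 0.90]])  # transpose of the mood matrix

eigvals, V = np.linalg.eig(M)  # columns of V are the eigenvectors v1, v2
x = np.array([0.3, 0.7])
c = np.linalg.solve(V, x)      # coefficients with x = c1*v1 + c2*v2
i = int(np.argmax(eigvals))    # index of the eigenvalue equal to 1

limit = np.linalg.matrix_power(M, 50) @ x
print(np.allclose(limit, c[i] * V[:, i]))  # True: only the lambda = 1 term survives
```

The second eigenvalue here is $0.65$, and $0.65^{50}$ is on the order of $10^{-10}$, so after 50 steps the $\lambda_2$ term has vanished to numerical precision.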

It’s interesting to note that if a transition matrix has a stationary distribution, it is not necessarily a limiting distribution. However, in an idealized case like our matrix $M$, the stationary distribution is the limiting distribution.
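A concrete example of a stationary distribution that is not limiting: the chain that deterministically swaps between two states. The uniform distribution is stationary, but a chain started in a single state oscillates forever and never converges.

```python
import numpy as np

P = np.array([[0.0, 1.0],
              [1.0, 0.0]])  # always move to the other state

pi = np.array([0.5, 0.5])
print(pi @ P)  # [0.5 0.5] -- stationary

v = np.array([1.0, 0.0])
print(v @ np.linalg.matrix_power(P, 100))  # [1. 0.]
print(v @ np.linalg.matrix_power(P, 101))  # [0. 1.] -- oscillates, so no limit exists
```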

Implicitly Building a Transition Matrix From Text

With the theory out of the way, let’s talk about how you can actually build such a matrix (implicitly). We’re going to use a phrase from “Smells Like Teen Spirit” by Nirvana as an example.

Hello, hello, hello, how low?

In the case of a text Markov Chain, a state is the word you’re currently at, and the transition matrix is determined by what comes after that word throughout the whole “corpus” you are using (in this case, this line of the song).

Think about what the transition matrix of this sentence would look like when you ignore punctuation and capitalization. If you are at the word “hello”, there is a 2/3 chance you’ll end up back at “hello” and a 1/3 chance you’ll end up at “how”. If you are at “how”, you have a 100% chance of getting to “low”. For text, it’s usually easiest to represent this as a dictionary in Python to avoid calculating everything:

corpus = 'hello hello hello how low'

def build_transition_matrix(corpus):
    corpus = corpus.split(' ')
    transitions = {}
    for k in range(0, len(corpus)):
        word = corpus[k]
        if k != len(corpus) - 1:  # Deal with the last word
            next_word = corpus[k + 1]
        else:
            next_word = corpus[0]  # Loop back to the beginning
        if word not in transitions:
            transitions[word] = []
        transitions[word].append(next_word)
    return transitions

print(build_transition_matrix(corpus))

Sampling from the Chain

Let’s say we want to “sample” a sentence from the corpus that is typical of its underlying distribution (via its transition matrix). We first choose a starting state (word) and then keep applying the transition matrix (via the dictionary) until we have our final sentence. We are really using a Markov Chain Monte Carlo technique in the hope of sampling from the limiting distribution. So, if your corpus really has a limiting distribution, you would end up with a realization of the chain, or a path, which samples each word from that limiting distribution.

For this experiment we will use the full song without repeating any parts:

corpus='''
Load up on guns, bring your friends
It's fun to lose and to pretend
She's over bored and self assured
Oh no, I know a dirty word
Hello, hello, hello, how low?
Hello, hello, hello!
With the lights out, it's less dangerous
Here we are now, entertain us
I feel stupid and contagious
Here we are now, entertain us
A muchacho
An albino
A mosquito
My libido
Yeah, hey, yay
I'm worse at what I do best
And for this gift I feel blessed
Our little group has always been
And always will until the end
And I forget just why I taste
Oh yeah, I guess it makes me smile
I found it hard, it's hard to find
Oh well, whatever, never mind
'''

def sample_sentence(corpus, sentence_length, burn_in=1000):
    sentence = []
    transitions = build_transition_matrix(corpus)
    # We sample the sentence after running through the chain `burn_in` times
    # in the hope of nearing a stationary distribution.
    current_word = np.random.choice(corpus.split(' '), size=1)[0]
    for k in range(0, burn_in + sentence_length):
        # Sample from the list with an equal chance for each entry.
        # This chooses the next word with the probability distribution
        # encoded in the transition matrix.
        current_word = np.random.choice(transitions[current_word], size=1)[0]
        if k >= burn_in:
            sentence.append(current_word)
    return ' '.join(sentence)

print(sample_sentence(corpus, 50, 1000))

Normally, the ends of these chains are fun to read in a nonsensical way. With a larger corpus, you begin to get better combinations.

longer_sample = sample_sentence(corpus, 100, 10000)
print(longer_sample)

just why I taste
Oh yeah, I guess it hard, it's hard to find
Oh well, whatever, never mind
Load up on guns, bring your friends
It's fun to pretend
She's over bored and self assured
Oh no, I feel blessed
Our little group has always will until the end
And I guess it hard, it's less dangerous
Here we are now, entertain us
A muchacho
An albino
A mosquito
My libido
Yeah, hey, yay
I'm worse at what I taste
Oh yeah, I feel blessed
Our little group has always been
And always been
And always will until the end
And I guess it makes me smile
I found it makes me smile
I found it hard, it's less dangerous
Here

Leaving in the newlines makes it more humorous because you get two words that “go” together in the song. We can actually expand our state space to be pairs of words rather than single words to get better results. But who’s going to see our results? The answer… the world:
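As a quick illustration of the pair-of-words idea before we build the bot, here is a minimal sketch; the function name is my own, not from the bot’s code:

```python
def build_bigram_transitions(corpus):
    # Keys are pairs of consecutive words; values list every word
    # that follows that pair somewhere in the corpus.
    words = corpus.split()
    transitions = {}
    for k in range(len(words) - 2):
        key = (words[k], words[k + 1])
        transitions.setdefault(key, []).append(words[k + 2])
    return transitions

print(build_bigram_transitions('hello hello hello how low'))
# {('hello', 'hello'): ['hello', 'how'], ('hello', 'how'): ['low']}
```

Sampling then works exactly as before, except the current state is the last two words generated rather than the last one.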

Assembling the Bot

Now that we have explored how to use a Markov Chain to generate (essentially) gibberish, we’re going to unleash that gibberish on the world with an automated Twitter bot. Here, we piece together the components necessary to have a bot. The resulting bot is available on GitHub.

To build a bot, there are really two steps:

Sample from a Markov Chain

Post to twitter

The bot will allow you to input a series of text files as corpora and use all of them to create your tweets. Having multiple documents complicates building the transition matrix a little, but it’s not too bad.

The Basics of the Bot

First, I have a class called Bot. Bot not only stores the transition dictionary but also generates tweets and posts to Twitter via the TwitterAPI class. The class’s __init__ gives you all of the options and creates some basic instance-level variables that get filled out later.
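As a rough sketch, the __init__ might look like the following; the parameter names here are illustrative assumptions, not necessarily the exact ones in the repository:

```python
class Bot:
    def __init__(self, documents, twitter_api, sleep_timer=600, burn_in=250):
        self._documents = documents      # list of paths to corpus text files
        self._twitter_api = twitter_api  # wrapper object that posts to Twitter
        self.sleep_timer = sleep_timer   # seconds to wait between tweets
        self._burn_in = burn_in          # chain steps to discard before sampling
        self._corpus = {}                # transition dictionary, filled in by _load_data
        self.last_key = None             # final two-word state, set by _load_data
```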

The main loop of the bot is the following function. In an eternal loop, it samples a tweet, posts it, and waits before starting all over again.

def run(self):
    self._load_data()  # Load the corpora and convert them into a transition matrix
    while True:
        try:
            tweet = self._get_tweet()  # Sample a tweet
            self._twitter_api.tweet(tweet)  # Post to Twitter
        except Exception as e:
            self._logger.error(e, exc_info=True)
            self._twitter_api.disconnect()
        time.sleep(self.sleep_timer)  # Every 10 minutes

Cleaning Up the Output

In the last section, we sampled text but did nothing to clean it up or make it more readable. There’s a lot that can be done to make the output more sensible. First, you can leave in the punctuation so the text flows better on the page. Another trick for a larger corpus is to make your state two words rather than a single word. This makes the output read more naturally because the words follow a more familiar pattern.

Transition Matrix with Two-Word States

In this next snippet of code, we have corpora stored in self._documents. We follow the advice of the previous paragraph to create a transition matrix (dictionary) whose states are pairs of words. The code is slightly more complicated due to the special case where a key spans one word in one corpus and one word in another, but it’s not so bad.

def _load_data(self):
    next_key = None
    for doc in self._documents:
        with open(doc, "r") as f:
            for line in f.readlines():
                parsed, add = self._line_to_array(line)
                if add:
                    if next_key is None:
                        a = -2
                    else:
                        a = 0
                    for k in range(0, (len(parsed) + a)):
                        if next_key is not None:
                            key = next_key
                            next_key = (next_key[1], parsed[k])
                        else:
                            key = (parsed[k], parsed[k + 1])
                        self._add_to_corpus(parsed, key, k, next_key)  # You can imagine what this function does
                        if k == len(parsed) - 3 and next_key is None:
                            next_key = (parsed[k + 1], parsed[k + 2])
    self.last_key = next_key

Generating the Text in a Nicer Way

This part is more of an art than a science. We apply a few heuristics to make the text flow better. As before, we have a burn-in section to get ourselves further into the state space. Then we make sure our tweet begins at the first word of a sentence and add proper capitalization.

def _generate_text(self, size=10000):
    size += self._burn_in  # For a burn-in of, say, 250 words
    start_word = self._grab_random_two_words()
    text = [''] * size
    cap = [False] * size
    cap[0] = True
    text[0] = start_word[0]
    text[1] = start_word[1]
    punc = set(st.punctuation)  # `st` is the standard string module

    # Create sample
    i = 2
    while i < size:
        if any([True if k in punc and k != ',' else False for k in text[i - 1]]):
            cap[i] = True
        key = (text[i - 2], text[i - 1])
        if key == self.last_key:  # Restart if the last key is chosen
            new_key = self._grab_random_two_words()
            text[i] = new_key[0]
            if i + 1 < size:
                text[i + 1] = new_key[1]
            key = new_key
            i += 2
        choice = np.random.choice(self._corpus[key])
        if i < size:
            text[i] = choice
        i += 1

    # Capitalize
    for k in range(0, size):
        if cap[k]:
            text[k] = text[k].capitalize()
        if k == size - 1:
            if not any([True if j in punc and j != '\'' else False for j in text[k]]):
                text[k] = text[k] + '.'

    # Find the first period after the burn-in section
    for first_period in range(self._burn_in, size):
        if '.' in text[first_period]:
            break
    return ' '.join(text[(first_period + 1):]).strip()

Posting to Twitter

The TwitterAPI is a custom class that wraps the tweepy library. It takes authentication information and lets you log in, tweet, and disconnect from Twitter.
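A sketch of what such a wrapper could look like using tweepy’s OAuth flow; the method bodies here are my assumption of the implementation, not the repository’s exact code:

```python
class TwitterAPI:
    def __init__(self, consumer_key, consumer_secret, access_token, access_secret):
        self._keys = (consumer_key, consumer_secret, access_token, access_secret)
        self._api = None

    def login(self):
        import tweepy  # imported here so the class is usable without tweepy installed
        auth = tweepy.OAuthHandler(self._keys[0], self._keys[1])
        auth.set_access_token(self._keys[2], self._keys[3])
        self._api = tweepy.API(auth)

    def tweet(self, text):
        self._api.update_status(status=text)  # post a single status update

    def disconnect(self):
        self._api = None
```

Keeping the credentials and the tweepy calls behind this small interface means the Bot class never has to know which Twitter library is underneath.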