Twitter alerts: using twitter streaming API for fun and profit

23 Oct 2009

Twitter is a wonderful service but, until now, you had to subscribe to some third-party website to be alerted when a selected word (maybe your trademark) was tweeted. We’ll try to develop a service that filters the Twitter API stream, stores the interesting tweets in our database, and shows them in the browser in real time.

What?

We’ll take twitter real time results for a given word (or words) and visualize them in a browser window, like monitter.com, but on our own servers and a bit more automatic. This is going to be useful to get our own alerts and work with them.

How?

We are going to use the Twitter streaming API to get the tweets containing a given word. Since the results come in JSON format, we’ll need to filter the data, take the interesting bits and store them. On the other side, we’re going to write a bit of JavaScript to read the data from the server into a web page using AJAX. We could use other technologies, like Comet, to push the data to the browser but, while that would be a much cleaner implementation, I don’t think I’m ready to write about it yet (watch this space ;)).

We need to decouple reading the stream from Twitter from serving our clients, because we can only have a single active connection to the Twitter streaming API at a given time, and we’re going to leave the listening process on for a long time using a daemon/service approach. On top of that, we want to store the tweets in a database for later perusal and data mining.

The tools

We are trying to make a useful system. To do this, we’ll need some tools:

Server side programming language: This time, PHP. While probably not as elegant or fast as Python or as cool as Ruby, it gets the work done and it’s available in almost all web hosting plans. And the documentation is pretty extensive.

A database to store all the info

Client side programming: We are going to use HTML, javascript and the jQuery javascript library to simplify AJA(X) programming and for the effects.

A Twitter account: Your own is OK, but maybe you want to create a new one for this kind of task. Write down your user and password; we are going to need them soon.

The twitter API. Twitter people are so kind they have developed a restful API free for everyone to use. And it’s sub-zero cool.

The Twitter streaming API

Twitter has published a Streaming API, described as follows: “The Twitter Streaming API allows near-realtime access to various subsets of Twitter public statuses”. In fact, this is just what we need. You can read all the documentation at https://twitterapi.pbworks.com/Streaming-API-Documentation, but I’ll try to cover the interesting parts for this project so you don’t need to yet.

We are going to use just one method (statuses/filter) to get results including one or several words. This method can return a stream of data in XML or JSON format, has to be called using POST and accepts some parameters. If you have access to some kind of Unix, you can try it from the command line with an HTTP client such as curl, sending your user and password and a track parameter with the word to watch.

The call keeps returning tweets until you exit it (with CTRL+C). This is a JSON stream and can be read and parsed by several means. In fact, it’s eval-uable JavaScript code that we could read from the browser. But right now, we’re going to use a server-side language to read it and work with the interesting parts.
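As a rough sketch of what reading and parsing such a stream involves, the snippet below assumes the stream carries one JSON document per line and that the data may arrive in arbitrary chunks. The helper name createLineParser is hypothetical, and JSON.parse is used instead of eval so the incoming text is never executed as code.

```javascript
// Hypothetical helper: buffers raw chunks and hands one parsed tweet
// to onTweet for every complete line it has seen so far.
function createLineParser(onTweet) {
  let buffer = '';
  return function feed(chunk) {
    buffer += chunk;
    const lines = buffer.split('\n');
    // The last piece may be an incomplete line; keep it for the next chunk.
    buffer = lines.pop();
    for (const line of lines) {
      const trimmed = line.trim();
      if (trimmed === '') continue; // skip blank lines between tweets
      try {
        onTweet(JSON.parse(trimmed));
      } catch (err) {
        // Assumption: a malformed line is dropped rather than stopping the reader.
      }
    }
  };
}
```

Each call to the returned function can be fed whatever the connection delivered, and a tweet split across two chunks is parsed only once its line is complete.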

Reading Twitter stream

As I told you in the tools section, we are going to use PHP as our server-side language. For the first part, reading the Twitter stream, we don’t even need a web server, since we can run the script from the command line. And if we run it from the command line, we can convert it to a kind of daemon/service and leave it on for a long time. But first, let’s see what the reading script does.

We create an $opts array (in fact, an array of arrays) that contains the parameters. In this particular case, we’re using two: the POST method and the search line (track=google). Then we can treat the Twitter stream as a file, using stream_context_create and fopen, and just start reading lines. Each line is going to be a JSON-encoded tweet, similar to what we’ve seen when we called the API from the command line. Since we want to use the contents as easily as possible, we’ll use the json_decode function to parse each line into a PHP object, print it, and call flush just in case we’re calling the script from a browser.

Storing the results

The best way to store the results for later perusal is a database. I’m using MySQL but any other database should be OK. To be able to store the data, we need to create a database with a single table.

We are only going to store the following data:

Text: This is the Twitter status. 140 chars max.

User screen name: The screen name of the poster. This is needed to create the link to Twitter.

Id: A unique id for the tweet. It’s a sequential number, so we can order the tweets according to it, use it along with the user screen name to build a link back to Twitter, and use it as our primary key.

Followers count: The number of people that are going to receive the tweet in their inboxes. We are using it to style the real-time viewer. Since my primary intention is to watch a trademark, I care about the number of people that are watching the messages.

The time of the tweet: basically for filtering purposes. We are going to store our server time to avoid lengthy conversions.

We could store several other fields, and a complete solution should probably take into account that there are different tweet types, but for the time being, these five fields should suffice.
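To make the field list concrete, here is a minimal JavaScript sketch of reducing one decoded tweet to the stored fields. The function name toRow and the output key created_at are hypothetical; the input property names (id, text, user.screen_name, user.followers_count) are assumed to match the JSON the stream returns. Messages that lack a text or a user are skipped, which is one simple way to cope with other message types.

```javascript
// Hypothetical helper: keeps only the five fields we want to store.
// Returns null for stream messages that are not ordinary tweets.
function toRow(tweet, now = new Date()) {
  if (!tweet || typeof tweet.text !== 'string' || !tweet.user) {
    return null;
  }
  return {
    id: tweet.id,
    text: tweet.text,
    screen_name: tweet.user.screen_name,
    followers_count: tweet.user.followers_count,
    // Our own server time, not the tweet's timestamp, to avoid conversions.
    created_at: now.toISOString(),
  };
}
```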

To create the table, we can run a SQL script on the server that defines one column for each of these fields, with the tweet id as the primary key.

The script that stores the tweets is as ugly as sin, but it works. If you run it from the command line, it should start storing tweets in your database and keep going until you stop it. So, our database is starting to fill with tweets concerning our desired word. Now we need to be able to navigate them…

Creating the code from the server side

Now we need to publish the tweets in our browser. To do that, we need a small PHP script that returns the tweets when called. If we call it with a start parameter, it’ll return all the tweets with an id bigger than that value. Otherwise, it’ll return the last ten tweets stored in our database. So we will use two different queries: the first one returns the last ten results, and the second one returns all results after the given id. We use subqueries (SELECT from SELECT) to get the results in the order we want.
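The behavior of the two queries can be modeled over an in-memory array. This is only an illustration of the selection logic in JavaScript, not the SQL itself: the name selectTweets is hypothetical, oldest-first is assumed to be the wanted order (the front-end appends new tweets at the bottom), and the validation of start is an added precaution.

```javascript
// Hypothetical model of the two queries, using an array instead of a table.
function selectTweets(rows, start) {
  const ascending = [...rows].sort((a, b) => a.id - b.id);
  if (start === undefined || start === null || start === '') {
    // No start parameter: the last ten tweets, oldest first.
    return ascending.slice(-10);
  }
  const since = Number(start);
  if (!Number.isInteger(since) || since < 0) {
    // Never pass unchecked input on to a query.
    throw new RangeError('start must be a non-negative integer');
  }
  // With a start parameter: every tweet newer than the given id.
  return ascending.filter((row) => row.id > since);
}
```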

We are going to poll this code every ten seconds and refresh the tweets list to show the most recent ones, using javascript, and the output format will be JSON.

Writing a front-end

Since, as Larry Wall said, one of the cardinal virtues of a programmer is laziness, we are going to use jQuery to build the interface and the business logic. And we’re not even serving it ourselves, but linking to it on the Google CDN, as Dave Ward explained in his wonderful blog.

Let’s see… This is probably the most complex part of the article, so, I’ll try to go slow and explain every function:

getTweets(id)

This function calls the server using the getJSON jQuery method. Then, it takes each response line and calls addNew with it. If we call it with an id parameter, it’ll ask the server for all tweets with ids greater than that. Otherwise, it will grab the last ten tweets.
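A minimal sketch of this function follows. The endpoint name tweets.php is hypothetical, and the JSON fetcher and the addNew callback are passed in as arguments so the sketch runs without jQuery or a browser; with jQuery loaded, the fetcher would be a thin wrapper around $.getJSON.

```javascript
// Hypothetical endpoint name; use whatever your PHP script is called.
const ENDPOINT = 'tweets.php';

// Builds the request URL, adding the start parameter only when an id is given.
function buildUrl(id) {
  if (id === undefined || id === null) {
    return ENDPOINT;
  }
  return ENDPOINT + '?start=' + encodeURIComponent(id);
}

// getJSON(url, callback) must call callback with the decoded array of tweets.
function getTweets(id, getJSON, addNew) {
  getJSON(buildUrl(id), function (tweets) {
    tweets.forEach(function (tweet) {
      addNew(tweet);
    });
  });
}
```

In the page itself this could be called as getTweets(lastId, function (url, cb) { $.getJSON(url, cb); }, addNew), where lastId is whatever variable tracks the newest id already shown.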

addNew(item)

It takes an item (a tweet) as input. It hides the first tweet, removes its ‘tweet’ class, appends a new tweet at the bottom and shows it. It calls the renderTweet function to get the tweet in HTML format.

renderTweet(item)

Just one line to call getImportanceColor() and a return with the HTML code. It’s a bit long but that’s because we’re adding a couple of links to the tweet to be able to visit the original one.
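The sketch below shows one way such a function could look. It differs from the description in one respect: the color is passed in as an argument instead of being computed inside, so the snippet stays self-contained. The markup, the escapeHtml helper and the link format (screen name plus /status/ plus id) are assumptions, and the text is escaped before being placed in the page.

```javascript
// Hypothetical helper: escapes text so it cannot inject markup.
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Returns the HTML for one tweet, with links to the author and to the tweet.
function renderTweet(item, color, extraClass) {
  const profileUrl = 'http://twitter.com/' + encodeURIComponent(item.screen_name);
  const statusUrl = profileUrl + '/status/' + encodeURIComponent(item.id);
  const classes = extraClass ? 'tweet ' + escapeHtml(extraClass) : 'tweet';
  return (
    '<div class="' + classes + '" style="color:' + escapeHtml(color) + '">' +
    '<a href="' + profileUrl + '">' + escapeHtml(item.screen_name) + '</a>: ' +
    escapeHtml(item.text) +
    ' <a href="' + statusUrl + '">#</a>' +
    '</div>'
  );
}
```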

getImportanceColor(number)

It takes a number of followers and returns an RGB color that will be between total black, for people without followers, and total red, for Ashton Kutcher. It uses logarithms to scale between the two extremes, because there are six orders of magnitude between them. We will use it to paint (it) black the twitterers with few followers and red the Twitter stars.
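A minimal sketch of the logarithmic scaling, assuming a base-10 logarithm, a ceiling of six orders of magnitude as stated above, and a red channel that goes from 0 to 255 while green and blue stay at 0. The exact formula is an assumption made for illustration.

```javascript
// Maps a follower count to a color between black (no followers)
// and pure red (a million followers or more).
function getImportanceColor(followers) {
  const MAX_EXPONENT = 6; // six orders of magnitude between the extremes
  const count = Math.max(0, Number(followers) || 0);
  // log10(count + 1) is 0 for nobody and about 6 for a million followers.
  const ratio = Math.min(1, Math.log10(count + 1) / MAX_EXPONENT);
  const red = Math.round(255 * ratio);
  return 'rgb(' + red + ',0,0)';
}
```

Because the scale is logarithmic, going from 10 to 100 followers changes the color about as much as going from 10,000 to 100,000.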

poll()

This is the timeout function that calls itself every 200ms and gets the new tweets.
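A self-rescheduling timeout can be sketched as below. The name startPolling, the stop function it returns and the optional schedule argument are hypothetical additions; the interval is a parameter so different polling rates can be tried.

```javascript
// Calls fetchNew immediately, then again every intervalMs milliseconds,
// by having poll re-arm its own timeout after each run.
function startPolling(fetchNew, intervalMs, schedule) {
  // Assumption: schedule exists only to make the sketch testable;
  // by default it is the ordinary setTimeout.
  const later = schedule || function (fn, ms) { return setTimeout(fn, ms); };
  let stopped = false;

  function poll() {
    if (stopped) return;
    fetchNew();
    later(poll, intervalMs);
  }

  poll();
  // The returned function cancels any further polling.
  return function stop() {
    stopped = true;
  };
}
```

Using a timeout that re-arms itself, rather than a fixed interval timer, means a slow run never overlaps with the next one.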

The last block just starts the polling as soon as the document is loaded.

The Result

This is a small screen capture of my browser visiting the HTML/JavaScript page while running storing_tweets_in_the_database.php. It’s watching the word ‘twitter’ and, as you can see, it’s running too fast for the human eye (at least mine), but since we are keeping all the data in our database, it’s not lost forever.

Limits

Right now, because of the Twitter API limits, just one instance of the watching process can be run at once. Anyhow, you can write several words, separated by commas, and Twitter will return results for all of them.

This code should not be used in production, since there are almost no security checks to avoid misuse. If you want to use it on a machine open to the public, you should check every input (twice) for misbehaviour.

Further work

Obviously, this is just a sample. It can be made much better looking, and we could even analyze the tweets and tweet back a response to any questions concerning our keywords. The watch module should be daemonized or converted to a service so it can be left unattended. The HTML page could filter between two dates, and so on. Stay tuned. We’ll try to keep on posting this kind of content.

Shameless plug

I’m part of Corunet, a web agency in Spain that can deliver consistently good results in all kinds of internet projects. You can visit our website http://coru.net/ or contact me at david@corunet.com if you have any special needs.

This is the best tutorial on the Twitter Streaming API for PHP out there. Awesome job. I was a little disappointed to get to the end, have working code, and then find out that you don’t think this is ready for production. What do you think it would take to get this code ready for production? What are the issues? Have you looked at phirehose before?

I am trying to use the twitter streaming API for a personal project. If you think you could help me for a reasonable amount of compensation please contact me.

Hi Jed,
I don’t think it’s ready for production, since it doesn’t take care of disconnections/reconnections, nor can it update the stream with new filter words. Anyway, I’ve already used it with minor modifications for some customers. I’ve been trying phirehose lately and it looks great, but I can’t vouch for it yet.
If you want me to help you with a project, drop me a line to david@corunet.com with your idea and I will try to send you a budget.

Hi there, great article, I refer to your code a lot. Now I’m trying to figure out one thing: instead of the new tweet appearing at the bottom and the top one vanishing, how do I reverse that? Meaning that newer tweets show at the top and the bottom one vanishes, TweetDeck style.

I tried renaming first to last and last to first, but it doesn’t seem to work. Is it more complicated to start from the top, or is it very simple and I just can’t see it yet?

If you can kindly guide me on how to reverse the tweets, that would be great. Thanks!

“Text: This is the twetter status. 140 chars max”
That’s just what the user is allowed to enter. If you take a look at the message, it consists of a lot more characters. That’s because links etc. are sent as HTML.
I’m not at home, so I can’t give you the maximum I have counted so far.

Great tutorial! I have one question: in logic.js, when calling renderTweet(), you pass the function a second argument, ‘hidden’. I see what it achieves by inspecting the DOM in Firebug, but how does this take effect in your code? it’s a neat trick, but could you explain it? Thanks!

Rico: as the dev.twitter.com URL says, you just have to change the ‘http’ to ‘https’.
If it still isn’t working (as happened to me yesterday on this Windows machine), then it’s because of configuration/setup.
Check that the php_openssl extension is loaded and that you set allow_url_fopen = On in your php.ini.
That should do it.
Good luck.

“You may export or extract non-programmatic, GUI-driven Twitter Content as a PDF or spreadsheet by using “save as” or similar functionality. Exporting Twitter Content to a datastore as a service or other cloud based service, however, is not permitted”

Seems to me that a SQL db would constitute a “datastore”, and that this datastore is used to drive a “service”.

Pzelnip, you’re right. It seems that lately Twitter frowns upon storing the tweets in a database, at least as a permanent store. Anyway, when I first contacted them about two years ago, they told me that the technique described in the article was OK, or at least no reason for a ban. So, I guess that the spirit of the rule is that they don’t want people making tools to analyze past tweets using the streaming API.
Thank you so much for your comment.

hi! I tried to run your code but I’m getting this error:
Warning: fopen(http://…@stream.twitter.com/1/statuses/filter.json) [function.fopen]: failed to open stream: No connection could be made because the target machine actively refused it. in E:\wamp\www\twitter\app\demo\twitter_watch\watch.php on line 17

So I’ve got everything working great; the only problem is that I can’t figure out how to stop the watch/stream. I had to actually change the configuration files to an invalid username and password, then kill the process on the MySQL server.

Thanks for a great article. Using it for a little pet project just for fun. One question. I’m noticing some oddities with what is returned when I search for keyword x but when I search for the same keyword on twitter i get different results. Some are the same but not all.

Is there an easy way to push the output from watch.php to the screen, just to debug? Like I said, everything works, just with some oddities.

Thanks for this great article. Sadly, i couldn’t get the code to work, is it still possible to use this method considering the recent changes in the Streaming API (oauth …)? If so, what should be changed in order for it to work ?