Twitter Tools bug (slow)

UPDATE: Twitter tools 2.1 has been released, which fixes the issue described below. You may still need to run the code below to remove all the duplicates. Not sure if TT 2.1 does it automatically.

UPDATE 2: Looks like he did this by making the tw_id index “unique”. There may be other code changes elsewhere, but…

UPDATE 2b: Okay, so the “fix” is half-baked. Basically, all they did was make the tw_id index “unique”. So, when Twitter Tools goes to re-add pre-existing tweets, the Apache error log fills with errors about the index entry already existing. Rather than checking whether the tw_id already exists before inserting, it just blindly tries the insert and fails when MySQL says “nope, already exists”. Heck, it COULD just grab only the tweets posted since the last one downloaded. Either of these would be better than what it’s doing now. *sigh* If the code were better commented, I’d try to fix it myself… but my eyes cross when I look at it. =/
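For what it’s worth, the “only grab tweets since the last one” approach would just mean reading the newest tweet ID already in the table and handing it to Twitter as the since_id parameter the API supports. A sketch of getting that ID (the CAST is my assumption: if tw_id is stored as a string column, a plain MAX() would compare the IDs as text):

```sql
-- Newest tweet ID we've already stored; pass this as since_id when
-- fetching the timeline so Twitter only returns newer tweets.
-- The CAST is an assumption: if tw_id is a string column, MAX()
-- without it would compare lexicographically, not numerically.
SELECT MAX(CAST(tw_id AS UNSIGNED)) AS last_seen_id FROM wp_ak_twitter;
```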

Quick post about an annoying bug with Twitter Tools.

There seems to be a bug in Twitter Tools 2.0 for WordPress, and unfortunately, the creator seemed rather uninterested in fixing it (instead just directing me to his paid WordPress support site. =/).

Anyway: Twitter Tools, if you don’t know it, keeps a database table of all the tweets you’ve written. It checks Twitter each time a page is loaded, and if there’s a new tweet, it downloads it and adds it to the table.

The problem is that at some point, something changed (on Twitter’s end, I’m guessing) that makes Twitter Tools re-grab ALL of a user’s tweets still available on Twitter’s site. In my case, that means it grabs 100+ tweets every time I tweet something and a page is loaded, so the table quickly grows to a very large number of rows. I have about 475 tweets since I installed Twitter Tools, so the table should hold roughly 475 rows; instead, it had grown to 85,000 rows as of a few days ago.

Given the query Twitter Tools uses to show the latest tweets, scanning that many rows can take a while. On my server, it took about 20 seconds (which slowed every page load by that much).

The query looks like this (assuming the standard WordPress table prefix, etc.):

SELECT * FROM wp_ak_twitter WHERE tw_text NOT LIKE 'New blog post%' AND tw_text NOT LIKE '@%' GROUP BY tw_id ORDER BY tw_created_at DESC LIMIT 5;

Now, that’s a pretty slow query to run on 85k rows, and it isn’t easily sped up. The easiest speed-up, I guess, would be an index on the first 20 characters of “tw_text”, but that doesn’t solve the real problem; it just makes the query faster. An index on “tw_created_at” would help a bit too… but it’s still 85k rows, the vast majority of which are duplicates.
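If you just want to paper over the slowness while you sort out the duplicates, those two indexes would look something like this (the index names are my own; again, this only masks the problem, it doesn’t remove any rows):

```sql
-- Hypothetical band-aid indexes: speed up the slow query above
-- without removing any of the duplicate rows.
ALTER TABLE wp_ak_twitter
    ADD INDEX idx_tw_text_prefix (tw_text(20)),  -- first 20 chars, for the NOT LIKE filters
    ADD INDEX idx_tw_created_at (tw_created_at); -- for the ORDER BY
```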

First, are you impacted by this issue? Try this from the MySQL command line:

SELECT tw_id AS id, COUNT(tw_id) AS idCount FROM wp_ak_twitter GROUP BY tw_id HAVING idCount > 1;

This lists every tweet that appears more than once, along with how many copies it has. If it comes back with nothing, then you’re good. But I’d imagine you’ll be shocked by how many duplicates you have; some of my tweets had around 500 copies.
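If you’d rather see a single total than one row per tweet, this variant counts how many redundant rows the cleanup query further down will delete:

```sql
-- Total rows minus distinct tweets = number of redundant copies.
SELECT COUNT(*) - COUNT(DISTINCT tw_id) AS redundant_rows FROM wp_ak_twitter;
```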

So obviously, the first “fix” is to remove the duplicates. With some googling, I figured out the query looks like:

DELETE FROM wp_ak_twitter USING wp_ak_twitter,wp_ak_twitter AS vtable WHERE vtable.id > wp_ak_twitter.id AND vtable.tw_id = wp_ak_twitter.tw_id;

This joins the table against itself (aliased as vtable), matching rows that share the same tw_id, and compares their row “id” column (which is unique and the primary key). The id comparison both keeps a row from matching itself and decides which copy survives: any row that has a later duplicate gets deleted, so each tweet is left with exactly one copy. After the above command, you’ll want to do:

OPTIMIZE TABLE wp_ak_twitter;

and

FLUSH TABLE wp_ak_twitter;

The OPTIMIZE reclaims the space left behind by the deleted rows; without it, the table stays the same size on disk as it was with all the duplicates. The FLUSH then makes sure the rebuilt table is closed and reopened cleanly.

But this doesn’t really “fix” anything; it just removes the duplicates. The actual fix I haven’t quite finished yet. I’ll try to update this post in the next few days with it. But basically, it’s going to involve adding a SELECT to the Twitter Tools code that checks for an existing tw_id before inserting the tweet.
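As a sketch of what that guard could look like at the SQL level (the column list and values here are placeholders of my own, not the plugin’s real insert, which sets more fields):

```sql
-- Hypothetical insert-if-absent guard: only add the tweet if its
-- tw_id isn't already in the table. '123456789' and the column
-- list are placeholders, not the plugin's actual values.
INSERT INTO wp_ak_twitter (tw_id, tw_text, tw_created_at)
SELECT '123456789', 'example tweet text', NOW()
FROM DUAL
WHERE NOT EXISTS (
    SELECT 1 FROM wp_ak_twitter WHERE tw_id = '123456789'
);
```

The same check could live in the plugin’s PHP instead (a SELECT before the INSERT), but folding it into one statement avoids the race between checking and inserting.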

Until then… good luck. Hopefully this speeds up your website if it’s been slow and you have Twitter Tools installed.