Lots of "Query End" states in MySQL, all connections used in a matter of minutes

This morning I noticed that our MySQL server load was going sky high. Max should be 8 but it hit over 100 at one point. When I checked the process list I found loads of update queries (simple ones, incrementing a "hitcounter") that were in

query end

state. We couldn't kill them (well, we could, but they remained in the

killed

state indefinitely) and our site ground to a halt.

We had loads of problems restarting the service and had to forcibly kill some processes. When we did we were able to get MySQLd to come back up but the processes started to build up again immediately. As far as we're aware, no configuration had been changed at this point.

So, we changed

innodb_flush_log_at_trx_commit

from 2 to 1 (note that we need ACID compliance) in the hope that this would resolve the problem, and set the connections in PHP/PDO to be persistent. This seemed to work for an hour or so, and then the connections started to run out again.

Fortunately, I set a slave server up a couple of months ago and was able to promote it and it's taking up the slack for now, but I need to understand why this has happened and how to stop it, since the slave server is significantly underpowered compared to the master, so I need to switch back soon.

Has anyone any ideas? Could it be that something needs clearing out? I don't know what, maybe the binary logs or something? Any ideas at all? It's extremely important that we can get this server back as the master ASAP but frankly I have no idea where to look and everything I have tried so far has only resulted in a temporary fix.

I'll answer my own question here. I checked the partition sizes with a simple df command and there I could see that /var was 100% full. I found an archive that someone had left that was 10GB in size. Deleted that, started MySQL, ran a PURGE LOGS BEFORE '2012-10-01 00:00:00' query to clear out a load of space and reduced the /var/lib/mysql directory size from 346GB to 169GB. Changed back to master and everything is running great again.

From this I've learnt that our log files get VERY large, VERY quickly. So I'll be establishing a maintenance routine to not only keep the log files down, but also to alert me when we're nearing a full partition.

I hope that's some use to someone in the future who stumbles across this with the same problem. Check your drive space! :)