4 Answers
4

So you actually are using a lot of CPU. Either get a better server or make your forums less popular. You also seem to be sending quite a bit of mail. Has your forum been hacked, and is somebody using it as a spam source? Check your mail logs.

You could try switching from mod_php to mod_fcgid before trying a better server.
– xofer, Jun 5 '13 at 21:38

Thanks, it appears something was definitely going on with the mail server. I'm still not sure how to fix it, but in the meantime I ran service sendmail stop and my server load gradually dropped to < 2.0, and the forums now work.
– Euskadi, Jun 5 '13 at 22:32

[Update: answer was posted before the full top output was added. While the answer is still correct, it no longer applies to the situation]

Load is not CPU usage; load is the number of processes in the run queue (on Linux, processes in uninterruptible sleep waiting on I/O count too). Usually a high load with low CPU usage indicates an I/O problem, like sluggish or hanging I/O. I once had a load of over 9000 on a mail server where the storage went for a walk. Hardly any CPU usage, and ssh was perfectly responsive; it just didn't like being a mail server anymore.
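To see which processes are actually behind the load figure, something like the following rough Python sketch may help. It assumes a Linux host with /proc mounted; the function names (parse_stat, load_contributors, read_stat_lines) are made up for this illustration and are not part of any tool mentioned in this thread. It counts processes that are runnable (R) or in uninterruptible sleep (D), grouped by command name.

```python
import glob
import os
from collections import Counter


def parse_stat(line):
    """Return (command, state) from the contents of one /proc/<pid>/stat file."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so split on the last ')' instead of on whitespace.
    start = line.index("(")
    end = line.rindex(")")
    comm = line[start + 1:end]
    state = line[end + 1:].split()[0]
    return comm, state


def load_contributors(stat_lines):
    """Count processes that are runnable (R) or in uninterruptible sleep (D)."""
    counts = Counter()
    for line in stat_lines:
        comm, state = parse_stat(line)
        if state in ("R", "D"):
            counts[(state, comm)] += 1
    return counts


def read_stat_lines():
    """Read every /proc/<pid>/stat file that is still readable."""
    lines = []
    for path in glob.glob("/proc/[0-9]*/stat"):
        try:
            with open(path) as f:
                lines.append(f.read())
        except OSError:
            continue  # the process exited between listing and reading
    return lines


if __name__ == "__main__" and os.path.exists("/proc/loadavg"):
    with open("/proc/loadavg") as f:
        print("load averages:", " ".join(f.read().split()[:3]))
    for (state, comm), n in load_contributors(read_stat_lines()).most_common():
        print(state, n, comm)
```

If most of the entries are in state D rather than R, that points at I/O rather than CPU, which is the situation described above.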

I see, so how can I go about addressing the I/O problem?
– Euskadi, Jun 5 '13 at 21:02

1

Find out what's doing the I/O (iotop can help) and make it stop doing that :)
– Dennis Kaarsemaker, Jun 5 '13 at 21:03

Or, if that is not possible or feasible, simply add more I/O capacity. Some things just need a whole rack of SSDs to get the I/O budget. Large databases especially need tons of RAM and tons of I/O. Email servers are similar (every email goes to disk).
– TomTom, Jun 5 '13 at 21:26

Not every email goes to disk. Some of our high-volume newsletter servers have the SMTP spool on a ramdisk, and we don't care if we lose mails that are in transit when a server crashes. That's quite unusual though.
– Dennis Kaarsemaker, Jun 5 '13 at 21:28

You have both high CPU usage (idle time 3.1%, nice time 0%) and probably a high disk load. Try looking at vmstat output: check for out-of-scale numbers in the blocks-in/blocks-out columns (bi/bo) or a high value in the wait column (wa), which, if I'm not wrong, is the time spent waiting for I/O to complete.

On a lightly loaded system the wait time is close to 0% and the read/written block counts are small.
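For reference, the wait figure can also be approximated by hand from /proc/stat. The sketch below is only an illustration, assuming a Linux kernel that exposes the usual aggregate cpu line (user, nice, system, idle, iowait, irq, softirq, steal); the helper names cpu_times and iowait_percent are invented here. It samples the counters twice and reports the share of time spent in iowait over the interval.

```python
import os
import time


def cpu_times(stat_text):
    """Return the aggregate CPU counters from the text of /proc/stat."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            return [int(x) for x in fields[1:]]
    raise ValueError("no aggregate 'cpu' line found")


def iowait_percent(before, after):
    """Percentage of CPU time spent waiting for I/O between two samples."""
    # Field order: user nice system idle iowait irq softirq steal ...
    # Only the first eight are summed; guest time is already part of user/nice.
    deltas = [b - a for a, b in zip(before[:8], after[:8])]
    total = sum(deltas)
    if total <= 0:
        return 0.0
    return 100.0 * deltas[4] / total


if __name__ == "__main__" and os.path.exists("/proc/stat"):
    with open("/proc/stat") as f:
        first = cpu_times(f.read())
    time.sleep(1)  # sampling interval; one second is an arbitrary choice
    with open("/proc/stat") as f:
        second = cpu_times(f.read())
    print("iowait: %.1f%%" % iowait_percent(first, second))
```

A value that stays well above a few percent across several samples is the kind of out-of-scale number to look for.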

I experienced similar trouble with a site where mysql was using a lot of disk and memory, while php/apache were mostly CPU-bound. The solution was to split it in two: the www front end on one machine, the mysql back end on another. Things went more smoothly after that.

Anyway, try to better understand what is causing your load. Maybe your sendmail is part of the problem: I see a lot of sendmail processes in the "D" state (uninterruptible sleep, usually waiting on disk I/O). First of all, make sure it is working for you and not for others (relaying spammers' mail or the like).

You should just install postfix. Your mail server is probably acting as an open relay because of a misconfiguration. Postfix's defaults mitigate those problems, and installing it is probably faster than reconfiguring sendmail.

Issue sendmail -bp to get a list of messages in the sendmail queue. If you have a lot of messages in /var/spool/mqueue that are not going away, you could just change into that directory and rm *. Be aware that any legitimate message still waiting for delivery at that moment, including one that was just submitted, will be lost. Since there is no sendmail switch to delete the queued messages, you may have to do that. There are other methods as well that you can find in other threads.
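Before deleting anything, it may be worth checking how large and how old the queue really is. The following is a minimal sketch, not a sendmail tool: it assumes the queue directory is /var/spool/mqueue as mentioned above, and that each queued message has a control file whose name starts with qf (the usual sendmail 8 layout; check your own installation). The function name queued_messages is hypothetical. It only lists files and deletes nothing.

```python
import os
import time


def queued_messages(queue_dir):
    """Return (file name, age in seconds) for each qf control file, oldest first."""
    now = time.time()
    entries = []
    for name in os.listdir(queue_dir):
        if not name.startswith("qf"):
            continue  # skip df data files and anything else in the directory
        try:
            mtime = os.stat(os.path.join(queue_dir, name)).st_mtime
        except OSError:
            continue  # delivered or removed while we were looking
        entries.append((name, now - mtime))
    entries.sort(key=lambda entry: entry[1], reverse=True)
    return entries


if __name__ == "__main__":
    queue_dir = "/var/spool/mqueue"  # assumed location; usually readable by root only
    try:
        entries = queued_messages(queue_dir)
    except OSError as exc:
        print("cannot read %s: %s" % (queue_dir, exc))
    else:
        print("%d queued messages" % len(entries))
        for name, age in entries[:10]:
            print("%s  %.1f hours old" % (name, age / 3600.0))
```

A queue with thousands of entries that are days old is a strong hint that the server has been relaying mail it cannot deliver.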