Do the math here. If you have 1500 server processes, even if they're a svelte 10MB each, that's 15GB of RAM. If 190MB is a typical size for an Apache process with your workload (PHP?) and you have 4GB of RAM, ServerLimit should be no more than 21.
–
mattdm Mar 8 '11 at 16:17
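The arithmetic in the comment above can be written as a small helper. This is a sketch that assumes every Apache child settles at roughly the same resident size; `max_server_limit` and its `reserved_mb` headroom parameter are hypothetical names for illustration, not anything Apache provides.

```python
def max_server_limit(ram_mb: int, per_process_mb: int, reserved_mb: int = 0) -> int:
    """Largest number of Apache children that fit in RAM.

    Assumes each child uses about per_process_mb of resident memory and
    that reserved_mb is set aside for the OS and other services
    (a hypothetical headroom parameter; the comment effectively uses 0).
    """
    if per_process_mb <= 0:
        raise ValueError("per_process_mb must be positive")
    usable_mb = max(ram_mb - reserved_mb, 0)
    # Floor division: a partial process does not count.
    return usable_mb // per_process_mb


if __name__ == "__main__":
    # 1500 children at 10MB each already need about 15GB.
    print(1500 * 10, "MB needed for 1500 x 10MB children")
    # 4GB box, 190MB per child: 4096 // 190 == 21.
    print("ServerLimit <=", max_server_limit(4096, 190))
```

In practice you would leave part of the 4GB for the OS and page cache, which is what a non-zero `reserved_mb` models.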

Also, I suspect that you don't have a memory leak, but rather a PHP script that is able to consume a lot of RAM. Apache processes will grow to the maximum memory used by PHP and not shrink back down. Setting MaxRequestsPerChild ridiculously low will mitigate this, but the real solution is to either a) rein in your PHP app's memory consumption or b) switch to a different server architecture (using FastCGI for PHP).
–
mattdm Mar 8 '11 at 16:22
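To see why a low MaxRequestsPerChild only mitigates the problem described in the comment above, here is a toy model, not Apache code. It assumes a child's resident size grows to the largest amount any request has needed since the child was spawned and never shrinks, and that Apache replaces the child after `max_requests` requests (0 meaning never, as with MaxRequestsPerChild 0). The function name and the memory figures are illustrative assumptions.

```python
from typing import List


def child_rss_timeline(request_mb: List[int], max_requests: int, base_mb: int = 10) -> List[int]:
    """Resident size (MB) of one Apache child after each request (toy model).

    Assumptions: the child starts at base_mb, keeps its peak memory until
    it exits, and is replaced by a fresh child after max_requests
    requests (0 = never recycle, like MaxRequestsPerChild 0).
    """
    timeline = []
    peak = base_mb
    served = 0
    for mb in request_mb:
        if max_requests and served == max_requests:
            # Recycle: the old child exits and a fresh one starts small.
            peak = base_mb
            served = 0
        peak = max(peak, mb)   # memory grows to the high-water mark...
        served += 1
        timeline.append(peak)  # ...and stays there until the child exits
    return timeline


if __name__ == "__main__":
    # One memory-hungry PHP request (180MB) among small ones.
    requests = [20, 180, 20, 20, 20, 20]
    print(child_rss_timeline(requests, max_requests=0))  # -> [20, 180, 180, 180, 180, 180]
    print(child_rss_timeline(requests, max_requests=2))  # -> [20, 180, 20, 20, 20, 20]
```

Recycling limits how long a bloated child holds on to memory, but the child still reaches the full peak while serving the heavy request, so it mitigates rather than fixes the problem.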

1 Answer

All of them. ping working just means that enough of the IP stack is up to process ICMP Echo requests (that's not a huge portion of the system compared to what's required for SSH and web servers). You could have had what I call a "partial panic" (the kernel blew up, but the IP code kept running), run out of RAM, or your SSH/HTTPd processes could have fallen over for unspecified reasons.
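Because a ping reply only proves the IP stack is alive, a more telling remote check is whether the service ports accept TCP connections. A minimal sketch using Python's standard `socket` module; the port numbers (22 for SSH, 80 for HTTP) are the usual defaults and are assumptions about this server.

```python
import socket
import sys


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    Unlike ping (ICMP Echo), this needs the kernel's TCP stack and a
    socket that some process has opened for listening on the port.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, unreachable, or name resolution failed.
        return False


if __name__ == "__main__" and len(sys.argv) > 1:
    host = sys.argv[1]
    for name, port in (("ssh", 22), ("http", 80)):  # assumed default ports
        state = "open" if tcp_port_open(host, port) else "NOT answering"
        print(f"{host}:{port} ({name}) {state}")
```

A successful connect still doesn't prove the daemon behind the port is responsive; it only rules out the "ping works, nothing else does" case.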

/var/log/messages is probably a good starting point, as is the log for your web server (presumably Apache). If nothing else, they will give you an idea of when the system last worked and how long it was in the brain-dead state before it got rebooted...
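When reading /var/log/messages after the fact, it can help to pull out just the kernel's memory-pressure messages with their timestamps. A rough sketch, assuming the classic syslog line layout (`Mon DD HH:MM:SS host kernel: message`); the list of phrases to match is an assumption and may need adjusting for your kernel's wording.

```python
import re
import sys
from typing import Iterable, List, Tuple

# Classic syslog layout: "Mar  8 15:40:20 hostname kernel: message"
SYSLOG_KERNEL = re.compile(
    r"^(?P<ts>[A-Z][a-z]{2}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2})\s+\S+\s+kernel:\s*(?P<msg>.*)$"
)

# Assumed phrases that indicate memory exhaustion; adjust for your kernel.
MEMORY_PATTERNS = ("free swap = 0kb", "out of memory", "oom-killer")


def memory_events(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (timestamp, message) for kernel lines mentioning memory exhaustion."""
    events = []
    for line in lines:
        m = SYSLOG_KERNEL.match(line.rstrip("\n"))
        if not m:
            continue  # not a kernel line in the expected layout
        msg = m.group("msg")
        if any(p in msg.lower() for p in MEMORY_PATTERNS):
            events.append((m.group("ts"), msg))
    return events


if __name__ == "__main__" and len(sys.argv) > 1:
    # e.g. python memscan.py /var/log/messages
    with open(sys.argv[1], errors="replace") as f:
        for ts, msg in memory_events(f):
            print(ts, msg)
```

The first matching timestamp, compared with the last normal entries in the Apache log, gives a rough idea of when the box stopped doing useful work.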

Update based on comment

Sounds like something has a memory leak.
When you ran out of swap, userland blew up, but the kernel (being wired in RAM) could keep running and answering ping requests.

For a permanent resolution, you should monitor your swap utilization carefully, and when you notice it trending dangerously upward (>33% used is my threshold), hunt down the process using the most swap: that's probably your culprit.
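The 33% rule of thumb above is easy to check from a cron job. A minimal sketch, assuming a Linux host where /proc/meminfo reports `SwapTotal` and `SwapFree` in kB; the 33% default simply mirrors the threshold in the answer, and what to do when it trips (mail, log, page someone) is left open.

```python
from typing import Dict

SWAP_ALERT_PERCENT = 33.0  # threshold from the answer; tune to taste


def parse_meminfo(text: str) -> Dict[str, int]:
    """Parse /proc/meminfo-style 'Key:   value kB' lines into {key: value}."""
    fields = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[key.strip()] = int(parts[0])
    return fields


def swap_used_percent(meminfo_text: str) -> float:
    """Percentage of swap in use; 0.0 if the box has no swap configured."""
    info = parse_meminfo(meminfo_text)
    total = info.get("SwapTotal", 0)
    if total == 0:
        return 0.0
    return 100.0 * (total - info.get("SwapFree", 0)) / total


if __name__ == "__main__":
    try:
        with open("/proc/meminfo") as f:
            used = swap_used_percent(f.read())
        status = "WARNING" if used > SWAP_ALERT_PERCENT else "ok"
        print(f"swap used: {used:.1f}% [{status}]")
    except OSError:
        print("could not read /proc/meminfo (not Linux?)")
```

Logging the percentage on every run, rather than only alerting, is what lets you see the upward trend before it reaches 100%.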

Thanks for the reply. In that log I found "Mar 8 15:40:20 ns354729 kernel: Free swap = 0kB". Before the server went down, I could see (in my hosting company's control panel) that CPU, RAM and swap usage were totally out of control, with 5% of RAM free and only 1-2% of swap free.
–
dynamic Mar 8 '11 at 15:20

But now the question is: how can I find out what caused all this?
–
dynamic Mar 8 '11 at 15:23

@yes123 - See the update to the answer :-) You could also have an undersized box (too little RAM/too little swap for what you're trying to do), but a memory leak is what I'd investigate first.
–
voretaq7♦ Mar 8 '11 at 15:25

@voretaq: I added some information. As you can see, my swap usage grew by 70% in a few hours.
–
dynamic Mar 8 '11 at 15:34

@yes123 That's definitely a memory leak (or a bunch of processes being kicked off and chewing through your swap). To find the cause, you have to watch it happening. Also note that "top memory users" can be a somewhat misleading measurement. You want the "top swappers": if you run top, they're the processes with the biggest SWAP column, so reorder top's output to sort by it.
–
voretaq7♦ Mar 8 '11 at 15:41
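Besides sorting top by its SWAP column, as suggested in the comment above, on Linux you can rank processes by the `VmSwap` line in /proc/<pid>/status. A sketch under one key assumption: the kernel is new enough to report `VmSwap` (it was added around Linux 2.6.34, so an older kernel may not have it, in which case nothing is listed). Processes whose status file can't be read, or that exit mid-scan, are skipped.

```python
import os
from typing import List, Optional, Tuple


def vmswap_kb(status_text: str) -> Optional[int]:
    """Return the VmSwap value (kB) from /proc/<pid>/status text, or None."""
    for line in status_text.splitlines():
        if line.startswith("VmSwap:"):
            return int(line.split()[1])
    return None  # kernel threads and older kernels have no VmSwap line


def process_name(status_text: str) -> str:
    """Return the Name: field from /proc/<pid>/status text."""
    for line in status_text.splitlines():
        if line.startswith("Name:"):
            return line.partition(":")[2].strip()
    return "?"


def top_swappers(proc_dir: str = "/proc", limit: int = 10) -> List[Tuple[int, int, str]]:
    """Return up to `limit` (swap_kb, pid, name) tuples, largest swap first."""
    results = []
    for entry in os.listdir(proc_dir):
        if not entry.isdigit():
            continue  # skip /proc/self, /proc/meminfo, etc.
        try:
            with open(os.path.join(proc_dir, entry, "status"), errors="replace") as f:
                text = f.read()
        except OSError:
            continue  # process exited or is unreadable
        swap = vmswap_kb(text)
        if swap:  # ignore processes with no swap (or no VmSwap line)
            results.append((swap, int(entry), process_name(text)))
    results.sort(reverse=True)
    return results[:limit]


if __name__ == "__main__" and os.path.isdir("/proc"):
    for swap, pid, name in top_swappers():
        print(f"{swap:>10} kB  {pid:>7}  {name}")
```

Running it every few minutes while swap is climbing shows whether one process keeps growing (a leak) or many new ones keep appearing.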