The project is a game, so a typical user would visit 100+ pages.
When the server is busiest, it gets 35-40k requests/hour.

For some mysterious reason, after a number of hours the whole thing
starts moving slower. Typically the server load goes up to 5-8, and I
know that I have to either start killing dispatch.fcgi processes or
simply restart the whole thing.
It is definitely not that the server cannot deal with the number of
requests. It appears that some of the dispatch.fcgi processes simply
bring the server to a semi-halt. Killing the culprit makes the load go
under 1% and the game itself several times faster. The problem is that
I never know which one is causing the problems.
I have attempted to find and fix memory leaks. I have removed rmagick
from file_column, since it was said that rmagick was causing leaks.
I have removed the unnecessary services and I am keeping the lighttpd
configuration to a minimum, yet I pretty much have to restart the
server daily.
Are there any special tricks needed to make the dispatchers behave?
And maybe to use less RAM?
Any suggestions are welcome.

I would go through all your code and make sure there is no
possibility of an infinite loop. Every time I’ve had an app go fine
for a while and then suddenly start crawling, it was because of some
infinite loop that I hadn’t noticed. In a game I’m sure there are lots
of possibilities for things like this to happen.

my apps on either 1.4.8 or 1.4.9. 1.4.9 has been working great for me
on Debian specifically. And I got something similar with 1.4.10, where
I got zombied fcgi’s. So try downgrading to 1.4.9 or 1.4.8 and see if
that solves your problem. And are you running your fcgi’s with unix
sockets or over IP:PORTNUM? You might want to run the fcgi’s each on
their own consecutive port numbers as standalone spawn-fcgi’s and let
lighty just load balance between them. This way you can reap them
easily with script/process/reaper. Or you can go through the output of
ps awxx | grep dispatch.fcgi, see which ones are zombied, and kill and
respawn them. You could do this in a script.
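
The "go through ps and pick out the bad ones" idea could be scripted
roughly like this. It is only a sketch under assumptions: it reads the
output of ps -eo pid,stat,rss,time,args (Linux procps), and the names
find_suspects and ps_snapshot as well as the 150000 KB RSS threshold
are made up for illustration. A process that is hung but neither
defunct nor large would not be caught. It only reports candidates;
killing and respawning is left to you.

```python
import subprocess


def ps_snapshot():
    """Return the text of one `ps` listing (assumes Linux procps)."""
    return subprocess.run(
        ["ps", "-eo", "pid,stat,rss,time,args"],
        capture_output=True, text=True, check=True,
    ).stdout


def find_suspects(ps_output, name="dispatch.fcgi", max_rss_kb=150_000):
    """Return (pid, reason) pairs for dispatchers that look unhealthy."""
    suspects = []
    for line in ps_output.splitlines()[1:]:  # skip the header row
        fields = line.split(None, 4)  # pid, stat, rss, time, args
        if len(fields) < 5 or name not in fields[4]:
            continue
        pid, stat, rss_kb = int(fields[0]), fields[1], int(fields[2])
        if stat.startswith("Z"):  # defunct process
            suspects.append((pid, "zombie"))
        elif rss_kb > max_rss_kb:  # hypothetical memory threshold
            suspects.append((pid, "rss %d KB" % rss_kb))
    return suspects
```

Calling find_suspects(ps_snapshot()) from cron every few minutes and
logging the result would at least show which dispatcher misbehaves
before you kill anything.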

I can’t really blame 1.4.10 for the troubles. I upgraded to 1.4.10
only several days ago, to get a bug fixed.
I have a basic lighttpd.conf which uses unix sockets.
I will look for some documentation/blogs regarding spawning fcgi’s on
different ports.
If you have a configuration file that I could look at, it would be
great.

The project is a game, so a typical user would visit 100+ pages.
When the server is busiest, it gets 35-40k requests/hour.

You’re using caching, right? Judging from your process run times
Rails doesn’t see most of those requests. Your processes should
accumulate several minutes of CPU time if you’re serving nearly a
million requests per day.

For some mysterious reason, after a number of hours the whole thing
starts moving slower. Typically the server load goes up to 5-8, and
I know that I have to either start killing dispatch.fcgi processes
or simply restart the whole thing.

From your process sizes you’re probably spending most of your time
swapping.

Judging from your process times I doubt you need seven fastcgi
processes. It looks like you sent this mail nine hours (at 22:32)
after starting these processes and they’ve each accumulated less than
three minutes of CPU time. Try running just four.
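
The arithmetic behind this can be spelled out. The sketch below uses
only figures quoted in the thread, taken as upper bounds (every hour
as busy as the 40k peak, a full three minutes of CPU per process), so
the real numbers would differ. The function names are made up for
illustration.

```python
def requests_per_day(peak_per_hour):
    """Upper bound: assume every hour is as busy as the peak hour."""
    return peak_per_hour * 24


def cpu_ms_per_request(procs, cpu_minutes_each, hours, requests_per_hour):
    """Average CPU milliseconds per request across all dispatchers."""
    total_cpu_ms = procs * cpu_minutes_each * 60 * 1000
    return total_cpu_ms / (hours * requests_per_hour)


print(requests_per_day(40_000))                        # 960000
print(round(cpu_ms_per_request(7, 3, 9, 40_000), 1))   # 3.5
```

About 3.5 ms of CPU per request seems low for fully dynamic pages, so
either most requests never reach Rails or the traffic during those
hours was well below the peak rate.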

How big is your app when you start it? 130MB to 180MB virtual is
alarmingly large.

The performance has been linear during the day. I will know more in a
couple of hours or tomorrow, but it seems that although I haven’t
found any ‘omygod what a stupid endless loop’ in the code, changing
the configuration helped more than I could have anticipated.

Now, maybe someone more experienced could try to explain why the
standard lighttpd configuration was so bad in my case.

the min-procs/max-procs. But sometime in lighty 1.3.x the dynamic
spawning was removed from lighty. So min-procs doesn’t have any
effect at all and lighty will always spawn what you set max-procs to.
But for some unknown reason, the way this works can get a little weird
under heavier load with more fcgi’s. The load balancing between fcgi’s
doesn’t seem to work as well with the min/max-procs directives and
sockets. So like you I have had much better luck with explicitly
listing all fcgi listeners in lighty and using spawn-fcgi to load the
fcgi listeners standalone.

I have also had really good luck with using IP:PORTNUM listeners for
the fcgi’s instead of sockets. It seems to me that lighty has an
easier time load balancing between listeners when it doesn’t have to
think about it as much and the fcgi’s are each listed explicitly.
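
For anyone looking for a starting point, here is a rough sketch of
such a setup. It generates one spawn-fcgi command per listener (using
the -f, -a and -p options) and a matching fastcgi.server block that
lists each host/port pair explicitly. The dispatcher path, the
address, the starting port 8000, the count of four and the ".fcgi"
key are all assumptions for illustration, not anyone’s actual
configuration, and the function names are made up.

```python
def spawn_commands(dispatcher, host, first_port, count):
    """One spawn-fcgi command line per listener, on consecutive ports."""
    return [
        "spawn-fcgi -f %s -a %s -p %d" % (dispatcher, host, first_port + i)
        for i in range(count)
    ]


def lighttpd_block(host, first_port, count, ext=".fcgi"):
    """A fastcgi.server block that lists every listener explicitly."""
    entries = [
        '    ( "host" => "%s", "port" => %d )' % (host, first_port + i)
        for i in range(count)
    ]
    return 'fastcgi.server = ( "%s" =>\n  (\n%s\n  )\n)' % (
        ext, ",\n".join(entries))


# Hypothetical path and ports, for illustration only.
for cmd in spawn_commands("/var/www/game/public/dispatch.fcgi",
                          "127.0.0.1", 8000, 4):
    print(cmd)
print(lighttpd_block("127.0.0.1", 8000, 4))
```

With no bin-path or max-procs in the block, lighty only connects to
listeners that are already running, so each one can be killed and
respawned on its own port without restarting the web server.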

I’m glad it’s running for you. 200-250 MB of RAM for each fcgi seems
a bit excessive. My fcgi’s are usually between 25 and 80MB of RAM
each. But you are running a game, so maybe they are each doing more
work and holding more in memory than I am.

You’re using caching, right? Judging from your process run times
Rails doesn’t see most of those requests. Your processes should
accumulate several minutes of CPU time if you’re serving nearly a
million requests per day.

The content is dynamic with no static pages. The ‘ps’ was taken about
1 hour after killing several processes.
This is how it looks after 15 hours:
(note that in the past 4-5 hours, the server was more or less idle)

Judging from your process times I doubt you need seven fastcgi
processes. It looks like you sent this mail nine hours (at 22:32)
after starting these processes and they’ve each accumulated less than
three minutes of CPU time. Try running just four.

How big is your app when you start it? 130MB to 180MB virtual is
alarmingly large.

Lighttpd was started 2-3 hours before I took the ps. It was not a rush
hour.
Right after starting lighttpd, an idle dispatch.fcgi takes 53-63MB of
RAM.
The one or two that are active will quickly jump to 131MB.
After that, in a matter of hours, all of them will jump to 200-220MB.
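
To put those sizes in perspective, here is a quick tally of the total
footprint. It uses the per-process figures above, the seven
dispatchers mentioned earlier in the thread and the suggested four.
The server’s physical RAM is never stated, so whether a given total
means swapping is left open; total_mb is just an illustrative helper.

```python
def total_mb(dispatchers, mb_each):
    """Combined memory of all dispatchers, in MB."""
    return dispatchers * mb_each


stages = [("idle", 63), ("active", 131), ("after some hours", 220)]
for label, mb in stages:
    print("%-17s 7 procs: %4d MB   4 procs: %4d MB"
          % (label, total_mb(7, mb), total_mb(4, mb)))
```

At the upper figures, seven dispatchers add up to 1540MB against
880MB for four.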

In the meantime I’ve lowered the number of dispatchers to 5 (though it
seems the dispatchers are simply attempting to steal as much RAM as
possible) and compiled ruby on the server.
I am also going to try to spawn fcgi’s as separate processes and see
how it goes.
Bogdan

It’s been 10 hours since I started lighttpd with the new
configuration.
The top dispatch.fcgi now uses 100MB. The other 4 are at 88-90MB.
Plus, there is no lag at all and the fifth dispatcher barely gets used.
Also the system load is under 1%.
There is something magical in this configuration.
