* Olivier Thereaux <ot@w3.org> [2007-08-01 18:35+0900]
>Hi all,
>
>Just FYI, we have switched mod_perl2 off on all validator servers again.
>Running under mod_perl seemed like a good idea, but there were some
>issues we're having trouble explaining, like how the load remained
>really, really high on the machines (apache2 processes using up a lot
>of CPU) even when very few requests were being received.
>
>Switching off mod_perl means more process forking and a slightly
>slower validation process, but ultimately less load and shorter
>waits, it seems. Go figure.
mod_perl may have been a red herring; I was blaming it for the
massive apache2 process sizes (200 MB+ on jessica) because I
thought that was something you guys had changed recently (though
I don't even know if that's true) and I had never seen apache
processes that big.
Also, 'check' is so expensive that I didn't really expect
mod_perl to be a huge win; by comparison, the extra work of firing
up a perl interpreter for each request must be relatively cheap.
But even after we pruned that and other stuff last night, the
apache2 processes are still 120 MB each and the 'check' processes
80-90 MB, so maybe we would be OK with mod_perl after all.
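To compare numbers like these across the servers, here is a rough sketch (assuming Linux and a POSIX shell with /proc mounted) that counts the processes with a given command name and reports their average and largest resident size. The function name rss_summary is made up for illustration. The message doesn't say whether the sizes above are resident or virtual (top's RES or VIRT), so the output may not line up exactly with them.

```shell
#!/bin/sh
# Summarise resident memory for all processes with a given command name.
# Linux only: reads /proc/PID/comm and the VmRSS line of /proc/PID/status.
# Usage (hypothetical helper): rss_summary apache2; rss_summary check
rss_summary() {
    name=$1 n=0 total=0 max=0
    for dir in /proc/[0-9]*; do
        # Skip processes that exited mid-scan or have a different name.
        comm=$(cat "$dir/comm" 2>/dev/null) || continue
        [ "$comm" = "$name" ] || continue
        # VmRSS is reported in kB; kernel threads have no VmRSS line.
        rss=$(awk '$1 == "VmRSS:" { print $2 }' "$dir/status" 2>/dev/null) || continue
        [ -n "$rss" ] || continue
        n=$((n + 1))
        total=$((total + rss))
        if [ "$rss" -gt "$max" ]; then max=$rss; fi
    done
    if [ "$n" -eq 0 ]; then
        echo "$name: no processes"
    else
        # Integer division: sizes are rounded down to whole MB (1 MB = 1024 kB here).
        echo "$name: $n processes, avg $((total / n / 1024)) MB, max $((max / 1024)) MB"
    fi
}
```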
>I still want to try some of the performance tweaks you suggested,
>Ville [1] (avoiding copying content, undef-ing after use, etc).
>Gerald also was suggesting looking at e.g. BSD::Resource or any
>ulimit-like system, to avoid having some "check" processes spin away
>and hog CPU. Worth a shot.
I think resource limits would help a lot. After our changes last
night, all the validator servers seem pretty happy:
25 requests currently being processed, 103 idle workers -- jessica
18 requests currently being processed, 12 idle workers -- fugu
16 requests currently being processed, 10 idle workers -- lovejoy
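For the resource-limit idea, here is a minimal sketch assuming a Linux shell such as bash or dash, both of which accept the non-POSIX ulimit -t, -v, -S and -H options. The limits are set in a subshell and the command is exec'd there, so only that command inherits them. The soft CPU limit triggers SIGXCPU, and a hard limit a few seconds later triggers SIGKILL as a backstop. The helper name, wrapper path and numbers are hypothetical. Apache's RLimitCPU and RLimitMEM directives, or setrlimit through BSD::Resource inside check itself, would impose the same kind of kernel limit.

```shell
#!/bin/sh
# Run a command with a CPU-time cap and an address-space cap.
# Usage (hypothetical helper): run_limited CPU_SECONDS MEM_KB command [args...]
run_limited() {
    cpu_seconds=$1 mem_kb=$2
    shift 2
    (
        # Limits set in this subshell are inherited by the exec'd command
        # but leave the calling shell untouched.
        # Soft limit first (it must never exceed the hard limit): SIGXCPU.
        ulimit -S -t "$cpu_seconds" || exit 1
        # Hard limit a little higher: the kernel sends SIGKILL here.
        ulimit -H -t $((cpu_seconds + 5)) || exit 1
        # Virtual memory cap in kB; allocations beyond it fail.
        ulimit -v "$mem_kb" || exit 1
        exec "$@"
    )
}

# Hypothetical use in a CGI wrapper; path and limits are illustrative only:
# run_limited 120 262144 /usr/local/validator/cgi-bin/check.real
```

Since RLIMIT_CPU counts CPU seconds rather than wall-clock time, a check process that is just waiting on a slow remote server would be left alone; only processes spinning on the CPU, like the ones below, would hit the limit.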
The biggest problem now seems to be a few URIs that consistently
eat up many minutes of CPU time; currently on jessica there are
several processes that have consumed 20+ CPU minutes each:
      PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM     TIME+ COMMAND
    31664 www-data  25   0 88632  36m  10m R   29  0.9  22:14.48 check
    32073 www-data  25   0 97820  36m  10m R   29  0.9  21:09.62 check
    31862 www-data  25   0  131m  72m  10m R   27  1.8  21:26.18 check
    31732 www-data  25   0 97824  36m  10m R   24  0.9  22:10.81 check
It might be useful if 'check' told us what it was doing by adding
lines like this throughout the code:
    $0 = "check: fetching $uri";
    ...
    $0 = "check: sgml::parsing $uri";
    ...
    $0 = "check: returning results for $uri";
so we can see what each process is up to in the output of 'ps'.
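If check set $0 as above, Perl on Linux overwrites the process's argument area, so the status string shows up in /proc/PID/cmdline and therefore in ps -eo pid,time,args. Here is a sketch that lists just the processes currently reporting such a status. It reads /proc directly so it doesn't depend on ps options, and check_status is a hypothetical name.

```shell
#!/bin/sh
# List processes whose command line starts with "check:", i.e. those that
# (hypothetically) report a status via $0. Linux only: reads /proc/PID/cmdline.
check_status() {
    for dir in /proc/[0-9]*; do
        # cmdline is NUL-separated; show it with spaces instead.
        # Processes that exit mid-scan are skipped.
        cmd=$(tr '\000' ' ' 2>/dev/null < "$dir/cmdline") || continue
        case $cmd in
            check:*) printf '%s\t%s\n' "${dir#/proc/}" "$cmd" ;;
        esac
    done
}
```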
In the meantime you can see which URIs are responsible for these
long-running check processes with:
    tr '\000' '\012' < /proc/31664/environ | grep '^QUERY_STRING='
(I would paste a few samples here, but I think that would violate
our privacy policy.)
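Combining the two ideas, here is a sketch that walks /proc and picks the processes whose command name is check (or starts with "check:", in case a $0 status has replaced it). For each one that has used at least a given amount of CPU, it prints the PID, CPU seconds and QUERY_STRING. It assumes Linux and must run as root or as the CGI user (www-data in the top output above) to read other processes' environ. The function name and the 600-second default are assumptions. As noted above, its output contains users' URIs, so it shouldn't be pasted publicly.

```shell
#!/bin/sh
# For every "check" process that has used at least MIN_CPU seconds of CPU,
# print its PID, CPU seconds and QUERY_STRING. Linux only.
# Usage (hypothetical helper): long_running_checks [MIN_CPU]   (default 600)
long_running_checks() {
    min_cpu=${1:-600}
    hz=$(getconf CLK_TCK)   # clock ticks per second used in /proc/PID/stat
    for dir in /proc/[0-9]*; do
        read -r stat 2>/dev/null < "$dir/stat" || continue
        # The command name sits between the first "(" and the last ")".
        comm=${stat#*"("}
        comm=${comm%")"*}
        case $comm in
            check|check:*) ;;   # also accept a name replaced by a "check:" status
            *) continue ;;
        esac
        # Fields after the name: $1 is the state (field 3 of stat), so
        # utime and stime (fields 14 and 15, in clock ticks) are $12 and $13.
        set -- ${stat##*") "}
        cpu=$(( (${12} + ${13}) / hz ))
        [ "$cpu" -ge "$min_cpu" ] || continue
        qs=$(tr '\000' '\012' 2>/dev/null < "$dir/environ" | grep '^QUERY_STRING=') ||
            qs='(QUERY_STRING not readable)'
        printf '%s\t%ss\t%s\n' "${dir#/proc/}" "$cpu" "$qs"
    done
}
```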
>[1] http://lists.w3.org/Archives/Public/public-qa-dev/2007Jul/0022.html
--
Gerald Oskoboiny http://www.w3.org/People/Gerald/
World Wide Web Consortium (W3C) http://www.w3.org/
tel:+1-604-906-1232 mailto:gerald@w3.org