httpd-dev mailing list archives

Ruediger Pluem wrote:
> What information do your cookies contain? Are these session cookies that
> are individual to each client? In this case the usage of mod_disk_cache
> with Vary Cookies set would be bad. As these responses would be individual
> you couldn't reuse the results anyway for other clients, so it would be
> the best to leave caching to the individual client caches (e.g. browser caches).
> If your cookies are like BACKGROUND=blue for some users and BACKGROUND=red
> for other users you should think of incorporating these differences into
> the URLs instead of into varying responses.
I use two cookies currently - one for user logins and one for options.
They are independent - people browsing the site may have either, or
both, or neither set.
I need to cache all dynamically generated content so that the server can
cope with slashdottings and links from other popular sites where lots of
people all click on the same link at the same time ("click storms").
Such links could go to any page on the site, and so I really need to
cache almost everything from mod_perl - with the exception of areas of
the site which are obviously user-specific, such as edit forms, users'
personal pages and so on. Those are no-cache.
I am very careful about setting expiration times, since with a dynamic
site you don't want too many stale pages. Many of the indexes (e.g. the
list of latest journal updates) have an expiration of only 1-3 minutes,
while other journal pages have expirations of 12 hours or more.
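For concreteness, the short expirations translate into response headers
along these lines. This is only a sketch: the exact header set my
mod_perl handlers emit isn't shown here, and GNU date is assumed for the
relative-date arithmetic.

```shell
# Emit caching headers for a fast-expiring index page (3-minute TTL).
# The 180-second TTL matches the 1-3 minute expirations mentioned above;
# the choice of max-age plus Expires is illustrative.
TTL=180
echo "Cache-Control: max-age=$TTL"
echo "Expires: $(date -u -d "+$TTL seconds" '+%a, %d %b %Y %H:%M:%S GMT')"
```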
I keep a 'version' field as part of the database records for most
content on the site, which is incremented whenever an object is edited.
Then when someone edits a journal, I include a special 'v=xxx' parameter
in subsequent links to pages on that journal, to differentiate it from
earlier versions. So the links from the (fast expiring) index pages such
as forums or journals index will quickly have the new link with the new
version. This allows me to have extensively cached content while still
having people see new edits quickly, so the cache has fairly high turnover.
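The version-busting links can be sketched as follows. Only the 'v='
parameter comes from the scheme described above; the path shape and the
id value are made up for illustration.

```shell
# Build a cache-busting link by appending the object's current 'version'
# column, which is incremented in the database on every edit.
# (doc_id and the /journal/... path shape are hypothetical.)
doc_id=1234
version=7
link="/journal/$doc_id/page?v=$version"
echo "$link"
```

Because the version changes the full request URL, the cache treats the
edited page as a brand-new entry, and the stale copy simply expires.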
The mod_disk_cache works very well; the only issue is keeping the cache
size under control without iowait becoming noticeable as a result. I
have found that keeping the limit down to 100M rather than 1000M,
setting CacheDirLevels to 2 rather than 3, clearing out the orphaned
.header files, and running htcacheclean and my header-pruning script
every 10 minutes makes the server very comfortable - the iowait drops
to unnoticeable levels.
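The orphan-pruning logic can be sketched as below, demonstrated on a
throwaway directory rather than the live cache. The on-disk layout
assumed here (<hash>.header / <hash>.data pairs, with a
<hash>.header.vary/ directory for varied responses) matches Apache 2.x
mod_disk_cache, but treat it as an assumption.

```shell
# Prune orphaned .header files - ones with neither a matching .data
# file nor a .vary subdirectory next to them. Demo layout:
#   <hash>.header + <hash>.data         -- complete cached entry
#   <hash>.header + <hash>.header.vary/ -- varied-response stub
CACHE=$(mktemp -d)
touch "$CACHE/aaa.header" "$CACHE/aaa.data"   # complete entry: keep
touch "$CACHE/bbb.header"                     # orphan: delete
mkdir "$CACHE/ccc.header.vary"
touch "$CACHE/ccc.header"                     # vary stub: keep

find "$CACHE" -type f -name '*.header' | while read -r h; do
    if [ ! -e "${h%.header}.data" ] && [ ! -d "$h.vary" ]; then
        rm -f "$h"
        echo "pruned: ${h##*/}"
    fi
done
```

In production this pass runs from cron every 10 minutes against the real
cache root, right after `htcacheclean -t -p/var/cache/www -l100M`.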
All the app level code here was developed by me. This is a community
website for bicycle touring journals - www.crazyguyonabike.com. It
currently sees somewhere north of 100,000 page requests per day,
according to analog (and that's not including googlebot, which is on
there constantly). I am very interested in configuring the site to be
able to run efficiently on one reasonably well-spec'd server. Caching
dynamic content is a major part of being able to scale well to cope with
click storms.
> Regarding the performance you should take a look at the following:
>
> 1. Use a separate filesystem for the cache.
> 2. Ensure that it is mounted with noatime option.
> 3. Check if you are using the right type of filesystem for this job. If the
> size of the individual cache files is rather small reiserfs can be much
> faster than ext3 if I remember correctly.
I currently use ext2 with noatime for the main filesystem (including
cache). I went to ext2 from ext3 because ext3 has extra overhead related
to keeping the journal (I believe that is the big difference between the
two these days). Though I do not have numbers, disk performance does
seem to have improved since going back to ext2. I'm not sure whether
you can enable dir_index on ext2 without turning it into ext3 in the
process, but in any case I don't have dir_index enabled currently.
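For anyone setting up the dedicated cache filesystem suggested above,
the noatime mount looks like this. Device, mount point, and filesystem
type are placeholders - on this box the cache currently shares the main
ext2 filesystem.

```shell
# /etc/fstab entry for a dedicated cache partition mounted noatime
# (device, mount point, and fs type are placeholders):
#
#   /dev/sdb1   /var/cache/www   ext2   noatime   0   2
#
# An existing mount can be switched without a reboot:
#   mount -o remount,noatime /var/cache/www
```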
I was aware of the potential for using other filesystems for the cache,
and had thought about reiserfs as a possibility. However after I wrote
to the httpd users list a few weeks back asking about this very issue, I
got zero responses. I then went to the squid group and asked there too,
and similarly got zero useful responses. I agree that reiserfs might
handle many small files better, but I am wary of using that since the
trial of Hans Reiser - it kind of calls the future of his tool into
question, unfortunately.
>> 2. Why does htcacheclean not keep the cache at the stated size limit? If
>> you say -l100M and then do a du and it says 200M, then that is
>> counter-intuitive, and actually wrong in real terms. It gets worse with
>> the larger caches - when I had 3 levels and cookie Vary headers on, the
>> limit for htcacheclean was 1000M, but the cache would grow to 3GB and up.
>
> Again, this is an issue with the documentation. In fact htcacheclean does
> not limit the size of the cache at all. It can grow indefinitely.
> It only ensures that the size of the cache is being reduced back at least
> to the given limit after it ran. The size of the cache is defined as the
> sum of all filesizes in the cache. It does not consider the disk usage of
> these files which can be larger and it also doesn't take the sizes of the
> directories into account. I am not sure if a du like measurement of the
> cache size would be implementable in a platform independent way, but I
> may be wrong here.
Ok, that's fine. You're right, it sounds like a documentation issue.
> This seems to be a bug. Can you please try if the following patch fixes this?
I applied the patch and rebuilt httpd_proxy successfully. The new
htcacheclean runs ok, but still seems to leave behind the orphan .header
files. At least, I tried running htcacheclean in single run mode, thus:
htcacheclean -t -p/var/cache/www -l100M
Then I run my prune_cache_headers Perl script, and it still finds a
bunch of orphaned .header files to delete. So the patch doesn't appear
to have fixed the issue. I did confirm that the patch was applied.
>> 4. Will I be causing any potential problems for Apache by my deleting
>> the leftover .header files myself (ones which have no corresponding
>> .vary subdir)? Does that cause apache or htcacheclean to have potential
>> issues if you do this while they are running? If they are junk then I
>> can't see it being a problem, but it's unclear currently if they are
>> actually used or not.
>
> IMHO not. The patch above does the same.
Great, thanks - good to know.
Thanks for your help!
Neil