
How often does Apache access each file under load?

I'm in the process of configuring a WordPress site for a multi-machine (i.e., clustered) deployment. In the interest of replicating files across all of the LAMP servers in the cluster, I'm considering using an NFS share to hold the web root and having all machines mount it. Assuming that's a good idea (and please tell me if you think it's not), it stands to reason that scalability would probably be bottlenecked by the ability of the NFS share to handle requests from many machines. Imagine 100 machines slamming one NFS share.

With that in mind and the fact that I wanted to know which files to replicate, I went hunting around and found a remarkable command that will tell you when file system changes of various types happen in a particular directory:

I am not certain, but I believe that particular command will tell me not just when files are moved in/into/out of this directory, but also when any file is accessed. This brings me to my main question.

When Apache is serving a PHP website, how often does it access a particular file? I ask because I ran that command on my web root and then loaded the site I'm working on. The *only* file that showed any access was .htaccess. I find this completely baffling. How would Apache know whether a PHP source file, or a js, jpeg, or css file, had changed without accessing it? Why on earth does Apache access .htaccess -- and .htaccess only -- for a given request, but never the requested file itself? Here's the output from one page request:

Just as a random suggestion: Apache records the last-modified time when it does access (and cache) a file - and only feels it needs to read the file if the last-modified time changes. (In other words, inotifywait doesn't count merely querying a file's metadata (such as last-modified time, or permissions, or filename) as actually accessing the file.)

I have run the inotifywait command and then restarted Apache, and the restart did not result in Apache accessing any files in the web root. I'm puzzled that checking a file's modification date doesn't constitute "access." I'm more puzzled that an Apache restart didn't result in the web root being accessed at all.

I tried rebooting my machine entirely, running the inotifywait command, and accessing the site. This also did not result in access to anything but the .htaccess file. Apache is configured to start at boot time, so I was thinking perhaps Apache cached these files at boot?

So I started looking for an Apache cache. Googled around, did some grep searches and located the htcacheclean command. The grep searches revealed this:

Which appears to have reported that it accomplished nothing. I've tried it again while running the inotifywait command to watch this cache directory and it really does seem to accomplish precisely nothing. And yet Apache does seem to cache these files -- I just don't know where.

I did a "touch index.php" and accessed my site -- this finally resulted in file access to my web root, but there was absolutely no file system activity reported in the cache directory. I'm thinking the cache might be somewhere else. Is there some apache command to find out?

I have not been able to determine precisely when and how often Apache accesses the files in my web root, or where Apache is caching things. I'm also wondering what kind of file-system load all these modification-time checks generate. I expect a timestamp check is much cheaper than reading the contents of the file.

And I'm also totally confused that I see several file accesses on .htaccess for each page load. Why Apache cannot cache this file is a mystery to me.

Apache always does a recursive check down the directory tree toward the requested document. Because .htaccess files are per-directory, can change how a request is handled, and apply recursively, they can't really be cached. This is Apache's downfall here. Other servers, like nginx, bypass this feature entirely and gain a performance boost.
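To make that concrete: with AllowOverride enabled, a single request for a file a few directories deep makes Apache probe for a .htaccess at every level of the path under the document root -- roughly this sequence of lookups (paths assumed):

```
open("/var/www/html/.htaccess")                     -> found: read and apply
open("/var/www/html/wp-content/.htaccess")          -> ENOENT (still a lookup)
open("/var/www/html/wp-content/uploads/.htaccess")  -> ENOENT (still a lookup)
```

Every one of those opens happens on every request, which is why .htaccess shows up in the event log on each page load.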

In a CGI setup, Apache itself won't execute the PHP file -- a separate PHP process does. I'm also not sure a read counts as an "access" the way a write would. I.e., is cat-ing a file the same as vim-ing it? In the first case the file cannot change, since it's only opened for reading.

I'm not sure of your setup, but have you looked into Amazon -- using S3 along with many micro or small EC2 instances to serve your content? You can use RDS to store the database, run many EC2 instances to house the WordPress code, and then use S3 to store user files like themes, uploads, and plugins. If you use Beanstalk you can disperse an upgrade across your entire system in a snap. Plus you can grow and shrink as needed, and even expand across geographic areas, giving you failover capability (this is what Netflix is missing).

I believe moving the mod_rewrite rules from .htaccess into the Apache conf file (loaded at Apache startup) will improve performance and eliminate the need to check all the .htaccess files. Unfortunately, WordPress has features that make it write this file -- e.g., when you change your "permalink" style.
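For reference, a sketch of what that looks like in the server/vhost config (path assumed; the rules are WordPress's stock permalink rules):

```apache
# Sketch: WordPress's standard rewrite rules moved into the server
# config; AllowOverride None stops the per-request .htaccess lookups.
<Directory /var/www/html>
    AllowOverride None
    RewriteEngine On
    RewriteBase /
    RewriteRule ^index\.php$ - [L]
    RewriteCond %{REQUEST_FILENAME} !-f
    RewriteCond %{REQUEST_FILENAME} !-d
    RewriteRule . /index.php [L]
</Directory>
```

The caveat is the one noted above: WordPress expects to rewrite .htaccess itself, so with AllowOverride None you'd have to copy any rule changes into the config by hand.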

Using this script:

PHP Code:

<?php print php_sapi_name();?>

I get "apache2handler" which suggests this particular machine is *not* running in CGI mode. I would love to know where the caching happens so I can check access patterns over there.

Doing a cat operation on a particular file via command line does not appear to trigger any access in this directory either. I'm beginning to wonder WTF "access" really means. I am hoping to find some kind of file access monitoring that actually gives me some idea of the workout my disk is getting.

I am indeed going to use EC2 and RDS and an ELB. I looked into using the W3 Total Cache plugin for WordPress, but from what I can tell it doesn't really cover everything -- namely certain core functionality of WordPress that deletes or modifies local PHP files. Unless I'm missing something, my EC2 instances shouldn't be trying to require_once PHP files from S3.

My point with AWS and Beanstalk was not to put the WP core files in S3. Rather, you keep the core files on each individual server and use S3 to house your user-uploaded content (images, videos, documents, etc.). When you install a plugin, all you have to do is tell Beanstalk you have a new version to deploy, and it will push it to all your servers for you.

Seems a lot easier than copying the file multiple times to different places.

One thing to keep in mind is that filesystem use isn't necessarily 1:1 with physical disk usage: because reading from RAM is so much faster than reading from disk, the operating system (depending on what it is) may use otherwise-idle RAM to cache chunks of the disk that it anticipates reading from frequently (this is why Linux, for example, reports such a low value for unused memory).

Almost all modern operating systems cache file data in memory managed directly by the kernel. This is a powerful feature, and for the most part operating systems get it right. For example, on Linux, let's look at the difference between the time it takes to read a file the first time and the second time:
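A sketch of that experiment, using a small scratch file: note that a file you've just written is already in the page cache, so to see a genuinely cold first read you have to evict it first, which requires root.

```shell
# Sketch: time the first and second read of a small file. Run the
# drop_caches line as root to make the first read genuinely cold;
# without it, a freshly written file is already cached and both
# reads are fast.
f=$(mktemp)
head -c 65536 /dev/urandom > "$f"      # 64 KB scratch file
[ "$(id -u)" -eq 0 ] && sync && echo 3 > /proc/sys/vm/drop_caches
time cat "$f" > /dev/null   # first read (cold if the cache was dropped)
time cat "$f" > /dev/null   # second read: straight from the page cache
rm -f "$f"
```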

Even for this small file, there is a huge difference in the amount of time it takes to read the file. This is because the kernel has cached the file contents in memory.

By ensuring there is "spare" memory on your system, you can ensure that more and more file-contents will be stored in this cache. This can be a very efficient means of in-memory caching, and involves no extra configuration of Apache at all.

Additionally, because the operating system knows when files are deleted or modified, it can automatically remove file contents from the cache when necessary. This is a big advantage over Apache's in-memory caching which has no way of knowing when a file has changed.