Re: Kernel VS application file caching

On Jan 22, 2010, at 1:00 AM, Joerg Sonnenberger wrote:
> On Thu, Jan 21, 2010 at 08:48:39PM -0500, Steven Bellovin wrote:
>> Let me give a real-world example. Between a NetBSD laptop and a NetBSD
>> desktop, connected via gigE, I can run 'ttcp -s' at (if I recall
>> correctly) 700M bps. I can upload data to my office at ~2.5M bps; I
>> can download at about 13M bps. In other words, when I go in or out of
>> my house, the network is by far the limiting performance factor. For
>> anything but a floppy drive, it doesn't really matter how fast my disk
>> is or how much caching happens; I can't ship data faster than the network.
>
> This is sadly only true as long as you are doing sequential reads.
> As soon as you have a lot of seeks, disk read performance suffers
> greatly. Try to tar up the content of src using
> find | shuffle | tar -T -
> for example.

I ran those tests. If nothing else, they show the difficulty of doing
benchmarks...

Modulo the exact commands necessary, I tarred up /usr/src three different ways:
in random order per your suggestion, straightforwardly, and find | tar -T -,
to decouple the file system traversal from the tar. The shuffled order was
first, but it did run at about 5M bps, so I could still upload at that speed
and be limited by the network....

On the particular instance of /usr/src I have, there are about 140K files. The
file system block size, according to dumpfs, is 16KB; 130K of the files are
16KB or less, and hence will not require a seek to read (to a first
approximation, since cylinders and cylinder groups are now a myth). Another
5.4K are two fs blocks; 1.8K are three fs blocks. In other words, almost none
of the actual file-reading is affected by seek time. So where was the time
spent?
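The size breakdown above can be reproduced with something like the sketch
below. It is an illustration, not the commands actually used: the function
names are made up, the 16KB block size is the dumpfs figure quoted above, and
it ignores fragments and indirect blocks, so treat the counts as a first
approximation.

```python
import os
import stat
from collections import Counter

BLOCK_SIZE = 16 * 1024  # fs block size from the message; check dumpfs on your system


def blocks_needed(size, block_size=BLOCK_SIZE):
    """Full fs blocks needed for `size` bytes (ceiling division).

    Empty files are counted in the one-block bucket so that bucket 1
    matches "16KB or less" in the text.
    """
    return max(1, -(-size // block_size))


def block_histogram(root, block_size=BLOCK_SIZE):
    """Map blocks-per-file -> number of regular files under root."""
    hist = Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                st = os.lstat(os.path.join(dirpath, name))
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if stat.S_ISREG(st.st_mode):
                hist[blocks_needed(st.st_size, block_size)] += 1
    return hist
```

Calling block_histogram("/usr/src") and printing the first few buckets shows
how many files fit in one, two, or three blocks, which is the number that
decides whether seek time within a file matters at all.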
A lot of it, it turned out, was writing the tar output to disk. When I wrote to
/dev/null instead, both tar commands ran very quickly. Beyond that, I have a
3GB machine and /usr/src only takes about 1GB; after I read it once, more or
less everything is cached...
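A rough way to repeat the comparison with the output going to /dev/null is
sketched below. It assumes Python's tarfile module as a stand-in for tar(1),
and the helper names (list_files, time_tar, compare_orders) are invented for
this example. It does nothing about the buffer cache, so whichever ordering
runs second is likely to be served mostly from memory, which is exactly the
problem described above.

```python
import os
import random
import tarfile
import time


def list_files(root):
    """Return regular files under root in directory-walk order (like find)."""
    paths = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                paths.append(path)
    return paths


def time_tar(paths, sink=os.devnull):
    """Tar the given paths, in order, into sink; return (bytes, seconds)."""
    total = 0
    start = time.monotonic()
    with open(sink, "wb") as out, tarfile.open(fileobj=out, mode="w|") as tar:
        for path in paths:
            tar.add(path, recursive=False)
            total += os.path.getsize(path)
    return total, time.monotonic() - start


def compare_orders(root, seed=None):
    """Return Mbit/s for shuffled and walk-order runs over the same files."""
    files = list_files(root)
    shuffled = files[:]
    random.Random(seed).shuffle(shuffled)
    results = {}
    # The shuffled run goes first; the second run benefits from a warm cache.
    for label, order in (("shuffled", shuffled), ("walk order", files)):
        nbytes, secs = time_tar(order)
        results[label] = nbytes * 8 / max(secs, 1e-9) / 1e6
    return results
```

Passing a real file name as sink instead of the default shows the cost of
writing the archive to disk; the difference between the two runs is the
effect described in the paragraph above.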
The first lesson, of course, is that running accurate benchmarks is hard. If
anyone has a way to invalidate the entire buffer cache, it would certainly help.

The second lesson is that defining the terms for a benchmark is hard. Am I
measuring tar to /dev/null, tar to a pipe, tar to the network, or tar to a file?

The third is that in a real workload, it may just not matter. How much RAM do
the users have? How much file space are they sending? What are the file
distributions? I strongly suspect that caching a very few files will make a
tremendous difference -- but I don't *know* that.

Build first, then measure, then optimize. Build so that it's easy to add
optimization in likely places. And how do you know where those are? That's
where experience pays off.

--Steve Bellovin, http://www.cs.columbia.edu/~smb