> > Also this allows eg database systems to be given a slice of disk which they
> > are in complete control of, and can maybe manage better than the normal
> > buffering (known access patterns etc).
>
> That's a theory I don't subscribe to myself.
>
> Sure, there are old-fashioned databases that think they can do a better job
> of it than the kernel does. They are usually wrong, I suspect. They are using
> raw devices more for historical reasons than anything else, and they could
> just as well use a filesystem.

[yes, raw devices are a hack, still RDBMS ppl use it because:]

one not-so-obvious problem is that an RDBMS >has< to implement a
write-cache for itself. Thus if the block device were buffered too (in
the kernel), then we would have double buffering. [as it is buffered now]

The kernel write cache spontaneously writes data to the device, and this is
not good for an RDBMS: it has to be sure that the cached data first
touches the log, and only then the actual database. If the kernel provided
such functionality, then RDBMSs could use the kernel buffering
efficiently.
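To make the ordering requirement concrete, here is a minimal sketch of "log first, then database" done from user space. It assumes two ordinary files stand in for the log and the database device; the names logged_update(), write_all() and logged_update_demo(), and the /tmp paths, are made up for illustration and are not taken from any real RDBMS. The fsync() on the log is the barrier: the database write is not issued until the log record is reported stable.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write len bytes at offset off, or fail.
 * A real server would also retry on EINTR. */
static int write_all(int fd, const void *buf, size_t len, off_t off)
{
    const char *p = buf;

    while (len > 0) {
        ssize_t n = pwrite(fd, p, len, off);
        if (n <= 0)
            return -1;
        p += n;
        len -= (size_t)n;
        off += n;
    }
    return 0;
}

/* Hypothetical helper: the page reaches the log, and the log is forced
 * to stable storage, before the database file is touched at all. */
int logged_update(int log_fd, off_t log_off, int db_fd, off_t db_off,
                  const void *page, size_t len)
{
    assert(page != NULL && len > 0);

    if (write_all(log_fd, page, len, log_off) != 0)
        return -1;
    if (fsync(log_fd) != 0)        /* barrier: log is stable from here on */
        return -1;
    if (write_all(db_fd, page, len, db_off) != 0)
        return -1;
    return fsync(db_fd);
}

/* Usage example on two temporary files (assumed paths under /tmp). */
int logged_update_demo(void)
{
    char log_path[] = "/tmp/demo_log_XXXXXX";
    char db_path[]  = "/tmp/demo_db_XXXXXX";
    const char page[] = "new page contents";
    char back[sizeof page] = {0};
    int log_fd = mkstemp(log_path);
    int db_fd  = mkstemp(db_path);
    int rc = (log_fd >= 0 && db_fd >= 0) ? 0 : -1;

    if (rc == 0)
        rc = logged_update(log_fd, 0, db_fd, 0, page, sizeof page);
    if (rc == 0 && pread(db_fd, back, sizeof back, 0) != (ssize_t)sizeof back)
        rc = -1;
    if (rc == 0 && memcmp(page, back, sizeof page) != 0)
        rc = -1;

    if (log_fd >= 0) { close(log_fd); unlink(log_path); }
    if (db_fd >= 0)  { close(db_fd);  unlink(db_path); }
    return rc;
}
```

Note that this only works because the server keeps the page in its own cache until the log is synced. With kernel buffering on both files it costs two fsync() calls per update plus a second copy of every page, which is the double buffering described above.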

So RDBMSs like Oracle just do the following: SYSV shared memory as a
write-cache, raw devices as database. Files >can< be the database too, but
in that case we have double buffering (which isn't too bad due to
brilliant RDBMS designs, but which in turn results in slightly better
performance data for raw devices). Oracle doesn't mmap() files directly,
because it theoretically CAN'T: it has to guarantee transaction protection.

If we could guarantee for a database that a page won't be written out
unless the database server wants it to be ... then using mmap() for a
database would be a much cleaner design [and probably would be faster too].

[i don't know how well mlock() is suited to this task ... but i suspect
it's rather for preventing paging, not for implementing an RDBMS-type of
write-cache]
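A small sketch of what mmap() does and does not give you today, under the assumption of a MAP_SHARED mapping of an ordinary file; mapped_update() and mapped_update_demo() are hypothetical names used only here. msync(MS_SYNC) lets the program say "write this now", and mlock() keeps the pages resident, but neither call lets it say "do not write this yet": once the memcpy() has dirtied the page, the kernel may write it back whenever it likes.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: modify a file through a shared mapping and then
 * force it out. The file must already be at least map_len bytes long. */
int mapped_update(int fd, size_t map_len, size_t off,
                  const void *data, size_t len)
{
    assert(data != NULL && off + len <= map_len);

    char *base = mmap(NULL, map_len, PROT_READ | PROT_WRITE,
                      MAP_SHARED, fd, 0);
    if (base == MAP_FAILED)
        return -1;

    /* mlock() pins the pages in RAM. It does not hold back write-back of
     * dirty file pages. Failure (e.g. RLIMIT_MEMLOCK) is ignored here. */
    int locked = (mlock(base, map_len) == 0);

    memcpy(base + off, data, len);
    /* From this point the kernel is free to write the dirty page to the
     * file at any moment, logged or not. */

    int rc = msync(base, map_len, MS_SYNC);   /* "write it now" */

    if (locked)
        munlock(base, map_len);
    if (munmap(base, map_len) != 0)
        rc = -1;
    return rc;
}

/* Usage example on a temporary file (assumed path under /tmp). */
int mapped_update_demo(void)
{
    char path[] = "/tmp/demo_map_XXXXXX";
    char back[4] = {0};
    size_t map_len = 4096;
    int fd = mkstemp(path);

    if (fd < 0)
        return -1;

    int rc = ftruncate(fd, (off_t)map_len);
    if (rc == 0)
        rc = mapped_update(fd, map_len, 100, "page", 4);
    if (rc == 0 && pread(fd, back, sizeof back, 100) != (ssize_t)sizeof back)
        rc = -1;
    if (rc == 0 && memcmp(back, "page", 4) != 0)
        rc = -1;

    close(fd);
    unlink(path);
    return rc;
}
```

So the missing piece is exactly the guarantee asked for above: a way to hold a dirty mapped page back until the log record covering it is on disk.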