> On Sun, Mar 21, 1999 at 12:16:40PM -0800, Matthew Jacob wrote:
> > Then there was "Enriching the File System-Storage System Interface"
> > work-in-progress (Drew Roselli, Jeanna Neefe Matthews, Tom Anderson
> > (drew@cs.berkeley.edu) which talked about informing the storage subsystem
> > with info hints as for file block lifetime, etc... I think this might have
> > been what you were thinking of?
> >
> I think this is what Jim must have had in mind. Tell the disk drive which
> blocks are related, which blocks are unlikely to be written, ... so that the
> drive can optimize geometry. If I'm not mistaken, sct said that ext3 will
> support this kind of thing.
>

There have been some responses from those who declare that I
have been talking about "old technology", some who declare
as usual that I don't know what I'm talking about, and some
that agree.

Nevertheless, I will present some ideas which I hope will make
a few persons think, rather than just reject this information.

The technology to which I referred that interleaves sectors is
indeed old technology. It is old simply because it has been
moved from the external world into the internals of the disk
drives. The elementary physics of disk drives has not changed
since IBM engineers first developed the 3340 which, because
of its planned "30-30" configuration (two 30-megabyte spindles),
became known as the Winchester drive.

I am sure that I will get some flames from some who will claim
that the part number was 3006 and the thirty-ought-six will
out-shoot a thirty-thirty any day. Sorry, I beat you to it.

Basically you move a head over some magnetic media and write
data in a circular track. When you want that data back, you
switch the head into a read amplifier and read it.

These data are written and read as a serial bit stream. Raw NRZ
is neither free of DC bias nor self-clocking, so the stream is
run-length-limited coded (MFM, RLL, and the like) so that there
is no overall magnetic bias and it is self-clocking. This serial
bit-stream is generated and recovered by a device known as a
serializer.

Heads have changed over the years, media has changed, and support
mechanisms including electronics and I/O devices have changed.
However, the basic stuff works according to the original principles.

In the "olden" days, you could optimize the throughput by altering
the sector interleave during a formatting operation. By matching
the rate at which data could be written or read with the rate
at which you could send or receive these data from the external
world, you could optimize the I/O performance of the disk drive.
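For anyone who never had to pick an interleave factor, here is a minimal
sketch of how a formatter could lay logical sectors around a track. The
function name and the collision rule (slide forward to the next free slot)
are my assumptions for illustration, not any particular controller's
algorithm.

```python
def interleave_layout(sectors_per_track, interleave):
    """Return the logical sector number stored in each physical slot.

    interleave=1 puts consecutive logical sectors in adjacent slots;
    interleave=N aims to put logical sector k+1 N slots after sector k.
    """
    layout = [None] * sectors_per_track
    slot = 0
    for logical in range(sectors_per_track):
        # If the target slot is already taken, slide forward to the next free one.
        while layout[slot] is not None:
            slot = (slot + 1) % sectors_per_track
        layout[slot] = logical
        slot = (slot + interleave) % sectors_per_track
    return layout


print(interleave_layout(8, 1))  # consecutive layout
print(interleave_layout(8, 2))  # every other slot
```

With an interleave of N, reading a whole track in logical order takes about
N revolutions, which is why you matched N to what the host could swallow.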

In recent years, the emphasis has been upon increasing the storage
capacity of these disk drives. The storage capacity, which used
to be specified in terms of kilobits per square inch, called
areal density, is now specified in terms of gigabits per square
inch.

This means that read amplifiers that used to handle data at up
to 10 MHz are now handling data at 800 MHz to 1 GHz. You can
check the numbers yourself. The number of bits per second that
travel under the read head is now much greater than the bits per
second that any I/O interface can handle. This means that sector
interleave, spiral compensation, and other tricks are now worthless
because nothing you do along these lines will allow you to actually
use data as fast as it becomes available. The data just arrives too
fast. Because of this, sectors are no longer interleaved.
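To check the numbers, a rough sketch of the arithmetic follows. The
interface figures in the table are nominal peak rates that I am assuming
for comparison; they are not from any measurement. The conversion treats
every bit under the head as payload, so it overstates the usable rate
(encoding, ECC, and gaps all eat into it).

```python
def media_rate_mb_per_s(bit_rate_hz):
    """Convert a raw serial bit rate at the head into megabytes per second."""
    return bit_rate_hz / 8 / 1e6


# Assumed nominal peak interface rates in MB/s (illustrative only).
INTERFACE_MB_PER_S = {"Ultra ATA/33": 33.3, "Ultra2 Wide SCSI": 80.0}


def interfaces_outrun(bit_rate_hz):
    """Names of the assumed interfaces that are slower than the raw media rate."""
    rate = media_rate_mb_per_s(bit_rate_hz)
    return sorted(name for name, peak in INTERFACE_MB_PER_S.items() if peak < rate)


print(media_rate_mb_per_s(10e6))   # old 10 MHz channel
print(media_rate_mb_per_s(800e6))  # 800 MHz channel
print(interfaces_outrun(800e6))
```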

Marketing persons, seizing buzzwords, cite this as an advantage
and quote 1:1 interleave as though it were a benefit. Instead it
is simply a side-effect.

Expensive disk drives now do full track buffering. This costs
money because RAM costs money. To buffer one full track on a
disk drive requires CAPACITY / (HEADS * CYLINDERS) which can
be upwards of 100 kilobytes of high-speed SRAM. Sector buffering
is always necessary. It is part of the de-serializer and is
required because the disk internals are never synchronous with
the outside world.
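The CAPACITY / (HEADS * CYLINDERS) figure is easy to work out for any
drive whose geometry you know. A minimal sketch follows; the geometry in
the example is a hypothetical drive that I made up, not a real data sheet.
It gives only the average track size, since zoned recording puts more
sectors on the outer tracks than on the inner ones.

```python
def bytes_per_track(capacity_bytes, heads, cylinders):
    """Average track size: total capacity spread evenly over every track."""
    return capacity_bytes / (heads * cylinders)


# Hypothetical geometry: 18 GB, 12 heads, 7000 cylinders.
track = bytes_per_track(18e9, heads=12, cylinders=7000)
print(f"{track / 1024:.0f} KiB per track on average")
```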

Full track buffering suffers from the possibility that the next
read or write will be on a different head. If the buffer was
dirtied by a write, it is flushed to the physical media before
any other operation can take place. To save time, the whole
track is written. This saves time because you can start the
write anywhere. However, it still takes 1 revolution of the
platter.
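To put a number on "1 revolution", here is the arithmetic as a small
sketch. The spindle speeds in the example are assumptions chosen for
illustration. The half-revolution figure is the usual textbook average
for waiting on one particular sector, shown only for comparison with the
full-track write.

```python
def revolution_ms(rpm):
    """Time for one full platter revolution, in milliseconds."""
    return 60_000.0 / rpm


def average_rotational_latency_ms(rpm):
    """Classic average wait for a given sector: half a revolution."""
    return revolution_ms(rpm) / 2


for rpm in (5400, 7200, 10000):  # assumed spindle speeds
    print(rpm, round(revolution_ms(rpm), 2), round(average_rotational_latency_ms(rpm), 2))
```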

Because of this, rotational latency now means nothing either.
This is another buzzword gone obsolete.

Full track buffering and other expensive disk optimization techniques
are usually done in SCSI disks. However, some enhanced IDE drives
now provide these features, but not the "Saturday night specials".

The "smarts" are generally provided by an ASIC which is a large
programmable gate-array which can be just as smart as a CPU
and 200 or more times faster. However, it is "application-
specific", not something to which you'd port Linux.

Some IDE drives provide "read ahead". In plain language
this means that the sector buffer is larger than one sector so,
if enabled, the sector buffer will be filled on a read. There
is a non-zero probability that the data you need next will
be in the sector buffer so it is possible to improve performance
by using this feature if you read continuous sectors.
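The read-ahead idea can be sketched in a few lines. This is a toy model
under my own assumptions, not drive firmware: `read_sectors` is a
hypothetical callable standing in for the media access, and `depth` is a
made-up buffer size in sectors.

```python
class ReadAheadBuffer:
    """Toy model of a sector buffer that holds more than one sector."""

    def __init__(self, read_sectors, depth=8):
        self.read_sectors = read_sectors  # callable(start, count) -> list of payloads
        self.depth = depth                # sectors fetched per media access
        self.start = None                 # first sector currently buffered
        self.data = []
        self.media_reads = 0              # how often we had to go to the platter

    def read(self, sector):
        hit = self.start is not None and self.start <= sector < self.start + len(self.data)
        if not hit:
            # Miss: fill the whole buffer starting at the requested sector.
            self.data = self.read_sectors(sector, self.depth)
            self.start = sector
            self.media_reads += 1
        return self.data[sector - self.start]


def fake_media(start, count):
    """Stand-in for the platter: returns a label per sector."""
    return [f"s{n}" for n in range(start, start + count)]


buf = ReadAheadBuffer(fake_media, depth=4)
print([buf.read(n) for n in range(8)], buf.media_reads)
```

Continuous reads hit the buffer most of the time; scattered reads miss
every time and gain nothing.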

In summary, the internals of disk drives necessary to provide high
capacity make it impossible for the outside world to enhance the
drive's performance. As Linus has said: "It's just a linear group
of blocks...".

There are a few things that can be done, though.

(1) Queue reads and writes separately. If the drive is given
a group of blocks to write, it can optimize head-travel and
even servo settling times to improve throughput. It is not
useful to sort writes because what may seem adjacent to you
may, in fact, not be accessible on the same track or head on
the physical drive.

Write queueing is the least expensive in software because you
know when a buffer has gotten dirty. Just wait until quite a
few are dirty and write a bunch all at once. How many? Don't
know.
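A minimal sketch of that kind of write batching follows. It is an
illustration only, not kernel code: the class name, the `submit` callable,
and the `threshold` value are all hypothetical, and the right threshold is
exactly the number I said I don't know. The batch is handed over unsorted,
in the order blocks first became dirty, leaving the ordering to the drive.

```python
class WriteQueue:
    """Collect dirty blocks and hand them to the drive in one batch."""

    def __init__(self, submit, threshold=4):
        self.submit = submit        # callable(list of (block, data)) -> None
        self.threshold = threshold  # assumed tuning knob
        self.dirty = {}             # block -> latest data, in first-dirtied order

    def mark_dirty(self, block, data):
        # Rewriting an already dirty block just replaces its pending data.
        self.dirty[block] = data
        if len(self.dirty) >= self.threshold:
            self.flush()

    def flush(self):
        if self.dirty:
            batch = list(self.dirty.items())
            self.dirty = {}
            self.submit(batch)


batches = []
q = WriteQueue(batches.append, threshold=3)
for block, data in [(9, "a"), (2, "b"), (9, "c"), (5, "d")]:
    q.mark_dirty(block, data)
print(batches)
```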

Read queueing generally wastes time because the application really
needs the new data now. However, finishing a concurrent write
before doing the read may improve performance marginally.

(2) Use separate queues for each physical drive. SCSI disks can
disconnect while they do their thing. This allows other SCSI
activity to occur. If the next read or write in the queue is
for the drive that disconnected, you wait. You have no choice.
However, if you have separate queues, you get to read or write
to another device during this time.

This may improve swap performance for those who have set up a
separate swap drive.
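The per-drive queue idea in (2) can be sketched as follows. This is a toy
dispatcher under my own assumptions, not a driver: the class, the drive
names, and the request labels are hypothetical. A drive marked busy stands
for one that has disconnected to do its thing.

```python
from collections import deque


class PerDriveQueues:
    """One pending-request queue per physical drive."""

    def __init__(self):
        self.queues = {}   # drive name -> deque of pending requests
        self.busy = set()  # drives that have disconnected and are working

    def add(self, drive, request):
        self.queues.setdefault(drive, deque()).append(request)

    def next_request(self):
        """Pick work for the first idle drive that has any; None if all must wait."""
        for drive, pending in self.queues.items():
            if pending and drive not in self.busy:
                self.busy.add(drive)
                return drive, pending.popleft()
        return None

    def complete(self, drive):
        """Called when the drive reconnects and finishes its request."""
        self.busy.discard(drive)


q = PerDriveQueues()
q.add("sda", "read-1")
q.add("sda", "read-2")
q.add("sdb", "swap-out-1")
print(q.next_request())  # sda gets its first request
print(q.next_request())  # sda is busy, so the swap drive gets work meanwhile
```

With a single shared queue, the second request here would have sat behind
the busy drive instead.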

In conclusion, all this bandwidth will be wasted in a year or
two anyway. Disk drives currently under test use static RAM.
This is not the Slow........ NVRAM where you write then
read-read-read-read, etc., until it finally "takes". This is real
static RAM with a 3-volt battery to keep it alive for 20 years.
These drives use standard interfaces so you can just put them in
your PC and be done with it. Initially they will be expensive, about
4-1/2 times the cost of an electromechanical equivalent. They
will be marketed as high-speed improvements for file-servers and
the like. As the market develops, the cost will come down.
StorageTek, IBM, and Fujitsu are testing these things now.
Probably Seagate and others are also, but they are keeping quiet.
The speed of these drives will be solely dependent upon the
bandwidth of the I/O interface so I wouldn't spend much time
working upon elevators.

Cheers,
Dick Johnson
***** FILE SYSTEM WAS MODIFIED *****
Penguin : Linux version 2.2.3 on an i686 machine (400.59 BogoMips).
Warning : It's hard to remain at the trailing edge of technology.
