CS162 Spring 2005
Lecture Notes 2005-03-16
prepared by Lane Rettig
*********************************
TOPIC: I/O Devices (Storage)
Main topics covered in this lecture:
- System Design: Trends in Storage Technologies
- Characteristics of Modern Disks
- Different Forms of Storage
- Optical Storage
*********************************
ANNOUNCEMENT
============
Professor still has exams if you need to pick yours up.
NOTE
====
This lecture referenced a large number of charts and graphs with details of
hardware specifications (see bracketed notes below). Most of these figures are
available for reference in the course reader.
INTRODUCTION
============
This lecture will discuss hard disks, floppy disks, optical disks, etc.
- A disk is a "hard, flat version of a tape": they have a write head which
magnetizes little spots on the disk surface, but disks use synthetic metals in
place of iron
- Most disks are fixed-sector devices, with blocks of data and interrecord
gaps, error correction bits, etc.
[professor passes around "toys": two hard drives and a CD-ROM drive]
- Why is this technology interesting?
System Design
=============
Trends in Storage Technologies
Factors such as size, speed, power, reliability, and noise level are relevant to
system design, which is what OS design is all about! This lecture will refer a
lot to slides, many of which can be found in the reader: you don't need to
memorize all of these numbers, but you should know them within a factor of 3-4,
e.g. "How many watts does a disk take?" (A: 3-5W, not 500W)
[see "Table-1 1.8" HDD, Major specifications]
This is from a Toshiba talk at the Hot Chips conference a few years ago,
discussing PC card-type disks.
Drive Specifications:
- Run on 3.3-5V
- Use ~1W while running, 0.25W on standby
- Figures for shock and G-factor, how much drive can withstand before breaking
(dropping a disk exerts an enormous amount of G's on it, and it's a good way
to break a disk)
- Noise is also a factor: 22~32dB, depending on size of disk
- Size
- Weight: ~2oz, this matters for a laptop!
Notes on this slide:
- The figures on this slide are a little out of date, and sizes have changed a
bit, but the numbers are generally relevant.
- No. of disks: no. of platters, usually 2 heads per platter
- TPI = tracks per inch, in thousands (5GB disk, ~40kTPI)
(Q: Can you really seek to within 1/40000 of an inch?
A: No, so the disk tracks have identifiers: the disk head seeks to where it
thinks the track is, and can be off by give or take 5 tracks, so it moves
a little further to find its track.)
- BPI: bits per linear inch on the track, in thousands (~500kBPI)
- GBPSI: gigabits per square inch (~21GBPSI, Prof: "Better than it used to
be!")
- Rotation speed: these figures are fairly slow (~3990 RPM)
- Internal transfer rate: the speed at which bits come off the disk (~100MB/sec)
(Question from last lecture: What's the speed of USB2.0?
Answer: 480Mbits/sec)
- Host rate: the rate at which you can get bits out of the disk drive; depending
on the interface, you can get up to ~67MB/sec
- Buffers: these disks all have buffers built in--most today are 4-8MB
- 15ms seek time: these disks are PC cards, so speed doesn't matter all that
much; we'll talk later about what "average" seek time means
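As a rough sanity check on the figures above (a sketch, not taken from the
slide): track density times linear bit density should come out close to the
quoted areal density. The function name is made up for illustration.

```python
def areal_density_bits_per_sq_inch(tracks_per_inch, bits_per_inch):
    """Areal density = track density (TPI) * linear density (BPI)."""
    return tracks_per_inch * bits_per_inch

# Figures quoted in the notes: ~40 kTPI and ~500 kBPI.
density = areal_density_bits_per_sq_inch(40_000, 500_000)
print(density / 1e9, "Gbit per square inch")  # 20.0, close to the ~21 GBPSI quoted

# Distance between track centres implied by 40 kTPI (1 inch = 25,400 micrometres).
track_pitch_um = 25_400 / 40_000
print(track_pitch_um, "micrometres between track centres")
```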
[see "Magnetic disk storage areal density" slide]
- It's always fun to look at how fast these numbers are improving every year.
- Year denoted as "C." to indicate "calendar year", as opposed to Japanese
calendar (these slides are from a Japanese company)
- Density expressed as areal density in GB per square inch
- Growing at 100% per year for this period
(Prof: That's a lot! There's a story from the Arabian Nights where the Amir
promised to reward a man for a favor by granting him anything he wished for.
The man asked for one grain of wheat for the first square on a chessboard,
two for the second, four for the third, and so forth, up to the 64th square.
Somewhere around the 30th square, the Amir realized what was happening and
rewarded the man by having his head cut off, the point being that you can't
double numbers forever: at some point the number of bits per square inch on
these drives would become larger than the number of atoms in the universe.)
- This graph expressed as MB/square inch vs. year
- The RAMAC was the first magnetic disk, 36 in. wide, ~5MB; developed at IBM San
Jose: disk was made with paint with magnetic oxide
- These early disks were physically big but not capacious
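The chessboard story above is easy to check with a few lines of arithmetic.
This is only an illustrative sketch; the function names are made up.

```python
def grains_on_square(n):
    """Grains on the n-th square (1-based) when every square doubles the last."""
    return 2 ** (n - 1)

def total_grains(squares=64):
    """Total over the first `squares` squares: 1 + 2 + 4 + ... = 2**squares - 1."""
    return 2 ** squares - 1

def density_after(start_density, years, annual_growth=1.0):
    """Density after compounding `annual_growth` (1.0 means 100% per year)."""
    return start_density * (1 + annual_growth) ** years

print(grains_on_square(30))   # 536870912
print(total_grains())         # 18446744073709551615
print(density_after(1, 10))   # 1024.0: ten years of doubling is about 1000x
```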
[see "Head-to-media spacing" slide]
- Shows how high the disk head is flying over the surface, in nanometers (nm)
- The RAMAC was > 10**4nm, now we're down to < 10nm, and this slide is 3-4 years
old
- What determines how good you can make a disk?
1. the flying height of the head above the disk surface
2. the size of the magnetic particles
3. the width of the gap in the read head
- Improving just one of these will not improve a disk; they must all be improved
together
Q: What's happening to reliability with this growth?
A: Once upon a time, IBM would warranty its disks to 50,000 hours MTBF (mean
time between failures), and they were actually better than that, but those
disks cost $30-50,000 each. Disks now are < $100, but the spec sheets say
300k-1 million hours MTBF. But I just had two external disks fail in one month,
and an internal disk failure as well, so as far as I can tell the number of
zeroes on MTBF has more to do with the imagination of the guys in marketing
than with real reliability figures. This is natural, because disks keep
getting cheaper!
The industry is in an economic situation called an oligopsony (like an
oligopoly, but with a small group of buyers rather than sellers), where
there are only half a dozen companies who use 90% of the disks in the world
(Dell, IBM, Fujitsu, etc.). They want the lowest price, which drives fierce
price competition among the major disk manufacturers (Seagate, Quantum,
Maxtor, etc.). Also, if you design a better disk, no one wants it: Dell
doesn't care, and you don't care, if one disk is 10% more reliable than
another.
Q from Prof: Given how steep all of these curves are, how long do disk models
last before they're replaced?
A: 3 months before they're replaced by something newer and denser, i.e. 4
product cycles per year
Q: How long will current-generation disks last before we need a newer, better
technology?
A: Every few years there's a jump in disk technology, usually involving the
heads, but like semiconductors, nothing truly major has been done in a long
time.
[see "Price trends for storage" slide]
- Sharp downward curve
- Shows flash (solid state) devices, hard drives, and microdrives
- Make the disk bigger, and its area goes up as the square of the radius, so
disks become cheaper per bit without adding new technology or electronics
[see second slide, same title: "Fixed costs of HDDs needs to be reduced"]
- This graph is from a few years ago: flash drives and microdrives are very
competitive now in terms of cost of MB/sq. inch
[see "Price history" slide]
- Price per MB versus year of introduction
- This graph also has a steep curve
- Right now, external disks are about $1/GB, internal is 1/2-1/3 of that
- Around 30 years ago, a 28MB disk cost around $28,000 and was the size of a
dishwasher: that's $1000/MB (similar figures on main memory: was $1M/MB, now
it's $.10/MB, or a factor of 10M reduction in cost)
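The cost figures above work out as follows. This is a small sketch: the helper
name is made up, and 1 GB is taken as 1000 MB, which is an assumption.

```python
def cost_per_mb(price_dollars, capacity_mb):
    """Price per megabyte in dollars."""
    return price_dollars / capacity_mb

old_disk = cost_per_mb(28_000, 28)      # the 28MB, $28,000 disk
external_now = cost_per_mb(1.0, 1000)   # ~$1/GB, assuming 1 GB = 1000 MB

print(old_disk)                         # 1000.0 dollars per MB
print(round(old_disk / external_now))   # disk cost reduction factor

# Main memory: from $1M/MB down to $0.10/MB.
memory_cost_reduction = 1_000_000 / 0.10
print(round(memory_cost_reduction))     # 10000000, i.e. a factor of 10M
```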
[see "Capacity Trend by Form Factor" slide]
- 1", 1.8", 2.5" disks
- The problem is that, as you make disks smaller, not only does the area go down
as the square of the radius, the overhead of non-data parts (hole, spindle,
etc.) increases, so small disks are not nearly as cost-effective
[see "Form Factor: Media Size" slide]
- A disk's "data area by percentage" goes down as you shrink its size
- Normalized to area/data capacity, compared to a 2.5" disk, a 3.5" disk is
actually 2.5x as big, while a 1" disk is only 13% as large
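For comparison, pure geometric scaling (area proportional to diameter squared)
gives less extreme ratios than the slide's 2.5x and 13%. That gap is consistent
with the fixed overhead of the hole and spindle taking a larger share of a small
platter. A minimal sketch, with a made-up function name:

```python
def relative_area(diameter_inches, reference_inches=2.5):
    """Platter area relative to a reference platter; area scales as diameter**2."""
    return (diameter_inches / reference_inches) ** 2

for diameter in (3.5, 2.5, 1.0):
    print(diameter, round(relative_area(diameter), 2))
# 3.5 -> 1.96, 2.5 -> 1.0, 1.0 -> 0.16
```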
[see "Form Factor: Comparison of a Foot Print" slide]
- This slide shows the physical size of a disk in terms of its enclosure (very
important to system design)
- Units are not important here (probably normalized to 100)
[see "Access Time" slide]
- This curve isn't all that steep (only improving by ~10% per year)
- Seek time: Time to move the head
- Access time: Total time to get the data off (seek time + rotational delay)
- Performance is increasing, but CPU speed is doubling every 2 years, and disks
only every 7, so it's clear where the bottleneck is
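The "doubling every 7 years" figure follows from the ~10% per year improvement
by compound growth. A quick sketch (function names are illustrative only):

```python
import math

def doubling_time_years(annual_growth):
    """Years needed to double at a compound annual growth rate (0.10 = 10%)."""
    return math.log(2) / math.log(1 + annual_growth)

def annual_growth_for_doubling(years):
    """Compound annual growth rate that doubles in the given number of years."""
    return 2 ** (1 / years) - 1

print(round(doubling_time_years(0.10), 1))       # 7.3 years for disks
print(round(annual_growth_for_doubling(2), 2))   # 0.41, i.e. ~41%/year for CPUs
```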
[see "Data Rate" slide]
- Data rate: how fast you can get the bits off the disk
- Internal rate: rate at which bits go from magnetic surface to the disk head
- This is fast, but due mostly to increases in linear bit density, not to
increases in rotational speed
- If you double bit density while spinning the disk the same speed, the
effective data rate doubles
- All disks used to run at 3600rpm, because they were synchronized to the
electric current from a wall socket, i.e. 60Hz (no longer the case with DC
motors)
- Why not spin disks really fast?
1. They can explode if you spin them too fast! (Cray used to manufacture disks
cased in bullet-proof glass because they were seriously afraid of
fragmentation and wanted to protect disk operators.) Laws of centrifugal
force say that if you make a disk half as large and use half as much force,
you can spin the disk twice as fast and still get the same tension.
E.g. you can spin a 2" disk much faster than you can spin a 14" disk.
Q: Does the speed affect reliability?
A: Not in particular.
2. The read head cannot keep up. The signal coming off the read head is not a
square wave. In fact, you have very sophisticated signal processing in
place to read micro/nanovolts to determine the difference between 1's and
0's. Also encoded with error correction. It's difficult to get a decent
signal, and if you spin the disk too fast, it becomes impossible.
3. There's air inside the disk because the head has to float (cf. fluid
mechanics). The disk generates turbulence as it spins, and the faster you
spin it, the more air must be pushed out of the way. Faster drives require
more power and generate more noise.
Q: Is heat an issue?
A: Power becomes heat, so yes, although most of the heat is being produced by
the processor.
Q: Why are SCSI drives warmer?
A: Because they spin faster. A typical server drive spins at 10, 12, or
15krpm, whereas a desktop drive spins at 5400/7200rpm.
[see "Performance history of IBM disk products" slide]
- We're looking at data from IBM because IBM actually publishes it, as opposed
to most companies
- Most of these figures come from three papers in the reader on I/O technology
- This is the internal data rate, the rate at which bits come off the disk,
starting with RAMAC and normalized to 1
- Curve in three distinct zones: zone 1, where they were just learning how to
build disks, rapid improvement; zone 2, only minor improvements, mostly static
numbers; and zone 3, after they changed head technology
- Things are still on this curve, but they can't continue much longer this way
Note: Regarding the growth of disk technology, physicists have said a number of
times that the limit has been reached, that you run into Planck's
constant, that you'll get down to one bit per atom, etc., but disk
manufacturers have come up with ideas such as changing the shape of the
magnetics on the disk, e.g. going from horizontal to vertical alignment,
in order to maximize data density.
[see "Areal density Mbits/sq. inch" slide]
- CGR: compound (annual) growth rate
- MR: magneto-resistive heads
- GMR: giant magneto-resistive
- AFC: antiferromagnetically coupled
(see abbreviations at bottom of slide)
- Head technology, media type, disk spacing, and disk flatness are all changing
[see "Raw capacity per floor space area" slide]
- Shows how many bits you can fit per square foot of computer room space
- Relevant if you are running a computer room (ex. Google has to worry about
power consumption at a server farm)
[see "Floor space for one TB" slide]
- Now around 1TB per 1 sq. foot
- You can buy 10TB storage for around $10,000: how many years ago was that the
total storage of all of the computers in the world?
[see "Volumetric density" slide]
- no. GB per cubic inch in real-life terms (i.e. the actual volume of a hard
disk as a component)
- Potential exam question: How many bits could we get in this room if we packed
it with disk drives floor to ceiling?
- This value is doubling every year
[see "Volumetric density, GB/cubic inch" slide]
- We have switched away from old mainframe disks of 25", 14", and 11", down to
much smaller 2.5-3.5" disks, because it's much cheaper: why pay $30,000 for a
mainframe disk when you can pay a lot less for PC disks and get more storage?
- This graph shows when the switch occurred from large form factor drives to
small form factor drives
- Also, you can manufacture many more small disks (50-100M) in a year, as
opposed to big disks (10,000), which leads to improvements in engineering:
this is an "economy of scale"
[see "Price/MB" slide]
- Very steep curve
- Note that desktop storage is cheaper than high performance systems
- Shows projection at each point versus the actual price
- Again, form factor considerations
[see "Price $/MB" slide]
- Compares disk drive, paper & film, and semiconductor storage costs
- Much cheaper to store on disk than in a paper notebook
- Obviously, paper/film storage depends on font size/resolution, but there is a
limit (the precise number is not critical)
[see "Pictures of form factors" slide]
- from 1956, 14" form factor, down to 8" in the 1980s, 11", 3.25", 2.5", 1"
- Prof: "They're getting a lot smaller!"
[see "System KVA/GB" slide]
- Kilovolt-amps (roughly kilowatts) per GB
- Power is an especially important issue in laptops because of batteries, and in
machine rooms because of cooling
- Different curves represent disk speeds and power requirements for different
systems, e.g. storage systems (left)
[see "Maximum internal data rate" slide]
- SCSI disks
- Data rate: linear density * rotational speed * track circumference
- Change in slope represents change in head technology
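A rough sketch of how these factors combine, assuming the internal data rate is
simply the number of bits passing under the head per second. The function name
and the 0.8 in. track radius are assumptions for illustration; the 500 kBPI and
3990 RPM figures are the ones quoted earlier for the 1.8" drive.

```python
import math

def internal_data_rate_bits_per_sec(bits_per_inch, track_radius_inches, rpm):
    """Bits under the head per second: BPI * inches per rev * revs per second."""
    circumference = 2 * math.pi * track_radius_inches
    revs_per_sec = rpm / 60
    return bits_per_inch * circumference * revs_per_sec

rate = internal_data_rate_bits_per_sec(500_000, 0.8, 3990)
print(round(rate / 1e6), "Mbit/sec")

# Doubling linear density at the same rotational speed doubles the data rate.
doubled = internal_data_rate_bits_per_sec(1_000_000, 0.8, 3990)
print(doubled / rate)
```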
Q: Does the data rate change from the inside to the outside of a disk?
A: Each track used to have the same no. of bits, so the data rate was constant
but the bit-spacing was different. This was convenient for the OS because it
could determine the position of every bit on the disk by simple arithmetic,
and could lay out the entire disk. There were only 3-4 possible disk models,
and the OS knew all of them, so the OS could optimize I/O (this will be
talked about in more detail in 3-4 lectures).
However, this was inefficient, because disk heads can read bits at a very
narrow range of speed, only 20-30% either way, so the speed must be very
precise. The obvious solution was to switch to a different number of bits
per track, which has been done for at least the past 15 years. In fact, not
every single track is different, but rather the disk is divided into 6-8
zones, each with different bit capacities.
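To see why zoning pays off, here is a toy model (not from the lecture). It
assumes the bits a track can hold are proportional to its radius, that each
zone is limited by its innermost track, and that the innermost track sits at
half the outer radius. Capacities are in arbitrary units.

```python
def constant_bits_capacity(inner_radius, tracks):
    """Every track holds only what the innermost track can hold."""
    return tracks * inner_radius

def zoned_capacity(inner_radius, outer_radius, tracks, zones):
    """Equal-width zones; each track holds what its zone's innermost track can."""
    tracks_per_zone = tracks / zones
    zone_width = (outer_radius - inner_radius) / zones
    return sum(tracks_per_zone * (inner_radius + k * zone_width)
               for k in range(zones))

base = constant_bits_capacity(0.5, 8000)
zoned = zoned_capacity(0.5, 1.0, 8000, 8)
print(zoned / base - 1)   # 0.4375, i.e. ~44% more capacity with 8 zones
```

Under these assumptions the gain lands inside the 35-50% range quoted later in
these notes; more zones push it toward 50%, with diminishing returns.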
Q: Why is it complicated to increase the number of zones?
A: You could do that, but then you have more cases and your code becomes more
complicated. Formerly, disk controllers were dumb and the OS did everything,
but that's not possible now because of the sheer number of disks and disk
manufacturers (Maxtor, Western Digital, Seagate, Toshiba, Fujitsu, Hitachi,
etc.). So now all of the intelligence for interfacing with a disk and
finding data is in logic boards on the physical disk drive, which knows about
zones, bits per track, etc. The original interface was a physical disk
address, with a track, zone, and block no., whereas now disks are just
numbered externally by blocks, which are translated by disk controllers.
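The translation the controller performs might look roughly like the sketch
below. The zone table, its numbers, and the function names are all
hypothetical; real drives also deal with multiple heads, spare sectors, and
defects.

```python
from bisect import bisect_right

# Hypothetical zone table: (tracks in zone, sectors per track), outer zone first.
ZONES = [(100, 60), (100, 50), (100, 40)]

def build_zone_starts(zones):
    """Return the first logical block of each zone and the total block count."""
    starts, next_block = [], 0
    for tracks, sectors_per_track in zones:
        starts.append(next_block)
        next_block += tracks * sectors_per_track
    return starts, next_block

ZONE_STARTS, TOTAL_BLOCKS = build_zone_starts(ZONES)

def translate(block):
    """Map a logical block number to (zone, track within zone, sector)."""
    if not 0 <= block < TOTAL_BLOCKS:
        raise ValueError("block number out of range")
    zone = bisect_right(ZONE_STARTS, block) - 1
    offset = block - ZONE_STARTS[zone]
    sectors_per_track = ZONES[zone][1]
    return zone, offset // sectors_per_track, offset % sectors_per_track

print(translate(0))       # (0, 0, 0)
print(translate(5999))    # (0, 99, 59)
print(translate(11050))   # (2, 1, 10)
```

The OS only ever sees block numbers 0..TOTAL_BLOCKS-1; everything about zones
and sectors per track stays inside the drive.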
Q: Don't you get a faster data rate on outer tracks?
A: Not with the old layout: every track held the same number of bits, so the
bits were simply further apart on the outer tracks and the data rate was
constant. Packing the outer tracks at full density would have made the data
rate too fast for the head to keep up.
Disks used to have "timing tracks" which specified the data rate to read the
other tracks. Now, disks are self-clocking, and there is just a limited
range of possible data rates. With zoning there are more bits on the outer
tracks, so the data rate there is faster, but it is not proportional to the
radius.
Q: Is this true for optical storage as well?
A: We'll get to this later.
[see "Disk drive access/seek times" slide]
- Access time = seek time + latency
- Rotational latency is proportional to 1/RPM (on average, half a revolution)
- Seek time is proportional to (actuator inertia / actuator power)**(1/3)
- The "data band" has to do with zoning
- Access time is expressed in ms
- You can see an improvement with a change in technology
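The latency term is easy to work out, taking the average rotational latency as
half a revolution. This is a sketch with made-up function names; the 12ms seek
in the last line is the microdrive figure quoted later in these notes.

```python
def avg_rotational_latency_ms(rpm):
    """Average wait for the sector to come around: half a revolution."""
    ms_per_revolution = 60_000 / rpm
    return ms_per_revolution / 2

def avg_access_time_ms(avg_seek_ms, rpm):
    """Access time = seek time + rotational latency."""
    return avg_seek_ms + avg_rotational_latency_ms(rpm)

for rpm in (3600, 5400, 7200, 15000):
    print(rpm, round(avg_rotational_latency_ms(rpm), 2))
# 3600 -> 8.33, 5400 -> 5.56, 7200 -> 4.17, 15000 -> 2.0

print(round(avg_access_time_ms(12, 3600), 2))   # 20.33
```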
[see "Figure 12: Performance trend for hard disk drives" slide]
- Year vs. I/Os per second
- Note the two changes in technology: addition of a disk cache and hard disk
drive buffer, which improve performance significantly
Q: Why are there caches in drives?
A: The closer the cache is to the physical drive, the more useful it is. More
importantly, you're buffering the surface of the disk, so bits can go
directly into the cache and wait for I/O there. Functionally, a drive has a
buffer, whereas a motherboard has a cache.
[see "Acoustic noise (sound power)" slide]
- Noise is definitely an issue, since noisy drives can be annoying
- Bigger disks are noisier than small disks
[see "Microdrive: selected specs" slide]
- These are microdrives, which can actually be used in digital cameras in place
of compact flash cards
- Current generation is 2-4GB
- Form factor: 1" drive
- Density: 15GB/sq. inch
- Seek time: 12ms
- Data rate: 4MB/sec.
- Rotational speed: 3600 RPM
- Power consumption: 0.25mA to spin up, 0.25mA to perform read/write, 1/7mA
idle, 1/50mA standby
- Weight: 16g (~0.6oz)
- Can absorb 1500G's when turned off
Q: Why are disks in iPods sturdier, e.g. they can be thrown around and not be
damaged?
A: They probably have some padding in them, but if dropped from 8 ft. they will
still break. There is still a limit, but it may turn out that you'd break
the case before breaking the disk. The iPod only reads from disk ~every 20
mins., and caches data, meaning it's more tolerant to damage when the disk
isn't spinning.
Some new disks have motion-sensing technology which allows them to lock the
disk head in laptops to help prevent damage--doesn't mean it won't be
damaged, just increases tolerance.
[see "External storage: Summary of features" slide]
- All hard drives
- List price versus capacity
- Prices now down to < $1.50/GB
[see "Historical rates of increase in linear, track and areal density" slide]
- 21% per year in linear density
- 24% per year in track density
- 49% per year in areal density (total)
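These three rates are consistent with each other, because areal density is
linear density times track density, so the growth factors multiply. A quick
check (the function name is illustrative):

```python
def areal_growth(linear_growth, track_growth):
    """Combined annual growth when two compounding factors multiply."""
    return (1 + linear_growth) * (1 + track_growth) - 1

print(round(areal_growth(0.21, 0.24), 3))   # 0.5, close to the 49% on the slide
```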
[see "Average seek time, rotational speed, access time" slide]
- These curves show performance trends over time
- Average seek time going down ~8% per year
- Average rotational speed increasing ~9% per year
- Average access time improving ~8% per year
- Practically, the rate at which you can get bits off of a disk is increasing
10-20% per year, but the capacity is increasing at a rate of 50% per year, so
it takes much longer to dump the contents of a disk.
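The last point can be made concrete with a small sketch. The function names
are made up, the 300GB and 98MB/sec inputs are just plausible figures from
elsewhere in these notes, and 1 GB is taken as 1000 MB.

```python
def full_read_time_hours(capacity_gb, rate_mb_per_sec):
    """Hours to read the whole disk sequentially (1 GB taken as 1000 MB)."""
    return capacity_gb * 1000 / rate_mb_per_sec / 3600

def dump_time_growth_per_year(capacity_growth, rate_growth):
    """Annual growth in full-disk read time when capacity outpaces data rate."""
    return (1 + capacity_growth) / (1 + rate_growth) - 1

print(round(full_read_time_hours(300, 98), 2))           # hours for one full pass
print(round(dump_time_growth_per_year(0.50, 0.15), 2))   # 0.3: ~30% longer each year
```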
[see "Actual average seek and rotational time" slide]
- Shows percentage of actual seek/rotational time as a function of the
manufacturer's specified values
- One reason these numbers are low is that "average seek time" is from a random
place on disk to another random place on disk, which is not a good measure of
actual performance because data access and data placement are not random (but
it can be considered a "worst case" assessment!)
The steps to perform a disk I/O operation:
1. Seek: move the head to the right track
2. Set sector: must wait for disk to rotate to right place (this depends on
rotational latency)
3. Perform the read/write as the sector passes under the head
Note: In the past, the OS used to control all of these things. Today, however,
it's all built into the drive, so the printed circuit board on disk may contain
as many as 100k lines of code. The OS only needs to provide the block number,
and the disk does everything else.
There used to be something called an "RPS miss" (RPS: rotational position
sensing). The disk would interrupt the OS when it got to the right sector; if
the data path to the CPU was busy at that moment, the transfer missed and had to
wait a full rotation. That's what the disk buffer is used for today: data is
read directly into the buffer whether or not the data path to the CPU is
available.
Characteristics of Modern Disks
===============================
[see "Disk characteristics 2004" slide]
- Sizes of disks available: 5.25" (disappearing), 3.5" (desktop), 2.5" (laptop),
1.8" (ultra portable), 1" (compact flash)
- 2.5" drives available up to ~80GB
- 3.5" drives available up to 300-500GB
- Sector sizes still 512 bytes, goes back 30 years
- Areal density higher now, but still 30-60GB/sq. inch
- 15k RPM drives are server drives, which are powerful and loud
- Laptop drives are smaller and quieter
- Seek time still around 1-2ms, although the number of tracks has gotten very
large (this includes head acceleration/deceleration)
- Maximum seek time is from disk edge to disk edge
- Media transfer rate up to 650Mbit/sec (off disk surface)
- Can sustain up to ~98MB/sec (from buffer)
- Instantaneous transfer rate up to 320MB/sec (with UltraSCSI 320, ultra wide)
- Bit error rates: 1/10**12 (recoverable, with error correction), 1/10**16
(unrecoverable)
- Reliability: >= 300k hours MTBF (Prof: "I don't think so."), min. 50k
stop/start cycles (matters more for a laptop, where you turn it on/off very
often)
- Startup time: 1-10 sec
- Caches: 64KB up to 8MB
- Power: read/write 5-10W, idle 5W, standby 0.5W
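One way to put an MTBF figure in perspective is to turn it into an expected
failure rate for a population of drives. The sketch below assumes drives run
continuously and fail at a constant rate, which is a simplification; the
function names are made up.

```python
HOURS_PER_YEAR = 24 * 365

def annual_failure_rate(mtbf_hours):
    """Approximate fraction of continuously running drives failing per year."""
    return HOURS_PER_YEAR / mtbf_hours

def expected_failures_per_year(num_drives, mtbf_hours):
    """Expected number of failures per year in a population of drives."""
    return num_drives * annual_failure_rate(mtbf_hours)

print(round(300_000 / HOURS_PER_YEAR, 1))                    # 34.2 years
print(round(annual_failure_rate(300_000) * 100, 1))          # 2.9 percent per year
print(round(expected_failures_per_year(1000, 300_000), 1))   # 29.2 drives per year
```

So 300k hours does not mean a single drive lasts 34 years; it is a statistic
over many drives during their service life.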
Examples from spec sheets of specific drives:
- Maxtor Atlas 15K II (2004)
- 150GB, 4 platters, 8 heads
- High-end server disk, so 15k RPM
- Seek time: 3-4ms, max. 9ms
- One track seek time: 0.3-0.5ms
- Can sustain data rate of 98MB/sec
- Cache: 8MB
- Altitude: this is important because the disk head flies, so air pressure is
relevant
- Noise: up to 36dB
- Power: ~10W
- These numbers are crucial to system design
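As an example of using these numbers, the sketch below estimates the time for
one small random request on this drive as seek + rotational latency + transfer,
following the three I/O steps listed earlier. The 4096-byte request size and
the function name are assumptions; the 3.5ms seek is the midpoint of the quoted
3-4ms.

```python
def request_time_ms(seek_ms, rpm, transfer_bytes, sustained_mb_per_sec):
    """Seek + average rotational latency (half a turn) + transfer time."""
    rotational_ms = 0.5 * 60_000 / rpm
    transfer_ms = transfer_bytes / (sustained_mb_per_sec * 1_000_000) * 1000
    return seek_ms + rotational_ms + transfer_ms

t = request_time_ms(3.5, 15_000, 4096, 98)
print(round(t, 2), "ms per request")         # roughly 5.5 ms
print(round(1000 / t), "random I/Os per second")
```

Note that the transfer itself is a tiny fraction of the total: for small random
requests the mechanical delays dominate.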
NOTE (again): You should know the parameters of the spec sheets in the reader
within a factor of 2-4.
- Most modern disks have semiconductor storage for caching
- Not uncommon for large disks to develop defects: these are remapped in the
controller
- There are spare blocks and spare tracks to which the controller can remap bad
blocks (can be mapped to same track and cylinder, or to some special place on
the disk)
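A toy model of this remapping is sketched below. The class and method names
are hypothetical, and a real controller keeps its defect list on the disk
itself and usually prefers spares close to the bad block.

```python
class RemappingController:
    """Toy controller that hides bad blocks behind a pool of spare blocks."""

    def __init__(self, num_blocks, num_spares):
        self.num_blocks = num_blocks   # size reported to the OS
        # Spares live past the end of the user-visible block range.
        self.free_spares = list(range(num_blocks, num_blocks + num_spares))
        self.remap = {}                # logical block -> spare physical block

    def mark_bad(self, block):
        """Redirect a defective block to the next free spare."""
        if not 0 <= block < self.num_blocks:
            raise ValueError("block out of range")
        if block not in self.remap:
            if not self.free_spares:
                raise RuntimeError("out of spare blocks")
            self.remap[block] = self.free_spares.pop(0)
        return self.remap[block]

    def physical(self, block):
        """Physical location actually used for a logical block."""
        if not 0 <= block < self.num_blocks:
            raise ValueError("block out of range")
        return self.remap.get(block, block)

disk = RemappingController(num_blocks=1000, num_spares=4)
disk.mark_bad(17)
print(disk.physical(17), disk.physical(18), disk.num_blocks)   # 1000 18 1000
```

The reported size stays the same after remapping, since the spares were never
part of the visible capacity.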
Q: Can this remapping cause the size of the disk to be reported differently?
A: There shouldn't be enough defects to affect disk size. However, sectoring
wastes a lot of space, e.g. a 300GB disk becoming 250GB after sectoring.
There is a lot of space wasted in sector marks, track marks, and error
correction bits.
- Originally the OS knew where all of the bits on disk were; now, the disk is
mapped by the controller, which keeps track of bits per track, zoning, etc.
(more bits on outer zones: this gives 35-50% greater capacity than if there
were the same no. of bits on each track)
- Everything else on this slide was already explained above
Solid State Disks (SSD)
- An OS has a limited number of hardware interfaces (tape, disk, etc.)
- To make a faster disk, we can build one out of semiconductor storage and give
it an interface which looks like a hard disk
- E.g. flash memory pin/stick drives, which look like a disk to the OS
- This way the OS doesn't need a new interface for every new storage technology:
the trick is to make it look like an old technology!
- Quantum has a number of solid state devices on the market
- This data is old and outdated by a factor of 10
- Can buy up to 2GB solid state devices now
Different Forms of Storage
==========================
Drums, an Antique Technology