Byte sizes: making a meal of the language

A couple of weeks ago I wrote in this column about the disparity between
decimal and binary multiples. I wrote about how the prefixes "kilo" and
"mega" and "giga" and all the others, right up to "yotta", have been
appropriated by computer people to mean what they were never meant to mean.

Kilo means 1000, or 10^3. But in the early days of computers it was
appropriated to also mean 2^10, which is not 1000 at all but 1024.

This small disparity, of just 2.4 per cent, did not seem to matter, and
because computing was a specialist area, anybody who needed to tell the
difference could tell the difference.

But now computing is not a specialist area - it is mainstream - and we are
not dealing with kilobytes any more, but gigabytes and terabytes and more.

With every increase of 1000, that 2.4 per cent difference gets magnified,
until with terabytes it is almost 10 per cent, which becomes significant.
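The compounding is easy to check for yourself. Here is a minimal sketch in Python, assuming only that each step up the ladder multiplies the decimal unit by 10^3 and the binary unit by 2^10. The function name and the list of prefixes are my own, for illustration.

```python
# Each step up the prefix ladder multiplies the decimal unit by 10**3
# and the binary unit by 2**10, so the gap between them compounds.
PREFIXES = ["kilo", "mega", "giga", "tera", "peta", "exa", "zetta", "yotta"]


def disparity_percent(step):
    """Percentage by which 2**(10*step) exceeds 10**(3*step)."""
    return (2 ** (10 * step) / 10 ** (3 * step) - 1) * 100


# Print the gap for every prefix from kilo (step 1) to yotta (step 8).
for step, name in enumerate(PREFIXES, start=1):
    print(f"{name:>5}: {disparity_percent(step):5.1f} per cent")
```

Run as written, the kilo line shows the familiar 2.4 per cent and the tera line comes out just under 10 per cent.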

I also wrote of what will probably be a vain attempt to replace decimal
prefixes with binary equivalents when they are meant to signify powers of two
rather than powers of 10 - "kibi", "mebi", "gibi", etc.

Then we would have MiBs and GiBs and the like. This has been proposed by the
International Electrotechnical Commission, but it is probably too sensible an
idea to take hold.

The column attracted a surprising amount of email. There is a lot of interest
in this area. Many of my correspondents pointed out an error I made when
referring to the sizes of disks.

I said that what is claimed to be a 60 gigabyte disk is actually about 64
gigabytes because of this disparity. But I was reminded that disk drive
manufacturers use the prefixes in their technically correct decimal fashion,
so that an advertised 60 gigabytes actually is 60 billion bytes, or about 56
gigabytes when measured in the binary manner.
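For anyone who wants to check the label on their own drive, the conversion can be sketched in a few lines of Python. This assumes the manufacturer's gigabyte is exactly one billion bytes, as described above. The function name is hypothetical.

```python
GIB = 2 ** 30  # one binary gigabyte (gibibyte), in bytes


def advertised_to_binary_gb(advertised_gb):
    """Convert a capacity in decimal gigabytes to binary gigabytes."""
    total_bytes = advertised_gb * 10 ** 9  # decimal giga = one billion
    return total_bytes / GIB


# An advertised 60 gigabyte disk, measured the binary way.
print(f"{advertised_to_binary_gb(60):.1f}")
```

The same function works for any advertised size; only the figure passed in changes.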

To quote disk manufacturer Maxtor's website: "It is an interesting
coincidence that every 10th power of two is approximately equal to every third
power of 10. This has resulted in two different definitions of these numbering
systems. This situation can cause confusion, especially in respect to hard disk
size measurements, where both measurements are often used. Maxtor defines one
megabyte as one million (1,000,000) bytes and one gigabyte as one billion
(1,000,000,000) bytes. All major disk drive manufacturers employ this
definition."

As indeed they do. My apologies for the error.

So the disk manufacturers use decimal, which makes sense, because it means
the drive seems bigger than it would if they used binary.

Problem is, RAM size is just about always measured in binary, and so
sometimes data that you think should fit onto your disk drive won't.

In common usage 64 megabytes of RAM means 67,108,864 bytes, not 64,000,000
bytes. This situation is complicated by the fact that there is no consistency
in how different system utilities such as FDISK and CHKDSK, and different
operating systems, treat the prefixes.
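To see how the same number of bytes can be reported two ways, here is a small Python sketch. It is not how FDISK, CHKDSK or any particular operating system does it; the function name and the output wording are my own assumptions, purely for illustration.

```python
def describe(num_bytes):
    """Return one byte count expressed in decimal and binary megabytes."""
    decimal_mb = num_bytes / 10 ** 6  # mega as one million
    binary_mb = num_bytes / 2 ** 20   # mega as 1,048,576
    return f"{decimal_mb:.2f} MB (decimal) = {binary_mb:.2f} MiB (binary)"


# 64 binary megabytes of RAM is 67,108,864 bytes.
print(describe(64 * 2 ** 20))
```

A utility that reports in one convention and a utility that reports in the other will disagree about the same chip or the same disk, and neither is strictly wrong.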

The problems do not stop there, as some people have pointed out. The
confusion goes beyond data storage to data communications.

ISPs often charge by the amount of data downloaded. But, like most people in
data communications, ISPs measure this in decimal, so you may be getting
substantially less than you think you are paying for.

Same with modem speeds - a 56.6k modem works at 56,600 bits per second, not
57,958.

"Most people would say that the PCI bus has a maximum theoretical bandwidth
of 133.3 Mbytes/sec, because it is four bytes wide and runs at 33.3 MHz. The
problem here is that the M in MHz is 1,000,000 but the M in Mbytes/sec is
1,048,576. So the bandwidth of the PCI bus is more properly stated as 127.2
Mbytes/second (four times 33,333,333 divided by 1,048,576)."
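The arithmetic in that quotation can be reproduced in a few lines of Python. This sketch uses only the figures quoted: a bus four bytes wide and a clock of 33,333,333 cycles per second. The variable names are mine.

```python
BUS_WIDTH_BYTES = 4      # the PCI bus is four bytes wide
CLOCK_HZ = 33_333_333    # 33.3 MHz, where M is the decimal million

bytes_per_second = BUS_WIDTH_BYTES * CLOCK_HZ

# The same throughput, divided by each meaning of "mega".
decimal_rate = bytes_per_second / 1_000_000
binary_rate = bytes_per_second / 1_048_576

print(f"decimal: {decimal_rate:.1f} Mbytes/sec")
print(f"binary:  {binary_rate:.1f} Mbytes/sec")
```

The two lines differ by a little under 5 per cent, which is the gap you would expect at the mega level.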

These problems are far from trivial. As I mentioned, the disparity grows as
the prefixes grow larger. We are already at the stage where the entry level for
a hard disk drive is 80 gigabytes. Larger disk drives are very common, and we
are also moving towards larger and larger multiples in data transmission.

We are just a few years away from terabyte storage capacities on our PCs and
laptops. These capacities were once the preserve of enterprise data centres but,
with the massive growth in multimedia, including the storage of high-resolution
video, terabytes and petabytes - and exabytes - will soon become our daily
fare.

When that happens, the distinction between binary and decimal prefixes will
start to be very significant. The disparity is about 15 per cent with
exabytes.

There does not seem to be any way around this problem, other than to be aware
of its existence.

While I applaud attempts to use prefixes such as kibi and tebi, I can't see
them taking off. We will just have to spell out longer and longer numbers.