Posted
by
timothy
on Monday February 11, 2013 @05:50AM
from the internal-struggle dept.

An anonymous reader writes "When it comes to RAM, as every geek knows, 1 GB does not mean 1 billion bytes; it means 2**30 (1,073,741,824) bytes. However, several decades ago "they" decided that GB, MB, and KB would be interpreted differently when it comes to disk drives; 1 GB means exactly 1 billion bytes. Ed Bott points out that Microsoft's marketers and Windows kernel developers aren't on the same page when it comes to these units: the marketers use the more generous decimal interpretation, while Windows measures and reports capacity using the binary (2**30) measure. Careful customers who bother to check what they've got have been known to get peeved by the discrepancy."
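The discrepancy the summary describes is easy to quantify; here's a quick sketch (plain Python, with a made-up 500 GB drive as the example):

```python
# A drive marketed as "500 GB" uses the decimal definition.
marketed_bytes = 500 * 10**9

# Windows divides by 2**30 but still labels the result "GB".
reported_gb = marketed_bytes / 2**30

print(f"{reported_gb:.2f} GB")  # roughly 465.66 "GB" in Explorer
```

Same number of bytes, two labels about 7% apart, and a peeved customer.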

Article is a forum post from 2008 talking about things we knew before then.

Why was this posted?

Recently I decided there were now so many of these ludicrous stories, it was a waste of time to post regularly. As of this very second, I'm beginning to wonder if it's even worth the bother to parse the daily Slashdot headlines.

It's been on deferral until enough time has passed since Steve Jobs uploaded himself to the iCloud. The author mentioned Apple, so it got tagged. Homeland Editorial was a little slow in picking up the intel. We assure you that we have fired the editor responsible for this. Also, the person who wrote that last sentence has also been fired, as well as his manager, his manager's manager, and the entire division. We take redundancy and outdated news very seriously here at NuSlash. We take redundancy and outdated news very seriously here at NuSlash.

If the computer industry can't adapt to counting the way the rest of the world does, that's our problem. We should be pointing at whoever originally decided to usurp the already established term Kilo to mean 1024 and slapping them upside the head. Anything less is pure arrogance on our part.

Why should they? It's not like enough of us are rushing to use those terms right now. The entire computer industry's historic dumb acceptance of an erroneously redefined numerical term makes it our problem to fix, no-one else's.

Does anyone else remember a CS class when a lecturer papered over the cracks of this particular issue?

Back when I first studied c.eng, it was drummed into us that base 2 units were ONLY to be used for references to perfectly binarily addressable devices. RAM as we have it today with word lengths that are also powers of two is one. CPU Registers another. Some displays at the time were as well, although no longer, and the origin there was RAM based.

All else, such as file sizes, card, tape or disk storage, network bandwidth, logic frequency and the like were strictly Base 10.

Then small systems crept in and base 2 assumptions began to spread. The 1980s brought hard drives marketed with base 2 units. In the 1990s people started to believe a 10MHz CPU was 1024*1024*10 Hz.

Now this century it's not uncommon to find self-professed geeks calculating say, theoretical throughputs based on the idea their gig-ethernet is 1073741824 bits per second, or that their CPU/RAM speeds use similar numbers in GHz.
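Gigabit Ethernet is defined as exactly 10^9 bits per second, so the "binary gigabit" assumption invents bandwidth out of thin air; a quick check:

```python
line_rate = 10**9        # gigabit Ethernet: exactly 1e9 bits/s by definition
binary_gig = 2**30       # the mistaken "binary gigabit" some geeks assume

# The error: over 73 million bits per second of imaginary bandwidth.
print(binary_gig - line_rate)                       # 73741824
print(f"{100 * (binary_gig / line_rate - 1):.1f}%") # about 7.4% overstated
```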

Not all of us like kilo to mean 1024. I don't. However, there's a good argument for getting the world to switch to base 8 or 16 for the basic number system. That would be trickier to achieve, but we would all be happier in the end, and everything would be consistent (I do like base 12 however, sigh...).

Well for one reason, the hard drive manufacturers have the ISO standards on their side.

There are well defined (though, unfortunately, rather ugly) prefixes for 2^10, 2^20, 2^30, 2^40: kibi, mebi, gibi, tebi. If people want to use base 2 quantities, then use the right unit and there isn't any confusion. Apple does the right thing in reporting sizes in base-10 units; GNOME also does the right thing and uses base-10 units, so I don't think you can say that Linux is in the same situation as Windows here.

Likewise I say "true GB" for 1024-based and "salesman's GB" for 1000-based. Because the 1024-based units ARE the true units, and the 1000-based units WERE created just to make hard drives look bigger than they actually were.

Agreed completely. The rest of the world refuses to acknowledge these units. What is more interesting is why the hard drive manufacturers, who will surely be aware of these standardized units, don't mention them anywhere.

The United States of America can't convert to metric and SI units so it's not reasonable that they could convert to any standard. It is a country full of dumb arses (or, asses because they also cannot spell.)

Exactly. Consistency is more important than tradition in the long run. 1MB = 1 million bytes should become standard. The GP needs to live with that, as it is in fact better. Besides, we have kibibytes and mebibytes if you still need the older, broken metrics.

The problem is that a KB, where it equals 1024 bytes and is sometimes erroneously portrayed as kB, is a compound unit, as it should be because a byte is not an SI unit, derived or otherwise. It was "defined" first and those who used it knew exactly what it meant.

Contrast with a kB that equals 1000 bytes (usually portrayed erroneously as KB) which is a SI multiplier prefixed non-SI unit.

Basically it comes down to the case of the first letter.

The IEC had to paper over the cracks the hard drive manufacturers created.

The problem came from where problems often do: programmer laziness. Programmers of early computers called the 1024-byte blocks computers used "kilo" since it was "pretty close" to 1000. Since space and power were limited, it was considered unnecessary overhead to actually convert it to base 10 for display.

Then as things went on it stuck, and the error kept getting bigger with increasing size.

Frankly I think OSes need to get with it and just start using base 10 prefixes for drive space.
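The laziness argument above fits in two lines of code: on early hardware a ten-bit right shift was a single cheap instruction, while a true divide by 1000 was expensive. A sketch, using an arbitrary 50 MiB file size:

```python
size_bytes = 52428800   # a 50 MiB file, for illustration

# Binary "kilo": a single shift instruction on early hardware.
kib = size_bytes >> 10  # identical to size_bytes // 1024

# Decimal kilo: a genuine divide-by-1000.
kb = size_bytes // 1000

print(kib, kb)  # 51200 vs 52428 -- the numbers drift apart by ~2.4%
```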

The only reason the idea that 1GB = 1GiB ever caught on is because RAM really relies on a power-of-2 address bus, so it's always very closely tied into powers of 2, and it's convenient to round that to its nearest decimal equivalent in order to talk about it succinctly.

There was never any reason to do it for anything else, and hard disk manufacturers pretty much never used GiB when they meant GB.

And even the venerable 3.5" floppy was an unholy mixture of KB and KiB multiplied together.
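That "unholy mixture" is worth spelling out: the "1.44 MB" floppy is 1,440 decimal-kilo units of 1,024 bytes each, so it is neither 1.44 decimal megabytes nor 1.44 mebibytes. A quick check:

```python
# The "1.44 MB" 3.5" floppy: 1.44 * 1000 units of 1024 bytes each.
floppy_bytes = 1440 * 1024       # 1,474,560 bytes

print(floppy_bytes / 10**6)      # 1.47456 decimal MB
print(floppy_bytes / 2**20)      # 1.40625 MiB
```

So "1.44 MB" is true in no consistent unit system at all.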

Hard drives used, in the long ago good old days, to be measured in base 2 sizes. Back in the days of 20 Meg and 40 Meg and 80 Meg, they were measured using base 2 and so buying an 80 Meg HD, you got 80 "computer" Megs. This was also back in the days of 10+ different HD makers (lots of competition).

Then at some point an idiot marketer was looking for any edge to make his/her company's product look different from the competition. And they discovered that if instead of dividing the count of bytes by 1024*1024 they instead divided by 1000*1000, the result was a larger number. I.e., a 200 Meg hard drive could now be advertised as 209 Meg. Since 209 was larger than 200, they felt this gave them a "one-up" on the other guys. And once the first idiot marketer did this, the rest of the idiot marketers soon followed suit, because they could not have their own products looking "smaller" on the shelf, and the result is that now HD's are the only computer component that is advertised in base 10 sizes.

The idiot marketers are also why when you go to buy a hard disk that is only about 15 cubic inches for the disk itself you find the box to be about 5 cubic feet on the store shelf. Not all of that 5 cubic feet is for "padding". 99% of it is to make the box look larger on the shelf.

Well count me then as one of those idiot marketers, because if I was in their position, I would have been proud to do the same, not for the money, but because it's simply BETTER to be consistent with the rest of the scientific world. We're behind by saying 1KB = 1024 bytes, not them.

I made sure my own calc determines "1kb as bytes" = 1000 bytes, and that's how it should be.

While far from definitive, this would seem to suggest that the first reference equating 1k with 1024 was an article in 1964 by Gene Amdahl, followed by a similar assumption of equivalence in a 1965 article by MV Wilkes. I think it's safe to say these references pre-date those hard drives you mention.

This would suggest that computer science did originally adopt the standard definitions of kilo etc. but then started to deviate from them in the mid-60's for the sake of ease.

Back in about 1994 I was at a Microsoft conference and they were giving away free copies of Windows NT to anyone who could answer how much NT could address (the address bus was 32 bits at the time). I answered correctly with the answer 2^32 bytes and got my free copy of NT (still in a box somewhere in storage). So at least at that conference I was at MS recognised that the correct quantity was 2^32. So, something seems to be wrong with this article.

RAM: if they make a bigger module, they usually just double the number of chips on the module -> 2^x. Another reason here is that you get a nice address, which ends with all zeros (or fills the complete address field), when your maximum address is a power of two. Hard drives: they are produced independent of such considerations; you have like 100 GB, 500 GB, 3 TB... none of them fit a nice 2^x scheme anyway. That's the reason they are produced in GB and not in GiB units.

Regardless, the number that should be reported when describing capacity should be the base 2 number when talking about RAM - as RAM is by its nature a base 2 capacity mechanism. The capacity can be described exactly this way.

But for hard drives, where the storage is in effect linear across multiple cylinders, heads, etc, is base 2 what should be used - ignoring historical usage? Well, block sizes are in powers of two... but we don't have a power of two number of blocks. We therefore don't have a capacity number that can be described totally accurately using the base 2 numbering system.

And SSDs? Due to bad blocks, and reserved storage area, we are turning something that was a base 2 capacity memory system into something with less capacity.

And what about the files themselves? They're not powers of two in size, and indeed they waste capacity at the end of the file because the basic unit of storage in a drive is the formatted block size (512 bytes, 4KB, etc). Maybe block based systems should be advertised as offering "2 Billion Formatted Blocks* (* 512 byte blocks)"! In addition, that file is likely compressed in some way, so you can't assume it will use the same space in memory when loaded.

A strong argument is that because computer RAM is xGB, meaning x * 2^30 bytes, then we should use the same unit for other things in a computer that are expressed in GB, because in the end it is clearer to the user who can compare the two things, e.g., "the computer has 500 times more HD than RAM".

This is what pisses me off about using, say, Ubuntu and Windows 7 on the same computer. Several years ago Ubuntu (and hence derivatives like Linux Mint) changed their units policy such that 1 KB = 1000 bytes, not 1024 bytes as Windows still does. Hence file sizes will appear differently between the two systems, which is terrifying if you're manipulating data between such operating systems.

The basic issue is Marketing Speak. Those people don't understand how to use the Geek Speak values of 1024, 1048576, and 1073741824. They are going to use 1000, 1000000, and 1000000000. Just understand that and live with it. I do. As long as the sectors come across as sizes 512 and 4096 (instead of 500 and 4000), the device can work. I remember working with mainframes and having sector sizes of 800 on some drives.

I don't use this KiB, MiB, and GiB crap in my software. The standards group that made that doesn't have oversight on software. It was intended for hardware and marketing, which hardly ever uses it. I have code for doing number conversion with metric-LIKE suffixes, but that specifically needs a single letter, so that's just gonna be the way it is. Use it where the binary-ish values apply and don't use it where you need powers of ten.

It's all about knowing which way to interpret the numbers. For disk drives I know they are talking about k=1000, M=1000000, and G=1000000000.

I saw this one pop up in my RSS feed and thought maybe /. was broken. Then I went through the comments and realized it wasn't a repost of something old, nor was it really anything new. It was something in between.

I don't know when /. devolved into what it is today, as I've been reading for years now. It's always had a bit of an anti-MS twist to it, and while I didn't always agree with the article bias, I could see how it could be used as constructive criticism for not just MS, but for other companies as well. When you're the 800-lb gorilla, people notice you. When you're the 800-lb gorilla and you tie your shoes together and fall, other people tend to not tie their shoes together.

This post doesn't really fall into a constructive criticism category, though. It's pure, unadulterated, trolling. I mean the source is a joke. It has to be. The "author" of that blog clearly understands computers. Ed's written over 30 books on software use. He's just griping about something everyone already understands. A slow news day. It happens.

But why, oh why, do the editors here feel the need to pick it up and make it front-page news along with news of Ozone holes, Corn shortages and Social Engineering your way into the Super Bowl? Those are nerdy news stories. This... is not. If you wanted to fill up the front page with stories like this, you should be including the following gems:

I don't expect this post to actually get anything done, but I'm making it just the same. Something has to change around here. While I know I'm just a drop in the bucket (just like I am with AT&T, Verizon, Comcast and T-Mobile...who I loathe), I'm out. My Excellent Karma, ad-viewing eyes, and borderline nostalgic insightfulness are out. I don't intend on letting the door hit me on the way out, either.

k, M, G, etc. are defined SI shorthands for 10^3, 10^6, 10^9. They had been defined that way long (as in computer age) before computers had (that many) bits. However, when the information age broke out, computer technicians also required the shortened notation. They decided to synchronize their system with 2^10, 2^20 etc. because it is close to the SI figures (at least for kilo and mega). That was good enough. And it was so much easier to shift values by 10 bits. However, it was in violation of the real SI meaning. To solve that issue, a new terminology was proposed where k, M, G, T mean 10^3, 10^6, 10^9, 10^12 and Ki, Mi, Gi, Ti are 2^10, 2^20, 2^30, 2^40. The idea was to fix software in short time; people should use these *i prefixes until they are able to count and divide their numbers according to SI units.
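The two prefix families described above, and how fast they drift apart, can be tabulated in a few lines:

```python
SI  = {"k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12}
IEC = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}

# The divergence grows with every step: ~2.4% at kilo, ~9.95% at tera.
for (p, dec), (bp, bin_) in zip(SI.items(), IEC.items()):
    print(f"{bp} is {100 * (bin_ / dec - 1):.2f}% larger than {p}")
```

Which is why the "missing" space on a drive gets worse with every generation: a "kilobyte" dispute was 24 bytes, a "terabyte" dispute is nearly a hundred gigabytes.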

Slashdot is not an education site, and this is still news, because nothing has changed. You simply don't hear about it that often because, you're right, most people know about this marketing[decimal] vs real[binary] measured value, and the whole lies/justifications around it. This is simply a new spin on things... [I quite like the way the heading has chosen to show marketed:real] Microsoft [perhaps unfairly; everyone else does it] because they represent the real values within the OS, and lie about the real values in the marketing.

Actually, when it comes to correctness: the International System of Units defines kilo-, mega- and giga- as powers of 10, not powers of 2. I think it is much clearer for a user to define a megabyte as a million bytes. How memory is handled inside a computer is something developers care about; no user should be bothered with it. So all in all I agree with the marketing people, albeit for different reasons.

It's not that common, but since I've been using Linux since I was a kid, KiB MiB and GiB are my base mental units of data. It makes no difference whether or not they're somehow "better" or "worse". It's just what I'm used to, and the base-2 convention seems to make more sense in the world of computing than base-10 numbers. Computers only work with base 2 numbers directly in any case.

I'd pretty much agree to the "we should use base 2, computers are base 2"

However, the article makes a bit of an overstatement. This is not a kernel dweeb vs. marketing dweeb issue. This is a software developer vs. hardware developer issue.

Software developer: Base 2 is easier to work with. We use base 2 (or more precisely, the base (2^10) derivative).
Hardware developer: If we use base 10 (or more precisely, the base (10^3) derivative) our drives appear larger.

None of what you are talking about has anything to do with what I said. I am talking about the measurement of things, not the things themselves.

Memory components are power-of-two boundaries in size. This is necessary because if they were other than a power-of-two in size, math would have to be performed on each memory access. For instance, if you had memory chips that were 1000 bytes in size, and you wanted to access byte 1024, you would have to perform a calculation to find that the byte is at location 24 in the second chip. With binary sizes however, all you need to do is use the address lines to directly access the correct location in the correct chip. Also note that the word-size of the data does not matter: you could return 1 bit, 8 bits, 10 bits, anything at all. What matters is that the number of 'things' (whatever size of the 'thing' itself is) is always a power of two.
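The parent's point about address lines can be shown directly: with power-of-two chip sizes, chip select and offset are just bit fields of the address, with no arithmetic at all. A sketch using a hypothetical 1 KiB chip size:

```python
CHIP_BITS = 10  # each (hypothetical) chip holds 2**10 = 1024 bytes

def decode(address):
    """Split a flat address into (chip index, offset) using only bit operations."""
    chip = address >> CHIP_BITS                 # high bits select the chip
    offset = address & ((1 << CHIP_BITS) - 1)   # low bits index within it
    return chip, offset

# Byte 1024 is byte 0 of chip 1 -- no division needed.
print(decode(1024))        # (1, 0)

# With 1000-byte chips you would need a real divide/modulo instead:
print(divmod(1024, 1000))  # (1, 24), matching the parent's example
```

In hardware, the shift and mask aren't even instructions: they're just which wires of the address bus go where.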

Network speeds are not dependent in the slightest on a power-of-two, regardless of the data being transported. There is absolutely no reason to say that a network that can transfer 1024 bits per second is in any way better or more natural than one that can transfer 1000 bps or one that can transfer 1100 bps. There is no reason to assume that a 'kilobit per second' is anything other than 1000 bps. And if you change the measurement to count bytes instead of bits, a network can transfer 137.5 Bps as easily as it can transfer 1100 bps, or 1.1Kbps.

Hard disk sizes are like network speeds: there is no inherent power-of-two to their size. There is no reason why a disk could not be made to hold exactly 1000000 bytes (excluding the fact that you would have a partial sector). Therefore, trying to force some power-of-two based prefix on those sizes is just silly.

"None of what you are talking about has anything to do with what I said. I am talking about the measurement of things, not the things themselves."Everything I am saying has to do with what you said. Specifically as to why you are wrong that Storage and Network should be rated in something other than power of 2 units. I can only assume you've never worked with assembly/machine code especially for smaller chips.

"With binary sizes however, all you need to do is use the address lines to directly access the corr

and yes, why is this geek news when anyone with either a passing interest, or who has ever done a wiki crawl, will know this?

Indeed, and since when did it matter what Microsoft does on /.? Stuff on /. seems to get less and less "nerd" (figuring out how stuff works / hacking together solutions) and more and more "geek" (the "tech hipster" buying the latest stuff, preferably before it is cool).

No one ever uses that terminology in the real world (well, maybe a handful of standards-crazy Linux developers, but that's about it). There was an attempt to shove it down everyone's throat on Wikipedia a couple years ago and it was decisively beaten back. No one wanted this baby-talk in their articles. The Commodore 64 didn't have 64 "kibibytes" of RAM (I feel silly even typing that), it had 64 KILOBYTES of RAM. That's how prefixes have always been used in the IT world and always will be. The International System of Units can go to hell.

That's how prefixes have always been used in the IT world and always will be. The International System of Units can go to hell.

Absolutely wrong. The use of kB to mean 1,024 bytes started around 1960, and only for memory. Bandwidth has always been, and is still, measured in powers of 10, not 2. Disk space was measured in powers of 10 until Microsoft came along and muddled the issue. Disk manufacturers still use powers of 10, like they always have. Software is a mixed bag, with some developers using powers of 10 and others using powers of 2.

I think it's safe to say that most people who have been in the scene from the beginning think the "correct" SI definition of kilo can also fuck off when it comes to computers. As can kibi, mebi and friends. It had been accepted from the beginning that "kilo" meant something just a little different when it came to describing bytes. I accepted that. Everyone accepted that. There was no problem. Even in academic circles, there were no issues.

The problem came with the storage industry and their pious "oh, but that's not what SI says the units mean". If you think that conforming to strict SI is the reason they made their change, then I'd suggest you not accept kool-aid from strangers. Ever. It was marketing greed, nothing more.

However, while I think kibi, mebi and friends can fall down a deep dark hole, I actually don't mind using their unit symbols. At least that way there is no misunderstanding in writing what is meant, and the trickle-down effect from academic papers, where it's vital to specify which value is meant, to more lay writings can occur without changing the unit symbols. But I do not now, nor will I ever, read 500MiB as "five hundred mebibytes".

No. The "correct thing" would not be to confuse the consumer by labeling their sizes as 100GB, 500GB, etc. Deceit is never "the right thing to do" and that is exactly what they did. People see "GB" and think gigabyte; for it to mean anything else is intentionally confusing/deceitful. Should they be forced to use gigabytes, megabytes, etc? No, but they should have the decency to call their new decimal-based measurements something else entirely.

The problem arises when the two are used interchangeably. I don't care if a HDD's packaging expresses the capacity in powers of 10, as long as it's clear there's a difference between KiB and KB.
A much bigger problem is manufacturers having their devices marketed with 64GB of storage when only half of that amount is available for the user due to the other half being taken up by the OS and pre-installed apps.

Please remind me: How many bits are there in an SI byte? Is it 10, 100, or 1000?

There is no "byte" in the SI. The question is therefore irrelevant. There's an IEC standard containing prefixes for 2^10, 2^100, 2^1000 etc, and those prefixes are kibi-, mibi-, gibi- and so on. The SI officially references them, even if they're not strictly part of it.

If your byte contains 8 bits, you are either using the binary sizes, or you are mixing things to fool the customer.

What's the relationship between the number of bits in a byte being 8 and 2 being the base for the multiples of the byte?
Moreover, deciding that "a byte" is *the* unit of the smallest addressable memory cell of machines is an oversimplification, because there were in the past, and there might be in the future, machines having a word size which is not even a power of two. If anything, one might think that using powers of two to "size" memory comes from the fact that the widths of the ranges addressable by a bus made of binary wires are by nature powers of two - but that has nothing to do with the fact that the addressed items are bytes, 37-bit words or whatever.

Hard disks are memory, and counting that memory in powers of two makes no sense for them, since they store bits in very strange patterns, therefore hard disk manufacturers never adopted it. Computer networks transfer memory, and counting that memory in powers of two makes no sense, especially since they often transfer bits and not bytes, hence network designers prefer using bits and their decimal multiples rather than their binary counterparts, and they've always done so.

If you broaden your vision, you'll see that it's transistor-based memory to be "the exception". Therefore the onus should be on operators of that field to adopt the standard binary prefixes, as ugly as they may sound (and no I don't like them either), in order to avoid ambiguity with the terms used by the rest of the world.

Memory is allocated in increments of at least 4096 bytes and a maximum of 1,073,741,824 bytes. Please explain to me how a 1GB page cannot fit into 1GB of memory and why we would allow for the abomination of 244,140.625 4KB pages in 1GB of memory. How does the computer even handle fractional pages, is that even defined?
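The parent's arithmetic checks out; a quick sketch:

```python
PAGE = 4096  # a standard 4 KiB page

# Binary GB: an exact whole number of pages.
binary_gb = 2**30
print(binary_gb // PAGE)   # 262144 pages, no remainder

# Decimal GB: a fractional page count.
decimal_gb = 10**9
print(decimal_gb / PAGE)   # 244140.625 pages
```

Of course nothing actually handles fractional pages: an allocator working over a decimal-sized region would simply round down and leave the remainder unused, which is the parent's point about why memory sizes are powers of two in the first place.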

Screw the decimal system for computers. Using decimal for computers is as annoying as trying to figure out how many degrees Fahrenheit one pound of water will increase if you apply one joule.

A byte doesn't have to contain a multiple of 10 bits, it's just a base-unit, like a meter, a gram or a watt. The number of bits inside a byte is also something only developers care about, for a user the smallest unit he ever has to deal with is a byte.

"they" defined "kilo", "mega" and "giga" to mean 10^3, 10^6 and 10^9 long before binary computers even existed. The only reason 2^10, 20 and 30 have ever been used is because a coder working on an ancient system decided that 1 clock cycle for a right shift 10 was better than a few hundred odd clock cycles for a divide by 1000 in software, especially when trying to display the sizes of a good number of files on a system clocked at only a few kil

Welcome to the world of HDDs, where pretty much EVERY HDD out there is marketed as having rounded storage space (e.g. 1 TB) when that actually is 931.3xx GiB. This was news 10 years ago, when people started to see the difference (why is my 40 GB HDD smaller than advertised?).
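The mismatch is pure unit conversion; a sketch:

```python
marketed = 1 * 10**12      # "1 TB" on the box: a trillion decimal bytes
gib = marketed / 2**30     # what an OS reporting in binary units shows

print(f"{gib:.2f} GiB")    # about 931.32 GiB -- nothing is "missing"
```

Every byte is still there; only the divisor changed.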

My thoughts exactly. This is an article appropriate for The Today Show or something where you are informing the illiterate masses, not something worthy of posting on Slashdot.

BTW, this reminds me - a couple of weeks ago on the Today show, they were talking about new cool computer terms. One they were talking about was "animated GIFs". I felt like I jumped into a time machine and went back 20 years into the past.


It's not the news in itself - it's the discussion which comes from it. I'm firmly in the camp that we should swallow our collective (ahem) 'pride', and realize that it's actually a good thing to standardize and be consistent with the rest of the scientific world in saying that yes, 1kB = 1000 bytes.

Failing a switch to a base-16 number system (which I think is an admirable goal for humanity, or maybe base 12), that's how it should stay.

You are obviously not a computer engineer. Try doing memory allocations in increments of base 10. Anyone who understands computers knows that it isn't just "annoying", it is something that doesn't work correctly. Forcing base 10 onto a computer that works in base 2 is a logical fallacy.

What do the units you use in memory allocation have to do with how you define kilobyte? The computer doesn't care if you call 1024 bytes a kilobyte or a foomboozlebyte, so what possible difference can it make? Nor does the computer care whether what you call a kilobyte is 1000 bytes or 1024 bytes or 27 bytes.

Do you do malloc(1000) to get 1024 bytes allocated on your weird computer or something? If not, then how does 1 kilobyte == 1000 bytes stop you from allocating memory by powers of 2? Surely your logic has to be doing that calculation already and really doesn't care what you call 2^10 bytes.

...and it's this exact denial of responsibility repeated loud enough for long enough that makes all us CS people act like Republicans whenever this subject comes up again. Why? See thread "Blame The Marketers".