A new eco-friendly, paper-based storage system is said to be capable of storing 90 to 450 GB of data on a single disc. The "Rainbow Versatile Disc" (RVD) uses geometric shapes such as circles, squares and triangles (instead of 0s and 1s), combined with various colours, to preserve the data in images. As a result, it is claimed that 432 pages of content can be compacted onto a 4-inch-square piece of paper.

Since you point to two discussions that go to great lengths picking the claims apart, maybe a slightly more critical tone would be good for the summary. Any CS undergrad should be able to do the math to figure out that this is just not possible. Also, if you look at the photo, you can see a Windows taskbar and a window frame in the picture. So the guy is pointing at a projection, not a printout, and is probably looking at some genetic visualization or some other complex data analysis.

Somebody apparently found the picture and decided to have some fun figuring out a new caption. It's a bit sad to see how much time is wasted discussing such an obvious hoax.

Your claim that "Any CS undergrad should be able to do the math to figure out that this is just not possible" is patently false.

While this may be a hoax, it absolutely is not impossible.

And I say this as a computer science graduate (Virginia Tech, 1997).

One pixel does not have to equal one bit. One pixel can hold more than one bit. One merely needs to look at a ZIP file to realize this -- when a ZIP of a TXT file is 10% of the original size, each 0 or 1 in the compressed stream represents, on average, 10 bits of the original.

It's not 1 to 1. If all bits were 1 to 1, there would be no such thing as compression.
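The ZIP argument can be checked directly. The sketch below uses Python's `zlib` (DEFLATE, the same compression family ZIP normally uses) as a stand-in for a ZIP tool; the sample inputs are made up for illustration. It shows that each output bit stands for many input bits only when the input is redundant, while random bytes do not shrink at all.

```python
import os
import zlib

def compression_ratio(data: bytes) -> float:
    """Compressed size divided by original size (lower means more compression)."""
    return len(zlib.compress(data, 9)) / len(data)

# Highly redundant input: the same sentence repeated many times (illustrative).
redundant = b"the quick brown fox jumps over the lazy dog. " * 1000
# Random bytes from the OS: essentially no redundancy to exploit.
random_bytes = os.urandom(len(redundant))

for name, data in [("redundant text", redundant), ("random bytes", random_bytes)]:
    ratio = compression_ratio(data)
    print(f"{name}: {len(data)} -> {len(zlib.compress(data, 9))} bytes "
          f"({ratio:.1%}), about {1 / ratio:.1f} input bits per output bit")
```

The ratio depends on the data, not on the medium: the same compressor that shrinks repetitive text to a few percent leaves random bytes at slightly over 100%.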

Now, it would take a long study to find out whether this is a hoax, but saying it's impossible is just plain wrong.

It doesn't take long at all. You can very quickly get an estimate of how much information you can store on a sheet of paper with current technology. You will then realize that this is several orders of magnitude below what this guy claims, so it is obvious that it is not possible. Take a look at the links above, and you will see that all you need is some very basic math and a few numbers.
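A back-of-envelope version of that estimate might look like the sketch below. Every number is an illustrative assumption rather than a measured figure: a 600 dpi print/scan resolution, a generous 8 bits per dot (256 reliably distinguishable colours), a 10x compression allowance, and the low 90 GB end of the claim applied to the 4-inch square.

```python
# Back-of-envelope check of the RVD claim. All values are assumptions.
SIDE_INCHES = 4            # the "4-inch-square piece of paper" from the claim
DOTS_PER_INCH = 600        # assumed reliable print/scan resolution
BITS_PER_DOT = 8           # generous: 256 reliably distinguishable colours per dot
COMPRESSION_FACTOR = 10    # the "multiply your numbers by ten" allowance
CLAIMED_BYTES = 90e9       # low end of the claimed 90 to 450 GB

dots = (SIDE_INCHES * DOTS_PER_INCH) ** 2
raw_bytes = dots * BITS_PER_DOT // 8
with_compression = raw_bytes * COMPRESSION_FACTOR

print(f"dots on the sheet:     {dots:,}")
print(f"raw capacity:          {raw_bytes / 1e6:.2f} MB")
print(f"with 10x compression:  {with_compression / 1e6:.1f} MB")
print(f"shortfall vs 90 GB:    {CLAIMED_BYTES / with_compression:,.0f}x")
```

Even with these generous assumptions the sheet holds a few megabytes, and the 10x allowance still leaves it more than three orders of magnitude short of 90 GB.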

And btw, this is not about compression; he claims to be able to store the information. And even if it were, multiply your numbers by ten and you will still be several orders of magnitude shy of the claim.

clintjcl, I have a penny that holds many terabytes of data with compression. Here is the decompression algorithm: if heads, Google’s database; if tails, Yahoo’s.

Wed 29 Nov 2006 at 3:30 AM

smc

BTW, it is disappointing to see this rubbish posted to a site whose name begins with ‘information’.

Wed 29 Nov 2006 at 3:36 AM

smc

Mr. Kosara is correct. ClintJCL, there are well-defined means of determining how much lossless compression a piece of data can undergo. If this new system can beat that, it's going to overturn a lot of what we know about information theory. But, of course, it can't, and it won't (cue unscientific enthusiasts insisting that iconoclasts' occasional success at breaking new ground means we have to listen to any unserious crackpot).

Anyway, this isn't about compression -- we know how to do compression. This is reputed to be a storage system. But there's no way that it can deliver the benefits it claims. I have a longer explanation here.

‘There are well-defined means of determining how much lossless compression a piece of data can undergo’
No, there aren’t; that doesn’t even make sense.

Wed 29 Nov 2006 at 4:16 AM

smc

Yes it does, smc! ;) Check out information entropy and basic information theory. You can't encode information in fewer bits than its entropy (which is commonly measured in bits); that is the theoretical limit. Most compression algorithms don't even come close to that, and they certainly can't exceed it. But that's not even the topic here!

The entropy isn’t for a ‘piece of data’, which I took to mean a particular bit string, but for a set of bit strings, the canonical example being the set of all bit strings that can comprise a message sent over a wire (if any bit string could comprise a message, the messages couldn’t be compressed at all). To give Tom the benefit of the doubt, I'll assume he meant a piece of data as it pertains to the set of possible pieces of data it was drawn from. But that makes little sense when discussing a general-purpose storage device.
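The parenthetical point (if every bit string is a possible message, none can be compressed across the board) is the standard counting argument. A small sketch: there are 2^n bit strings of length n, but only 2^n - 1 strings strictly shorter than n (counting the empty string), so a lossless compressor cannot map every n-bit input to a shorter output.

```python
def count_strings(length: int) -> int:
    """Number of distinct bit strings of exactly this length."""
    return 2 ** length

def count_shorter(length: int) -> int:
    """Number of distinct bit strings strictly shorter than `length`, including the empty one."""
    return sum(2 ** k for k in range(length))  # equals 2**length - 1

for n in range(1, 9):
    print(f"n={n}: {count_strings(n)} inputs, only {count_shorter(n)} shorter outputs")
```

Because there is always one fewer shorter output than there are inputs, at least one n-bit input must map to an output of n bits or more.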

Wed 29 Nov 2006 at 4:42 AM

smc

You can do this both for a particular file and for a class of data. For a file, you just count the number of occurrences of each byte and apply the formula. That will give you an upper bound, since knowing more about the data will let you apply a better metric (and that will yield a lower number of bits). But it's a start.
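The byte-counting recipe above is the order-0 Shannon entropy, H = -Σ p_i log2 p_i, where p_i is the fraction of bytes with value i. A minimal sketch, assuming the file is already loaded as a `bytes` object (the sample input is made up):

```python
import math
from collections import Counter

def order0_entropy_bits(data: bytes) -> float:
    """Shannon entropy in bits per byte, using only byte frequencies (order-0 model)."""
    if not data:
        return 0.0
    counts = Counter(data)
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def order0_size_estimate(data: bytes) -> float:
    """Size in bytes implied by the order-0 entropy of the data."""
    return order0_entropy_bits(data) * len(data) / 8

sample = b"abracadabra" * 100
print(f"{order0_entropy_bits(sample):.3f} bits/byte, "
      f"~{order0_size_estimate(sample):.0f} of {len(sample)} bytes")
```

For the sample this comes to about 2 bits per byte. Because the model only sees byte frequencies, it ignores that the input is one word repeated 100 times. A model that captures the repetition would need far fewer bits, which is why this figure is only an upper bound on the information content.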

Replace the talk of sets in my above comment with probability distributions and such if you like but my point will still hold.

Wed 29 Nov 2006 at 4:58 AM

smc

For a single file you can invent the compression algorithm: that file -> a zero, another file -> who cares. If you want to measure the information content of a single file you’ll have to go into Kolmogorov complexity, which is a bit iffy if you ask me :)
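The "that file -> a zero" trick can be written out as a toy lossless codec. This is a hypothetical sketch: `SPECIAL` stands in for whichever file the codec is built around, and it emits a zero byte rather than a single zero bit to keep the code simple.

```python
SPECIAL = b"the one file this codec was built for"  # hypothetical chosen file

def encode(data: bytes) -> bytes:
    """Map SPECIAL to a single 0x00 byte; prefix every other input with 0x01."""
    return b"\x00" if data == SPECIAL else b"\x01" + data

def decode(blob: bytes) -> bytes:
    """Invert encode(): the information about SPECIAL lives in the decoder itself."""
    return SPECIAL if blob == b"\x00" else blob[1:]

print(len(SPECIAL), "->", len(encode(SPECIAL)), "byte")
print(len(b"any other file"), "->", len(encode(b"any other file")), "bytes")
```

The chosen file shrinks to one byte only because the decoder already contains it, and every other input grows by one byte. Measuring the content of a single file without this loophole leads to Kolmogorov complexity (the length of the shortest program that outputs the file), which is not computable in general.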

Wed 29 Nov 2006 at 5:03 AM

smc

And not pertinent to the current discussion. Sorry for my broken commenting.

Wed 29 Nov 2006 at 5:09 AM

smc

smc, this "rubbish" story was posted because of the interesting concept of mapping huge amounts of data into geometric shapes. I find the concept of using abstract visual means as a storage medium quite intriguing.

infosthetics, perhaps my disappointment stems from my tacit assumption that this site is more nerdy (like me), whereas perhaps it is more arty (I struggled to phrase what I meant here). I must admit I have been unimpressed by some recent posts that feature visualisations which seem in no way to aid in discerning properties of the data presented, but instead are merely pretty. However, since you run a site about _information_ aesthetics, I suggest you learn at least the rudiments of the subject. Shannon’s landmark paper ‘A Mathematical Theory of Communication’ is available online, as are many introductions. I do find many of the posts here intriguing, so you have my thanks.

Wed 29 Nov 2006 at 5:56 AM

smc

It should be noted that a pixel is not a binary unit. A pixel is a visual representation of information. It can hold all manner of information. It has a color, a location, a brightness, etc.

The storing of data and the visual representation of data are two very, very different things. They should never be discussed as one and the same.

Wed 29 Nov 2006 at 6:33 AM

krees

krees, I don’t understand your point. Which data are considered stored obviously depends on how the storage medium is read. If the data can be seen, then we also have a visual representation. Pixels may not be binary units, but they carry a definite amount of information relative to a reading device.
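That "definite amount of information relative to a reading device" can be made concrete: a symbol whose reader can reliably tell apart k states carries log2(k) bits. The sketch below applies this to a hypothetical alphabet of 3 shapes times a few colour counts; the colour counts are assumptions, not figures from the RVD claim.

```python
import math

def bits_per_symbol(distinguishable_states: int) -> float:
    """Information carried by one symbol whose reader can tell apart this many states."""
    return math.log2(distinguishable_states)

# Hypothetical alphabet: 3 shapes (circle, square, triangle) times N colours.
SHAPES = 3
for colours in (1, 2, 8, 64):
    states = SHAPES * colours
    print(f"{SHAPES} shapes x {colours:2d} colours = {states:3d} states "
          f"-> {bits_per_symbol(states):.2f} bits per symbol")
```

Capacity grows only logarithmically with the number of shapes and colours: even 192 distinguishable states give fewer than 8 bits per symbol, so shapes and colours cannot close a gap of several orders of magnitude.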

Wed 29 Nov 2006 at 7:15 AM

smc

infosthetics, I realise this is just a blog and not a magazine that I'm paying for or anything, so the lack of ‘quality control’, so to speak, is neither surprising nor any skin off my nose. Pretty stupid to be disappointed, really. I guess I just expected these claims to raise alarm bells for anyone who concerns themselves with information. Sorry for my earlier curt comment, and thanks once again.

Wed 29 Nov 2006 at 7:51 AM

smc

smc, I believe this blog is about 'infoaesthetics' as defined by Manovich, not by Shannon's information theory (although it is slightly related).

http://infosthetics.com/about.html

Wed 29 Nov 2006 at 5:54 PM

andrea

Andrea, I had never heard of Manovich, so I’ll be reading up on him. Thanks for the tip.