Posted
by
CmdrTaco on Wednesday March 12, 2008 @08:44AM
from the less-than-eighty-percent-porn dept.

jcatcw writes "By 2011, there will be 1.8 zettabytes of electronic data stored in 20 quadrillion files, packets or other containers because of, among other things, the massive growth rate of social networks, and digital equipment such as cameras, cell phones and televisions, according to
a new study by IDC. Data is growing by a factor of 10 every five years. According to John Gantz, IDC's lead analyst, "at some point in the life of every file, or bit or packet, 85% of that information somewhere goes through a corporate computer, website, network or asset," meaning any given corporation becomes responsible for protecting large amounts of data that it and its customers may not have created. The study, which coincided with the launch of a "
digital footprint" calculator, also found that as the world changes over to digital televisions, analog sets and obsolete set-top boxes and DVDs "will be heaped on the waste piles, which will double by 2011.""

...and other Usenet binaries, and the world's torrents, all downloading through your ISP, which is a corporation. Anything on the Internet comes through corporations: ISPs. How is that 85% figure surprising?

Some of these data transfers really seem wasteful. I download a Linux DVD ISO file, burn it onto a DVD, install the system on a new hard disk drive, then download another couple of gigabytes of updates. Wouldn't it be simpler to just have an installation DVD that creates a minimal system, which then downloads the latest version of each module?

Most sane distributions let you do this if you want to (Ubuntu/Debian, Gentoo, etc.). Live CDs or full basic installation CDs are attractive for lots of reasons, though (e.g. if you need a machine for a closed LAN environment and don't want to download the entire package repository).

# Upwards of 450,000 servers ranging from a 533 MHz Intel Celeron to a dual 1.4 GHz Intel Pentium III (as of 2005)
# One or more 80GB hard disks per server (2003)
So at least using these numbers, let's say they have 120 GB per server on average (one and a half 80 GB drives). That would mean they have 54,000 TB, or 54 PB. I'm sure they have even more now, but as a point of reference: yes, Google has a finite amount of space!
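The parent's arithmetic checks out; here is a quick sketch (the server count and per-server disk figures are the parent's assumptions taken from the 2003-2005 numbers above, not confirmed Google figures):

```python
# Back-of-envelope estimate of Google's storage from the figures quoted above.
servers = 450_000        # assumed server count (as of 2005)
gb_per_server = 120      # assumed average: one and a half 80 GB drives

total_gb = servers * gb_per_server
total_tb = total_gb / 1_000      # decimal prefixes throughout
total_pb = total_tb / 1_000

print(f"{total_gb:,} GB = {total_tb:,.0f} TB = {total_pb:.0f} PB")
# -> 54,000,000 GB = 54,000 TB = 54 PB
```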

"Google has a finite amount of space!" Yes, but there might be a million Googles. I think the article is bunk. Let's say we can store a bit in the space of just one atom. As storage grows, at some point the entire Earth is covered a foot deep in atom-sized bits. Then in 100 more years the stack of bits reaches past the orbit of the Moon. This will never happen.

I think when people look back at the 21st century they will see it as a period when storage grew fast and then settled down to a steady state. Exp

Just a thought, but once quantum entanglement is more fully understood we could have data storage centers on the moon with instant retrieval! Hell, we could even eventually use the "moon atoms" as a hard drive...

Please mod parent up. If I had a nickel for every person who spouted that same upscaled-DVD tripe, then, then, then I'd have enough to buy a Blu-ray disc ;)

There is a world of difference between 1080p and DVD quality - but you'll never see it if your TV can't natively display 1080p (or at least 720) or you use a composite video interconnect rather than HDMI/DVI or component (yes, I know, but you'd be surprised how many people still do...)

Whilst I can imagine that a true 1080p picture might look similar to upscaled DVD on a small screen (which necessarily has very small dot pitch), the difference becomes clear as you scale up the screen beyond 30 inches or so (and bleeding obvious once you get beyond 42"). Interpolation and post-processing can only get you so far. Notwithstanding CSI, even high-end upscaling cannot create genuine detail that didn't exist in the original image - and the more post-processing you do, the more artifacts you are going to see.

I've been running a Pioneer BR player via HDMI to a 1080p 60" plasma for six months, and whilst upscaled DVD is nice, it can't hold a candle to the 1080p BR picture. Double-blind test anyone on a similar system and there's no way you'd get anything but a 100% success rate identifying HD BR vs. upscaled DVD.

You're not quite putting it in the right terms for the Slashdot audience. How about: when you download a 5 GB Blu-ray rip, it will look much better than a 1 GB DVD rip if you play it on the right equipment. The right equipment being a display that does it justice, and mplayer to do the upscaling nicely :)

Seriously though, on reading your post I'm shocked by just how much hassle everything is when using legal components. We got our TV cheaply as it wasn't "HD-Ready". Apart from the lack of sticker, it does do 1280x1024 s

Yah, Blu-ray is just insane. Have fun with your giant TV: my monitor is higher resolution, and my ThinkPad is certainly cheaper than your plasma or LCD TV. HDMI cables are crazy expensive, and you don't have the freedom to run them through a TiVo or STB. Seriously, running everything through a nice set-top box or media PC has been the de facto standard since the VCR days, and you're just putting up with that freedom being taken away? Also, TVs suck more power than overclocked Nvidia cards, so there's even more cost. Why w

Have fun with your giant TV: my monitor is higher resolution, and my ThinkPad is certainly cheaper than your plasma or LCD TV.

Try sitting across a 16' x 20' (or larger) living room from your monitor with a dozen of your friends sometime, and then maybe an investment in a wall-mounted LCD or plasma screen might start to make more sense.

HDMI cables are crazy expensive and you don't have the freedom to run them through a TiVo or STB. Seriously, running everything through a nice set-top box or media PC has been

Maybe, but our TV isn't huge. It's only 28", so we got it for £500 a few years back ($1,000?). It's comfortable for us to watch on the couch, but it's just as convenient to curl up in bed and watch movies on my MacBook. The guy who replied to you has a point about TV sizes and groups of people, but for day-to-day stuff we don't need a monster 42" TV. For when we're watching movies with friends, it would make sense to get a giant screen just for those occasions; projectors are getting much cheaper

I have the dozen-or-so friends over once or twice a week, sometimes more. Nice, because I still get to hang out with all of them and don't have to get a babysitter. I provide nice TV viewing/their cable tv fix in exchange for company and the food they bring over. Everyone wins!

You've got a point about the projector, but it wouldn't work in my place. I don't have a good place to put it where it would still project onto a suitable wall without doing some serious remodeling. Even if I did, I have TV par

If you're buying your cables from Best Buy or Radio Shack, no matter what it is, it's crazy expensive because you're getting ripped off. Take a look at Monoprice or Blue Jeans Cable. Both are highly regarded on AVS Forum, SA's A/V Arena, and other large home-theater forums, while charging prices that have a lot more to do with reality. Last time I checked, a 25-foot HDMI cable at Best Buy was in the $200 range. The same length can be had for between $25 and $75 from Blue Jeans, or $15 to $50 from Monoprice.

Two replies, pretty unusual! Your sig is excellent. When you say "paraphrased," did Gödel say something quite similar (which I couldn't find on Google), or do you literally mean that it paraphrases Gödel's work? I'm just curious, as it's a very cool quote, and I was wondering whether I should attribute it to Gödel or to you.

but you'll never see it if your TV can't natively display 1080p (or at least 720)

Having experienced both, I'd still pick upscaled DVD on a well-calibrated, high-quality 720p or 1080i TV over Blu-ray on the majority of 1080p TVs as they come out of the box.

Yes, extra resolution is a wonderful thing. IF you can see it.

Lousy upscaled DVD to lousy 1080p gives you lots more lousy pixels and a nice, reassuring feeling. Look how sharp the artificial edges of the overblown sharpening settings are now! Look how you can really get a sense of the edge of the large area that's lost in the shadows.

Some early Blu-Ray players are incapable of playing the latest discs because of DRM. Plenty of the first HDTVs will force your overpriced HD content to be downscaled to SD because they don't support HDCP, as soon as they start using ICT.

I'd say DRM matters, no matter whether you plan to copy discs or not. Probably more so than to the pirates, as usual.

I would have said that about DVDs not so long ago. Disk space and bandwidth become cheaper with time.

And besides copying, a DRM crack allows me to play discs on the operating system of my choice, to extract small parts of the feature for purposes of review, criticism or parody, and to bypass any annoying previews, trailers, propaganda, threats, or other junk that the studio may have seen fit to prepend to the show.

Blu-ray has horrifying DRM and doesn't really look that much better than DVDs with good postprocessing

You are talking out of your ill-informed inexperienced ass. There is a high degree of probability that you haven't actually seen a hi-def video on a hi-def TV but let's examine your assertion anyway.

You are saying that there is not much difference between 1920x1080p and a 720x480i picture. Think about it. I'm interested to know more about this "good postprocessing" that can somehow make DVD even approach

Were you planning on storing all your Blu-ray movies on a file server at up to 50 GB a pop (I have a 500 GB NAS box at home, and I think it would struggle to contain all my DVDs, which include a few TV series, even if they were compressed)? It's not like the DRM isn't easily cracked anyway; what are you complaining about? Plus, how exactly does upscaling a picture compare with actual extra resolution? I've yet to buy my PS3 and try out the upscaling of course, but between an anti-aliased&sharpened/wh

If, as the summary (but not the article, for some reason) states, total data is growing by a factor of 10 every 5 years, then somewhere around the year 2300 we'll have 10^80 bits stored. The number of elementary particles in the known universe is estimated to be between 10^79 and 10^81. Seems we're kind of screwed at that point.
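The projection above can be sketched in a few lines (a hypothetical extrapolation that assumes the summary's tenfold-every-five-years rate holds for three centuries, which of course it won't):

```python
import math

bits_2011 = 1.8e21 * 8        # 1.8 zettabytes in 2011, expressed in bits
growth_per_5yr = 10           # summary's claim: data grows 10x every 5 years

def bits_in_year(year):
    """Extrapolate total stored bits, assuming constant exponential growth."""
    periods = (year - 2011) / 5
    return bits_2011 * growth_per_5yr ** periods

# Around 2300 the projection crosses ~1e80 bits, roughly the estimated
# number of elementary particles in the observable universe (1e79 to 1e81).
print(f"{bits_in_year(2300):.1e} bits")
```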

I know you meant this to be a joke, and you got the +5 Funny mod to prove it, but I'd like to share that it really doesn't work like that. The representation of information doesn't map one-to-one onto matter. The concrete representation may be any reliable organization of the matter it relies on, up to as many permutations as that matter is capable of being configured into. It should be obvious that you can represent 10^80 things without 10^80 elementary particles... for example, you just did, when you

85% of that information somewhere goes through a corporate computer, website, network or asset

That's all? I mean, a good deal will be created by corporations in the first place, all the major bits of internet infrastructure belong to one corporation (for-profit or not) or another, the post office is a corporation... 85% seems low, actually.

I don't know about that. Imagine all of the digital pictures taken that never travel outside the home user's computer, memory card or CDs. Even more important, consider the amount of digital video data generated by home users with their camcorders. A single 60 minute Mini-DV tape is in the neighborhood of 15 GB. That's one single tape, and my family alone has dozens of them just from a single year. Even if those videos are uploaded to the internet, they must first be converted to some other format that

no no no, the proper term for journalists to use is library of congresses. Even though I've never been to the library of congress and have no idea how big or small it might be, large amounts of data should always be given in those units.

At the risk of being modded down, isn't that distinction the whole point of the IEC's "zebibyte" proposal? Anyway, most measurements of mass storage (bandwidth quotas, hard disk capacity, etc.) seem to be measured in actual megabytes (MB), gigabytes (GB), etc., as opposed to binary megabytes (MiB), binary gigabytes (GiB), and so on. Binary byte prefixes only seem to be used for RAM and flash these days, presumably because of the convenient manufacturing realities involved - and I really wish that manufacturers of th

In theory, yes. In practice, the whole zebibyte thing is complete nonsense. Everyone other than hard drive manufacturers has been using the SI prefixes to refer to power of two quantities when referring to binary data for 40 years. Attempting to redefine them retroactively just causes confusion. If I see something that says KB and don't know when it was written, I have no idea whether it pre- or post-dates the KiB nonsense, and so I have no idea if it refers to 1024 or 1000 bytes.

Everyone other than hard drive manufacturers has been using the SI prefixes to refer to power of two quantities when referring to binary data for 40 years. Attempting to redefine them retroactively just causes confusion.

No, the confusion is caused by using a pseudo-binary number system in a world where almost everything else is decimal.

Quick question: You have a 2000 MiB video file and a 2470 MiB video file. Will they both fit on a 4.37 GiB DVD? Now you need your calculator.

It's much easier to figure out if a 2097 MB and a 2590 MB file fit on a 4.7 GB disk. You can do that in your head.

I've been burned numerous times by programs ambiguously reporting sizes in KiB and MiB causing me to run out of space on something that I'm trying to fill. All storage sizes should always be reported in decimal numbers. If RAM manufacturers want to keep using powers of two due to the implementation detail of how their chips are constructed, they should *always* use KiB, MiB and GiB.
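A sketch of the arithmetic being argued about, showing the conversion overhead that binary prefixes add (file and disc sizes taken from the example above):

```python
# Decimal (SI) vs binary (IEC) prefixes for the same byte counts.
MB, GB = 10**6, 10**9       # SI: megabyte, gigabyte
MiB, GiB = 2**20, 2**30     # IEC: mebibyte, gibibyte

dvd = 4.7 * GB                    # single-layer DVD, as marketed
files = [2097 * MB, 2590 * MB]    # decimal case: 4687 MB < 4700 MB, in your head

print(sum(files) <= dvd)          # True

# The binary restatement of the same quantities needs a calculator:
print(f"DVD = {dvd / GiB:.2f} GiB, files = {sum(files) / MiB:.0f} MiB")
```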

No, the confusion is caused by using a pseudo-binary number system in a world where almost everything else is decimal.

No, the confusion is caused by insisting on using base ten for a system that doesn't use it.

That fact that you may be "more comfortable" with base ten is irrelevant. In fact, it's little different than those accustomed to Imperial measurements habitually recalculating metric measurements in their head, and then exclaiming, "The metric system is too complicated and too much work."

It is not. RAM is the only quantity in computers commonly measured in binary. Hard drives have always been in decimal. Floppies have always been in an even more stupid system where "MB" == 1000*1024. Clock speeds have always been decimal.

Going farther: when measuring I/O or network performance, to cite two trivial examples, or when trying to understand either subject in general, you're working binary to binary.

You appear to have been bamboozled yourself by the confusion this issue causes. I/O speed of buses is always decimal, because it derives from MHz and GHz, which are decimal. Network bandwidth is more often measured in decimal megabits, not binary.

No, the confusion is caused by insisting on using base ten for a system that doesn't use it.

No, the confusion is caused by insisting on using an SI-prefix that has meant exactly 1000 since 1795 to now mean something else. Hence the new 'kibi' instead of 'kilo'.

Going farther: when measuring I/O or network performance, to cite two trivial examples, or when trying to understand either subject in general, you're working binary to binary.

Interesting that you mention these two examples, since they use base 10, as is proper. 1 Gbit/s means 1,000,000,000 bits/s. From the all-knowing Wikipedia:

The megabit is most commonly used when referring to data transfer rates in network speeds, e.g. a 100 Mbit/s (megabit per second) Fast Ethernet connection. In this context, like elsewhere in telecommunications, it always equals 10^6 bits.
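To make the telecom convention concrete, a small sketch of the conversion (line rate only, ignoring framing and protocol overhead):

```python
# Network rates use decimal prefixes: 100 Mbit/s Fast Ethernet in bytes/s.
mbit = 10**6                 # 1 megabit = 1,000,000 bits in telecom usage
rate_bits = 100 * mbit       # Fast Ethernet line rate
rate_bytes = rate_bits // 8  # 12,500,000 bytes/s

print(f"{rate_bytes / 10**6} MB/s")   # -> 12.5 MB/s
```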

You seem to think that only RAM manufacturers would ever need to care about 2^10 units, but that's just not true. Caches, memory pages, graphics pipelines, textures, bus widths, buffers: in short, almost everything turns into powers of two when you get close to the hardware. Sure, we can list storage sizes in 10^3 notation, but then surely some guy will get confused about why his 64k buffer doesn't hold 64,000 bytes. Yes, I'm in favor of consistently using MB and MiB for the sake of resolving this clusterfuck, I'm j

In theory, yes. In practice, the whole Zebibyte thing is complete nonsense.

It's actually very simple. While many Americans have issues with units of measurement (no surprise), in most countries SI is the *law*, and there is no ambiguity. Just remember how a "Quarter Pounder" is illegal as a commercial item in Germany. There is SI, and everything else is *illegal* to use in commerce.

Could everyone please stop posting stories about how much data there will be saved on earth in such-and-such a year? Firstly, it's pure speculation/estimation, secondly, who really cares? Most of it is cached google pages and pron anyway...

secondly, who really cares? Most of it is cached google pages and pron anyway...

That's why /.ers care.

But actually, no. We're very close already to being able to generate pron on demand without involving any principal photography. You won't even need to say what you want; that will be ascertained on the fly by neuro-cranial biofeedback.

After enough of the male population has been brain mapped, it will probably turn out like spam: there's only so many unique permutations, as long as the scene is dressed up a little differently from time to time to maintain the novelty factor.

Pron seems to be a lot like Big Bertha, where each mortar round was larger than the last, to accommodate progressive barrel enlargement. Eventually the images become extremely shocking to get any response at all.

The future of compression is not to send the picture itself, but the reduced specification for an image that produces the same effect on the human visual system. We're already doing this with psycho-acoustic encoding.

Once we have a sufficiently sophisticated model of human sensory perception, mental and emotional responses (which will run to TBs I'm sure), we can run a competition for the best feature movie encoded in under 4KB. Mostly it would describe desired emotional responses and cognitive states, the actual images would be back-generated to achieve this effect as determined by the human perceptual model.

I was wondering if they weren't a bit wrong in their calculations. A zettabyte is a million petabytes. Where I work has about 2 petabytes in a few SANs, and there are thousands of larger institutions and millions of smaller ones (storing in the terabyte range) around the world. The place I worked before had about half a petabyte just in tape backups for credit card and other transactions, catalog and pricing information, images, etc., and that was just an average clothing company, hardly rivaling JCPenney or Macy's. I'm also thinking about Wal-Mart, with millions of products and thousands of stores. And we're just talking about SANs here, mainly in the US, not including desktops, laptops, cameras, personal information, Google.

On another note, how much does a zettabyte actually yield these days, drive manufacturers might just give you 700 Petabytes for it. Oblig. XKCD: http://xkcd.org/394/ [xkcd.org]

1 zettabyte is too low an estimate. If, as I said, there are about 1,000 institutions the size of mine (there are, in the US alone), that would take 1-5 exabytes. If there are about 1,000,000 institutions worldwide that are a bit smaller (there are a lot more hospitals, schools, research facilities and government agencies than that, I think), they could take up another exabyte all combined, and that's just if they average a 1 TB SAN per institution. If there are about a thousand companies equal to the company
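The comment's back-of-envelope totals can be tallied explicitly (all institution counts and sizes here are the commenter's guesses, not measured figures):

```python
# Rough tally of worldwide institutional SAN storage, per the guesses above.
TB = 10**12
PB = 10**15
EB = 10**18

large = 1_000 * 2 * PB        # ~1,000 institutions at ~2 PB each
small = 1_000_000 * 1 * TB    # ~1,000,000 smaller sites at ~1 TB each

total = large + small
print(f"{total / EB} EB")     # -> 3.0 EB from just these two tiers
```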

1,800 exabytes of raw data. Anyone like to guess how much of it will be useful data? Judging by some system specifications I have read, 5% is being generous. A twenty-page specification can be condensed to a single page of useful information, and over half is "boilerplate" disclaimers, etc., which are the same in all the company's specifications.

I'm not saying that formatting data is entirely without worth, but there's definitely some improvements to be had WRT efficiency.

This is just a general observation. When did the use of "wrt" take off? I seem to see it everywhere. Wouldn't it be more efficient to just say "there are definitely some efficiency improvements to be had" instead of "there's definitely some improvements to be had WRT efficiency"?

The interesting thing here is the part about data being relayed through third parties and the issues involved. As for the data figures themselves, those are pretty misleading, because data does not equal useful information. There is far less useful information in an MS Word file than 100 KB or whatever, for example, so these zettabyte figures bandied about aren't terribly meaningful, other than to draw attention to the infrastructure needed to support digital data relaying. To see my point, turn things upside down: there is vastly more data stored on an LP record or celluloid film than on a CD or digital photograph. But is that data useful information? Only a few audiophiles and filmophiles would argue that there is.

Yes, there is a lot of data in the world. But is there really that much more information out there? A zillion copies of the same song just means more data, not more information.

Exactly. In fact, as time goes on, it will be -easier- to store information, as data storage capabilities grow faster than our information-creation capabilities and our population (i.e., let's say every human on the planet walks around with as many HD cameras as they can carry, recording everything in their lives... population growth will still make it a manageable amount of information long term). There's also a limit on how much information can be consumed per person (or searched, etc.)---beyond a certai

don't worry, we can mine landfills and recycle the plastic out of them at some point. After all, the plastic isn't going anywhere, and we're only going to get more technologically advanced, so at *some* point, surely this will make sense!

or we're one big EMP pulse away from losing almost 2 zettabytes of data.

and technically, there is only one SI definition of zettabyte [wikipedia.org]: 10^21 bytes. The binary quantity 2^70 bytes is being renamed by the IEC to avoid ambiguity (proposed to be zebibyte, for "zetta binary byte").
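The size of the ambiguity is easy to quantify; at the zetta scale the decimal and binary units differ by about 18%:

```python
# SI zettabyte vs IEC zebibyte: the gap widens with each prefix step.
ZB = 10**21     # SI zettabyte
ZiB = 2**70     # IEC zebibyte

print(f"1 ZiB = {ZiB / ZB:.3f} ZB")   # the binary unit is ~18% larger
```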