Posted
by
Zonk
on Thursday December 27, 2007 @06:40PM
from the we-all-like-storing-data dept.

Lucas123 writes "IDC just released its predictions for 2008 with regards to data storage trends. Its research shows, among other things, a greater adoption of online backup and archiving services, the 'prevalent' use of full-disk encryption in the data center, and mainstream adoption of solid-state disk drives due to falling prices. From the story: 'There are very simple situations and application scenarios where solid-state disks will be worth the risk. It does promise some great potential benefit in terms of I/O ... [and] solid state will make a significant impact on reducing heat from spindle usage in server blade deployments and to boost functionality in mobile devices.' According to IDC, storage capacity is exploding at a rate of almost 60% per year."
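For a sense of scale, 60% annual growth compounds fast. A quick back-of-envelope sketch (normalized units, not real capacity figures):

```python
import math

growth = 1.60            # "almost 60% per year", from the summary above

capacity = 1.0           # normalized starting capacity
for year in range(5):
    capacity *= growth
print(round(capacity, 2))    # ~10.49x after five years

doubling = math.log(2) / math.log(growth)
print(round(doubling, 2))    # capacity doubles roughly every 1.47 years
```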

I imagine that full-disk encryption for datacenters is a while off, since any drop in I/O and throughput will be a non-starter for the already heavily tasked drives. IMHO, full-disk encryption isn't necessary as long as the datacenter is physically secured; it's enough that all off-site backups be encrypted. Any time data leaves the datacenter it should be encrypted, but encrypting local storage only matters if you fear someone breaking in physically (encrypted disks won't help against a network intrusion, since the computer will happily decrypt the data for the intruder) or you plan on selling the disks on eBay afterwards.

I'm more interested in the de-duplication angle. Anyone know of a good home server that is client-OS agnostic and can do this? We use Connected Net Backup at my work, but it's a bit pricey for my home stuff. -nB

While datacenters may be physically secured, they do sometimes get broken into. The last thing a company wants is to lose personal information because a server was stolen. Much may depend on what laws or regulations are put in place for data-security compliance, and on what type of data the datacenter holds. I can certainly see banks, insurance companies, or any company with a large amount of employee data wanting that data encrypted at all times.

The last thing a company wants is to have personal information lost because a server was stolen.

Why bother breaking into a server facility, which typically has several hard-to-circumvent layers of physical security, when some dumbass C[EFT]O is going to leave a notebook PC full of unencrypted business intelligence on the passenger seat of his Acura?

By the time somebody responds to the OnStar alarm, the window's already smashed and 10 million customer records compromised.

Disks fail. With encryption, I can return them for a credit or throw them in the trash. Without it, I have to worry about data security. The thing holding encryption back is not performance; it is key management.

A disk failing doesn't get me fired, but losing a key when the data is perfectly OK, sitting right there and now forever inaccessible, will.
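One common answer to the key-loss fear is multiple independent key slots, the way LUKS-style disk encryption does it: the disk key is wrapped separately under several passphrases, so losing one passphrase does not lose the data. A toy Python sketch of the idea (illustrative only, not production crypto; the XOR wrap and PBKDF2 parameters are my assumptions):

```python
import hashlib, os

def wrap_key(master_key: bytes, passphrase: bytes, salt: bytes) -> bytes:
    """XOR the master key with a PBKDF2-derived keystream (toy key slot)."""
    stream = hashlib.pbkdf2_hmac("sha256", passphrase, salt, 100_000,
                                 dklen=len(master_key))
    return bytes(a ^ b for a, b in zip(master_key, stream))

# Unwrapping is the same XOR operation.
unwrap_key = wrap_key

master = os.urandom(32)          # the actual disk-encryption key
slots = {}
for owner, phrase in [("admin", b"correct horse"), ("escrow", b"battery staple")]:
    salt = os.urandom(16)
    slots[owner] = (salt, wrap_key(master, phrase, salt))

# Losing the admin passphrase is survivable: the escrow slot still works.
salt, wrapped = slots["escrow"]
assert unwrap_key(wrapped, b"battery staple", salt) == master
```

The point is redundancy at the key level: any one slot recovers the same master key, so a fired admin or a forgotten passphrase no longer means the data is gone.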

I already know some people using the Amazon data cloud [amazon.com] technology and I suspect that will increase. I'm a bit leery of putting my data in the hands of Amazon, who have essentially stated before that they will never delete anything they know about you. Probably doesn't exactly apply to this service, or does it?

For the most part, storing personal or sensitive data on Amazon S3 (like backups - see duplicity [nongnu.org]) should go hand in hand with encryption (GPG etc). I carry my laptop in my bag to work, and really do think that the data on that stands much more chance of being nicked than the encrypted data I have on S3.

Use an Amazon S3 backup tool with built-in encryption like Jungle Disk [jungledisk.com] and you won't need to worry. The fact that you can even use 3rd party tools says a lot more about Amazon's approach compared to other "cloud" storage providers.

According to IDC, storage capacity is exploding at a rate of almost 60% per year."

No, you've got it backwards -- since only 40% of our storage capacity will be unexploded at the end of next year, we'll need tubes only 0.4 of the size of the current tubes. In 2010, we'll only need tubes 0.064 the size of the current tubes. See where this is headed?

In some 15 years and change, we'll only need microtubes.

In just 23 years, we'll need nanotubes. Let's just hope no one tries to send anything bigger than a picotruck down them.

Or, the RIAA, MPAA et al actually succeed in their worldwide legal battles, thus without mountains of music and films to consume, home users' data storage use plummets and the floppy disk becomes the dominant format once more. The world begins to use floppy-based Linux distributions (because Vista takes too many disk swaps to install) and thus everyone enjoys a renaissance of console-based system rescue distros, streaming everything they might want through a lynx port of Gnash. Gradually, as more and more f

This article along with all of those who have something to say about backups should be modded "Redundant". After all, what good is a backup solution without redundancy?

That whole article sucked.

1) Says absolutely nothing that hasn't been true for over 30 years.
2) Did this come from a random word generator?
3) Object-based storage systems? Maybe, given enough time, but 2008 isn't going to be magical.
4) Yep, we will see very high-end $$$ laptops use solid state, but given the cost, current densities and M

1) Think: if your organization has 5000 desktops and each has a spare 100GB, that's 500TB of backup storage going unused. 2008 will be the year we seriously start to look at distributed disk-to-disk backups.

Add parity and/or redundancy, and consider it a Guinness commercial. There's really little reason you couldn't run a clustered, redundant filesystem on all of your desktops. Using Linux (and probably some of the BSDs), it'd be pretty easy. Something like AFS or GFS with enough nodes wouldn't even need to be backed up explicitly if you had multiple office sites and configured your redundancy carefully.

Of course, you'd have to make sure your distributed data is only accessible to the proper people in you

A file system like RAID 5, but one that double-writes each entry to different networks for added redundancy, so if one dies another picks up. Add sophistication in the background to dynamically repair missing nodes and volume segments.
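The single-parity part of that idea is just the RAID-5 XOR trick: keep one parity block per stripe, and any one lost block can be rebuilt from the survivors. A minimal sketch:

```python
def parity(blocks):
    """XOR all blocks together, RAID-5 style."""
    out = bytearray(len(blocks[0]))
    for b in blocks:
        for i, byte in enumerate(b):
            out[i] ^= byte
    return bytes(out)

data = [b"AAAA", b"BBBB", b"CCCC"]
p = parity(data)

# Simulate losing block 1: XOR the survivors with the parity block.
survivors = [data[0], data[2], p]
recovered = parity(survivors)
assert recovered == data[1]
```

The "double write to different networks" variant just stores a second copy of each stripe elsewhere; the reconstruction math stays the same.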

Redundancy isn't the problem. Mirroring the writes of something that overwrites good data with bad data is a poor strategy. Recovery is the problem. When you accidentally delete a file, save bad data on top of an existing file, or a bug or hardware crash messes up important data, you want to be able to undo that easily.

That is, while your data-scatter idea is fine, the data-gather part needs to work when the user decides the existing version of data is bad, and that a previous version might be go
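The undo-friendly "data-gather" side could be as simple as keeping every saved version and letting the user roll back; a minimal sketch (names are illustrative):

```python
class VersionedStore:
    """Keep every saved version so a bad write can be undone."""
    def __init__(self):
        self._history = {}   # name -> list of versions, oldest first

    def save(self, name, data):
        self._history.setdefault(name, []).append(data)

    def read(self, name, version=-1):
        return self._history[name][version]

    def rollback(self, name):
        """Discard the newest version, restoring the previous one."""
        self._history[name].pop()
        return self.read(name)

store = VersionedStore()
store.save("report.doc", b"good data")
store.save("report.doc", b"corrupted!")
assert store.rollback("report.doc") == b"good data"
```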

1. They already exist, but for about $4000; for example, here [buy.com]
2. On-board RAM cache: it's called Intel Turbo Memory, it's cheap, and it's been available on laptops for several months now; it will soon be on the desktop as well. Coupled with Vista ReadyBoost it will do what you want it to, or it can also serve as a high-speed flash drive on which you can install frequently used apps or files.
3. They have them in 2GB also.
For the rest, they already have 32GB flash for a reasonable price (around $300) if you make the comparison to RAM rather than to spinning platters.

The problem with RAM is that it's volatile; you'd be screwed if power went out while writing back to that cache. Intel Turbo Memory uses an internal PCI Express slot as its interface and employs high-speed flash memory. While not as fast as RAM, at least you wouldn't have to keep a battery in it just to power it long enough to write the entire contents of a RAM cache back to disk. Besides, if you want a RAM cache, isn't that what the OS does already with RAM? If you want control over what goes into your RAM cache, there are a number of programs that will create a RAM drive, which you can then load with the data you choose at system startup.
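The OS-level caching mentioned here is essentially LRU eviction over RAM: hot data stays resident, cold data gets dropped. A minimal sketch of the policy:

```python
from collections import OrderedDict

class LRUCache:
    """Tiny read cache: hot items stay in RAM, cold ones get evicted."""
    def __init__(self, capacity):
        self.capacity = capacity
        self._items = OrderedDict()

    def get(self, key):
        if key not in self._items:
            return None
        self._items.move_to_end(key)         # mark as recently used
        return self._items[key]

    def put(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict least recently used

cache = LRUCache(2)
cache.put("a", 1); cache.put("b", 2)
cache.get("a")            # "a" is now the most recently used
cache.put("c", 3)         # evicts "b", the coldest entry
assert cache.get("b") is None and cache.get("a") == 1
```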

My fourth bullet point was that I was willing to pay for a battery, which shouldn't be a big deal since it only has to last long enough to finish write-back. Your point that the OS should be doing the caching is a very good one. What started me on this quest was that there's a certain OS and OS-supplied service my employer uses that isn't very good at keeping files cached in RAM. It seems to prefer to let the thread pool fill up all available space in a few minutes, reserving only a small fixed amoun

Doesn't have the ram, but then given its performance figures you shouldn't care (and if you do, let's not forget you're asking it to do what your OS already does). Same goes for your write-back: at 600MB/s, why?

That's the reason this isn't likely to be widely successful: hard drives can be had for under $0.30 per GB. Let's not forget what RAID means: Redundant Array of Inexpensive Disks. 'Redundant array' is important, but 'inexpensive' is crucial. The purpose of a RAID is to achieve the performance of expensive things like this, but without the expense.

Fourteen 15K FC drives won't give you this level of performance unless you go RAID-0, and I'll discount RAID-0 because it's almost never used in real deployments. I feel certain that this class of device will appear, and quite soon, in enterprise storage solutions, where it will be used as a persistent backing store (cache) in the very RAID arrays you are talking about. This isn't just guesswork; my position in industry is such that enterprise storage vendors do backflips in order to show me their developing produ

You're missing my point, and casting wild aspersions to boot.
My point was simply this: I doubt these will become popular as long-term mass-storage devices. You also assumed my mention of RAID was to say "Just put in a RAID-0 and it'll solve everything." That's nowhere near what I meant to imply. I was using the example of RAID (big, fast, and cheap) to show that you can combine a number of smaller, slower, cheaper disks into one large, fast volume. It is similar to what Cray did on one of his early supercompu

And if I had a choice between say, one of the new 64GB SATA flash drives for $50 USD, or a 500GB 7200rpm Seagate SATA HDD for free, I'd go for the Seagate.

There are many buyers who are not like you. The issue is that to many buyers, both $50 and $0 are "free". There is a price-point threshold below which the cost is a non-expense. This is particularly true in enterprise purchasing situations, where processing the paperwork merely to buy an item costs hundreds of dollars. While that only assesses the im

There are many buyers who are not like you. The issue is that to many corporate buyers, both $50 and $0 are "free".

Fixed it for you...

The reason I would choose a slower 500GB drive for $0 over a faster 64GB drive for $50 is that I place capacity over speed. With the 64GB drive, while I would (in theory) have improved disk access times, which would result in better performance in software (read: games), the reduced load times and fractionally higher FPS would be outweighed by the fact that I would have

Re your "correction": there are plenty of /consumer/ buyers who are insensitive to price below a certain price point, and for some of them that price point is well over $50 when it comes to things like consumer electronics. I'd say my own personal wallet would open up for a card of that capability, were it available today, in the $300 range. Honestly. I see what you're saying about sensitivity to data locality. While there is unfortunately no solution for this as of yet, what's wanting here is "transparent

What I was trying to get across with my correction was that while most consumers have a 'price range' in the sense you suggest, below the upper limit of that range they judge based on factors OTHER than price. So because I have need of mass storage, I rate a device with storage capacity an order of magnitude higher than a device with marginally faster I/O speed. I recognize that not all people have the same constraints I do, but I feel confident saying that most people would make the

I, too, am annoyed by their minimum size. My guess is that the size they chose has something to do with the base cost of the other circuitry (RAID-like hardware) they have present. With smaller sizes, the cost per GB would start to look particularly bad, so they upped the minimum size and are going after premium buyers for now. BTW, for reference, I have a Dell PERC5e controller and ten 10K 300GB SAS drives. Configured in RAID-5, these drives manage to sustain just over 200MB/s on read. If the device performs
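A quick back-of-envelope on an array like that (RAID-5 gives you n-1 drives of usable capacity; the per-spindle figure is a rough average, assuming reads spread evenly across all drives):

```python
drives = 10
size_gb = 300
usable_gb = (drives - 1) * size_gb   # RAID-5 spends one drive's worth on parity
print(usable_gb)                     # 2700 GB usable out of 3000 GB raw

read_mb_s = 200
per_drive = read_mb_s / drives       # rough per-spindle share of sustained reads
print(per_drive)                     # 20.0 MB/s per spindle
```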

I don't have time to post a full reply, but from what I've heard, Dell's PERC5 cards SUCK.
I've heard that Dell takes a perfectly good card and removes just about every useful feature, which requires that you use the drivers provided by Dell instead of those provided by the card maker...

Except for laptops, especially those that belong to governments and corporations. But I do agree about the datacenter; it is useless in a secured area. "IDC serves up poorly thought-out storage trends" should be the title.

Actually, if what another poster says is true, in that this is just a veiled attempt to make the technologies companies want to sell popular, then FDE is a good thing: a company can charge MUCH more to recover data from an encrypted disk than from a non-encrypted one.

This assumes that the 'environmental cost' of continuing to operate obsolete technology is less than the 'environmental cost' of upgrading to more efficient technology. This is not always the case; imagine adding capacity to a PDP-11 to 'keep it modern.' The cost of powering the equipment more than makes up for any possible environmental ills.
So basically what they are saying is that next year people are goin

9. I agree with you - the cost of powering old equipment is going to be the driving force behind hardware upgrades in the next 2 years, not the requirement for more speed and capacity. I don't think people have been upgrading their systems a little bit at a time since the sub-$1000 computer became mainstream. The only systems that are going to be upgraded that way are the systems that are designed for expansion, like servers that are designed for storage expansion or blade-type expansion.

10. I don't think they mean skimping on data backups, they mean de-duplication of unnecessary hardware and not necessarily data backups. For instance not having 2TB of storage on a server when it is only using 100GB - use thin provisioning to give that server access to a dynamic storage volume that gives it only the space it needs. Cut down on duplicate hardware that handles things like backup AD controllers, data backup, etc. and put those tasks on virtual servers. Virtualize your tape libraries with an offsite hard disk backup array. All these lessen the power footprint of your datacenter without lessening the redundancy of your critical data backups.
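The thin-provisioning idea described above, where a volume advertises far more space than it actually backs, can be sketched like this (block size and names are illustrative):

```python
class ThinVolume:
    """Thin-provisioned volume: blocks are only backed when first written."""
    BLOCK = 4096

    def __init__(self, virtual_size):
        self.virtual_size = virtual_size
        self._blocks = {}            # block index -> data (sparse map)

    def write(self, offset, data):
        idx = offset // self.BLOCK
        self._blocks[idx] = data     # simplified: one write fills one block

    def allocated_bytes(self):
        return len(self._blocks) * self.BLOCK

vol = ThinVolume(virtual_size=2 * 1024**4)   # advertises 2TB to the server
vol.write(0, b"hello")
assert vol.allocated_bytes() == 4096         # but only one block is backed
```

The server sees its 2TB, while the storage array only commits physical blocks as writes actually land, which is exactly the "give it only the space it needs" behavior described above.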

I don't think people have been upgrading their systems a little bit at a time since the sub-$1000 computer became mainstream.

Are you talking about data centers doing piecemeal upgrades or, like you said, people? Because if you honestly think people are just buying new sub-$1K systems instead of incremental upgrades... well, let's just say I'd like to see the subdivision you live in!

10: Data dedup means single-instance storage. Take that PowerPoint you sent around about the company's revenue results for 2007: instead of 200 copies on the network, only ONE is stored. Or for backups, instead of backing up 200 copies of the same Windows Server 2003 installation, only one is stored to tape. The savings can prove to be immense. Some products even promise block-level changes, so if one page of a Word document changes, then only the blocks that changed are copied. Products from

Indeed. We use the Avamar backup software from (now) EMC that does the block level deduplication in software running on the client. It really does find just the modified chunks of a file to backup. It's amazing stuff. It makes remote backups and replication over modest bandwidth WANs really painless.
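Block-level, single-instance storage like this boils down to content-addressing: hash each chunk and store each unique hash exactly once. A minimal sketch (fixed-size chunks for simplicity; real products typically use smarter, variable-size chunking):

```python
import hashlib

class DedupStore:
    """Single-instance, block-level store: identical chunks are kept once."""
    CHUNK = 4096

    def __init__(self):
        self._chunks = {}    # sha256 digest -> chunk bytes
        self._files = {}     # filename -> ordered list of digests

    def put(self, name, data):
        digests = []
        for i in range(0, len(data), self.CHUNK):
            chunk = data[i:i + self.CHUNK]
            d = hashlib.sha256(chunk).hexdigest()
            self._chunks.setdefault(d, chunk)   # store each unique chunk once
            digests.append(d)
        self._files[name] = digests

    def get(self, name):
        return b"".join(self._chunks[d] for d in self._files[name])

store = DedupStore()
report = b"Q4 revenue" * 1000            # a 10,000-byte "PowerPoint"
for i in range(200):                     # 200 copies mailed around the office
    store.put(f"copy{i}.ppt", report)

assert store.get("copy0.ppt") == report
print(len(store._chunks))                # 3: the unique chunks, stored once
```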

This is not always the case; imagine adding capacity to a PDP-11 to 'keep it modern.' The cost of powering the equipment more than makes up for any possible environmental ills.

When DEC introduced their 'PDP-11/70 on a board' they pretty much obsoleted their existing PDP-11 line. We did a quick analysis and realized that the reduction in power costs from having to power a single board vs. many boards in a cabinet would pay for the upgrade in less than a year.
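The payback arithmetic is straightforward; all numbers below are hypothetical, purely to show how "less than a year" can fall out of power savings alone:

```python
# Hypothetical figures: a multi-board cabinet vs. a single-board replacement.
old_kw, new_kw = 2.0, 0.2          # assumed power draw, before and after
rate = 0.10                        # assumed electricity cost, $/kWh
hours_per_month = 720              # 24x7 operation

monthly_savings = (old_kw - new_kw) * hours_per_month * rate   # $129.60/month
upgrade_cost = 1500.0              # assumed board price

payback_months = upgrade_cost / monthly_savings
print(round(payback_months, 1))    # ~11.6 months: under a year
```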

I cannot understand why massive writable optical storage has not been introduced at reasonable prices. Some solutions are born almost outdated: 25GB for a single-sided Blu-ray disc is far from meeting mid-term SOHO necessities. In my opinion, we need to push for 100GB multilayer writable optical media to cover the next four years of home and small-business backup and data-distribution needs.

I'm not sure I agree with your proposal, but I definitely don't agree with the storage capacity you mention. The issue is that developing technology takes time. What you propose is like planning a new highway for today's needs without realizing that by the time you actually complete construction, you still won't have enough capacity. What you need to do is ask "how much will I need in five years?" and then build that. That said, if the purpose is long-term archival backup of hard drives, anything smaller than


They're finicky. There are too many formats. Not everyone has the same tape drive (and very few folks even have one in the first place). The drives are expensive, and the tapes are no bargain either. And going hand-in-hand with the "nobody has one" issue: if your tape drive dies a few years down the road, you may be SOL at getting data back off of it if you picked the wrong brand.

Then there's the whole access time issue and tapes that only last a few times befo

IDC just released its predictions for 2008 with regards to data storage trends. Its research shows...

If you've ever been involved in an IDC, Gartner or whatever marketing discussion, you know that the "research" mainly consists of going from vendor to vendor (data storage vendors in this case) and asking what, in their wildest dreams, would the ideal demand curve look like. Then they charge for actually coming up with some supporting information to meet the vendors' preferred conclusion, and release the whole thing to consumers in the hopes of stimulating some demand for the paying vendors. Very scientific.

Stimulating the market is really not how it works at Gartner. There is an element of consumer-driven data in the predictions. Not all the predictions turn out to be accurate, but they've been at it for over a decade and have an impressive track record to help you calibrate the quality of their market projections.

I've written a wonderful (in my opinion, anyway) plugin [libpipe.com] for Sybase's backup-server. It allows one to (among other things) send the dumps over to the outside backup-providers immediately — without waiting for the dump to complete. One can also do on-the-fly encryption and not worry about the unencrypted data remaining on disk. Etc, etc.

The price is low (compared to the cost of even a single Sybase installation), and yet I've sold fewer than a handful of licenses in 8 months, plus a few given away to quali [libpipe.com]
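The plugin's internals aren't described here, but the general streaming pattern (process each chunk as the dump produces it, never landing raw data on local disk) looks something like this sketch, with zlib compression standing in for the encryption step:

```python
import zlib, hashlib

def stream_dump(chunks):
    """Compress (and checksum) a dump chunk-by-chunk as it is produced,
    so nothing unprocessed ever lands on local disk."""
    comp = zlib.compressobj()
    digest = hashlib.sha256()
    for chunk in chunks:
        digest.update(chunk)          # running checksum of the raw dump
        out = comp.compress(chunk)
        if out:
            yield out                 # ship to the backup provider immediately
    yield comp.flush()                # emit whatever the compressor buffered

dump = [b"page-%d" % i * 100 for i in range(50)]   # stand-in dump pages
wire = b"".join(stream_dump(dump))
assert zlib.decompress(wire) == b"".join(dump)
```

Swapping the compressor for a cipher gives on-the-fly encryption with the same shape: the consumer never needs the complete dump before transmission starts.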

In 2008 some twit with a soapbox (magazine column, TV show, whatever) will lose 3TB or more in a single failure and rant about how digital is so much worse than analogue. I bet he'll mention Laserdiscs in there somewhere and possibly The Domesday Book if he's from the UK.

The Macintosh uses an experimental pointing device called a "mouse". There is no evidence that people want to use these things.
- John C. Dvorak, SF Examiner, Feb. 1984

When I hit Ctrl-Alt-Delete, I see that the System Idle Process is hogging all the resources and chewing up 95 percent of the processor's cycles. Doing what? Doing nothing?
(http://www.pcmag.com/article2/0,4149,1334678,00.asp)
- John C. Dvorak, PC mag, 29th Sept, 2003

Only in limited cases will "5. Virtual servers will become an ideal conduit for iSCSI" hold true. Virtual host servers with a reasonable consolidation ratio of production enterprise servers may stress 1Gb/s iSCSI. A SAN with both Fibre Channel and iSCSI capability is great for leveraging iSCSI to connect *non-virtual* and/or test/dev servers cost-effectively, but in my TCO calculations 4Gb/s Fibre Channel is a better choice for production virtual host servers. Once 10Gb/s iSCSI becomes less expensive and available in a