Tuesday, October 30, 2012

My co-author Mema Roussopoulos pointed me to an Extremetech report on a Harvard team's success in storing 70 billion copies of a 700KB file in about 60 grams of DNA:

Scientists have been eyeing up DNA as a potential storage medium
for a long time, for three very good reasons: It’s incredibly dense
(you can store one bit per base, and a base is only a few atoms large);
it’s volumetric (beaker) rather than planar (hard disk); and it’s
incredibly stable — where other bleeding-edge storage mediums need to be
kept in sub-zero vacuums, DNA can survive for hundreds of thousands of
years in a box in your garage.

I believe that DNA is a better long-term medium than, for example, the "stone" DVDs announced last year. The reason is that "lots of copies keep stuff safe", and with DNA it is easy to make lots of copies. The Harvard team made about 70 billion copies, which is a lot by anyone's standards. Their paper in Science is here.
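To put those figures in perspective, here is a rough back-of-the-envelope sketch. It assumes "700KB" means 700 × 1000 bytes and takes the quoted one bit per base at face value, ignoring any addressing or error-correction overhead; the variable names are mine, not the paper's.

```python
# Back-of-the-envelope arithmetic for the figures quoted above.
# Assumptions (mine, not the paper's): 1 KB = 1000 bytes, and exactly
# one bit per base with no addressing or error-correction overhead.
FILE_BYTES = 700 * 1000      # the 700KB file
COPIES = 70 * 10**9          # about 70 billion copies
BITS_PER_BASE = 1            # density claimed in the Extremetech quote

# Bases needed to encode one copy of the file.
bases_per_copy = FILE_BYTES * 8 // BITS_PER_BASE

# Raw bytes across all copies; this is redundancy, not unique data.
total_bytes = FILE_BYTES * COPIES

print(f"bases per copy: {bases_per_copy:,}")
print(f"bytes across all copies: {total_bytes / 1e15:.0f} PB")
```

Under these assumptions the copies together hold tens of petabytes of raw data, nearly all of it redundant, which is the point of "lots of copies".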

However, DNA, like all "archival" media, poses a system-level problem. Below the fold I discuss the details.

Tuesday, October 23, 2012

Have hard disk drives peaked? We don’t think the evidence supports this
yet. There is just too much digital content to be stored and more HDDs
may be required than in prior years to keep this growing content
library. While the market for regular computers has certainly been
impacted by smartphones, tablets and other thin clients that are served
by massive data centers through the cloud there is still a market for
faster personal computers using flash memory (especially in combination
with HDDs). Thus we don’t think that HDDs have peaked but instead that
they could experience significant annual growth in a stronger economy.

It sparked a spirited discussion on a storage experts' mailing list but, as with the dinner Dr. Pangloss attended, the underlying assumption was that the demand for storage is inelastic: people will simply pay whatever it takes to buy the 60% more storage needed for this year's bytes as compared to last year's.
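What the inelastic-demand assumption implies for budgets can be sketched with a little arithmetic. The 60% byte growth comes from the text; the annual price-drop rates below are hypothetical values chosen only for illustration, and `relative_spend` is a name I made up for this sketch.

```python
def relative_spend(byte_growth: float, price_drop: float, years: int) -> float:
    """Storage spend after `years`, relative to year 0.

    Assumes perfectly inelastic demand: every byte gets stored,
    whatever the price. `byte_growth` and `price_drop` are annual
    fractions (0.60 means 60% more bytes each year).
    """
    return ((1 + byte_growth) * (1 - price_drop)) ** years


BYTE_GROWTH = 0.60  # from the text: 60% more storage per year

# Hypothetical annual drops in cost per byte, for illustration only.
for price_drop in (0.40, 0.20):
    spend = relative_spend(BYTE_GROWTH, price_drop, years=5)
    print(f"price drop {price_drop:.0%}/yr -> 5-year spend x{spend:.2f}")
```

The budget stays flat only if the cost per byte falls by 37.5% a year, since 1.6 × 0.625 = 1; any slower decline means inelastic demand has to be paid for with a growing budget.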

Coughlin is talking about unit shipments, and he estimates:

overall shipments of HDDs in 2012 will be about 592 M (down about 5.2%
from 2011). This estimate for 2012, combined with the drop in 2011
means that HDDs have experienced two consecutive years of shipment
decline,
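The 2011 figure implied by that estimate can be recovered with one line of arithmetic. This is a minimal sketch that assumes the 5.2% decline is measured against 2011 unit shipments; the variable names are mine.

```python
# Figures from the Coughlin quote above, in millions of drives.
SHIPMENTS_2012_M = 592.0
DECLINE_2012 = 0.052  # 2012 down about 5.2% from 2011

# If 2012 = 2011 * (1 - decline), then 2011 = 2012 / (1 - decline).
implied_2011_m = SHIPMENTS_2012_M / (1 - DECLINE_2012)

print(f"implied 2011 shipments: about {implied_2011_m:.0f} M drives")
```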

Summarizing the discussion, there appear to be six segments of the hard disk business, of which four are definitely in decline:

Enterprise drives: Flash is killing off the systems that enterprises used to use for performance-critical data. These were based around buying a lot of very fast, relatively small capacity and expensive disks and using only a small proportion of the tracks on them. This reduced the time spent waiting for the head to arrive at the right track, and thus kept performance high albeit at an extortionate cost. Flash is faster and cheaper.

Desktop drives: Laptops and tablets are killing off desktop systems, so the market for the 3.5" consumer drives that went into them is going away.

Consumer embedded: Flash and the shift of video recorder functions to the network have killed this market for hard disks.

Mobile: Flash has killed this market for hard disks.

Two segments are showing some growth but both have constraints:

Enterprise SATA: Public and private cloud storage systems are growing and hungry for the highest GB per $ drives available, but the spread of deduplication and the arrival of 4TB drives will likely slow the growth of unit shipments somewhat.

Branded drives: This market is mostly high-capacity external drives and SOHO NAS boxes. Cloud storage constrains its growth, but the bandwidth gap means that it has a viable niche.

"There are several dynamics currently limiting market demand: first,
global macroeconomic weakness, which is impacting overall IT spending;
second, product transitions in the PC industry; and third, the continued
adoption of tablets and smartphones, which is muting PC sales growth."

But they too believe that demand for storage is inelastic:

because customers have to "store, manage and connect the massive and
growing amounts of digital data in their personal and professional
lives. This opportunity extends well into the future".

Saturday, October 13, 2012

As I said in this comment on my post Formats through time, time pressure meant that I made enough of a mess of it to need a whole new post to clean up. Below the fold is my attempt to remedy the situation.

Thursday, October 11, 2012

I have been saying for years that the big problem with digital preservation is economic, in that no-one has enough money to do a good job of preserving the stuff that needs to be preserved. Another way of saying the same thing is that our current approaches to digital preservation are too expensive. One major reason why they're too expensive is that almost everything is copyright. Thus, unless you are a national library, you either have to follow the Internet Archive's model and depend on the "safe harbor" provision of the DMCA, making your collection hostage to bogus take-down notices, or you have to follow the LOCKSS and Portico models and obtain specific permission from the copyright holder, which is expensive.

Reading this excellent post by Nancy Sims, it seems as though Judge Baer, in ruling on motions for summary judgement in the case of Authors Guild v. HathiTrust, may have changed that dilemma. Nancy writes:

Although the judge did say that preservation copying, on its own, may not be transformative, he also said that preservation copying for noncommercial purposes is likely to be fair use.

If this ruling holds up, it will have a huge effect on how we go about preserving stuff and how expensive it is. If preservation copying for noncommercial use is fair use, the need for libraries and archives to get specific permission to make copies for preservation goes away. There is a great deal of other good stuff in Nancy's post; go read it.

Monday, October 1, 2012

I presented our paper The Economics of Long-Term Digital Storage (PDF) at UNESCO's "Memory of the World in the Digital Age" Conference in Vancouver, BC. It pulls together the modeling work we did up to mid-summer. The theme of the talk was, in a line that came to me as I was answering a question, "storage will be a lot less free than it used to be". Below the fold is an edited text of my talk with links to the sources.