Tuesday, July 24, 2012

There's an anomaly in this graph from my prototype economic model of long-term storage. It shows the endowment as a multiple of the initial storage purchase cost that would have a 98% probability of not running out of money in 100 years, plotted against the Kryder rate, the annual percentage drop in storage purchase cost.

The region between Kryder rates of 25% and 35% is flat. It seems strange that, with storage prices dropping faster, the endowment needed isn't dropping too. I finally got time to figure out what is going on, and I now believe this is an artefact of the technology replacement policy model I implemented as part of the prototype model. It isn't that the policy model is unrealistic, although there is plenty of room for disagreement about alternative policies. Rather, the parameters I gave the policy model to generate this graph were unrealistic. Follow me below the fold for what Paul Krugman would call a wonkish explanation.

In 2008, however, a substantial number of contributions to such 501(c)s
made it into the FEC database. For the agency quietly to remove them
almost four years later with no public comment is scandalous. It flouts
the agency’s legal mandate to track political money and mocks the whole
spirit of what the FEC was set up to do. No less seriously, as legal
challenges and public criticism of similar contributions in the 2012
election cycle rise to fever pitch, the FEC’s action wipes out one of
the few sources of real evidence about how dark money works. Obviously,
the unheralded purge also raises unsettling questions about what else
might be going on with the database that scholars and journalists of
every persuasion have always relied upon.

They write:

While you knew FEC data was unlikely to be the last word, you could be
confident that whatever the agency did report was as true as it could
make it. That the FEC would ever delete true reports of politically
relevant money was literally unthinkable.

Other sources had made copies of the FEC data and added value:

Comprehending their formatting and correctly interpreting their myriad
rows and columns required the patience of Job and the informal
equivalent of a BS in computer science. As a consequence, most
researchers threw up their hands. They didn’t directly use FEC data;
instead they relied upon data reworked by some for-profit reseller, or
more commonly, the Center for Responsive Politics.

The journalists were re-examining the FEC's database:

For the 2007-'08 election cycle, for example, we found millions of
dollars in political contributions that appear to have escaped earlier
nets. We are also able to do a much better job of aggregating
contributions by large donors, which is key to understanding how the
system really works.

What they found was:

We discovered the FEC deletions when cross-checking our results for
big-ticket contributors. These deletions do not at all resemble other
post-election corrections that the FEC routinely makes to its data
downloads.

They then give a series of examples of large donors whose contributions to the 2007-8 cycle have been whitewashed away.

In mid-morning, certain reporters began tweeting that it was easy to
find contributions that we specifically discussed on the FEC website. We
checked one particularly famous name that we had also looked up only a
few days before and found that he was indeed back.

The Tweeters who doubted the Alternet story were easily refuted because:

These downloads are public and dated, so anyone can verify what’s in
them. The 2008 contribution by Harold Simmons that we mentioned is in
the January download. It is not in the July 8 download. The same is true
for other contributions we discussed to Let Freedom Ring by John
Templesman, Jr., and Foster Friess. More broadly, the entire set of “C9”
files covering 501(c)4 that we discussed is gone from the July
download, with the trivial exception we mentioned. Needless to say, we
checked the FEC’s database many times ourselves and we indicated that
the original record of contributions by Simmons (and others) could still
be found, if you knew exactly where to look.

The whole story, which was made possible by archiving copies of what the government was publishing when it was being published, seems to be having a happy ending. Except, perhaps, for the FEC. The credibility of the information they publish has been degraded, and will stay degraded until they come up with an explanation of what exactly happened and how they plan to make sure it never happens again. The FEC should look to Amazon for an example of how to make this kind of information public.

Tuesday, July 17, 2012

From my continuing series on the dismal outlook for Kryder's Law, The Register reports on the continuing impact of the Thai floods:

Hard disk drive prices are unlikely to return to pre-flood levels until
2014 despite rising production levels, thanks to surging demand, vendor
lock-in and a market dominated by just two suppliers, according to
analysts.

Had there been no floods, and had the industry managed even just the 20%/yr increase in bit density that is now projected, and had margins remained stable instead of increasing dramatically,
prices in 2014 would have been about 1/2 their pre-flood levels. So,
three years later, costs will be at least double what would have been
predicted before the floods.

Keene said WD had "recovered" more quickly than industry commentators had forecast – the plants were nearing full capacity some months ago – and said only some high-spec products remained in tight supply.

Seagate will miss its fourth quarter's sales target as its competitors recover faster than expected from floods that knackered hard drive supplies.

As part of a feature series on the solid state storage revolution, Ars Technica has a useful overview of the future of SSDs., with a clear explanation of why shrinking the flash cell size reduces not just the data retention time but also the number of write cycles a cell can sustain, and why moving from MLC (2 bits per cell) to TLC (3 bits per cell) makes the problems even worse. The article describes the contortions flash controller makers are going through to mitigate these problems, which are starting to sound like those the hard disk makers are struggling with. The article is suitably sceptical about the prospects for alternative solid state memories, but moderately enthusiastic about memristors. This is good news because:

a significant number of folks who have suffered through SSD failure have
so suffered not because the flash wore out, but rather because of an
actual component failure—a flash chip died, say, or the controller died.
Component failure can occur in any device, and if memristor-based
drives are eventually manufactured and assembled in the same plants by
the same companies who make today's HDDs and SSDs—and we have every
reason to believe they will be—then a memristor-based storage drive with
a million-write endurance rating would likely be effectively immortal.
Other components are likely to fail long before the storage medium
reaches its write endurance limit.

There are several post-NAND technologies jostling for prominence,
such as Phase-change memory, resistive RAM, memristors and IBM's
Racetrack memory. All promise greater capacity, higher speed and longer
endurance than flash. It's not clear which one of them will become the
non-volatile memory follow-on from NAND, but, whichever it is, the
controller software crafter to cope with NAND inadequacies won't be
needed.

MLC NAND wear-levelling and write amplification reduction technology
won't be needed. The NAND signal processing may be irrelevant. Garbage
collection could be completely different. Entire code stacks will need
to be re-written. All the flash array and hybrid flash/disk startups
will find their software IP devalued and their business models at risk
from post-NAND startup's IP with products offering longer life and
faster-performance.

One solid state storage technology wasn't included in these reviews because it was only announced July 10. Karlsruhe Institute of Technology managed to build a memristor-like 1-bit device using only 51 atoms:

They point out that a bit on a hard disk drive uses about 3 million
atoms while their molecule has just 51 atoms inside it, including the
single iron atom they've inserted at its centre. Wow, that's 58,823.5
times denser, say 50,000X for argument's sake, which would mean a 4TB
hard drive could store 200PB using molecular bit storage. - except that it couldn't ... If molecular bit storage is ever going to be used, it will be in
solid state storage and you will still need to access the bits – with
one wire to deliver the electric pulse to set them and another wire to
sense their setting. Even using nanowires, more space in the device
would be taken up by circuitry than by storage.

New numbers show hybrid drives, which combine NAND flash with
spinning disk, will double in sales from 1 million to 2 million units
this year. Unfortunately for Seagate — the only manufacturer of hybrids —
solid-state drive sales are expected to hit 18 million units
this year and 69 million by 2016. ... If hybrid drives are to have a
chance at surviving, more manufacturers will need to produce them, and
they'll need to come in thinner form factors to fit today's ultrabook
laptops.

The demand for HDD capacity continues unabated as there is no
alternative whatsoever for fast access to online data; NAND flash fabs
are in short supply and have a long lead time. In 2011, 70 per cent of
all the shipped petabytes were HDDs, with optical, tape, NAND and DRAM
making up the rest. Some 350,000PB of disk were shipped, compared to
about 20,000 PB of NAND.

WD's COO, Tim Leyden, reckons another 40 NAND fabs would be needed to
make 350K PB of NAND, and that would cost $400bn at $10bn/fab capable
of shipping 8.8K PB/year. It isn't going to happen any time soon.

Here is WD's version of the Kryder's Law curve, on a scale that minimises the slowing from 40%/yr to 20%/yr in the last few years, and they play that down as temporary in words too:

WD says that HDD areal density growth has slowed from an annual 40 per
cent compound annual growth rate to just 20 per cent as the current PMR
recording technology nears its maximum efficacy, with a transition to
energy-assisted magnetic recording (EAMR) coming and representing the
possibility of regaining the 40 per cent rate.

But they are clearly concerned enough to raise the possibility of adding platters (and thus cost):

Our contact said that 5-platter 3.5-inch drives were a possibility.

With a trend towards thin and light drives for Ultrabooks, the possibility of platter addition is denied in this market sector.

The decreasing proportion of the market occupied by PCs that use 3.5" drives is clear from this graph from Forester Research. The competition is not just laptops, netbooks and tablets but also smartphones:

Vendors shipped close to 489 million smartphones in 2011, compared
to 415 million PCs. Smartphone shipments increased by 63% over the
previous year, compared to 15% growth in PC shipments.

Canalys includes pad or tablet computers in its PC category
calculation, and this was the growth area in PCs. Pad shipments grew by
274% over the past year. Pads accounted for 15% of all client PC
shipments.Â Desktops grew by only two percent over the past year, and
notebooks by seven percent.

At 7:24pm PDT, a large voltage spike was
experienced by the electrical switching equipment in two of the US
East-1 datacenters supporting a single Availability Zone. ... In one of the datacenters, the transfer completed without
incident. In the other, the generators started successfully, but each
generator independently failed to provide stable voltage as they were
brought into service. ... servers operated without interruption during this period on the
Uninterruptable Power Supply (“UPS”) units. Shortly thereafter,
utility power was restored ... The utility power in the Region
failed a second time at 7:57pm PDT. Again, all rooms of this one
facility failed to successfully transfer to generator power ... all servers continued to operate normally on ... UPS ... power. ... the UPS systems
were depleting and servers began losing power at 8:04pm PDT. Ten minutes
later, the backup generator power was stabilized, the UPSs were
restarted, and power started to be restored by 8:14pm PDT. At 8:24pm
PDT, the full facility had power to all racks. ... Approximately 7% of the EC2 instances in the US-EAST-1 Region were ... impacted by the power loss. ... The vast majority of these
instances came back online between 11:15pm PDT and just after midnight.
Time for the completion of this recovery was extended by a bottleneck
in our server booting process. ... The majority of EBS
servers had been brought up by 12:25am PDT on Saturday. ... By 2:45am PDT, 90% of outstanding volumes had been turned
over to customers. We have identified several areas in the recovery
process that we will further optimize to improve the speed of processing
recovered volumes.

They also provided detailed descriptions of the mechanisms that caused the customer impacts, which should be required reading for anyone building reliable services.

Many
people think that cloud economics starts to pay dividends immediately
when you move to an on-demand usage model, paying only for what you use.
Cloud can actually be fairly expensive when hosting environments in use
are not elastic. I was surprised to see how many startups and SaaS
organizations still run their applications in the cloud just as they
would in any static hosting environment. While the pay-per-use model has
a lot of promise for cost-savings if our applications aren’t designed
for elasticity the cost of running the application in the cloud may end
up costing you more. Most of the mission-critical applications have not
been designed for elasticity and on-demand usage.

The flaw in the concept of DuraSpace, Preservica and other
preservation in the cloud ideas is that digital preservation is the
canonical example of a mission-critical application not "designed for elasticity and
on-demand usage". The business models of cloud service providers base pricing on the value to their customers of eliminating the
over-provisioning needed to cope with peak demand. Digital preservation
is a base-load application, it doesn't have peaks and troughs in demand
that require over-provisioning.

But the overall driver of cloud computing, at least for now, is business agility. The early adopters driving cloud computing don't need discounts,
because cost isn't their primary motivation. Help them figure out how
to do more, faster, and the cost equation of public versus private
clouds becomes somewhat of a non-issue.

Again, does anyone think the primary need for digital preservation is agility?