Freedom, Electronics and Tech

gnuplot

Update: Empirical evidence to go with the theoretical numbers.
Summary: It checks out; SSDs last a very long time.

Background

The myths about how you should use an SSD, and what you should not do with it, keep on spinning. Even though articles which crunch the actual numbers appear frequently, the superstition persists. Back in 2008, Robert Penz concluded that your 64 GB SSD could be used for swap, a journalling file system, and consumer level logging, and still last between 20 and 50 years under extreme use.

Fast forward to 2013: with 120 and 240 GB drives becoming affordable, the problem should have virtually disappeared from consumer grade hardware, but people are still worried. So when Magnus Deininger did some estimates on SSD stress testing, he got flak from Slashdot since he did not cover the consumer level disks. The write endurance varies widely between consumer and enterprise grade disks. This is the estimated number of write cycles a single block can take before it goes bad, and it ranges from only 1000 cycles to a million. This article from Centon explains why that is. As can be seen from the simplified figure below, the cheaper consumer drives using “TLC” (Triple Level Cell) or “MLC” (Multi Level Cell) memory cram more bits into each cell. They therefore degrade quicker than enterprise grade “SLC” (Single Level Cell) memory.

Stress test of consumer SSD

Deininger concerned himself only with the high end drives, with 100k to 1M write cycles. Most folks over at Slashdot, however, seem to have the low end ones, at 1k – 10k write cycles; hence the furore. Deininger’s estimates were also skewed against the high end drives, since he used the maximum write speed of the SATA 3 controller. At 6 Gbit/s (750 MByte/s), that is a lot more than the ~500 MB/s a typical SSD is rated for, or the ~250 MB/s you probably get out of it on a consumer system. And even then, these are estimates for a stress test, which does not even begin to model a typical consumer usage pattern.

Deininger goes into detail on how he came up with the estimates, and also on how to plot his graphs in Gnuplot. Based on that, let’s run a few numbers, covering the 10k and 1k disks and typical use. First, though, let’s drop the stress test write speed from the maximum controller speed to a typical system speed of 250 MB/s.

Also, for his plots he uses multiples of 1024. That might be correct for flash memory based drives, but it is not universally used; for example, Intel specifies capacity in GB (base 10), while OCZ uses GiB (base 2). For transfer speed it is wrong, as base 10 is the norm there. Although it does not make a big difference, I’ve changed to base 10 numbers.

The graphs show time on the x-axis (days in the first two, and years in the next section). The y-axis shows the fraction of broken memory cells (or blocks), from 0 damaged cells at the bottom to 100%, all of them, at the top. A horizontal line marks the 10% point in all graphs, since this is usually the point where damaged cells become visible to the end user. Before that, wear levelling in the disk controller will hide these cells, since most disks come with about 10% of their space reserved for this. (Thus a disk with 128 GiB of flash is sold as 120 GB, and one with 256 GiB is sold as 240 GB.)

First, a few fundamentals follow from Deininger’s equations. They can be seen in his examples, and also become clear in the graphs above:

Doubling the storage capacity of the drive doubles the time to failure (at the 10% line).

Increasing the flash lifespan by a factor of ten also increases the time to failure by a factor of ten.

In other words, these are all linear relationships, with no magic involved.

For the three drive sizes considered, the failure times at 10,000 write cycles are 26, 51 and 103 days for 64, 128 and 256 GB respectively. (I dropped the 32 GB size, as I no longer find it worthwhile for almost any application.) For TLC memory, at only 1000 cycles, the times are correspondingly a tenth: 2.6, 5.1 and 10.3 days.
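As a rough cross-check, the idealised time to wear out every cell is simply capacity × rated cycles ÷ write speed, assuming perfect wear levelling. That simple bound is an assumption of this sketch, not Deininger’s full model.

At 10k cycles it comes out at about 29.6, 59.3 and 118.5 days, somewhat above the 10% points quoted above. Presumably this is because the plotted model lets some cells fail before their rated cycle count. The function name ideal_days is made up for illustration:

```sh
# Idealised wear-out time in days: capacity * cycles / write speed.
# Assumes perfect wear levelling and base 10 units (1 GB = 1000 MB).
# Args: capacity in GB, rated write cycles per cell, write speed in MB/s.
ideal_days() {
    awk -v gb="$1" -v cycles="$2" -v mbps="$3" 'BEGIN {
        seconds = gb * 1000 * cycles / mbps   # total MB to write / (MB per second)
        printf "%.1f\n", seconds / 86400      # 86400 seconds per day
    }'
}

# Stress test at 250 MB/s for the three drive sizes and both cell types
for cycles in 10000 1000; do
    for gb in 64 128 256; do
        echo "$gb GB, $cycles cycles: $(ideal_days "$gb" "$cycles" 250) days"
    done
done
```

Doubling the capacity doubles the result, and ten times the cycles gives ten times the result, which is the same linear behaviour as in the graphs.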

If you were to conduct a stress test of drives from different manufacturers, these numbers would be interesting. For example, you could repeatedly write data, read it back to check it, and remove it, until errors start to show up in the data written. However, read speeds are around the same as write speeds for most SSDs, so this would take at least twice as long as the points in these graphs. (The remove operation also has to be factored in, but it takes only a fraction of the time of a full read and write.)
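A minimal sketch of such a write, check and remove loop follows. It assumes GNU coreutils (for sha256sum) and a scratch directory on the drive under test. The function name stress_loop, its arguments and the example path are hypothetical.

Note that the read-back may be served from the operating system’s page cache rather than from the flash itself. A real test would need to bypass or drop that cache.

```sh
# Hypothetical write/check/remove loop (a sketch, not a validated test tool).
# Usage: stress_loop DIR SIZE_MIB ROUNDS
# Example (hypothetical path): stress_loop /mnt/ssd/scratch 1024 100
stress_loop() {
    dir=$1 size_mib=$2 rounds=$3
    i=0
    while [ "$i" -lt "$rounds" ]; do
        f="$dir/stress_$i.bin"
        # Write SIZE_MIB MiB of random data, checksumming it on the way to disk
        written=$(dd if=/dev/urandom bs=1048576 count="$size_mib" 2>/dev/null |
            tee "$f" | sha256sum)
        sync
        # Read the file back and checksum it again (may hit the page cache)
        read_back=$(sha256sum < "$f")
        rm -f "$f"
        if [ "$written" != "$read_back" ]; then
            echo "checksum mismatch in round $i"
            return 1
        fi
        i=$((i + 1))
    done
    echo "completed $rounds rounds"
}
```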

Typical usage

For any performance test it is important to understand where the critical failure points are. However, that still does not tell us what will happen on a typical home user system. A typical consumer would not fill up his whole drive multiple times a day, only to remove it all and start over. So how can we best simulate typical user behaviour? We could of course just leave the drive in a machine and run user software for many years to see what happens. That would not be practical, though, as we’d never get any useful results in a reasonable time. So we’re left with estimates, but at a different write speed than in the stress test above.

How much would a typical user write to his disk? There will be different use cases of course, but let’s assume two scenarios:

A low to medium use case, where 1 GB is written every day.

A heavy home user who writes 1 GB an hour, every day (although even that is probably beyond what could be labelled as consumer usage).

At this point, a table of the different speeds and units comes in handy, so we can wrap our heads around the numbers. It then becomes clear how extreme the 250 MB/s stress test actually is: it will fill up a 64 GB disk 337 times over in 24 hours (250 MB/s * 86400 seconds = 21600 GB, and 21600 GB / 64 GB = 337.5).

Scenario           MBit/s   MByte/s   MByte/hour   GByte/day   GByte/year
SATA3 max speed    6000     750       2700000      64800       23652000
Stress test        2000     250       900000       21600       7884000
Heavy use          2.2222   0.2778    1000         24          8760
Low/Medium use     0.0926   0.0116    41.67        1           365
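The conversions in the table are plain base 10 arithmetic: 8 bits per byte, 3600 seconds per hour, 24 hours per day, 365 days per year and 1000 MB per GB. Below is a small sketch that rebuilds a row from either a speed in MB/s or a daily volume in GB. The function name rate_row and its unit arguments are illustrative:

```sh
# Rebuild a table row from a rate given in MB/s or GB/day (base 10 units).
# Prints: MBit/s MByte/s MByte/hour GByte/day GByte/year
rate_row() {
    awk -v value="$1" -v unit="$2" '
        # Print whole numbers as integers, everything else to 4 significant digits
        function fmt(x) { return (x == int(x)) ? sprintf("%d", x) : sprintf("%.4g", x) }
        BEGIN {
            if (unit == "MB/s")        mbps = value
            else if (unit == "GB/day") mbps = value * 1000 / 86400   # 86400 s per day
            else { print "unknown unit: " unit | "cat 1>&2"; exit 1 }
            mb_hour = mbps * 3600                 # 3600 s per hour
            gb_day  = mb_hour * 24 / 1000         # 24 h per day, 1000 MB per GB
            print fmt(mbps * 8), fmt(mbps), fmt(mb_hour), fmt(gb_day), fmt(gb_day * 365)
        }'
}
```

For example, rate_row 250 MB/s reproduces the stress test row. Likewise, rate_row 24 GB/day gives the heavy use row, with MBit/s shown to four significant digits.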

Now for some graphs. You’ll have to watch them carefully, as the plotted lines, the y-axes and the disk sizes are all the same. The only parameters changing are the write volume (1 GB vs. 24 GB a day) and the cell lifespan (10k vs. 1k cycles). Also note that the x-axes are now in years, instead of days as above. The first graph shows 10k write cycle disks, where 1 GB is written every day. The smallest disk, at 64 GB, will then last for 1524 years!

Can that be right, you ask? Surely there must be a mistake in the numbers somewhere? Well, let’s do a quick check to see if it matches Deininger’s graphs:

First, his plots were in days, so 1524 years makes 1524 * 365 = 556260 days.

Next, the ratio between 6 Gbit/s and 1 GB/day comes from the table above: 64800 (GB/day).

Finally, in his first graph he considered 100k write cycle disks, so we multiply by a factor of 10.

Plugging in the numbers gives 556260 / 64800 * 10 ≈ 86. That matches exactly the 86 days for the 64 GB disk at 100k cycles in his first graph. The math works out.

Even in the most extreme use case, a 64 GB drive rated for 1000 write cycles (TLC memory) is filled up almost three times per week. Yet it will last more than six years before the first dead memory cells are likely to show. Moving to an MLC based drive at 10k cycles (still consumer grade), the time to failure moves to 63 years. That most likely far outlasts the system it was hosted in, or maybe even the consumer who bought it.
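The same idealised bound can be put in years for the 64 GB scenarios: capacity × cycles ÷ daily write volume, with perfect wear levelling assumed. This is a simplification made for this sketch. At 24 GB a day, it gives about 7.3 years at 1k cycles and 73 years at 10k cycles. The 10% points from the graphs (just over six and 63 years) therefore sit somewhat below this ceiling. The function name ideal_years is illustrative:

```sh
# Idealised wear-out time in years: capacity * cycles / daily write volume.
# Assumes perfect wear levelling; ignores any spread in cell lifetimes.
# Args: capacity in GB, rated write cycles per cell, GB written per day.
ideal_years() {
    awk -v gb="$1" -v cycles="$2" -v per_day="$3" 'BEGIN {
        printf "%.1f\n", gb * cycles / per_day / 365
    }'
}

# 64 GB drive: low/medium use (1 GB/day) and heavy use (24 GB/day)
for per_day in 1 24; do
    for cycles in 10000 1000; do
        echo "$per_day GB/day, $cycles cycles: $(ideal_years 64 "$cycles" "$per_day") years"
    done
done
```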

(For the Gnuplot scripts to generate all the graphs above, please see this file).

Conclusion

So will Solid State Drives last till the end of time? Of course not! In fact, plenty of other components are prone to fail, just as in old HDDs: capacitors are infamous for their short lifespan, and solder joints might crack. The important point is that it is not the memory cells which are likely to fail first, even under the most extreme use.

Still, it makes sense to deploy tools fit for purpose: an enterprise drive using SLC memory, with 100k or 1M write cycles, will leave all doubts behind. There will be no need to consider special use cases or take special precautions (beyond the normal backup and security procedures which should be in place regardless of drive type). For the home user, the same is true: even the smallest drives with the shortest cell lifespan will not fail under normal use.

More specifically, there are no problems or worries with:

using ext3, ext4 or other journalling file systems on an SSD.

storing /tmp or logs on the SSD.

using an SSD partition for memory swap.

any normal consumer usage pattern.

In summary: replacing the old spinning disk with a solid state drive will pose no extra risk of data loss. It will of course not reduce the risk of loss from other threats either, so normal backup and security procedures should always be in place.

Computer storage, both primary and secondary memory, has seen a tremendous phase of development over the last fifty years. As new technology has been brought to market, prices have declined steadily, following a straight line on a logarithmic scale. For magnetic storage, the trend has been very stable over the last thirty years. Prices per MB have gone down by around a third every year, or roughly ninety percent every five years. For primary storage, the trend has been more volatile. Overall, though, we see a similar rate of decline all the way back to the first flip-flops in the 1950s.
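Here is a quick check of how the two rates relate. A decline of one third per year, compounded over five years, leaves

```latex
\left(1 - \tfrac{1}{3}\right)^{5} = \left(\tfrac{2}{3}\right)^{5} = \tfrac{32}{243} \approx 0.13
```

of the starting price. That is a drop of about 87%, close to the ninety percent mentioned.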

John C. McCallum has done a good job collecting all the data over the years, and going back to computer magazines for reference. However, since the beginning of 2012 there have been no updates, so I’ve taken up the work where he left off. I’ve added a new page to my site, where I will collect the data and update the graphs over time: hblok.net/storage


In this first update, the harddisk prices are the most interesting, and we can now clearly see the effect of the flood disaster in late 2011. It has interrupted a thirty year trend, and as a result prices per MB are about the same as they were one and a half years ago. Now the question is whether this will have a lasting effect on magnetic harddisk prices. Alternatively, it may be just a blip in history, as technological improvements bring us cheaper storage at the same pace.

The two plots below extrapolate the trend over the last thirty years under two different scenarios:

1) Improvements in technology will catch up with the delay over the last year, and thus the thirty year trend will continue unaffected (red line).

2) The pace of improvements will not change, so the rate of decline in price stays the same, but the line is shifted by about a year (blue line).
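One common way to extrapolate an exponential trend like this is a least-squares straight-line fit of log(price) against the year. Such a fit shows up as a straight line on a logarithmic price axis. Below is a minimal sketch, assuming “year price” pairs on standard input. The function name extrapolate_price is illustrative, and this is not necessarily how the plots below were produced:

```sh
# Least-squares fit of ln(price) against year, extrapolated to a target year.
# Input on stdin: "year price" pairs (any currency unit). Arg: target year.
# Prints the fitted yearly price factor and the extrapolated price.
extrapolate_price() {
    awk -v target="$1" '
        NR == 1 { x0 = $1 }                      # shift years to keep sums small
        { x = $1 - x0; y = log($2)               # natural log of the price
          n++; sx += x; sy += y; sxx += x * x; sxy += x * y }
        END {
            d = n * sxx - sx * sx
            if (n < 2 || d == 0) { print "need at least two distinct years" | "cat 1>&2"; exit 1 }
            b = (n * sxy - sx * sy) / d          # slope: change in ln(price) per year
            a = (sy - b * sx) / n                # intercept at year x0
            printf "%.4f %.4f\n", exp(b), exp(a + b * (target - x0))
        }'
}
```

Feed it the collected year/price pairs and it prints the fitted yearly price factor, then the extrapolated price for the target year. A decline of a third per year would show up as a factor of about 0.67.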


The price is 4 cents per GB today (4e-5 per MB). If we look two years ahead, the uninterrupted scenario (red line) puts the price at 0.5 cents per GB in 2015 (5e-6 per MB). Put another way, 3 TB of storage, which costs $125 today, would have to go down to about $15 in two years. Equivalently, the same $125 would have to buy a whopping 25 TB (yes, twenty five!).

Given the recent news from the major harddisk vendors, that seems rather unlikely to happen; they’re only planning for 5 TB drives at the end of this year. So over two years, prices will not catch up. Perhaps this will change looking even further ahead; however, extrapolating technological trends beyond a year or two is merely guessing.

In the second scenario, we assume that prices will continue to decline at the same rate as in the past. Starting from today’s price, we’re then looking at about 1.6 cents per GB (1.6e-5 per MB). Today’s 3 TB would then go for around $50, while $125 would buy you about 8 TB. That seems more reasonable, and is also in line with the products being brought to market and in research right now.

If the rumoured 5 TB Western Digital disk is realised with four platters (4 * 1.25 TB) at the end of this year, five platter 6.25 TB (5 * 1.25) disks are already a possibility. Increasing storage density by another 30% to reach 8 TB over the following year seems a reasonable assumption.
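The dollar figures in both scenarios follow from the price per GB alone, using base 10 units (1 TB = 1000 GB). At today’s 4 cents per GB, 3 TB comes to about $120, in line with the $125 quoted. Below is a small sketch with illustrative function names. The 1.6 cents per GB figure corresponds to the 1.6e-5 per MB price above:

```sh
# Budget arithmetic at a given price per GB (base 10: 1 TB = 1000 GB).
# cost_usd CENTS_PER_GB TB      -> price in dollars for that capacity
# capacity_tb CENTS_PER_GB USD  -> terabytes that budget buys
cost_usd()    { awk -v c="$1" -v tb="$2"  'BEGIN { printf "%.0f\n", c / 100 * tb * 1000 }'; }
capacity_tb() { awk -v c="$1" -v usd="$2" 'BEGIN { printf "%.1f\n", usd / (c / 100) / 1000 }'; }

# Today, the red line scenario for 2015, and the blue line scenario for 2015
for cents in 4 0.5 1.6; do
    echo "$cents c/GB: 3 TB costs \$$(cost_usd "$cents" 3), \$125 buys $(capacity_tb "$cents" 125) TB"
done
```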

Edit: A previous version of this article placed the decimal point for price per GB incorrectly, at 0.4 cents rather than 4 (although the other numbers were unchanged, as were the extrapolated predictions).

I struggled with fonts in Gnuplot 4.6 on Fedora 17, getting the not-so-useful error “gdImageStringFT: Could not find/open font while printing string”. Eventually I found tonica’s post on debugging the issue. Although helpful, it did not give the full solution to my problem. It turns out that many of the old fonts are not available in Fedora 17 at all.

I wanted a sans-serif font, and in the end I went for DejaVuSans. After installing the font packages, I specifically exported that path for use with Gnuplot:

I wanted to make a graph of the amount of data served by my Apache server, with somewhat finer granularity than AWStats could give. The access log has all the information I needed, including the time of each request and the bytes served. Assuming the standard combined format, the time stamp is in the 4th field, and the bytes served in the 10th.

Thus, the following will isolate the necessary data for my graph. (The log can usually be found at /var/log/httpd/access_log; the commands here read a copy of it at /tmp/access.)

cat /tmp/access | cut -f 4,10 -d ' '

However, it turns out that not all log entries store the bytes served. This includes file not found errors, and certain requests which return no data. Some cases will have a hyphen, while others will simply be blank. To pick out only the lines which contain data, I extended the line above with a filter:

cat /tmp/access | cut -f 4,10 -d ' ' | grep -E ".* [0-9]+"
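If you prefer a single tool, the cut and grep steps can also be folded into one awk call. Below is a minimal sketch, assuming the space-separated common/combined log format. The function name extract_time_bytes is hypothetical. Note that awk splits on runs of blanks rather than single spaces, so malformed lines may be treated slightly differently than with cut:

```sh
# Print "timestamp bytes" for every request whose byte count (field 10) is
# numeric, skipping "-" and empty values. Reads the files given, or stdin.
extract_time_bytes() {
    awk '$10 ~ /^[0-9]+$/ { print $4, $10 }' "$@"
}

# Example: extract_time_bytes /tmp/access
```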

The first plot

This is enough to start working with in gnuplot. First we have to set the time format of the x-axis. The time stamps in the Apache log are in this format: “[10/Oct/2000:13:55:36”, or in strftime(3) terms: “[%d/%b/%Y:%H:%M:%S”. (Note that the opening bracket from the log is included in the format string.)

To set the time format in gnuplot, and to specify that the x-axis holds time values:

set timefmt "[%d/%b/%Y:%H:%M:%S"
set xdata time

The data can then be plotted with the following command:

plot "< cat /tmp/access | cut -f 4,10 -d ' ' | grep -E '.* [0-9]+'" using 1:2

To output to a file, the following will do (replot re-runs the previous plot command against the new output):

set terminal png size 600,200
set output "/tmp/gnuplot_first.png"
replot

The graph below shows the data served from my logs over the last couple of days.

Improvements
There are a few improvements to be made on the graph above:

Most importantly, the data is slightly misleading, since files served at the same time are not accumulated.

Furthermore, aesthetics like the legend, axis units and title formatting are missing.

Also note that the scale of the graph is dominated by a few outliers: I have a 7 MB video on my blog, which is downloaded occasionally.

For the following examples, I will focus on the first day, which does not include this file.

First, I’ve made some minor improvements, and in the second graph I’ve applied the “frequency” smoothing function. Notice how the first graph has a maximum around 440 kB, while the smoothed and accumulated graph below peaks at around 900.

set terminal png size 600,250
set xtics rotate
set xrange [:"[24/Apr/2011:22"]

awk
Although the frequency smoothing function gives an accurate picture, some of the accumulations are done over too wide a range, giving the impression of a higher load than is actually the case. Another way to sum up the data is to aggregate all requests in the same second into a sum. This can be done with the following awk script:
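Below is a minimal sketch of such a per-second aggregation. It assumes the two-column output of the cut and grep pipeline, with log lines in chronological order so that equal time stamps are adjacent. The function name aggregate_per_second is hypothetical, and the example output file matches the /tmp/access_awk used in the plot commands:

```sh
# Sum the bytes of all requests sharing the same time stamp (one-second
# resolution). Input: "timestamp bytes" lines in chronological order.
aggregate_per_second() {
    awk '
        NR > 1 && $1 != prev { print prev, sum; sum = 0 }   # new second: flush
        { prev = $1; sum += $2 }                             # accumulate bytes
        END { if (NR > 0) print prev, sum }                  # flush the last second
    '
}

# Example:
# cat /tmp/access | cut -f 4,10 -d ' ' | grep -E '.* [0-9]+' | aggregate_per_second > /tmp/access_awk
```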

Plotting both in the same graph shows the difference between the peaks of the frequency function and the simple aggregation:

plot "< cat /tmp/access | cut -f 4,10 -d ' ' | grep -E '.* [0-9]+'" using 1:($2/1000) title "frequency" smooth frequency with points, "/tmp/access_awk" using 1:($2/1000) title "awk" with points lt 0

Moving average in Gnuplot
For the daily graph, I think I’d prefer the one using the awk output, perhaps with lines or “impulses” as the style instead. However, it does not address the outliers. To smooth them out, we could try a moving average. This is not supported by any native function in gnuplot, so we have to roll our own. Thanks to Ethan A. Merritt, there is an example of this.
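Instead of defining the average inside gnuplot, a similar five-sample trailing average can be computed beforehand in awk. This is a swapped-in technique, not Merritt’s gnuplot example. The function name moving_avg5 is hypothetical, and the input is assumed to be the two-column per-second file:

```sh
# Trailing moving average over the last five samples (fewer at the start).
# Input: "timestamp value" lines. Averages over samples, not over a time
# window, so five requests an hour apart count the same as five in a second.
moving_avg5() {
    awk '{
        buf[NR % 5] = $2                  # ring buffer of the last five values
        n = (NR < 5) ? NR : 5             # number of samples available so far
        s = 0
        for (i = 0; i < n; i++) s += buf[(NR - i) % 5]
        print $1, s / n
    }'
}
```

For example, moving_avg5 < /tmp/access_awk > /tmp/access_avg5 (a hypothetical file name) gives a file that can be plotted directly with using 1:($2/1000).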

Of course, this puts a lot less emphasis on peaks, and the outlier at 650 kB in the graphs above is now represented by a spike of less than 200. Furthermore, there is a problem with a moving average over time data of inconsistent frequency. The values will be the same whether the last five requests were spread over an hour or a few seconds.

Zooming out to the view of several days, the average is perhaps more appropriate, since the data is overall of a more consistent frequency.

set xrange [*:*]
set format x "%d"

plot "/tmp/access_awk" using 1:(avg5($2/1000)) title "awk & avg5" with lines

Cumulative
Finally, another interesting view is the cumulative output day by day. This can easily be achieved by inserting a blank line in the data file between days. In awk, using the sum file generated above, it can be done like this:

Alternatively, the same can be based on the original access_log file. The aggregation per second is not necessary here, since the “cumulative” function will do the same operation, and the graph will be exactly the same:

cat /tmp/access | cut -f 4,10 -d ' ' | grep -E ".* [0-9]+$" | awk 'BEGIN { FS = ":" } ; { date=$1; if (date==olddate) print $0; else { print ""; print $0; olddate=date}}' > /tmp/access_awk_days

And the gnuplot commands. Note that the tics on the x-axis are set manually here, starting on the day before the first day in the plot, and ending on the last. The increment is set to a bit less than a day in seconds (60 * 60 * 24 = 86400), to approximately center each tic under its line. Also note that the start and end arguments still have to be in the same format as set with timefmt at the beginning.

set xtics "[23/Apr/2011:0:0:0", 76400, "[29/Apr/2011:23:59:59"
set format x "%d"