Monday, June 4, 2012

June I

According to this article in CW, "the National Center for Supercomputing Applications is rolling out a storage infrastructure that will include 380 petabytes of magnetic tape capacity and 25 petabytes of online disk storage made up by 17,000 SATA drives."

I think this is noteworthy for two reasons:

a) They realize that they cannot afford to keep all the data on spinning disk, so they provision roughly 94% of the capacity on IBM TS1140 tape drives and media (has someone coined the term "tape provisioning" already?).

b) The storage will be connected to the servers over 40 Gbps Ethernet. I am not sure whether they actually use FCoE or something else (like AoE: ATA over Ethernet), but it still shows a trend towards high-speed converged networks. Ethernet has great potential there with its 40 Gbps and 100 Gbps standards and roadmap!
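The tape-to-disk split is easy to verify from the article's own numbers; here's a quick back-of-the-envelope check (plain Python, using only the 380 PB and 25 PB figures quoted above):

```python
# NCSA capacity split from the article: 380 PB tape vs. 25 PB disk.
tape_pb = 380
disk_pb = 25
total_pb = tape_pb + disk_pb

tape_share = tape_pb / total_pb * 100
print(f"Tape share: {tape_share:.1f}% of {total_pb} PB total")
# -> Tape share: 93.8% of 405 PB total
```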
In that project, they use the Lustre file system, as the article above points out. But they might as well have looked into using GPFS: IBM just announced release 3.5 with many enhancements, as outlined here.
As you probably know, IBM uses the GPFS file system in many of its current storage offerings, like the TS7700, SONAS, and most recently the Storwize V7000 Unified. So we can expect to see the functions introduced in GPFS 3.5 implemented in these products as well. Watch this space!

And to conclude today's excursion into the Big Data space, here's an article from Jon Toigo that nicely outlines the difference between "lots of data" and "Big Data": "Big Data analytics help you make sense of what you are observing while you are observing it. You glean knowledge from data as quickly as it arrives — in something like 200 milliseconds — rather than waiting to batch process the data and produce a report."