Monthly Archives: November 2014

(Excerpt from original post on the Taneja Group News Blog)

In-memory processing was all the rage at Strata 2014 NY last month, and the hottest word was Spark! Spark is a big data scale-out cluster solution that provides a way to speedily analyze large data sets in memory, using a “resilient distributed dataset” (RDD) design for fault tolerance. It can deploy into its own optimized cluster or ride on top of Hadoop 2.0 using YARN (although it is a different processing platform/paradigm from MapReduce – see this post on GridGain for a Hadoop MapReduce in-memory solution).

(Excerpt from original post on the Taneja Group News Blog)

Recently I wrapped up a report on GridGain’s In-Memory Hadoop Accelerator, in which I explored how leveraging memory can vastly improve the production performance of many Hadoop MapReduce jobs, and even tackle streaming use cases without rewriting them or implementing newer streaming paradigms. GridGain drops into existing Hadoop environments without much fuss, so it’s an easy add-on/upgrade. Now GridGain has transferred the core in-memory platform to the Apache Software Foundation as the newly accepted incubator project Apache Ignite, contributing it completely to the community at large.

An IT industry analyst article published by SearchSolidStateStorage.

Sometimes comparing the costs of flash arrays is an apples-to-oranges affair — interesting, but not very helpful.

We’re often told by hybrid and all-flash array vendors that their particular total cost of ownership (TCO) is effectively lower than the other guy’s. We’ve even heard vendors claim that by taking certain particulars into account, the per-gigabyte price of their flash solution is lower than that of spinning disk. Individually, the arguments sound compelling; but stack them side by side and you quickly run into apples-and-oranges issues.

Storage has a lot of factors that should be profiled and evaluated, such as IOPS, latency, bandwidth, protection, reliability, consistency and so on, and these must be matched against client workloads with unique read/write mixes, burstiness, data sizes, metadata overhead and quality-of-service/service-level agreement requirements. Standard benchmarks may be interesting, but the best way to evaluate storage is to test it under your particular production workloads; a sophisticated load-generation and modeling tool like that from Load DynamiX can help with that process.

But as analysts, when we try to make industry-level evaluations hoping to compare apples to apples, we run into a host of half-hidden factors we’d like to see made explicitly transparent if not standardized across the industry. Let’s take a closer look…

An IT industry analyst article published by SearchCloudStorage.

Hybrid clouds are most frequently used for backup purposes because they relieve enterprises of the need to use a secondary data center.

The hybrid cloud model tends to get used today mostly for cold storage — or backup and disaster recovery purposes.

Certainly that situation is evolving. People are using cloud storage and hybrid cloud storage for more use cases than ever, such as experimenting with partitioning data for big data analytics, or looking at some applications that were born in the cloud and figuring out how to make them work with data that’s on-site.

But today, they’re using it by and large as a cold storage tier. Using the hybrid cloud as a backup site is really a great thing because you don’t have to build a complete second data center or another off-site repository. If you just have one data center — or primary data center — you can take those backup images and put them into a public cloud.

After that, you can pull those images out at any point. But the great thing is you don’t have to pull them out back to the exact same place. If you lost your primary site and you want to restore those images to a second site or a different site, you can do that. If you’re careful about how you build this and you’re completely virtualized, you can restore your backup images to the same cloud or even a different cloud. So now, if you lose your primary site, you can still back up and restore within the cloud…

RT @TruthinIT: There's no cost of goods like a traditional NAS device where I've got disks I've got to pay for. And if I'm not using the data on those disks, I still got to pay for those disks. bit.ly/2BBX073 @Nasuni @smworldbigdata

In 30 min I'm interviewing @Cohesity (and a customer) on @TruthinIT about Mass Data Fragmentation. It's about having too many copies in about four or five different "dimensions", including cloud! Join us for the webcast (12.11.18) @ 1pmET (and there will be prizes) bit.ly/2PdqrQn