Monthly Archives: June 2015

(Excerpt from original post on the Taneja Group News Blog)

At this month’s Hadoop Summit 2015 I noted two big trends. One was the continuing focus on Spark as an expansion of the big data analytical ecosystem, with main sponsor Hortonworks (great show by the way!) and most vendors talking about how they support, interact with, or deliver Spark in addition to Hadoop’s MapReduce. The other was a very noticeable shift in focus away from trotting out ever more gee-whiz big data use cases and towards talking about how to make it all work in enterprise production environments. If you ask me, this second trend is the bigger deal for IT folks to pay attention to.

(Excerpt from original post on the Taneja Group News Blog)

We are seeing convergence everywhere in IT these days. AccelOps shows how convergence in systems management offers many of the same kinds of value as it does in other areas of IT: leveraged capabilities across formerly siloed practices, simplified tasks and automation embedding best practices, and ready-to-roll deployment out of the box. AccelOps has tied security, compliance and network operations together into a one-stop SOC and NOC “in a box”.

An IT industry analyst article published by SearchCloudStorage.

Cloud and on-premises storage are increasingly becoming integrated. This means cloud tiering is just another option available to storage administrators. Organizations aren’t likely to move 100% of their data into cloud services, but most will want to take advantage of cloud storage benefits for at least some data. The best approaches to using cloud storage in a hybrid fashion create a seamless integration between on-premises storage resources and the cloud. The cloud tiering integration can be accomplished with purpose-built software, cloud-enabled applications or the capabilities built into storage systems or cloud gateway products.

This may be the year that public cloud adoption finally moves beyond development projects and Web 2.0 companies and enters squarely into the mainstream of IT. Cloud service providers can offer tremendous advantages in terms of elasticity, agility, scalable capacity and utility pricing. Of course, there remain some unavoidable concerns about security, competitiveness, long-term costs and performance. Also, not all applications or workloads are cloud-ready and most organizations are not able to operate fully in a public cloud. However, these concerns lead to what we are seeing in practice as a hybrid cloud approach, attempting to combine the best of both worlds.

Taneja Group research supports that view, determining that only about 10% of enterprise IT organizations are even considering moving wholesale into public clouds. The vast majority of IT shops continue to envision future architectures with cloud and on-premises infrastructure augmented by hyperconverged products, at least within the next 3-5 years. Yet, in those same shops, increasing storage consolidation, virtualization and building out cloud services are the top IT initiatives planned out for the next 18 months. These initiatives lean toward using available public cloud capabilities where it makes sense — supporting Web apps and mobile users, collaboration and sharing, deep archives, off-site backups, DRaaS and even, in some cases, as a primary storage tier.

The amount of data that many IT shops will have to store, manage, protect and help process is, by many estimates, predicted to double every year for the foreseeable future. Given very real limits on data centers, staffing and budget, it will become increasingly hard to deal with this data growth completely in-house.

An IT industry analyst article published by SearchDataCenter.

Our data center machines, due to all the information we feed them, are getting smarter. How can you use machine learning to your advantage?

Machine learning is a key part of how big data brings operational intelligence into our organizations. But while machine learning algorithms are fascinating, the science gets complex very quickly. We can’t all be data scientists, but IT professionals need to learn about how our machines are learning.

We are increasingly seeing practical and achievable goals for machine learning, such as finding usable patterns in our data and then making predictions. Often, these predictive models are used in operational processes to optimize an ongoing decision-making process, but they can also provide key insight and information to inform strategic decisions.

The basic premise of machine learning is to train an algorithm to predict an output value within some probabilistic bounds when it is given specific input data. Keep in mind that machine learning techniques today are inductive, not deductive: they lead to probabilistic correlations, not definitive conclusions.
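To make that premise concrete, here is a minimal sketch in plain Python. It is an illustration only, not a method taken from the article: it fits a straight line by ordinary least squares and reports a rough band of plus or minus two residual standard deviations around each prediction. The data (concurrent users versus response time) and the names `fit_line` and `predict` are hypothetical, chosen purely for the example.

```python
import math


def fit_line(xs, ys):
    """Train: fit y = intercept + slope * x by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual standard error (n - 2 degrees of freedom) measures how far
    # the training points scatter around the fitted line.
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    sigma = math.sqrt(sum(r * r for r in residuals) / (n - 2))
    return slope, intercept, sigma


def predict(model, x, z=2.0):
    """Predict: return (low, center, high) as a rough +/- z*sigma band."""
    slope, intercept, sigma = model
    center = intercept + slope * x
    return center - z * sigma, center, center + z * sigma


# Hypothetical training data: concurrent users vs. observed response time (ms).
users = [10, 20, 30, 40, 50, 60]
latency_ms = [105, 118, 133, 139, 156, 165]

model = fit_line(users, latency_ms)
low, mid, high = predict(model, 70)
print(f"predicted latency at 70 users: {mid:.0f} ms (roughly {low:.0f} to {high:.0f} ms)")
```

The band is deliberately crude (a proper prediction interval would widen as the input moves away from the training data), but it shows the shape of the idea: the model returns a likely range learned from past observations, not a guaranteed answer.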

An IT industry analyst article published by SearchStorage.

Big data sure is exciting to business folks, with all sorts of killer applications just waiting to be discovered. And you no doubt have a growing pile of data bursting the seams of your current storage infrastructure, with lots of requests to mine even more voluminous data streams. Haven’t you been collecting microsecond end-user behavior across all your customers and prospects, not to mention collating the petabytes of data exhaust from instrumenting your systems to the nth degree? Imagine the insight management would have if they could look at all that data at once. Forget about data governance, data management, data protection and all those other IT worries — you just need to land all that data in a relatively scale-cheap Hadoop cluster!

Seriously, though, big data lakes can meet growing data challenges and provide valuable new services to your business. By collecting a wide variety of data sets relevant to the business all in one place and enabling multi-talented analytics based on big data approaches that easily scale, many new data mining opportunities can be created. The total potential value of a data lake grows with the amount of useful data it holds available for analysis. And, one of the key tenets of big data and the big data lake concept is that you don’t have to create a master schema ahead of time, so non-linear growth is possible.

The enterprise data lake, or data hub, concept was first proposed by big data vendors like Cloudera and Hortonworks, ostensibly using vanilla scale-out HDFS-based commodity storage. But it just so happens that the more data you keep on hand, the more storage of all kinds you will need. Eventually, all corporate data is likely to be considered big data. However, not all of that corporate data is best hosted on a commodity scale-out HDFS cluster.

So, today, traditional storage vendors are signing up to the big data lakes vision. From a storage marketing perspective, it seems like data lakes are the new cloud. “Everyone needs a data lake. How can you compete without one (or two or three)?” And there are a variety of enterprise storage options for big data, including remote enterprise storage that acts like HDFS, Hadoop virtualization that can translate other storage protocols into HDFS, and scalable software-defined storage options.
