A Cost-Effective Hadoop Cluster for Big Data Environments

When it comes to demanding workloads, few can match big data's need for performance, scalability and manageability. Today, big data is a driving force behind a growing number of enterprises using banks of servers, storage and network infrastructure to analyze huge data sets to gain or maintain a competitive edge.

Any discussion of big data workloads is almost certainly going to center on Hadoop. If you need to access both structured and unstructured data for deep analytics and sophisticated intelligence, Hadoop is an invaluable tool. In compute-intense environments where server clustering is a standard infrastructure design, for tasks such as indexing or understanding customer buying patterns, Hadoop is likely to be a critical part of the solution set.

By using high-performance algorithms to deduce otherwise hard-to-detect patterns and trends, Hadoop makes big data an essential part of any organization's business-critical workload inventory. But Hadoop is highly demanding when it comes to infrastructure requirements. It runs on distributed storage and compute clusters to enable parallel processing of “chunks” of those very large and complex data sets.
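As a rough illustration of the chunk-based parallel processing described above, the following Python sketch splits a small data set into chunks, processes each chunk independently, and then merges the partial results. It is not Hadoop's API: the function names are hypothetical, and threads on a single machine stand in for the nodes of a real cluster.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(records, chunk_size):
    """Split a list of records into consecutive chunks of at most chunk_size."""
    return [records[i:i + chunk_size] for i in range(0, len(records), chunk_size)]


def count_words(chunk):
    """'Map' step: count the words in one chunk, independently of the others."""
    counts = Counter()
    for line in chunk:
        counts.update(line.lower().split())
    return counts


def merge_counts(partials):
    """'Reduce' step: combine the per-chunk counts into a single result."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


def parallel_word_count(records, chunk_size=2, workers=4):
    """Count words across all records by processing chunks in parallel."""
    chunks = split_into_chunks(records, chunk_size)
    # Assumption: worker threads stand in for cluster nodes in this sketch.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(count_words, chunks))
    return merge_counts(partials)
```

Because each chunk is processed without reference to any other, adding workers (or, on a real cluster, nodes) lets more chunks be handled at the same time, which is why this style of workload scales out across many machines.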

Managing TCO in a Hadoop Environment

One of the biggest challenges organizations face when implementing and supporting Hadoop workloads is enabling a solution in a cost-effective way, both in terms of capital equipment for the initial deployment and long-term scalability, and for ongoing management of the solution. Hadoop workloads traditionally have required dozens, even hundreds of nodes, which translates into lots and lots of data center racks. If you are using a colocation environment, that means you'll pay more on a per-node or per-rack basis.

Whether you're looking at a Hadoop infrastructure node, memory-intensive node or compute-intense node, you can't just throw more CPUs or disk spindles at the problem to keep up with the demand for these workloads. Not only will the capital equipment expenses spiral out of control, but expanding the underlying infrastructure in such an ad hoc manner is also going to result in the use of much more physical space, as well as higher power and cooling costs.

One of the negative byproducts of high infrastructure costs associated with Hadoop is how it can depress organizations' appetite for proof-of-concept deployments. Even piloting a modest Hadoop workload can take up a lot of rack space, which can prevent or simply delay the implementation of important big data sandbox programs.

An Elegant Solution: Dell PowerEdge FX Converged Infrastructure

Fortunately, there are new infrastructure options for Hadoop workloads that lower the cost of entry for organizations looking to take advantage of Hadoop for big data and that keep ongoing costs under control. Converged infrastructure solutions from Dell, such as Dell PowerEdge™, help organizations dramatically reduce total cost of ownership for Hadoop workloads by installing highly dense compute and storage resources in a small, affordable package that can be situated almost anywhere and can dramatically reduce power and cooling costs.

PowerEdge is particularly relevant for Hadoop workloads because of its ability to support one of Hadoop's biggest requirements: high-capacity direct-attached storage. The PowerEdge solution not only brings more storage closer to the compute function than any other modular architecture, but its FD332 storage sled also provides the ability to mix and match the most appropriate ratios of compute and storage in order to provide unique workload granularity.

PowerEdge supports Hadoop workloads in a standard 2U form factor, and it further reduces deployment costs through a unique cable aggregation method that dramatically reduces the number of cables required for compute, storage and power.

For organizations that need not only high performance but also highly available, resilient infrastructure in Hadoop environments, the PowerEdge supports I/O aggregation in order to further reduce total cost of ownership.

Finally, one of the biggest financial benefits of using PowerEdge systems for Hadoop workloads is the ability to reliably and affordably keep up with ever-mounting data sets in order to handle bigger and more complex analytics. The PowerEdge system's scale-out capabilities allow a logical, balanced and affordable alignment of infrastructure availability in lockstep with deeper and broader analytics requirements.