TACC's Hadoop Cluster Makes Big Data Research More Accessible

Over at the Texas Advanced Computing Center, Aaron Dubrow writes that researchers are using a specialized cluster at TACC to run experimental Hadoop-style studies on a current production system.

This system offers researchers a total of 48 eight-processor nodes on TACC’s Longhorn cluster to run Hadoop in a coordinated way with accompanying large-memory processors. A user on the system can request all 48 nodes for a maximum of 96 terabytes (TB) of distributed storage. What’s special about the Longhorn cluster at TACC isn’t simply the beefed-up hardware for running Hadoop; rather, it’s the ability for researchers to leverage the vast compute capabilities of the center, including powerful visualization and data analysis systems, to further their investigations. The end-to-end research workflow enabled by TACC could not be done anywhere else, and as a bonus, researchers get access to the full suite of tools available at the center to do computational research.

According to TACC Research Associate Weijia Xu, the best part is that Hadoop is easy to use without requiring users to be experts. It handles a lot of the low-level computing behavior, so people don’t need to have a lot of knowledge about I/O or memory structures to get started.
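To give a sense of how little low-level plumbing a Hadoop user has to write, here is a minimal sketch of a word-count job in the style of Hadoop Streaming, which lets any program that reads stdin and writes stdout serve as a mapper or reducer. The function names and the sample input are illustrative, not from the article; the `__main__` block simply simulates the map, shuffle (sort), and reduce phases locally.

```python
import sys
from itertools import groupby

def mapper(lines):
    """Map phase: emit one tab-separated (word, 1) pair per word.
    No file, network, or memory management appears in user code."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"

def reducer(sorted_pairs):
    """Reduce phase: sum the counts for each word. The input must be
    sorted by key, which Hadoop's shuffle phase guarantees."""
    keyed = (pair.split("\t") for pair in sorted_pairs)
    for word, group in groupby(keyed, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"

if __name__ == "__main__":
    # Local stand-in for map -> shuffle -> reduce on a toy input:
    text = ["big data big compute", "data analysis"]
    pairs = sorted(mapper(text))          # sorting simulates the shuffle
    for line in reducer(pairs):
        print(line)                       # e.g. "big\t2", "data\t2", ...
```

On an actual cluster, the same two functions would be split into mapper and reducer scripts and submitted with the `hadoop-streaming` jar; the framework then handles splitting the input across nodes, moving intermediate data, and writing results to the distributed filesystem, which is the I/O and memory handling Xu refers to.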
