What is big data?

“Big Data” is a catchphrase that has been bubbling up from the high-performance computing niche of the IT market. Increasingly, suppliers of processing virtualization and storage virtualization software have begun to flog “Big Data” in their presentations. What, exactly, does this phrase mean?

If one sits through the presentations from ten suppliers of technology, fifteen or so different definitions are likely to come forward. Each definition, of course, tends to support the need for that supplier’s products and services. Imagine that.

In simplest terms, the phrase refers to the tools, processes and procedures allowing an organization to create, manipulate, and manage very large data sets and storage facilities. Does this mean terabytes, petabytes or even larger collections of data? The answer offered by these suppliers is “yes.” They would go on to say, “you need our product to manage and make best use of that mass of data.” Just thinking about the problems created by the maintenance of huge, dynamic sets of data gives me a headache.

What is Hadoop?

There are many different papers and articles that address the question "What is Hadoop?" For brevity, I'll just point to the one found on Wikipedia. It defines Hadoop in the following way:

Apache Hadoop is a software framework that supports data-intensive distributed applications under a free license. It enables applications to work with thousands of nodes and petabytes of data. Hadoop was inspired by Google's MapReduce and Google File System (GFS) papers.

Hadoop is a top-level Apache project being built and used by a global community of contributors, written in the Java programming language. Yahoo! has been the largest contributor to the project, and uses Hadoop extensively across its businesses.

Hadoop was created by Doug Cutting, who named it after his son's toy elephant. It was originally developed to support distribution for the Nutch search engine project.
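To make the MapReduce idea mentioned in that definition a little more concrete, here is a minimal single-machine sketch in Python. It is not Hadoop code and does not use any Hadoop API; the function names (map_phase, shuffle, reduce_phase, word_count) are illustrative assumptions. It only shows the shape of the model: a map step emits key/value pairs, the framework groups them by key, and a reduce step combines each group.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in one input document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group emitted values by key, standing in for the framework's shuffle step."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values collected for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce in sequence over a list of strings."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data is big", "data about data"]))
```

In a real cluster the map and reduce steps would run in parallel on many nodes, each working on the portion of the data stored nearby; this sketch runs everything in one process purely to illustrate the flow.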

What is MapR Technologies doing to improve Hadoop?

MapR Technologies is one of a number of companies that have taken the open source software and have enhanced it in some way. MapR has taken the base Hadoop system and added the following capabilities:

Software making it possible to "mount" Hadoop as if it were a direct-access NFS™ file system.

Graphical provisioning and planning tools

Monitoring tools making it possible to better understand the distribution of data and processing so it can be optimized for better performance

High-availability tools that increase the reliability of the Hadoop file system. These include mirroring for disaster recovery and cluster synchronization, as well as the ability to capture data snapshots, which makes it easier to recover to a specific point in time.

Higher database performance through enhanced data reduction and database caching. This also includes a "lockless" architecture that makes it possible for performance to scale as processing nodes and processor cores are added.
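As a rough illustration of why the NFS-style mount in the list above matters: once cluster storage appears as an ordinary directory, existing programs can read it with plain file I/O instead of a Hadoop-specific client. The sketch below assumes nothing about MapR's actual paths or tools. The total_lines function and the mount_root argument are hypothetical, and the demo uses a temporary directory as a stand-in for a mounted cluster.

```python
import os
import tempfile


def total_lines(mount_root):
    """Count lines in every file under mount_root using ordinary file I/O."""
    count = 0
    for dirpath, _dirnames, filenames in os.walk(mount_root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            with open(path, "r", encoding="utf-8", errors="replace") as handle:
                count += sum(1 for _ in handle)
    return count


if __name__ == "__main__":
    # A temporary directory stands in for the (hypothetical) mount point.
    with tempfile.TemporaryDirectory() as fake_mount:
        with open(os.path.join(fake_mount, "sales.csv"), "w", encoding="utf-8") as f:
            f.write("store,amount\n1,9.99\n2,4.50\n")
        print(total_lines(fake_mount))
```

The point of the sketch is only that nothing in total_lines knows about Hadoop; whether a given workload performs well over such a mount is a separate question.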

Snapshot analysis

Many suppliers are commercializing Hadoop to make it less of a computer science project and more of a tool for organizations to use to glean important insight from masses of data. It is clear from my discussion with John that he and his team have a deep understanding both of how Hadoop works and how to make it work better for their customers. While I'm no Doctor Technical, John was able to address each of my concerns and questions without a single pause to think. I was impressed.

If your organization is trying to make sense of masses of retail point-of-sale data, research data, engineering data, geophysical data or financial data, it would be good to talk with MapR. They also offer M3, a free version of their product that includes the NFS access and performance enhancements, but not all of the management tools of their commercial offering, called M5.
