Wednesday, February 12, 2014

Wikibon Principal Research Contributor Jeff Kelly provides a broad, basic tutorial on the big data environment, covering technologies, skill sets, and use cases, in “Big Data: Hadoop, Business Analytics and Beyond”. While the environment starts with Hadoop and MapReduce, it extends far beyond them. Parts of this report may seem basic to technical readers, but it provides an excellent overview of technologies, pros and cons, and market issues such as the shortage of trained technical personnel and data scientists.

To learn more about what is happening and how your organization can start gaining business advantage through new Big Data technologies and processes, watch theCUBE’s big data event, BigDataSV 2014, on SiliconAngle February 11-13.

Hadoop is often considered the central technology for Big Data, and Kelly provides a good basic discussion of how it works, how it differs from traditional RDBMS technology, and its pros and cons. However, he goes beyond that to discuss NoSQL and massively parallel analytic databases in general terms and concludes that a full big data processing environment requires all three as well as the traditional RDBMS data warehouse to support the full range of data capabilities companies need.

However, Big Data only has value when it is analyzed to yield business insights that are then delivered to the right business decision makers in a compelling form. That requires data analysis tools designed to work with big data and the underlying database technologies, as well as visualization engines such as Tableau, to deliver the results of that analysis to business decision makers.

The big data team

Most of all, this requires a knowledgeable team of technical staff and data scientists. While RDBMS remains important, the Hadoop, NoSQL and massively parallel database technologies require new technical skills, and Hadoop in particular is still immature and developing quickly. Many of these skills, particularly those of data scientists, are in very short supply today. Vendors are providing technical training and some universities now offer degrees in data science, but Kelly says that more is needed. In particular, he says vendors and the Hadoop open source community need to do all they can to make their user interfaces as easy to use as possible, to reduce the technical training required to manage these infrastructures.

Data scientists, however, should not be seen as technical staff. Their responsibility is first to identify the use cases that will provide the most value from Big Data for their organization. Then they must identify the relevant data types and design and complete the analysis. Finally, and perhaps most important, they must present the results of their analysis to business decision makers in a compelling form.

Kelly describes a list of use cases ranging from recommendation engines and sentiment analysis to fraud detection and customer experience analytics. However, he acknowledges that this is only a sample list and says “the most compelling use case at any given enterprise may be as yet undiscovered. Such is the promise of Big Data.”

He does not discuss individual vendors and service providers in this report, beyond providing a graphic listing many of the vendors, cloud, technical and professional services providers and tools available today. A discussion of vendors is available in his previous report, “Big Data Vendor Revenue and Market Forecast 2012-2017”.

He also does not discuss Big Data services, including SaaS analysis services provided by several companies, or how these might fit into an organization’s Big Data strategy. The big advantage of these services, of course, is that they are preconfigured and ready to provide analysis on day one. The disadvantage is that they are built for predefined use cases, so the organization cannot design its own unique application of Big Data. But to the extent that these predefined use cases offer business value to the organization, it should consider them.

Recommendations

Graphic courtesy IBM

Big Data is already proving its value in early adopter companies, and Kelly recommends that enterprises across all industries evaluate big data use cases and “engage the big data community to understand the latest technological developments.” One opportunity to do that will be afforded to CIOs next week, when theCUBE webcasts its own big data event, BigDataSV 2014. Vendors, he writes, should help organizations identify the best use cases for them and make big data technologies easier to deploy, manage and use. They also need to help user organizations develop the skills needed to deploy and manage their technologies. And most important, they should listen and respond to customer feedback about their needs.

As with all Wikibon research, this report is available without charge on the Wikibon Web site. IT professionals are encouraged to register for membership in the Wikibon community, which allows them to connect with peers, participate in Wikibon research, comment on published research, and publish their own questions, tips, Professional Alerts and longer research.