Integrate or disintegrate: How to keep your big data strategy from falling apart

With big data being such an important topic, enterprises are looking at a variety of ways to integrate Hadoop, MapReduce and other big data technologies into the enterprise. But both rushed and reluctant approaches can cause problems. This article will show you how to make your organization's big data strategy work.


Jason Tee, Enterprise Software Architect

This is not business as usual

Much of today's big data is nothing like the enterprise data that businesses are used to handling. With large-scale structured data, most of the challenges of data proliferation could be resolved by addressing scalability, redundancy and analytics. With big data, those are just a few of the problems that enterprises must solve. The data collected today comes from a much broader array of sources: embedded sensors, RFID chips, audio and video feeds, document and image files, graphs and much more. Social media is blowing away all preconceived notions about what data should look like, and that's not even counting big data shared among business partners.

Organizations can no longer readily dictate or constrain the exact format in which data is presented. In fact, attempts to do so would substantially decrease the value of the data itself. An enterprise can only anticipate a certain number of potential scenarios or responses. No matter how many checkboxes or data fields it creates, there will always be data that spills outside the box. Ignoring everything that doesn't look like traditional data could be devastating from a competitive standpoint. The recent McKinsey Global Institute study, Big Data: The next frontier for innovation, competition, and productivity, suggests that enterprises are leaving hundreds of billions of dollars on the table by failing to fully leverage the data already available to them.

Relational databases are only partial solutions

The burgeoning volume and variety of data is why tools and technologies for managing unstructured data have become so important. Non-relational stores, including NoSQL, XML and key/value databases, help enterprises resolve both scalability and accessibility issues for much of their big data. Hadoop, with MapReduce for processing and the Hive Query Language (HiveQL) for SQL-like querying, offers enterprises a starting point for managing their big data and gaining business intelligence. Major NoSQL database management systems such as MongoDB and Cassandra already offer integration with Hadoop, making it easier for customers to at least have an interface or overlay that connects disparate data streams.
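To make the MapReduce model more concrete, here is a minimal single-process sketch in Python of its three phases (map, shuffle, reduce), counting words in a few log records. It only illustrates the programming model; it is not Hadoop's Java API, and the function names are hypothetical. In a real Hadoop job, each phase would run in parallel across a cluster.

```python
from collections import defaultdict
from itertools import chain
from typing import Iterable, Iterator


def map_phase(record: str) -> Iterator[tuple[str, int]]:
    """Emit a (key, 1) pair for every word in one input record."""
    for word in record.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Group intermediate values by key, as the framework does between map and reduce."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Combine all values for one key into a single result."""
    return key, sum(values)


def run_job(records: Iterable[str]) -> dict[str, int]:
    # Here every phase runs in one process; Hadoop would distribute them.
    intermediate = chain.from_iterable(map_phase(r) for r in records)
    grouped = shuffle_phase(intermediate)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    log_lines = ["sensor ok", "sensor fault", "RFID ok"]
    print(run_job(log_lines))  # {'sensor': 2, 'ok': 2, 'fault': 1, 'rfid': 1}
```

Because map and reduce only ever see one record or one key at a time, the framework is free to split the work across many machines. That independence is what lets the model scale to big data volumes.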

The data itself is now more mobile within the enterprise as well. Parallel processing and intelligent data chunking tools like Jitterbit are designed to let data flow from one application to the next while maintaining its quality. Such integration across data types and applications is key for time-sensitive activities that involve real-time analysis. This kind of analysis often must query both current and historical data to identify emerging trends, which is where SQL often comes back into play.
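One way to picture the mix of current and historical data is the following sketch. It assumes a hypothetical `mentions` table of daily counts kept in SQL (SQLite here, for self-containment) and flags any product whose count in the live stream is at least twice its historical average. The table, sample values and 2x threshold are illustrative assumptions, not a recommended trend model.

```python
import sqlite3


def build_history() -> sqlite3.Connection:
    """Create a hypothetical SQL table of historical daily mention counts."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE mentions (product TEXT, day TEXT, mention_count INTEGER)")
    conn.executemany(
        "INSERT INTO mentions VALUES (?, ?, ?)",
        [
            ("widget", "2012-01-01", 10),
            ("widget", "2012-01-02", 12),
            ("gadget", "2012-01-01", 40),
            ("gadget", "2012-01-02", 38),
        ],
    )
    return conn


def emerging_trends(
    conn: sqlite3.Connection, current_counts: dict[str, int], ratio: float = 2.0
) -> list[str]:
    """Return products whose current count is at least `ratio` times their historical average."""
    # The historical baseline is computed in SQL, where the enterprise history lives.
    history = dict(
        conn.execute("SELECT product, AVG(mention_count) FROM mentions GROUP BY product")
    )
    trending = []
    for product, count in current_counts.items():
        baseline = history.get(product)
        # Products with no usable history are skipped rather than guessed at.
        if baseline and count >= ratio * baseline:
            trending.append(product)
    return sorted(trending)


if __name__ == "__main__":
    conn = build_history()
    # Counts from the real-time stream (hypothetical values).
    print(emerging_trends(conn, {"widget": 30, "gadget": 41, "gizmo": 5}))  # ['widget']
```

The live counts could come from any streaming or NoSQL source. The point of the sketch is that the baseline they are judged against is structured history, which SQL handles well.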

SQL, NoSQL and big data technology

The new data coming in does not negate the value of the carefully tailored business data that has been collected and generated over the past several decades. Internal enterprise data in SQL data stores is often critical for interpreting both the veracity and relevance of big data from other sources. Many organizations find they still need to maintain a SQL structure for their enterprise data to support their own best business practices. Pushing everything into an unstructured format isn't integration; it is just homogenization. At the same time, trying to force structure onto all unstructured data is likely a wasted effort.
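As a small illustration of SQL data vouching for outside data, the sketch below checks records from an external feed against a hypothetical product catalog table. Records whose SKU matches an active product count as relevant. Those naming a discontinued product are stale, and those with no recognizable SKU stay unverified. The schema, field names and categories are assumptions made for the example. Note that the external records are passed through unchanged rather than being forced into the catalog's structure.

```python
import sqlite3
from typing import Any


def load_catalog() -> sqlite3.Connection:
    """Build a hypothetical enterprise product catalog in a relational store."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE products (sku TEXT PRIMARY KEY, name TEXT, discontinued INTEGER)"
    )
    conn.executemany(
        "INSERT INTO products VALUES (?, ?, ?)",
        [("A100", "Widget", 0), ("B200", "Gadget", 1)],
    )
    return conn


def classify(
    conn: sqlite3.Connection, records: list[dict[str, Any]]
) -> dict[str, list[dict[str, Any]]]:
    """Sort external records into buckets by checking them against the SQL catalog.

    Records are kept exactly as received; only their classification depends on SQL.
    """
    result: dict[str, list[dict[str, Any]]] = {"relevant": [], "stale": [], "unverified": []}
    for record in records:
        # A missing SKU binds as NULL, which never matches, so it lands in "unverified".
        row = conn.execute(
            "SELECT discontinued FROM products WHERE sku = ?", (record.get("sku"),)
        ).fetchone()
        if row is None:
            result["unverified"].append(record)
        elif row[0]:
            result["stale"].append(record)  # refers to a discontinued product
        else:
            result["relevant"].append(record)
    return result


if __name__ == "__main__":
    feed = [
        {"sku": "A100", "text": "love it"},
        {"sku": "B200", "text": "mine broke"},
        {"text": "no product mentioned"},
        {"sku": "Z999", "text": "unknown item"},
    ]
    buckets = classify(load_catalog(), feed)
    print({name: len(items) for name, items in buckets.items()})
```

With the sample feed above, one record is relevant, one is stale and two are unverified.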

The goal of integration from the enterprise perspective may be less about structure than about organization. Tools like the new Oracle Data Integrator attempt to strike a balance by loading and transforming Hadoop data so that it can be more readily analyzed alongside traditional enterprise data. This approach fuses data from multiple sources and stores during the analytics process, which is where integration is really needed. This middle-of-the-road approach leaves the original data free to "be what it is," preserving the hidden value it may hold for new methods of analysis in the future.
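The "transform at analysis time" idea can be sketched generically. The raw JSON records below stay exactly as they arrived. For one specific analysis, only the fields it needs are extracted into a scratch SQL table and joined with hypothetical customer master data. This is an illustration of the general pattern under those assumptions, not Oracle Data Integrator's API or workflow, and every table and field name is made up for the example.

```python
import json
import sqlite3

# Raw records as they might sit on the Hadoop side (hypothetical sample); never rewritten.
RAW_EVENTS = [
    '{"customer_id": 1, "channel": "twitter", "sentiment": 0.5, "extra": {"geo": "NY"}}',
    '{"customer_id": 2, "channel": "sensor", "reading": 71.5}',
    '{"customer_id": 1, "channel": "email", "sentiment": -0.25}',
]


def build_enterprise_db() -> sqlite3.Connection:
    """Hypothetical relational customer master data."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Acme"), (2, "Globex")])
    return conn


def fuse(raw_events: list[str], conn: sqlite3.Connection) -> list[tuple]:
    """Transform only the fields this analysis needs, then join them with enterprise data."""
    # A scratch table scoped to this analysis; the raw records are only read.
    conn.execute("DROP TABLE IF EXISTS temp.events")
    conn.execute("CREATE TEMP TABLE events (customer_id INTEGER, sentiment REAL)")
    for line in raw_events:
        event = json.loads(line)
        # Records of other shapes (e.g. sensor readings) are skipped, not forced into this schema.
        if "sentiment" in event:
            conn.execute(
                "INSERT INTO events VALUES (?, ?)",
                (event["customer_id"], event["sentiment"]),
            )
    return conn.execute(
        """
        SELECT c.name, AVG(e.sentiment), COUNT(*)
        FROM customers AS c JOIN events AS e ON e.customer_id = c.id
        GROUP BY c.name
        ORDER BY c.name
        """
    ).fetchall()


if __name__ == "__main__":
    print(fuse(RAW_EVENTS, build_enterprise_db()))  # [('Acme', 0.125, 2)]
```

A different analysis, such as one on sensor readings, could run its own projection over the same untouched records. That reuse is the value of leaving the original data free to "be what it is."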
