Big Data's 2 Big Years

During the past two years, Hadoop hit the big time, in-memory arrived, and practitioners saw data analysis deliver real results.

Two years might seem like a short time to be measuring shifts in technology, but in the fast-moving big-data arena, a lot has changed.

Just 26 months ago we published a collection of "12 Top Big Data Analytics Players," but changing times and technologies demanded this update: "16 Top Big Data Analytics Platforms." It's a fresh look at the data-management vendors offering the database management systems (DBMSs) and Hadoop platforms underpinning big-data analytics. (We did not include focused analytics vendors, such as Alpine Data Labs, Revolution Analytics, and SAS, nor NoSQL and NewSQL DBMS vendors, such as Couchbase, DataStax, and MongoDB, which deserve separate treatment.)

So what has changed in these two short years? Here are the three big factors.

1. Vendors expect Hadoop to be in the mix. Practically every vendor out there has embraced Hadoop, going well beyond the fledgling announcements and primitive "connectors" that were prevalent two years ago. Industry heavyweights IBM, Microsoft, Oracle, Pivotal, SAP, and Teradata are all selling and supporting Hadoop distributions -- partnering, in some cases, with Cloudera and Hortonworks. Four of these six have vendor-specific distributions, Hadoop appliances, or both.

Traditionalists complain that Hadoop remains a slow, primitive, and disparate collection of systems mired in iterative, hard-to-manage, hard-to-code MapReduce processing. But 2013 brought a Hadoop 2.0 release that promises easier management of myriad workload types, extending beyond MapReduce to improved SQL querying, graph analysis, and stream processing. In fact, SQL-on-Hadoop products and projects have exploded over the last year, and vendor options now include Cloudera Impala, IBM Big SQL, Apache Drill, and a higher-performance Hive. They also include Pivotal HAWQ and InfiniDB engines running on HDFS, Polybase data exploration in Microsoft SQL Server, and HP Vertica and Teradata SQL-H exploration of HDFS with help from HCatalog.
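The appeal of SQL-on-Hadoop is easy to see in miniature: an aggregation that takes explicit map and reduce steps collapses into one declarative query. Here's an illustrative sketch using Python's built-in sqlite3 as a stand-in for a SQL-on-Hadoop engine (the click-log table and data are invented for illustration, and a real cluster would read files from HDFS rather than an in-memory database):

```python
import sqlite3
from collections import defaultdict

# Toy click log; on a real cluster this would be files in HDFS.
clicks = [("ads", 3), ("news", 5), ("ads", 2), ("sports", 1), ("news", 4)]

# MapReduce style: explicit map (emit key/value) and reduce (sum per key).
totals = defaultdict(int)
for section, views in clicks:          # "map" phase emits (section, views)
    totals[section] += views           # "reduce" phase sums per key (shuffle elided)

# SQL style: the same result as one declarative query.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE clicks (section TEXT, views INT)")
conn.executemany("INSERT INTO clicks VALUES (?, ?)", clicks)
rows = dict(conn.execute(
    "SELECT section, SUM(views) FROM clicks GROUP BY section"))

assert rows == dict(totals)  # both approaches agree
```

The declarative version is also what an optimizer can parallelize and rewrite, which is a large part of why the SQL-on-Hadoop engines named above matter.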

2. Low-latency expectations are on the rise. With steady improvements in processing power and declining costs for performance, in-memory and even streaming processing speeds are increasingly in demand. SAP has been the most prominent champion here with its Hana platform, but IBM has introduced BLU Acceleration for DB2, and Microsoft and Oracle are preparing in-memory options for their flagship databases. Among data warehousing and data mart specialists, Teradata and others also are making the most of RAM by offering options for high RAM-to-disk ratios and providing ways -- automated in Teradata's case -- to pin the most-queried data into RAM.
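The idea of "pinning" the most-queried data into RAM can be sketched with a simple count-based policy. This is a minimal, hypothetical illustration only; vendor implementations such as Teradata's automated approach are far more sophisticated, and every name here is invented:

```python
from collections import Counter

class HotDataCache:
    """Keep the N most-queried keys pinned in RAM; fetch the rest from 'disk'."""
    def __init__(self, backing_store, capacity=2):
        self.store = backing_store        # stand-in for disk-resident data
        self.capacity = capacity
        self.hits = Counter()             # query counts drive the pinning policy
        self.pinned = {}                  # the in-RAM portion

    def get(self, key):
        self.hits[key] += 1
        if key in self.pinned:
            return self.pinned[key]       # served from RAM
        value = self.store[key]           # simulated disk read
        # Re-pin: keep only the most-queried keys in memory.
        hottest = {k for k, _ in self.hits.most_common(self.capacity)}
        self.pinned = {k: self.store[k] for k in hottest}
        return value

disk = {"q1": "sales", "q2": "inventory", "q3": "payroll"}
cache = HotDataCache(disk)
for key in ["q1", "q1", "q2", "q1", "q3", "q2"]:
    cache.get(key)
# After this workload, q1 (3 hits) is the hottest key and stays pinned.
```

The trade-off the vendors are automating is exactly the one this toy exposes: deciding, per workload, which data earns its keep in scarce RAM.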

In the Hadoop arena, projects such as Spark and Storm are pursuing in-memory and streaming performance at high scale for breakthrough applications in ad delivery, content personalization, and mobile geo-location services.
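Storm and Spark handle this at cluster scale, but the core idea of windowed, in-memory stream processing fits in a few lines. A hedged Python sketch, using no framework APIs (the stream values and window size are invented for illustration):

```python
from collections import deque

def sliding_window_average(stream, window_size=3):
    """Yield a running average over the last `window_size` events, held in memory."""
    window = deque(maxlen=window_size)   # old events fall off automatically
    for event in stream:
        window.append(event)
        yield sum(window) / len(window)

# Example: ad click-through rates arriving as a stream.
ctr_stream = [0.02, 0.04, 0.03, 0.09, 0.05]
averages = list(sliding_window_average(ctr_stream))
# The final value averages only the three most recent events:
# (0.03 + 0.09 + 0.05) / 3
```

In production systems the same pattern runs continuously over unbounded streams, with the window state kept in RAM across a cluster rather than in one process.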

3. Practitioners get the big-data religion. The third -- and most important -- change in the big-data arena over the last two years has been in the awareness of practitioners. Tech buyers have opened their eyes to the falling cost of storing and using data and to the tremendous opportunities for them to make use of this information. Here are examples I've heard of in the past few days:

A payroll and benefits-management company that depends on processing fees now recognizes that it's sitting on a trove of data on hiring, salary, career, and economic trends, and analyzing that could become a new revenue source.

A WiFi infrastructure provider to retailers lives on razor-thin margins amid plenty of competition, so it's looking into ways to give retailers insight on the customers tapping into the hot spots. Such analysis could reveal how long people linger in a store, which could be combined with customer profile data to show whether customer profiles and traffic patterns are changing by store location.

A car manufacturer is streaming auto-performance data from satellite-connected vehicles so it can better understand service, maintenance, and warranty trends by model, with the potential to trigger proactive service recommendations.

An agricultural retailer is selling data on what seeds customers are buying and planting by store location.

We're also hearing a lot of hype and inflated claims around big data and, yes, the term itself has its flaws. But where there's smoke there's fire. We've recently reported on companies, including Amadeus, MetLife, Paytronix, and The Weather Company, that are seeing big returns on their big-data investments. Over the next two years we expect many of today's fledgling big-data projects to lead to very real and repeatable business successes. As always, we're ready to share the success stories and dissect the disasters.


Doug Henschen is Executive Editor of InformationWeek, where he covers the intersection of enterprise applications with information management, business intelligence, big data and analytics. He previously served as editor in chief of Intelligent Enterprise, editor in chief of ...

As Dr. Victor Tang of MIT pointed out: "I see NO NEW SCIENCE HERE!! Only more data to work with!" And I agree with him, as do several others. In his words, "slapping together a bunch of bottle rockets with string does not make it a jet engine!!" Creating ever larger containers for data (which is what all present technologies do, save one) by definition will not scale. A perfect example of this is the Bitcoin hash chain: 2.5 exabytes and counting? It takes days and $50K machines to get any data out of it.

The BIG DATA truth of all this is really quite simple: there is a reason for the theoretical existence of six normal forms of normalization. It was not put there by accident. Only one or two systems have reached the fourth normal form, and only one has reached the sixth. The FUTURE is EFFICIENCY, not SIZE. Efficiency grants you speed, simplicity, and capacity, where anything else simply cannot compete.

Associative Data Management and Knowledge Operating System using a Data Instance-centric architecture, where Data Instances are typically atomic. Each Data Instance can be at the center with all its associations. The base structures encapsulate the Data Instances and can generally be identical in form and function, and application independent. Encapsulated references can include references to all other directly related, independently encapsulated Data Instances. The encapsulated references can be both unique identifiers for each and every associated Data Instance and also logical indexes that encode the abstracted location of each Data Instance, making it possible to both identify and locate any Data Instance using the same reference key.
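Read charitably, the comment above describes something like an association store keyed by instance IDs, where one reference key both identifies and locates a datum. A speculative Python sketch of that idea (every structure and name here is invented for illustration and is not AtomicDB's actual design):

```python
import itertools

class AssociativeStore:
    """Each atomic Data Instance gets a unique ID; associations are stored
    symmetrically, so any instance can sit 'at the center' of its relations."""
    def __init__(self):
        self._ids = itertools.count(1)
        self.instances = {}               # id -> value (the atomic datum)
        self.associations = {}            # id -> set of associated ids

    def add(self, value):
        iid = next(self._ids)             # unique identifier doubles as locator
        self.instances[iid] = value
        self.associations[iid] = set()
        return iid

    def associate(self, a, b):
        self.associations[a].add(b)       # stored in both directions, so either
        self.associations[b].add(a)       # instance can be "at the center"

    def neighbors(self, iid):
        """All directly related instances, resolved by the same reference key."""
        return {j: self.instances[j] for j in self.associations[iid]}

store = AssociativeStore()
alice = store.add("Alice")
acme = store.add("Acme Corp")
store.associate(alice, acme)
# store.neighbors(alice) -> {2: "Acme Corp"}
```

Whether such a model outscales the "ever larger containers" the commenter criticizes is, of course, exactly the open question.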

http://www.atomicdb.org/

It's been 50 years since Ed Codd announced his famous 12 Rules on how to store data in 2-dimensional tables and gave birth to the Relational Database.

In the last 50 years, everything in the world of data management technology has evolved with dizzying speed, except this one area. We still build most of our information systems with 50-year-old technology that causes enormous frustration, cost, and struggle.

Fifty years later, we have invented and proven a technology that easily captures information with all the context and relationships that the organization wants. That's not all there is to this new technology. There are other capabilities that make it essential and mandatory for the next 50 years:

- the ability to change and evolve the system on the fly

- the ability to connect organically with any other system built in the same technology (solving the warehouse problem)

- the ability to store and transmit information with a much higher level of inherent security

It's still early days for new platforms like Hadoop and (soon to be covered in a separate collection) NoSQL and NewSQL databases. According to recent research by Gartner, nearly half of practitioners surveyed said they're unsure they'll get value from the platform. About 30% of enterprises have invested in big data technologies, including Hadoop and NoSQL databases, according to the data.

Breaking down the Hadoop users, Cloudera says one-third are just getting started and experimenting, one-third are using the platform for specific, mission-critical workloads, and one-third are maturing into multiple production applications and are actually using the platform to replace incumbent data-management tech (databases, ETL, etc.). But back to Gartner's stats: 31% have no plans to roll out a big data project, about the same percentage as in 2012. Long story short, Hadoop is far from established as a must-have enterprise data-management platform.

IT has tried for years to simplify data analytics and business intelligence efforts. Have visual analysis tools and Hadoop and NoSQL databases helped? Respondents to our 2014 InformationWeek Analytics, Business Intelligence, and Information Management Survey have a mixed outlook.