5 trends that are changing how we do big data

It’s time to rethink the who, what, where, why and how of big data. After a surge of important news in the past couple of weeks, we’re approaching a period of relative calm and can finally assess how the space has evolved in the past year. Here are five emerging trends that should change almost everything about big data in the near future, including how it’s done, who’s doing it and where it’s consumed. Feel free to share the trends you’re seeing in the comments.

The democratization of data science

The amount of effort being put into broadening the talent pool of data scientists might be the most important change of all in the world of data. In some cases, it’s new education platforms (e.g., Coursera and Udacity) teaching students fundamental skills in everything from basic statistics to natural language processing and machine learning. Elsewhere, it’s products such as 0xdata that aim to simplify and add scale to well-known statistical-analysis tools such as R, or products such as Quid that try to mask the finer points of concepts such as machine learning and artificial intelligence behind well-designed user interfaces and slick visual representations. Platforms such as Kaggle have opened the door to crowdsourcing answers to tough predictive-modeling problems.

Whatever the avenue, though, the end result is that individuals who have a little imagination, some basic computer science skills and a lot of business acumen can now do more with their data. A few steps down the ladder, companies such as Datahero, Infogram and Statwing are trying to make analytics accessible even to laypersons. Ultimately, all of this could result in a self-feeding cycle where more people start small, eventually work their way up to using and building advanced data-analysis products and techniques, and then equip the next generation of aspiring data scientists with the next generation of data applications.

From this point on, as with the Google MapReduce framework on which Hadoop’s version of MapReduce was modeled, it seems likely we’ll see Hadoop’s MapReduce grow less important. Presumably, the Hadoop community will focus more on using the platform’s distributed nature to support real-time processing and other new capabilities that make Hadoop a better fit for next-generation data applications. If Hadoop can’t fill the void, there are plenty of people working on other technologies (Storm and Druid, for example) who will gladly do so.

The HBase NoSQL database that’s built atop the Hadoop Distributed File System (HDFS) is a good example of what’s possible when Hadoop is freed from MapReduce’s constraints. Large web companies such as Facebook and eBay already use HBase to power transactional applications, and startups such as Drawn to Scale and Splice Machine have used HBase as the foundation for transactional SQL databases. More new products and projects, such as the graph-processing framework Giraph, will look for ways to leverage HDFS because it gives them a file system that’s scalable, free, relatively mature and, perhaps most importantly, tied into the ever-growing Hadoop ecosystem.

Coming soon to an app near you

Of course, all of this technological improvement is nothing without applications to take advantage of it, so it’s good news that we’re seeing a wide range of approaches for making this happen. One of these approaches is making big data accessible to developers, which is where startups such as Continuuity, Infochimps and even Precog (a big data BI engine, by nature) come into play. They make it relatively easy for developers to create applications that tie at least some functions into a big data backend, sometimes via a process as simple as writing a script or generating a piece of code that programmers can insert directly into their application’s code.

Machine learning is everywhere

Machine learning has had something of a coming-out party in the past year and is now so prevalent it might be easy to mistake it for something that’s not difficult to do well. It’s easy to see why machine learning is so popular, though: In an age where consumers (and advertisers) want more personalization, and where computer systems are overwhelmed with data flying at them from all different directions, the prospect of writing models that continuously discover patterns among potentially countless data points has to be appealing.

Now, it’s difficult to imagine a new tech company launching that doesn’t at least consider using machine learning models to make its product or service more intelligent. Heck, even Microsoft appears to be making a big bet on machine learning as the foundation of a new revenue stream. The technology to store and process lots of data is out there, and the brainpower looks to be coming along as well. Soon, there will be few excuses for building applications that don’t learn as they go: what users want to see, for example, or how systems fail, or when customers are about to cancel a service.

Our phones know where we go, who our friends are, what’s on our calendars and what we look at online. Thanks to a new generation of applications such as Siri, Saga and Google Now trying to serve as personal assistants, they can also understand what we say and know the businesses we frequent, the foods we eat and the hours we’re at home, at work or out on the town. Already, their developers claim such apps can augment our limited vantage point by automatically telling us the best directions to our upcoming appointment, or the best place to get our favorite foods in a city the app knows we haven’t been to before.