Tuesday, November 6, 2012

5 trends that are changing how we do big data

In just a few years, big data has turned from a buzzword and
concept best left for large web companies into a force that drives much
of our digital lives. Here are five technological trends that will
change how data is processed and consumed going forward.

It’s time to rethink the who, what, where, why and how of big data.
After a surge of important news in the past couple of weeks, we’re
approaching a period of relative calm and can finally assess how the
space has evolved in the past year. Here are five trends shaping up that
should change almost everything about big data in the near future,
including how it’s done, who’s doing it and where it’s consumed. Feel
free to share the trends you’re seeing in the comments.

The democratization of data science

The amount of effort being put into broadening the talent pool for
data scientists might be the most important change of all in the world
of data. In some cases, it’s new education platforms (e.g., Coursera and
Udacity) teaching students fundamental skills in everything from basic statistics to natural language processing and machine learning. Elsewhere, it’s products such as 0xdata that aim to simplify and add scale to well-known statistical-analysis tools such as R, or products such as Quid
that try to mask the finer points of concepts such as machine learning
and artificial intelligence behind well-designed user interfaces and
slick visual representations. Platforms such as Kaggle have opened the
door to crowdsourcing answers to tough predictive-modeling problems.

Whatever the avenue, though, the end result is that individuals who
have a little imagination, some basic computer science skills and a lot
of business acumen can now do more with their data. A few steps down the
ladder, companies such as Datahero, Infogram and Statwing
are trying to make analytics accessible even to laypersons. Ultimately,
all of this could result in a self-feeding cycle where more people
start small, eventually work their way up to using and building advanced
data-analysis products and techniques, and then equip the next
generation of aspiring data scientists with the next generation of data
applications.

From this point on, as with the Google MapReduce framework
on which Hadoop’s version of MapReduce was modeled, it seems likely
we’ll see MapReduce grow less important within Hadoop. Presumably, the Hadoop
community will focus more on using the platform’s distributed nature to
support real-time processing and other new capabilities that make Hadoop
a better fit in next-generation data applications. If Hadoop can’t fill the void, there are plenty of people working on other technologies, Storm and Druid for example, that will gladly do so.

The HBase NoSQL database that’s built atop the Hadoop Distributed
File System is a good example of what’s possible when Hadoop is freed
from the constraints of MapReduce. Large web companies such as Facebook and eBay already use HBase to power transactional applications, and startups such as Drawn to Scale and Splice Machine have used HBase as the foundation for transactional SQL databases. More new products and projects, such as the graph-processing framework Giraph,
will look for ways to leverage HDFS because it gives them a file system
that’s scalable, free, relatively mature and, perhaps most importantly,
tied into the ever-growing Hadoop ecosystem.

Machine learning is everywhere

Machine learning has had something of a coming-out party in the past
year and is now so prevalent it might be easy to mistake it for
something that’s not difficult to do well. It’s easy to see why
machine learning is so popular, though: In an age where consumers (and
advertisers) want more personalization, and where computer systems are
overwhelmed with data flying at them from all different directions, the
prospect of writing models that continuously discover patterns among potentially countless data points has to be appealing.

Here’s a small sample of apps you’ve likely heard of, or that we’ve covered, that rely on machine learning to work their magic: Prismatic, Summly, Trifacta, CloudFlare, Twitter, Google, Facebook, Bidgely, Healthrageous, Predilytics, BloomReach, DataPop, Gravity. I could go on for days, I think.

(Image: Prismatic learning my interests)

Now, it’s difficult to imagine a new tech company launching that
doesn’t at least consider using machine learning models to make its
product or service more intelligent. Heck, even Microsoft appears to be making a big bet on machine learning
as the foundation of a new revenue stream. The technology to store and
process lots of data is out there, and the brainpower looks to be coming
along as well. Soon, there will be few excuses for building
applications that don’t learn as they go: what users want to see,
how systems fail or when customers are about to cancel a service.

Mobile data as the engine for AI

Long before Skynet takes over and the machines turn on humans, our
mobile phones will know better than us what we want to do. That’s
because until technologies like Google’s Project Glass actually make their way into the wild, our phones and the apps on them are probably the richest source of personal data around. And thanks to machine learning, speech recognition and other technologies, they’re able to make a lot of sense of what they’re given. They
know where we go, who our friends are, what’s on our calendars and what
we look at online. Thanks to a new generation of applications such as Siri, Saga and Google Now
trying to serve as personal assistants, our phones can understand what
we say and know the businesses we frequent, the foods we eat and the
hours we’re at home, at work or out on the town. Already, their
developers claim such apps can augment our limited vantage point by
automatically telling us the best directions to our upcoming
appointment, or the best place to get our favorite foods in a city the
app knows we haven’t been to before.

The race is officially on to see who can build the smartest app, pull in the most data sources and figure out how to best display it all on a 4-inch screen.

Feature image courtesy of Shutterstock user Sebastian Kaulitzki.