You might think journalism and data science don’t really go together, but I see it differently. Below are some thoughts on the topic and lessons journalism can draw from data science to become better and more effective in these times.

Data science has exploded as a field in recent years, and whether the focus is machine learning, artificial intelligence, or citizen data science, the discipline is creating very high expectations.

There is indeed much promise for data science, where predictive models and decision engines can target skin cancer in patient imagery, presciently recommend a new product that piques your interest, or power your self-driving car to evade a potential accident.

However, realizing that promise takes effort. It requires a great deal of work and brand-new engineering disciplines that are not yet mature or widely employed. As recognition of the value of data science grows, and the generation of data increases at exponential rates, this engineering effort is getting under way and will soon grow beyond its adolescence.

This is why we are at the advent of a new engineering discipline that can truly realize the promise of data science – a discipline that I call “analytics engineering”.

Richard Feynman is one of the greatest scientific minds, and what I love about him, aside from his brilliance, is his perspective on why we perform science. I’ve been reading the compilation of short works of Feynman titled The Pleasure of Finding Things Out, and I recently came across a section that really hit home with me.

In the world of data science, much is made about the algorithms used to work with data, such as random forests or k-means clustering. However, I believe there is a missing component – one that deals with the fundamentals underlying data science, and that is the real science of data science.
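To ground one of the algorithms mentioned above, here is a minimal sketch of k-means clustering in pure Python – a toy illustration with made-up one-dimensional data, not production code (real work would reach for a library like scikit-learn):

```python
# Minimal k-means sketch on 1-D data, for illustration only.
import random

def kmeans(points, k, iters=20, seed=0):
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # start from k random points
    for _ in range(iters):
        # Assignment step: each point joins its nearest center's cluster.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: (p - centers[i]) ** 2)
            clusters[nearest].append(p)
        # Update step: each center moves to the mean of its cluster.
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return sorted(centers)

data = [1.0, 1.2, 0.8, 9.9, 10.1, 10.0]
print(kmeans(data, 2))  # two centers, near 1.0 and 10.0
```

The two steps – assign points to the nearest center, then move each center to its cluster's mean – repeat until the centers settle, which is the whole algorithm.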

I’ve been performing data science since before there was a field called “data science”, so I’ve had the opportunity to work with and hire a lot of great people. But if you’re trying to hire a data scientist, how do you know what to look for, and what should you consider in the interview process?

I’ve been doing what is now called “data science” since the early 1990s and have helped to hire numerous scientists and engineers over the years. The teams I’ve had the opportunity to work with are some of the best in the world, tackling some of the most challenging problems facing our country. These folks are also some of the smartest people I’ve ever had the opportunity to work with.

That said, not everyone is a good fit, and the discipline of data science requires certain key elements. Hiring someone onto your team is incredibly important to your business, especially if you’re a small startup or building a critical internal data science team; mistakes can be expensive in both time and money. This can be even more intimidating if you don’t have a background in hiring scientists, especially someone responsible for this new discipline of working with data.

Ever wonder what your own personal network looks like? You are likely connected to many different groups (family, friends, community, work), but do you know how they are connected? Or are they connected at all? Are you the glue that connects these various groups?
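The “glue” question above can be made concrete with a toy example. Here is a hedged sketch – the people and groups are entirely invented – that checks whether removing you from your network disconnects the groups you belong to:

```python
# Toy social network: who knows whom. All names and groups are
# invented for illustration only.
network = {
    "me":          {"mom", "best_friend", "coworker"},
    "mom":         {"me", "dad"},
    "dad":         {"mom"},
    "best_friend": {"me"},
    "coworker":    {"me", "boss"},
    "boss":        {"coworker"},
}

def reachable(graph, start, removed=frozenset()):
    """Everyone reachable from `start`, ignoring anyone in `removed`."""
    seen, stack = set(), [start]
    while stack:
        person = stack.pop()
        if person in seen or person in removed:
            continue
        seen.add(person)
        stack.extend(graph[person])
    return seen

# With me present, all six people form one connected network.
print(len(reachable(network, "me")))  # 6

# Remove me, and family can no longer reach work:
# I am the "glue" connecting those groups.
print("coworker" in reachable(network, "mom", removed={"me"}))  # False
```

If removing a single person splits the network, that person is a cut vertex – exactly the “glue” role the questions above describe.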

This is a great age we’re living in, and I’m glad to be involved with developing lots of really advanced technologies. One of the technology areas that I’m really fascinated with has been pushed forward by Stephen Wolfram. He created the industry standard computing environment Mathematica, which now serves as the engine behind his company’s newest creation, Wolfram|Alpha. (I’ve written a few posts on Wolfram|Alpha in the past, and you can read them here and here).

Imagine a guy with glasses who used to model baseball stats and play online poker nailing the outcome of the 2012 elections. And when I say “nailing”, I mean that he correctly predicted the U.S. Presidential contest in every one of the 50 states (and nearly every U.S. Senate race, too). He even performed better than some of the most widely used polling firms. Now imagine that he gives his thoughts on making these types of predictions. That’s exactly what Nate Silver does in his new book The Signal and the Noise.

I’ve worked in what’s now being called “data science” for nearly twenty years. The title of Silver’s book – The Signal and the Noise – presents an important and sometimes overlooked part of this science. The “signal” is what we’re looking for in the data, and the “noise” is all the stuff in the data that gets in the way of what we’re looking for.
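To make the signal-versus-noise idea concrete, here is a small sketch of my own (a toy example, not from Silver’s book): a steady underlying signal buried in random noise, where averaging many noisy measurements suppresses the noise and recovers the signal.

```python
# Toy signal-vs-noise demo: the "signal" is a constant value of 5.0,
# and each measurement adds random "noise" around it. Averaging many
# measurements cancels the noise and recovers the signal.
import random
from statistics import mean

random.seed(42)
SIGNAL = 5.0

# 10,000 noisy measurements: signal plus Gaussian noise (std dev 2.0).
measurements = [SIGNAL + random.gauss(0, 2.0) for _ in range(10_000)]

# Any single measurement can be far off, but the average is close.
estimate = mean(measurements)
print(round(estimate, 2))
```

This is the simplest possible separation of signal from noise; real problems demand far more care, which is much of what Silver’s book is about.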

Stephen Wolfram is doing it again. I’m a big fan of Wolfram (you can read some of my other posts here, here, and here…), and am always intrigued by what he comes up with. A couple of days ago, Wolfram launched his latest contribution to data science and computational understanding – Wolfram|Alpha Pro.

Here’s an overview of what the new Pro version of Wolfram|Alpha can provide:

With Wolfram|Alpha Pro, you can compute with your own data. Just input numeric or tabular data right in your browser, and Pro will automatically analyze it—effortlessly handling not just pure numbers, but also dates, places, strings, and more.

Zoom in to see the details of any output—rendering it at a larger size and higher resolution.

Perform longer computations as a Wolfram|Alpha Pro subscriber by requesting extra time on the Wolfram|Alpha compute servers when you need it.

Licenses for prototyping and analysis software (Matlab, IDL, even Mathematica) go for several thousand dollars – student versions can be had for a few hundred dollars, but you can’t leverage data science for business purposes on student licenses.

Wolfram|Alpha Pro lets anyone with a computer, an internet connection, and a small budget leverage the power of data science. Right now, you can get a free trial subscription, and after that, the cost is $4.99/month. This price is introductory, but it could be seductive enough to attract a lot of users (I’ve already signed up – all you need for the free trial is an e-mail address…)

One option that I find really interesting is Wolfram’s creation of the Computable Document Format (CDF), whose interactivity lets you get dynamic versions of existing Wolfram|Alpha output as well as access new content using interactive controls, 3D rotation, and animation. It’s like having Wolfram|Alpha embedded in the document.

I attended a Wolfram Science Conference back in 2006 and saw the potential for such a document format even then. A number of presenters later wrote up their work into papers published by the journal Complex Systems. Since many of the presentations relied on real interactivity with the data, I could see how much of the insight would be lost when people tried to write things down and limit their visualizations to simple, static graphs and figures.

I remember contacting Jean Buck at Wolfram Research and recommending such a format. Who knows whether that had any impact, but I’m certainly glad to see that this is finally becoming a reality. I actually got the opportunity to meet Wolfram at the conference (he even signed a copy of his Cellular Automata and Complexity for me – Jean was kind enough to arrange that – thanks, Jean!)