Data Science Buzzwords Demystified

Amanda Sivaraj
July 6, 2015

Predictive analytics. Data science. Machine learning.

It seems like lots of people are excited about these words, considering how they keep getting hashtagged all over the place. But let’s be honest — how many of us actually understand what they mean or how they relate to one another?

We’ve compiled a list of jargon related to the emerging data science field so you won’t have to nod your head in seeming understanding the next time your boss starts rambling on about it (he or she probably doesn’t understand it either, so consider leaving this article up on your screen when you head to lunch). Let’s find out what these buzzwords actually mean and how they can help your business.

Data Mining

Textbook Definition

“The practice of examining large databases in order to generate new information.” (Google).

Human Definition

The practice of sending scores of minions to miserably scrape through huge databases in the hope of finding new information. Alternatively you could write a script and get a computer to do it for you, but where’s the fun in that?

Alright, alright, let’s get serious. So perhaps your company has a huge collection of information about your customers: birthdays, names, emails, titles, etc. We’ll call this collection of information your “customer database.” Now, let’s say you want to determine the average age of your customers. To get a computer to calculate this average, you’ll need to create a list of birthdays for all the customers for whom you have this information. In order to build this list you’ll need to extract only the birthdays from the full collection of information in your customer database. This extraction process is a simple example of data mining, and it’s an essential step that you must complete before you can move on to any of the other tools in the space.
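For the curious, here is roughly what that birthday extraction could look like in Python. This is only a sketch: the customer records, the field names, and the helper functions are all made up for illustration, and a real customer database would live in something sturdier than a list.

```python
from datetime import date

# Hypothetical customer database: every record and field name here is made up.
customers = [
    {"name": "Ada", "email": "ada@example.com", "title": "CTO", "birthday": date(1980, 3, 14)},
    {"name": "Ben", "email": "ben@example.com", "title": "Analyst", "birthday": None},
    {"name": "Cho", "email": "cho@example.com", "title": "Manager", "birthday": date(1990, 11, 2)},
]

def extract_birthdays(records):
    """Pull out only the birthdays, skipping customers with none on file."""
    return [r["birthday"] for r in records if r["birthday"] is not None]

def age_on(birthday, today):
    """Whole years between a birthday and the given day."""
    had_birthday = (today.month, today.day) >= (birthday.month, birthday.day)
    return today.year - birthday.year - (0 if had_birthday else 1)

def average_age(records, today):
    """Average age of the customers whose birthday is known, or None if there are none."""
    birthdays = extract_birthdays(records)
    if not birthdays:
        return None
    return sum(age_on(b, today) for b in birthdays) / len(birthdays)

print(average_age(customers, today=date(2015, 7, 6)))  # prints 29.5
```

Ben has no birthday on file, so he is left out of the average rather than counted as zero years old.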

Predictive Analytics

Textbook Definition

“Predictive analytics is the practice of extracting information from existing data sets in order to determine patterns and predict future outcomes and trends. Predictive analytics does not tell you what will happen in the future. It forecasts what might happen in the future with an acceptable level of reliability, and includes what-if scenarios and risk assessment.” (Webopedia).

Human Definition

Predictive analytics is an umbrella term for any type of quantitative technique that uses existing data to make guesses about the probability of something happening in the future. Forecasting the weather is one example — there will be a 60% chance of rain…and a 99% chance that the weatherman is wrong, amiright? Jokes aside though, it’s a well-developed discipline. In order to discover these probabilities, experts create “predictive models” — and hey, what do you know — that’s our next term. It’s almost like I planned this.

Predictive Modeling

Textbook Definition

Predictive modeling is a process used in predictive analytics to create a statistical model of future behavior. (TechTarget).

Human Definition

So, you want to predict trends? This is a process you’d use to build a statistical model that examines historical data to identify risks and opportunities. Then you can figure out what new, illogical behavior people will start to exhibit next and exploit it to make millions of dollars, you shameless opportunist. You can figure out the probability of something happening based on certain conditions related to its occurrence. For instance, say you’re trying to guess whether people will still buy plane tickets if you increase the price by 30%. Also, it’s the holiday season and you know that demand for plane tickets skyrockets. Using statistical techniques, you could analyze these factors and determine the likelihood of people continuing to purchase tickets despite the hike in price.
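If you want to see the plane ticket example in miniature, the sketch below fits one common kind of statistical model, a logistic regression, to a tiny purchase history. The history is invented for illustration, the function names are hypothetical, and logistic regression is our pick for the demo rather than the only technique you could use.

```python
import math

# Made-up purchase history, for illustration only:
# (price increase in %, holiday season?, did the customer buy? 1 or 0)
history = [
    (0, False, 1), (0, False, 1), (10, False, 1), (10, False, 0),
    (20, False, 0), (20, False, 0), (30, False, 0), (30, False, 0),
    (0, True, 1), (0, True, 1), (10, True, 1), (10, True, 1),
    (20, True, 1), (20, True, 1), (30, True, 1), (30, True, 0),
    (40, True, 0), (40, True, 0),
]

def features(price_increase_pct, is_holiday):
    # Constant term, price increase in units of 10%, holiday flag.
    return [1.0, price_increase_pct / 10.0, 1.0 if is_holiday else 0.0]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit(rows, steps=5000, learning_rate=0.1):
    """Fit logistic regression weights with plain gradient descent."""
    weights = [0.0, 0.0, 0.0]
    for _ in range(steps):
        gradient = [0.0, 0.0, 0.0]
        for pct, holiday, bought in rows:
            x = features(pct, holiday)
            error = sigmoid(sum(w * xi for w, xi in zip(weights, x))) - bought
            for i, xi in enumerate(x):
                gradient[i] += error * xi / len(rows)
        weights = [w - learning_rate * g for w, g in zip(weights, gradient)]
    return weights

def purchase_probability(weights, price_increase_pct, is_holiday):
    """Estimated chance that a customer still buys at this price increase."""
    x = features(price_increase_pct, is_holiday)
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)))

weights = fit(history)
print(purchase_probability(weights, 30, True))   # 30% hike during the holidays
print(purchase_probability(weights, 30, False))  # 30% hike in an ordinary week
```

On this toy history, the model should give the holiday shopper a higher chance of buying than the ordinary-week shopper at the same 30% hike, which is exactly the kind of likelihood the paragraph above describes.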

“But do I have to do this manually!?” you cry.

Well, no. You could use this powerful, newfangled technology to do it…

Machine Learning

Textbook Definition

“Machine learning explores the construction and study of algorithms that can learn from and make predictions on data. Such algorithms operate by building a model from example inputs in order to make data-driven predictions or decisions, rather than following strictly static program instructions.” (Wikipedia)

Human Definition

A class of computer algorithms and mathematical models that allow machines to learn from examples how to perform tasks, like identifying human faces in photos, instead of being told every rule in advance. The models are used to make predictions and decisions, which you can then use to solve real world problems, such as understanding how your customers feel about your brand across various social media channels. The neat thing is that instead of hiring 100 people to analyze 1,000 data points each, you could get a single machine to do it in a fraction of the time. It’s a form of predictive modeling that you’d use if you wanted to claim that you were feeling merciful towards your underlings but mostly wanted to improve efficiency.
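Here is a toy version of the brand sentiment idea, to show what "learning from examples" means in practice. It uses a naive Bayes classifier, which is one simple technique among many, and the handful of labeled posts is made up. A real system would learn from vastly more examples.

```python
import math
from collections import Counter

# Made-up labeled posts, for illustration only.
training_posts = [
    ("love the new app, great update", "positive"),
    ("great service and friendly support", "positive"),
    ("really love this brand", "positive"),
    ("terrible update, the app keeps crashing", "negative"),
    ("awful support, never buying again", "negative"),
    ("the service was terrible", "negative"),
]

def tokenize(text):
    return text.lower().replace(",", " ").split()

def train(posts):
    """Count how often each word appears under each label."""
    word_counts = {}
    label_counts = Counter()
    for text, label in posts:
        label_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(tokenize(text))
    return word_counts, label_counts

def classify(text, word_counts, label_counts):
    """Pick the label with the highest naive Bayes log-probability."""
    vocabulary = {w for counts in word_counts.values() for w in counts}
    scores = {}
    for label, counts in word_counts.items():
        total = sum(counts.values())
        score = math.log(label_counts[label] / sum(label_counts.values()))
        for word in tokenize(text):
            # Add-one smoothing so unseen words do not zero out the score.
            score += math.log((counts[word] + 1) / (total + len(vocabulary)))
        scores[label] = score
    return max(scores, key=scores.get)

model = train(training_posts)
print(classify("love the great support", *model))  # prints positive
print(classify("the app is terrible", *model))     # prints negative
```

Nobody wrote a rule saying "terrible" is bad news. The program worked that out from the labeled examples, which is the whole trick.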

Data Science

Textbook Definition

“In general terms, Data Science is the extraction of knowledge from large volumes of data that are structured or unstructured, which is a continuation of the field data mining and predictive analytics, also known as knowledge discovery in databases (KDD).” (Wikipedia)

Human Definition

First off, who coined “KDD”? Ugh.

Anyway, let’s discuss structured and unstructured data. Structured data refers to information that’s very well organized and easily searchable, for instance a database full of records about your employees’ birthdates and job titles. Unstructured data, by comparison, is basically one giant, hopeless mess of text-heavy information like metadata or social media posts. Data scientists, then, are highly qualified people with expertise in extracting knowledge from both kinds.

As you might guess, their unique skillset is difficult and expensive to come by.
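To make the structured versus unstructured distinction a bit more concrete, here is a small sketch. The employee records and the posts are invented, and the date pattern is just one assumption about how a birthdate might show up in free text.

```python
import re

# Structured: each fact sits in a named field (hypothetical records).
employees = [
    {"name": "Dana", "title": "Engineer", "birthdate": "1985-04-12"},
    {"name": "Eli", "title": "Designer", "birthdate": "1991-09-30"},
]
engineers = [e["name"] for e in employees if e["title"] == "Engineer"]

# Unstructured: the same kind of fact is buried in free text (made-up posts).
posts = [
    "Happy birthday to our engineer Dana, born 1985-04-12!",
    "Lunch was great today.",
]
date_pattern = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")
dates_found = [d for post in posts for d in date_pattern.findall(post)]

print(engineers)    # prints ['Dana']
print(dates_found)  # prints ['1985-04-12']
```

The structured lookup is a one-line filter. The unstructured version only works if the date happens to be written the way the pattern expects, which hints at why the messy stuff takes more expertise.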

So, what if you don’t have the funds to hire a data science team but want to use machine learning to collect those insights? Well, there is something else you could do…

Pre-trained Model

Textbook Definition

A model in computational science that requires extensive computational resources to study the behavior of a complex system by computer simulation. (Wikipedia)

Human Definition

Okay, that made absolutely no sense so let’s just pretend we didn’t see that.

A pre-trained model is a machine learning model that already exists for performing a certain task. That means you don’t have to break the bank hiring a team to collect and label hundreds of thousands of data points, split the dataset, set evaluation criteria, extract features, train the model, and then package it all into some kind of software so that you can actually make sense of it. Instead, you can just ask one of your software developers to integrate the pre-trained model’s capabilities into your software. It’s pretty nifty.
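From the developer's side, using a pre-trained model can be as simple as loading numbers that someone else already learned and calling a function. The sketch below is a stand-in, not a real product: the JSON format, the weights, and the function names are all invented for illustration.

```python
import json
import math

# Stand-in for a file shipped by a model provider. The format and the numbers
# are invented for this sketch; real pre-trained models are far larger.
PRETRAINED_JSON = (
    '{"bias": -0.5, "weights": '
    '{"love": 2.0, "great": 1.5, "terrible": -2.5, "awful": -2.0}}'
)

def load_model(serialized):
    """Load parameters that someone else already learned."""
    return json.loads(serialized)

def positive_probability(model, text):
    """Score a piece of text with the loaded parameters; no training happens here."""
    words = text.lower().split()
    z = model["bias"] + sum(model["weights"].get(w, 0.0) for w in words)
    return 1.0 / (1.0 + math.exp(-z))

model = load_model(PRETRAINED_JSON)
print(positive_probability(model, "I love this great brand") > 0.5)  # prints True
print(positive_probability(model, "terrible service") > 0.5)         # prints False
```

Notice what is missing: no data collection, no labeling, no training loop. All of that happened somewhere else, on someone else's budget.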

There’s one more major buzzword that you’ve almost certainly heard but we haven’t discussed yet:

Big Data

Textbook Definition

“Extremely large data sets that may be analyzed computationally to reveal patterns, trends, and associations, especially relating to human behavior and interactions.” (Google).

Human Definition

Honestly, there are so many definitions for this thing that full articles have been written about it. People are banging their heads on walls about it. Hadoop this, Hadoop that. Don’t succumb to the hype! Big data isn’t necessary for solving most of the problems you might be facing. Even at huge companies like Facebook and Yahoo, much of the day-to-day analysis reportedly runs on fairly modest amounts of data. In fact, trying to gather more data than you need may end up costing you much more than you’ll actually get out of it.