Machine Learning for Beginners, Part 7 – Naïve Bayes

In my last blog, I discussed the k-Nearest Neighbor machine learning algorithm with an example that was hopefully easy for beginners to understand. During the summer of 2017 I began a five-part series on types of machine learning. That series covered K-means clustering, Singular Value Decomposition, Principal Component Analysis, Apriori and Frequent Pattern-Growth in more detail. Today I want to expand on the ideas presented in my Naïve Bayes “Data Science in 90 Seconds” YouTube video and continue the discussion in plain language.

If you recall from earlier discussions, unsupervised machine learning is the ‘task of inferring a function to describe hidden structure from unlabeled data’. There, the computer takes observations that do not have a predetermined class or category and looks for structure in them. Naïve Bayes works differently. It is a supervised method: the computer learns from observations that already carry a class label and uses them to predict the class of new data. Recall that making estimated predictions of the future is one of the ways data science differs from data analysis or business intelligence.

Naïve Bayes is a fast way to classify data based on Bayes' theorem of probability. It predicts the class of an unknown data point under the assumption that the features are independent of one another once the class is known. You can think of a feature as a characteristic or attribute of the data. That independence assumption rarely holds exactly in real data, which is where the ‘naïve’ in the name comes from. Even so, the Naïve Bayes classifier can sometimes outperform more sophisticated classification methods, is widely used among data scientists and is easy to interpret and explain to a non-technical audience.
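For readers who like to see the formula, here is Bayes' theorem together with the independence assumption, written in LaTeX. The symbols are my own shorthand and not part of the original discussion: c stands for a class and x_1 through x_n stand for the features of one observation.

```latex
% Bayes' theorem: probability of class c given the observed features x
P(c \mid x) = \frac{P(x \mid c)\, P(c)}{P(x)}

% The "naive" assumption: features are independent once the class is known
P(x_1, \dots, x_n \mid c) = \prod_{i=1}^{n} P(x_i \mid c)
```

The classifier computes the first expression for every class and picks the class with the largest value. Since P(x) is the same for every class, it does not affect which class wins.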

Now let’s look at an example. Let’s assume we have a list of doctors’ names, each labeled as either female or male. We want to predict whether a doctor named ‘Drew’ is male or female. Let’s use Naïve Bayes to answer this classification problem. We assume the name ‘Drew’ can be either male or female since we know famous people of both genders who have this name.

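The attribute of the data we’re using to classify is the ‘name.’ The name ‘Drew’ is found in our data a total of three times out of eight. ‘Drew’ is female two out of eight times and male one out of eight times. In other words, of the three doctors named Drew, two are female and one is male, so the probability that a Drew is female is 2/3 and the probability that a Drew is male is 1/3. The Naïve Bayes classifier would therefore predict that this doctor named Drew is more likely to be female than male.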
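The same arithmetic can be sketched in a few lines of Python. This is only a minimal illustration, not a production classifier. The three ‘Drew’ rows match the counts in the example; the other five names and their genders are hypothetical placeholders I made up to fill out the eight rows, and the function name posterior is my own.

```python
from collections import Counter
from fractions import Fraction

# Eight labeled doctors. Only the three "Drew" rows come from the example;
# the other five names and genders are hypothetical placeholders.
doctors = [
    ("Drew", "female"),
    ("Drew", "female"),
    ("Drew", "male"),
    ("Name A", "female"),
    ("Name B", "female"),
    ("Name C", "female"),
    ("Name D", "male"),
    ("Name E", "male"),
]


def posterior(name, data):
    """Return P(gender | name) for each gender, using Bayes' theorem."""
    total = len(data)
    gender_counts = Counter(gender for _, gender in data)
    name_count = sum(1 for n, _ in data if n == name)
    if name_count == 0:
        raise ValueError(f"{name!r} does not appear in the data")

    evidence = Fraction(name_count, total)  # P(name)
    result = {}
    for gender, gender_count in gender_counts.items():
        prior = Fraction(gender_count, total)  # P(gender)
        matches = sum(1 for n, g in data if n == name and g == gender)
        likelihood = Fraction(matches, gender_count)  # P(name | gender)
        result[gender] = likelihood * prior / evidence
    return result


probs = posterior("Drew", doctors)
prediction = max(probs, key=probs.get)  # class with the highest posterior
print(probs, prediction)
```

With these counts the posterior works out to 2/3 for female and 1/3 for male, so the prediction is female. The placeholder rows change the prior and the likelihood but not the final answer, because with a single feature the posterior always reduces to the share of Drews in each class. For the same reason, the independence assumption never comes into play here; it only matters once there are two or more features.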

There are many domains where the Naïve Bayes algorithm can quickly and accurately make predictions. In my next blog, I’ll be talking about results from a data challenge I’m working on.