Introduction to Naive Bayes Classification Algorithm in Python and R

Naive Bayes is a machine learning algorithm for classification problems. It is based on Bayes' theorem of probability. It is primarily used for text classification, which involves high-dimensional training data sets. A few examples are spam filtration, sentiment analysis, and classifying news articles.

It is known not only for its simplicity but also for its effectiveness. Building models and making predictions with the Naive Bayes algorithm is fast. It is often the first algorithm to consider for a text classification problem, so it is worth learning thoroughly.

Table of Contents

Basics of Naive Bayes

The mathematics of the Naive Bayes

Variations of Naive Bayes

Advantages and Disadvantages

Python and R implementation

Applications of Naive Bayes

What is Naive Bayes algorithm?

The Naive Bayes algorithm learns the probability that an object with certain features belongs to a particular group or class. In short, it is a probabilistic classifier. You may be wondering why it is called so.

The Naive Bayes algorithm is called "naive" because it makes the assumption that the occurrence of a certain feature is independent of the occurrence of other features.

For instance, if you are trying to identify a fruit based on its color, shape, and taste, then an orange-colored, spherical, and tangy fruit would most likely be an orange. Even if these features depend on each other or on the presence of other features, the algorithm treats each of these properties as contributing independently to the probability that the fruit is an orange, and that is why it is known as "naive."

As for the “Bayes” part, it refers to the statistician and philosopher Thomas Bayes and the theorem named after him, Bayes' theorem, which is the basis of the Naive Bayes algorithm.

The Mathematics of the Naive Bayes Algorithm

As already said, the basis of the Naive Bayes algorithm is Bayes' theorem, also known as Bayes' rule or Bayes' law. It gives us a method to calculate conditional probability, i.e., the probability of an event based on previous knowledge of related events. More formally, Bayes' theorem is stated as the following equation:

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$

Let us understand the statement first and then we will look at the proof of the statement. The components of the above statement are:

$P(A \mid B)$: Probability (conditional probability) of occurrence of event $A$ given that event $B$ is true

$P(A)$ and $P(B)$: Probabilities of the occurrence of events $A$ and $B$, respectively

$P(B \mid A)$: Probability of the occurrence of event $B$ given that event $A$ is true

The terminology in the Bayesian method of probability (more commonly used) is as follows:

$A$ is called the proposition and $B$ is called the evidence.

$P(A)$ is called the prior probability of the proposition and $P(B)$ is called the prior probability of the evidence.

$P(A \mid B)$ is called the posterior.

$P(B \mid A)$ is the likelihood.

This sums up Bayes' theorem as

$$\text{Posterior} = \frac{\text{Likelihood} \times \text{Prior probability of the proposition}}{\text{Prior probability of the evidence}}$$

Let us take an example to better understand Bayes' theorem.

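Suppose you have to draw a single card from a standard deck of 52 cards. The probability that the card is a Queen is $P(\text{Queen}) = \frac{4}{52} = \frac{1}{13}$. If you are given evidence that the card you have picked is a face card, the posterior probability $P(\text{Queen} \mid \text{Face})$ can be calculated using Bayes' theorem as follows:

$$P(\text{Queen} \mid \text{Face}) = \frac{P(\text{Face} \mid \text{Queen})\,P(\text{Queen})}{P(\text{Face})}$$

Now $P(\text{Face} \mid \text{Queen}) = 1$ because, given that the card is a Queen, it is definitely a face card. We have already calculated $P(\text{Queen}) = \frac{1}{13}$. The only value left to calculate is $P(\text{Face})$, which is equal to $\frac{12}{52} = \frac{3}{13}$, as there are three face cards for every suit in a deck. Therefore,

$$P(\text{Queen} \mid \text{Face}) = \frac{1 \times \frac{1}{13}}{\frac{3}{13}} = \frac{1}{3}$$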
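The card calculation can be checked with a few lines of Python. This is only a sketch: the helper name `bayes_posterior` is made up for illustration, and exact fractions are used so that no rounding is involved.

```python
from fractions import Fraction


def bayes_posterior(likelihood, prior, evidence):
    """Return P(A|B) = P(B|A) * P(A) / P(B)."""
    return likelihood * prior / evidence


# Standard 52-card deck: 4 queens, 12 face cards (J, Q, K in each of 4 suits).
p_queen = Fraction(4, 52)          # prior P(Queen)
p_face = Fraction(12, 52)          # evidence P(Face)
p_face_given_queen = Fraction(1)   # likelihood: every queen is a face card

p_queen_given_face = bayes_posterior(p_face_given_queen, p_queen, p_face)
print(p_queen_given_face)  # 1/3
```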

Derivation of Bayes' Theorem

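For a joint probability distribution of two events $A$ and $B$, $P(A \cap B)$, the conditional probability is

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}$$

Similarly,

$$P(B \mid A) = \frac{P(A \cap B)}{P(A)}$$

Therefore, substituting $P(A \cap B) = P(B \mid A)\,P(A)$ into the first equation,

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$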
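The derivation can also be verified by brute force. The sketch below enumerates a 52-card deck, computes $P(A \mid B)$ directly from the joint probability, and compares it with the right-hand side of Bayes' theorem. Here $A$ is "the card is a Queen" and $B$ is "the card is a face card"; the helper names are illustrative only.

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["clubs", "diamonds", "hearts", "spades"]
deck = list(product(ranks, suits))  # 52 (rank, suit) pairs


def prob(event):
    """Probability of an event under a uniform draw from the deck."""
    return Fraction(sum(1 for card in deck if event(card)), len(deck))


def is_queen(card):
    return card[0] == "Q"


def is_face(card):
    return card[0] in {"J", "Q", "K"}


def is_queen_and_face(card):
    return is_queen(card) and is_face(card)


# Definition of conditional probability: P(A|B) = P(A and B) / P(B)
p_a_given_b = prob(is_queen_and_face) / prob(is_face)
# P(B|A) = P(A and B) / P(A)
p_b_given_a = prob(is_queen_and_face) / prob(is_queen)
# Right-hand side of Bayes' theorem: P(B|A) * P(A) / P(B)
bayes_rhs = p_b_given_a * prob(is_queen) / prob(is_face)

print(p_a_given_b, bayes_rhs)  # 1/3 1/3
```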

Bayes' Theorem for Naive Bayes Algorithm

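In a machine learning classification problem, there are multiple features and classes, say, $C_1, C_2, \ldots, C_k$. The main aim of the Naive Bayes algorithm is to calculate the conditional probability that an object with a feature vector $x_1, x_2, \ldots, x_n$ belongs to a particular class $C_i$,

$$P(C_i \mid x_1, \ldots, x_n) = \frac{P(x_1, \ldots, x_n \mid C_i)\,P(C_i)}{P(x_1, \ldots, x_n)}$$

for $1 \le i \le k$

Now, the numerator of the fraction on the right-hand side of the equation above is

$$\begin{aligned}
P(x_1, \ldots, x_n \mid C_i)\,P(C_i) &= P(x_1, \ldots, x_n, C_i) \\
&= P(x_1 \mid x_2, \ldots, x_n, C_i)\,P(x_2, \ldots, x_n, C_i) \\
&= P(x_1 \mid x_2, \ldots, x_n, C_i)\,P(x_2 \mid x_3, \ldots, x_n, C_i) \cdots P(x_n \mid C_i)\,P(C_i)
\end{aligned}$$

The conditional probability term $P(x_j \mid x_{j+1}, \ldots, x_n, C_i)$ becomes $P(x_j \mid C_i)$ because of the assumption that the features are independent of one another given the class.

From the calculation above and the independence assumption, Bayes' theorem boils down to the following simple expression:

$$P(C_i \mid x_1, \ldots, x_n) = \frac{\left(\prod_{j=1}^{n} P(x_j \mid C_i)\right) P(C_i)}{P(x_1, \ldots, x_n)}$$

for $1 \le i \le k$

Since the expression $P(x_1, \ldots, x_n)$ is constant for all the classes, we can simply say that

$$P(C_i \mid x_1, \ldots, x_n) \propto \left(\prod_{j=1}^{n} P(x_j \mid C_i)\right) P(C_i)$$

for $1 \le i \le k$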
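In code, the proportional form is a prior multiplied by a product of per-feature likelihoods. The following is a minimal sketch, not a full classifier: the function names and the two-class numbers are hypothetical, chosen only to show the arithmetic. It assumes at least one class has a non-zero score, since normalizing divides by the sum of the scores.

```python
from math import prod


def naive_bayes_scores(priors, likelihoods):
    """Unnormalized posteriors: P(C) * product over j of P(x_j | C).

    priors:      {class: P(C)}
    likelihoods: {class: [P(x_1 | C), ..., P(x_n | C)]}
    """
    return {c: priors[c] * prod(likelihoods[c]) for c in priors}


def normalize(scores):
    """Divide by the sum of the scores so that they add up to 1."""
    total = sum(scores.values())
    return {c: s / total for c, s in scores.items()}


# Hypothetical two-class problem with two observed features.
priors = {"spam": 0.4, "ham": 0.6}
likelihoods = {"spam": [0.8, 0.5], "ham": [0.1, 0.5]}

scores = naive_bayes_scores(priors, likelihoods)
posteriors = normalize(scores)
best = max(scores, key=scores.get)
print(best)  # spam
```

Because the denominator is the same for every class, the class with the largest unnormalized score is also the class with the largest posterior.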

How does the Naive Bayes Algorithm work?

So far, we have learned what the Naive Bayes algorithm is, how Bayes' theorem is related to it, and what the expression of Bayes' theorem for this algorithm is. Let us take a simple example to understand how the algorithm works. Suppose we have a training data set of 1200 fruits. The features in the data set are these: is the fruit yellow or not, is the fruit long or not, and is the fruit sweet or not. There are three different classes: mango, banana, and others.

Step 1: Create a frequency table for all the features against the different classes.

| Name | Yellow | Sweet | Long | Total |
|---|---|---|---|---|
| Mango | 350 | 450 | 0 | 650 |
| Banana | 400 | 300 | 350 | 400 |
| Others | 50 | 100 | 50 | 150 |
| Total | 800 | 850 | 400 | 1200 |

What can we conclude from the above table?

Out of 1200 fruits, 650 are mangoes, 400 are bananas, and 150 are others.

350 of the total 650 mangoes are yellow and the rest are not and so on.

800 fruits are yellow, 850 are sweet and 400 are long from a total of 1200 fruits.

Let's say you are given a fruit which is yellow, sweet, and long, and you have to determine the class to which it belongs.

Step 2: Draw the likelihood table for the features against the classes.

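| Name | Yellow | Sweet | Long | Total |
|---|---|---|---|---|
| Mango | 350/650 = $P(\text{Yellow} \mid \text{Mango})$ | 450/650 | 0/650 | 650/1200 = $P(\text{Mango})$ |
| Banana | 400/400 | 300/400 | 350/400 | 400/1200 |
| Others | 50/150 | 100/150 | 50/150 | 150/1200 |
| Total | 800/1200 = $P(\text{Yellow})$ | 850/1200 | 400/1200 | 1200 |

Each feature entry divides the feature count by its class total, giving the likelihood $P(\text{feature} \mid \text{class})$. The Total column gives the prior probability of each class, and the Total row gives the overall probability of each feature.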
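The likelihoods can be derived mechanically from the frequency counts of Step 1. The sketch below divides each feature count by its class total, which is the quantity $P(\text{feature} \mid \text{class})$ that the Naive Bayes formula needs, and divides each class total by 1200 to get the priors. The variable names are illustrative.

```python
from fractions import Fraction

# Frequency table from Step 1.
counts = {
    "Mango":  {"Yellow": 350, "Sweet": 450, "Long": 0,   "Total": 650},
    "Banana": {"Yellow": 400, "Sweet": 300, "Long": 350, "Total": 400},
    "Others": {"Yellow": 50,  "Sweet": 100, "Long": 50,  "Total": 150},
}
features = ["Yellow", "Sweet", "Long"]
n_fruits = sum(row["Total"] for row in counts.values())  # 1200

# Prior P(class) = class total / number of fruits.
priors = {c: Fraction(row["Total"], n_fruits) for c, row in counts.items()}

# Likelihood P(feature | class) = feature count within class / class total.
likelihoods = {
    c: {f: Fraction(row[f], row["Total"]) for f in features}
    for c, row in counts.items()
}

for c in counts:
    cells = ", ".join(f"P({f}|{c}) = {likelihoods[c][f]}" for f in features)
    print(f"{c}: {cells}, P({c}) = {priors[c]}")
```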

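Step 3: Calculate the conditional probabilities for all the classes, i.e., the following in our example. Each one is proportional to the class prior multiplied by the feature likelihoods, all taken from the frequency table:

$$P(\text{Mango} \mid \text{Yellow}, \text{Sweet}, \text{Long}) \propto P(\text{Yellow} \mid \text{Mango})\,P(\text{Sweet} \mid \text{Mango})\,P(\text{Long} \mid \text{Mango})\,P(\text{Mango}) = \frac{350}{650} \cdot \frac{450}{650} \cdot \frac{0}{650} \cdot \frac{650}{1200} = 0$$

$$P(\text{Banana} \mid \text{Yellow}, \text{Sweet}, \text{Long}) \propto P(\text{Yellow} \mid \text{Banana})\,P(\text{Sweet} \mid \text{Banana})\,P(\text{Long} \mid \text{Banana})\,P(\text{Banana}) = \frac{400}{400} \cdot \frac{300}{400} \cdot \frac{350}{400} \cdot \frac{400}{1200} \approx 0.2188$$

$$P(\text{Others} \mid \text{Yellow}, \text{Sweet}, \text{Long}) \propto P(\text{Yellow} \mid \text{Others})\,P(\text{Sweet} \mid \text{Others})\,P(\text{Long} \mid \text{Others})\,P(\text{Others}) = \frac{50}{150} \cdot \frac{100}{150} \cdot \frac{50}{150} \cdot \frac{150}{1200} \approx 0.0093$$

Step 4: Calculate $\max_i P(C_i \mid x_1, \ldots, x_n)$. In our example, the maximum probability is for the class banana; therefore, the fruit which is long, sweet, and yellow is a banana according to the Naive Bayes algorithm.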
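Steps 3 and 4 amount to scoring each class and taking the largest score. A minimal sketch using the counts from the frequency table follows; the function name `score` is made up for illustration, and the shared denominator is omitted because it does not affect which class wins.

```python
from math import prod

# Frequency table from Step 1.
counts = {
    "Mango":  {"Yellow": 350, "Sweet": 450, "Long": 0,   "Total": 650},
    "Banana": {"Yellow": 400, "Sweet": 300, "Long": 350, "Total": 400},
    "Others": {"Yellow": 50,  "Sweet": 100, "Long": 50,  "Total": 150},
}
n_fruits = 1200
observed = ["Yellow", "Sweet", "Long"]  # features of the fruit to classify


def score(cls):
    """P(class) * product of P(feature | class) over the observed features."""
    row = counts[cls]
    prior = row["Total"] / n_fruits
    return prior * prod(row[f] / row["Total"] for f in observed)


scores = {cls: score(cls) for cls in counts}
prediction = max(scores, key=scores.get)
print(prediction)  # Banana
```

Note that the mango score is exactly 0 because no mango in the training data is long; a single zero likelihood wipes out the whole product.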

In a nutshell, a new element is assigned to the class with the maximum conditional probability described above.

Variations of the Naive Bayes algorithm

There are multiple variations of the Naive Bayes algorithm depending on the distribution of $P(x_j \mid C_i)$. Three of the commonly used variations are

Gaussian: The Gaussian Naive Bayes algorithm assumes the distribution of features to be Gaussian or normal, i.e.,

$$P(x_j \mid C_i) = \frac{1}{\sqrt{2\pi\sigma_{ij}^2}} \exp\left(-\frac{(x_j - \mu_{ij})^2}{2\sigma_{ij}^2}\right)$$

where $\mu_{ij}$ and $\sigma_{ij}^2$ are the mean and variance of feature $x_j$ within class $C_i$.

Multinomial: The Multinomial Naive Bayes algorithm is used when the data is multinomially distributed, i.e., when the number of times a feature occurs matters (for example, word counts in a document).

Bernoulli: The Bernoulli algorithm is used when the features in the data set are binary-valued. It is helpful in spam filtration and adult content detection techniques.
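The three variations differ only in how the per-feature likelihood $P(x_j \mid C_i)$ is computed. The sketch below shows one possible form of each; the function names are illustrative. The multinomial version assumes every probability is strictly positive (for example, after smoothing) and drops the multinomial coefficient, which is the same for every class.

```python
import math


def gaussian_likelihood(x, mean, var):
    """Density of a normal distribution N(mean, var) evaluated at x."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def bernoulli_likelihood(x, p):
    """P(x | C) for a binary feature x in {0, 1}, where p = P(x = 1 | C)."""
    return p if x == 1 else 1 - p


def multinomial_log_likelihood(counts, probs):
    """Log of the product of probs[j] ** counts[j] over all features j."""
    return sum(n * math.log(p) for n, p in zip(counts, probs) if n > 0)


# Standard normal density at its mean: 1 / sqrt(2 * pi)
print(gaussian_likelihood(0.0, 0.0, 1.0))
# A feature that is absent (x = 0) when P(x = 1 | C) = 0.3
print(bernoulli_likelihood(0, 0.3))  # 0.7
# Word counts [2, 0, 1] under per-word probabilities [0.5, 0.25, 0.25]
print(multinomial_log_likelihood([2, 0, 1], [0.5, 0.25, 0.25]))
```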

Pros and Cons of Naive Bayes algorithm

Every coin has two sides. So does the Naive Bayes algorithm. It has advantages as well as disadvantages, and they are listed below:

Pros

It is a relatively easy algorithm to build and understand.

It is faster to predict classes using this algorithm than many other classification algorithms.

It can be easily trained using a small data set.

Cons

If a given class and feature value never occur together in the training data, then the conditional probability estimate for that combination will come out as 0. This problem is known as the "Zero Conditional Probability Problem." It is a problem because a single zero wipes out all the information in the other probabilities when they are multiplied together. There are several smoothing techniques to fix this problem, such as "Laplacian Correction."

Another disadvantage is the very strong assumption it makes that the features are independent of one another given the class. It is nearly impossible to find data sets in real life that satisfy this assumption.
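To illustrate the zero-frequency fix, here is a sketch of add-one (Laplace) smoothing applied to the fruit example, where no mango is long. The function name and its parameters `alpha` and `n_values` are assumptions for illustration; `n_values` is 2 because each feature in that example is binary (long or not long).

```python
from fractions import Fraction


def smoothed_likelihood(feature_count, class_total, alpha=1, n_values=2):
    """Laplace-smoothed estimate of P(feature value | class).

    alpha:    pseudo-count added to every feature value (0 disables smoothing)
    n_values: number of distinct values the feature can take
    """
    return Fraction(feature_count + alpha, class_total + alpha * n_values)


# In the fruit data, 0 of the 650 mangoes are long.
unsmoothed = smoothed_likelihood(0, 650, alpha=0)
smoothed = smoothed_likelihood(0, 650)

print(unsmoothed)  # 0
print(smoothed)    # 1/652
```

With smoothing, the estimate is small but non-zero, so the other feature likelihoods still influence the final score.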

Naive Bayes with Python and R

Let us see how we can build the basic model using the Naive Bayes algorithm in R and in Python.

R Code

To start training a Naive Bayes classifier in R, we need to load the e1071 package.

    library(e1071)

To split the data set into training and test data we will use the caTools package.

    library(caTools)

The predefined function used for the implementation of Naive Bayes in R is called naiveBayes(). There are only a few parameters that are of use:

    naiveBayes(formula, data, laplace = 0, subset, na.action = na.pass)

formula: A formula of the form class ~ x1 + x2 + ..., naming the class variable and the predictors

data: The data frame containing numeric or factor variables

laplace: The Laplace smoothing value; the default of 0 disables smoothing

subset: Selects the subset of the data to use, based on some Boolean filter

na.action: Determines what is done when a missing value is encountered in the data set

Let us take the example of the iris data set.

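    library(e1071)
    library(caTools)

    data(iris)

    # sample.split() returns a logical vector (TRUE = training row). Splitting on
    # the Species label with SplitRatio = 0.7 divides the 150 rows of the iris
    # data set into roughly 70% training and 30% testing data.
    spl <- sample.split(iris$Species, SplitRatio = 0.7)

    train <- subset(iris, spl == TRUE)    # rows of iris for which spl is TRUE
    test  <- subset(iris, spl == FALSE)   # the remaining rows

    # Fit the classifier on the training rows and predict the test rows.
    model <- naiveBayes(Species ~ ., data = train)
    pred  <- predict(model, test)

    # Confusion matrix of predicted versus actual species.
    table(predicted = pred, actual = test$Species)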
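For the Python side, a minimal sketch of the same train-then-predict workflow using only the standard library might look like this. The class name `TinyGaussianNB` and the toy measurements are made up for illustration (they are not rows of the iris data set), and a small constant is added to each variance so that it is never zero. In practice, scikit-learn's `GaussianNB` is the usual choice.

```python
import math
from collections import defaultdict
from statistics import mean, pvariance


class TinyGaussianNB:
    """Illustrative Gaussian Naive Bayes classifier (not a library API)."""

    def fit(self, X, y):
        by_class = defaultdict(list)
        for row, label in zip(X, y):
            by_class[label].append(row)
        n = len(y)
        # Prior P(class) = fraction of training rows with that label.
        self.priors = {c: len(rows) / n for c, rows in by_class.items()}
        # Per class, one (mean, variance) pair per feature column.
        self.stats = {
            c: [(mean(col), pvariance(col) + 1e-9) for col in zip(*rows)]
            for c, rows in by_class.items()
        }
        return self

    def _log_score(self, row, c):
        # log P(class) + sum of log Gaussian densities, one per feature.
        total = math.log(self.priors[c])
        for x, (mu, var) in zip(row, self.stats[c]):
            total += -0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)
        return total

    def predict(self, X):
        return [
            max(self.priors, key=lambda c: self._log_score(row, c)) for row in X
        ]


# Hypothetical toy data: two measurements per flower, two made-up classes.
X_train = [[1.4, 0.2], [1.3, 0.2], [1.5, 0.3], [4.7, 1.4], [4.5, 1.5], [4.9, 1.5]]
y_train = ["short", "short", "short", "long", "long", "long"]

model = TinyGaussianNB().fit(X_train, y_train)
predictions = model.predict([[1.6, 0.2], [4.4, 1.3]])
print(predictions)  # ['short', 'long']
```

Working with log probabilities instead of raw products avoids numerical underflow when there are many features.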

Applications

The Naive Bayes algorithm is used in multiple real-life scenarios such as

Text classification: It is used as a probabilistic learning method for text classification. The Naive Bayes classifier is one of the most successful known algorithms when it comes to the classification of text documents, i.e., whether a text document belongs to one or more categories (classes).

Spam filtration: It is an example of text classification. This has become a popular mechanism to distinguish spam email from legitimate email. Several modern email services implement Bayesian spam filtering. Many server-side email filters, such as DSPAM, SpamBayes, SpamAssassin, Bogofilter, and ASSP, use this technique.

Sentiment Analysis: It can be used to analyze the tone of tweets, comments, and reviews—whether they are negative, positive or neutral.

Recommendation System: The Naive Bayes algorithm in combination with collaborative filtering is used to build hybrid recommendation systems which help in predicting if a user would like a given resource or not.
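As a rough illustration of the text classification and spam filtration use cases, the sketch below trains a multinomial Naive Bayes model with add-one smoothing on four made-up messages. The corpus, the function names, and the choice to ignore words never seen in training are all assumptions for the example, not a description of how any of the filters named above work.

```python
import math
from collections import Counter

# Hypothetical toy corpus of (message, label) pairs.
training = [
    ("win money now", "spam"),
    ("free money offer", "spam"),
    ("meeting schedule today", "ham"),
    ("project meeting notes", "ham"),
]

word_counts = {"spam": Counter(), "ham": Counter()}
doc_counts = Counter()
for text, label in training:
    word_counts[label].update(text.split())
    doc_counts[label] += 1

vocabulary = {word for counter in word_counts.values() for word in counter}


def log_score(text, label, alpha=1):
    """log P(label) + sum of log P(word | label), with add-alpha smoothing."""
    total = math.log(doc_counts[label] / sum(doc_counts.values()))
    n_words = sum(word_counts[label].values())
    for word in text.split():
        if word not in vocabulary:
            continue  # words never seen in training are ignored
        total += math.log(
            (word_counts[label][word] + alpha) / (n_words + alpha * len(vocabulary))
        )
    return total


def classify(text):
    return max(doc_counts, key=lambda label: log_score(text, label))


print(classify("free money"))     # spam
print(classify("meeting today"))  # ham
```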

Conclusion

This article is a simple explanation of the Naive Bayes Classification algorithm, with an easy-to-understand example and a few technicalities.

Despite all the complicated math, the implementation of the Naive Bayes algorithm involves simply counting the number of objects with specific features and classes. Once these numbers are obtained, it is very simple to calculate probabilities and arrive at a conclusion.

Hopefully you are now familiar with this machine learning concept, which you have most likely heard of before.