Decision Boundary Visualization (A-Z)
Meaning, Significance, Implementation

Navoneel Chakrabarty · Jan 16

Classification problems are very common and essential in the field of Data Science.

In Computer Vision as well, projects like Diabetic Retinopathy or Glaucoma Detection nowadays often use Texture Analysis with classical Machine Learning and conventional Image Processing instead of Deep Learning.

Deep Learning, though, has been the state of the art in Diabetic Retinopathy, as per the research paper “A Deep Learning Method for the Detection of Diabetic Retinopathy” [1].

A classification problem involves predicting one particular class among multiple classes.

In other words, it can also be framed as follows: a particular instance (a data-point, in terms of Feature Space Geometry) needs to be kept within a particular region (signifying its class) and needs to be separated from the other regions (signifying the other classes).

This separation from other regions can be visualized by a boundary known as the Decision Boundary.

This visualization of the Decision Boundary in feature space is done on a Scatter Plot, where every point depicts a data-point of the data-set and the axes depict the features.

The Decision Boundary separates the data-points into regions, which correspond to the classes to which they belong.
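Before drawing any boundary, the starting point is simply a class-coloured scatter plot of the two features. A minimal sketch, assuming synthetic two-class data (this is not the article's dataset; the cluster centres and file name are made up for illustration):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs headless
import matplotlib.pyplot as plt

# Synthetic two-class data: two Gaussian clusters (illustrative only)
rng = np.random.default_rng(0)
class0 = rng.normal(loc=[2.0, 2.0], scale=1.0, size=(50, 2))
class1 = rng.normal(loc=[6.0, 6.0], scale=1.0, size=(50, 2))

# Every point is a data-point; the axes are the two features
plt.scatter(class0[:, 0], class0[:, 1], c="blue", label="Class 0")
plt.scatter(class1[:, 0], class1[:, 1], c="gold", label="Class 1")
plt.xlabel("Feature 1")
plt.ylabel("Feature 2")
plt.legend()
plt.savefig("scatter.png")
```

The decision boundary, drawn by either of the strategies below, is then overlaid on exactly this kind of plot.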

Importance/Significance of a Decision Boundary:

After training a Machine Learning Model using a data-set, it is often necessary to visualize the classification of the data-points in Feature Space.

A Decision Boundary on a Scatter Plot serves this purpose: the Scatter Plot contains the data-points belonging to different classes (denoted by colour or shape), and the decision boundary can be drawn following different strategies.

Single-Line Decision Boundary: The basic strategy is to find a single line that separates the data-points into regions signifying the different classes.

Now, this single line is found using the parameters related to the Machine Learning Algorithm that are obtained after training the model.

The line co-ordinates are found using the obtained parameters and intuition behind the Machine Learning Algorithm.

This strategy cannot be deployed if the intuition and working mechanism of the ML Algorithm are not known.

Contour-Based Decision Boundary: Another strategy involves drawing contours, i.e. regions that each enclose data-points of matching or closely matching colours. The colours depict the classes to which the data-points belong, while the contours depict the predicted classes.

This is the most commonly followed strategy, as it does not use the parameters of the Machine Learning Algorithm obtained after Model Training, nor any related calculations.

On the other hand, it does not separate the data-points with a single exact line, which can only be obtained from the parameters learned during training and the calculation of the line's co-ordinates.

Exemplar Implementation of a Single-Line Decision Boundary:

Here, I am going to demonstrate a Single-Line Decision Boundary for a Machine Learning Model based on Logistic Regression.

The hypothesis of Logistic Regression is

h(z) = 1 / (1 + e^(−z)), where z = θ_0 + θ_1·x_1 + θ_2·x_2 + … + θ_n·x_n

Here θ_0, θ_1, …, θ_n are the parameters of Logistic Regression and x_1, x_2, …, x_n are the features. So, h(z) is a Sigmoid Function whose range is from 0 to 1 (0 and 1 inclusive).

For plotting the Decision Boundary, h(z) is set equal to the threshold value used in Logistic Regression, which is conventionally 0.5.

So, if h(z) = 0.5, then z = 0, that is,

θ_0 + θ_1·x_1 + θ_2·x_2 + … + θ_n·x_n = 0

Now, for plotting the Decision Boundary, 2 features need to be considered and plotted along the x and y axes of the Scatter Plot.

So, with 2 features the boundary equation becomes

θ_0 + θ_1·x_1 + θ_2·x_2 = 0

which gives

x′_2 = −(θ_0 + θ_1·x′_1) / θ_2, with x′_1 ∈ {min(x_1), max(x_1)}

where x_1 is the original feature of the dataset. So, 2 values of x′_1 are obtained, along with the 2 corresponding values of x′_2. The x′_1 values are the x extremes and the x′_2 values are the y extremes of the Single-Line Decision Boundary.
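Since θ_0 + θ_1·x_1 + θ_2·x_2 = 0 gives x_2 = −(θ_0 + θ_1·x_1)/θ_2, the two endpoints can be computed directly from trained parameters. A minimal sketch (the θ values and the function name below are hypothetical, purely for illustration, not the parameters trained in the article):

```python
import numpy as np

def boundary_endpoints(theta, x1):
    """Given Logistic Regression parameters theta = [theta_0, theta_1, theta_2]
    and the first feature column x1, return the two endpoints of the
    single-line decision boundary theta_0 + theta_1*x1 + theta_2*x2 = 0."""
    x1_ext = np.array([x1.min(), x1.max()])              # the x extremes
    x2_ext = -(theta[0] + theta[1] * x1_ext) / theta[2]  # corresponding y extremes
    return x1_ext, x2_ext

# Hypothetical parameters and feature column, for illustration only
theta = np.array([-6.0, 1.0, 1.0])   # boundary: x2 = 6 - x1
x1 = np.array([2.0, 3.0, 8.0])
xs, ys = boundary_endpoints(theta, x1)
# endpoints: (2, 4) and (8, -2)
```

Plotting the segment between these two endpoints on the scatter plot draws the single-line decision boundary.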

Application on a Fictional Dataset:

The Dataset contains the marks obtained by 100 students in 2 exams and a label (0/1) that indicates whether the student will be admitted to a university (1) or not (0).

The Dataset is available at github.com/navoneel1092283/logistic_regression.

Problem Statement: “Given the marks obtained in 2 exams, predict whether the student will be admitted to the university or not using Logistic Regression.”

Here, the marks in the 2 exams are the 2 features considered.

Logistic Regression was implemented in 3 modules. The detailed implementation is given in the article “Logistic Regression in Python from scratch”.
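The 3 modules themselves are not reproduced here; the sketch below is a minimal stand-in, assuming the usual split into a sigmoid hypothesis, gradient-descent training, and threshold prediction (the function names, hyper-parameters, and toy data are my own, not the article's):

```python
import numpy as np

# Module 1: hypothesis — sigmoid of the linear combination z = X @ theta
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def hypothesis(theta, X):
    return sigmoid(X @ theta)

# Module 2: training via batch gradient descent on the logistic log-loss
def train(X, y, alpha=0.1, num_iters=3000):
    X = np.c_[np.ones(len(X)), X]        # prepend the intercept column
    theta = np.zeros(X.shape[1])
    for _ in range(num_iters):
        grad = X.T @ (hypothesis(theta, X) - y) / len(y)
        theta -= alpha * grad
    return theta

# Module 3: prediction with the conventional 0.5 threshold
def predict(theta, X):
    X = np.c_[np.ones(len(X)), X]
    return (hypothesis(theta, X) >= 0.5).astype(int)

# Toy separable data (illustrative only, not the 100-student dataset)
X = np.array([[1.0, 1.0], [2.0, 1.5], [8.0, 9.0], [9.0, 8.5]])
y = np.array([0, 0, 1, 1])
theta = train(X, y)
```

The trained `theta` is exactly what the single-line strategy above consumes to compute the boundary's endpoint co-ordinates.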

[Figure: the obtained Contour-Based Decision Boundary over the marks in the 2 exams, where yellow -> Admitted and blue -> Not Admitted.]

This method is apparently more convenient, as no intuition or hypothesis, nor any mathematics behind the Machine Learning Algorithm, is required.
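The contour-plotting code itself has been lost from this copy of the article; the sketch below shows the standard meshgrid-and-contourf recipe it is based on (the stand-in classifier, data, and file name are assumptions for illustration, not the article's trained model):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt

def plot_decision_boundary(predict, X, y, fname="boundary.png"):
    """Contour-based decision boundary: predict every point of a dense
    mesh over the feature space and colour the resulting regions."""
    x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
    xx, yy = np.meshgrid(np.linspace(x_min, x_max, 200),
                         np.linspace(y_min, y_max, 200))
    Z = predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)
    plt.contourf(xx, yy, Z, alpha=0.3)   # contour regions = predicted classes
    plt.scatter(X[:, 0], X[:, 1], c=y)   # points coloured by true class
    plt.xlabel("Marks obtained in 1st Exam")
    plt.ylabel("Marks obtained in 2nd Exam")
    plt.savefig(fname)

# Hypothetical linear classifier standing in for the trained model
toy_predict = lambda P: (P.sum(axis=1) > 100).astype(int)
X = np.array([[40.0, 50.0], [70.0, 80.0], [30.0, 45.0], [90.0, 60.0]])
y = toy_predict(X)
plot_decision_boundary(toy_predict, X, y)
```

Note that `predict` is treated as a black box, which is precisely why this strategy works for any Machine Learning Model.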

All that is required is a knack for advanced Python programming! So, it is a general method of plotting Decision Boundaries for any Machine Learning Model.

In most practical and advanced-level projects, many features are involved.

Then, how can Decision Boundaries be plotted on 2-D Scatter Plots? In those cases, there are multiple ways out:

Feature Importance Scores given by a Random Forest Classifier or Extra Trees Classifier can be used to obtain the 2 most important features, and then the Decision Boundary can be plotted on the Scatter Plot.
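A minimal sketch of the feature-importance route, assuming synthetic data in which only two of five features carry signal (the data-generating setup is made up for illustration):

```python
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier

# Synthetic data: 5 features, but only features 0 and 3 determine the label
rng = np.random.default_rng(42)
X = rng.normal(size=(300, 5))
y = (X[:, 0] + X[:, 3] > 0).astype(int)

model = ExtraTreesClassifier(n_estimators=100, random_state=0).fit(X, y)
top2 = np.argsort(model.feature_importances_)[-2:]  # 2 most important features
X_2d = X[:, top2]  # reduced data, ready for the 2-D scatter plot and boundary
```

The decision boundary is then drawn on `X_2d` with either of the two strategies described earlier.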

Dimension Reduction techniques like Principal Component Analysis (PCA) or Linear Discriminant Analysis (LDA) can be used for reducing N features into 2 features (n_components = 2), as the information or interpretation of the N features gets embedded into the 2 features.

Then, the Decision Boundary can be plotted on the Scatter Plot using those 2 features.
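The PCA route can be sketched in a few lines; the random 10-feature data below is an assumption purely for illustration (any many-featured dataset works the same way):

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic data with N = 10 features (illustrative only)
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 10))

pca = PCA(n_components=2)      # embed the 10 features into 2 components
X_2d = pca.fit_transform(X)    # shape (200, 2): ready for the 2-D scatter plot
```

The classifier is then trained (or its predictions visualized) in this 2-component space, and the decision boundary is drawn over `X_2d` exactly as before.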