Decision Trees

A supporting tool:
A decision tree is a supporting tool that uses a tree-like graph or model of decisions.

Useful classifier:
The decision tree approach is most useful in classification problems.

Popular decision tree algorithms are:
- ID3
- C4.5

Probability

What is probability?
Probability is a measure or estimation of how likely it is that an event will happen or that a statement is true. Probabilities are given a value between 0 and 1. The higher the degree of probability, the more likely the event is to happen.

Proposed Algorithm

United International University
Thesis Supervisor: Dr. Chowdhury Mofizur Rahman, Pro-Vice Chancellor, United International University

Aditi Biswas
Kazi Mohammad Ehsan
Hasnaeen Ferdous Bin Hashem

Conditional Probability

A conditional probability is the probability that an event will occur, given that another event is known to occur or to have occurred.

Given two events A and B with P(B) > 0, the conditional probability of A given B is defined as the quotient of the joint probability of A and B and the probability of B:

P(A | B) = P(A ∩ B) / P(B)
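The definition above can be checked on a small example. The following is a minimal sketch (the die example and all variable names are illustrative, not from the thesis):

```python
from fractions import Fraction

# Toy example: one fair six-sided die.
# A = "roll is even", B = "roll is greater than 3".
outcomes = range(1, 7)
A = {n for n in outcomes if n % 2 == 0}   # {2, 4, 6}
B = {n for n in outcomes if n > 3}        # {4, 5, 6}

p_B = Fraction(len(B), 6)                 # P(B) = 3/6
p_A_and_B = Fraction(len(A & B), 6)       # P(A ∩ B) = 2/6
p_A_given_B = p_A_and_B / p_B             # quotient of joint and P(B)

print(p_A_given_B)  # 2/3
```

Knowing the roll exceeded 3 raises the probability of an even roll from 1/2 to 2/3, exactly as the quotient formula predicts.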

A naïve Bayesian classifier is a simple probabilistic classifier based on a probability model, and it can be trained very efficiently in a supervised learning setting. It is based on statistical probability theory.
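The efficient training mentioned above amounts to simple counting. The following sketch illustrates this on a toy categorical dataset (the data, the `posterior` helper, and all names are illustrative assumptions, not from the thesis):

```python
from collections import Counter, defaultdict

# Toy training set: one categorical attribute ("weather") and a class label.
rows = [("sunny", "no"), ("sunny", "no"), ("rainy", "yes"),
        ("rainy", "yes"), ("sunny", "yes"), ("rainy", "no")]

# "Training" is just counting: class priors and per-class feature counts.
class_counts = Counter(label for _, label in rows)
feature_counts = defaultdict(Counter)
for value, label in rows:
    feature_counts[label][value] += 1

def posterior(value):
    """Normalized P(class | value) = P(class) * P(value | class) / evidence."""
    scores = {}
    for label, n in class_counts.items():
        prior = n / len(rows)
        likelihood = feature_counts[label][value] / n
        scores[label] = prior * likelihood
    total = sum(scores.values())
    return {c: s / total for c, s in scores.items()}

print(posterior("sunny"))  # class "no" gets probability 2/3, "yes" gets 1/3
```

One pass over the data suffices to build the model, which is why naïve Bayes trains so quickly.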

Naïve Bayesian Classifier

Gaussian Distributions

The Gaussian distribution is the classic "bell-shaped curve" distribution. The mathematical function for computing the probability density of the Gaussian distribution at a particular point x is:

f(x) = (1 / (σ√(2π))) · e^(−(x − μ)² / (2σ²))

where μ is the mean and σ is the standard deviation.
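The density function above translates directly into code. A minimal sketch (the function name `gaussian_pdf` is an assumption, not from the thesis):

```python
import math

def gaussian_pdf(x, mean, std):
    """Probability density of the Gaussian distribution at point x."""
    coeff = 1.0 / (std * math.sqrt(2.0 * math.pi))
    exponent = -((x - mean) ** 2) / (2.0 * std ** 2)
    return coeff * math.exp(exponent)

# At the mean of a standard normal the density is 1/sqrt(2*pi) ≈ 0.3989.
print(round(gaussian_pdf(0.0, 0.0, 1.0), 4))  # 0.3989
```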

Input: Total training data set, % of data to delete

Output: Reduced training data set

Randomize the total data
Separate the data part from the ARFF file
Attributes <- extract the attributes from the data
Classes <- extract the classes from the data
classProbability <- calculate the probability of each class
classMean <- calculate, for each class, the mean of each attribute over all data
classStd <- calculate, for each class, the standard deviation of each attribute over all data
for all data (i = 0 to length of data)
    initialize probability variable x <- 1
    for all classes (j = 0 to number of classes) do
        if class[j] = class of data[i]
            for all attributes (k = 0 to number of attributes) do
                x <- x * GaussianPF(attribute[k], classMean[j], classStd[j])
                (Gaussian probability calculation for each attribute of the data item,
                 using the attribute value, class mean, and class standard deviation)
            end for
            weight <- x * classProbability[j]
            write weight according to data index
        end if
    end for
end for
Sort the total data in ascending order of weight
Delete data according to the input percentage
Create a new ARFF file with the reduced data
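The steps above can be sketched in Python. This is a minimal illustration, not the thesis implementation: the function names (`reduce_dataset`, `gaussian_pdf`), the list-of-tuples data format, and the choice to drop the lowest-weight rows are all assumptions, and ARFF parsing/writing is omitted:

```python
import math
import random
from collections import defaultdict

def gaussian_pdf(x, mean, std):
    """Gaussian probability density at x, given a mean and standard deviation."""
    return math.exp(-((x - mean) ** 2) / (2 * std ** 2)) / (std * math.sqrt(2 * math.pi))

def reduce_dataset(data, delete_pct, seed=0):
    """data: list of (attribute_tuple, class_label) rows.
    Returns the data with delete_pct% of the lowest-weight rows removed."""
    rng = random.Random(seed)
    data = data[:]
    rng.shuffle(data)  # randomize the total data

    # Class probability, and per-class mean/std of each attribute.
    by_class = defaultdict(list)
    for attrs, label in data:
        by_class[label].append(attrs)
    priors = {c: len(rows) / len(data) for c, rows in by_class.items()}
    stats = {}
    for c, rows in by_class.items():
        cols = list(zip(*rows))
        means = [sum(col) / len(col) for col in cols]
        stds = [max(math.sqrt(sum((v - m) ** 2 for v in col) / len(col)), 1e-9)
                for col, m in zip(cols, means)]
        stats[c] = (means, stds)

    # Weight each row: class prior times the Gaussian likelihood of its attributes.
    weighted = []
    for attrs, label in data:
        means, stds = stats[label]
        x = priors[label]
        for k, value in enumerate(attrs):
            x *= gaussian_pdf(value, means[k], stds[k])
        weighted.append((x, attrs, label))

    weighted.sort(key=lambda t: t[0])              # ascending order of weight
    drop = int(len(weighted) * delete_pct / 100)   # delete data per the input %
    return [(attrs, label) for _, attrs, label in weighted[drop:]]
```

For example, `reduce_dataset(rows, 1)` would return a new data set with the 1% lowest-weight rows removed, ready to be written back out for decision-tree training.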

Calculated Weight

Sorted Data According to Weight

Final Dataset (1% Deleted)

WEKA

WEKA is a strong tool for building and analyzing decision trees. There are several decision-tree algorithms available:
- J48
- ID3
- RAT, etc.

WEKA was developed at the University of Waikato, New Zealand. It is Java-based software.

WEKA (Continued)

Work in WEKA

We will consider:
- Data deleted percentage
- Number of instances
- Number of leaves
- Size of tree
- Incorrectly classified percentage

Performance Analysis

Conclusion

# Unimportant data is deleted from the data set.
# Performance is related to the number of leaves of the tree.
# The number of leaves and the tree size will be reduced.
# Performance is better for large data sets compared to smaller data sets.
# Training time will be reduced, as well as prediction time.
# Storage is saved.

Flow Chart (STD)