machine learning

Machine Learning is a topic that I’ve been interested in for years, but have
never taken the time to learn. I’ve recently found some articles and projects
that looked interesting, which I link to later. But before I really jump into
any of those I thought I should start at the beginning with an introduction
to machine learning.

machine learning finds the decision surface (aka decision boundary) that separates one class from another, which is then used to generalize to new data points

algorithms are divided into two types: supervised vs unsupervised

Bayes Algorithm

Prior Probability x Test Evidence -> Posterior Probability
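
A small worked example of that rule, with made-up numbers for a hypothetical medical test: multiply each prior by the probability of the test evidence, then normalize to get the posterior.

```python
# Bayes' rule sketch: posterior is proportional to prior x test evidence.
# All numbers here are made up for illustration.

prior_sick = 0.01                 # P(sick)
prior_healthy = 1 - prior_sick    # P(healthy)

p_pos_given_sick = 0.9            # P(positive test | sick)
p_pos_given_healthy = 0.1         # P(positive test | healthy), false positives

# Prior Probability x Test Evidence (unnormalized posteriors)
joint_sick = prior_sick * p_pos_given_sick
joint_healthy = prior_healthy * p_pos_given_healthy

# Normalize so the two posteriors sum to 1
total = joint_sick + joint_healthy
posterior_sick = joint_sick / total
print(round(posterior_sick, 3))  # 0.083
```

Even with a fairly accurate test, the posterior stays small because the prior is small, which is the point of multiplying by it.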

Naive Bayes algorithm

supervised

usually used for text learning

looks at word frequencies, not word order

given the frequency of words for a person, multiply the probabilities for each word in the message, then multiply that product by the person's prior probability. Do that for each person. For example, with two people, perform the multiplication for Person A and for Person B; comparing the two results gives the ratio of how likely the message came from Person A versus Person B.
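
The multiplication above can be sketched in a few lines; the word frequencies and priors here are made up for two hypothetical people.

```python
# Naive Bayes by hand: P(word | person) tables with made-up frequencies.
# Word order is ignored; only how often each person uses each word matters.
word_probs = {
    "A": {"love": 0.1, "deal": 0.8, "life": 0.1},
    "B": {"love": 0.5, "deal": 0.2, "life": 0.3},
}
priors = {"A": 0.5, "B": 0.5}

def score(person, words):
    # prior probability x product of the per-word probabilities
    p = priors[person]
    for w in words:
        p *= word_probs[person][w]
    return p

message = ["life", "deal"]
scores = {person: score(person, message) for person in priors}

# Normalize the two scores into posterior probabilities
total = sum(scores.values())
posteriors = {person: s / total for person, s in scores.items()}
print(posteriors)  # Person A comes out more likely for this message
```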

It is good for classifying text because of its simplicity and its assumption that features are independent.

An example where it fails: “Chicago Bulls”. Since the algorithm ignores word order, it treats the phrase as the separate words “Chicago” and “Bulls”, losing the meaning of the pair.


Support Vector Machine (SVM)

finds the separating line between data of two different classes, called a hyperplane

the “best” line maximizes the distance to the nearest points of either class; this distance is called the margin

it should “do the best it can” when a clean separating line can’t be drawn (e.g., because of outliers)

sometimes you have to add a new feature (such as x^2+y^2, or the absolute value of x, |x|) so the SVM can linearly separate the two classes of data
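
A tiny sketch of that idea, with made-up points: a group inside a circle and a group outside it can’t be split by a straight line in (x, y), but the added feature x^2+y^2 separates them with a simple threshold, which is exactly the kind of linear boundary an SVM could then find.

```python
# Feature-mapping sketch with made-up data: "inner" points sit inside a
# circle around the origin, "outer" points sit outside it. No straight
# line in (x, y) separates them, but the new feature r2 = x^2 + y^2 does.
points = [
    ((0.1, 0.2), "inner"), ((-0.3, 0.1), "inner"), ((0.2, -0.2), "inner"),
    ((2.0, 1.5), "outer"), ((-1.8, 2.1), "outer"), ((1.6, -1.9), "outer"),
]

def r2(point):
    x, y = point
    return x * x + y * y   # the added feature

# In the new one-dimensional feature space, a linear boundary is just
# a threshold on r2.
threshold = 1.0
for point, label in points:
    predicted = "inner" if r2(point) < threshold else "outer"
    assert predicted == label

print("all points separated by the added feature")
```

This is the same trick kernel SVMs perform implicitly: map the data into a space where a linear separator exists.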