Sentiment analysis using OpenNLP document categorizer

We return to sentiment analysis, this time solving the problem with a different approach. Instead of naive Bayes, we will use Apache OpenNLP and, more precisely, its Document Categorizer.

About Apache OpenNLP Document Categorizer

The Apache OpenNLP Document Categorizer classifies text into predefined categories. It does so using the maximum entropy algorithm, also called MaxEnt. The algorithm builds its model from the same information as the naive Bayes algorithm, but takes a different approach to building it. While naive Bayes assumes that the features are independent of one another, MaxEnt uses multinomial logistic regression to determine the right category for a given text. To understand how regression works, see the article: Simple linear regression using JFreeChart. For logistic regression, see: Logistic regression using Apache Mahout.
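To make the multinomial logistic regression step more concrete, here is a minimal sketch in plain Java. It is not OpenNLP's implementation: the class name SoftmaxSketch, the two categories and the feature weights are hypothetical values chosen for illustration, and a trained model would learn its weights from data. Each category sums the weights of the tokens it knows about, and a softmax turns those scores into probabilities.

```java
import java.util.Map;

public final class SoftmaxSketch {

    // Sums the weights of the tokens; tokens without a weight contribute 0.
    static double score(Map<String, Double> weights, String[] tokens) {
        double total = 0.0;
        for (String token : tokens) {
            total += weights.getOrDefault(token, 0.0);
        }
        return total;
    }

    // Converts per-category scores into probabilities that sum to 1.
    // Subtracting the maximum score first avoids overflow in Math.exp.
    static double[] softmax(double[] scores) {
        double max = Double.NEGATIVE_INFINITY;
        for (double s : scores) {
            max = Math.max(max, s);
        }
        double[] probabilities = new double[scores.length];
        double sum = 0.0;
        for (int i = 0; i < scores.length; i++) {
            probabilities[i] = Math.exp(scores[i] - max);
            sum += probabilities[i];
        }
        for (int i = 0; i < probabilities.length; i++) {
            probabilities[i] /= sum;
        }
        return probabilities;
    }

    public static void main(String[] args) {
        // Hypothetical weights, invented for this example only.
        Map<String, Double> positive = Map.of("great", 1.2, "boring", -0.8);
        Map<String, Double> negative = Map.of("great", -1.0, "boring", 1.5);

        String[] tokens = "a great movie".split(" ");
        double[] probabilities = softmax(new double[] {
            score(positive, tokens), score(negative, tokens)
        });
        System.out.printf("positive=%.3f negative=%.3f%n",
                probabilities[0], probabilities[1]);
    }
}
```

Unlike naive Bayes, nothing here requires the features to be independent: the weights are fitted jointly, so correlated features can share the credit for a category.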

Entropy is a term from information theory that measures the uncertainty of a piece of information. Let's consider the example of a coin toss (source: Wikipedia). When the coin is fair, that is, when the probability of heads is the same as the probability of tails, the entropy of the coin toss is as high as it can be. This is because there is no way to predict the outcome of the coin toss ahead of time; the best we can do is predict that the coin will come up heads, and our prediction will be correct with probability 1/2. Such a coin toss has one bit of entropy, since there are two possible outcomes that occur with equal probability, and learning the actual outcome conveys one bit of information. In contrast, a toss of a coin that has two heads and no tails has zero entropy, since the coin will always come up heads and the outcome can be predicted perfectly.
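The coin example can be checked with a few lines of Java. This is a small sketch of the Shannon entropy formula, H = -sum(p * log2(p)); the class name CoinEntropy and the biased coin with probabilities 0.9 and 0.1 are assumptions added for illustration.

```java
public final class CoinEntropy {

    // Shannon entropy in bits: H = -sum(p_i * log2(p_i)).
    // Outcomes with probability 0 are skipped, following the convention 0 * log(0) = 0.
    static double entropyBits(double[] probabilities) {
        double entropy = 0.0;
        for (double p : probabilities) {
            if (p > 0.0) {
                entropy -= p * (Math.log(p) / Math.log(2.0));
            }
        }
        return entropy;
    }

    public static void main(String[] args) {
        System.out.println("fair coin:       " + entropyBits(new double[] {0.5, 0.5}));
        System.out.println("two-headed coin: " + entropyBits(new double[] {1.0, 0.0}));
        System.out.println("biased coin:     " + entropyBits(new double[] {0.9, 0.1}));
    }
}
```

The fair coin gives one bit and the two-headed coin gives zero, matching the text; the biased coin falls in between, because its outcome is partly predictable.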

The Maximum Entropy principle can be formulated as follows: given a collection of facts, choose a model that is consistent with all the facts, but otherwise as uniform as possible. The OpenNLP algorithm follows the same principle: from all the models that fit our training data, it selects the one with the largest entropy.
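For readers who prefer a formula, the principle is commonly written as a constrained optimization problem. The notation below is the usual textbook one and is an assumption, not something taken from the OpenNLP documentation: x is a document, c a category, f_j a feature function, \tilde{p} the empirical distribution observed in the training data, and \lambda_j the weight learned for feature j.

```latex
\begin{aligned}
p^{*} &= \arg\max_{p \in \mathcal{P}} H(p),
\qquad
H(p) = -\sum_{x} \tilde{p}(x) \sum_{c} p(c \mid x) \log p(c \mid x) \\
\mathcal{P} &= \Big\{\, p \;:\; \sum_{x, c} \tilde{p}(x)\, p(c \mid x)\, f_j(x, c)
  = \sum_{x, c} \tilde{p}(x, c)\, f_j(x, c) \ \text{ for all } j \,\Big\} \\
p^{*}(c \mid x) &= \frac{\exp\big(\sum_{j} \lambda_j f_j(x, c)\big)}
  {\sum_{c'} \exp\big(\sum_{j} \lambda_j f_j(x, c')\big)}
\end{aligned}
```

The constraints in the second line are the "facts": the model must reproduce the feature averages seen in the training data. The third line is the form the solution takes, which is exactly a multinomial logistic regression model.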

Hi. This has been very useful. But is there any way I could use another classification algorithm, for example Naive Bayes or SVM, together with the OpenNLP API? Does the OpenNLP API provide any classification methods that I could call?