
Tuesday, November 30, 2010

"Increasingly, algorithms are used to determine whether we can get access to credit, insurance and government services. They are posing a challenge to human decision-making in the arts. They are being used by prospective employers to decide if we should be hired. They can determine whether your online business will succeed or fail, and they have revolutionized the world of high finance."

Friday, November 12, 2010

This is the second-to-last installment of a six-part series on text mining in RapidMiner. This video describes how to automatically categorize documents, which could be useful for a research project or, say, in finance.

You could use it to classify documents as "positive" or "negative", thus doing sentiment analysis. You could also apply it to financial news text, labeling each document as "stock went up" or "stock went down" after its release, and then make (short-term) predictions of future stock movements. You can also see which words are important discriminants. Once you've trained a learning algorithm, you can use it on unseen data.
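To make the idea concrete outside of RapidMiner, here is a minimal sketch of a document classifier in plain Python. It uses multinomial naive Bayes with add-one smoothing, which is my choice for illustration and not necessarily the learner used in the video. The headlines, the "up"/"down" labels, and the function names `tokenize`, `train`, and `classify` are all made up for this example.

```python
import math
from collections import Counter, defaultdict

def tokenize(text):
    """Lowercase and split on whitespace (a real pipeline would do more cleaning)."""
    return text.lower().split()

def train(labeled_docs):
    """Count words per label for a multinomial naive Bayes model."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in labeled_docs:
        label_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = {word for counts in word_counts.values() for word in counts}
    return word_counts, label_counts, vocab

def classify(model, text):
    """Return the label with the highest log-probability for the text."""
    word_counts, label_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, float("-inf")
    for label in label_counts:
        score = math.log(label_counts[label] / total_docs)  # class prior
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            if word not in vocab:
                continue  # ignore words never seen in training
            # add-one (Laplace) smoothing avoids log(0) for unseen pairs
            score += math.log(
                (word_counts[label][word] + 1) / (total_words + len(vocab))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical training data: headlines labeled by the later stock move
training = [
    ("profits beat expectations shares rally", "up"),
    ("record revenue strong growth", "up"),
    ("profit warning shares slump", "down"),
    ("weak sales losses widen", "down"),
]
model = train(training)
prediction = classify(model, "strong profits and growth")  # an unseen headline
print(prediction)
```

Four headlines are obviously far too few for real predictions. The point is only the shape of the workflow: train on labeled documents, then apply the model to unseen text.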

This is part four of a six-part series on text mining in RapidMiner. This video describes how to calculate the TF-IDF score for terms, calculate the similarity between documents, and cluster documents together. This can be useful for finding duplicate documents or database entries, and for showing similar documents on a web page.

In the context of a job board, you could use it to find an interesting job, and then to find related ones as well.
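As a rough sketch of how "find related jobs" could work, the snippet below computes TF-IDF weights and cosine similarity in plain Python. It assumes one common TF-IDF variant (term frequency times log(N / document frequency)); RapidMiner's exact weighting and normalization may differ. The job titles and the function names `tfidf_vectors` and `cosine_similarity` are invented for the example.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return one {term: weight} dict per document, using tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # document frequency: in how many documents does each term occur?
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors

def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical job titles standing in for full postings
jobs = [
    "data mining analyst",
    "senior data mining engineer",
    "pastry chef",
]
vectors = tfidf_vectors(jobs)
sim_related = cosine_similarity(vectors[0], vectors[1])
sim_unrelated = cosine_similarity(vectors[0], vectors[2])
print(sim_related, sim_unrelated)
```

The first two titles share "data" and "mining", so their similarity is above zero, while the third shares no terms with the first and scores exactly zero.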

Topics covered:

creating a word vector and calculating the terms' TF-IDF scores

calculating the similarity between documents using their cosine similarity

clustering documents using the K-Means algorithm
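For the clustering step, here is a small sketch of K-Means (Lloyd's algorithm) in plain Python, assuming Euclidean distance and Python 3.8+ for `math.dist`. The term counts and the vocabulary are made up, and I pass the starting centroids explicitly so the result is reproducible; real implementations usually pick them at random.

```python
import math

def kmeans(points, initial_centroids, max_iters=100):
    """Assign each point to its nearest centroid, then recompute centroids."""
    centroids = [list(c) for c in initial_centroids]
    assignments = None
    for _ in range(max_iters):
        new_assignments = [
            min(range(len(centroids)),
                key=lambda k: math.dist(point, centroids[k]))
            for point in points
        ]
        if new_assignments == assignments:
            break  # assignments stopped changing, so we have converged
        assignments = new_assignments
        for k in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == k]
            if members:  # keep the old centroid if a cluster is empty
                centroids[k] = [sum(dim) / len(members) for dim in zip(*members)]
    return assignments, centroids

# Hypothetical term counts over the vocabulary ("data", "mining", "chef", "pastry")
docs = [
    (2, 1, 0, 0),
    (1, 2, 0, 0),
    (0, 0, 2, 1),
    (0, 0, 1, 2),
]
labels, centroids = kmeans(docs, initial_centroids=[docs[0], docs[2]])
print(labels)
```

In practice you would cluster the TF-IDF vectors rather than raw counts, and try several values of K.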

If you're not familiar with the free and open-source RapidMiner, see my other videos on my YouTube channel.

This is part three of a six-part series on text mining in RapidMiner. This video describes how to find association rules in a collection of documents. For example, if a job posting includes "data" and "mining", then it is also likely to include "RapidMiner". This is known as market basket analysis when applied to grocery stores :)

In this example, it can be useful for finding phrases and concepts that are important to job recruiters. You can use these phrases and concepts in your cover letter and resume, and increase your chances of getting them read.

Topics covered:

reading documents from a database

processing the text

creating a word vector

finding frequent itemsets using the FP-Growth algorithm

finding association rules

visualizing association rules
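The itemset and rule steps above can be sketched in plain Python. Note that this is not FP-Growth: it enumerates candidate itemsets by brute force, which gives the same frequent itemsets on a tiny example but would be far too slow on a real collection. The postings, thresholds, and function names are all hypothetical.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Brute-force search for itemsets whose support is at least min_support."""
    n = len(transactions)
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, len(items) + 1):
        found = False
        for candidate in combinations(items, size):
            support = sum(1 for t in transactions if set(candidate) <= t) / n
            if support >= min_support:
                result[frozenset(candidate)] = support
                found = True
        if not found:
            break  # no frequent itemset of this size, so none larger either
    return result

def association_rules(itemsets, min_confidence):
    """Build rules premise -> conclusion with confidence = support(all) / support(premise)."""
    rules = []
    for itemset, support in itemsets.items():
        for size in range(1, len(itemset)):
            for premise in combinations(sorted(itemset), size):
                premise = frozenset(premise)
                # every subset of a frequent itemset is frequent, so this lookup is safe
                confidence = support / itemsets[premise]
                if confidence >= min_confidence:
                    rules.append((premise, itemset - premise, confidence))
    return rules

# Hypothetical job postings, each reduced to a set of words
postings = [
    {"data", "mining", "rapidminer"},
    {"data", "mining", "rapidminer", "sql"},
    {"data", "mining", "python"},
    {"data", "sql"},
]
itemsets = frequent_itemsets(postings, min_support=0.5)
rules = association_rules(itemsets, min_confidence=0.6)
for premise, conclusion, confidence in rules:
    print(sorted(premise), "->", sorted(conclusion), round(confidence, 2))
```

With this toy data, three of the four postings contain both "data" and "mining", and two of those also contain "rapidminer", so the rule from the example earlier in this post comes out with a confidence of two thirds.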

If you're not familiar with RapidMiner, see my other videos on my YouTube channel.