Sunday, May 26, 2013

I used the Read Database operator to get the documents. They were random Latin words, of 20 to 500 words in length.

The text processing was purposefully simple: tokenize the document and get the binary word vector.

I then stored the results in the RapidMiner repository, which creates a binary file.

In a different process, I then read the stored results and applied a Naive Bayes model to them. I didn't do all of them, but there wasn't much difference. As you can see, the model application is quite fast.