Over the last few months, I created a complex RapidMiner process, MythMiner, that suggests TV programs based on the programs the user has recorded. It is a bit like a TiVo, but for the open source personal video recorder software MythTV (http://www.mythtv.org/). MythMiner also works in RapidAnalytics.

MythTV users can now receive daily suggestions for new TV programs they might be interested in. MythMiner can even automatically schedule recordings of interesting programs if the user wishes.

RapidMiner users, please note that MythTV is a complex system and is currently only available for Unix/Linux. It can take weeks to install, and you probably need a dedicated computer for it. So you may want to take a look at MythMiner to learn how I solved this problem, but it can only be used with a full MythTV installation.

Thanks to the RapidMiner team for this amazing software and especially to Mr. Ralf Klinkenberg at Rapid-I who taught me everything about data and text mining in the training courses.

Thanks a lot for your kind words. I have to return your compliments: you are one of the most experienced and knowledgeable data mining experts I have met in our training courses so far. And I have given a lot of courses during the last five years or so, so that means something. It is a pleasure to work with you, and I hope we have a chance to meet again this year.

Basically, MythMiner creates two corpora from the MySQL database that is used by MythTV. One corpus consists of the categories, titles and descriptions of programs that were recorded by the user in the past. The second corpus consists of the same data for programs that *weren't* recorded. (The assumption is that the user recorded everything he/she is interested in.) The second corpus is sampled, of course, because it can be a huge list.
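To illustrate the two-corpus idea, here is a minimal Python sketch. It is not the actual RapidMiner process: the dictionary keys (`category`, `title`, `description`), the function name `build_corpora`, and matching recordings by title are all assumptions made for the example, and the rows are taken as already fetched from the database.

```python
import random


def build_corpora(programs, recorded_titles, sample_size, seed=0):
    """Split program rows into a 'recorded' corpus and a sampled 'not recorded' corpus.

    `programs` is a list of dicts with hypothetical keys 'category',
    'title' and 'description'; `recorded_titles` is a set of titles the
    user recorded in the past (an assumed way of identifying recordings).
    """
    def to_text(program):
        # One text document per program: category, title and description.
        return " ".join(
            [program["category"], program["title"], program["description"]]
        )

    recorded = [to_text(p) for p in programs if p["title"] in recorded_titles]
    others = [to_text(p) for p in programs if p["title"] not in recorded_titles]

    # The 'not recorded' list can be huge, so only a random sample is kept.
    rng = random.Random(seed)
    sampled = rng.sample(others, min(sample_size, len(others)))
    return recorded, sampled
```

Fixing the seed keeps the sample reproducible between runs, which helps when comparing models trained on the same data.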

The rest is normal text mining: building a model from the recorded and not recorded samples and applying this model and wordlist to tomorrow's program data.
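As a rough illustration of that step, the following Python sketch trains a tiny multinomial Naive Bayes classifier on the two corpora and scores a new program text. It stands in for the RapidMiner text processing and learner operators and is not what the process actually contains; the function names and the simple tokenizer are assumptions.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Very simple tokenizer: lowercase alphanumeric runs.
    return re.findall(r"[a-z0-9]+", text.lower())


def train(recorded_texts, other_texts):
    """Build word counts per class plus the shared word list.

    Assumes both corpora are non-empty.
    """
    counts = {"recorded": Counter(), "other": Counter()}
    for text in recorded_texts:
        counts["recorded"].update(tokenize(text))
    for text in other_texts:
        counts["other"].update(tokenize(text))
    wordlist = set(counts["recorded"]) | set(counts["other"])
    n_docs = {"recorded": len(recorded_texts), "other": len(other_texts)}
    return {"counts": counts, "wordlist": wordlist, "n_docs": n_docs}


def confidence_recorded(model, text):
    """Return the estimated probability that `text` belongs to 'recorded'."""
    total_docs = sum(model["n_docs"].values())
    vocab_size = len(model["wordlist"])
    log_scores = {}
    for label, counter in model["counts"].items():
        score = math.log(model["n_docs"][label] / total_docs)  # class prior
        total_words = sum(counter.values())
        for word in tokenize(text):
            if word in model["wordlist"]:  # words outside the word list are ignored
                # Laplace smoothing avoids zero probabilities.
                score += math.log((counter[word] + 1) / (total_words + vocab_size))
        log_scores[label] = score
    # Normalize in a numerically stable way.
    top = max(log_scores.values())
    weights = {label: math.exp(s - top) for label, s in log_scores.items()}
    return weights["recorded"] / sum(weights.values())
```

The model and the word list are kept together because tomorrow's program data has to be scored against the same vocabulary that was used for training.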

I did a lot of experimenting with Bayes and SVM operators and got slightly better results from SVM. My machines also worked day and night on parameter optimization (grid and evolutionary). I had to try all the things I learned in the courses in Dortmund.

The result is saved in an HTML page using the reporting plugin and optionally (when the confidence exceeds a configurable threshold) inserted into the scheduled recordings table. (MythTV supports recording priorities, and MythMiner uses the lowest priority. So the user's manual recordings won't be interfered with.)
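The threshold logic can be sketched in Python as follows. Everything here is illustrative: `Suggestion`, `plan_recordings` and the value of `LOWEST_PRIORITY` are assumptions, and the sketch only builds the rows that would be inserted rather than touching the MythTV database.

```python
from dataclasses import dataclass

# Assumption: placeholder for whatever MythTV treats as its lowest
# recording priority; check your installation for the real value.
LOWEST_PRIORITY = -99


@dataclass
class Suggestion:
    title: str
    confidence: float


def plan_recordings(suggestions, threshold, auto_schedule=True):
    """Return (report, to_schedule).

    `report` lists all suggestions, best first. `to_schedule` holds only
    those whose confidence exceeds `threshold`, each at the lowest
    priority so that manual recordings always win.
    """
    report = sorted(suggestions, key=lambda s: s.confidence, reverse=True)
    to_schedule = []
    if auto_schedule:
        to_schedule = [
            {"title": s.title, "priority": LOWEST_PRIORITY}
            for s in report
            if s.confidence > threshold
        ]
    return report, to_schedule
```

Separating the report from the scheduling list reflects that scheduling is optional: with `auto_schedule=False` the user still gets the full list of suggestions.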

The process also works in RapidAnalytics; that's how I use it currently. But it works equally well from the RapidMiner command line.

The resulting HTML can be viewed as is, but my cron script applies a few transformations to it (e.g. it makes URLs clickable) and mails it to me each morning.
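For readers who want to do something similar, here is a small Python sketch of one such transformation, turning plain URLs into links. It is not my actual cron script; the function name and the regular expression are illustrative, and it assumes the report does not already contain `<a>` elements whose link text is itself a URL (those would be wrapped a second time).

```python
import re

# Matches http/https URLs that are not inside an attribute value
# (i.e. not directly preceded by a quote or an equals sign).
URL_RE = re.compile(r"""(?<!["'=])\bhttps?://[^\s<>"']+""")


def make_urls_clickable(html):
    """Wrap bare URLs in <a> tags, keeping trailing punctuation outside the link."""
    def wrap(match):
        url = match.group(0)
        trailing = ""
        # Sentence punctuation right after a URL is usually not part of it.
        while url and url[-1] in ".,;:!?)":
            trailing = url[-1] + trailing
            url = url[:-1]
        return f'<a href="{url}">{url}</a>{trailing}'

    return URL_RE.sub(wrap, html)
```

URLs that appear inside quoted attributes such as `src="..."` are left alone, so images in the report keep working.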

By the way, I'm a bit unhappy with the reporting plugin: I fully understand that you include a text like "Created with RapidMiner" with a link to Rapid-I in the default template. But if the user creates a template of his own, RapidMiner shouldn't modify it. This is shareware mentality, not real Open Source mentality. (Of course I could change the source code, but then I would have to do that each time the plugin is updated.)

You might consider removing this "feature" and just leaving the message in the default template.