Month: March 2017

In my last article I presented an analysis of 617 movie scripts, identifying the most frequently spoken words in those movies as well as trends in positive and negative words. That analysis combined several different data sets, so I had to do some data cleaning and blending first. Today I'll show you exactly how I cleaned and prepared the final data set using Pentaho Data Integration, a.k.a. Kettle.