Data Scientist in Van Wyck, SC | Taking it one day at a time

Sentiment Analysis of Frasier

If you’ve been following along with my previous posts, you know that I’ve been
working on prepping data from the television show Frasier. That involved a few
different steps, including cleaning the data in Bash and
scraping and augmenting it in R.

Once I had the data ready, I was able to dive in. In the first part of my
analysis, I used the subtools package in R along with data from
IMDB.com to create a time-series sentiment analysis. Originally I thought I would
be able to do this using tidytext, but I found that using only unigrams with
that package didn’t really capture the full picture of each sentence.
The sentimentr package proved to be useful for what I was going for.
You can check it out here.
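
To show why unigrams alone can miss the picture, here is a toy sketch in Python (not the R code from the analysis). The lexicon, the negator list, and both function names are made up for illustration, and the negation rule is far simpler than what sentimentr actually does with valence shifters.

```python
# Toy lexicon and negator list; both are assumptions made up for this example.
LEXICON = {"good": 1, "happy": 1, "bad": -1, "terrible": -1}
NEGATORS = {"not", "never", "no"}


def tokenize(sentence):
    """Lowercase the sentence and strip basic punctuation from each word."""
    return [w.strip(".,!?").lower() for w in sentence.split()]


def unigram_score(sentence):
    """Score each word on its own, ignoring the words around it."""
    return sum(LEXICON.get(w, 0) for w in tokenize(sentence))


def negation_aware_score(sentence):
    """Flip a word's score when the word right before it is a negator."""
    words = tokenize(sentence)
    total = 0
    for i, w in enumerate(words):
        score = LEXICON.get(w, 0)
        if i > 0 and words[i - 1] in NEGATORS:
            score = -score
        total += score
    return total


line = "I am not happy about this."
print(unigram_score(line), negation_aware_score(line))
```

The unigram score treats “happy” as positive no matter what, while the second function notices the “not” in front of it and flips the sign.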

In the second analysis, I tackled more detailed information from the transcripts at
kacl780.net. I augmented information from those
transcripts with data from IMDB and the gender package in R. There’s more
information available than I actually have time for at this point, so I’m
hoping that I can revisit it at some point in the future.

I’m also hoping I can combine some of the info that I didn’t use into a Shiny
app that will present it in a nice tidy format.

Now this is pretty close to the original point of the Reddit analysis, but I
found it a bit of a challenge since some characters share the same words.
For example, Frasier and Niles both say “dad” a lot, but Frasier says it about
twice as much as Niles. To overcome that, I just took the top character for each
word and gave credit to them. Daphne also says “Dr. Crane” a lot, referring
to both brothers, but it’s tough to see that since we’re using unigrams instead
of bigrams. You might consider removing one reference or the other.
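
The “top character by word” idea can be sketched roughly like this. It is a Python illustration rather than the R code from the analysis, and the sample pairs and the function name `top_character_by_word` are hypothetical. Real input would be one (character, word) pair per token in the transcripts.

```python
from collections import Counter, defaultdict

# Hypothetical (character, word) pairs, made up for this example.
tokens = [
    ("Frasier", "dad"), ("Frasier", "dad"), ("Niles", "dad"),
    ("Niles", "maris"), ("Daphne", "crane"), ("Frasier", "sherry"),
]


def top_character_by_word(pairs):
    """Map each word to the character who says it most often."""
    counts = defaultdict(Counter)
    for character, word in pairs:
        counts[word][character] += 1
    # most_common(1) returns a one-item list holding the highest-count
    # (character, n) pair; ties go to whoever was counted first.
    return {word: c.most_common(1)[0][0] for word, c in counts.items()}


owners = top_character_by_word(tokens)
print(owners["dad"])
```

In this sample Frasier says “dad” twice to Niles’s once, so Frasier gets the credit for the word. Niles’s uses are dropped entirely, which is the trade-off of this approach.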

There’s still so much more that can be done with the data, so I am just going to
work on it in small chunks as I go.