So, Arnab Goswami’s interview of Rahul Gandhi concluded a while ago, and now that the transcript is online, it’s time to do some text analysis (I will leave the meta-analysis to political commentators and analysts):

The most frequently used words by Rahul (after filtering out some commonly used words):
system (70)
people (66)
going (52)
party (50)
country (45)
want/wants/wanted (40)
thing/things (37)
congress (34)
power (32)
rti (32)
political (31)
think/thinks/thinking (29)
one (28)
issue (26)
riots (25)

Two-word phrase frequency:
i am or i’m (70)
in the (57)
going to (44)
the system (43)
this country (39)
we have (38)
i have (33)
of the (32)
to do (29)

Three-word phrase frequency:
the congress party (23)
in this country (22)
i want to (18)
we have to (13)

Four-word phrase frequency:
we are going to (9)
are we going to (8)
in the congress party (8)
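
For readers who would rather script this than use Textalyser or ATLAS.ti, here is a minimal sketch of how word and phrase counts like the ones above could be produced. It is not the method used for this post: the tiny stopword set, the sample sentence and the function names are all made up for illustration, and a real run would load the full transcript and the Ranks.nl stopword list instead.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword set for illustration only;
# the post itself used the English stopword list from Ranks.nl.
STOPWORDS = {"the", "a", "an", "to", "of", "in", "is", "we", "i", "and", "this", "are"}


def tokenize(text):
    """Lowercase the text and split it into words, keeping apostrophes."""
    text = text.lower().replace("’", "'")
    return re.findall(r"[a-z']+", text)


def word_counts(tokens, stopwords=STOPWORDS):
    """Count single words, skipping anything in the stopword set."""
    return Counter(t for t in tokens if t not in stopwords)


def ngram_counts(tokens, n):
    """Count every run of n consecutive words (stopwords are kept here)."""
    return Counter(" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


# Made-up sample text, not a quote from the transcript.
sample = "We are going to change the system. We are going to empower people in this country."
tokens = tokenize(sample)

print(word_counts(tokens).most_common(5))
print(ngram_counts(tokens, 2).most_common(5))
print(ngram_counts(tokens, 4).most_common(5))
```

Stopwords are filtered only for the single-word counts; the phrase counts keep them, which is why phrases such as "in the" and "of the" show up in the lists above.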

Rahul’s word cloud

Arnab’s word cloud

Note: Word clouds were created using Wordle, and the text analysis was conducted using Textalyser and ATLAS.ti. The list of English stopwords was taken from Ranks.nl. To download the data in the spreadsheet, click here. (Please click on ‘file’ > ‘download as’ to save a copy of the file on your computer)

PS: In case you are wondering, Rahul Gandhi referred to himself in the third person 7 times; he mostly avoided referring to his opponents by name (Akhilesh and Arvind Kejriwal had 0 references each, while Modi had 3); and oh, the word empower or a version of it, like empowering/empowered/empowerment, had 23 occurrences.
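
Counts like these, where several forms of a word are lumped together, can be sketched with a simple prefix match. This is only an illustration under my own assumptions: the function name `count_family` and the sample sentence are made up, and it is not how the numbers above were produced.

```python
import re


def count_family(text, stem):
    """Count whole words that start with `stem`, ignoring case."""
    pattern = re.compile(r"\b" + re.escape(stem) + r"\w*", re.IGNORECASE)
    return len(pattern.findall(text))


# Made-up sample text, not a quote from the transcript.
sample = "Empower women. Empowering the youth is key; empowerment matters."
print(count_family(sample, "empower"))
```

A prefix match is crude: a short stem will also pick up unrelated words that happen to start the same way, so it is worth eyeballing the matches before trusting the total.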

Did you say you are interested in some data wrangling? Or perhaps some data scraping? Wait, you say you just want to learn how to clean data and maybe geocode it? For all this and much more, log on to School of Data now! You can even take a course online. The following are some of the recommended tools:

“In a study of more than 500 stories published in four newspapers in the year 2011, I found nearly half were simply accounts of violent events. An analysis of sources showed that 62 percent of the stories were based on information supplied by security personnel and government spokespersons. Only 5 percent of the stories quoted the Maoists. And just 5 percent gave voice to the villagers.” Source: Caravan

The study she refers to is called “Guns and Protests”, which she undertook at the Reuters Institute. The main finding that jumps out is that even in a left-leaning newspaper like The Hindu, little space is given to the voices of villagers who are in the midst of this conflict. The key table is:

(The percentage figures are slightly different because I think there might be some typos in her table; I recalculated the percentages based on her raw numbers.)
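
The recalculation is just each raw count divided by the total. A minimal sketch is below; the category names and counts are placeholders I made up for illustration, not figures from the study, and `percentages` is a hypothetical helper name.

```python
def percentages(counts, ndigits=1):
    """Convert raw counts into percentages of the total, rounded to ndigits."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("cannot compute percentages of an empty total")
    return {name: round(100 * value / total, ndigits) for name, value in counts.items()}


# Placeholder counts for illustration only; NOT the study's numbers.
raw = {"source_a": 30, "source_b": 15, "source_c": 5}
print(percentages(raw))
```

Because each share is rounded separately, the rounded percentages will not always add up to exactly 100.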

To read more about the study click here.
To download the data in the spreadsheet, click here.
(Please click on ‘file’ > ‘download’ to save a copy of the file on your computer)