The columns of interest in this study are “Speaker” (the speaker’s name), “Name” (the title of the talk), and “Duration” (the length of the talk). Later titles appear to begin with the speaker’s first and last names. Going through that column, I found that this practice begins with the 424th entry. To be safe, let’s remove the first two words from every title from that point on.

Some Cleaning

In this section, we remove the first two words from each title, starting with the 424th entry. We then clean up the text by removing punctuation, extra spaces, and any URLs that may be present.
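The cleaning steps described above could be sketched roughly as follows. This is a minimal illustration, not the exact code used for the post: the toy data frame `titles_demo`, the helper `clean_titles`, and the regular expressions are all assumptions, and only the column name `Name` comes from the data. In the real data the cutoff would be 424 rather than 2.

```r
# Hypothetical stand-in for the real data; only the column name "Name" is from the post.
titles_demo <- data.frame(
  Name = c("Do schools kill creativity?",
           "Jane Doe: Why   we sleep, explained http://example.com/talk"),
  stringsAsFactors = FALSE
)
first_prefixed <- 2  # would be 424 for the real data set

# Hypothetical helper: strips the speaker-name prefix, URLs, punctuation, extra spaces.
clean_titles <- function(titles, start) {
  idx <- seq_along(titles) >= start
  # Drop the first two whitespace-separated words from entries at or after `start`
  titles[idx] <- sub("^\\s*\\S+\\s+\\S+\\s*", "", titles[idx], perl = TRUE)
  # Remove URLs before punctuation, so the URLs are still recognizable
  titles <- gsub("(https?://|www\\.)\\S+", "", titles, perl = TRUE)
  # Replace punctuation with a space, then collapse runs of whitespace
  titles <- gsub("[[:punct:]]+", " ", titles, perl = TRUE)
  titles <- gsub("\\s+", " ", titles, perl = TRUE)
  trimws(titles)
}

titles_demo$Name <- clean_titles(titles_demo$Name, first_prefixed)
print(titles_demo$Name)
```

Note that blindly dropping two words assumes every speaker name from that entry on is exactly two words long; names with middle names or initials would leave a stray word behind.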

Speakers with more than 2 appearances and mean duration of their talks

numtalks <- data.frame(table(ted$Speaker))
table(numtalks$Freq)

##
##    1    2    3    4    5    6    9
## 1301  130   40   11    3    1    1

There were 1487 different speakers: 1301 of them gave one talk and 130 gave two. Below, I focus only on the 56 people who have given more than 2 talks.
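As a quick check, the two counts quoted here can be recovered from the frequency table itself. The vector `freq` below is simply the printed table typed in by hand, so this sketch runs without the data; the commented line shows how the same subset might be taken from `numtalks`, assuming it has the default `Var1` and `Freq` columns that `data.frame(table(...))` produces.

```r
# Frequency table as printed above: number of talks -> number of speakers
freq <- c(`1` = 1301, `2` = 130, `3` = 40, `4` = 11, `5` = 3, `6` = 1, `9` = 1)

total_speakers <- sum(freq)                                  # all distinct speakers
frequent_speakers <- sum(freq[as.integer(names(freq)) > 2])  # more than 2 talks

print(c(total = total_speakers, frequent = frequent_speakers))

# With the real data, the same group could be selected with something like:
# frequent <- subset(numtalks, Freq > 2)
```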

Let’s first deal with the talk duration variable. Here, we compute the mean duration of talks for each speaker in this group of 56.
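One way to compute a per-speaker mean is sketched below on made-up data. The sketch assumes `Duration` is already numeric and in minutes, which may not hold for the raw column (it might need converting first); `talks_demo`, its speakers, and its values are invented for illustration.

```r
# Hypothetical toy data; Duration is assumed to be numeric minutes.
talks_demo <- data.frame(
  Speaker  = c("A", "A", "A", "B", "B", "C", "C", "C"),
  Duration = c(15, 17, 19, 12, 14, 16, 17, 21),
  stringsAsFactors = FALSE
)

# Count talks per speaker, as in the post, and keep those with more than 2
numtalks_demo <- data.frame(table(talks_demo$Speaker))
frequent <- as.character(numtalks_demo$Var1[numtalks_demo$Freq > 2])

# Mean duration per frequent speaker, longest first
mean_dur <- aggregate(Duration ~ Speaker,
                      data = talks_demo[talks_demo$Speaker %in% frequent, ],
                      FUN = mean)
mean_dur <- mean_dur[order(-mean_dur$Duration), ]
print(mean_dur)
```

Speaker "B" drops out because they have only two talks; the remaining rows are sorted so the longest average comes first.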

Hans Rosling has 9 appearances, the highest in this list, followed by Marco Tempest, who has 6 appearances.

Assuming that many talks were scheduled for 15 mins, it is notable that none of the 56 speakers had a mean talk time of around 14 mins. They all stretched their talks to either fill the whole 15 mins or, perhaps, run a bit over. Interesting stuff.