A while back I encountered an interesting graphic showing where letters are located in English words (http://www.prooffreader.com/2014/05/graphing-distribution-of-english.html). The other day I decided to make a similar one for letters in Danish words, and I used R to do it.

I downloaded all abstracts from the Danish Wikipedia and made my own version, as you can see here:

Here is how you can do it:

# First you need to load in some text

library(rvest)

# I'll grab an article from FiveThirtyEight.com as a showcase.
# I did my analysis on all the Danish abstracts from Wikipedia (took a while!)
# When you do your final analysis you'll want as much text as possible too.
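
Whatever the source of the text, it has to end up as a vector of individual words before letter positions can be measured. A minimal sketch of that step in base R could look like the block below. The sample sentence and the names `article_text` and `words` are assumptions made for illustration, not the code used for the Danish analysis.

```r
# Hypothetical sample text; in practice this would be the scraped article text.
article_text <- "The quick brown fox jumps over the lazy dog. Twice!"

# Lower-case everything so that "T" and "t" count as the same letter.
clean_text <- tolower(article_text)

# Split on every run of non-letter characters (spaces, punctuation, digits).
# Note: for letters such as æ, ø and å, [[:alpha:]] relies on a UTF-8 locale.
words <- unlist(strsplit(clean_text, "[^[:alpha:]]+"))

# Drop any empty strings left over from leading separators.
words <- words[nchar(words) > 0]

print(words)
```

Lower-casing first keeps the later per-letter loop simple, since each letter only has to be looked up in one form.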

# We add that position to a list of positions
letter_place.list <- c(letter_place.list, letter_place)
}

# We create a new list to hold all the data and we then add the results from the loop
if (!exists("letter_place.data")) {
  letter_place.data <- list(letter_place.list)
} else {
  letter_place.data <- append(letter_place.data, list(letter_place.list))
}

# We make sure to name each list properly
names(letter_place.data)[length(letter_place.data)] <- i

}
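
Only the tail of the loop is shown above, so here is a self-contained sketch of the same idea: for every letter, collect where it occurs in every word and store the result in a named nested list. How a position is measured is an assumption here (0 for the first character of a word, 1 for the last), and the helper `letter_positions()` and the three sample words are hypothetical.

```r
# Hypothetical word list; in practice this comes from the text loaded earlier.
words <- c("banana", "cab", "abc")

# Relative positions of `letter` within one word:
# 0 means the first character, 1 means the last character (assumed scale).
letter_positions <- function(word, letter) {
  chars <- strsplit(word, "")[[1]]
  # Single-character words have no meaningful relative position, so skip them.
  if (length(chars) < 2) return(numeric(0))
  idx <- which(chars == letter)
  (idx - 1) / (length(chars) - 1)
}

# One numeric vector of positions per letter, gathered across all words.
letter_place.data <- lapply(letters, function(l) {
  unlist(lapply(words, letter_positions, letter = l))
})

# Name each element after the letter it belongs to.
names(letter_place.data) <- letters

print(letter_place.data[["a"]])
```

Scaling positions to the 0 to 1 range makes words of different lengths comparable, which is what a per-letter density plot needs.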

# We now have a nested list with the data we need; before plotting we convert it to a long-form data frame

# We then bind all the data frames together
letter_place.data.df <- rbind(letter_place.data.df, loop_data)

}, error=function(e){}) # Ends the tryCatch
}
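
The conversion from a nested list to a long-form data frame can also be sketched without growing a data frame inside a loop. The block below is one possible base R version, not necessarily what the original loop did. The list `nested` and the name `example_df` are hypothetical, while the column names `letter` and `value` match the ones used in the rest of the code.

```r
# Hypothetical nested list: one vector of positions per letter.
nested <- list(a = c(0.2, 0.6, 1), b = c(0, 0.5), z = numeric(0))

# Build one small data frame per letter; letters without any
# positions simply produce a data frame with zero rows.
long_parts <- lapply(names(nested), function(l) {
  data.frame(letter = rep(l, length(nested[[l]])),
             value = nested[[l]],
             stringsAsFactors = FALSE)
})

# Bind all the pieces together in a single call.
example_df <- do.call(rbind, long_parts)

print(example_df)
```

Because letters with no data yield zero-row data frames instead of errors, this variant does not need a `tryCatch()`, and binding once at the end avoids repeatedly copying the growing data frame.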

# We check to see if we have all the letters
unique(letter_place.data.df$letter)

# We change the letters back to upper case for aesthetics in the graphic
# (str_to_upper() comes from the stringr package, which must be loaded)
letter_place.data.df$letter <- str_to_upper(letter_place.data.df$letter)

library(ggplot2)

# We create a density plot with free y scales to show the distribution.
# The red fill is set as a constant outside aes() so it is not treated as a
# mapped variable, and facet_wrap() gives each letter its own panel.
p <- ggplot(letter_place.data.df, aes(x = value)) +
  geom_density(fill = "red") +
  facet_wrap(~ letter, scales = "free_y")