Structural topic modeling

The topic modeling part

Find topics in your data!

install.packages(c("stm", "SnowballC")) # probably new
install.packages(c("dplyr", "tidyr")) # if you don't have them already

Getting my data ready. Note that your data should be a data frame with one document per row. You should have a column called “documents” that holds the full text of each document. Any other variables can be added as additional columns.
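Here is a rough sketch of that step. I'm assuming the `gadarian` survey data that ships with stm as a stand-in for your own data, and `my_data` is just a name I made up. `textProcessor()` also needs the tm package installed, which is not in the install lines above.

```r
library(stm)

# Assumption: gadarian (bundled with stm) stands in for your own data.
data(gadarian)

# One row per document, text in a column called "documents",
# everything else in additional columns.
my_data <- data.frame(
  documents = as.character(gadarian$open.ended.response),
  treatment = gadarian$treatment,
  pid_rep = gadarian$pid_rep,
  stringsAsFactors = FALSE
)

# Lowercase, strip punctuation and stop words, and stem the text.
# The result is a list with $documents, $vocab, and $meta.
processed <- textProcessor(
  documents = my_data$documents,
  metadata = my_data,
  verbose = FALSE
)
```

The `processed` object is what gets handed to `prepDocuments()` next.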

# `processed` is assumed to be the output of stm's textProcessor().
# prepDocuments() removes infrequent terms based on the user-set parameter lower.thresh:
# a word has to appear in more than lower.thresh documents to be kept in the vocabulary (default is 1).
out <- prepDocuments(processed$documents, processed$vocab, processed$meta)

Note: the stm vignette says, “The default is init.type = "LDA" but in practice researchers on personal computers with vocabularies less than 10,000 can utilize the spectral initialization successfully.” Spectral initialization is deterministic and is what the vignette recommends, so you should use it if your vocabulary is that small. If you have a very large dataset, read up on how to correctly use the LDA option in the stm vignette.
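A minimal sketch of fitting the model with spectral initialization might look like this. The `gadarian` data, `K = 3`, and the cap on iterations are my assumptions for illustration, not recommendations. I believe newer stm releases already default to spectral initialization, but passing it explicitly does no harm.

```r
library(stm)

# Assumption: gadarian (bundled with stm) stands in for your own data.
data(gadarian)
processed <- textProcessor(gadarian$open.ended.response,
                           metadata = gadarian, verbose = FALSE)
out <- prepDocuments(processed$documents, processed$vocab, processed$meta,
                     verbose = FALSE)

# K (the number of topics) is an illustrative choice; pick your own.
fit <- stm(
  documents = out$documents,
  vocab = out$vocab,
  K = 3,
  data = out$meta,
  init.type = "Spectral", # request spectral initialization explicitly
  max.em.its = 75,        # cap EM iterations so the example runs quickly
  verbose = FALSE
)
```

`fit$theta` then holds one row per document and one column per topic, with each row giving that document's topic proportions.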

For more information on FREX and high probability rankings, see Roberts et al. (2013, 2015, 2014); Lucas et al. (2015). For more information on score, see the lda R package. For more information on lift, see Taddy (2013).
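To see those four rankings side by side, something like the following should work. As before, the `gadarian` data and `K = 3` are assumptions of mine, and `n = 5` just keeps the output short.

```r
library(stm)

# Assumption: gadarian (bundled with stm) stands in for your own data.
data(gadarian)
processed <- textProcessor(gadarian$open.ended.response,
                           metadata = gadarian, verbose = FALSE)
out <- prepDocuments(processed$documents, processed$vocab, processed$meta,
                     verbose = FALSE)
fit <- stm(out$documents, out$vocab, K = 3, data = out$meta,
           init.type = "Spectral", max.em.its = 75, verbose = FALSE)

# Top 5 words per topic under each ranking.
labels <- labelTopics(fit, n = 5)

labels$prob  # highest probability words
labels$frex  # FREX: balances frequency and exclusivity
labels$lift  # lift
labels$score # score
```

Each element is a matrix with one row per topic and one column per word, so `labels$frex[1, ]` gives the FREX words for topic 1.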