Interpreting Topic Models

This page serves as a staging ground for the WE1S topic modeling interpretation protocol. It collects observations about using the DFR-browser interface, steps for reading a model, and supplementary technical aids for reading models.

Weighting Topics (using spreadsheet)

A. Go to your project’s cache/model folder on Mirrormask (in the “write” tree), and download the “keys.txt” file to your local hard drive (or copy the content of the file to your clipboard).

B. Copy and paste the content of keys.txt into an Excel spreadsheet (or Google sheet). Then select column B (the topic weights), go to the “Data” tab, and choose “Sort.” (In Excel, you will then have to “expand” the selection so that the affiliated columns sort together, and also choose the column to sort on and the sort order.)

C. You will end up with a list of topics ranked by weight. You can also color-code the rows to distinguish different kinds of topics.
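The spreadsheet sort above can also be scripted. The sketch below assumes the standard MALLET topic-keys format (one topic per line: topic number, Dirichlet weight, and top words, tab-separated); the function name and file path are illustrative, not part of the WE1S tooling.

```python
import csv

def rank_topics(path):
    """Return (topic, weight, top_words) tuples sorted by descending weight.

    Assumes each line of the file is: <topic>\t<weight>\t<top words>,
    as produced by MALLET's --output-topic-keys option.
    """
    with open(path, newline="", encoding="utf-8") as f:
        rows = [(int(r[0]), float(r[1]), r[2].strip())
                for r in csv.reader(f, delimiter="\t") if r]
    # Sort descending by the Dirichlet weight, mirroring the column-B sort.
    return sorted(rows, key=lambda row: row[1], reverse=True)

# Example usage: print the ranked list, highest-weighted topic first.
# for topic, weight, words in rank_topics("keys.txt"):
#     print(f"{topic}\t{weight:.4f}\t{words}")
```

This produces the same ranked list as step C, without the manual sort-and-expand step in Excel.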

Visualizing topics as word clouds (“topic clouds”), using Lexos

A. Go to your project’s cache/model folder on Mirrormask (in the “write” tree), and download the “counts.txt” file to your local hard drive.

B. Go to the Lexos online site: http://lexos.wheatoncollege.edu. Then, under the “Visualize” tab, choose “Multicloud.” In the Multicloud dialog, toggle from “Document clouds” to “Topic Clouds.” Then upload your “counts.txt” file and ask Lexos to “get graphs.”
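If you want to inspect the topic clouds' underlying data outside of Lexos, you can parse counts.txt yourself. The sketch below assumes the MALLET word-topic counts format (each line: word index, word, then one or more `topic:count` pairs); the function name is illustrative.

```python
from collections import defaultdict

def topic_word_counts(path):
    """Map topic id -> {word: count} from a MALLET word-topic counts file.

    Assumes each line looks like:
        <word index> <word> <topic>:<count> [<topic>:<count> ...]
    as produced by MALLET's --word-topic-counts-file option.
    """
    topics = defaultdict(dict)
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.split()
            if len(parts) < 3:
                continue  # skip blank or malformed lines
            word = parts[1]
            for pair in parts[2:]:
                topic, count = pair.split(":")
                topics[int(topic)][word] = int(count)
    return dict(topics)

# Example usage: the 20 heaviest words of topic 0, ready to feed any
# word-cloud tool as (word, weight) pairs.
# top = sorted(topic_word_counts("counts.txt")[0].items(),
#              key=lambda kv: kv[1], reverse=True)[:20]
```

Each per-topic dictionary is essentially what a topic cloud visualizes: word sizes proportional to counts.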

“Evaluating Topic Significance — In addition to identifying qualitatively distinct topics in a large textual dataset, computational methods provide the necessary information to evaluate the significance of topics that the model identifies. From a topic modeling perspective, there are two basic ways to think about the significance of a topic. The first is to consider how commonly a topic occurs in the corpus as a whole, relative to other topics. If a reader encounters demarcation language, what subjects are they more likely to be reading about? Figure 4 reports labeled topics ranked by the Dirichlet parameter for each topic estimated by MALLET. In the MALLET implementation of LDA with hyperparameter optimization enabled, the Dirichlet parameter for each topic is optimized at regular intervals as the model is iteratively constructed. The greater the Dirichlet parameter for each topic in the resulting model, the greater the proportion of the corpus that has been assigned to that topic by MALLET. Ranking topics by Dirichlet parameter therefore answers the question of how commonly a topic occurs in the corpus as a whole, relative to other topics.”
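The ranking described in the passage above can be made concrete: since the per-topic Dirichlet parameters are proportional to each topic's expected share of the corpus, normalizing them gives an approximate corpus proportion for each topic. A minimal sketch, assuming you already have topic weights in a dictionary (the function name and inputs are illustrative):

```python
def topic_proportions(weights):
    """Normalize per-topic Dirichlet weights to expected corpus proportions.

    `weights` maps topic id -> Dirichlet parameter (e.g., column B of
    keys.txt). Returns topic id -> fraction of the corpus, summing to 1.
    """
    total = sum(weights.values())
    return {topic: w / total for topic, w in weights.items()}

# Example usage: rank topics by how much of the corpus each accounts for.
# proportions = topic_proportions({0: 0.05, 1: 0.20, 2: 0.10})
# ranked = sorted(proportions, key=proportions.get, reverse=True)
```

This is a convenient way to report topic significance as percentages rather than raw parameter values.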

Abstract: “The ability to analyze and organize large collections, to draw relations between pieces of evidence, to build knowledge, are all part of an information discovery process. This paper describes an approach to interactive topic analysis, as an information discovery conversation with a recommender system. We describe a model that motivates our approach, and an evaluation comparing interactive topic analysis with state-of-the-art topic analysis methods.”