The Hierarchical Dirichlet Process (HDP) is typically used for topic
modeling when the number of topics is unknown. It can be seen as an extension of Latent Dirichlet Allocation (LDA) that infers the number of topics from the data.

In Bayesian topic modeling, individual words in each document are assigned to one of \(K\) topics. This means each document has its own distribution over topics, and each topic has its own distribution over words.
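This generative story can be sketched in a few lines of plain Python. The vocabulary, topic-word probabilities, and document-topic weights below are illustrative toy values, not output from the model fit in this post:

```python
import random

random.seed(0)

# Toy sketch of the topic-model generative story: each topic has a
# distribution over words (phi), each document a distribution over
# topics (theta). All numbers here are made up for illustration.
vocab = ["gene", "dna", "brain", "neuron", "data", "model"]
phi = [
    [0.40, 0.40, 0.05, 0.05, 0.05, 0.05],  # topic 0: genetics words
    [0.05, 0.05, 0.40, 0.40, 0.05, 0.05],  # topic 1: neuroscience words
]
theta = [0.7, 0.3]  # this document leans toward topic 0

def sample(dist):
    """Draw an index from a discrete probability distribution."""
    r, acc = random.random(), 0.0
    for i, p in enumerate(dist):
        acc += p
        if r < acc:
            return i
    return len(dist) - 1

# Generate a 10-word document: for each word, pick a topic from theta,
# then pick a word from that topic's distribution.
doc = [vocab[sample(phi[sample(theta)])] for _ in range(10)]
print(doc)
```

Inference runs this story in reverse: given only the documents, it recovers the topic assignments, \(\Theta\), and \(\Phi\).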

After running our model, we can visualize our topics using pyLDAvis,
a Python implementation of the LDAvis tool created by Carson Sievert.

LDAvis is designed to help users interpret the topics in a topic
model that has been fit to a corpus of text data. The package
extracts information from a fitted LDA topic model to inform an
interactive web-based visualization.

Similarly, if we create a document from words drawn from the 1st and 7th
topics, we expect the model to attribute the document mostly to those
topics. We'll plot these two documents to compare their distributions over topics.
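One way to sketch such a comparison with matplotlib, using hypothetical \(\Theta\) rows (the actual values would come from the fitted model):

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; drop this in a notebook
import matplotlib.pyplot as plt

# Hypothetical topic distributions for two documents. doc_a was built
# from words in the 1st and 7th topics, so most of its mass sits there.
K = 10
doc_a = [0.50, 0.01, 0.02, 0.01, 0.01, 0.02, 0.40, 0.01, 0.01, 0.01]
doc_b = [0.02, 0.45, 0.02, 0.02, 0.35, 0.05, 0.03, 0.02, 0.02, 0.02]

width = 0.4
xs = range(K)
plt.bar([i - width / 2 for i in xs], doc_a, width, label="doc A (topics 1 & 7)")
plt.bar([i + width / 2 for i in xs], doc_b, width, label="doc B")
plt.xticks(list(xs), ["topic %d" % (i + 1) for i in xs], rotation=45)
plt.ylabel("probability")
plt.legend()
plt.tight_layout()
plt.savefig("doc_topic_comparison.png")
```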

*Topic and Term Distributions*

Of course, we can also get the topic distribution for each document
(commonly called \(\Theta\)).
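Conceptually, a document's row of \(\Theta\) is just the normalized count of its per-word topic assignments. A sketch with hypothetical assignments (the real values come from the sampler):

```python
from collections import Counter

# Hypothetical topic assignment for each word in one document.
assignments = [0, 0, 6, 0, 6, 6, 0, 3, 0, 6]
K = 10

counts = Counter(assignments)
# theta[k] = fraction of this document's words assigned to topic k
theta = [counts[k] / len(assignments) for k in range(K)]
print(theta)  # heavy mass on topics 0 and 6
```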

We can also get the raw word distribution for each topic (commonly
called \(\Phi\)). This is related to the word relevance. Here are
the most common words in one of the topics.
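Given a row of \(\Phi\), the most common words fall out of a simple sort by probability. The vocabulary and probabilities below are illustrative placeholders:

```python
# One hypothetical row of phi (a single topic's distribution over words).
vocab = ["gene", "dna", "brain", "neuron", "data", "model"]
phi_topic = [0.35, 0.30, 0.05, 0.05, 0.15, 0.10]

# Rank words by probability and keep the top three.
top = sorted(zip(vocab, phi_topic), key=lambda wp: wp[1], reverse=True)[:3]
print(top)  # → [('gene', 0.35), ('dna', 0.3), ('data', 0.15)]
```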

To use our HDP, install the library with conda:

$ conda install microscopes-lda

Datamicroscopes is developed by Qadium, with funding from the DARPA XDATA program. Copyright Qadium 2015.