Automatic construction of user-desired topical hierarchies over large volumes of text

Automatic construction of user-desired topical hierarchies over large volumes of text data is a highly desirable but challenging task. … while generating consistent and quality hierarchies.
The input is a corpus of documents, and each document is a sequence of tokens. All the unique tokens in this corpus are indexed using a vocabulary of terms, and each token is represented by the index of its term. A topic is defined by a probability distribution over terms. … is the phrase ranked at …
A topical hierarchy is a tree in which each node is a topic. Every non-leaf topic has child topics. We assume the number of child topics is bounded by a small number, which is named the width of the tree. … of the tree to another … topics. Remove – … for an arbitrary set of topics … to be under a different parent topic.
A topic is recursively indexed by a path: the path index of its parent topic, followed by → and the topic's index among its siblings. For example, topic 2 in the ‘merge’ example of Figure 1 is indexed as → 2, and topic 3 in the same tree is indexed as → 1 → 1. The level of a topic is defined to be its distance to the root, so the root topic is in level 0 and topic → 1 → 1 is in level 2. The height of a tree is defined to be the maximal level over all the topics in the tree. Clearly, the total number of topics is upper bounded by … leaf nodes and non-leaf nodes. For ease of explanation, we assume all leaf nodes are on the level of …
A leaf node (a topic with no child topics) has a multinomial distribution over terms. A document paired with a non-leaf node (a topic with at least one child topic) has a multinomial distribution over that node's child topics → 1 through → …, which represents the content bias of the document towards these subtopics, for example towards → 1 and → 2. So a document is associated with 3 multinomial distributions over topics: … over its 2 children … Each of these distributions is generated from a Dirichlet prior, which represents the corpus’ bias towards …
To generate a token, … is selected from all children of … Starting from level 0: while the current topic is not a leaf node, increase the level by 1 and draw a subtopic …
The term distribution for an internal node in the topic hierarchy can be calculated as a mixture of its children’s term distributions. The Dirichlet prior determines the mixing weight.
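The path indexing, the root-to-leaf descent, and the mixture for an internal node can be illustrated with a minimal sketch. It makes several assumptions that are not taken from the text: topics are stored as tuples of sibling indices (the root is the empty tuple), the tree, the term distributions `phi`, and the Dirichlet parameters `alpha` are hypothetical toy values, and the mixing weight of each child is taken to be its Dirichlet parameter divided by the sum of the parameters (the mean of the Dirichlet prior).

```python
import numpy as np

def level(path):
    """Level of a topic: its distance to the root (the root path is empty)."""
    return len(path)

def draw_token(rng, children, theta, phi):
    """Walk from the root to a leaf, then draw a term index at that leaf."""
    path = ()                                   # root topic, level 0
    while children.get(path, 0) > 0:            # while the topic is not a leaf
        # Draw a subtopic index in 1..C from the document's distribution.
        z = int(rng.choice(children[path], p=theta[path])) + 1
        path = path + (z,)                      # descend one level
    term = int(rng.choice(len(phi[path]), p=phi[path]))
    return path, term

def internal_term_distribution(alpha, child_phis):
    """Mixture of the children's term distributions.

    Assumption: the mixing weights are alpha / sum(alpha).
    """
    weights = np.asarray(alpha, dtype=float)
    weights = weights / weights.sum()
    return weights @ np.asarray(child_phis, dtype=float)

# Hypothetical tree: the root has 2 children, each with 2 leaf children.
children = {(): 2, (1,): 2, (2,): 2}
# Hypothetical term distributions of the four leaves (vocabulary of 4 terms).
phi = {
    (1, 1): np.array([0.7, 0.1, 0.1, 0.1]),
    (1, 2): np.array([0.1, 0.7, 0.1, 0.1]),
    (2, 1): np.array([0.1, 0.1, 0.7, 0.1]),
    (2, 2): np.array([0.1, 0.1, 0.1, 0.7]),
}
# Hypothetical Dirichlet parameters, one vector per non-leaf topic.
alpha = {(): [1.0, 1.0], (1,): [1.0, 3.0], (2,): [2.0, 2.0]}

rng = np.random.default_rng(0)
# One document: one distribution over children per non-leaf topic (3 here).
theta = {t: rng.dirichlet(a) for t, a in alpha.items()}
path, term = draw_token(rng, children, theta, phi)
phi_1 = internal_term_distribution(alpha[(1,)], [phi[(1, 1)], phi[(1, 2)]])
```

In this toy tree every token ends at a level-2 leaf, and `phi_1` is a proper distribution over the 4 terms because it is a convex combination of the two leaf distributions.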
When the structure is fixed, we need to infer its parameters … When … = 1, our model reduces to the flat LDA model.
4.1 Model Structure Manipulation
The main advantage of this model is that it can be consistently manipulated to accommodate user operations.
Proposition 1. The following atomic manipulation operators are sufficient in order to compose all the user operations introduced in Section 3: EXP( … subtopics of a leaf topic … → 1) three times. ‘Split’ – EXP(→ 2 … 2 … followed by MER(→ 2). ‘Remove’ – MER(→ 2 … → 2 → 1) followed by MER(→ 2).
Implementation of these atomic operators needs to follow the consistency requirement. Single-run consistency – suppose the topical hierarchy …
The population moment of a random variable is the expectation of its … in a document … token positions. They are related to the model parameters … and …, and … by fitting the empirical moments with theoretical moments. As a computational advantage, it only relies on the term co-occurrence statistics. The statistics contain important information compressed from the full data and require only a few scans of the data to collect.
To compute our three atomic operators, we generalize the notion of population moments. We consider the population moments on a topic … Component … is the expectation of … given that … is drawn from topic … is a … × … tensor (hence a matrix) storing the expectation of the co-occurrences of two terms … is a tensor (a … × … × … tensor) storing the expectation of co-occurrences of three terms … using model parameters associated with … in document …
… subtopics under topic … without changing any existing model parameters. So we need an algorithm that returns … ∈ [k] with … ∈ [k] as unknown variables. Solving these equations yields a solution of the acquired model parameters. The following theorem by Anandkumar et al. [3] suggests that we only need to use up to 3rd order moments to find the solution.
Theorem 1. Assume … > 0 … × … matrix and … × … × … tensor.
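The term co-occurrence statistics that the method of moments relies on can be collected in a single scan, as the following sketch shows for the first and second order. It is not the paper's algorithm: it works on the whole corpus rather than on a topic, it weights every document equally, and it builds a dense matrix, which is only practical for a small vocabulary. The function name `empirical_moments` and the toy corpus are hypothetical.

```python
import numpy as np

def empirical_moments(docs, vocab_size):
    """One scan over the corpus.

    Returns
      e1: vector, estimated probability of each term at a token position
      e2: matrix, estimated probability of a term pair at two distinct
          token positions of the same document
    """
    e1 = np.zeros(vocab_size)
    e2 = np.zeros((vocab_size, vocab_size))
    n1 = n2 = 0
    for tokens in docs:
        length = len(tokens)
        if length == 0:
            continue
        counts = np.bincount(tokens, minlength=vocab_size).astype(float)
        e1 += counts / length
        n1 += 1
        if length >= 2:
            # Ordered pairs of distinct positions: c c^T minus the pairs
            # that reuse the same position, normalised by l * (l - 1).
            pairs = np.outer(counts, counts) - np.diag(counts)
            e2 += pairs / (length * (length - 1))
            n2 += 1
    return e1 / max(n1, 1), e2 / max(n2, 1)

# Hypothetical toy corpus: documents are lists of term indices.
docs = [[0, 1], [0, 0, 1]]
e1, e2 = empirical_moments(docs, vocab_size=2)
```

Both outputs are normalised to sum to 1, and `e2` is symmetric because the pair counts are taken over ordered position pairs. The third-order statistic follows the same pattern with triples of distinct positions.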
Direct application of the tensor decomposition algorithm in [3] is challenging due to the creation of these huge dense tensors. Therefore, we design a more scalable algorithm. The idea is to bypass the creation of … and … of the moments. We go over Algorithm 1 to explain it. Line 1.1 collects the empirical moments with one scan of the data. Lines 1.2 to 1.6 project the large tensor into a smaller tensor … The smaller tensor is not only of smaller size but also can be decomposed into an orthogonal form: … calculated in Line 1.5, which …