Mixtures of Hierarchical Topics with Pachinko Allocation


Another approach to representing the organization of topics is the pachinko allocation model (PAM) (Li & McCallum, 2006). PAM is a family of generative models in which words are generated by a directed acyclic graph (DAG) consisting of distributions over words and distributions over other nodes. A simple example of the PAM framework, four-level PAM, is described in Li and McCallum (2006). There is a single node at the top of the DAG that defines a distribution over the nodes in the second level, which we refer to as super-topics. Each node in the second level defines a distribution over all nodes in the third level, or sub-topics. Each sub-topic maps to a single distribution over the vocabulary. Only the sub-topics, therefore, actually produce words. The super-topics represent clusters of topics that frequently co-occur.

In this paper, we develop a different member of the PAM family and apply it to the task of hierarchical topic modeling. This model, hierarchical PAM (hPAM), includes a multinomial distribution over the vocabulary at each internal node in the DAG. This model addresses the problems outlined above: we no longer have to commit to a single hierarchy, so getting the tree structure exactly right is not as important as in hLDA. Furthermore, "methodological" topics, such as one referring to "points" and "players", can be shared between segments of the corpus.

Computer Science provides a good example of the benefits of the hPAM model. Consider three subfields of Computer Science: Natural Language Processing, Machine Learning, and Computer Vision. All three can be considered part of Artificial Intelligence. Vision and NLP both use ML extensively, but all three subfields also appear independently. In order to represent ML as a single topic in a tree-structured model, NLP and Vision must both be children of ML; otherwise, words about Machine Learning must be spread between an NLP topic, a Vision topic, and an ML-only topic. In contrast, hPAM allows higher-level topics to share lower-level topics.

For this work we use a fixed number of topics, although it is possible to use nonparametric priors over the number of topics. We evaluate hPAM, hLDA, and LDA based on the criteria mentioned earlier. We measure the ability of a topic model to predict unseen documents based on the empirical likelihood of held-out data given simulations drawn from the generative process of each model. We measure the ability of a model to describe the hierarchical structure of a corpus by calculating the mutual information between topics and human-generated categories such as journals. We find a 1.1% increase in empirical log likelihood for hPAM over hLDA and a five-fold increase in super-topic/journal mutual information.
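To make the hPAM generative story concrete, the following is a minimal simulation sketch. It is not the authors' implementation: the model sizes, the symmetric Dirichlet hyperparameters, and the per-word choice of emitting level (root, super-topic, or sub-topic) are illustrative assumptions. The point it demonstrates is that every node in the DAG, not just the leaves, carries a multinomial over the vocabulary.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed model sizes (illustrative only).
V = 1000      # vocabulary size
n_super = 5   # super-topics (second level)
n_sub = 20    # sub-topics (third level)

# hPAM: a multinomial over the vocabulary at every node, including internal ones.
phi_root = rng.dirichlet(np.full(V, 0.01))
phi_super = rng.dirichlet(np.full(V, 0.01), size=n_super)
phi_sub = rng.dirichlet(np.full(V, 0.01), size=n_sub)

def generate_document(n_words, alpha=1.0):
    """Sketch of hPAM word generation for a single document."""
    # Per-document mixtures over super-topics and, for each super-topic, sub-topics.
    theta_root = rng.dirichlet(np.full(n_super, alpha))
    theta_super = rng.dirichlet(np.full(n_sub, alpha), size=n_super)
    words = []
    for _ in range(n_words):
        s = rng.choice(n_super, p=theta_root)    # root -> super-topic
        t = rng.choice(n_sub, p=theta_super[s])  # super-topic -> sub-topic
        # Assumed mechanism: a level distribution decides whether the word is
        # emitted by the root, the chosen super-topic, or the chosen sub-topic.
        level = rng.choice(3, p=[0.1, 0.2, 0.7])
        phi = (phi_root, phi_super[s], phi_sub[t])[level]
        words.append(rng.choice(V, p=phi))
    return words

print(generate_document(10))
```

Contrast this with four-level PAM (Section 2.3), where only the sub-topic nodes produce words.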

2.2. hLDA

The hLDA model, which is described in Blei et al. (2004), represents the distribution of topics within documents by organizing the topics into a tree. Each document is generated by the topics along a single path of this tree. When learning the model from data, the sampler alternates between choosing a new path through the tree for each document and assigning each word in each document to a topic along the chosen path. In hLDA, the quality of the distribution of topic mixtures depends on the quality of the topic tree. The structure of the tree is learned along with the topics themselves using a nested Chinese restaurant process (nCRP). The nCRP prior requires two parameters: the number of levels in the tree and a parameter γ. At each node, a document sampling a path chooses either one of the existing children of that node, with probability proportional to the number of other documents assigned to that child, or a new child node, with probability proportional to γ. The value of γ can therefore be thought of as the number of "imaginary" documents in an as-yet-unsampled path.
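As an illustration of this path-sampling step, here is a minimal sketch of the nCRP child-selection rule under assumed data structures (a node holding only a document count and a list of children). It samples from the prior only; the actual sampler in Blei et al. (2004) also conditions on the words of the document when choosing a path.

```python
import random

class Node:
    """A topic node in the tree: a document count plus child nodes."""
    def __init__(self):
        self.doc_count = 0
        self.children = []

def sample_path(root, depth, gamma, rng=random):
    """Sample one document's root-to-leaf path of length `depth`.

    At each node, an existing child is chosen with probability proportional
    to the number of documents already assigned to it; a brand-new child is
    created with probability proportional to gamma.
    """
    path = [root]
    current = root
    for _ in range(depth - 1):
        weights = [child.doc_count for child in current.children] + [gamma]
        r = rng.random() * sum(weights)
        chosen = None
        for child, w in zip(current.children, weights):
            r -= w
            if r <= 0:
                chosen = child
                break
        if chosen is None:                # the gamma mass: open a new child
            chosen = Node()
            current.children.append(chosen)
        path.append(chosen)
        current = chosen
    for node in path:                     # record this document on its path
        node.doc_count += 1
    return path
```

Larger values of γ make new branches more likely to be opened, so γ effectively controls how bushy the learned tree becomes.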

2.3. PAM

Pachinko allocation models documents as a mixture of distributions over a single set of topics, using a directed acyclic graph to represent topic co-occurrences. Each node in the graph is a Dirichlet distribution. At the top level there is a single node. Except at the bottom level, each node represents a distribution over the nodes in the next lower level. The distributions at the bottom level are distributions over words in the vocabulary. In the simplest version, there is a single layer of Dirichlet distributions between the root node and the word distributions at the bottom level. These nodes can be thought of as "templates" for common co-occurrence patterns among topics. PAM does not represent word distributions as parents of other distributions, but it does exhibit several hierarchy-related phenomena. Specifically, trained PAM models often include what appears to be a "background" topic: a single topic with high probability in all super-topic nodes. Earlier work with four-level PAM suggests that it reaches its optimal performance at numbers of topics much larger than in previously published topic models (Li & McCallum, 2006).
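For comparison with the hPAM sketch above, a four-level PAM generative pass might look like the following. Node counts and hyperparameters are again illustrative assumptions; the structural point is that Dirichlet distributions sit at the root and super-topic nodes, each document draws its own multinomials from them, and only the sub-topic nodes emit words.

```python
import numpy as np

rng = np.random.default_rng(1)

V = 1000      # vocabulary size (assumed)
n_super = 5   # super-topic nodes ("templates" over sub-topics)
n_sub = 20    # sub-topic nodes (the only nodes with word distributions)

# Word distributions exist only at the bottom level.
phi_sub = rng.dirichlet(np.full(V, 0.01), size=n_sub)

# Dirichlet parameters stored at the root and at each super-topic node.
alpha_root = np.full(n_super, 1.0)
alpha_super = np.full((n_super, n_sub), 1.0)

def generate_document(n_words):
    """Sketch of four-level PAM word generation for a single document."""
    # Each document draws its own multinomials from the nodes' Dirichlets.
    theta_root = rng.dirichlet(alpha_root)
    theta_super = np.array([rng.dirichlet(a) for a in alpha_super])
    words = []
    for _ in range(n_words):
        s = rng.choice(n_super, p=theta_root)      # root -> super-topic
        t = rng.choice(n_sub, p=theta_super[s])    # super-topic -> sub-topic
        words.append(rng.choice(V, p=phi_sub[t]))  # only sub-topics emit words
    return words

print(generate_document(10))
```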
