Abstract

The subspace multinomial model (SMM) is a log-linear model that can be used for
learning low-dimensional continuous representations of discrete data. SMM and
its variants have been used for speaker verification based on prosodic features
and for phonotactic language recognition. In this paper, we propose a new
variant of SMM that introduces sparsity, and we call the resulting model
ℓ1 SMM. We show that ℓ1 SMM can be used for learning document representations
that are helpful in topic identification (or classification) and clustering
tasks. Our experiments in document classification show that SMM achieves
results comparable to models such as latent Dirichlet allocation and sparse
topical coding, while having the useful property that the resulting document
vectors are Gaussian distributed.