We introduce tiered clustering, a mixture
model capable of accounting for varying degrees
of shared (context-independent) feature
structure, and demonstrate its applicability
to inferring distributed representations of
word meaning. Common tasks in lexical semantics
such as word relatedness or selectional
preference can benefit from modeling
such structure: Polysemous word usage is often
governed by some common background
metaphoric usage (e.g. the senses of line or
run), and likewise, modeling the selectional
preferences of verbs relies on identifying
commonalities shared by their typical arguments.
Tiered clustering can also be viewed as a form
of soft feature selection, where features that do
not contribute meaningfully to the clustering
can be excluded. We demonstrate the applicability
of tiered clustering, highlighting particular
cases where modeling shared structure is
beneficial and where it can be detrimental.
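The abstract does not specify the model's generative process, but the tiered idea can be sketched as follows: each observed feature is drawn either from a shared background tier (common to all clusters) or from a cluster-specific tier, which is what makes the background tier act as soft feature selection. The vocabulary, distributions, and mixing weight below are entirely hypothetical toy values for illustration.

```python
import random

random.seed(0)

# Hypothetical shared "background" tier: features common to every cluster.
background = {"the": 0.7, "bank": 0.3}

# Hypothetical cluster-specific tiers.
clusters = {
    "finance": {"money": 0.5, "loan": 0.5},
    "geography": {"river": 0.5, "water": 0.5},
}

def sample(dist):
    """Draw one word from a word -> probability dict."""
    r = random.random()
    acc = 0.0
    for word, p in dist.items():
        acc += p
        if r < acc:
            return word
    return word  # guard against floating-point rounding

def generate(cluster, n_tokens, p_background=0.4):
    """Generate tokens; each token picks the shared tier with
    probability p_background, else its cluster's tier."""
    doc = []
    for _ in range(n_tokens):
        tier = background if random.random() < p_background else clusters[cluster]
        doc.append(sample(tier))
    return doc

doc = generate("finance", 10)
```

Under this sketch, a "finance" document can only contain background words or finance-tier words; inference (not shown) would have to recover both the cluster assignments and the per-token tier choices.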