Understanding a software system by analyzing only
its structure reveals just half of the picture: the
structure tells us how the code works, but not what
the code is about. What the code is about can be
found in the semantics of the source code: the names
of identifiers, comments, etc.
In this paper, we analyze how the terms found in
identifiers and comments are distributed over the
source artifacts using Latent Semantic Indexing
(LSI), an information retrieval technique. We
assume that parts of the system that use similar
terms are related. We cluster artifacts that
use similar terms, and we reveal the most relevant
terms for the computed clusters. Our approach works
at the level of the source code, which makes it
language independent. Nevertheless, we correlate
the semantics with structural information, and we
apply the approach at different levels of
abstraction (e.g.,
classes, methods). We applied our approach to three
large case studies and report the results we
obtained.
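
As a rough illustration of the pipeline summarized above, the
following Python sketch builds a term-document matrix from a few
made-up artifacts, reduces it with a truncated SVD (the core of
Latent Semantic Indexing), and groups artifacts by cosine
similarity. The artifact names and contents, the camelCase
tokenizer, the raw-count weighting, the rank of 2, the 0.5
threshold, and the greedy grouping are all assumptions chosen for
this example; they are not taken from the paper.

```python
import re
from collections import Counter

import numpy as np

# Hypothetical artifacts (name -> identifier text), invented for illustration.
ARTIFACTS = {
    "Account": "getBalance depositAmount accountId",
    "Ledger": "accountId transferAmount getBalance",
    "HttpServer": "openSocket readRequest sendResponse closeSocket",
    "HttpClient": "openSocket sendRequest parseResponse",
}


def terms(text):
    """Split identifiers on camelCase boundaries and lowercase the parts."""
    parts = re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])", text)
    return [p.lower() for p in parts]


def lsi_similarity(artifacts, rank=2):
    """Return artifact names and their cosine similarities in LSI space."""
    names = list(artifacts)
    counts = [Counter(terms(artifacts[n])) for n in names]
    vocab = sorted(set().union(*counts))
    # Term-document matrix of raw counts (assumption: no tf-idf weighting).
    A = np.array([[c[t] for c in counts] for t in vocab], dtype=float)
    # Truncated SVD: keep only the `rank` largest singular values.
    _, s, vt = np.linalg.svd(A, full_matrices=False)
    docs = vt[:rank].T * s[:rank]  # one row per artifact
    norms = np.linalg.norm(docs, axis=1, keepdims=True)
    unit = docs / np.where(norms == 0, 1.0, norms)
    return names, unit @ unit.T


def cluster(names, sim, threshold=0.5):
    """Greedy grouping: join the first cluster whose seed is similar enough."""
    clusters = []
    for i, name in enumerate(names):
        for group in clusters:
            if sim[i, names.index(group[0])] >= threshold:
                group.append(name)
                break
        else:
            clusters.append([name])
    return clusters


names, sim = lsi_similarity(ARTIFACTS)
print(cluster(names, sim))
```

On this toy input the two account-related artifacts end up in one
group and the two HTTP-related artifacts in another, because each
pair shares most of its vocabulary and the two vocabularies do not
overlap. Labeling clusters with their most relevant terms is left
out of the sketch.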