Unsupervised Corpus Based Clustering of Similar Contexts

SenseClusters is a package of Perl programs that allows a user to cluster
similar contexts together using unsupervised knowledge-lean methods. These
techniques have been applied to word sense discrimination, email
categorization, and name discrimination.

Collocation Identification

NSP (the Ngram Statistics Package) allows you to identify word n-grams in
large corpora using standard tests of association such as Fisher's exact
test, the log likelihood ratio, Pearson's chi-squared test, and the Dice
coefficient.
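
As a rough illustration of how such association scores are computed for a
bigram, the sketch below derives the Dice coefficient, Pearson's chi-squared
statistic, and the log likelihood ratio from a 2x2 contingency table. It is
not NSP code: the function name, the argument names, and the counts in the
usage note are hypothetical, and it assumes that every marginal total is
nonzero.

```python
import math


def bigram_scores(n11, n1p, np1, npp):
    """Score a bigram (w1, w2) from its 2x2 contingency table.

    n11: times w1 is immediately followed by w2
    n1p: times w1 occurs as the first word of any bigram
    np1: times w2 occurs as the second word of any bigram
    npp: total number of bigrams in the corpus
    All four names are illustrative; every marginal is assumed nonzero.
    """
    # Fill in the remaining observed cells from the marginal totals.
    n12 = n1p - n11
    n21 = np1 - n11
    n22 = npp - n1p - np1 + n11
    n2p = npp - n1p
    np2 = npp - np1

    observed = [n11, n12, n21, n22]
    # Expected counts if w1 and w2 occurred independently of each other.
    expected = [
        n1p * np1 / npp,
        n1p * np2 / npp,
        n2p * np1 / npp,
        n2p * np2 / npp,
    ]

    chi_squared = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Cells with an observed count of zero contribute nothing to the sum.
    log_likelihood = 2 * sum(
        o * math.log(o / e) for o, e in zip(observed, expected) if o > 0
    )
    dice = 2 * n11 / (n1p + np1)

    return {"dice": dice, "x2": chi_squared, "ll": log_likelihood}
```

With the made-up counts bigram_scores(10, 20, 20, 100), the Dice coefficient
is 0.5 and the chi-squared statistic is 14.0625. When the observed counts
match the expected counts exactly, both the chi-squared statistic and the log
likelihood ratio are zero.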

UMLS Resources

UMLS::Similarity allows you to measure the similarity and relatedness of
two concepts in the Unified Medical Language System (UMLS) using a
variety of measures of semantic similarity and relatedness.
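
To give a feel for what a similarity measure over a concept hierarchy looks
like, the sketch below implements one simple path-based measure: the
reciprocal of the number of nodes on the shortest path between two concepts.
This is an illustration only and does not use the UMLS::Similarity interface.
The toy is-a hierarchy, the concept names, and the function names are all
made up and are not taken from the UMLS.

```python
from collections import deque

# Hypothetical toy is-a hierarchy (child -> list of parents); not UMLS data.
PARENTS = {
    "disease": [],
    "heart disease": ["disease"],
    "lung disease": ["disease"],
    "myocardial infarction": ["heart disease"],
    "pneumonia": ["lung disease"],
}


def build_graph(parents):
    """Turn the child -> parents map into an undirected adjacency map."""
    graph = {}
    for child, child_parents in parents.items():
        graph.setdefault(child, set())
        for parent in child_parents:
            graph[child].add(parent)
            graph.setdefault(parent, set()).add(child)
    return graph


def path_similarity(a, b, parents=PARENTS):
    """Return 1 / (number of nodes on the shortest path from a to b).

    Returns 0.0 if either concept is unknown or no path connects them.
    """
    graph = build_graph(parents)
    if a not in graph or b not in graph:
        return 0.0
    # Breadth-first search; node_count[x] is the number of nodes on the
    # shortest path from a to x, counting both endpoints.
    node_count = {a: 1}
    queue = deque([a])
    while queue:
        node = queue.popleft()
        if node == b:
            return 1.0 / node_count[node]
        for neighbor in graph[node]:
            if neighbor not in node_count:
                node_count[neighbor] = node_count[node] + 1
                queue.append(neighbor)
    return 0.0
```

In this toy hierarchy, path_similarity("myocardial infarction", "heart
disease") is 0.5, while path_similarity("myocardial infarction", "pneumonia")
is 0.2 because the shortest path between them passes through five nodes.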

Complete source code and documentation for the Duluth systems that
participated in the Senseval-3 (2004) comparative exercise among word
sense disambiguation systems. This includes supervised lexical sample
systems based on the Duluth Senseval-2 systems, and a new unsupervised
lexical sample system.

Complete source code and documentation for the Duluth systems
that participated in the lexical sample tasks of the Senseval-2 (2001)
comparative exercise among word sense disambiguation systems. These
systems rely on lexical features like unigrams, bigrams, and
co-occurrences.
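
The three kinds of lexical features named above can be sketched as follows.
This is not the Duluth code. It assumes that a co-occurrence is any word
found within a fixed window around the target word, and the function name,
the window parameter, and the example sentence are hypothetical.

```python
def lexical_features(tokens, target, window=2):
    """Extract simple lexical features from one tokenized context.

    Returns three sets: the unigrams, the bigrams (pairs of adjacent
    tokens), and the co-occurrences, taken here to be the tokens that lie
    within `window` positions of any occurrence of `target`.
    """
    unigrams = set(tokens)
    bigrams = set(zip(tokens, tokens[1:]))

    cooccurrences = set()
    for i, token in enumerate(tokens):
        if token != target:
            continue
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        for j in range(start, end):
            if j != i:
                cooccurrences.add(tokens[j])

    return unigrams, bigrams, cooccurrences
```

For the context "the bank raised interest rates today" with target "bank" and
a window of 2, the co-occurrences are "the", "raised", and "interest", and
the bigrams include ("bank", "raised"). In a supervised setting, features
like these would typically be turned into binary indicators for a classifier.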