DISCO (extracting DIstributionally related words using CO-occurrences) is a Java application that retrieves the semantic similarity between arbitrary words and phrases. The similarities are based on the statistical analysis of very large text collections. The tool runs on all popular operating systems, including Windows, Linux, Solaris, and macOS. DISCO consists of two parts:

the DISCO API to query an existing database of similar words and

DISCO Builder, which creates a database of similar words from a text corpus.

The DISCO API needs a pre-computed database of word similarities. This database is also called a word space (a.k.a. "language data packet"). A word space contains, for each word, a word vector (a.k.a. "word embedding") and, depending on the type of the word space, the most similar words (i.e. those words whose word vector is highly similar to the word vector of the target word). In DISCO, there are two types of word spaces:

COL: word spaces of type COL contain only the word vector for each word, not its most similar words. Therefore, some of the API methods cannot be used with word spaces of type COL. The advantage of COL word spaces is that they are faster to build and smaller in size.

SIM: word spaces of type SIM contain both the word vector and the most similar words for each word.
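To make the difference between the two types concrete, the following sketch shows how a list of most similar words could in principle be derived from word vectors alone, which is roughly the information a SIM word space already stores per word. It is an illustration under assumptions, not DISCO code: the class `WordSpaceSketch`, its methods, the toy vectors, and the use of cosine similarity are all hypothetical, and DISCO's own similarity measures and storage format may differ.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Illustrative sketch only; this class is hypothetical and not part of the DISCO API.
class WordSpaceSketch {

    // Cosine similarity between two dense vectors of equal length.
    // Returns 0.0 if either vector has zero length (norm).
    static double cosine(double[] a, double[] b) {
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Ranks every other word by similarity to the target and keeps the top n.
    // Returns an empty list if the target word has no vector.
    static List<String> mostSimilar(Map<String, double[]> vectors, String target, int n) {
        double[] targetVector = vectors.get(target);
        List<String> candidates = new ArrayList<>();
        if (targetVector == null) {
            return candidates;
        }
        for (String word : vectors.keySet()) {
            if (!word.equals(target)) {
                candidates.add(word);
            }
        }
        // Sort by descending similarity (negated score gives descending order).
        candidates.sort(Comparator.comparingDouble(
                (String w) -> -cosine(targetVector, vectors.get(w))));
        return new ArrayList<>(candidates.subList(0, Math.min(n, candidates.size())));
    }

    public static void main(String[] args) {
        // Made-up three-dimensional vectors, for demonstration only.
        Map<String, double[]> toy = Map.of(
                "house", new double[]{1.0, 0.9, 0.1},
                "building", new double[]{0.9, 1.0, 0.2},
                "river", new double[]{0.1, 0.2, 1.0});
        System.out.println(mostSimilar(toy, "house", 2));
    }
}
```

Ranking against every word in the vocabulary is expensive for a large word space, which suggests why storing the most similar words in advance, as SIM word spaces do, makes such queries cheap at the cost of build time and size.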

There is a wide range of possible applications for DISCO's semantic similarities, spanning all areas of natural language processing. The following list is not exhaustive:

Translation: context-sensitive translation. Example: "The bank closes the account." A dictionary lists two possible German translations for bank: Bank (financial institution) and Ufer (river bank). The dictionary also gives Konto as the German translation of account. DISCO delivers the similarity values sim(Bank, Konto) = 0.181 and sim(Ufer, Konto) = 0.022, so Bank can be chosen as the correct translation in the context of the sentence.
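The selection step in the translation example can be sketched as follows. This is a minimal illustration, not DISCO code: the class `TranslationChooser`, the method `choose`, and the stand-in function `exampleSim` are hypothetical names, and `exampleSim` simply hard-codes the two similarity values quoted above in place of a real word space lookup.

```java
import java.util.List;
import java.util.function.ToDoubleBiFunction;

// Illustrative sketch only; this class is hypothetical and not part of the DISCO API.
class TranslationChooser {

    // Returns the candidate translation with the highest similarity
    // to the given context word, or null if there are no candidates.
    static String choose(List<String> candidates, String contextWord,
                         ToDoubleBiFunction<String, String> similarity) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String candidate : candidates) {
            double score = similarity.applyAsDouble(candidate, contextWord);
            if (score > bestScore) {
                bestScore = score;
                best = candidate;
            }
        }
        return best;
    }

    // Stand-in for a word space query: hard-codes the two values from the
    // example in the text and returns 0.0 for every other pair.
    static double exampleSim(String a, String b) {
        if (a.equals("Bank") && b.equals("Konto")) {
            return 0.181;
        }
        if (a.equals("Ufer") && b.equals("Konto")) {
            return 0.022;
        }
        return 0.0;
    }

    public static void main(String[] args) {
        // Candidates for "bank"; the context word is the translation of "account".
        String chosen = choose(List.of("Bank", "Ufer"), "Konto",
                TranslationChooser::exampleSim);
        System.out.println(chosen); // prints "Bank"
    }
}
```

Passing the similarity as a function keeps the selection logic independent of where the scores come from, so the stand-in could be replaced by a call into a real word space.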

Ontology learning: DISCO supplies semantically similar words for an input word, which can then be classified by the type of similarity relation (synonym, hyponym, antonym, etc.). A DISCO plug-in is available for the well-known ontology editor Protégé.

Step 3: now you can query DISCO from the command line:

    java -jar disco-3.0.0-all.jar WORD-SPACE-DIRECTORY -bn house 12

This outputs the twelve semantically most similar words for house.

DISCO can be integrated into your own applications using the Java API. The Java API supplies several methods to retrieve semantically similar words, the semantic similarity between words and phrases, collocations, corpus frequencies etc. You can find more information in the API documentation (javadoc) and on GitHub.