To obtain reliable estimates of the co-occurrences of
words, large text corpora have to be used. Since the associations
of the ``average subject'' are to be simulated, the texts should
not be specific to a particular domain but should reflect the wide
distribution of different types of text and speech encountered
in everyday life.

The following selection of some 33 million words of machine-readable
English text used in this study is a modest attempt to achieve
this goal:

Brown corpus of present-day American English (1 million words)

LOB corpus of present-day British English (1 million words)

Belletristic literature from Project Gutenberg (1 million words)

Articles from the New Scientist from the Oxford Text Archive (1 million words)

Wall Street Journal from the ACL/DCI (selection of 6 million words)

Hansard Corpus: proceedings of the Canadian Parliament (selection of 5 million words from the ACL/DCI corpus)

Grolier's Electronic Encyclopedia (8 million words)

Psychological Abstracts from PsycLIT (selection of 3.5 million words)

Agricultural abstracts from the Agricola database (3.5 million words)

DOE scientific abstracts from the ACL/DCI (selection of 3 million words)

To compute associations for German the following corpora comprising
about 21 million words were used:

LIMAS corpus of present-day written German (1.1 million words)

Freiburger Korpus from the Institute for German Language (IDS), Mannheim (0.5 million words of spoken German)

Mannheimer Korpus 1 from the IDS (2.2 million words of present-day written German from books and periodicals)

Handbuchkorpora 85, 86 and 87 from the IDS (9.3 million words of newspaper texts)

German abstracts from the psychological database PSYNDEX (8 million words)

For technical reasons, not all words occurring in the corpora
were used in the simulation. The vocabulary consists of all words
that appear more than ten times in the English or German corpus,
plus all 100 stimulus words and all responses in the English or
German association norms. This yields an English vocabulary of
about 72,000 words and a German vocabulary of about 65,000 words.
Here, a word is defined as a string of alphabetic characters
delimited by non-alphabetic characters; punctuation marks and
special characters are treated as words.
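The tokenization and vocabulary-selection rule above can be sketched as follows. This is a minimal illustration, not the original implementation: the character class `[A-Za-z]` is an assumption (a German version would need to include umlauts and ß), and the function and parameter names are invented for this sketch.

```python
import re
from collections import Counter

# A word is a maximal run of alphabetic characters; any single
# non-alphabetic, non-whitespace character (punctuation, special
# symbols) also counts as a word of its own.
# NOTE: [A-Za-z] is an assumption; German text would need umlauts.
TOKEN_RE = re.compile(r"[A-Za-z]+|[^A-Za-z\s]")

def tokenize(text):
    """Split text into words and punctuation tokens."""
    return TOKEN_RE.findall(text)

def build_vocabulary(corpus_tokens, norm_words, min_count=10):
    """Keep all tokens occurring more than `min_count` times in the
    corpus, plus all stimulus and response words from the
    association norms (hypothetical parameter names)."""
    counts = Counter(corpus_tokens)
    vocab = {w for w, c in counts.items() if c > min_count}
    vocab.update(norm_words)
    return vocab
```

For example, `tokenize("The cat, the cat.")` produces the tokens `The`, `cat`, `,`, `the`, `cat`, `.`, and the comma and period would enter the frequency counts like any other word.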