I'm looking for a list of German words in a digital format that is simple and easy to parse by a computer. It should encompass almost all words, but it doesn't matter whether inflections are included. Acronyms and proper names are not interesting, but those are easy to filter.

The Wiktionary dump is neither simple to parse, since I'd have to check for each article whether it describes a German word, nor all-encompassing.

These files each contain a line-separated list of German words. ogerman uses the old spelling and ngerman the reformed spelling. On my system, ogerman contains about 76,000 words, while ngerman has about 330,000.

You may need to install a package containing those files, and the path may differ as well. Under Debian, they are released under the GPL; the package names are wogerman and wngerman.
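As a rough illustration of how such a line-separated list could be read and filtered, here is a minimal Python sketch. The function names `load_words`, `is_plain_word`, and `filter_words` are made up for this example, and the file path and encoding are left as parameters because both depend on your system and package version. The filter only drops all-caps entries (likely acronyms) and entries containing non-letters. It cannot recognize proper names by capitalization, since all German nouns are capitalized.

```python
from pathlib import Path


def load_words(path, encoding="utf-8"):
    """Read a line-separated word list and return its non-empty entries."""
    # The encoding is an assumption; pass another one if your file differs.
    text = Path(path).read_text(encoding=encoding)
    return [line.strip() for line in text.splitlines() if line.strip()]


def is_plain_word(word):
    """Heuristic: keep purely alphabetic entries that are not all-caps."""
    if not word.isalpha():
        return False  # drops entries with digits, dots, hyphens, apostrophes
    if len(word) > 1 and word.isupper():
        return False  # drops likely acronyms such as "ABS"
    return True


def filter_words(words):
    """Apply the heuristic filter to a list of words."""
    return [w for w in words if is_plain_word(w)]
```

Calling `filter_words(load_words(path))` with the location of the ngerman file on your system should then give a plain Python list to work with.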

@mbx Please see the question. Under Debian the names are wngerman and wogerman.
– FUZxxl May 28 '11 at 13:41

I just wondered, as dpkg -S ngerman only gave me texlive packages. But I guess it only searches installed packages.
– mbx May 28 '11 at 14:56

@mbx There is a w in front of ngerman. You could use apt-file to find that package. AFAIK, wogerman is preinstalled if you have full German language support. BTW, using these packages is great too, since you can declare them as a dependency of your project instead of shipping the files yourself.
– FUZxxl May 28 '11 at 15:07

Google's Ngram Viewer also offers the raw datasets for download. The datasets also contain the number of occurrences of each word (and of combinations of up to five words) in any given year. This may be useful for statistics on how different word usages evolved; it's used all the time on EL&U.
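To turn such per-year counts into a plain word list with total frequencies, something like the following Python sketch might do. It assumes tab-separated lines whose first three fields are the n-gram, the year, and the match count. That layout is an assumption and differs between dataset versions, so check the documentation of the files you download. The function name `total_counts` is made up for this example.

```python
from collections import Counter


def total_counts(lines):
    """Sum the match counts over all years for each n-gram."""
    totals = Counter()
    for line in lines:
        # Assumed layout: ngram <TAB> year <TAB> match_count <TAB> ...
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip malformed or empty lines
        ngram, match_count = fields[0], fields[2]
        try:
            totals[ngram] += int(match_count)
        except ValueError:
            continue  # skip lines whose count is not an integer
    return totals
```

Since the function accepts any iterable of lines, an open file object can be passed in directly, so a large file is never loaded into memory at once.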

Additionally, besides the (n|o)german files mentioned by FUZxxl, there are aspell and ispell files, which are compressed somehow, but I don't know how. Using gunzip and word-list-compress -d, as mentioned in the manpage, didn't work.