A collection of functions that measure the readability of a given body of text
using surface characteristics. These measures are basically linear regressions
based on the number of words, syllables, and sentences.
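
To make the "linear regression" point concrete, here is a minimal sketch of two widely published formulas, Flesch Reading Ease and Flesch-Kincaid Grade Level. The function names are hypothetical and are not this package's API; the coefficients are the commonly cited ones, and it is an assumption that the package computes these scores in exactly this way.

```python
def flesch_reading_ease(words, sentences, syllables):
    """Flesch Reading Ease: higher scores mean easier text.

    Hypothetical helper for illustration; a linear function of
    words-per-sentence and syllables-per-word.
    """
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade(words, sentences, syllables):
    """Flesch-Kincaid Grade Level: approximate US school grade.

    Same two ratios as above, with different coefficients.
    """
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


if __name__ == "__main__":
    # Counts for an imaginary text: 100 words, 5 sentences, 150 syllables.
    print(flesch_reading_ease(100, 5, 150))
    print(flesch_kincaid_grade(100, 5, 150))
```

Both scores depend only on two ratios, which is why rough syllable and sentence counts are enough to compute them.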

The functionality is modeled after the UNIX style(1) command. Compared to the
implementation in GNU diction, this version supports UTF-8 encoded text, but
expects sentence-segmented and tokenized input. Syllabification and word type
recognition are based on simple heuristics and provide only a rough measure.

NB: all readability formulas were developed for English, so the scales of the
outcomes are only meaningful for English texts.

Installation

$ pip install https://github.com/andreasvc/readability/tarball/master

Usage

$ readability --help
Simple readability measures.
Usage: readability [--lang=<x>] [FILE]
or: readability [--lang=<x>] --csv FILES...
By default, input is read from standard input.
Text should be encoded with UTF-8,
one sentence per line, tokens space-separated.
Options:
  -L, --lang=<x>   Set language (available: de, nl, en).
  --csv            Produce a table in comma separated value format on
                   standard output given one or more filenames.
  --tokenizer=<x>  Specify a tokenizer including options that will be given
                   each text on stdin and should return tokenized output on
                   stdout. Not applicable when reading from stdin.
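
The input format described in the help text (UTF-8, one sentence per line, tokens space-separated) can be approximated with a short script. The sketch below is an assumption-laden stand-in for a real tokenizer: `to_readability_input` is a hypothetical name, sentence boundaries are guessed from `.`, `!`, and `?` followed by whitespace, and punctuation is split off with a simple regular expression. Abbreviations such as "e.g." will be split incorrectly, so a proper tokenizer is preferable for serious use.

```python
import re


def to_readability_input(text):
    """Return text as one sentence per line with space-separated tokens.

    Naive heuristic, not the package's own tokenizer.
    """
    # Split after sentence-final punctuation that is followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    lines = []
    for sentence in sentences:
        # A token is either a run of word characters or a single
        # non-space, non-word character (punctuation).
        tokens = re.findall(r"\w+|[^\w\s]", sentence)
        if tokens:
            lines.append(" ".join(tokens))
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_readability_input("Hello, world! This is a test."))
```

The result can be written to a file opened with `encoding="utf-8"` and passed to `readability` as the FILE argument, or piped to it on standard input.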