Top 1000 Words

The following are the 1000 most common wordforms in UK English,
based on 29 works of literature by 18 authors
(4.6 million words) and
Rosengren's modified frequency, with case-equated matching.
Words may contain hyphens and apostrophes.

abs is the absolute frequency (total number of
occurrences); r is the range (number of texts in
which the word occurs); mod is the modified frequency
as defined by Rosengren (1972).
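
As a minimal sketch of how abs and r might be computed, assuming each
text is available as a plain string: the token pattern and the names
tokenize and count_abs_and_range are illustrative assumptions, not the
procedure actually used to compile this list.

```python
import re
from collections import Counter

# Assumed token pattern: runs of letters, optionally joined by single
# internal hyphens or apostrophes (e.g. "didn't", "well-known").
WORD_RE = re.compile(r"[a-z]+(?:['-][a-z]+)*")


def tokenize(text):
    """Lowercase the text (case-equated matching) and extract wordforms."""
    return WORD_RE.findall(text.lower())


def count_abs_and_range(texts):
    """Return (abs_freq, text_range) Counters over a list of text strings."""
    abs_freq = Counter()    # total occurrences across all texts
    text_range = Counter()  # number of texts in which the word occurs
    for text in texts:
        counts = Counter(tokenize(text))
        abs_freq.update(counts)           # add per-text counts
        text_range.update(counts.keys())  # each word counted once per text
    return abs_freq, text_range


# Two tiny made-up "texts", purely for illustration.
texts = ["The cat sat. The cat didn't move.", "A well-known cat; the END."]
abs_freq, text_range = count_abs_and_range(texts)
print(abs_freq["the"], text_range["the"])
print(abs_freq["didn't"], text_range["didn't"])
```

Lowercasing before matching is one simple way to equate case; a real
corpus would probably also need to handle typographic apostrophes.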

Omitted in compiling this list: John, London, England, English,
George, Tom.
The texts range from 26000 words (Alice in Wonderland)
to 360000 (David Copperfield).

Absolute frequency is a notoriously noisy indicator of the
commonness of a word; a particular word may occur a large
number of times in total but in only a few texts.
For a discussion of the problem, see the introduction to
Russian Learners' Dictionary by Nicholas J. Brown (1996).

Rosengren's modified frequency of a word is defined
as follows:

    KF = \left( \sum_{i=1}^{n} d_i \sqrt{x_i / d_i} \right)^2

Here n is the number of texts, x_i is the number of occurrences
of the word in the ith text, and d_i is the size of the ith text
as a fraction of the whole corpus, so that the d_i sum to 1.
KF stands for korrigierte Frequenz (corrected frequency).
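
The definition can be sketched in a few lines of Python. The function
name modified_frequency and the sample counts are hypothetical; the
sketch assumes text sizes are given in words and normalises them to
fractions of the whole corpus.

```python
import math


def modified_frequency(counts, sizes):
    """Rosengren's KF = (sum_i d_i * sqrt(x_i / d_i)) ** 2.

    counts: occurrences x_i of the word in each text
    sizes:  size of each text in words (must be positive);
            normalised here to fractions d_i that sum to 1
    """
    total = sum(sizes)
    kf_root = 0.0
    for x, size in zip(counts, sizes):
        d = size / total                 # fractional size of this text
        kf_root += d * math.sqrt(x / d)  # same as sqrt(d * x)
    return kf_root ** 2


# Hypothetical counts for one word in texts of 1000, 3000 and 6000 words.
sizes = [1000, 3000, 6000]
print(modified_frequency([10, 30, 60], sizes))  # evenly spread: close to 100
print(modified_frequency([100, 0, 0], sizes))   # all in one text: close to 10
```

Both hypothetical words have an absolute frequency of 100, but the one
confined to the smallest text receives a much lower modified frequency.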

Modified frequency has the property that it is less than or
equal to the absolute frequency, with equality if and only if
the word is evenly distributed (same relative frequency in all
texts).
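
This property can be seen from the Cauchy-Schwarz inequality, taking
the d_i as fractions of the whole corpus so that they sum to 1 (a
short derivation under that assumption, not taken from Rosengren's
text):

```latex
\[
KF = \Bigl( \sum_{i=1}^{n} \sqrt{d_i}\,\sqrt{x_i} \Bigr)^{2}
   \le \Bigl( \sum_{i=1}^{n} d_i \Bigr) \Bigl( \sum_{i=1}^{n} x_i \Bigr)
   = \sum_{i=1}^{n} x_i
   = \mathrm{abs}
\]
```

Equality holds exactly when sqrt(x_i) is proportional to sqrt(d_i),
that is, when x_i/d_i is the same for every text.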

The ratio mod/abs can be taken as a measure of the evenness
of distribution of a word.
On that basis the most evenly distributed words in the
above list are