Frequency per million with a corpus, subcorpus and text type

The frequency per million is always calculated relative to the whole corpus or subcorpus, never relative to a text type. Restricting the query to one or more text types changes the number of hits, but the frequency per million is still calculated from the number of tokens in the whole (sub)corpus.

To relate the frequency per million to one or more text types, create a subcorpus from the text type(s) and restrict the query to this subcorpus.

Example

Looking up the frequency of the word helps in the British National Corpus (112,181,015 tokens), first with no restriction, then restricted to the spoken text type, and finally in the spoken subcorpus, produces these results:

| subcorpus selected | none | none | spoken (11,787,138 tokens) |
|---|---|---|---|
| text type selected | none | spoken | none |
| hits | 3,116 | 302 | 302 |
| frequency per million | 27.75 (in relation to the number of tokens in the whole corpus) | 2.69 (in relation to the number of tokens in the whole corpus) | 25.62 (in relation to the subcorpus size) |
| possible interpretation | helps appears 27.75 times per million words in BNC | ‘spoken’ helps appears 2.69 times per million in BNC | helps appears 25.62 times per million in the spoken part of BNC |
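
The difference between the two restricted queries comes down to which token count is used as the denominator. The following is a minimal sketch of that arithmetic, using the hit and token counts from the example; the function and variable names are illustrative assumptions and are not part of any corpus tool's API.

```python
def frequency_per_million(hits: int, reference_tokens: int) -> float:
    """Normalise a hit count to one million tokens of the reference (sub)corpus."""
    if reference_tokens <= 0:
        raise ValueError("reference_tokens must be positive")
    return hits / reference_tokens * 1_000_000


# Figures taken from the example above.
BNC_TOKENS = 112_181_015               # whole corpus
SPOKEN_SUBCORPUS_TOKENS = 11_787_138   # spoken subcorpus
SPOKEN_HITS = 302                      # hits for "helps" in spoken texts

# Text type restriction: hits drop, but the denominator is still the whole corpus.
text_type_fpm = frequency_per_million(SPOKEN_HITS, BNC_TOKENS)

# Subcorpus: the same hits, now divided by the subcorpus size.
subcorpus_fpm = frequency_per_million(SPOKEN_HITS, SPOKEN_SUBCORPUS_TOKENS)

print(f"text type restriction: {text_type_fpm:.2f} per million")  # 2.69
print(f"spoken subcorpus:      {subcorpus_fpm:.2f} per million")  # 25.62
```

The number of hits is identical in both cases; only the reference size changes, which is why the two values differ by roughly a factor of ten.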
