Generally speaking, non-words are tokens that do not start with a letter of the alphabet. Examples of non-words: !mportant, 2U
(In rare cases, a corpus author may use a different definition for their corpus. Any such definition is part of the corpus configuration file.)
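
As a rough illustration of the default definition above, the check can be sketched in a few lines of Python. The function name `is_nonword` is hypothetical, and the sketch assumes that "letter of the alphabet" means any Unicode letter; it is not the actual implementation, and a corpus with a custom definition in its configuration file may behave differently.

```python
def is_nonword(token: str) -> bool:
    """Return True if the token does not start with a letter.

    Assumption: "letter" means any Unicode alphabetic character,
    as reported by str.isalpha(). An empty token counts as a non-word.
    """
    # token[:1] is the first character, or "" for an empty token;
    # "".isalpha() is False, so empty tokens are treated as non-words.
    return not token[:1].isalpha()


if __name__ == "__main__":
    for tok in ["!mportant", "2U", "important", "U2"]:
        label = "non-word" if is_nonword(tok) else "word"
        print(f"{tok}: {label}")
```

Note that only the first character matters under this definition: U2 starts with a letter and so counts as a word, while 2U does not.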