stopwords – a list of stopwords to be ignored during tokenizing/lemmatizing and ngram creation

batch_size – a batch size for spaCy buffering

ngram_range – the range of ngram sizes to create; only unigrams are returned by default

lemmas – whether to perform lemmatization

lowercase – whether to lowercase tokens; performed by default by the _tokenize()
and _lemmatize() methods

alphas_only – whether to filter out non-alphabetic tokens; performed by default by the
_filter() method

spacy_model – the name of the spaCy model to use; DeepPavlov looks this name up among
the downloaded spaCy models. The default model is en_core_web_sm, which is downloaded
automatically during DeepPavlov installation.
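To illustrate how these parameters interact, here is a minimal pure-Python sketch of the described pipeline — tokenize, lowercase, filter non-alphabetic tokens, drop stopwords, then build ngrams. The `preprocess` function and its naive regex tokenizer are hypothetical stand-ins, not the actual spaCy-backed DeepPavlov implementation:

```python
import re

def preprocess(docs, stopwords=(), ngram_range=(1, 1),
               lowercase=True, alphas_only=True):
    """Illustrative re-implementation of the parameter semantics."""
    stop = set(stopwords)
    result = []
    for doc in docs:
        # Naive tokenization; the real component uses a spaCy model.
        tokens = re.findall(r"\w+", doc)
        if lowercase:
            tokens = [t.lower() for t in tokens]
        if alphas_only:
            tokens = [t for t in tokens if t.isalpha()]
        tokens = [t for t in tokens if t not in stop]
        # Build all ngrams for each size in the requested range.
        lo, hi = ngram_range
        ngrams = []
        for n in range(lo, hi + 1):
            ngrams += [" ".join(tokens[i:i + n])
                       for i in range(len(tokens) - n + 1)]
        result.append(ngrams)
    return result

print(preprocess(["The cat sat on the mat!"],
                 stopwords=["the", "on"], ngram_range=(1, 2)))
# → [['cat', 'sat', 'mat', 'cat sat', 'sat mat']]
```

With the default `ngram_range=(1, 1)` only the unigrams `['cat', 'sat', 'mat']` would be returned.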

Tokenize or lemmatize a list of documents in Russian. The default models are the
ToktokTokenizer tokenizer and the pymorphy2 lemmatizer.
Returns a list of tokens or lemmas for each document.
If called on a List[str], performs detokenization.
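Detokenization reverses tokenization by joining tokens back into a string. The sketch below is a naive, hypothetical version that joins on spaces and reattaches trailing punctuation; the actual component's detokenization rules may differ:

```python
import re

def detokenize(tokens):
    """Join tokens into a string, reattaching punctuation marks
    that a tokenizer had split off as separate tokens."""
    text = " ".join(tokens)
    return re.sub(r"\s+([.,!?;:])", r"\1", text)

print(detokenize(["Привет", ",", "мир", "!"]))  # → "Привет, мир!"
```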

Parameters

stopwords – a list of stopwords to be ignored during tokenizing/lemmatizing and ngram creation

ngram_range – the range of ngram sizes to create; only unigrams are returned by default

lemmas – whether to perform lemmatization

lowercase – whether to lowercase tokens; performed by default by the _tokenize()
and _lemmatize() methods

alphas_only – whether to filter out non-alphabetic tokens; performed by default by the
_filter() method
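The `lemmas`, `lowercase`, and `stopwords` options combine as in the following toy sketch. The hand-written `lemma_table` is purely hypothetical; the real component queries pymorphy2's morphological analyzer instead:

```python
def lemmatize(tokens, lemma_table, lowercase=True, stopwords=()):
    """Toy lemmatizer: look each (optionally lowercased) token up in a
    small lemma table, fall back to the token itself, drop stopwords."""
    stop = set(stopwords)
    out = []
    for tok in tokens:
        if lowercase:
            tok = tok.lower()
        lemma = lemma_table.get(tok, tok)
        if lemma not in stop:
            out.append(lemma)
    return out

# Hypothetical lemma table standing in for pymorphy2 lookups.
table = {"мыла": "мыть", "раму": "рама"}
print(lemmatize(["Мама", "мыла", "раму"], table))
# → ['мама', 'мыть', 'рама']
```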