If you want the output in UD format, try setting "data_format": "ud" in the tag_output_prettifier section
of the configuration file you import (configs/morpho_tagger/UD2_0/morpho_ru_syntagrus_pymorphy.json in this case).
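A minimal sketch of what that section might contain; only the data_format key is taken from the text above, the other key names are illustrative assumptions, so check the actual config file:

```json
"tag_output_prettifier": {
    "class_name": "tag_output_prettifier",
    "data_format": "ud",
    "in": ["x_tokens", "y_predicted"],
    "out": ["y_prettified"]
}
```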

to apply a model which additionally utilizes information from
the Pymorphy2 library.

A results subdirectory will be created in the working directory of the deeppavlov module
(~/.deeppavlov by default), and the predictions will be written to the file ud_ru_syntagrus_test.res inside it.
You can change these paths in the corresponding sections of the configuration file.

To evaluate the ru_syntagrus model on the ru_syntagrus test subset, run

python -m deeppavlov evaluate morpho_ru_syntagrus_train

To retrain the model on the ru_syntagrus dataset, run one of the following commands
(the first one is for the Pymorphy-enriched model)

Morphological tagging consists in assigning labels that describe word
morphology to a pre-tokenized sequence of words.
In the simplest case these labels are just part-of-speech (POS)
tags, hence in the early days of NLP the task was
often referred to as POS tagging. The refined version of the problem
which we solve here performs a more fine-grained
classification, also detecting the values of other morphological
features, such as case, gender and number for nouns,
or mood and tense for verbs. Morphological tagging is a
stage of the common NLP pipeline: it generates useful
features for downstream tasks such as syntactic parsing, named entity
recognition or machine translation.

Typical output of morphological tagging looks as below. The examples
are for the Russian and English languages and use the
inventory of tags and features from the Universal Dependencies
project.
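An illustrative example for English (the sentence is hypothetical and the exact column layout may differ from the actual prettifier output, but the tags and features follow the UD inventory):

```
1	John	PROPN	Number=Sing
2	likes	VERB	Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin
3	apples	NOUN	Number=Plur
4	.	PUNCT	_
```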

The reader ignores the contents of all columns except numbers 2, 4 and 6
(the word itself, the POS label and the morphological tag). In the
default setting it expects the word in column 2, the POS label in
column 4 and the detailed tag description in column 6.
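The column convention can be sketched as follows (a hypothetical helper, not the actual DeepPavlov reader code):

```python
# Extract the three columns the reader uses from a 10-column CONLL-U line.
# Column numbers are 1-based, as in the documentation above.
def extract_word_pos_tag(line, word_col=2, pos_col=4, tag_col=6):
    """Return (word, POS label, morphological tag) from one CONLL-U line."""
    fields = line.rstrip("\n").split("\t")
    return fields[word_col - 1], fields[pos_col - 1], fields[tag_col - 1]

line = "2\tlikes\tlike\tVERB\tVBZ\tMood=Ind|Number=Sing|Person=3|Tense=Pres\t0\troot\t_\t_"
print(extract_word_pos_tag(line))
# -> ('likes', 'VERB', 'Mood=Ind|Number=Sing|Person=3|Tense=Pres')
```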

When annotating unlabeled text, our model expects the data in the
10-column UD format as well. However, it does not pay attention to any column except the first one,
which should be a number, and the second, which must contain a word.
You can also pass only the words, with exactly one word on each line,
by adding "from_words": True to the dataset_reader section.
Sentences are separated by blank lines.
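The minimal two-column input described above can be produced like this (a hypothetical helper, not part of DeepPavlov): a running number in column 1, the word in column 2, and a blank line between sentences.

```python
# Format tokenized sentences into the minimal prediction input:
# "<index>\t<word>" per line, sentences separated by blank lines.
def to_prediction_input(sentences):
    blocks = []
    for sent in sentences:
        blocks.append("\n".join("{}\t{}".format(i, w) for i, w in enumerate(sent, 1)))
    return "\n\n".join(blocks) + "\n"

print(to_prediction_input([["Hello", "world", "!"], ["Bye", "."]]))
```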

The character-level part implements the model from
Kim et al., 2015 ("Character-Aware Neural Language Models").
First it embeds the characters into dense vectors, then passes these
vectors through multiple
parallel convolutional layers and concatenates the output of these
convolutions. The convolution
output is propagated through a highway layer to obtain the final word
representation.
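The highway step can be sketched as follows; this is a toy scalar-per-dimension version of the standard formulation y = t * H(x) + (1 - t) * x (Srivastava et al.), not the network's actual implementation:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Highway layer: a gate t blends the transformed input H(x) with the
# untouched input x, dimension by dimension.
def highway(x, transform, gate):
    h = transform(x)  # H(x): nonlinear transform of the input
    t = gate(x)       # T(x): gate values in [0, 1]
    return [ti * hi + (1 - ti) * xi for xi, hi, ti in zip(x, h, t)]

x = [0.5, -1.0]
H = lambda v: [math.tanh(vi) for vi in v]
T = lambda v: [sigmoid(vi) for vi in v]
print(highway(x, H, T))
```

When the gate is fully closed (t = 0), the layer simply carries its input through unchanged, which is what makes deep stacks of such layers trainable.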

You can optionally use a morphological dictionary during tagging. In
this case our model builds
a 0/1 vector with ones at the positions corresponding to the
dictionary tags of the current word. This vector is
passed through a one-layer perceptron to obtain an embedding of
the dictionary information.
This embedding is concatenated with the output of the character-level
network.
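The 0/1 vector described above can be illustrated as follows (a hypothetical helper, not DeepPavlov's DictionaryVectorizer):

```python
# Build a 0/1 vector over the tag inventory, with ones at the tags
# the dictionary lists as possible for the given word.
def dictionary_vector(word, dictionary, tag_inventory):
    allowed = set(dictionary.get(word.lower(), []))
    return [1 if tag in allowed else 0 for tag in tag_inventory]

tags = ["NOUN", "VERB", "ADJ"]
dictionary = {"walk": ["NOUN", "VERB"]}
print(dictionary_vector("walk", dictionary, tags))  # -> [1, 1, 0]
```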

As the word-level network we utilize a bidirectional LSTM; its outputs
are projected through a dense
layer with a softmax activation. In principle, several BiLSTM layers
may be stacked, as well
as several convolutional or highway layers on the character level;
however, we did not observe
any significant gain in performance and therefore use a shallow
architecture.

The class_name field refers to the class MorphotaggerDatasetReader,
data_path contains the path to the data directory, and the language
field is used to derive the names of the training and development files.
Alternatively, you can specify these files separately by their full (absolute) paths,
like
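A sketch of such a reader section; the registered class name and the per-file parameter names are assumptions, so check them against the reader's documentation:

```json
"dataset_reader": {
    "class_name": "morphotagger_dataset_reader",
    "data_path": ["/data/ud/ru_syntagrus-train.conllu",
                  "/data/ud/ru_syntagrus-dev.conllu",
                  "/data/ud/ru_syntagrus-test.conllu"],
    "data_types": ["train", "dev", "test"]
}
```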

By default you need only the train file; the dev file is used to
validate
your model during training, and the test file is for model evaluation
after training. Since you need some validation data anyway, if you
lack the dev part
you need to resplit your data as described in the Dataset
Iterator section.

Your data should be in CONLL-U format. This also applies to predict mode, but in that case only the word
column is taken into account. If your data is in single-word-per-line format and you do not want to
reformat it, add "from_words": True to the dataset_reader section. You can also specify
which columns contain words, tags and detailed tags; for details see the
Documentation.

The chainer part of the configuration file contains the
specification of the neural network model and supplementary components such as vocabularies.
chainer refers to an instance of Chainer; see
config_description for a complete description.

The major part of chainer is pipe. The pipe contains
the vocabularies and the network itself, as well
as some pre- and post-processors. The first component lowercases the input
and normalizes it (see CapitalizationPreprocessor).
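The idea behind this step can be sketched as follows; this is an illustration of lowercasing plus capitalization features, an assumption about what CapitalizationPreprocessor does rather than its actual code:

```python
# Lowercase the word and keep its capitalization pattern as features,
# so the case information is not lost by the lowercasing.
def preprocess_word(word):
    features = {
        "first_capital": word[:1].isupper(),
        "all_capitals": len(word) > 1 and word.isupper(),
    }
    return word.lower(), features

print(preprocess_word("Moscow"))
# -> ('moscow', {'first_capital': True, 'all_capitals': False})
```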

If you want to utilize external morphological knowledge, you can do it in two ways.
The first is to use DictionaryVectorizer, which is instantiated from a dictionary file.
Each line of the dictionary file contains two columns:
a word and a space-separated list of its possible tags. Tags can be in any format. The config part for
DictionaryVectorizer looks as
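An illustrative dictionary file (the entries are hypothetical; the two-column layout, a word followed by a space-separated list of its possible tags, follows the description above):

```
walk NOUN VERB
walks NOUN VERB
walked VERB
```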

The second variant of an external morphological dictionary, available only for Russian,
is Pymorphy2. In this case the vectorizer lists all Pymorphy2 tags
for a given word and transforms them to the UD2.0 format using the
russian-tagsets library. The possible UD2.0 tags
are listed in a separate file distributed with the library. This part of the config looks as follows
(see config):

When an additional vectorizer is used, the first line is changed to
"in": ["x_processed", "x_possible_tags"] and an additional parameter
"word_vectorizers": [["#pymorphy_vectorizer.dim", 128]] is appended.