This component solves the DSTC 2 slot-filling task using Levenshtein search and different neural network architectures for NER.
To read about NER without slot filling, please refer to the NER documentation.
In most cases, the NER task can be formulated as:

Given a sequence of tokens (words, and maybe punctuation symbols)
provide a tag from a predefined set of tags for each token in the
sequence.

For the NER task there are some common types of entities used as tags:

- persons
- locations
- organizations
- expressions of time
- quantities
- monetary values

Furthermore, to distinguish adjacent entities with the same tag, many
applications use the BIO tagging scheme. Here “B” denotes the beginning
of an entity, “I” stands for “inside” and is used for all words
comprising the entity except the first one, and “O” means the absence of
an entity. An example with dropped punctuation:
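The tagged sentence itself is not reproduced here; a minimal illustration of the scheme, with tokens and tags chosen for this sketch rather than taken from the actual DSTC 2 data, might look like:

```python
# Illustrative BIO-tagged sentence (tokens and tags are assumed for
# this sketch; they are not drawn from the real dataset).
tokens = ["I", "want", "chinese", "food", "in", "the", "west", "part"]
tags = ["O", "O", "B-FOOD", "I-FOOD", "O", "O", "B-LOC", "I-LOC"]

# Print each token with its tag, one pair per line.
for token, tag in zip(tokens, tags):
    print(f"{token}\t{tag}")
```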

In the example above, FOOD is the food tag, LOC is the location tag, and
the “B-” and “I-” prefixes identify beginnings and continuations of
entities.

Slot Filling is a typical step after NER. It can be formulated as:

Given an entity of a certain type and a set of all possible values of
this entity type, provide a normalized form of the entity.

In this component, the Slot Filling task is solved by Levenshtein
Distance search across all known entities of a given type.

For example, there is an entity of “food” type:

chainese

It is definitely misspelled. The set of all known food entities is
{‘chinese’, ‘russian’, ‘european’}. The nearest known entity from the
given set is chinese. So the output of the Slot Filling system will be
chinese.
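The normalization step can be sketched as follows. This is a minimal illustration of Levenshtein-based search, not DeepPavlov's actual implementation; the `normalize` helper and the candidate set are assumptions for this example.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def normalize(value: str, known_values: set) -> str:
    """Return the known entity value closest to the observed one."""
    return min(known_values, key=lambda v: levenshtein(value, v))

print(normalize("chainese", {"chinese", "russian", "european"}))  # chinese
```

Since “chainese” is one edit (a single deletion) away from “chinese”, the search returns the correctly spelled value.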

The dataset reader is a class which reads and parses the data. It
returns a dictionary with three fields: “train”, “test”, and “valid”.
The basic dataset reader is “ner_dataset_reader”. The dataset reader
part of the config with “ner_dataset_reader” should look like:
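The exact fragment is not reproduced here; a sketch of its likely shape, written as a Python dict, with the key names and the data path being assumptions for illustration:

```python
# Hypothetical dataset_reader fragment of a DeepPavlov config
# (key names and the data path are assumed for illustration).
dataset_reader_config = {
    "dataset_reader": {
        "name": "ner_dataset_reader",
        "data_path": "path/to/dstc2_data"
    }
}

print(dataset_reader_config["dataset_reader"]["name"])
```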

The inputs and outputs must be specified in the pipe. “in” denotes a
regular input that is used in both inference and train modes. “in_y” is
used for training and usually contains ground-truth answers. The “out”
field stands for the model prediction. The model inside the pipe must
have an output variable named “y_predicted” so that “out” knows where to
get predictions.
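As a hedged illustration of how these fields fit together, a single pipe component entry might have the following shape (the component and variable names here are placeholders, not an exact DeepPavlov config):

```python
# Illustrative shape of one pipe component entry; "some_ner_model" and
# the variable names are placeholders for this sketch.
pipe_component = {
    "in": ["x_tokens"],       # regular input, used for inference and training
    "in_y": ["y_tags"],       # ground-truth labels, used only for training
    "out": ["y_predicted"],   # model prediction, referenced by "out"
    "name": "some_ner_model"
}

print(pipe_component["out"])
```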

The major part of “chainer” is “pipe”. The “pipe” contains the
pre-processing modules, vocabularies and model. However, we can use
existing pipelines:

This part initializes an already existing pre-trained NER module. The
only thing that needs to be specified is the path to the existing
config. The preceding lazy tokenizer serves to extract tokens from a raw
string of text.

The slotfiller takes the tags and tokens to perform normalization of the
extracted entities. The normalization is performed via fuzzy Levenshtein
search in the dstc_slot_vals dictionary. The output of this component is
a dictionary of slot values found in the input utterances.

The main part of the dstc_slotfilling component is the slot values
dictionary. The dictionary has the following structure:
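The structure itself is not reproduced above; a sketch of the expected shape, with slot names and spelling variants assumed for illustration, is a nested mapping from slots to normalized values to known variants:

```python
# Illustrative shape of a slot values dictionary: each slot maps
# normalized values to lists of spelling variants. The entries here
# are assumptions for this sketch, not the real dstc_slot_vals data.
slot_vals = {
    "food": {
        "chinese": ["chinese", "chainese", "chineese"],
        "russian": ["russian"]
    },
    "area": {
        "west": ["west", "western"]
    }
}

print(sorted(slot_vals))
```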

Please see an example of training a Slot Filling model and using it for
prediction:

from deeppavlov import build_model, configs

PIPELINE_CONFIG_PATH = configs.ner.slotfill_dstc2
slotfill_model = build_model(PIPELINE_CONFIG_PATH, download=True)
slotfill_model(['I would like some chinese food', 'The west part of the city would be nice'])

This example assumes that the working directory is the root of the
project.

An alternative approach to the slot-filling problem is a fuzzy search
for each instance of each slot value inside the text. This approach is
implemented in the slotfill_raw component. The component uses a
needle-in-a-haystack search.

The main advantage of this approach is the elimination of a separate
Named Entity Recognition module. However, the absence of a NER module
makes this model less robust to noise (words with similar spelling),
especially for long utterances.
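A rough sketch of such a search, scanning the utterance for the window that best matches each known slot value, can be written with the standard library's `difflib`. This is a simplification of the actual component; the `find_slot` helper and the 0.8 threshold are assumptions for this example.

```python
from difflib import SequenceMatcher

def find_slot(utterance: str, slot_vals: dict, threshold: float = 0.8) -> dict:
    """For each slot, return the known value whose best fuzzy match
    against any token window of the utterance exceeds the threshold."""
    tokens = utterance.lower().split()
    found = {}
    for slot, values in slot_vals.items():
        best_ratio, best_value = threshold, None
        for value in values:
            n = len(value.split())
            # Slide a window of the same token length over the utterance.
            for i in range(len(tokens) - n + 1):
                window = " ".join(tokens[i:i + n])
                ratio = SequenceMatcher(None, value, window).ratio()
                if ratio > best_ratio:
                    best_ratio, best_value = ratio, value
        if best_value is not None:
            found[slot] = best_value
    return found

print(find_slot("I would like some chainese food",
                {"food": ["chinese", "russian", "european"]}))
# {'food': 'chinese'}
```

Unlike the NER-based pipeline, every known value is compared against every window, which is why spelling noise in long utterances can produce spurious matches.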

Usage example:

from deeppavlov import build_model, configs

PIPELINE_CONFIG_PATH = configs.ner.slotfill_dstc2_raw
slotfill_model = build_model(PIPELINE_CONFIG_PATH, download=True)
slotfill_model(['I would like some chinese food', 'The west part of the city would be nice'])