When we build classifiers for database applications, we often encounter
nominal features. A nominal feature captures a quality of an object as one
of a set of unordered categories, for example the country a person is from.
As designers, we must make sure our classifiers see an identical
representation of nominal features in both training and execution.

In this article, we walk through a database application example that
explains how to deal with nominal data representation.

>> b*pr
??? Error using ==> sdexe
Nominal representations in data set and pipeline do not agree! Use sdnominal to
validate and/or update nominal representation.

What is the cause of the reported error?

The nominal representations in data set b and classifier pr do not
match. In other words, b and pr use different numerical codes for the
nominal categories. Applying pr to b would therefore yield meaningless
results.
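
To see why mismatched coding is harmful, here is a minimal sketch in plain
Python. It is not perClass code: the helper build_codes is hypothetical, and
the category lists are made up for illustration, apart from
Amer-Indian-Eskimo, which appears in our test set. It assumes each data set
assigns integer codes to the categories it happens to contain.

```python
def build_codes(values):
    """Assign codes 1..n to the distinct categories, in sorted order."""
    return {name: code for code, name in enumerate(sorted(set(values)), start=1)}


# Illustrative values only; the real data sets contain more categories.
train_race = ["White", "Black", "Asian-Pac-Islander", "White"]
test_race = ["White", "Amer-Indian-Eskimo", "Black"]

train_codes = build_codes(train_race)
test_codes = build_codes(test_race)

# Invert the mappings so that we can look up what each code means.
train_names = {code: name for name, code in train_codes.items()}
test_names = {code: name for name, code in test_codes.items()}

# The same code refers to different categories in the two data sets.
print(train_names[1], "vs", test_names[1])
```

Both data sets use the codes 1 to 3, yet code 1 stands for a different
category in each. A classifier trained on the first coding would silently
misread the second, which is why refusing to execute is the safer behavior.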

Comparing the detailed nominal information of data set b and classifier
pr (see previous section) shows, for example, that our test set contains
the race category Amer-Indian-Eskimo. The classifier does not know this
category because it was not present in the training data set a.
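
The comparison itself amounts to a set difference between the categories
seen in training and those present in the test data. The following plain
Python sketch illustrates the idea; it does not use perClass, the function
unknown_categories is a hypothetical helper, and the category lists are
illustrative.

```python
def unknown_categories(train_categories, test_values):
    """Return the categories in test_values that training never saw, sorted."""
    known = set(train_categories)
    return sorted(set(test_values) - known)


# Illustrative values; only Amer-Indian-Eskimo is taken from our example.
train_categories = ["White", "Black", "Asian-Pac-Islander"]
test_values = ["White", "Amer-Indian-Eskimo", "Black", "White"]

missing = unknown_categories(train_categories, test_values)
print(missing)
```

An empty result would mean that every test category is known to the
classifier, although the numerical codes could still differ.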

We have seen how to pull data from an SQL database and train a classifier
on nominal features. perClass helps us make sure that the nominal
representation used in our data sets and classifiers is identical.

TIP: To avoid the need for re-training our classifiers, we should define
our data sets with the complete set of nominal categories for each
attribute and use the same setup in the entire project.
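
The tip can be sketched in plain Python as follows. This is an illustration
under stated assumptions, not the perClass mechanism: RACE_CATEGORIES is an
assumed complete category list and encode is a hypothetical helper. The
point is that one fixed list, defined once for the whole project, drives the
coding of every data set.

```python
# Assumed complete list of categories, defined once for the whole project.
RACE_CATEGORIES = (
    "Amer-Indian-Eskimo",
    "Asian-Pac-Islander",
    "Black",
    "Other",
    "White",
)
RACE_CODES = {name: code for code, name in enumerate(RACE_CATEGORIES, start=1)}


def encode(values, codes=RACE_CODES):
    """Map category names to their fixed codes; fail loudly on unknown names."""
    try:
        return [codes[value] for value in values]
    except KeyError as err:
        raise ValueError(f"unknown category: {err.args[0]}") from None


# Training and test data share the coding even if a category is absent from one.
train_encoded = encode(["White", "Black", "White"])
test_encoded = encode(["White", "Amer-Indian-Eskimo"])
```

Because the codes no longer depend on which categories a particular data set
contains, a category that first shows up at execution time already has a
code, and the classifier does not need to be re-trained for the
representations to agree.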