
Using Vowpal Wabbit : Tips and Tricks

As I play around more with the machine learning toolkit Vowpal Wabbit, I keep running into subtle flags and functionality in the toolkit.

Here is an effort to aggregate my learnings from using this toolkit.

Tips:

-q [ --quadratic ] arg

Create and use quadratic features

-q is a very powerful option. It takes as an argument a pair of letters, and its effect is to create interactions between the features of two namespaces. Suppose each example has a namespace user and a namespace document; then specifying -q ud will create an interaction feature for every pair of features (x, y) where x is a feature from the user namespace and y is a feature from the document namespace. If a letter matches more than one namespace, then all the matching namespaces are used. In our example, if there is another namespace url, then interactions between url and document will also be modeled.

The letter : is a wildcard that interacts with all namespaces. -q a: (or -q :a) will create an interaction feature for every pair of features (x, y) where x is a feature from a namespace starting with a and y is a feature from any namespace. -q :: would interact every possible pair of features across all namespaces.
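As a rough sketch, here is what a two-namespace example might look like in VW's input format, together with a matching invocation (the file name train.vw and all feature names are made up for illustration):

```
# train.vw -- one example with a user namespace and a document namespace
1 |user age_25 region_us |document topic_sports length_long

# -q ud crosses every user feature with every document feature,
# yielding 2 x 2 = 4 interaction features for this example
vw -d train.vw -q ud
```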

--print

Use this to see how VW parses your input: instead of learning, it prints each example back with the features it constructed, which helps you understand where the reported feature counts come from.

Feature ‘116060’

This is a constant feature with value 1 that essentially captures the intercept term in a linear model. You may come across this feature if you look closely into the VW output. When using contextual bandit mode, you will notice it gets added automatically per action.
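For example, a run like the following (train.vw is a placeholder file name) echoes each example back as parsed rather than training on it:

```
vw -d train.vw --print
```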

Output Feature Ranking and Feature Weights Using VW

Is it possible to output the feature rankings after every update ?

try --audit
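As a sketch (train.vw is a placeholder file name), --audit makes VW print, for each example it processes, the name, hash index, value, and current weight of every feature:

```
vw -d train.vw --audit
```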

Is it possible to output the feature rankings at the end of training ?

use a combination of --readable_model foo and --l1 1e-3. Any features surviving a high level of L1 regularization must be important according to the gradient.
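A minimal sketch of that recipe, with placeholder file names:

```
# train with strong L1 regularization and dump the surviving
# weights to the human-readable model file foo
vw -d train.vw --l1 1e-3 --readable_model foo
```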

Is it possible to output the feature weights after every update ?

possible, but this will be expensive. In between examples you can put in a special example with a ‘tag’ that says save_<filename>.
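As a rough sketch of the idea (the exact tag syntax may vary by VW version, and model.ckpt is a made-up file name), the input stream would interleave a featureless example whose tag starts with save_ between regular examples:

```
1 |f a b c
save_model.ckpt|
-1 |f d e f
```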

Is it possible to output the feature weights at the end of training ?

that’s what -f does. If you want it in a readable format then use --readable_model.
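For example (all file names are placeholders):

```
# -f writes the final model in binary form;
# --readable_model additionally writes a plain-text version
vw -d train.vw -f model.bin --readable_model model.txt
```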