
As noted above, the dataset was originally released by Bowman et al. 2015, who derived it from WordNet using some heuristics (and thus it might contain some errors or unintuitive pairings).

I've processed the data into three different train/test splits, in an effort to put some pressure on our models to actually learn these semantic relations, as opposed to exploiting regularities in the sample.

edge_disjoint: The train and dev edge sets are disjoint, but many words appear in both train and dev.

word_disjoint: The train and dev vocabularies are disjoint, and thus the edges are disjoint as well.

word_disjoint_balanced: Like word_disjoint, but with each word appearing at most once as the left word and at most once as the right word for a given relation type.

These are progressively harder problems:

For word_disjoint, there is real pressure on the model to learn abstract relationships, as opposed to memorizing properties of individual words.

For word_disjoint_balanced, the model can't even learn that some terms tend to appear more on the left or the right. This might be a step too far: appearing more often on the right for hypernym, for example, corresponds in a deep way with being a more general term, which is a non-trivial lexical property that we want our models to learn.

In [4]:

with open(wordentail_filename) as f:
    wordentail_data = json.load(f)

The outer keys are the three splits plus a list giving the vocabulary for the entire dataset:
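As a quick sanity check, one can inspect that structure directly. The following is only a sketch: the split names follow the text above, and the assumption that each split maps 'train' and 'dev' to lists of ((left_word, right_word), label) examples comes from the descriptions in this section, not from code shown here.

# Sketch: inspect the top-level keys and one split's structure.
print(sorted(wordentail_data.keys()))

word_disjoint = wordentail_data['word_disjoint']
print(len(word_disjoint['train']), len(word_disjoint['dev']))
print(word_disjoint['train'][0])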

There is still an important bias in the data: some words appear much more often than others, and in specific positions. For example, the very general term part appears on the right in a large number of cases, many of them hypernym.

To see how much our models are leveraging the uneven distribution of words across the left and right positions, we also have a split in which each word $w$ appears in at most one item $((w, w_{R}), y)$ and at most one item $((w_{L}, w), y)$.

The following tests establish that the dataset has the desired properties:
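The original test cells aren't reproduced here; the following is a minimal sketch of the kind of checks meant, under the same assumption as above that each split maps 'train' and 'dev' to lists of ((left, right), label) examples.

def split_vocab(examples):
    """All words appearing in ((left, right), label) examples."""
    return {w for pair, label in examples for w in pair}

def split_edges(examples):
    """All (left, right) pairs appearing in the examples."""
    return {tuple(pair) for pair, label in examples}

edge_split = wordentail_data['edge_disjoint']
word_split = wordentail_data['word_disjoint']

# edge_disjoint: train and dev share no edges (words may still recur):
assert not split_edges(edge_split['train']) & split_edges(edge_split['dev'])

# word_disjoint: train and dev share no words (hence no edges either):
assert not split_vocab(word_split['train']) & split_vocab(word_split['dev'])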

Even in deep learning, feature representation is the most important thing and requires care!
For our task, feature representation has two parts: representing the individual words and combining those representations into a single network input.

def randvec(w, n=50, lower=-1.0, upper=1.0):
    """Returns a random vector of length `n`. `w` is ignored."""
    return utils.randvec(n=n, lower=lower, upper=upper)

In [28]:

# Any of the files in glove.6B will work here:
glove50_src = os.path.join(glove_home, 'glove.6B.50d.txt')

# Creates a dict mapping strings (words) to GloVe vectors:
GLOVE50 = utils.glove2dict(glove50_src)

def glove50vec(w):
    """Return `w`'s GloVe representation if available, else return a random vector."""
    return GLOVE50.get(w, randvec(w, n=50))

Here we decide how to combine the two word vectors into a single representation. In more detail, where u is a vector representation of the left word and v is a vector representation of the right word, we need a function vector_combo_func such that vector_combo_func(u, v) returns a new input vector z of dimension m. A simple example is concatenation:

In [29]:

def vec_concatenate(u, v):
    """Concatenate np.array instances `u` and `v` into a new np.array."""
    return np.concatenate((u, v))
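Other combiners are easy to try as well. The following is one illustrative alternative, not part of the baseline: it keeps both word vectors and adds their elementwise difference.

def vec_diff_concatenate(u, v):
    """Concatenate `u`, `v`, and their elementwise difference `u - v`.
    An illustrative alternative to plain concatenation."""
    return np.concatenate((u, v, u - v))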

And then we run the experiment with nli.bakeoff_experiment. This trains and tests on all three splits, and additionally trains on word_disjoint's train portion and tests on word_disjoint_balanced's dev portion, to see what distribution of examples is more effective for this balanced evaluation.

Since the bake-off focus is word_disjoint, you might want to run just that evaluation. To do that, use:
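The exact call isn't reproduced in this section, so the following is only a sketch: the argument names, and in particular the conditions keyword, are assumptions to check against nli.bakeoff_experiment's actual signature in nli.py.

# Hypothetical restricted run; `conditions` and the other keyword names
# are assumptions, not the documented signature:
nli.bakeoff_experiment(
    wordentail_data,
    vector_func=glove50vec,
    vector_combo_func=vec_concatenate,
    conditions=['word_disjoint'])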

For the methods, the only requirement is that they differ in some way from the baseline above. They don't have to be completely different, though. For example, you might want to stick with the same model but represent the examples differently, or the reverse.

You must train only on the train split. No outside training instances can be brought in. You can, though, bring in outside information via your input vectors, as long as this information is not from dev or edge_disjoint.

You can also augment your training data. For example, if ((A, B), synonym) is a training instance, then so is ((B, A), synonym). Similarly, if ((A, B), hyponym) and ((B, C), hyponym) are training cases, then ((A, C), hyponym) can be added as well.
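A sketch of that augmentation idea, assuming ((left, right), label) training examples and the relation labels named above. A single pass over the hyponym pairs only adds length-2 chains, so it would need to be repeated to approximate the full transitive closure.

def augment(train):
    """Add symmetric synonym pairs and one step of hyponym transitivity."""
    examples = {(tuple(pair), label) for pair, label in train}
    # Synonymy is symmetric: (A, B) licenses (B, A).
    for (a, b), label in list(examples):
        if label == 'synonym':
            examples.add(((b, a), 'synonym'))
    # Hyponymy is transitive: (A, B) and (B, C) license (A, C).
    hypo = {(a, b) for (a, b), label in examples if label == 'hyponym'}
    for a, b in hypo:
        for c, d in hypo:
            if b == c and a != d:
                examples.add(((a, d), 'hyponym'))
    return [(list(pair), label) for pair, label in examples]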

Since the evaluation is for word_disjoint, you're not going to get very far with random input vectors! A GloVe featurizer is defined above. Feel free to look around for new word vectors on the Web, or even train your own using our VSM notebooks.

You're not required to stick to TfShallowNeuralNetwork. For instance, you could create deeper feed-forward networks, change how they optimize, etc. As long as you have fit and predict methods with the same input and output types as our networks, you should be able to use bakeoff_experiment. For notes on how to extend the TensorFlow models included in this repository, see tensorflow_models.ipynb.
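To make the fit/predict requirement concrete, here is a hedged sketch of an alternative classifier built on scikit-learn. Whether its input and output types line up exactly with what bakeoff_experiment passes around depends on details not shown in this section, so treat it as an illustration rather than a drop-in replacement.

from sklearn.neural_network import MLPClassifier

class SklearnShallowNetwork:
    """Illustrative alternative model exposing the fit/predict interface.
    Assumes `X` is a 2d array of combined word vectors and `y` holds the
    corresponding labels; adjust to match the course networks."""

    def __init__(self, hidden_dim=50, max_iter=500):
        self.model = MLPClassifier(
            hidden_layer_sizes=(hidden_dim,), max_iter=max_iter)

    def fit(self, X, y):
        self.model.fit(X, y)
        return self

    def predict(self, X):
        return self.model.predict(X)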
