Working with Text-Based Distances
The nearest neighbors algorithm is more versatile than just dealing with numbers: as long as we have a way to measure distances between features, we can apply it. In this recipe, we will introduce how to measure text distances with TensorFlow.

Getting ready
In this recipe, we will illustrate how to use TensorFlow's text distance metric, the Levenshtein distance (also known as the edit distance), to measure the distance between strings. This will be important later in this chapter as we expand the nearest neighbor methods to include features with text.

The Levenshtein distance is the minimal number of edits needed to get from one string to another. The allowed edits are inserting a character, deleting a character, or substituting one character with a different one. For this recipe, we will use TensorFlow's Levenshtein distance function, edit_distance(). It is worth illustrating the use of this function here because it will come up again in later chapters.
Note that TensorFlow's edit_distance() function only accepts sparse tensors, so we will have to create our strings as sparse tensors of individual characters.
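
To make the definition concrete, here is a minimal pure-Python sketch of the classic dynamic-programming computation of the Levenshtein distance. This is for illustration only and is not how TensorFlow implements edit_distance():

def levenshtein(s1, s2):
    # dist[i][j] = number of edits needed to turn s1[:i] into s2[:j]
    dist = [[0] * (len(s2) + 1) for _ in range(len(s1) + 1)]
    for i in range(len(s1) + 1):
        dist[i][0] = i          # delete all of s1[:i]
    for j in range(len(s2) + 1):
        dist[0][j] = j          # insert all of s2[:j]
    for i in range(1, len(s1) + 1):
        for j in range(1, len(s2) + 1):
            sub_cost = 0 if s1[i - 1] == s2[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j] + 1,             # deletion
                             dist[i][j - 1] + 1,             # insertion
                             dist[i - 1][j - 1] + sub_cost)  # substitution
    return dist[-1][-1]

print(levenshtein('bear', 'beers'))  # prints 2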

How to do it…
1. First, we load TensorFlow and initialize a graph session:

import tensorflow as tf
sess = tf.Session()

2. Then we will show how to calculate the edit distance between two words, 'bear' and 'beers'. First, we create a list of characters from each string with Python's list() function. Next, we create a sparse 3D matrix from each list. We have to tell TensorFlow the character indices, the shape of the matrix, and which characters we want in the tensor. After this, we can decide whether we would like the total edit distance (normalize=False) or the normalized edit distance (normalize=True), where we divide the edit distance by the length of the truth string:
TensorFlow's documentation treats the two strings as a proposed (hypothesis) string and a ground truth string. We will continue this notation here with h and t tensors.
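
The code itself does not appear in this excerpt; the following is a minimal sketch consistent with the text above and the output shown in the next step (the tensor names h1 and t1 are our choice):

hypothesis = list('bear')
truth = list('beers')
# Each index is a (batch, word, character) position for a single character
h1 = tf.SparseTensor([[0,0,0], [0,0,1], [0,0,2], [0,0,3]],
                     hypothesis, [1,1,4])
t1 = tf.SparseTensor([[0,0,0], [0,0,1], [0,0,2], [0,0,3], [0,0,4]],
                     truth, [1,1,5])
# normalize=False returns the raw number of edits
print(sess.run(tf.edit_distance(h1, t1, normalize=False)))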

3. This results in the following output:
[[ 2.]]
The SparseTensor() function is a way to create a sparse tensor in TensorFlow. It accepts the indices, the values, and the shape of the sparse tensor we wish to create.

4. Next, we will illustrate how to compare two words, bear and beer, both with another word, beers. In order to do this, we must replicate beers so that we have the same number of hypothesis and truth words:
hypothesis2 = list('bearbeer')
truth2 = list('beersbeers')
# Shape [1, 2, 4]: one batch entry, two hypothesis words, at most four characters each
h2 = tf.SparseTensor([[0,0,0], [0,0,1], [0,0,2], [0,0,3], [0,1,0], [0,1,1], [0,1,2], [0,1,3]], hypothesis2, [1,2,4])
# Shape [1, 2, 5]: the two truth words have five characters each
t2 = tf.SparseTensor([[0,0,0], [0,0,1], [0,0,2], [0,0,3], [0,0,4], [0,1,0], [0,1,1], [0,1,2], [0,1,3], [0,1,4]], truth2, [1,2,5])
print(sess.run(tf.edit_distance(h2, t2, normalize=True)))
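
This should print the normalized distances, approximately [[ 0.4  0.2 ]]: bear needs two edits to become beers (two edits divided by the five characters of beers gives 0.4), while beer needs only one (1/5 = 0.2).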

How it works…
For this recipe, we have shown how to measure text distances using TensorFlow. This will be extremely useful for performing nearest neighbors on data that has text features. We will see more of this later in the chapter when we perform address matching.

There's more…
Other text distance metrics exist that are worth knowing. The following definitions describe several common text distances between two strings, s1 and s2:

Hamming distance: the number of positions at which the corresponding characters of s1 and s2 differ; it is only defined when the two strings have the same length.
Cosine distance: one minus the cosine similarity between the k-gram count vectors of s1 and s2.
Jaccard distance: one minus the ratio of the number of characters the two strings have in common to the total number of distinct characters across both strings.
Computing with Mixed Distance Functions
When dealing with data observations that have multiple features, we should be aware that features can be on very different scales. In this recipe, we account for that in the distance function to improve our housing value predictions.

Getting ready
It is important to extend the nearest neighbor algorithm to take into account variables that are scaled differently. In this example, we will show how to scale the distance function for different variables. Specifically, we will scale the distance function as a function of the spread of each feature (its standard deviation).
The key to weighting the distance function is to use a weight matrix. The distance function written with matrix operations becomes the following formula:

D(x, y) = √((x − y)ᵀ A (x − y))

Here, A is a diagonal weight matrix that we use to scale the distance metric for each feature.
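
As a quick sanity check on this formula, here is a minimal NumPy sketch that computes the weighted distance for a single pair of points (the feature values and weights below are made up purely for illustration):

import numpy as np

x = np.array([0.2, 0.8, 0.5])   # hypothetical test point
y = np.array([0.3, 0.4, 0.9])   # hypothetical training point
A = np.diag([1.5, 0.5, 1.0])    # hypothetical diagonal weight matrix

diff = x - y
# D(x, y) = sqrt((x - y)^T A (x - y))
print(np.sqrt(diff.dot(A).dot(diff)))
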
For this recipe, we will try to improve our MSE on the Boston housing value dataset. This dataset is a great example of features that are on different scales, and the nearest neighbor algorithm would benefit from scaling the distance function.

How to do it…
1. First, we will load the necessary libraries and start a graph session:
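
(The import block is not shown in this excerpt; the following is a minimal sketch consistent with the code used later in the recipe, using NumPy for data handling, matplotlib for the final plot, and the TensorFlow 1.x graph API.)

import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf

sess = tf.Session()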

3. Now we scale the x values to be between 0 and 1 with min-max scaling:
# ptp(0) is the column-wise range (max - min) of each feature
x_vals = (x_vals - x_vals.min(0)) / x_vals.ptp(0)

4. We now create the diagonal weight matrix that will provide the scaling of the distance metric by the standard deviation of the features:

weight_diagonal = x_vals.std(0)
weight_matrix = tf.cast(tf.diag(weight_diagonal), dtype=tf.float32)

7. Now we can declare our distance function. For readability, we break the distance function up into its components. Note that we have to tile the weight matrix by the batch size and use the batch_matmul() function to perform batch matrix multiplication:

# Shape: [batch_size, num_train_points, num_features]
subtraction_term = tf.sub(x_data_train, tf.expand_dims(x_data_test, 1))
# Multiply each (x - y) row by the diagonal weight matrix A
first_product = tf.batch_matmul(subtraction_term, tf.tile(tf.expand_dims(weight_matrix, 0), [batch_size, 1, 1]))
# (x - y) A (x - y)^T; the diagonal holds the squared weighted distances
second_product = tf.batch_matmul(first_product, tf.transpose(subtraction_term, perm=[0,2,1]))
distance = tf.sqrt(tf.batch_matrix_diag_part(second_product))

8. After we calculate all the training distances for each test point, we need to return the top k nearest neighbors. We do this with the top_k() function. Since this function returns the largest values, and we want the smallest distances, we take the top k of the negated distance values. We then make predictions as the weighted average of the target values of the top k neighbors:

top_k_xvals, top_k_indices = tf.nn.top_k(tf.neg(distance), k=k)
x_sums = tf.expand_dims(tf.reduce_sum(top_k_xvals, 1), 1)
x_sums_repeated = tf.matmul(x_sums, tf.ones([1, k], tf.float32))
# Normalize the negated distances so the weights sum to one
x_val_weights = tf.expand_dims(tf.div(top_k_xvals, x_sums_repeated), 1)
top_k_yvals = tf.gather(y_target_train, top_k_indices)
prediction = tf.squeeze(tf.batch_matmul(x_val_weights, top_k_yvals), squeeze_dims=[1])

12. As a final comparison, we can plot the distribution of housing values for the actual test set and the predictions on the test set with the following code:

bins = np.linspace(5, 50, 45)
plt.hist(predictions, bins, alpha=0.5, label='Prediction')
plt.hist(y_batch, bins, alpha=0.5, label='Actual')
plt.title('Histogram of Predicted and Actual Values')
plt.xlabel('Med Home Value in $1,000s')
plt.ylabel('Frequency')
plt.legend(loc='upper right')
plt.show()

Figure 3: The two histograms of the predicted and actual housing values on the Boston dataset. This time we have scaled the distance function differently for each feature.

How it works…
We decreased our MSE on the test set by introducing a method of scaling the distance function for each feature. Here, we scaled the distance function by a factor of each feature's standard deviation. This gives a more accurate measure of which points are truly the closest neighbors. From the top k neighbors, we then took a distance-weighted average of their target values to get the housing value prediction.

There's more…
This scaling factor can also be used to down-weight or up-weight features in the nearest neighbor distance calculation. This can be useful in situations where we trust some features more or less than others.