```python
import numpy as np

def sentences_to_indices(X, word_to_index, max_len):
    m = X.shape[0]  # number of training examples

    # Initialize X_indices as a numpy matrix of zeros with the correct shape
    X_indices = np.zeros((m, max_len))

    for i in range(m):  # loop over training examples
        # Convert the ith training sentence to lower case and split it into words.
        # You should get a list of words.
        sentence_words = X[i].lower().split()

        # Initialize j to 0
        j = 0

        # Loop over the words of sentence_words
        for w in sentence_words:
            # Set the (i, j)th entry of X_indices to the index of the correct word
            X_indices[i, j] = word_to_index[w]
            # Increment j
            j += 1

    return X_indices
```
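As a quick sanity check, a hypothetical call might look like this (the toy sentences and max_len are made up for illustration; word_to_index is assumed to map each word to its index in the GloVe vocabulary):

```python
X1 = np.array(["funny lol", "lets play baseball", "food is ready for you"])
X1_indices = sentences_to_indices(X1, word_to_index, max_len=5)
print(X1_indices)
# Each row holds the word indices of one sentence, zero-padded on the right to max_len.
```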

Next, we implement the pretrained Embedding layer by loading the trained embedding matrix into the weights of the Embedding() layer:


```python
from keras.layers import Embedding

def pretrained_embedding_layer(word_to_vec_map, word_to_index):
    vocab_len = len(word_to_index) + 1  # adding 1 to fit Keras embedding (requirement)
    emb_dim = word_to_vec_map["cucumber"].shape[0]  # dimensionality of the GloVe word vectors (= 50)

    # Initialize the embedding matrix as a numpy array of zeros of shape (vocab_len, emb_dim)
    emb_matrix = np.zeros((vocab_len, emb_dim))

    # Set each row "index" of the embedding matrix to be the word vector
    # representation of the "index"th word of the vocabulary
    for word, index in word_to_index.items():
        emb_matrix[index, :] = word_to_vec_map[word]

    # Define the Keras embedding layer with the correct input/output sizes.
    # Set trainable=False so the pretrained vectors stay frozen.
    embedding_layer = Embedding(vocab_len, emb_dim, trainable=False)

    # Build the embedding layer; this is required before setting its weights.
    # Do not modify the "None".
    embedding_layer.build((None,))

    # Set the weights of the embedding layer to the embedding matrix.
    # The layer is now pretrained.
    embedding_layer.set_weights([emb_matrix])

    return embedding_layer
```
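A sketch of how the layer might be built, assuming the GloVe vectors have already been loaded into word_to_index and word_to_vec_map (the read_glove_vecs helper and file path below are assumptions, not part of this section):

```python
# Hypothetical loading step; read_glove_vecs and the path are assumptions.
word_to_index, index_to_word, word_to_vec_map = read_glove_vecs('data/glove.6B.50d.txt')

embedding_layer = pretrained_embedding_layer(word_to_vec_map, word_to_index)
# Inspect one entry of the frozen embedding matrix
print("weights[0][1][3] =", embedding_layer.get_weights()[0][1][3])
```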

Building the model

Next we build the model, which consists of the following layers:

Input layer: Input(shape=(max_len,), dtype='int32') (the batch size m is implicit and is not part of the shape)

LSTM layer: LSTM(hidden_units, return_sequences=...)(embeddings)

Dropout layer: Dropout(rate)(X) (note that Keras's rate is the fraction of units dropped, not a keep probability)

Dense (fully connected) layer: Dense(output_dimension)(X)

Activation layer: Activation(activation_func)(X)


```python
from keras.layers import Input, LSTM, Dropout, Dense, Activation
from keras.models import Model

def Emojify_V2(input_shape, word_to_vec_map, word_to_index):
    # Define sentence_indices as the input of the graph; it has shape
    # input_shape and dtype 'int32' (as it contains word indices).
    sentence_indices = Input(input_shape, dtype='int32')

    # Create the embedding layer pretrained with GloVe vectors
    embedding_layer = pretrained_embedding_layer(word_to_vec_map, word_to_index)

    # Propagate sentence_indices through the embedding layer to get the embeddings
    embeddings = embedding_layer(sentence_indices)

    # Propagate the embeddings through an LSTM layer with a 128-dimensional hidden state.
    # Be careful: here the returned output should be a batch of sequences.
    X = LSTM(128, return_sequences=True)(embeddings)
    # Add dropout with a rate of 0.5
    X = Dropout(0.5)(X)
    # Propagate X through another LSTM layer with a 128-dimensional hidden state.
    # This time the returned output should be a single hidden state, not a batch of sequences.
    X = LSTM(128, return_sequences=False)(X)
    # Add dropout with a rate of 0.5
    X = Dropout(0.5)(X)
    # Propagate X through a Dense layer to get back a batch of 5-dimensional vectors
    X = Dense(5)(X)
    # Add a softmax activation
    X = Activation('softmax')(X)

    # Create the Model instance which converts sentence_indices into X
    model = Model(inputs=sentence_indices, outputs=X)

    return model
```
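A minimal sketch of how the model might be compiled and trained, assuming max_len and the training arrays X_train/Y_train are already defined (those names, the hyperparameters, and the convert_to_one_hot helper are assumptions here):

```python
model = Emojify_V2((max_len,), word_to_vec_map, word_to_index)
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

# Convert sentences to padded index arrays and labels to one-hot vectors.
X_train_indices = sentences_to_indices(X_train, word_to_index, max_len)
Y_train_oh = convert_to_one_hot(Y_train, C=5)  # convert_to_one_hot is an assumed helper

model.fit(X_train_indices, Y_train_oh, epochs=50, batch_size=32, shuffle=True)
```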