Deep learning 08--Neural style by Keras

Before I explain how to do it, I want to mention that generating artistic style with a deep neural network is different from image classification. We need to learn new concepts and add them to our toolbox. If you find it hard to understand the first time you see it, do not worry; I felt the same way. You can ask me questions or go to the fast.ai forum.

The paper presents an algorithm that generates an artistic-style image by combining two images with a convolutional neural network. Here are examples that combine source images (bird, dog, building) with style images such as starry, alice and tes_teach. From left to right: the style image, the source image, and the image produced by the convolutional neural network.

Let us begin implementing the algorithm. I assume you know how to install Keras, TensorFlow, NumPy, CUDA and other tools. I recommend Ubuntu 16.04.x as your OS; this could save you a lot of headaches when setting up your deep learning toolbox.

Unlike image classification, which can be done with the pure Sequential API of Keras, building a neural style network requires the Keras backend API.

from keras import backend as K
from keras import metrics
#vgg16_avg is assumed to be the VGG16_Avg module from the fast.ai course
import vgg16_avg

#content_img_arr and style_img_arr must have the same shape (content_shp),
#otherwise they cannot be concatenated along axis 0
content_base = K.variable(content_img_arr)
style_base = K.variable(style_img_arr)
gen_img = K.placeholder(content_shp)
batch = K.concatenate([content_base, style_base, gen_img], 0)
#Feed the batch into the vgg model. Every layer output now holds three
#activations along axis 0: content_base, style_base and gen_img, in that order.
#Unlike content_base and style_base, gen_img is a placeholder,
#which means we have to provide its data later on
model = vgg16_avg.VGG16_Avg(input_tensor=batch, include_top=False)
#build a dict that maps layer names to layer outputs
outputs = {l.name: l.output for l in model.layers}
#I prefer the first conv layer of blocks 1~3 as my style_layers,
#you can try a different range
style_layers = [outputs['block{}_conv1'.format(i)] for i in range(1, 4)]
content_layer = outputs['block4_conv2']

#A gram matrix collects the inner products (correlations) of all pairs
#of vectors in a set. Check wiki(https://en.wikipedia.org/wiki/Gramian_matrix)
#for more details
def gram_matrix(x):
    #change height, width, depth to depth, height, width; it could be 2,1,0 too,
    #maybe 2,0,1 is more efficient due to the underlying memory layout
    features = K.permute_dimensions(x, (2, 0, 1))
    #batch_flatten turns features into a 2D array of shape (depth, height*width)
    features = K.batch_flatten(features)
    return K.dot(features, K.transpose(features)) / x.get_shape().num_elements()

def style_loss(x, targ):
    return metrics.mse(gram_matrix(x), gram_matrix(targ))

def content_loss(base, gen):
    return metrics.mse(gen, base)
#Each entry l of style_layers holds the activations of the whole batch:
#l[1] is the output (activation) of style_base, l[2] is the output of gen_img.
#As the paper suggests, we sum the style loss over all chosen convolution layers
loss = sum([style_loss(l[1], l[2]) for l in style_layers])
#content_layer[0] is the output of content_base,
#content_layer[2] is the output of gen_img;
#this term is the loss between the content image and gen_img, scaled down by 10
loss += content_loss(content_layer[0], content_layer[2]) / 10.
#The loss depends on three images, but we only differentiate with
#respect to gen_img, because it is the only placeholder in the graph;
#the other two images are already fixed by K.variable
grad = K.gradients(loss, gen_img)
#loss and grad are symbolic tensors, so we cannot evaluate them directly.
#K.function turns them into a callable that takes a value for gen_img
#and returns the loss and the gradient, which we can feed to the solver
fn = K.function([gen_img], [loss] + grad)
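
To see what gram_matrix and style_loss compute without building a Keras graph, here is a minimal NumPy sketch of the same arithmetic. The names gram_matrix_np and style_loss_np are hypothetical stand-ins introduced only for this illustration, and the tiny 2x2x3 input is made up.

```python
import numpy as np

def gram_matrix_np(x):
    """NumPy stand-in for gram_matrix; x has shape (height, width, depth)."""
    # Move depth first, then flatten every channel into one row: (depth, height*width)
    features = np.transpose(x, (2, 0, 1)).reshape(x.shape[2], -1)
    # Inner product of every pair of channels, divided by the number of elements in x
    return features @ features.T / x.size

def style_loss_np(x, targ):
    """Mean squared difference between the two gram matrices."""
    return np.mean((gram_matrix_np(x) - gram_matrix_np(targ)) ** 2)

# A made-up activation with height=2, width=2, depth=3
x = np.arange(2 * 2 * 3, dtype=np.float64).reshape(2, 2, 3)
g = gram_matrix_np(x)
print(g.shape)              # one row and one column per channel
print(style_loss_np(x, x))  # identical activations give zero style loss
```

The gram matrix has one entry per pair of channels and no spatial axes, which is why it describes style (which features occur together) rather than where things are in the image.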

You may ask why we use fmin_l_bfgs_b instead of stochastic gradient descent. We could use SGD, but there is a better choice. Unlike image classification, we do not have many batches to run; we only need the loss and gradient for three fixed inputs (the source image, the style image and the random image), so the objective is the same at every step and fmin_l_bfgs_b is more than enough.
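
As a rough sketch of how a function that returns the loss and the gradient together can be handed to fmin_l_bfgs_b, the example below uses a toy quadratic in place of the real network. Here fn is only a stand-in with the same calling convention as the K.function above, and the Evaluator class, the shape and the target are assumptions made for this illustration, not the code used to produce the images.

```python
import numpy as np
from scipy.optimize import fmin_l_bfgs_b

shape = (1, 4, 4, 3)  # hypothetical tiny "image" shape
target = np.linspace(0.0, 1.0, 48).reshape(shape)

def fn(inputs):
    """Stand-in for K.function([gen_img], [loss] + grad): returns [loss, gradient]."""
    diff = inputs[0] - target
    return [0.5 * np.sum(diff ** 2), diff]

class Evaluator:
    """Lets one call to fn serve both the loss and the gradient callback."""
    def __init__(self, f, shp):
        self.f, self.shp = f, shp
        self.last_x, self.grad_values = None, None

    def loss(self, x):
        # The solver passes a flat float64 vector; reshape it back to image shape
        loss_value, self.grad_values = self.f([x.reshape(self.shp)])
        self.last_x = x.copy()
        return float(loss_value)

    def grads(self, x):
        # Reuse the cached gradient if it belongs to this x, otherwise recompute
        if self.last_x is None or not np.array_equal(x, self.last_x):
            self.loss(x)
        return np.asarray(self.grad_values, dtype=np.float64).ravel()

evaluator = Evaluator(fn, shape)
x0 = np.random.RandomState(0).uniform(-0.5, 0.5, shape)  # random starting image
x_opt, final_loss, info = fmin_l_bfgs_b(evaluator.loss, x0.ravel(),
                                        fprime=evaluator.grads, maxfun=50)
result = x_opt.reshape(shape)
print(final_loss)
```

With the real network, fn would be the K.function defined earlier and x0 a random image of shape content_shp. The solver works on flat float64 vectors, which is why the sketch reshapes on the way in and flattens on the way out.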