For this next writeup, I'll show how to take the same model and prepare it for use in a mobile app. I only have experience with iOS devices and only have an iPhone for testing, but the process of extracting, modifying, and serializing the computation graphs should apply to Android deployments as well.

Here is a video capture of the app running on my development device, an iPhone 5s. BTW, all food in the screenshots and video is vegan! ;)

I originally trained the model using Tensorflow 0.11.0 and Keras 1.1.2.

For this project, I am using the newer Tensorflow 1.0.1 and Keras 1.2.2.

I am not aware of any incompatibilities when taking a model trained with an older version of Tensorflow and using it for inference in a newer version. However, I could be wrong. Keras 2 has also come out since I started this project, and I have not had time to test with it.

I consider the code here to be very hacky! There is not much documentation online about preparing Keras models for Mobile Tensorflow apps. I am also not an experienced iOS developer. I mainly wanted to prove to myself that this can work, then refine my approach in the future. I would appreciate feedback on any of these issues:

When running the app on the device, the inference randomly stops working. The video feed still updates, but no more predictions are made. I can't seem to find a way to reproduce this issue reliably, but it is very common. I noticed a recent open issue that may be related: Tensorflow freezes on iOS during Session::Run

I do not know if I am getting full performance from Tensorflow on iOS. I am doing a standard build of Tensorflow on my MacBook Pro. There are apparently some undocumented flags that can turn on optimization, but I don't know if they apply to the current version. In any case, I do seem to achieve 1-1.5 sec per inference on my iPhone 5s.

The level of accuracy that I achieved in my previous writeup depended on 10-crops at particular positions in an image. I am sticking with whatever came with the example app sample code to handle resizing of a single crop. I don't know how the portrait orientation of the camera affects accuracy when resizing the image to the 299x299 size needed by the InceptionV3 network.

I don't know if I'm dealing with dropout properly. As a result, the difference between the original model predictions and the modified model predictions is slightly higher than I would have thought. In a production system, I would definitely want to run through my test images on the device in order to compare test set accuracy with the original model.

I wasn't able to get weight quantization to work properly. I may need to manually find the min/max ranges.

I am getting non-deterministic results when evaluating the optimized models from disk to compare predictions.
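To make the comparison between the original and modified model predictions concrete, here is a minimal sketch of the kind of check I have in mind. The function name and the sample probabilities are made up for illustration; in practice the two lists would be the softmax outputs of each model for the same image.

```python
def compare_predictions(original, modified):
    """Return (largest absolute difference, whether the top-1 class agrees)."""
    if len(original) != len(modified):
        raise ValueError("prediction vectors must have the same length")
    max_abs_diff = max(abs(a - b) for a, b in zip(original, modified))
    # Index of the highest-probability class in each vector
    top1_original = max(range(len(original)), key=original.__getitem__)
    top1_modified = max(range(len(modified)), key=modified.__getitem__)
    return max_abs_diff, top1_original == top1_modified

# Hypothetical 3-class outputs; the real model produces 101 probabilities
original_probs = [0.70, 0.20, 0.10]
modified_probs = [0.68, 0.21, 0.11]
max_diff, same_top1 = compare_predictions(original_probs, modified_probs)
```

Running this over a whole test set and tracking how often the top-1 class changes would say more than a single image does.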

Before trying to replicate what I have done here, know that there are probably better ways of doing this!

Here are some resources that can help you explore other paths, some that achieve much better performance than what I have here.

Squeezing Deep Learning Into Mobile Phones, excellent slides by Anirudh Koul that summarize various options available for mobile Deep Learning apps. I like how he breaks it down depending on how much time you want to invest (1 day, 1 week, 1 month, 6 months, etc.)

And though I don't believe it supports iOS at the moment, keep an eye out for the Tensorflow XLA compiler. In the future, we might be able to do mobile-specific builds, which will allow us to execute our computation graphs on a mobile device without having to ship the entire Tensorflow inference library. This could allow for dramatic size reductions and possibly speedups!

All in all, Deep Learning on mobile is looking bright! Hopefully it becomes easier and more straightforward to get your trained models running efficiently on a device.

Looking through the different nodes, we can see that some nodes take a Switch node as an input. In some cases, there is a :1 appended to the actual name of the node. I believe this stands for the output of a Switch node when it is true. If anyone reading this knows for sure what that means, let me know. Since there are no nodes with that name, through trial and error, I was able to get the model to run by routing those to the actual Switch node name.

Below, we change the inputs of all nodes that take in a Switch:1 input to remove the last 2 characters:
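A minimal sketch of that rewrite follows. It uses simple stand-in objects instead of a real GraphDef so it can run without Tensorflow installed; the helper name and the node names are hypothetical. The same loop should work on graph_def.node, since each node's input field supports indexed assignment, but treat that as an assumption to verify against your Tensorflow version.

```python
from types import SimpleNamespace

def strip_switch_output_suffix(nodes, suffix="Switch:1"):
    """Rewrite inputs ending in `suffix` so they name the Switch node itself."""
    changed = 0
    for node in nodes:
        for i, name in enumerate(node.input):
            if name.endswith(suffix):
                node.input[i] = name[:-2]  # remove the last 2 characters (":1")
                changed += 1
    return changed

# Hypothetical stand-in nodes; the names are illustrative only
nodes = [
    SimpleNamespace(
        name="dropout_1/cond/mul",
        input=["dropout_1/cond/mul/Switch:1", "dropout_1/cond/mul/y"],
    ),
    SimpleNamespace(name="Softmax", input=["dense_1/BiasAdd"]),
]
num_changed = strip_switch_output_suffix(nodes)
```

Returning the number of rewritten inputs makes it easy to confirm that the pass actually touched something.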

We don't need keras_learning_phase to be a Placeholder, as it should be set to a constant value of False, for Test mode. We can change it to a Const op, then set its value to a 1-dimensional tensor containing False. The shape attribute is not valid in a Const op, so we just delete it:
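The three steps (change the op, set the value, delete the shape attribute) can be sketched with a plain dictionary standing in for the node. This is not the Tensorflow API: in a real graph the node is a NodeDef and the value attribute holds a tensor proto, so the field names below are only an approximation of that layout.

```python
def placeholder_to_const(node, value):
    """Turn a Placeholder-like node dict into a Const-like node dict in place."""
    node["op"] = "Const"
    # 1-dimensional boolean tensor holding the constant
    node["attr"]["value"] = {"dtype": "DT_BOOL", "shape": [1], "bool_val": [value]}
    # A Const op has no shape attribute, so drop it if present
    node["attr"].pop("shape", None)
    return node

# Hypothetical dict stand-in for the keras_learning_phase node
learning_phase = {
    "name": "keras_learning_phase",
    "op": "Placeholder",
    "attr": {"dtype": "DT_BOOL", "shape": []},
}
placeholder_to_const(learning_phase, False)
```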

We want our output .pb file to be self-contained, with both the computation graph and all the trained weights. To do this, we simply need to call convert_variables_to_constants, assuming we want to run the computation all the way up to the Softmax output, which will give us the 101 class probabilities.

I originally intended to follow Pete Warden's Tutorial: Tensorflow for Mobile Poets. In that tutorial, he takes an InceptionV3 network and runs it through some optimizations to reduce the number of operations, decrease the resolution of the weights, and overall make the network smaller and faster.

Then we can optimize the graph for deployment. Notice that we are rounding the weights so that the file can compress better when added to the device bundle.

(tensorflow) ➜ model_export git:(master) ✗ ../../tensorflow/bazel-bin/tensorflow/tools/graph_transforms/transform_graph \
--in_graph=graph.pb \
--out_graph=opt_graph.pb \
--inputs='input_1' \
--outputs='Softmax' \
--transforms='strip_unused_nodes(type=float, shape="1,299,299,3") remove_nodes(op=Identity, op=CheckNumerics) round_weights(num_steps=256) fold_constants(ignore_errors=true) fold_batch_norms fold_old_batch_norms'
2017-03-22 00:35:27.886563: I tensorflow/tools/graph_transforms/transform_graph.cc:257] Applying strip_unused_nodes
2017-03-22 00:35:28.048049: I tensorflow/tools/graph_transforms/transform_graph.cc:257] Applying remove_nodes
2017-03-22 00:35:28.709523: I tensorflow/tools/graph_transforms/transform_graph.cc:257] Applying round_weights
2017-03-22 00:35:29.032210: I tensorflow/tools/graph_transforms/transform_graph.cc:257] Applying fold_constants
2017-03-22 00:35:29.064884: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2017-03-22 00:35:29.064910: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-03-22 00:35:29.064914: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2017-03-22 00:35:29.064917: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
2017-03-22 00:35:29.064919: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
2017-03-22 00:35:29.544610: I tensorflow/tools/graph_transforms/transform_graph.cc:257] Applying fold_batch_norms
2017-03-22 00:35:29.655708: I tensorflow/tools/graph_transforms/transform_graph.cc:257] Applying fold_old_batch_norms
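To get a feel for why rounding helps compression, here is a rough sketch of the idea behind round_weights(num_steps=256): snap every weight to one of 256 evenly spaced levels between the tensor's minimum and maximum. This is my own illustration, not the transform's actual implementation, and the exact bucketing it uses may differ. The weights stay floats, but there are far fewer distinct values, so the file compresses better.

```python
import math

def round_to_levels(weights, num_steps=256):
    """Snap each weight to one of num_steps evenly spaced levels in [min, max]."""
    lo, hi = min(weights), max(weights)
    if hi == lo:
        return list(weights)  # nothing to round for a constant tensor
    step = (hi - lo) / (num_steps - 1)
    return [lo + round((w - lo) / step) * step for w in weights]

# Hypothetical weights, generated deterministically for the example
weights = [0.1 * math.sin(i) for i in range(5000)]
rounded = round_to_levels(weights)

distinct_levels = len(set(rounded))
max_error = max(abs(a - b) for a, b in zip(weights, rounded))
half_step = (max(weights) - min(weights)) / 255 / 2
```

The price is a small per-weight error, at most half a step in this sketch, which is why it is worth comparing predictions before and after.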

Tweak decayValue, updateValue, minimumThreshold in setPredictionValues method to get a better user experience.
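For intuition about what those three values control, here is a Python sketch of this kind of smoothing. It is not the app's actual code (that lives in the iOS project), and the default numbers are placeholders I picked for illustration: old scores fade by decay_value, new scores are blended in with update_value, and anything under minimum_threshold is hidden.

```python
def update_predictions(old, new, decay_value=0.75, update_value=0.25,
                       minimum_threshold=0.01):
    """Blend new per-label scores into decayed old scores, then drop weak labels."""
    # Fade every previously seen label
    smoothed = {label: score * decay_value for label, score in old.items()}
    # Add a fraction of each fresh prediction
    for label, score in new.items():
        smoothed[label] = smoothed.get(label, 0.0) + score * update_value
    # Only labels above the threshold would be displayed
    return {label: s for label, s in smoothed.items() if s > minimum_threshold}

# Hypothetical scores from two consecutive frames
previous = {"pizza": 0.8, "sushi": 0.01}
current = {"pizza": 0.9, "ramen": 0.2}
result = update_predictions(previous, current)
```

A higher decay value makes labels linger longer on screen, while a higher update value makes the display react faster to each new frame.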

Be sure to go to Build Settings and update Other Linker Flags, Header Search Paths, and Library Search Paths to point to your local build of Tensorflow. This project folder is a sibling of my Tensorflow folder.