
Introductory Tutorial to TensorFlow Serving

It’s easy to build a TensorFlow model
and train it – or at least you can find many great starting scripts to
help you begin. Serving your trained model, however, is not as easy. You
will need a separate library called TensorFlow Serving,
and there aren’t many examples out there (most of which focus on
image models). This post serves as a basic tutorial to get things
started.

The focus here is to demo serving a
simple model (linear regression), which you can then use as a starting
point for your own serving projects. If you are interested in
serving an image model, where the input is a .jpeg image or similar,
there are many great tutorials out there already.

You can see the complete code for export_model as well as the client script in my github repo.

1. Train and export a linear regression model.

The training part is very simple: you have an input x, weight and bias variables w and b that you want to learn, and y and y_, which
are your model output and regression target, respectively. We let
the model learn a made-up equation like the one below:

y = x1 * 1 + x2 * 2 + x3 * 3

So here’s the code:

Input argument declarations (which are self-explanatory):

tf.app.flags.DEFINE_integer('training_iteration', 300,
                            'number of training iterations.')
tf.app.flags.DEFINE_integer('model_version', 1, 'version number of the model.')
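
The model definition, training loop and export step live in the same script in the repo; as a rough sketch (continuing after the flag declarations above, using the TF 1.x SavedModelBuilder API; the signature name 'prediction' and the tensor keys 'input'/'output' are just illustrative choices), it looks something like this:

import numpy as np
import tensorflow as tf

FLAGS = tf.app.flags.FLAGS


def main(_):
    # Linear regression graph: y = x.w + b, trained to recover
    # y = x1 * 1 + x2 * 2 + x3 * 3 from randomly generated data.
    x = tf.placeholder(tf.float32, shape=[None, 3])
    y_ = tf.placeholder(tf.float32, shape=[None, 1])
    w = tf.Variable(tf.zeros([3, 1]))
    b = tf.Variable(tf.zeros([1]))
    y = tf.matmul(x, w) + b
    loss = tf.reduce_mean(tf.square(y - y_))
    train_step = tf.train.GradientDescentOptimizer(0.1).minimize(loss)

    sess = tf.Session()
    sess.run(tf.global_variables_initializer())
    for _ in range(FLAGS.training_iteration):
        xs = np.random.rand(100, 3)
        ys = np.matmul(xs, [[1.], [2.], [3.]])
        sess.run(train_step, feed_dict={x: xs, y_: ys})

    # Export a SavedModel into a version-numbered folder (e.g. './1'),
    # attaching a named signature so TensorFlow Serving knows which
    # tensors are the model's inputs and outputs.
    builder = tf.saved_model.builder.SavedModelBuilder(str(FLAGS.model_version))
    signature = tf.saved_model.signature_def_utils.predict_signature_def(
        inputs={'input': x}, outputs={'output': y})
    builder.add_meta_graph_and_variables(
        sess, [tf.saved_model.tag_constants.SERVING],
        signature_def_map={'prediction': signature})
    builder.save()


if __name__ == '__main__':
    tf.app.run()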

Once done, you should be able to see a folder named ‘1’ inside your working directory. When you cd into the ‘1’ folder, you should see the two files/folders below:

saved_model.pb variables

where the first one is your serialized
model in protobuf format, which includes the graph definition of the model as
well as metadata such as its signatures, and the second one
contains the serialized variables of the model.
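
A quick way to sanity-check the export is TensorFlow's saved_model_cli tool, which prints the saved signatures together with their input and output tensors:

saved_model_cli show --dir 1 --all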

At this point you have trained and saved your model!

2. Set up the docker environment.

Why docker – You could install
everything, including the dependencies, build them on
your machine, and serve the model in your local environment. However, I
would rather go the ‘clean’ way: download a docker image with all
dependencies installed, and then serve my model inside the docker container.

This makes sure everything needed to serve
the model lives inside a minimal ‘VM’ created solely for the
purpose of serving. Also, Google has prepared a docker image
with every setup step done, so all you need to do is ‘pull’ the image,
and everything will be set up and configured for you!

To install docker on your machine, follow this link:

– https://docs.docker.com/engine/installation/

Below is a sequential list of steps you can just follow along to set up docker after you have installed it.
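
As a rough sketch of those steps (the exact image and tag here are an assumption on my part: I use a tensorflow/serving image from Docker Hub and assume tensorflow_model_server is already available inside it; the mount path is just a placeholder):

docker pull tensorflow/serving:latest-devel
docker run -it -p 9000:9000 -v /absolute/path/to/export:/models/linear tensorflow/serving:latest-devel bash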

You can use any name for '--model_name', but it must stay consistent later on when you write your client.py file.

'--model_base_path' must be the absolute path to your model’s directory.
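
Putting those flags together, starting the model server inside the container might look something like this (the model name 'linear' and the base path are placeholders; port 9000 matches the log line below):

tensorflow_model_server --port=9000 --model_name=linear --model_base_path=/models/linear > my_log 2>&1 &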

If successful, you can ‘cat’ your my_log file and see a last line that looks something like:

Running ModelServer at 0.0.0.0:9000 ...

Then your server is up and running!

b) Prepare client.py file

The next step is to write up the client file.

Since our model is a simple linear
regression, I will prepare some random test data and simply let the
model predict the outputs. All the ingredients are wrapped inside one
function called do_inference, as below:
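
A minimal sketch of what do_inference can look like (assuming the gRPC beta API shipped with tensorflow-serving-api at the time; the model name 'linear' and the signature/tensor keys must match whatever you used when exporting the model and starting the server):

from grpc.beta import implementations
import numpy as np
import tensorflow as tf
from tensorflow_serving.apis import predict_pb2
from tensorflow_serving.apis import prediction_service_pb2


def do_inference(hostport):
    # Open a gRPC channel to the model server running inside docker.
    host, port = hostport.split(':')
    channel = implementations.insecure_channel(host, int(port))
    stub = prediction_service_pb2.beta_create_PredictionService_stub(channel)

    # Build a PredictRequest carrying some random test inputs.
    request = predict_pb2.PredictRequest()
    request.model_spec.name = 'linear'              # must match --model_name
    request.model_spec.signature_name = 'prediction'
    test_x = np.random.rand(10, 3).astype(np.float32)
    request.inputs['input'].CopyFrom(
        tf.contrib.util.make_tensor_proto(test_x, shape=test_x.shape))

    # Call Predict with a 10-second timeout and return the raw response.
    return stub.Predict(request, 10.0)


if __name__ == '__main__':
    print(do_inference('localhost:9000'))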

c) Run client.py for inference!

Finally, we come to the stage of making inferences against the server. For this step, I choose to run client.py with ‘python’ on the command line. You will then need to pip install both tensorflow-serving-api and tensorflow for your python, as they are dependencies of the client.py file.
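
Concretely, that amounts to something like the following (the exact package versions, and any command-line arguments your own client.py takes, will depend on your setup):

pip install tensorflow tensorflow-serving-api
python client.py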

However, if you want to run it with bazel instead of the python command, you can explore here.
Basically, you will need to edit the ‘BUILD’ file, put it together with
your client.py, and use bazel to build it. Once built, you can run it
without worrying about those dependencies. Both approaches work, but I
will go with the ‘python’ way in this tutorial, which is easier.

d) Can also use a Flask server!

You can also wrap the client inside a
Flask server in python, so that you have the flexibility to send it HTTP requests
carrying prediction data, instead of the one-time call above.
This additional Flask layer only forwards the data to the tensorflow server
and returns the results it receives back.
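
A minimal sketch of such a wrapper (assuming a do_inference variant in client.py that takes the input batch and returns plain Python lists; the /predict route and the JSON payload format are just illustrative choices):

from flask import Flask, jsonify, request

from client import do_inference  # the gRPC client function from section b)

app = Flask(__name__)


@app.route('/predict', methods=['POST'])
def predict():
    # Expect a JSON payload like {"x": [[0.1, 0.2, 0.3], ...]} and forward
    # it to TensorFlow Serving over gRPC.
    payload = request.get_json()
    outputs = do_inference('localhost:9000', payload['x'])
    return jsonify({'y': outputs})


if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5000)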

Based on our load test, the Flask + TF Serving architecture has relatively low latency.

4. What’s next?

In this post, we have demonstrated how to
use TensorFlow to train and export a simple linear regression model
to disk, set up the Model Serving environment with Docker, and
serve the model locally.

Naturally, the next step would be to deploy the model in production, and
eventually to automate the process of training, deployment and
management, and to be able to scale the service up as the number of requests
increases.

I will save that for a possible next session, and will share more later on about how to deploy models using tools like Google’s Kubernetes, a highly scalable container orchestration engine.