Deploy Your Predictive Model To Production

5 Best Practices For Operationalizing Machine Learning.

Sometimes you develop a small predictive model that you want to put in your software.

I recently received this reader question:

Actually, there is a part that is missing in my knowledge about machine learning. All tutorials give you the steps up until you build your machine learning model. How could you use this model?

In this post, we look at some best practices to ease the transition of your model into production and ensure that you get the most out of it.

How To Deploy Your Predictive Model To Production
Photo by reynermedia, some rights reserved.

I Have a Model. Now What?

So you have been through a systematic process and created a reliable and accurate model that can make predictions for your problem.

You want to use this model somehow.

Maybe you want to create a standalone program that can make ad hoc predictions.

Maybe you want to incorporate the model into your existing software.

Let’s assume that your software is modest. You are not looking for Google-sized scale deployment. Maybe it’s just for you, maybe just a client or maybe for a few workstations.

So far so good?

Now we need to look at some best practices to put your accurate and reliable model into operations.

5 Model Deployment Best Practices

Why not just slap the model into your software and release?

You could. But by adding a few additional steps you can build confidence that the model that you’re deploying is maintainable and remains accurate over the long term.

Have you put a model into production? Please leave a comment and share your experiences.

Below are five best practice steps that you can take when deploying your predictive model into production.

1. Specify Performance Requirements

You need to clearly spell out what constitutes good and bad performance.

This may be in terms of accuracy, false positives, or whatever metrics are important to the business.

Spell them out, and use the performance of the current model you have developed as the baseline numbers.

These numbers may be increased over time as you improve the system.

Performance requirements are important. Without them, you will not be able to set up the tests you will need to determine if the system is behaving as expected.

Do not proceed until you have agreed upon a minimum, a mean, or a performance range expectation.
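One way to make the agreed numbers hard to ignore is to record them as data that automated checks can read. The sketch below assumes a binary classifier with labels 0 and 1 and two hypothetical requirements, a minimum accuracy and a maximum false positive rate; the class and function names are illustrative, and you should substitute whatever metrics your business actually agreed on.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PerformanceRequirements:
    """Hypothetical agreed-upon thresholds for a binary (0/1) classifier."""
    min_accuracy: float             # e.g. the baseline accuracy of the current model
    max_false_positive_rate: float  # e.g. the highest FP rate the business accepts


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def false_positive_rate(y_true, y_pred):
    """FP / (FP + TN): the share of true negatives wrongly predicted as positive."""
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return fp / (fp + tn) if (fp + tn) else 0.0


def meets_requirements(y_true, y_pred, req):
    """Return (passed, metrics) so a failure can be reported with the numbers."""
    metrics = {
        "accuracy": accuracy(y_true, y_pred),
        "false_positive_rate": false_positive_rate(y_true, y_pred),
    }
    passed = (metrics["accuracy"] >= req.min_accuracy
              and metrics["false_positive_rate"] <= req.max_false_positive_rate)
    return passed, metrics
```

Keeping the thresholds in one object means the tests in the later steps can all enforce the same definition of acceptable performance, and raising the bar over time is a one-line change.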

2. Separate Prediction Algorithm From Model Coefficients

You may have used a library to create your predictive model. For example, R, scikit-learn or Weka.

You can choose to deploy your model using that library or re-implement the predictive aspect of the model in your software. You may even want to set up your model as a web service.

Regardless, it is good practice to separate the algorithm that makes predictions from the model internals, that is, the specific coefficients or structure learned by the model from your training data.

2a. Select or Implement The Prediction Algorithm

Often the complexity of a machine learning algorithm lies in model training, not in making predictions.

For example, making predictions with a regression algorithm is quite straightforward and easy to implement in your language of choice. This would be an obvious candidate to re-implement rather than relying on the library used to train the model.
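As a sketch of how small the prediction side can be, assume a linear or logistic regression whose intercept and coefficients were already learned elsewhere, for example copied from a fitted scikit-learn model's `intercept_` and `coef_` attributes (for a binary `LogisticRegression` these are arrays with one row, so you would take `intercept_[0]` and `coef_[0]`). The function names below are illustrative, and prediction needs only the standard library:

```python
import math


def predict_linear(intercept, coefficients, features):
    """Linear regression: intercept + b1*x1 + ... + bn*xn."""
    if len(coefficients) != len(features):
        raise ValueError(f"expected {len(coefficients)} features, got {len(features)}")
    return intercept + sum(b * x for b, x in zip(coefficients, features))


def predict_logistic(intercept, coefficients, features):
    """Logistic regression: squash the linear score into a probability of class 1."""
    score = predict_linear(intercept, coefficients, features)
    # Numerically stable sigmoid: avoid math.exp overflow for large |score|.
    if score >= 0:
        return 1.0 / (1.0 + math.exp(-score))
    e = math.exp(score)
    return e / (1.0 + e)
```

The training library is then only needed offline; the application ships these few lines plus the learned numbers.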

If you decide to use the library to make predictions, get familiar with the API and with the dependencies.

The software used to make predictions is just like all the other software in your application.

Treat it like software.

Implement it well, write unit tests, make it robust.
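Treating prediction code like any other software might look like the following: the function refuses malformed input rather than silently returning a number, and ordinary unit tests pin down its behavior. This is a minimal sketch using Python's built-in `unittest` module; the `predict` function and the expected value are illustrative, not taken from any particular library.

```python
import math
import unittest


def predict(intercept, coefficients, features):
    """Linear prediction that rejects malformed input instead of guessing."""
    if len(features) != len(coefficients):
        raise ValueError(f"expected {len(coefficients)} features, got {len(features)}")
    if not all(math.isfinite(x) for x in features):
        raise ValueError("all features must be finite numbers")
    return intercept + sum(b * x for b, x in zip(coefficients, features))


class PredictTests(unittest.TestCase):
    def test_known_value(self):
        # 1.0 + 2.0*3.0 + (-0.5)*4.0 == 5.0, worked out by hand
        self.assertAlmostEqual(predict(1.0, [2.0, -0.5], [3.0, 4.0]), 5.0)

    def test_wrong_number_of_features_is_rejected(self):
        with self.assertRaises(ValueError):
            predict(1.0, [2.0, -0.5], [3.0])

    def test_missing_value_is_rejected(self):
        with self.assertRaises(ValueError):
            predict(1.0, [2.0], [float("nan")])
```

Saved to a file, these tests run with `python -m unittest <file>` alongside the rest of the application's test suite.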

2b. Serialize Your Model Coefficients

Let’s call the numbers or structure learned by the model the coefficients.

These data are not strictly configuration for your application, but you should treat them like software configuration.

Store them in an external file alongside the software project. Version them. Treat configuration like code, because it can just as easily break your project.

You very likely will need to update this configuration in the future as you improve your model.
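A minimal sketch of serializing coefficients to a small, human-readable file that can be committed to version control next to the code. JSON is an assumption here (any stable text format works), and the field names `model_version`, `intercept` and `coefficients` are hypothetical:

```python
import json
from pathlib import Path


def save_coefficients(path, version, intercept, coefficients):
    """Write learned parameters to a small JSON file meant to be committed with the code."""
    payload = {
        "model_version": version,   # bump whenever the coefficients change
        "intercept": intercept,
        "coefficients": list(coefficients),
    }
    # sort_keys and indent keep diffs readable in version control
    Path(path).write_text(json.dumps(payload, indent=2, sort_keys=True) + "\n")


def load_coefficients(path, expected_version=None):
    """Read parameters back, optionally refusing a file from the wrong model version."""
    payload = json.loads(Path(path).read_text())
    if expected_version is not None and payload["model_version"] != expected_version:
        raise ValueError(
            f"expected model_version {expected_version!r}, found {payload['model_version']!r}"
        )
    return payload["intercept"], payload["coefficients"]
```

Because the file is plain text, each model update shows up as a reviewable diff, and rolling back a bad model is an ordinary revert.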

3. Develop Automated Tests For Your Model

You need automated tests to prove that your model works as you expect.

In software land, we call these regression tests. They ensure the software's behavior does not regress as we make changes to different parts of the system in the future.

Write regression tests for your model.

Collect or contrive a small sample of data on which to make predictions.

Use the production algorithm code and configuration to make predictions.

Confirm in the test that the results are as expected.

These tests are your early warning alarm. If they fail, your model is broken and you can’t release the software or the features that use the model.

Make the tests strictly enforce the minimum performance requirements of the model.

I strongly recommend contriving test cases that you understand well, in addition to any raw datasets from the domain you want to include.

I also strongly recommend gathering outlier and interesting cases from operations over time that produce unexpected results (or break the system). These should be understood and added to the regression test suite.

Run the regression tests after each code change and before each release. Run them nightly.
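The steps above could be sketched as a small regression suite. This assumes a hypothetical thresholded linear classifier whose configuration is inlined for brevity (in a real project it would be loaded from the versioned coefficients file, exactly as the application loads it); the contrived cases, domain sample and minimum accuracy are all illustrative.

```python
import unittest

# Hypothetical production configuration, inlined here for a self-contained example.
MODEL = {"intercept": -1.0, "coefficients": [2.0, 1.0], "threshold": 0.0}
MIN_ACCURACY = 0.75  # hypothetical agreed minimum performance requirement


def classify(model, features):
    """Production-style prediction: linear score compared to a threshold."""
    score = model["intercept"] + sum(b * x for b, x in zip(model["coefficients"], features))
    return 1 if score > model["threshold"] else 0


# Contrived cases whose answers are understood by hand (score shown in comments).
CONTRIVED_CASES = [
    ([0.0, 0.0], 0),    # score -1.0
    ([1.0, 0.0], 1),    # score  1.0
    ([0.0, 2.0], 1),    # score  1.0
    ([0.25, 0.25], 0),  # score -0.25
]

# Small labelled sample from the domain; outliers found in operations get appended here.
DOMAIN_SAMPLE = [
    ([0.0, 0.0], 0),
    ([1.0, 1.0], 1),
    ([0.5, 0.1], 0),  # a known miss: score 0.1, so the model predicts 1
    ([0.2, 0.2], 0),
    ([2.0, 0.0], 1),
]


class ModelRegressionTests(unittest.TestCase):
    def test_contrived_cases(self):
        for features, expected in CONTRIVED_CASES:
            with self.subTest(features=features):
                self.assertEqual(classify(MODEL, features), expected)

    def test_minimum_accuracy_on_domain_sample(self):
        correct = sum(classify(MODEL, f) == y for f, y in DOMAIN_SAMPLE)
        self.assertGreaterEqual(correct / len(DOMAIN_SAMPLE), MIN_ACCURACY)
```

The contrived cases catch broken code or a corrupted coefficients file, while the accuracy test enforces the minimum performance requirement; wiring the suite into CI and a nightly job covers the schedule described above.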

4. Develop Back-Testing and Now-Testing Infrastructure

The model will change, as will the software and the data on which predictions are being made.

You want to automate the evaluation of the production model with a specified configuration on a large corpus of data.

This will allow you to efficiently back-test changes to the model on historical data and determine if you have truly made an improvement or not.

This is not the small dataset that you may use for hyperparameter tuning; this is the full suite of data available, perhaps partitioned by month, year or some other important demarcation.

Run the current operational model to establish baseline performance.

Run new models, competing for a place to enter operations.

Once set up, run it nightly or weekly and have it spit out automatic reports.
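A back-test harness can be quite small. This sketch assumes each historical record is a tuple of (date string in `YYYY-MM-DD` form, features, label), that each model is simply a callable mapping features to a predicted label, and that accuracy is the metric of interest; all of these conventions and the function names are hypothetical.

```python
from collections import defaultdict


def back_test(records, models):
    """Evaluate every model on historical records, partitioned by month.

    records: iterable of (date_str 'YYYY-MM-DD', features, label)
    models: dict mapping a model name to a predict(features) callable
    Returns {month: {model_name: accuracy}} with months in sorted order.
    """
    by_month = defaultdict(list)
    for date_str, features, label in records:
        by_month[date_str[:7]].append((features, label))  # 'YYYY-MM' partition key

    report = {}
    for month in sorted(by_month):
        rows = by_month[month]
        report[month] = {
            name: sum(predict(f) == y for f, y in rows) / len(rows)
            for name, predict in models.items()
        }
    return report


def format_report(report):
    """Render the back-test results as plain text, one line per month."""
    lines = []
    for month, scores in report.items():
        cells = ", ".join(f"{name}={acc:.2f}" for name, acc in sorted(scores.items()))
        lines.append(f"{month}: {cells}")
    return "\n".join(lines)
```

Passing the current operational model and any competing candidates in the same `models` dict (for example `{"current": ..., "candidate": ...}`) puts the baseline and the challengers side by side in every report, month by month.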

Next, add a Now-Test.

This is a test of the production model on the latest data.

Perhaps it’s the data from today, this week or this month. The idea is to get an early warning that the production model may be faltering.

This can be caused by concept drift, where the relationships in the data exploited by your model are subtly changing over time.

This Now-Test can also spit out reports and raise an alarm (by email) if performance drops below minimum performance requirements.
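A Now-Test could be sketched as a scheduled job that scores the production model on the latest labelled data and calls an alert hook when performance drops below the minimum. The alert is passed in as a callable so it can be tested without sending mail; the email version uses the standard library's `smtplib` and `email.message`, and the SMTP host and addresses are placeholders you would replace.

```python
import smtplib
from email.message import EmailMessage


def now_test(recent_rows, predict, min_accuracy, alert):
    """Score the production model on the latest labelled data; alert on a drop.

    recent_rows: list of (features, label) from today, this week or this month.
    Returns the measured accuracy, or None if there is nothing to evaluate yet.
    """
    if not recent_rows:
        return None
    acc = sum(predict(f) == y for f, y in recent_rows) / len(recent_rows)
    if acc < min_accuracy:
        alert(f"Now-Test: accuracy {acc:.3f} below minimum {min_accuracy:.3f}")
    return acc


def email_alert(message, smtp_host="smtp.example.com",
                sender="model-monitor@example.com", recipient="ml-team@example.com"):
    """Send the alert by email; host and addresses are placeholders."""
    msg = EmailMessage()
    msg["Subject"] = "Production model alert"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content(message)
    with smtplib.SMTP(smtp_host) as server:
        server.send_message(msg)
```

In production the job might call `now_test(rows, production_predict, MIN_ACCURACY, email_alert)`, where `rows`, `production_predict` and `MIN_ACCURACY` are hypothetical names for your latest data, your deployed prediction function and your agreed threshold.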

5. Challenge Then Trial Model Updates

You will need to update the model.

Maybe you devise a whole new algorithm which requires new code and new config. Revisit all of the above points.

A smaller and more manageable change would be to the model coefficients. For example, perhaps you set up a grid or random search of model hyperparameters that runs every night and spits out new candidate models.

You should do this.
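A nightly search like the one described above could be sketched as a generic random-search loop. It assumes a hypothetical `train_and_score(params)` function that trains a model with the given hyperparameters and returns a validation score where higher is better; the search space format (a list of allowed values per hyperparameter) is also an assumption.

```python
import random


def random_search(train_and_score, search_space, n_trials, seed=None):
    """Sample hyperparameter settings at random and rank the resulting candidates.

    train_and_score: hypothetical callable that trains a model with the given
        params and returns a validation score (higher is better).
    search_space: dict mapping each hyperparameter name to a list of values.
    """
    rng = random.Random(seed)  # a fixed seed makes a nightly run reproducible
    candidates = []
    for _ in range(n_trials):
        params = {name: rng.choice(values) for name, values in search_space.items()}
        candidates.append((train_and_score(params), params))
    # Best first. Each entry is only a candidate for the challenge step,
    # never an automatic release.
    candidates.sort(key=lambda c: c[0], reverse=True)
    return candidates
```

The top-ranked entries from each night's run then go through the challenge and trial process described next rather than straight into production.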

Test the model and be highly critical. Give a new model every chance to slip up.

Evaluate the performance of the new model using the Back-Test and Now-Test infrastructure in Point 4 above. Review the results carefully.

Evaluate the change using the regression tests, as a final automated check.

Test the features of the software that make use of the model.

Perhaps roll the change out to some locations or in a beta release for feedback, again for risk mitigation.

Accept your new model once you are satisfied that it meets the minimum performance requirements and improves on prior results.
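The final acceptance decision can be made explicit with a small gate. This sketch assumes the champion (current production model) and the challenger are each summarized by their back-test and Now-Test accuracies, that the regression suite's pass/fail result is available, and that "improves on prior results" means strictly better on both measures by at least a hypothetical `min_gain` margin. It returns the reasons for a rejection so they can be reviewed.

```python
def review_candidate(champion, challenger, min_accuracy, regression_passed, min_gain=0.0):
    """Decide whether a challenger may replace the production champion.

    champion, challenger: dicts with 'back_test' and 'now_test' accuracies.
    Returns (accepted, reasons), where reasons lists every failed check.
    """
    reasons = []
    if not regression_passed:
        reasons.append("regression tests failed")
    for key in ("back_test", "now_test"):
        if challenger[key] < min_accuracy:
            reasons.append(f"{key} accuracy {challenger[key]:.3f} "
                           f"below minimum {min_accuracy:.3f}")
        if challenger[key] <= champion[key] + min_gain:
            reasons.append(f"{key} accuracy {challenger[key]:.3f} "
                           f"does not beat champion {champion[key]:.3f}")
    return (not reasons), reasons
```

An automated gate like this only decides whether a candidate is worth a staged rollout; the human review and limited release described above still apply.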

44 Responses to Deploy Your Predictive Model To Production

Jason – I want to build streaming-based recommendations for e-commerce. The key entities I am considering are clickstream events, such as web logs capturing page hits for products, a real-time feed of product features and categories, and orders in real time.

Beyond this, I am also adding a few business boosters to adjust items before ranking them.

I am not clear conceptually: when doing real-time processing of large data through streaming, will these ML algorithms even scale, or should I go for a lambda architecture that processes in batches offline instead of in real time?

Again, if I have to add something like clustering algorithms or PCA for dimensionality reduction, will it scale for such high-volume, real-time transactions, given that each model takes time to execute?

Hi Jason,
Thanks for a great article. I was wondering: what if my model is a black-box model, or an ensemble of black-box models? In this case, I do not have a simple equation that describes the model. How is model implementation handled in production in such a case?
Thanks,
Pallavi

There is definitely an emerging market of solutions to ease some of the deployment pains. TomK mentioned one. http://opendatagroup.com is another. At the moment, solutions in the space tend to focus on being agnostic to the model's language (R, Python, Matlab, Java, C, SAS, etc.). They package up your model into an easy-to-deploy, scalable microservice. You can then set input and output sources for your model service to read from and write to. The next facet is providing tools to monitor the performance metrics of your models and to manage upgrades as new models are developed. Since many companies are still developing their data science strategy and infrastructure, I think a key point is flexibility. Look for solutions that have the flexibility to keep connecting to different data and messaging sources as your IT department continues to evolve the infrastructure.

Should the whole code be in a function, so that every time we can run the function with the required arguments?
or
Is there a better way to structure the code for such machine learning problems, given that many people write separate chunks of code for data processing, modelling, evaluation, etc.?

But what makes the created prediction object work on new data?

Maybe you want to take a look at https://github.com/orgesleka/webscikit. It is a webserver written in python which can hold multiple models at different urls. Models can be deployed later while the server is online. It is still a work in progress, but I would be happy to hear your opinion about it.

training with keras in python should naturally lead to a python API server as a sensible choice for users making predictions. I tried with flask, with gunicorn.

however, making the server performant and able to handle multiple simultaneous requests is awkward.

Multiprocessing with keras models is hard; by default they are unpicklable. If you hack them to make them picklable, they are still so large that the overhead of pickling these complex objects makes multiprocessing really slow.

therefore you have to use multi-threading instead.
even here you have to add some weird, arbitrary-seeming lines to get tensorflow to behave:

self.graph = tf.get_default_graph()
after loading the model in the main thread, then
with self.graph.as_default():
around the predictions when using the model in the child threads.

You might also (or instead) have to call model._make_predict_function() in the main thread before spawning workers.

I honestly have no idea about how/why multithreading keras models was being dodgy so who knows why one/both of the above make it work \_0_o_/

there are multiple different github issues for it, with seemingly some different random workarounds that work for peoples specific cases.

what you also have to note is that with multithreading you are still limited by the GIL. If you have any CPU-bound work (e.g. possibly some of the input preprocessing), even if your server can now take multiple requests at a time, the GIL will still limit that work to one CPU core.
To avoid this I try to have all pre-processing of inputs done externally to the API server, so all the API has to do is literally call predict and return results.

(I'm not sure whether simultaneous predicts in different threads/requests are blocked by the GIL. Apparently some numpy operations are and some aren't… not sure what that means for tensorflow predictions)

What if one later does “added” training from saved weights? Should that added training use the old scaler, or should one create a new scaler whenever there is a larger amount of new training data used to update the weights?

In the case of parametric model deployment, we apply a lot of data preprocessing techniques to get the most accurate model. When deploying these models to production, how can we take care of the data preprocessing steps if we only deploy the coefficients or approximation function?