How do we avoid complexity in machine learning?

At Quora we have developed a set of recommendations that help us avoid or reduce complexity in our ML systems. We ask our ML engineers to understand these principles and follow them when building new ML models or developing ML-powered product features.

1. If you build something scrappy, clean it up ASAP

It is OK to build scrappy ML solutions for testing purposes. However, once the system has proven successful and has launched to 100%, you need to clean it up. Cleaning up means taking all the necessary steps to make the system simpler. In particular, take the time to consolidate the system into existing solutions where possible, to remove unnecessary code and features, or both.

2. Have others vet your design

When you are designing an ML solution, it is important to share your proposal with others. You might not realize that there are existing solutions that already address most of your requirements. Even if that is not the case, including others in the initial discussions can provide valuable feedback to make your system simpler.

3. Document your system

If your system is difficult to document and explain, it is too complex, period. Code and comments are documentation too, of course, but at a level of detail that can be hard to digest. As a rule of thumb, you should be able to explain the key points of the whole system in 30 minutes or in 2 pages.

4. If you are trying to build something on an existing system and it is too complex, fix the system

If you are trying to add something to an existing system but realize it will take more time than building your own, what should you do? If you know your solution is here to stay, you should definitely build it into the existing system. If the system is not flexible enough for what you want to do, evaluate the cost of making it more flexible, and then make that change. The flexibility you add now benefits not only you but also the projects that come after yours. Treat this as an investment in “dev velocity”.

If you are on a tight deadline, you are experimenting, or the cost of building your new feature into an existing system is much higher than building a new system from scratch, go ahead and be scrappy. That said, when you are done experimenting, revisit Principle #1.

5. Implement ML features to be reusable

Ideally, there should be a Feature Engineering Framework that enables the creation of reusable ML features. Without such a framework, it is important to try to implement features that are reusable, transformable, interpretable, and reliable.
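Even without a full framework, a small shared abstraction can give features those four properties. The sketch below is a minimal illustration under our own assumptions: the `Feature` class, its `transform` method, and the `upvotes` record field are hypothetical names, not Quora's actual framework. Each feature carries a name and a human-readable description (interpretable), is defined once and reused by any model (reusable), can derive new features from itself (transformable), and fails loudly on bad values (reliable).

```python
import math
from dataclasses import dataclass
from typing import Any, Callable, Dict

# A raw input record, e.g. one answer; the keys used below are hypothetical.
Record = Dict[str, Any]


@dataclass(frozen=True)
class Feature:
    """A named, documented feature that any model can reuse."""
    name: str
    description: str  # human-readable meaning, for interpretability
    compute: Callable[[Record], float]

    def __call__(self, record: Record) -> float:
        value = self.compute(record)
        # Reliability: fail loudly instead of silently feeding NaN to a model.
        if math.isnan(value):
            raise ValueError(f"feature {self.name!r} produced NaN")
        return value

    def transform(self, fn: Callable[[float], float], suffix: str) -> "Feature":
        """Derive a new feature (e.g. a log transform) without re-implementing it."""
        return Feature(
            name=f"{self.name}_{suffix}",
            description=f"{suffix} of: {self.description}",
            compute=lambda record: fn(self(record)),
        )


# Hypothetical example: define the base feature once, derive variants from it.
num_upvotes = Feature("num_upvotes", "upvotes on the answer",
                      lambda r: float(r["upvotes"]))
log_upvotes = num_upvotes.transform(math.log1p, "log1p")
```

Because derived features are built from the base feature rather than copied, a fix to `num_upvotes` automatically propagates to `log_upvotes`.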

6. Do not use more features than necessary

We recommend that engineers perform feature selection and model tuning iteratively. You might be able to remove many ML features that add complexity and computational overhead while maintaining the quality of the model. That said, it is important to understand the interplay between the features and the model: some features might not make a difference simply because the model is too simple to learn from them.
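As one illustration of iterative feature selection, here is a minimal backward-elimination sketch. Backward elimination is a standard technique we picked for the example, not necessarily the procedure used at Quora, and `evaluate` is a hypothetical callback you would implement to train and validate your model on a given feature set.

```python
from typing import Callable, FrozenSet, Iterable, Set


def backward_eliminate(
    features: Iterable[str],
    evaluate: Callable[[FrozenSet[str]], float],
    tolerance: float = 0.001,
) -> Set[str]:
    """Greedily drop features while quality stays within `tolerance` of the
    full-feature baseline.

    `evaluate` (hypothetical) trains and validates a model on a feature set
    and returns a score where higher is better, e.g. validation AUC.
    """
    selected = set(features)
    # Compare against the full-feature score so the *total* loss is bounded.
    baseline = evaluate(frozenset(selected))
    while len(selected) > 1:
        # Find the single feature whose removal hurts quality the least.
        best_drop, best_score = None, float("-inf")
        for feature in sorted(selected):
            score = evaluate(frozenset(selected - {feature}))
            if score > best_score:
                best_drop, best_score = feature, score
        if best_score < baseline - tolerance:
            break  # every remaining feature pulls its weight
        selected.remove(best_drop)
    return selected
```

Each call to `evaluate` retrains a model, so this loop costs on the order of n² trainings for n features. Because features and model interact, it is also worth re-running the selection after tuning the model rather than treating either result as final.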

7. Choose the simplest model possible

When deciding between different ML models, always choose the “simplest” one. The definition of “simplest” may depend on the situation, but please refer to the first section of this blog post to understand the different dimensions that contribute to complexity. Here is our take on some ML models, ordered roughly from least to most complex:

Linear regression

Logistic regression

Collaborative filtering

Random forests

Gradient boosted decision trees

Elastic nets

LambdaMart and other learning-to-rank approaches

Neural networks

That said, the devil is in the details: a straightforward implementation of gradient boosted decision trees can be simpler than an elaborate implementation of logistic regression. It is also important to remember that model complexity interacts directly with the number (and type) of features. A complex model might show no quality wins simply because it does not have the right features to learn from.
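One way to make “choose the simplest model” operational is to accept a more complex model only when it beats every simpler option by a meaningful margin. The sketch below assumes you have already measured a validation score for each candidate (higher is better) and listed the candidates from simplest to most complex, for example in the order given above. `choose_simplest` and the `tolerance` threshold are illustrative names, and the right tolerance is a product decision the code cannot make for you.

```python
from typing import List, Tuple


def choose_simplest(candidates: List[Tuple[str, float]], tolerance: float) -> str:
    """Return the simplest model whose score is within `tolerance` of the best.

    `candidates` holds (model_name, validation_score) pairs ordered from
    simplest to most complex; higher scores are assumed to be better.
    """
    if not candidates:
        raise ValueError("need at least one candidate model")
    if tolerance < 0:
        raise ValueError("tolerance must be non-negative")
    best = max(score for _, score in candidates)
    # Walk from simplest to most complex and stop at the first model that is
    # "good enough"; the best model always qualifies, so this never fails.
    return next(name for name, score in candidates if score >= best - tolerance)
```

Since the ordering is an input, the same function works whichever dimension of complexity (training cost, serving latency, maintainability) you rank by.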

8. Use open source implementations whenever possible

While there may be good reasons to implement an ML model in-house, the truth is that most models already have reasonably good implementations available. Implementing something in-house allows for greater flexibility, but it also means that you need to invest in maintaining and evolving the system. Also, unless you have developed a good internal ML framework, an in-house implementation usually makes it much harder to switch from one model to another.

9. There should only be one way to do similar things

If you find yourself wondering why there are several ways of doing similar things, it is time to consolidate them into one. Some example situations that we should avoid include:

Different implementations of the same ML model

Different implementations of similar ML features

Different tools used for the same functionality in different teams

...
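Consolidation is easier to sustain when there is one shared place where implementations live. The sketch below is a hypothetical `FeatureRegistry` (not an existing Quora or library API) that refuses a second implementation under a name that is already taken, so teams are pushed to reuse or consolidate instead of duplicating.

```python
from typing import Callable, Dict

# Hypothetical signature: a feature maps raw text to a number.
FeatureFn = Callable[[str], float]


class FeatureRegistry:
    """A single shared place to register feature implementations."""

    def __init__(self) -> None:
        self._features: Dict[str, FeatureFn] = {}

    def register(self, name: str) -> Callable[[FeatureFn], FeatureFn]:
        """Decorator that registers `fn` under `name`, rejecting duplicates."""
        def decorator(fn: FeatureFn) -> FeatureFn:
            if name in self._features:
                raise ValueError(f"feature {name!r} is already registered; reuse or consolidate it")
            self._features[name] = fn
            return fn
        return decorator

    def get(self, name: str) -> FeatureFn:
        return self._features[name]


registry = FeatureRegistry()


@registry.register("answer_length")
def answer_length(text: str) -> float:
    """Number of whitespace-separated tokens in an answer."""
    return float(len(text.split()))
```

A name check only catches exact duplicates; noticing that two differently named features compute nearly the same thing still requires the kind of design review described in Principle #2.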

10. Abstract complexity

ML is as much about experimentation as it is about production, so you want to provide flexibility for experimentation. The way to provide flexibility while avoiding complexity is to use the right level of abstraction. For example, you should be able to change your ML model easily if the model connects to the data on one side and to the production engine on the other through well-defined interfaces.
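To make the idea of a well-defined interface concrete, here is a minimal sketch using `typing.Protocol`. The `Model` protocol, the two toy models, and the `rank` function standing in for the production engine are all hypothetical. The point is that `rank` depends only on `predict`, so swapping one model for another requires no change to the serving code.

```python
from typing import Dict, List, Protocol, Sequence

FeatureVector = Dict[str, float]


class Model(Protocol):
    """The only surface the production engine depends on."""

    def predict(self, rows: Sequence[FeatureVector]) -> List[float]: ...


class LinearModel:
    """Weighted sum of features; missing features count as 0."""

    def __init__(self, weights: Dict[str, float], bias: float = 0.0) -> None:
        self.weights = weights
        self.bias = bias

    def predict(self, rows: Sequence[FeatureVector]) -> List[float]:
        return [self.bias + sum(w * row.get(name, 0.0) for name, w in self.weights.items())
                for row in rows]


class ConstantModel:
    """A trivial baseline, handy for experiments and sanity checks."""

    def __init__(self, value: float) -> None:
        self.value = value

    def predict(self, rows: Sequence[FeatureVector]) -> List[float]:
        return [self.value for _ in rows]


def rank(candidates: Sequence[FeatureVector], model: Model) -> List[int]:
    """Production-side code: return candidate indices, highest score first."""
    scores = model.predict(candidates)
    return sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)
```

Both `rank(rows, LinearModel({"x": 1.0}))` and `rank(rows, ConstantModel(0.5))` run through the same code path; an experiment only has to supply a new object with a `predict` method.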

Conclusions

In this blog post, we've shared our thinking about the complexity that comes with ML systems and analyzed common traits and reasons for its existence. We believe that engineers and organizations that depend on ML to improve their products need to be aware of the complexity it introduces and be wise about the tradeoffs among complexity, quality, and performance. We have also described a few principles for avoiding or reducing such complexity, which we have adopted here at Quora; we hope you find them useful.

Acknowledgement

Special thanks and credit go to Xavier Amatriain, our VP of Engineering, who has been driving the ML complexity initiative at Quora and who wrote the vast majority of the key observations and principles described in this blog post.