The Care & Feeding of Your Machine Learning Apps

by BDW Editor on July 15, 2016

Machine learning is absolutely cutting edge. It’s bright, shiny, and new, and only the coolest of cats are even doing it yet beyond a few simple test environments. But just like that lawn mower you bought back in ’04 or the PC you scored on Black Friday in ’13, ML (machine learning) won’t be new and shiny for long. In fact, the models and algorithms you developed just last year are probably already starting to deteriorate. Maintenance isn’t as fun or interesting as building sexy new models, but unless your algorithms are monitored and tweaked on a regular basis, their performance will decline over time. Here’s what you need to know to properly feed and care for your ML apps.

The Concept of Technical Debt

Technical debt is a lot like financial debt. If you work hard and pay it off, it likely won’t be a problem. But if you ignore it, the debt snowballs and can crush you.

Every developer is aware of the concept of technical debt. Technical debt is all of the known issues that accumulate during the development process and get put on the backburner to be addressed after the project is deployed. If things go well, developers release a relatively stable, solid app, and then go about the business of resolving the technical debt issues. In fact, user feedback after the app is released can be essential in finding and addressing the problems. Technical debt can also accrue when the app is patched or updated. ML apps are the same. If you don’t address the technical debt early on, it piles up until the app becomes hopelessly useless and root causes are difficult to find.

Environmental Conditions Can Affect the Performance of Your ML Apps

Even if all the technical debt is resolved, changes in environmental conditions can affect the performance of your ML models and algorithms. Algorithms are essentially asking the data questions, so changes in the environment can affect how algorithms produce answers. Think of it this way: if you and your significant other just finished dinner, and they said, “Well, that was nice,” you’d take it very positively. On the other hand, if you just finished an argument and they said the exact same thing, you’d take it quite differently. As the environment changes, so does the interpretation of information. That’s why algorithms can’t be “set it and forget it” kinds of things.
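In practice, catching this kind of drift means watching your model’s live performance rather than trusting its offline test scores forever. Here’s a minimal sketch of that idea: a rolling accuracy tracker that flags when recent predictions fall too far below a baseline. The window size, baseline accuracy, and tolerance are illustrative placeholders, not values from any particular system.

```python
from collections import deque

def make_drift_monitor(window=500, baseline_accuracy=0.92, tolerance=0.05):
    """Track accuracy over the last `window` predictions and flag when it
    drops more than `tolerance` below the baseline measured at deployment.
    All three parameters are hypothetical defaults for illustration."""
    outcomes = deque(maxlen=window)  # rolling record of correct/incorrect

    def record(prediction, actual):
        outcomes.append(prediction == actual)
        accuracy = sum(outcomes) / len(outcomes)
        # Only raise the flag once the window is full, to avoid noisy alerts
        drifting = len(outcomes) == window and accuracy < baseline_accuracy - tolerance
        return accuracy, drifting

    return record
```

Hooked into a prediction pipeline, a monitor like this gives you an early signal that the world your model was trained on has shifted, long before users start complaining.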

ML models and algorithms aren’t the only factors that need to be considered. The types and quality of the data can also affect the performance of your algorithms, and deserve some attention as well. It’s kind of like a person’s diet: after the holidays, people start being very careful about what they consume. Over time, they become more lax, and soon they’re eating as badly as or worse than before. Data is like that, too. At first, you pay lots of attention to the quality of data you’re feeding your ML apps. Over time, you become less careful, and soon your ML apps are gorging on the digital versions of Snickers bars and Mountain Dew — data with lots of duplications, errors, omissions, and probably even some high fructose corn syrup. An ongoing data cleansing plan, plus good data governance policies and procedures, is essential for high-performing ML apps.
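An ongoing data cleansing plan starts with knowing how junky your data actually is. Here’s a minimal sketch of an automated quality check you might run before each training or scoring batch: it counts exact duplicate records and records missing required fields. The field names and record shape are hypothetical examples, not from any particular dataset.

```python
def data_quality_report(rows, required_fields):
    """Summarize duplicates and incomplete records in a list of dicts.
    `rows` and `required_fields` are illustrative; adapt to your schema."""
    seen = set()
    duplicates = 0
    incomplete = 0
    for row in rows:
        key = tuple(sorted(row.items()))  # exact-match duplicate detection
        if key in seen:
            duplicates += 1
        seen.add(key)
        # A record is incomplete if any required field is absent or empty
        if any(row.get(field) in (None, "") for field in required_fields):
            incomplete += 1
    return {"rows": len(rows), "duplicates": duplicates, "incomplete": incomplete}
```

Run on a schedule and tracked over time, even a simple report like this shows you whether your data diet is drifting back toward Snickers bars and Mountain Dew.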