Better Machine Learning Model Management: A Streaming Approach

There’s a surprising trick for greatly increasing the chances of real impact and true success with many types of machine learning systems: “do the logistics correctly and efficiently.” That sounds like simple advice, and it is, but the impact can be enormous. If the logistics are not handled well, machine learning projects generally fail to deliver practical value. In fact, they may fail to deliver at all. Carrying out this advice, however, may not seem simple at all.

That’s where the new rendezvous architecture, coupled with suitable technology to support this style of work, comes to the rescue as part of the solution to handling logistics for a range of machine learning types. One of the problems underlying the need for better model management is that machine learning models, even successful ones, are not singular. Successful endeavors need to manage dozens, hundreds, or more models at the same time, often while making data available across geo-distributed data centers, either on-premises or in the cloud.

We’ve written a short book about the challenges and solutions of machine learning logistics. Here’s an excerpt that introduces a streaming-based solution for what might traditionally have been built as a call-and-response system.

Rendezvous Style

We can solve these problems with two simple actions. First, we can put a return address into each request. Second, we can add a process known as a rendezvous server that selects which result to return for each request. The return address specifies how a selected result can be returned to the source of the request. A return address could be an HTTP address connected to a REST server. Even better, it can be the name of a message stream and a topic. Whatever works best for your system is the right choice.
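As a concrete illustration, a request that carries its own return address might look like the following minimal sketch. It assumes a JSON encoding and a return address made of a stream name and a topic; the field names (`request_id`, `return_address`, `payload`) and the helper functions are hypothetical, not part of any particular streaming product.

```python
import json
import uuid
from dataclasses import dataclass, asdict

# Hypothetical message format: all field names are illustrative assumptions.
@dataclass(frozen=True)
class ReturnAddress:
    stream: str  # name of the stream the official result should be written to
    topic: str   # topic within that stream that the requester reads

@dataclass(frozen=True)
class Request:
    request_id: str                # unique per request, echoed back in the result
    return_address: ReturnAddress  # where the rendezvous server sends the result
    payload: dict                  # the model input itself

def make_request(payload: dict, stream: str, topic: str) -> Request:
    """Attach a fresh request id and a return address to a model input."""
    return Request(str(uuid.uuid4()), ReturnAddress(stream, topic), payload)

def encode(request: Request) -> str:
    # asdict() recurses into the nested ReturnAddress dataclass.
    return json.dumps(asdict(request))

def decode(text: str) -> Request:
    d = json.loads(text)
    return Request(d["request_id"], ReturnAddress(**d["return_address"]), d["payload"])
```

For example, `make_request({"amount": 42.0}, "results", "client-7")` produces a request that every model and the rendezvous server can read from the input stream. Because the address travels with the request, nothing else has to remember where the answer should go.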

Using a rendezvous style works only if the streaming and processing elements you are using are compatible with your latency requirements.

For persistent message queues such as Kafka and MapR Streams, and for processing frameworks such as Apache Flink or even just raw Java, a rendezvous architecture will likely work well, down to latencies of around a single millisecond.

Conversely, as of this writing, microbatch frameworks such as Apache Spark Streaming can just barely handle latencies as low as single-digit seconds (not milliseconds). That might be acceptable, but often it will not be. At the other extreme, if you need to go faster than a few milliseconds, you might need to use nonpersistent, in-memory streaming technologies. The rendezvous architecture will still apply.

The key distinguishing feature in a rendezvous architecture is how the rendezvous server reads all of the requests as well as all of the results from all of the models and brings them back together.

Figure 3-4 illustrates how a rendezvous server works. The rendezvous server uses a policy to select which result to anoint as “official” and writes that official result to a stream. In the system shown, we assume that the return address consists of a topic and a request identifier and that the rendezvous server should write the results to a well-known stream with the specified topic. The result should include the request identifier, since the process that sent the request may have several overlapping requests outstanding and needs to match each result with the request that produced it.
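On the requester’s side, the request identifier in each result is what allows overlapping requests to be untangled. The following is a minimal sketch, assuming each result arrives as a dict with `request_id` and `result` keys; the `PendingRequests` class and its methods are hypothetical, and the actual writing to and reading from streams is left out.

```python
import threading
import uuid
from concurrent.futures import Future

class PendingRequests:
    """Hypothetical helper that matches results on the requester's topic to requests."""

    def __init__(self) -> None:
        self._lock = threading.Lock()  # results may be consumed on another thread
        self._pending: dict[str, Future] = {}

    def register(self) -> tuple[str, Future]:
        """Create an id for a new request and a future that will hold its result."""
        request_id = str(uuid.uuid4())
        fut: Future = Future()
        with self._lock:
            self._pending[request_id] = fut
        # Assumption: the caller now writes the request, carrying this id and its
        # return address, to the input stream.
        return request_id, fut

    def on_result(self, message: dict) -> bool:
        """Handle one message read from this requester's result topic."""
        with self._lock:
            fut = self._pending.pop(message["request_id"], None)
        if fut is None:
            return False  # unknown id or duplicate delivery: ignore it
        fut.set_result(message["result"])
        return True
```

Usage with two overlapping requests whose results come back out of order:

```python
pending = PendingRequests()
id_a, fut_a = pending.register()
id_b, fut_b = pending.register()
pending.on_result({"request_id": id_b, "result": "B"})
pending.on_result({"request_id": id_a, "result": "A"})
# fut_a.result() is "A" and fut_b.result() is "B"
```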

Figure 3-4. The core rendezvous design. There are additional nuances, but this is the essential shape of the architecture.

Internally, the rendezvous server works by maintaining a mailbox for each request it sees in the input stream. As each model reports results into the scores stream, the rendezvous server reads these results and inserts them into the corresponding mailbox. Based on the amount of time that has passed, the priority of each model, and possibly even a random number, the rendezvous server eventually chooses a result for each pending mailbox and packages that result to be sent as a response to the return address in the original request.
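The mailbox logic can be sketched in a few dozen lines of Python. This is an illustration under several assumptions: stream reads are replaced by plain method calls, the clock is passed in as `now` so the behavior is deterministic, and the selection policy (answer as soon as the top-priority model reports, otherwise take the best available result once `timeout` has elapsed) is a hypothetical example. The random component mentioned above is omitted, and scores that arrive before their request are simply dropped, which a real server would need to handle.

```python
from dataclasses import dataclass, field

@dataclass
class Mailbox:
    return_address: str  # e.g. "stream:topic"; this format is an assumption
    arrived: float       # time the request was read from the input stream
    scores: dict = field(default_factory=dict)  # model name -> result

class RendezvousServer:
    """Hypothetical sketch of the mailbox mechanism, not a production server."""

    def __init__(self, priorities: list[str], timeout: float) -> None:
        self.priorities = priorities  # model names, most preferred first
        self.timeout = timeout        # seconds to wait for the top-priority model
        self.mailboxes: dict[str, Mailbox] = {}

    def on_request(self, request_id: str, return_address: str, now: float) -> None:
        """Open a mailbox for a request read from the input stream."""
        self.mailboxes[request_id] = Mailbox(return_address, now)

    def on_score(self, request_id: str, model: str, result: object) -> None:
        """File a model's result from the scores stream into its mailbox."""
        box = self.mailboxes.get(request_id)
        if box is not None:  # already answered (or not yet seen): drop the score
            box.scores[model] = result

    def _choose(self, box: Mailbox, now: float):
        # Models that have reported, highest priority first.
        ranked = [m for m in self.priorities if m in box.scores]
        if not ranked:
            return None
        best = ranked[0]
        # Answer at once if the top model reported; otherwise wait out the timeout.
        if best == self.priorities[0] or now - box.arrived >= self.timeout:
            return best, box.scores[best]
        return None

    def poll(self, now: float) -> list[tuple[str, str, str, object]]:
        """Close every mailbox the policy can answer.

        Returns (return_address, request_id, model, result) tuples to be written
        to the result stream.
        """
        out = []
        for request_id, box in list(self.mailboxes.items()):
            choice = self._choose(box, now)
            if choice is not None:
                model, result = choice
                out.append((box.return_address, request_id, model, result))
                del self.mailboxes[request_id]
        return out
```

Closing a mailbox as soon as it is answered keeps memory bounded and makes late scores harmless, because they find no mailbox to land in.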

One strength of the rendezvous architecture is that a model can be “warmed up” before its outputs are actually used so that the stability of the model under production conditions and load can be verified. Another advantage is that models can be “deployed” or “undeployed” simply by instructing the rendezvous server to stop (or start) ignoring their output.
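Because every model keeps scoring every request, deploying or undeploying a model reduces to changing which outputs the rendezvous server is allowed to use. The sketch below assumes that ignored outputs are kept in a log so a warming-up model can be compared against the live ones; the `DeploymentPolicy` class and its method names are hypothetical.

```python
from collections import defaultdict

class DeploymentPolicy:
    """Hypothetical sketch: decides which models' outputs the server may select."""

    def __init__(self) -> None:
        self.live: set[str] = set()
        # Outputs of ignored (warming-up) models, kept for offline comparison.
        self.shadow_log: dict[str, list] = defaultdict(list)

    def deploy(self, model: str) -> None:
        self.live.add(model)  # the server stops ignoring this model

    def undeploy(self, model: str) -> None:
        self.live.discard(model)  # the server starts ignoring this model

    def filter(self, request_id: str, scores: dict) -> dict:
        """Return only live models' scores; log the others instead of using them."""
        usable = {}
        for model, result in scores.items():
            if model in self.live:
                usable[model] = result
            else:
                self.shadow_log[model].append((request_id, result))
        return usable
```

For instance, with only `v1` deployed, `filter("r1", {"v1": 0.2, "v2": 0.4})` returns `{"v1": 0.2}` while recording `v2`’s answer. Promoting `v2` later is a single `deploy` call, with no change to the models themselves.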

Related to this, the rendezvous server can make guarantees about returning results that the individual models cannot make. You can, for instance, define a policy that specifies how long to wait for the output of a preferred model. If at least one of the models is very simple and reliable, albeit a bit less accurate, this simple model can be used as a backstop so that if more sophisticated models take too long or fail entirely, we can still produce some kind of answer before a deadline. Sending the results back to a highly available message stream, as shown in Figure 3-4, also helps with reliability by decoupling the rendezvous server’s sending of a result from the original requestor’s retrieval of it.
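A policy of this kind can be written as a schedule of wait budgets. In the following minimal sketch, which assumes a hypothetical `choose_result` function and illustrative model names and budgets, models are listed from most to least preferred, and each has a budget measured from the time the request arrived. A model’s answer is used if present; if it is missing, the server keeps waiting until that model’s budget expires and then falls through to the next model. Giving the simple backstop model an infinite budget means its answer is always accepted once the preferred models have run out of time.

```python
import math

def choose_result(scores: dict, elapsed: float, schedule: list[tuple[str, float]]):
    """Pick a result using a wait-budget schedule (a hypothetical policy).

    scores: model name -> result received so far for one request.
    elapsed: seconds since the request arrived.
    schedule: (model, budget_seconds) pairs, most preferred first; budgets are
    assumed to be nondecreasing, with math.inf for the backstop model.
    Returns (model, result), or None if the server should keep waiting.
    """
    for model, budget in schedule:
        if model in scores:
            return model, scores[model]
        if elapsed < budget:
            return None  # still worth waiting for this more preferred model
    return None  # only reachable if the last budget is finite and has expired
```

With the schedule `[("deep", 0.050), ("gbm", 0.080), ("backstop", math.inf)]`, a request that has only the backstop answer at 10 ms keeps waiting, takes `gbm`’s answer if it is present at 60 ms, and falls back to the backstop at 90 ms if neither sophisticated model has replied.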