Share

Data Science at Instacart means using unorthodox solutions to get an edge

Given a list of destinations, what is the shortest possible route that a traveling salesman can take to reach all of them?

Seems simple? This is known to computer scientists as the traveling salesman problem, and it’s NP-hard, which for all intents and purposes means it becomes practically unsolvable as it scales up.

Because it’s unsolvable, companies like Instacart hire people like data scientist Jagannath Putrevu to find innovative ways to solve it as well as possible. Armed with a degree from the absurdly exclusive Indian Institute of Technology and a Master’s Degree in Operations Research from Georgia Tech, Jagannath has spent a lot of time thinking about the hard problems in logistics, and continues to do so in his role at Instacart.

An unsolvable problem

Instacart’s particular version of the unsolvable problem is this: how many shoppers should it have working at a given time, where should they be going, and what should they be carrying? Too many shoppers working means shoppers are sitting idle, wasting their time and Instacart’s dollars. Too few means customers aren’t able to get delivery at their preferred timeand Instacart is losing customers. The goal is to land as close to the that sweet spot in between, where every shopper is working and every customer is satisfied.

Image via Jagannath Putrevu

Like any problem of this scale, they can never get the “right” answer. There are simply too many variables. But in a big market, Instacart can get very close.

For a while, the team was using a version of the “guess and check” method. They were taking educated guesses based off experience and tweaking them to be more accurate over time as new data came in. It’s the most intuitive way to approach a logistics problem: try something, learn from it, do better next time, repeat.

Perhaps the biggest problem with that strategy is that by focusing only on how their markets are reacting to what they are doing, it becomes more difficult to see how those markets would react to a totally different strategy. In mathematical terms, they were risking moving only toward a local maxima rather than a global one; that is, as their improvements started to diminish, was it because they were hitting the ceiling of what was possible, or because they were bumping their head on the shelf of misunderstanding?

“We thought we were pretty efficient,” Jagannath tells us. “But we wanted to know: what is the best we could do? Is there is an upper bound that we could hit, and how far away were we from the upper bound? We had no answer because we were just flying blind. We were making improvements. We were making progress, but we couldn’t say how close we were to hitting the ceiling.”

Jagannath wanted to find that ceiling.

“We thought, ‘If we were given infinite resources, infinite computing power and a set of orders that we need to fulfill, what is the best we could do? What are the most optimal set of routes that we could design? When do we need shoppers to fulfill these optimal routes?”

Happy accidents

At the time, Jagannath was only tasked with figuring out a way to improve Instacart’s batching and routing—basically, figuring out how in-store shoppers should batch purchase orders for pickup and dropoff to customers by out-of-store drivers. He thought having a benchmark could give them a sense of how efficiently they were operating, as well as a simpler metric to understand.

To do so, Jagannath and the team at Instacart designed a Monte Carlo simulation. For those less into advanced mathematics and/or grocery logistics, Monte Carlo systems consider the likelihood of a variety of inputs of occurring and create a distribution of possible outcomes. You can think of it as an alternate universe generator where each universe represents a possible outcome, which then becomes a single data point in a distribution. They are what someone like Nate Silver uses in his prediction models for both sports and politics, or how they make dating apps in one of the non-scary episodes Black Mirror.

Giving appropriate weights to obvious factors (day of week, time of day, shopper efficiency) and less-obvious ones (projected weather, whether or not Game of Thrones is on), they ran thousands of simulations to figure out what the closest thing to perfection would look like––what were the likeliest outcomes given the way supply and demand vary in a given market? And what was their upper bound for how well they could match supply and demand?

In asking these questions, they realized something: the simulation engine ended up giving them more than they’d expected.

“We were just trying to solve our routing and batching problem, looking at things like whether or not vans would be more efficient than cars,” Jagannath says, “But we also realized that with this simulation engine, we could also see when and where it’s more optimal to have more shoppers. And we knew that solving both the batching and routing and the staffing problems together would be more impactful than working on either one alone.”

Progress was initially slow; Jagannath’s team rolled out their new method in San Francisco, which was operating as one of Instacart’s most efficient markets. But despite not seeing huge results early, they kept at it. “We were staffing in a way that was in some instances totally different than what the company had been doing. But everyone on the engineering and data science side really believed this was a superior mathematical approach and luckily, they were confident our team could prove that this approach really works.”

They were right to be. While improvements in San Francisco did start showing up, as one of Instacart’s most saturated markets, there was more room for improvement elsewhere.

“We realized a less-developed market like Raleigh or Indianapolis might not be at the same efficiency level as San Francisco. That’s where there were huge decreases in shopper idleness or lost deliveries,” Jagannath tells us.

Using data to take on the world

Perhaps what is most exciting for Jagannath and the team at Instacart is that they are going to be able to re-apply the Monte Carlo method elsewhere. He sees opportunities ranging from drawing more data-driven geographic zones, adjusting customer delivery windows, revamping marketing to target underperforming areas, and providing incentives to customers to make for a more efficient overall delivery system.

When it comes to using its data to build a better business, there is no shortage of possibilities open to Instacart.

“Instacart has always been unique in that we are not just working with one specific store. Partnering with so many retailers allows to pool demand density, which will allow us to operate at a higher level of efficiency and hence lower costs. And because of those partnerships, we have a more robust, diverse dataset. So as we go forward, getting the most out of that data will give us a huge advantage over others in the grocery delivery space.”

For more on the topics explored in this article, read Jagannath’s post on it here.