Marijuana News Insights from the Zefyr Blog

Synopsis: Deep Reinforcement Learning for the Cannabis Retail Market

This week’s featured article, “Deep Reinforcement Learning for the Cannabis Retail Market”, written by Micheal Lanham and published on Medium, argues that retailers hoping to capitalize on the marijuana gold rush will need reinforcement learning techniques to predict future consumer behavior, or else face ever-shrinking profit margins.

Conventional supervised machine learning (the kind that labels or classifies data) did not generalize across markets. For instance, a model built for Washington state would not carry over to Colorado, and vice versa. In many states, a model would not even predict well across the state's own cities. The solution to accurately modeling marijuana consumer buying behavior came from recognizing that consumer demographics contribute more to predictions than other marijuana business data does. Reinforcement learning was the key to that solution.

“After looking at reinforcement learning for some time I realized we were looking at the problem backward. It wasn’t the dispensary but rather the customer that could be modeled using reinforcement learning to determine the optimum customer happiness or reward. Being able to identify satisfied or unsatisfied consumer markets would be extremely beneficial to retailers.” — Micheal Lanham

Reinforcement learning is time-dependent, so purchases are evaluated over a period. At the end of this period, the reward for each purchase is evaluated and summed, and the total reward is used as the value that trains the customer agent. The reward for each purchase is allotted by a random Monte Carlo selection process, as discussed further in the article.
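To make the idea of a summed period reward concrete, here is a minimal sketch in Python. It is not the article's implementation. The product names, the satisfaction probabilities, and the function `episode_return` are all hypothetical, and the "Monte Carlo" step is assumed to be a simple random draw against a per-product satisfaction probability.

```python
import random

def episode_return(purchases, satisfaction, rng):
    """Sum the per-purchase rewards over one evaluation period."""
    total = 0.0
    for product in purchases:
        # Assumed Monte Carlo step: the purchase earns a reward of 1.0
        # with the product's (hypothetical) satisfaction probability.
        reward = 1.0 if rng.random() < satisfaction[product] else 0.0
        total += reward
    return total

# Hypothetical satisfaction probabilities for one demographic region.
satisfaction = {"flower": 0.8, "edible": 0.5, "concentrate": 0.2}
purchases = ["flower", "edible", "flower", "concentrate"]

rng = random.Random(0)  # seeded so the sketch is repeatable
total = episode_return(purchases, satisfaction, rng)
print(f"Total reward for the period: {total}")
```

The single number returned for the period is what would be fed back to the customer agent as its training signal.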

When the model begins running, the consumer agent buys anything and everything. Over time, the agent gets much smarter and starts to choose better buys for the consumer in its demographic region using the same model. As the consumer model gets smarter, each purchase gets smarter too, and rewards increase. In reinforcement learning, rewards teach the agent, or autonomous program, how to behave.
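The shift from buying anything to buying well can be illustrated with a much simpler stand-in than the article's deep reinforcement learning model: an epsilon-greedy agent that keeps a running average reward per product. Everything named here (`PRODUCTS`, `SATISFACTION`, `train_customer_agent`) is hypothetical. For simplicity the sketch also updates its estimates after every purchase, rather than training on the summed reward for the period as the article describes.

```python
import random

# Hypothetical catalog and satisfaction probabilities for one region.
PRODUCTS = ["flower", "edible", "concentrate"]
SATISFACTION = {"flower": 0.8, "edible": 0.5, "concentrate": 0.2}

def train_customer_agent(episodes=500, purchases_per_period=10, seed=0):
    rng = random.Random(seed)
    value = {p: 0.0 for p in PRODUCTS}  # running average reward per product
    count = {p: 0 for p in PRODUCTS}    # number of purchases per product
    totals = []                         # total reward per period
    for episode in range(episodes):
        # Exploration starts at 1.0 (buy anything) and decays to 0.05.
        epsilon = max(0.05, 1.0 - episode / (episodes / 2))
        total = 0.0
        for _ in range(purchases_per_period):
            if rng.random() < epsilon:
                product = rng.choice(PRODUCTS)          # explore
            else:
                product = max(PRODUCTS, key=value.get)  # exploit best estimate
            reward = 1.0 if rng.random() < SATISFACTION[product] else 0.0
            count[product] += 1
            # Incremental mean update of the product's estimated reward.
            value[product] += (reward - value[product]) / count[product]
            total += reward
        totals.append(total)
    return value, totals

value, totals = train_customer_agent()
early = sum(totals[:50]) / 50
late = sum(totals[-50:]) / 50
print(f"Average period reward, first 50 periods: {early:.2f}")
print(f"Average period reward, last 50 periods:  {late:.2f}")
```

Comparing the early and late averages shows the pattern described above: random purchases at the start, then higher rewards as the agent settles on the products that satisfy its simulated consumer.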

Based on what was known about Denver, CO, the output was remarkably accurate: it reflected where consumer satisfaction was high and where it was lower. One thing that stood out was a couple of regions with low customer satisfaction values next to multiple dispensary locations. Everywhere else the model seemed to predict quite accurately, so what was the problem in these regions? Lanham writes:

“Not being from Denver and a Canadian I was a little surprised until I learned through speaking to my American colleagues that those were in fact very depressed regions and my results were not that surprising. It seems that even the dispensaries close to the depressed regions were still marketing for higher-end consumers, thus reinforcing my model further.”

Zefyr works hand-in-hand with Micheal Lanham to develop groundbreaking data models for the cannabis industry. We are in the process of commercializing this innovation. If you’d like to know more, feel free to contact us at info@gozefyr.com any time.