I co-host a monthly meetup called Design+AI, where we invite an AI practitioner to share their work and explore the design decisions behind a specific project with a small group of designers, researchers, and engineers. For the August edition of Design+AI, we welcomed Inmar Giovoni — current Autonomy Engineering Manager at Uber ATG, and former Head of Data Science at Kobo — to share a case study with us. Inmar talked us through the thinking and decisions her Kobo team made when using reinforcement learning to suggest ebooks to their customers. Here is a roundup from the discussion.

Multi-Armed Bandit

Inmar’s team wanted a data-driven way to determine the optimal arrangement of book carousels on the Kobo website to drive book purchases. The challenge was knowing which carousels to include on the website, and in what arrangement to display them, for each customer segment. So, she used a multi-armed bandit algorithm.

Multi-Armed Bandit (MAB) algorithms are a form of reinforcement learning. The name comes from slot machines, a.k.a. one-armed bandits. Imagine you are at the casino in front of a row of slot machines. You want to maximize your winnings and have a limited amount of money to gamble. Since you have no prior knowledge about which machines pay out more often, you just start playing them, adjusting which machines you play, in what order, and how often, in order to bias towards the machines that maximize your reward. That is the concept behind MAB: you try a number of options, but once you have a sense of which one is offering the most success, you play that one more than the others (also known as the exploration/exploitation tradeoff; more on that below).
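To make the idea concrete, here is a minimal sketch of one classic MAB strategy, epsilon-greedy: most of the time play the best arm seen so far, and a small fraction of the time try a random arm. This is an illustration of the general technique, not Kobo's actual implementation; the class and parameter names are my own.

```python
import random

class EpsilonGreedyBandit:
    """Toy epsilon-greedy bandit. Each 'arm' could be one carousel
    arrangement; reward is 1 for a purchase, 0 otherwise."""

    def __init__(self, n_arms, epsilon=0.1):
        self.epsilon = epsilon
        self.counts = [0] * n_arms    # times each arm was played
        self.values = [0.0] * n_arms  # running mean reward per arm

    def select_arm(self):
        # Explore with probability epsilon; otherwise exploit the
        # arm with the highest average reward so far.
        if random.random() < self.epsilon:
            return random.randrange(len(self.counts))
        return max(range(len(self.counts)), key=lambda a: self.values[a])

    def update(self, arm, reward):
        # Incremental mean update for the played arm.
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]
```

After enough rounds, the arm with the higher true payout ends up played far more often than the others, which is exactly the "play the winning machine more" behaviour described above.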

So, Inmar’s team took this method and applied it to organizing the variables on their e-book website — to figure out which book carousel combinations and recommendations resulted in more online book sales.

MAB differs from (and improves on!) a/b testing in two major ways.

One of the most exciting things about MAB is that it lets you boost intended outcomes while a test is still live. For Inmar’s team, this meant that once a particular combination of book carousels (genre, popular now, recommended for you) started to perform well, the algorithm would show that combination to more customers and show weaker-performing combinations to fewer. This is interesting for companies because it minimizes the money lost during testing. In contrast, with a/b testing, even if option b is yielding a less desirable outcome (fewer sales), half of customers are shown that option until the end of the test, meaning the company potentially misses out on sales it could have captured by sending those customers to option a.
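One way to see this traffic-shifting behaviour is with Thompson sampling, a common MAB algorithm (the post doesn't say which algorithm Kobo used, so treat this as an illustrative sketch). Each arm keeps a Beta posterior over its conversion rate; every round we sample from each posterior and play the best draw, so traffic drifts toward the stronger arm while the test is still running.

```python
import random

def thompson_select(successes, failures):
    # Draw one sample per arm from its Beta(successes+1, failures+1)
    # posterior, then play the arm with the best draw.
    samples = [random.betavariate(s + 1, f + 1)
               for s, f in zip(successes, failures)]
    return max(range(len(samples)), key=lambda a: samples[a])

def run(true_rates, rounds=2000):
    """Simulate a live test: arms with hidden conversion rates,
    reward 1 on conversion. Returns how often each arm was shown."""
    n = len(true_rates)
    successes, failures = [0] * n, [0] * n
    plays = [0] * n
    for _ in range(rounds):
        arm = thompson_select(successes, failures)
        plays[arm] += 1
        if random.random() < true_rates[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return plays
```

Running this with two arms of unequal conversion rates, the better arm ends up receiving most of the traffic, unlike a fixed 50/50 a/b split.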

As you might imagine, maximizing profits while testing is desirable for companies. MAB also has healthcare applications, for example when running clinical trials for a new pharmaceutical drug. Using MAB in this setting means that, if the new drug being tested is producing good results for participants, the trial administrators can allocate more participants to the actual medicine and fewer to the placebo. Thus, MAB has the potential to improve health outcomes for people, and maybe even save lives.

Furthermore, MAB is useful where there are numerous possible combinations of variables to consider, more than would be feasible (or at least very costly and time-consuming) to evaluate with a/b tests. This was the case for the Kobo website: there were so many different ways to arrange the book carousels that testing them all with a/b tests would have taken far too long.

Contextual Multi-Armed Bandit

Inmar’s team recognized the diversity of people’s reading tastes and wanted to build more personalization into their algorithm, to make it smarter and more delightful for customers over time. So, they extended the MAB into a Contextual Multi-Armed Bandit. Here’s how it worked: if I am someone who normally reads mystery novels (bucket a), sometimes reads leadership and management books (bucket b), and once in a blue moon reads a biography (bucket c), the contextual MAB starts to take these preferences into account.

This tweak means the algorithm would generally show me books and carousels matching the segments I am closest to (buckets a and b), but occasionally recommend books from categories I very rarely choose (bucket c). This starts to mimic what people do in real life and allows for the serendipity of seeing books that I might be interested in but that fall outside my usual buying behaviour.
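A very simple way to sketch a contextual bandit is to keep separate reward estimates per (context, arm) pair, where the context is the reader's segment. The bucket names and epsilon value below are hypothetical, and a real system (Kobo's included) would be far more sophisticated; this only illustrates how context-specific exploitation plus occasional exploration produces the "mostly bucket a and b, occasionally bucket c" behaviour.

```python
import random
from collections import defaultdict

class ContextualBandit:
    """Toy contextual bandit: epsilon-greedy per (context, arm) pair."""

    def __init__(self, arms, epsilon=0.1):
        self.arms = list(arms)
        self.epsilon = epsilon
        self.counts = defaultdict(int)    # (context, arm) -> plays
        self.values = defaultdict(float)  # (context, arm) -> mean reward

    def select_arm(self, context):
        # Occasionally explore a random category (the serendipity),
        # otherwise exploit this context's best-known category.
        if random.random() < self.epsilon:
            return random.choice(self.arms)
        return max(self.arms, key=lambda a: self.values[(context, a)])

    def update(self, context, arm, reward):
        key = (context, arm)
        self.counts[key] += 1
        self.values[key] += (reward - self.values[key]) / self.counts[key]
```

For a "mystery reader" context whose clicks strongly favour the mystery carousel, the bandit quickly learns to show mystery most of the time while still surfacing a biography now and then.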

Some Terminology

Cold Start Problem

What happens when a company wants to recommend a book, a movie, or a song to a new customer, but doesn’t know anything about what the customer likes or dislikes, their behaviours, their purchasing habits, or their routines? That is the cold start problem: the challenge of having no information about a new user/customer when they first join a platform, which makes them difficult to segment. It is also the reason recommendations improve the longer you use a platform: the better the algorithm knows you and your preferences, the better the recommendations can be.

Exploration / Exploitation

Imagine you’re at a restaurant. Do you order the thing on the menu that you know you’ll like? Or do you risk trying something new in case you’re missing out? At what point do you decide that you’ve tried enough different things and you just want the turkey burger? A similar trade-off happens in MAB. The exploration/exploitation trade-off in reinforcement learning is illustrated by the multi-armed bandit problem, where the algorithm must decide between acquiring new knowledge and maximizing reward.

Where else have you seen the Multi-Armed Bandit used? What about the Contextual MAB?

-Satsuko

References

In writing this post, I discovered a great podcast on data science and machine learning, with some excellent episodes on the multi-armed bandit and reinforcement learning.
