Reaching for the Sky at our latest recommender systems meetup

Insights from the expert talks at our fifth RecSys London get-together

At Bibblio we recently organised our fifth RecSys London Meetup, hosted by the team at Sky. The meetup took place at their offices in Thomas More Square, which offer a great view of St Katharine Docks.

The presenters for the evening were Dr. Jian Li (Principal Data Scientist at Sky), Preriit Souda (Data Science Consultant and Marketeer) and Adam Rees (Senior Research Engineer at Sky).

Read on to get the highlights of their talks, and grab their presentation slides too.

---

Content discovery using moods

The first talk of the evening was by Dr. Jian Li, Principal Data Scientist at Sky. He's the manager of the data science team responsible for machine learning research, innovation and product development for Sky's content discovery services.

Jian kicked off his presentation with a short introduction on how recommendation engines help customers discover content that matches their particular interests. One interesting challenge for Sky is figuring out when to promote new content, which has barely been viewed yet, to customers:

While the team has experimented extensively with a large set of algorithms to improve accuracy, there is still plenty of room for improvement. One area Jian is interested in is establishing natural, smooth and effective communication between customers and Sky's services. His question is how a content discovery service could respond accurately to a customer's natural expression, or mood:

Mood has two aspects, Jian explained. First, it's the natural expression of a customer's feelings, for example "I want to watch something funny". Second, it's the kind of feeling a piece of content can evoke, e.g. this film will make you laugh a lot. The question is how to connect the two.

You could hard-code moods like 'funny' to a movie genre, e.g. 'comedy', but this creates restrictions. On the one hand, comedies are not the only films that make you laugh; on the other, a mood like 'exciting' can't be mapped to a single genre because lots of things can be exciting. During the project Jian wondered whether they could drop this kind of manual translation and rank content directly by how funny, exciting, etc. it is. Here's what he and his team came up with:

They started with a collection of content items, each associated with a number of keywords describing its nature and with a list of mood labels. They built a model to learn the correlations between moods and keywords. The outcome is a set of semantic representations of moods: every mood is represented by all the keywords, and each keyword is scored to indicate how relevant it is to that mood. If a keyword is not relevant to a mood, its score will be low (but it will still have one).
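The talk didn't say which model Sky used. As a hedged illustration only, the sketch below takes "learn the correlations between moods and keywords" literally and computes a Pearson (phi) correlation between each binary mood-label column and each binary keyword column across a toy catalogue. The result is a mood × keyword matrix in which every mood is described by every keyword, and unrelated keywords get low (zero or negative) scores rather than no score. The function name, keywords and data are hypothetical.

```python
import numpy as np

def mood_keyword_correlations(item_keywords, item_moods):
    """Correlate every mood label with every keyword across a catalogue.

    item_keywords: (n_items, n_keywords) 0/1 matrix, 1 if the item has the keyword
    item_moods:    (n_items, n_moods)    0/1 matrix, 1 if the item has the mood label
    Returns an (n_moods, n_keywords) matrix of Pearson (phi) correlations.
    """
    K = np.asarray(item_keywords, dtype=float)
    M = np.asarray(item_moods, dtype=float)
    n_items = K.shape[0]
    # Centre each column, then divide the covariance by both standard deviations.
    K_centred = K - K.mean(axis=0)
    M_centred = M - M.mean(axis=0)
    k_std = K.std(axis=0)
    m_std = M.std(axis=0)
    # A constant column has no correlation with anything; avoid dividing by zero.
    k_std[k_std == 0] = 1.0
    m_std[m_std == 0] = 1.0
    cov = M_centred.T @ K_centred / n_items
    return cov / np.outer(m_std, k_std)

# Hypothetical toy catalogue: 4 items, 3 keywords, 2 moods.
KEYWORDS = ["car chase", "slapstick", "haunted house"]
MOODS = ["exciting", "funny"]
ITEM_KEYWORDS = np.array([
    [1, 0, 0],  # item 0: car chase
    [0, 1, 0],  # item 1: slapstick
    [1, 1, 0],  # item 2: car chase + slapstick
    [0, 0, 1],  # item 3: haunted house
])
ITEM_MOODS = np.array([
    [1, 0],  # item 0: exciting
    [0, 1],  # item 1: funny
    [1, 1],  # item 2: exciting + funny
    [0, 0],  # item 3: neither
])

if __name__ == "__main__":
    scores = mood_keyword_correlations(ITEM_KEYWORDS, ITEM_MOODS)
    for mood, row in zip(MOODS, scores):
        print(mood, dict(zip(KEYWORDS, row.round(2))))
```

In this toy data 'car chase' correlates perfectly with 'exciting', while 'haunted house' still receives a (negative) score for both moods, mirroring the "low but still present" behaviour described above.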

The completed content mood profiles enabled the team to build a large number of recommendation features. A customer can query one mood, a combination of multiple moods, or even add preferred weights to the combination, for example "I want to watch something that's really exciting, somewhat funny and scary too." Jian's team is currently working on the UX so users can make such queries intuitively.
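How the mood representations were turned into content mood profiles wasn't described in detail. Assuming each title's profile is simply the mean of its keywords' scores for each mood, a weighted multi-mood query like the one above reduces to a weighted sum over the profile columns followed by a sort. The keywords, score values, titles and function names below are all hypothetical.

```python
import numpy as np

MOODS = ["exciting", "funny", "scary"]
KEYWORDS = ["car chase", "explosion", "slapstick", "haunted house"]

# Hypothetical mood x keyword relevance scores (rows follow MOODS, columns KEYWORDS).
MOOD_KEYWORD_SCORES = np.array([
    [0.9, 0.8, 0.1, 0.3],  # exciting
    [0.2, 0.1, 0.9, 0.1],  # funny
    [0.1, 0.3, 0.0, 0.9],  # scary
])

# Hypothetical titles and their keywords (rows are titles, columns follow KEYWORDS).
TITLES = ["action film", "comedy", "horror film", "action comedy"]
ITEM_KEYWORDS = np.array([
    [1, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
    [1, 0, 1, 0],
])

def content_mood_profiles(item_keywords, mood_keyword_scores):
    """Assumed profile: the mean score of a title's keywords for each mood."""
    counts = np.maximum(item_keywords.sum(axis=1, keepdims=True), 1)
    return (item_keywords @ mood_keyword_scores.T) / counts  # (n_titles, n_moods)

def rank_by_moods(profiles, weights, moods=MOODS):
    """Rank titles by a weighted sum of mood scores; moods not in `weights` count as 0."""
    w = np.array([weights.get(mood, 0.0) for mood in moods])
    scores = profiles @ w
    return np.argsort(-scores, kind="stable"), scores

if __name__ == "__main__":
    profiles = content_mood_profiles(ITEM_KEYWORDS, MOOD_KEYWORD_SCORES)
    # "Really exciting, somewhat funny and scary too"
    order, scores = rank_by_moods(profiles, {"exciting": 1.0, "funny": 0.5, "scary": 0.4})
    for i in order:
        print(f"{TITLES[i]}: {scores[i]:.3f}")
```

With these toy numbers the action film ranks first, followed by the action comedy; a single-mood query is just the special case of one non-zero weight.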

Knitting data to create a brocade of strategic insights

Our second speaker was Preriit Souda, a data science consultant and marketeer who has a sophisticated social media analytics toolkit. He and his team have analysed more than 20 TB(!) of post-processed social media data. In his talk he focused on how metadata can be used to deliver better, more customised advertising experiences.

Preriit kicked off his presentation by asking the attendees how much you can find out about a person from just one of their tweets. The answer is quite a lot:

Slide from Preriit's presentation showing all the data points you can gather based on his 4-word tweet with an image.

As the slide shows, there are several data linkage groups, such as image mining and information connected to text, weather, location and demographic attributes. See below for an overview of Preriit's framework. On the right-hand side are other data sources you can use: they are not directly linked, but they are very useful thematic databases for finding out more about people similar to your research subject.
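The talk stayed at the framework level. As a hedged illustration of what "linking" means for the metadata-driven groups (image mining is out of scope here), the sketch below joins a post onto external lookup tables using keys derived from its metadata, in this case a location and a date. The `Post` type, the `link_post` function, the source names and the toy tables are all hypothetical stand-ins for real weather archives and demographic databases.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Hashable, Tuple

@dataclass(frozen=True)
class Post:
    """A social media post and the metadata assumed to come with it."""
    text: str
    location: str  # assumed to come from a geotag or the author's profile
    date: str      # ISO date, e.g. "2017-06-01"

# A source pairs a function that derives a lookup key from a post with a lookup table.
Source = Tuple[Callable[[Post], Hashable], Dict[Hashable, object]]

def link_post(post: Post, sources: Dict[str, Source]) -> Dict[str, object]:
    """Join every source that has a matching key onto the post; skip the rest."""
    linked: Dict[str, object] = {}
    for name, (key_fn, table) in sources.items():
        key = key_fn(post)
        if key in table:
            linked[name] = table[key]
    return linked

# Hypothetical toy tables standing in for external weather and demographic data.
SOURCES: Dict[str, Source] = {
    "weather": (lambda p: (p.location, p.date), {("CityA", "2017-06-01"): "rain"}),
    "demographics": (lambda p: p.location, {"CityA": {"area_type": "urban"}}),
}

if __name__ == "__main__":
    post = Post("Lunch with friends", "CityA", "2017-06-01")
    print(link_post(post, SOURCES))
```

Keeping each source as a (key function, table) pair means a new data linkage group can be added without touching `link_post`.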

Preriit is very sceptical about the effectiveness of marketing surveys, and suggested that these novel techniques could effectively make them redundant.

So how can you use this data to solve companies' strategic problems? Preriit spent the last part of his presentation going through a couple of use cases, one of which was a restaurant with falling sales. His team used the data sources below to figure out what was going wrong:

By analysing the data (without having to ask anyone directly), they found that the restaurant was struggling to differentiate itself. The online chatter and review data they collected also raised many questions about its premium pricing. Uncovering these two main issues allowed the client to take informed action.

The restaurant launched a rebranding exercise and added new, affordable, seasonal dishes to its menu, which were exactly what people had been wishing for online. Using data like this offers a very practical way to learn about problems and come up with solutions.

Exploring movie posters with neural networks

Last up was Adam Rees, Senior Research Engineer at Sky, discussing his side project: using neural networks to find patterns in the design of movie posters. His work in this space was inspired by a blog post he read about movie poster clichés. And there are many:

Even though their creativity is often questionable, Adam stressed how important movie posters are to a title's performance, including on Sky's streaming service. On Sky's Cinema platform, poster images serve as the main navigation as users scan for something to watch:

With a Coursera machine learning course under his belt, and keen to find out more about the make-up of Sky's movie poster catalogue, Adam decided to start by clustering the posters by similarity. He used the final hidden layer of a pre-trained VGG-19 neural network to extract features, which were then used to cluster the posters:
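Adam didn't share code in the talk. The sketch below assumes Keras' bundled VGG19 with ImageNet weights, whose last hidden layer is the 4096-unit fully connected layer named `fc2`, and swaps in scikit-learn's k-means for the clustering step, since the talk didn't name a clustering algorithm. The function names, the L2 normalisation and the default cluster count are illustrative choices. TensorFlow is imported inside the extraction function so the clustering helper can be tried on any feature matrix.

```python
import numpy as np
from sklearn.cluster import KMeans

def extract_poster_features(image_paths):
    """Return VGG-19 'fc2' activations (the final hidden layer) for each poster image.

    TensorFlow is imported here so the clustering helper below works without it.
    ImageNet weights are downloaded on first use.
    """
    import tensorflow as tf
    from tensorflow.keras.applications.vgg19 import VGG19, preprocess_input

    base = VGG19(weights="imagenet", include_top=True)
    # Cut the network after fc2 (4096 units), dropping the 1000-class softmax layer.
    extractor = tf.keras.Model(inputs=base.input, outputs=base.get_layer("fc2").output)
    images = [
        tf.keras.utils.img_to_array(tf.keras.utils.load_img(path, target_size=(224, 224)))
        for path in image_paths
    ]
    batch = preprocess_input(np.stack(images))
    return extractor.predict(batch, verbose=0)  # shape: (n_posters, 4096)

def cluster_posters(features, n_clusters=8, seed=0):
    """Group posters by feature similarity with k-means; returns one cluster id per poster."""
    feats = np.asarray(features, dtype=float)
    # L2-normalise so distances reflect the direction of the features, not their magnitude.
    norms = np.linalg.norm(feats, axis=1, keepdims=True)
    feats = feats / np.maximum(norms, 1e-12)
    return KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit_predict(feats)
```

Usage would look like `labels = cluster_posters(extract_poster_features(paths), n_clusters=10)`, where `paths` is a list of poster image files.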

Using Jupyter (Python), Keras and TensorFlow, together with the t-SNE (t-distributed Stochastic Neighbour Embedding) algorithm, Adam could plot the images in two dimensions. Here's one of the beautiful first results:
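A minimal way to get the 2D coordinates for such a plot, assuming scikit-learn's `TSNE` implementation (the talk named the algorithm, not the library) and a matrix of per-poster feature vectors such as the VGG-19 activations described above. `project_2d` is a hypothetical helper; each poster thumbnail would then be drawn at its coordinates.

```python
import numpy as np
from sklearn.manifold import TSNE

def project_2d(features, perplexity=30.0, seed=0):
    """Embed per-poster feature vectors in 2D with t-SNE.

    Returns an (n_posters, 2) array of coordinates suitable for a scatter plot
    or for placing poster thumbnails.
    """
    feats = np.asarray(features, dtype=float)
    # scikit-learn requires the perplexity to be smaller than the number of samples.
    perplexity = min(perplexity, feats.shape[0] - 1)
    tsne = TSNE(n_components=2, perplexity=perplexity, init="pca", random_state=seed)
    return tsne.fit_transform(feats)
```

t-SNE preserves local neighbourhoods rather than global distances, so the size of the gaps between clusters in such a plot shouldn't be read too literally.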

Using the Neural Style Transfer technique to amplify the clustering, Adam identified groups of posters whose focal point was 'dogs', 'Marilyn Monroe characters', 'guys with hats' or 'gangs'. Other major themes he described were 'Gritty Crime Dramas', 'Romantic Boxes' and 'Spooky Houses':

Adam isn't sure where he will take the project next. One idea raised in the Q&A was to team up with Jian's research team to discover whether there's a relationship between a poster's imagery and the text or mood data allocated to its title. Could this help create better recommendations, or even open the way to personalised movie posters?