We discuss the relevance of "Purchase Graph", Slice platform, analytical insights from mining all activity around a customer's purchase, experimentation strategy, experience of working as a data scientist and more.

Conal Sathi is responsible for using machine learning and data mining algorithms to bring structure to Slice’s purchase data. Prior to Slice, he earned his B.S. in Symbolic Systems and M.S. in Computer Science at Stanford University, where he did research at the intersection of social network analysis and natural language processing. He examined sentiment flow through hyperlink networks and built a content prediction system for Twitter users. You can see how predictable his content is by following him @csathi.

Here is my interview with him:

Anmol Rajpurohit: Q1. Can you please explain the term "Purchase Graph"? Why is it so relevant for online retail companies?

Conal Sathi: In this age, graphs are becoming quite popular for exploring a domain. Google popularized the ‘information graph’ by modeling web pages as nodes and hyperlinks as edges. Facebook popularized the ‘social graph’, treating each user as a node and friendships as edges. In the purchase graph, you have products as nodes and affinities between products as edges. You can determine the affinity between purchases using collaborative filtering, i.e. compare the number of shoppers who bought both products with the number of shoppers who bought each individual product.

This is crucial not just for online retail companies, but for all kinds of companies. If you understand the affinities between products, you have opportunities to advertise, cross-sell and up-sell. I’m sure most users are not fully aware of the entire catalog from all retail companies, so using the purchase graph, you can hint at what purchases the users might be interested in and what they’d like (and in an automated way!).
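The affinity computation described in this answer can be sketched in a few lines of Python. This is only an illustration, not Slice's implementation: the function name `product_affinities`, the sample data, and the choice of the Jaccard ratio (shoppers who bought both products divided by shoppers who bought either) are assumptions, since the interview does not say how the counts are normalized.

```python
from collections import defaultdict
from itertools import combinations


def product_affinities(purchases):
    """Build purchase-graph edges from (shopper_id, product) pairs.

    Returns {(product_a, product_b): affinity}, with the pair in sorted
    order. The affinity here is the Jaccard ratio, an assumed choice:
    shoppers who bought both / shoppers who bought either.
    """
    # Map each product (node) to the set of shoppers who bought it.
    buyers = defaultdict(set)
    for shopper, product in purchases:
        buyers[product].add(shopper)

    affinities = {}
    for a, b in combinations(sorted(buyers), 2):
        both = len(buyers[a] & buyers[b])
        if both:  # only co-purchased products get an edge
            either = len(buyers[a] | buyers[b])
            affinities[(a, b)] = both / either
    return affinities


# Hypothetical sample data: three shoppers, three products.
purchases = [
    ("u1", "tent"), ("u1", "sleeping bag"),
    ("u2", "tent"), ("u2", "sleeping bag"),
    ("u3", "tent"), ("u3", "novel"),
]
edges = product_affinities(purchases)
```

With this sample, the tent and sleeping bag end up with a stronger edge than the tent and novel, and products that were never bought by the same shopper get no edge at all.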

AR: Q2. Which purchase attributes play a role in the Purchase Graph: total bill amount, number of items in the cart, returned items, etc.? What kind of insights can users get by analyzing their own purchase graph built on the Slice platform?

CS: A Slice user can learn a lot about their own shopping habits - how much they spend, what they buy the most of--and we are currently working on new features that will enable Slice users to get new utility from their own purchase data. That said, most online shoppers aren’t as interested in digging deep into their own data as we might be...so we dedicate most of our product features to help them keep track of their receipts, monitor their shipments and deliveries, be informed of price drops and get alerts when something they’ve bought has been subject to a recall.

The way we really bring our data to life, though, is through our partners, who can tap into the Slice API to create new features and experiences. Slice captures all activity around a consumer’s purchase--what they bought, how much they paid, how they paid, where they live and where they shipped it-- over time, and how these behaviors change. We enable our partners--retailers, web publishers and service providers--to harness this valuable data to create personalized customer experiences, such as product recommendations and targeting offers based on a visitor’s unique shopping history. And we’re actively working on new products to bring even more insights to light, which we plan to launch in the coming months. Stay tuned!

AR: Q3. Is there any plan in future to integrate information about offline shopping (for example, users scanning and uploading their paper receipts), in order to make the analytics more valuable?

CS: We actually looked into that and found that paper is quickly becoming a thing of the past. I’m sure you’ve noticed this as well, as a large number of retailers are beginning to e-mail in-store receipts. Companies like Apple, Nordstrom, Bloomingdale’s and Macy’s, for example, allow users to have the option of getting their receipt e-mailed to them instead of or in addition to a paper copy of their receipt. In addition, we are seeing more e-receipts with the rise of mobile payments, such as PayPal and Square. We believe more and more retailers will send more e-receipts as it is more environmentally friendly, less hassle for the customer, and it facilitates a communication channel between the retailer and the customer.

AR: Q4. A key component of all e-commerce analytics is experimentation such as A/B testing. What kind of experimentation does Slice do in order to improve its understanding of users' e-commerce activities? What are some of the best insights you have achieved from such experimentation?

CS: In the machine learning team, we build models for prediction and categorizing data. As we iterate on these models, we establish metrics from the start, so as we experiment with new features and algorithms, we learn which ideas help and which ideas don’t help.

It’s interesting to see that the ideas we intuitively think should improve the models are not always the ideas that actually improve them. This is why metrics are so important. That said, you cannot just blindly rely on metrics. Sometimes you need to look deep into the data and understand what’s going on as you make changes to the algorithms.

AR: Q5. Do you observe any hesitation amongst users towards Slice because of privacy concerns? Are users comfortable sharing their email inbox, given the immensely private nature of information stored in emails?

CS: We have not, and here’s why -- we are clear with our customers that our technology only identifies and analyzes the receipts in their inboxes--and nothing else--and that we never, ever release their personally-identifiable information, full stop.

AR: Q6. What motivated you to work in data analytics? What aspects of your job do you like the most and what are the aspects that you do not enjoy much?

CS: In the past several years, the sheer volume of data has grown immensely--it’s mind-boggling. In addition to that, we now have better technology to store and mine this data to come up with interesting analyses and to improve the lives of people. What’s most amazing--and inspiring--to me is that we can do much of this in an automated fashion through machine learning and software engineering!

What I like most about my job is dealing with such a unique, high-definition data set (a longitudinal, cross-merchant purchase graph), as well as working with smart people in a small startup that moves quickly.

What I don’t enjoy? Picking restaurants to eat at on University Ave in Palo Alto! There are just too many great options. Also, I do not enjoy it when my coworkers (rarely) beat me at ping pong--which is pretty serious at Slice! Seriously speaking though, sometimes data sets can be very noisy and messy, especially when the data comes from so many sources. This makes the problem both fascinating and sometimes frustrating, as it makes it harder to build algorithms and training sets that can generalize. But that is what makes what we are doing so valuable and so challenging.

AR: Q7. In your Data Science career so far, what is the best advice that you have got? Why is it so important?

CS: The best advice I got was that before iterating on a machine learning engine, make sure to establish metrics. What should the engine do and what is most important? Once you figure that out, quantify it. This will allow you to iterate quickly and help you understand whether a new feature or algorithmic change has improved the engine or not.

AR: Q8. What was the last book that you read and liked? What do you like to do when you are not working?

CS: The last book I read was The Tipping Point by Malcolm Gladwell. It was fascinating to read a social scientist discuss how epidemics and ideas spread. While reading, I couldn’t help but turn his problem into a graph with nodes and edges. When I'm not working, I like to sing and play the piano, as well as hike and bike.