Analyzing visitor flows with Google’s chart tool in R

Let’s say you have a website or an app and you would like to know how your visitors navigate through it. I came across the googleVis package to solve this task. It provides an interface to Google’s chart tools and lets you create interactive charts based on data frames. The package includes a function to create Sankey diagrams, a specific type of flow diagram in which the width of each arrow is usually proportional to the flow quantity. Let’s put this into practice.

First we need some data. Imagine you have a data set where all the page accesses from your visitors are stored in a simple data frame.

| UserID        | Timestamp           | Screen_name      |
|---------------|---------------------|------------------|
| 1947849340340 | 01.02.2017 12:55:02 | Main Screen      |
| 1947849340340 | 01.02.2017 12:55:05 | My Prizes Screen |
| 1947849340340 | 01.02.2017 12:55:10 | Tutorial Screen  |
| 1947849340340 | 01.02.2017 12:55:20 | Reminder Screen  |
| 1947849340340 | 01.02.2017 12:55:22 | Terms Screen     |
| 1947849340340 | 01.02.2017 12:55:42 | Main Screen      |
| 1453754950034 | 01.02.2017 21:14:22 | Main Screen      |
| 1453754950034 | 01.02.2017 21:14:23 | My Prizes Screen |
| 1453754950034 | 01.02.2017 21:14:29 | Prizes Screen    |
| 1453754950034 | 01.02.2017 21:14:44 | Prizes Screen    |
| …             | …                   | …                |
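If you want to follow along, the sample rows can be typed into R directly. This is only a minimal sketch: the object name `visits` is my own choice, and I am assuming the timestamps are in day.month.year order. The user IDs are stored as character strings because they are too large for R's 32-bit integers.

```r
# Sample page accesses from the table; the object name `visits` is hypothetical.
visits <- data.frame(
  # IDs kept as character: they exceed the 32-bit integer range
  UserID = c(rep("1947849340340", 6), rep("1453754950034", 4)),
  Timestamp = c(
    "01.02.2017 12:55:02", "01.02.2017 12:55:05", "01.02.2017 12:55:10",
    "01.02.2017 12:55:20", "01.02.2017 12:55:22", "01.02.2017 12:55:42",
    "01.02.2017 21:14:22", "01.02.2017 21:14:23", "01.02.2017 21:14:29",
    "01.02.2017 21:14:44"
  ),
  Screen_name = c(
    "Main Screen", "My Prizes Screen", "Tutorial Screen",
    "Reminder Screen", "Terms Screen", "Main Screen",
    "Main Screen", "My Prizes Screen", "Prizes Screen", "Prizes Screen"
  ),
  stringsAsFactors = FALSE
)

# Assumption: timestamps are day.month.year; parse them so rows sort correctly
visits$Timestamp <- as.POSIXct(visits$Timestamp,
                               format = "%d.%m.%Y %H:%M:%S", tz = "UTC")
```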

To build a Sankey diagram we need to transform our table from long format into visitor paths. I used a mix of simple dplyr code and the seqdef function from the TraMineR package, which lets you create a sequence object. I totally recommend checking out TraMineR if you are working with any kind of sequence data, as it provides a lot of different functions for mining, describing and visualizing sequence data.
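The transformation described here could look roughly like the sketch below. It is not the original code: the object names (`visits`, `paths_wide`, `paths_seq`), the `step` column and the use of tidyr's `pivot_wider` for the reshaping are my assumptions, and the data is a shortened subset of the sample rows. Each user ends up as one row with one column per step, which is then handed to `seqdef`.

```r
library(dplyr)
library(tidyr)     # assumption: pivot_wider is used for the long-to-wide step
library(TraMineR)

# Shortened, hypothetical subset of the sample data
visits <- data.frame(
  UserID = c(rep("1947849340340", 3), rep("1453754950034", 2)),
  Timestamp = as.POSIXct(
    c("01.02.2017 12:55:02", "01.02.2017 12:55:05", "01.02.2017 12:55:10",
      "01.02.2017 21:14:22", "01.02.2017 21:14:23"),
    format = "%d.%m.%Y %H:%M:%S", tz = "UTC"
  ),
  Screen_name = c("Main Screen", "My Prizes Screen", "Tutorial Screen",
                  "Main Screen", "My Prizes Screen"),
  stringsAsFactors = FALSE
)

# Number each page access within a user, then spread the steps into columns
paths_wide <- visits %>%
  arrange(UserID, Timestamp) %>%
  group_by(UserID) %>%
  mutate(step = row_number()) %>%
  ungroup() %>%
  select(UserID, step, Screen_name) %>%
  pivot_wider(names_from = step, values_from = Screen_name,
              names_prefix = "step")

# Build the sequence object; shorter paths have NA in their trailing steps
paths_seq <- seqdef(as.data.frame(paths_wide),
                    var = 2:ncol(paths_wide),
                    id = paths_wide$UserID)
```

Paths that are shorter than the longest one carry `NA` in their trailing columns, which TraMineR treats as positions after the end of the sequence.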

For plotting purposes I needed to transform the data back to long table format. I also renamed the states labelled % to END, to make clear that a customer’s journey has ended at this point. After calling the gvisSankey function and plotting the result, your browser will open and show your neat visitor flow diagram.
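A rough sketch of this last step, under a few assumptions of mine: the wide `paths` table is made up for illustration, the helper names (`edges`, `flows`, `visitors`) are hypothetical, and each screen name gets its step number appended. The suffix is a design choice rather than something stated above: Google's Sankey chart does not render when the links form a cycle, and a path such as Main Screen back to Main Screen would create one.

```r
library(dplyr)
library(googleVis)

# Hypothetical visitor paths in wide format; "%" marks steps after a path ended
paths <- data.frame(
  step1 = c("Main Screen", "Main Screen", "Main Screen"),
  step2 = c("My Prizes Screen", "My Prizes Screen", "Tutorial Screen"),
  step3 = c("Tutorial Screen", "%", "%"),
  stringsAsFactors = FALSE
)

# Rename the "%" state to END
paths[paths == "%"] <- "END"

# Back to long format: one row per transition between consecutive steps.
# The step number is appended so a screen revisited later is a separate node.
n_steps <- ncol(paths)
edges <- bind_rows(lapply(seq_len(n_steps - 1), function(i) {
  data.frame(
    from = paste0(paths[[i]], " (", i, ")"),
    to   = paste0(paths[[i + 1]], " (", i + 1, ")"),
    stringsAsFactors = FALSE
  )
}))

# Drop transitions that start at END and count visitors per transition
flows <- edges %>%
  filter(!grepl("^END ", from)) %>%
  count(from, to, name = "visitors")

sankey <- gvisSankey(flows, from = "from", to = "to", weight = "visitors")

# plot() writes the chart to a temporary HTML page and opens it in the browser
if (interactive()) plot(sankey)
```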

And of course you can use Sankey diagrams to visualize any type of sequence data. Make sure you check out my GitHub for the full code along with other projects.