Whether you are deeply versed in the intricacies of SEO, or simply curious to optimize your web property’s organic search performance, you must likely are involved with the Google Search Console and its analytics. It is a very useful (and free!) tool to understand how users reach your website through a Google Search, and will help you ground your intuitions with facts: diagnose traffic and search position changes, find opportunities for optimizations and content ideas. Must Have for webmasters and digital marketers. Here is how it looks like for the very blog you are reading:It even comes now with a fresher look and better filter ergonomics:
That new version lets you access up to the last 16 months worth of data, all the way until today-2 days. The legacy version was offering only 90 days. Each transaction is structured around 7 dimensions:

Date

Query

Page

Country

Device

Search Type

Search Appearance, of little use

and 4 indicators:

Impressions: How many times was the web page served into the displayed results of the Google Search

Clicks

CTR: Click Through Ratio calculated from Clicks/Impressions

Position: How high on the Google Search results page did the page appear

Overall the Web UI is decent and responsive. Alas, it suffers from grouping and sampling issues, just like its Google Analytics peer. Let’s take an example: you can’t display a list of all the keywords in Search Analytics and their corresponding landing pages. You would have to filter query-by-query to see their associated landing pages, or page-by-page to see their queries. Here is an illustration of my top 5 Queries:And now my Top 5 Pages:But impossible to get the pages for my top 5 queries…

This sounds like a mission for Tableau:I can then get all the features of Tableau to build whatever I need with the whole 16 months of data, that I can stockpile going forward:

Achieving that type of view requires to feed Tableau with good transaction data, out of the Google Search Analytics API (GSA API for short). That API will give us full control over filters and grouping, and we will be able to avoid sampling as well, since we won’t group by page : “When you group by page and/or query, our system may drop some data in order to be able to calculate results in a reasonable time using a reasonable amount of computing resources”. The strategy we will follow is to make one call by day, as detailed here. There are 2 ways to get the data to Tableau. we will cover the quick & dirty approach through Google Sheet, appropriate for a one shot analysis, and the thorough approach through API in Alteryx, appropriate for long term operations.Continue reading →

Tableau Prep is positioned as a standalone tool to “combine, shape, and clean data for analysis”. That puts Tableau Prep squarely in the same category as Alteryx, whose core functionality is data prep. The twist with the launch of Tableau Prep is the pricing strategy: it looks like Prep is bundled with existing Tableau Desktop & Server licenses and subscriptions.

Why is Tableau bundling a separate product for essentially free? Why not incorporating the Prep features into the core Tableau Desktop experience, if Prep is not intended to generate a new line of revenue? My pure speculation is that at that stage, Prep as a V1 is not ready to stand on his own, and is still too weak to prevail in competitive situation. It will eventually mature into a solution worth paying for, but today, it is still too limited. I have been kicking the tires and will share my first impressions and hopefully justify my assessment.

When trying to get a better grip on the performance of a large number of customers (or accounts, or pupils for the matter), and progression over time of their count, their revenue or their conversion from one status to another, it really helps to organize them in cohorts, to compare them fairly. Namely, you would want to compare the 2017 revenue generated by customers who have been transecting since 2014, with the revenue of those who transact since 2016, as they are at different stage in their customer journey, hopefully. For illustration, the Tableau SuperStore data set has 793 customers and 5,009 orders over 4 years:

From that perspective, the growth of the number of customers looks steady and healthy, but is it really? Are we acquiring customers in 2017 as we did in 2015? Critical details, such as the number of orders per customers are buried within that mass of data.

A better angle would be to depict the progression of those customers by cohorts, taking for reference the date of their first order to group them, and plot how are they growing in subsequent years.

Have you ever found yourself in a situation where you need to filter a set of transactions based on categories which need to be calculated from that same set?

Let’s say you have a data set containing all the customer responses triggered by your Marketing activity and you need to rank Marketing campaigns performance based on the number of first touch responses they originated.
Another example more familiar to Tableau users: you have a full list of orders with line items from the Superstore data set and you need to filter customers based on the largest item’s product family they bought on their most recent order… Or how about showing on a map the revenue of customers who have been inactive for a year or more, based on the shipping address of their last order?

Sounds like an awful lot of SQL steps, doesn’t it?
All those scenarios revolve around the same meta problem: the need to switch between different levels of aggregated data within the same process. In other words, the filter is computed on a subset of the transactions, and we need to run that filter without destroying any information, nor storing the same information multiple times, which could work, but would be resource intensive and frankly inelegant.
The better approach requires a Data Base engine (sorry Excel cave folks!): by combining a series of INDB tricks and SQL functions, this can be done in a couple of easy clicks with great performance, at no additional storage cost. Continue reading →

Why a post about Snowflake? What does Alteryx and Tableau have to do with Snowflake?

Here is some context: as I was planning to launch my Data Lake consulting business, which is now launched, I was looking for a place to store and share prototypes, which need to be queried from Tableau with high performance. Cloud databases offer the flexibility in terms of capacity, performance, pricing and portability that I was looking for.

The leaders in that field, AWS and Azure, offer a lot of solutions, but I decided to take a closer look at Snowflake, not only because it is a hot startup, but also because it promises Uncompromising simplicity, read: no need for a DB Admin! I consider that feature a huge advantage over Redshift, which takes quite some technical skills to get even started…

My experience testing the platform proved that that specific promise of easy administration is met, even though that simplicity does not extend to the overall experience. To put it another way, Snowflake is not Citizen Analyst ready yet, but with a bit of tweaking, and the research I will share below, it comes pretty close!

To illustrate the step by step process, I will use the business scenario of a previous post: storing 27k rows of review data for a Word Cloud analysis.Continue reading →