Common Statistical Operations

Calculating Proportional Values in SQL

Periscope Data

|

September 7, 2016

SQL is great for aggregating: counting, summing, and averaging numbers are straightforward tasks in SQL. When looking at a distribution, it is often useful to look at them in context and compare the relative size of each part of the proverbial pie.

We want to track the number of users that we acquire from each marketing source. To do this we can write a simple query to calculate users by source:

select

date_trunc('week',created_at) as date

, source

, count(1)

from

users

groupby

date

, source

We can plot this data, and track the growth of users by source over time:

It’s great to know how many users are coming from each source, but what if we want to know what proportion of sources are coming from a given source on any given week? Let’s take a look!

Redshift

Redshift has the handy ratio_to_report window function to help us out. The ratio_to_report function divides the current value by the sum of all values in the dictated partition. In our query, we want to divide the count of users from one source by the count of users from all sources.

select

date_trunc('week',created_at) as date

, source

, count(1) ascount

, ratio_to_report(count(1)) over (

partition by date_trunc('week',created_at)

) as proportion

from

users

groupby

date

Postgres

Postgres does not have the same ratio_to_report function, but we can use sum() over in a window function to achieve a similar result. We will first create a common table expression with the count of users:

with users_by_source as (

select

date_trunc('week',created_at) as date

, source

, count(1) ascount

from

users

groupby

date

, source

)

We can then use our users_by_source CTE, with a sum() over function to divide by the total for each week.

select

date

, source

, 1.0*count/sum(count) over (partition by date)

from users_by_source

MySQL

If our database is MySQL, or if we don’t want to use window functions, we can do this with a few simple subqueries. We will first want to calculate the total for each date:

select

yearweek(created_at) as date

, count(1) ascount

from

users

groupby

date

Next, we want the users by source:

select

yearweek(created_at) as date

, source

, count(1) ascount

from

users

groupby

date, source

Finally, we can join and divide:

select

T1.date

, t1.source

, t1.count/ t2.count

from

(

select

yearweek(created_at) as date

, source

, count(1) ascount

from

Users

groupby

date

, source

)

as t1

join(

select

yearweek(created_at) as date

, count(1) ascount

from

Users

groupby

date

)

as t2 on

t1.date = t2.date

Periscope

In Periscope, you can make your bar chart proportional for a visual representation of the data:

You can use Periscope’s pivot tables to quickly calculate the percentage-of-column to show the data in tabular form.

Now you can spend less time writing queries and more time analyzing the results of your data!