Databases

Faster Redshift Queries with Materialized Views — Lifetime Daily ARPU

The best way to make your SQL queries run faster is to have them do less work, and a great way to do less work is to query a materialized view that’s already done the heavy lifting.

Materialized views are particularly nice for analytics queries, where many queries do math on the same basic atoms, data changes infrequently (often as part of hourly or nightly ETLs), and those ETL jobs provide a convenient home for view creation and maintenance logic.

Redshift doesn’t yet support materialized views out of the box, but with a few extra lines in your import script (or a tool like Periscope), creating and maintaining materialized views as tables is a breeze.

Lifetime Daily ARPU (average revenue per user) is common metric and often takes a long time to compute. Let’s speed it up with materialized views.

Calculating Lifetime Daily ARPU

This common metric shows the changes in how much money you’re making per user over the lifetime of the your product.

<td rowspan=2 style="width: 41%;">Lifetime ARPU (Date) =

Sum of purchases up to Date

Unique user count up to Date

For that we’ll need a purchases table and a gameplays table, and the lifetime accumulated values for each date. Here’s the SQL for calculating lifetime gameplays:

with

lifetime_gameplays as (

select

dates.d,

count(distinct gameplays.user_id) as count_users

from (

selectdistinct date(created_at) as d

from gameplays

) as dates

innerjoin gameplays

on date(gameplays.created_at) <= dates.d

groupby d

),

The range join in the correlated subquery lets us recalculate the distinct number of users for each date.

Here’s the SQL for lifetime purchases in the same format:

lifetime_purchases as (

select

dates.d,

sum(price) as sum_purchases

from (

selectdistinct date(created_at) as d

from purchases

) as dates

innerjoin purchases

on date(purchases.created_at) <= dates.d

groupby d

)

Now that the setup is done, we can calculate lifetime daily ARPU:

with

lifetime_gameplays as (...),

lifetime_purchases as (...)

select

lifetime_gameplays.d as date,

round(

lifetime_purchases.sum_purchases /

lifetime_gameplays.count_users

, 2

) as arpu

from lifetime_purchases innerjoin lifetime_gameplays

on lifetime_purchases.d = lifetime_gameplays.d

orderby lifetime_gameplays.d

That’s a monster query and it takes minutes to run on a database with 2B gameplays and 3M purchases. That’s way to slow, especially if we want to quickly slice by dimensions like what platform the game was played on. Plus, similar lifetime metrics will need to recalculate the same data over and over again!

Easy View Materialization on Redshift

Conveniently, we wrote our query in a format that makes it obvious which parts can be extracted into materialized views: lifetime_gameplays and lifetime_purchases.

We’ll fake view materialization in Redshift by creating tables, and Redshift makes it easy to create tables from snippets of SQL:

createtable lifetime_purchases as (

select

dates.d,

sum(price) as sum_purchases

from (

selectdistinct date(created_at) as d

from purchases

) as dates

innerjoin purchases

on date(purchases.created_at) <= dates.d

groupby d

)

Do the same thing for lifetime_gameplays, and and calculating Lifetime Daily ARPU now takes less than a second to complete!

Redshift is especially great for this kind of optimization because data on a cluster usually changes infrequently, often as a result of hourly or nightly ETLs.

Remember to drop and recreate these tables every time you upload data to your Redshift cluster to keep them fresh. Or create views in Periscope instead, and we’ll keep them up to date automatically!