Tuesday, February 14, 2012

Summary Tables with MySQL

I was recently talking with a few people and the concept of summary tables came up as solutions for them. The issue is, they never thought of it. Hopefully this post will help them as well as others.

Summary Tables are not a new concept. They are exactly what they sound like, basically summary of existing data. Aggregated tables, materialized views and summary tables are very dominate in data warehouses. You can also take advantage of these concepts as part of your reporting systems as well.

So summary tables are real tables. So you can take advantage of indexes and etc with them. In the examples I am using, I consider them more of a summary table than aggregated tables . Depending on your application and needs it could grow into more of an aggregated tables and or materialized views situation.

How you separate your data and tables is dependent on your reporting and application needs.

The following is a high level example of how you can use summary tables.

Let us assume that you have a report that is populated with different website traffic aspects.

This could be a report that includes impressions, hits and leads gathered via your site from some other sources. Other in the company, of course, want to spend money where it best makes money. So they need updated information reliably and do not want to wait for it.

Currently you have table for your reporting tools and you calculate ROI per report request via SQL.

This table is populated hourly, so an insert could be similar to the query below. Your system executes queries like this thousands+ of time per day because it is based on different traffic sources as well as affiliate and their affiliate keys. So this table is going to grow dynamically, hours per day * traffic_source_id * affiliate_id * ad_id * affiliates_key.

You have reviewed your explains per query and your SQL query is the fastest you can get it. It just takes time to calculate all the ROIs when reports are spanning a lot of different entities. The web pages are taking to long and people are getting upset.

So you decide to change your tables and create a process to roll up your data across different summary tables using an ETL or Events or cron job based scripts. So your system populates the RETURN_VALUED per ad_id & affiliate_id. You do not know what the affiliate is paying out to other affiliates_keys but you are aware of what you spent and what the lead returns to you.

First start with our lowest common denominator. Here you have data per hour.

Notice I left the field date_time labeled as date_time. This is assuming you might have your report presented via column names and you just adjust the FROM table on selects based on what type of report needed. This is of course dependent on your application.

Notice that your inserts are going to be “group by” to match your key.