Scaling out MySQL (or any other database, for that matter) means creating a master-slave setup in which the slaves replicate data from the master. This naturally introduces replication lag: depending on system load, the slaves will trail the master by anywhere from seconds to minutes or more. This creates problems in the application tier. When a user interaction requires consistent data, the application needs to fall back to the master: after a user changes a setting, the subsequent request should not show stale data from a slave that has not caught up yet.
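One common way to get this fallback behavior is "read your writes" routing: after a user writes, pin that user's reads to the master for a short window. The sketch below is an assumption-laden illustration, not MySQL-specific; the connection objects, the `ReplicaAwareRouter` name, and the 5-second window are all made up for the example.

```python
import time

class ReplicaAwareRouter:
    """Sketch: route a user's reads to the master for a short window
    after that user writes, so they never see stale replica data.
    The connection objects and the window length are assumptions."""

    def __init__(self, master, replicas, sticky_seconds=5.0):
        self.master = master
        self.replicas = replicas
        self.sticky_seconds = sticky_seconds
        self._last_write = {}  # user_id -> timestamp of that user's last write

    def note_write(self, user_id):
        # Call this whenever the user performs a write on the master.
        self._last_write[user_id] = time.time()

    def connection_for_read(self, user_id):
        wrote_at = self._last_write.get(user_id)
        if wrote_at is not None and time.time() - wrote_at < self.sticky_seconds:
            # Recent write: fall back to the master for consistency.
            return self.master
        # Otherwise any replica will do; trivial rotation for the sketch.
        return self.replicas[int(time.time()) % len(self.replicas)]
```

The window only needs to be longer than the worst replication lag you are willing to tolerate; a production version would track lag explicitly rather than hard-coding a constant.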

Providing time-series data about plays/views/visits is a rather simple problem compared to providing top-k information for arbitrary time windows. Most research in this area has a different focus from what the typical web startup needs. The following approach seems to work fairly well.

The Problem

To provide exact order information, an ordered list of all counters has to be maintained:

track1, 1040 plays
track2, 100 plays
track3, 10 plays
...
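The naive version of this list can be sketched in a few lines: keep one counter per track and sort on read. The track names and `top_k` helper are illustrative, not from the original.

```python
from collections import Counter

# One global counter per track; sorting happens on read.
plays = Counter()

def record_play(track_id):
    plays[track_id] += 1

def top_k(k):
    # most_common() sorts the full counter set, which is exactly
    # why this approach stops scaling as dimensions are added.
    return plays.most_common(k)
```

This works for a single global leaderboard, but the full counter set has to be scanned and sorted for every read.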

Of course this does not scale if you add another dimension into the mix. Adding...

Many sites provide statistics for their users' content, and many of them show a graph of how many plays/views/visits the content got over time. This requires quickly aggregating counts over a given time window.
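The "on read" version of such an aggregate scans the raw events and counts those that fall inside the requested window. The event shape (a list of `(track_id, timestamp)` tuples) is an assumption for the sketch:

```python
from datetime import datetime

def plays_in_window(events, track_id, start, end):
    """Count raw play events for one track inside [start, end).
    Scans every event on every read -- fine for small data,
    too slow inside the lifespan of a web request at scale."""
    return sum(1 for t, ts in events if t == track_id and start <= ts < end)
```

The cost of this grows with the total number of events, which motivates moving the work to write time.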

One way to provide this information within the short lifespan of a web request is to offload the aggregation from "on read" to "on write". Instead of summing the raw events "on read", the aggregates are kept up to date....
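A minimal sketch of the "on write" approach, assuming hourly buckets (the bucket granularity and the dictionary store are illustrative choices, not part of the original): every incoming event increments a pre-bucketed counter, so a read only sums a handful of buckets instead of scanning raw events.

```python
from collections import defaultdict
from datetime import datetime

# (track_id, hour_bucket) -> play count, maintained at write time.
hourly = defaultdict(int)

def on_play(track_id, ts):
    # Truncate the timestamp to its hour and bump that bucket.
    bucket = ts.replace(minute=0, second=0, microsecond=0)
    hourly[(track_id, bucket)] += 1

def plays_between(track_id, start, end):
    # Reads sum at most one value per hour in the window,
    # independent of how many raw events occurred.
    return sum(count for (t, bucket), count in hourly.items()
               if t == track_id and start <= bucket < end)
```

In practice the buckets would live in the database or a cache rather than a process-local dictionary, but the shape of the trade-off is the same: slightly more work per write, constant-time-ish reads.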