Title: Describing and Forecasting Video Access Patterns
Authors: Gonca Gursun, Mark Crovella, Ibrahim Matta
Date: November 10, 2010
Abstract:
Computer systems are increasingly driven by workloads that
reflect large-scale social behavior, such as rapid changes in the
popularity of media items like videos. Capacity planners and system
designers must plan for rapid, massive changes in workloads when such
social behavior is a factor. In this paper we make two contributions
intended to assist in the design and provisioning of such systems. We
analyze an extensive dataset consisting of the daily access counts of
hundreds of thousands of YouTube videos. In this dataset, we find that
there are two types of videos: those that show rapid changes in
popularity, and those that are consistently popular over long time
periods. We call these two types rarely-accessed and
frequently-accessed videos, respectively. We observe that most of the
videos in our dataset fall clearly into one of these two types. For
each type of video we ask two questions: first, are there relatively
simple models that can describe its daily access patterns? And second,
can we use these simple models to predict the number of accesses that
a video will have in the near future, as a tool for capacity planning?
To answer these questions we develop two different frameworks for
characterization and forecasting of access patterns. We show that for
frequently-accessed videos, daily access patterns can be extracted via
principal component analysis, and used efficiently for
forecasting. For rarely-accessed videos, we demonstrate a clustering
method that allows one to classify bursts of popularity and use those
classifications for forecasting.
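
As a rough illustration of the idea that a daily access pattern extracted by principal component analysis can support forecasting, the following minimal sketch uses only the Python standard library. It is not the authors' framework: the synthetic data, the single weekly pattern, the function names (top_component, forecast_tail), and the least-squares extrapolation step are all assumptions made for this example.

```python
import math
import random

def top_component(rows, iterations=50, seed=0):
    """Return (mean profile, unit vector) of the leading principal component.

    rows: list of equal-length lists, one daily access-count series per video.
    """
    rng = random.Random(seed)
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - mean[j] for j in range(d)] for r in rows]
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    for _ in range(iterations):
        # Power iteration on the covariance: v <- X^T (X v), then normalize.
        scores = [sum(c[j] * v[j] for j in range(d)) for c in centered]
        w = [sum(scores[i] * centered[i][j] for i in range(n)) for j in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return mean, v

def forecast_tail(observed, mean, component):
    """Extrapolate the unobserved days of a series from its observed prefix."""
    d = len(observed)
    # One-dimensional least squares for the component weight on the prefix.
    num = sum((observed[j] - mean[j]) * component[j] for j in range(d))
    den = sum(component[j] ** 2 for j in range(d))
    weight = num / den
    return [mean[j] + weight * component[j] for j in range(d, len(mean))]

# Hypothetical data: every video follows one shared weekly pattern,
# scaled by a per-video amplitude on top of a constant base load.
days = 28
pattern = [1.0 + 0.5 * math.sin(2 * math.pi * j / 7) for j in range(days)]
rng = random.Random(1)
history = [[500.0 + amp * p for p in pattern]
           for amp in (rng.uniform(50, 150) for _ in range(200))]

mean, component = top_component(history)

# A new video for which only the first 21 days have been observed.
actual = [500.0 + 120.0 * p for p in pattern]
prediction = forecast_tail(actual[:21], mean, component)
max_error = max(abs(p - a) for p, a in zip(prediction, actual[21:]))
```

Because the synthetic series share exactly one noise-free pattern, the extrapolated final week matches the true values up to floating-point error; real access counts would need more components and would not be reproduced exactly.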