Hortonworks Sandbox for HDP and HDF is your chance to get started on learning, developing, testing and trying out new features. Each download comes preconfigured with interactive tutorials, sample data and developments from the Apache community.

There have been a number of interesting projects in the last few years that have mined big data sources to try to spot trends in how people behave. A recent study by academics from the University of Bristol’s ThinkBIG project has done something similar, looking for patterns in media content and consumption.

The analysis revealed that people display predictable, periodic patterns in their behavior. Interestingly, however, the authors believe that these patterns largely lie hidden unless large numbers of people are analyzed over a long period of time, which has historically been difficult to do.

Modern approaches to big data analysis, however, mean that researchers can unify newspaper content, allowing several publications to be analyzed at once over a very long timeframe. This can then be coupled with social media content, whether from Twitter or Wikipedia.

“What emerges is a glimpse at the regularities in our behavior that are hidden behind the day-to-day variations in our lives,” the authors say.

“Our two papers have shown that by analyzing massive data sets of modern and historical news, social media and Wikipedia page views, we can obtain an unprecedented look at our collective behavior, revealing cycles that we certainly suspected, but that have never been observed before,” they continue.

Media Patterns

The analysis looked at 87 years' worth of newspaper publications between 1836 and 1922. What emerged was a consumption pattern that appeared heavily influenced by the weather and the seasons.

In a less globalized time, this is perhaps to be expected, with the foods we ate and the festivities we enjoyed all very seasonal. While this is, to a large extent, no longer the case, there are nonetheless new seasonal patterns emerging around sports and popular festivals.

To test this more modern interpretation, the authors analyzed content shared on Twitter and Wikipedia over a four-year period. It emerged that gloomier content was shared more heavily in winter, peaking in November, while anger was most prevalent between September and April.

Likewise, visits to mental health-related pages on Wikipedia rose during the winter months.
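
For readers curious what spotting a seasonal peak like this might look like in practice, here is a minimal Python sketch. It is not the authors' method: the function name `yearly_cycle` is hypothetical, the monthly counts are synthetic, and the approach (projecting a monthly series onto a single 12-month sinusoid) is just one simple way to estimate how strong a yearly cycle is and in which month it peaks.

```python
import cmath
import math


def yearly_cycle(monthly_values):
    """Estimate the 12-month cycle in a monthly series.

    monthly_values: one number per month, starting in January,
    covering a whole number of years.
    Returns (amplitude, peak_month), where peak_month is 1 for January.
    """
    n = len(monthly_values)
    if n == 0 or n % 12 != 0:
        raise ValueError("expected a whole number of years of monthly data")

    mean = sum(monthly_values) / n
    # Project the mean-removed series onto a sinusoid with a 12-month period.
    coeff = sum(
        (value - mean) * cmath.exp(-2j * math.pi * t / 12)
        for t, value in enumerate(monthly_values)
    )
    amplitude = 2 * abs(coeff) / n

    # The phase of the coefficient gives the offset (in months) of the peak.
    peak_offset = (-cmath.phase(coeff)) % (2 * math.pi) / (2 * math.pi) * 12
    peak_month = int(round(peak_offset)) % 12 + 1
    return amplitude, peak_month


# Synthetic example (an assumption, not real data): four years of monthly
# counts with a baseline of 100 and a yearly swing of 20 peaking in November.
series = [100 + 20 * math.cos(2 * math.pi * (t - 10) / 12) for t in range(48)]
amplitude, peak_month = yearly_cycle(series)
print(round(amplitude, 2), peak_month)
```

With real data, day-to-day noise would blur the picture, which is why, as the authors note, such cycles only become visible when a lot of people are observed over a long stretch of time.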

Altogether, the research doesn’t perhaps provide anything too groundbreaking, and largely confirms what has always been suspected. It does, however, provide a further reminder that we can increasingly confirm our hunches by trawling through the data.
