Facebook shutting down?

Recently, researchers at Princeton, one of the premier US institutions of higher learning, concluded after a thorough study that if present trends continue, Facebook will cease to exist by 2017. This conclusion runs contrary to recent trends: Facebook is steadily adding users, recently reported a record quarter, and has replaced Yahoo as the second-largest earner of internet advertising revenue.

Are the days of people (particularly youngsters) constantly updating their Facebook pages and checking their news feeds at all hours of the day and night, wherever they are, finally over? Is Mark Zuckerberg, the CEO of Facebook, having sleepless nights about his billions disappearing? He is currently worth US$32.9 billion.

Well, that looks unlikely, at least for now, even though Facebook's popularity among the very young seems to be declining slightly. In a hilarious rebuttal using a similar approach, Facebook researchers predicted that Princeton University will shut down by 2021, and that the air we breathe will run out by 2060, after which mankind as we know it will, of course, be no more.

What is happening here?

The amount of data generated has been increasing exponentially over the last few years. Along with this data deluge, analysis software (much of it free, such as R) has become readily available and widely used. Much of the enormous amount of data generated daily is in the public domain: data from Twitter feeds, NASA, the US government, the US weather service, Google searches and so on is freely available. The data explosion has also brought a boom in the number of data crunchers, called data scientists, who interpret this data to gain insights, unearth relationships and, importantly, make predictions.

Using these powerful, easily available tools, it is possible to unearth relationships between many variables (sometimes thousands) and the variable of interest. One commonly used measure of the strength of a relationship (or correlation) is R², the coefficient of determination; generally, the higher the R², the stronger the relationship. In finance and marketing it can be as high as 0.7 or even 0.9, whereas in drug discovery 0.05 is considered a decent number. Machine learning algorithms such as neural networks can model complex relationships even when they are not obvious, or even when they do not seem plausible.
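To make R² concrete, here is a minimal Python sketch; the function name `r_squared`, the random seed and all data are hypothetical, chosen only for illustration. For a straight-line least-squares fit of one variable on another, R² equals the square of the Pearson correlation coefficient r. The second half of the sketch hints at why screening thousands of candidate variables is risky: with only ten observations and 1,000 variables of pure random noise, the best-scoring variable typically reaches an R² that looks impressive, even though no relationship exists.

```python
import random


def r_squared(x, y):
    """R² of a least-squares straight-line fit of y on x.

    For a simple linear fit this equals the squared Pearson correlation r**2.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


# A perfect linear relationship gives R² = 1.
print(r_squared([1, 2, 3, 4], [2, 4, 6, 8]))  # 1.0

# Data dredging (hypothetical): 1,000 candidate variables of pure noise, ten observations each.
rng = random.Random(42)  # fixed seed so the run is repeatable
n_obs = 10
target = [rng.gauss(0, 1) for _ in range(n_obs)]
candidates = [[rng.gauss(0, 1) for _ in range(n_obs)] for _ in range(1000)]
best = max(r_squared(c, target) for c in candidates)
print(f"best R² among 1,000 random variables: {best:.2f}")
```

The exact "best" value depends on the seed, but with so few observations and so many candidates it is routinely high, and finding it says nothing about cause.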

This is where the problem arises. A high correlation does not imply that one variable causes the other; causation has to be established by logic, theory or prior knowledge. This is where even experienced scientists sometimes stumble, and that seems to be what happened here. In their study, the Princeton researchers modeled the spread of social networks on the spread of contagious diseases. Seeing a steady decline in web searches for the word ‘Facebook’, they concluded that Facebook would cease to exist by 2017. The reason for the decline in searches was quite simple: most users have the Facebook app on their phones and no longer need to search for the keyword.
This is a classic case of equating correlation with causation.
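For readers curious how a disease model can be applied to a social network, the Python sketch below integrates the textbook SIR (susceptible, infected, recovered) model, reading "infected" as active users and "recovered" as users who have left. This mapping is an assumption for illustration: the Princeton study may have used a modified variant, and the parameters `beta` (contact rate at which people "catch" the network) and `gamma` (rate at which users leave) are made-up values. Whenever the network catches on at all, this family of models produces a curve that rises, peaks and then falls, so a forecast fitted to it depends entirely on the data it is fitted to. Fit it to a search-volume series that is falling for an unrelated reason, and it will dutifully predict a collapse.

```python
def simulate_sir(beta, gamma, n=1_000_000, infected0=100, days=365, dt=0.1):
    """Forward-Euler integration of the textbook SIR model.

    Returns the number of 'infected' (here: active users) at the start of each day.
    The 'recovered' group (users who have left) is implicitly n - s - i.
    """
    s, i = float(n - infected0), float(infected0)
    steps_per_day = round(1 / dt)
    daily_active = []
    for _ in range(days):
        daily_active.append(i)
        for _ in range(steps_per_day):
            joins = beta * s * i / n * dt  # susceptible people who 'catch' the network
            leaves = gamma * i * dt        # active users who 'recover' and leave
            s -= joins
            i += joins - leaves
    return daily_active


# Hypothetical parameters: contact rate 0.3 per day, leaving rate 0.1 per day.
curve = simulate_sir(beta=0.3, gamma=0.1)
peak_day = max(range(len(curve)), key=lambda d: curve[d])
print(f"active users peak on day {peak_day}, then fall to about {curve[-1]:.0f}")
```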

A famous story illustrates this concept. A statistical study of shark attacks at a certain beach found a strong correlation between the amount of ice cream sold and the number of shark attacks, so it was concluded that eating large amounts of ice cream causes more shark attacks.

The reality was more mundane. During some summer months, dwindling food in their traditional hunting grounds drives sharks closer to the beach in search of prey. At the same time of year, the number of sunbathers and surfers thronging the beach explodes, and they become easy targets for the hungry sharks. And during the hot summer months, sunbathers eat lots of ice cream. Summer drives both numbers; neither causes the other.
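The ice cream story can be reproduced with a small simulation. In the Python sketch below all numbers are invented: a hypothetical daily temperature, standing in for the summer season, drives both ice cream sales and a toy, continuous index of shark activity, and neither series influences the other. The raw correlation between the two comes out strong, but after the linear effect of temperature is removed from each (the partial correlation, computed here from regression residuals) it drops to roughly zero.

```python
import random


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def residuals(y, x):
    """What is left of y after removing its least-squares linear dependence on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return [b - (my + slope * (a - mx)) for a, b in zip(x, y)]


rng = random.Random(0)
# Hypothetical hidden common cause: daily temperature over one year (deg C).
temperature = [rng.uniform(10, 35) for _ in range(365)]
# Each series depends on temperature plus its own independent noise, not on the other.
ice_cream = [50 * t + rng.gauss(0, 100) for t in temperature]        # cones sold
shark_activity = [0.1 * t + rng.gauss(0, 0.5) for t in temperature]  # toy continuous index

raw = pearson_r(ice_cream, shark_activity)
partial = pearson_r(residuals(ice_cream, temperature),
                    residuals(shark_activity, temperature))
print(f"raw r = {raw:.2f}; r after controlling for temperature = {partial:.2f}")
```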

Another similar case is the near-perfect correlation (R² = 0.97) between the quantity of fresh lemons imported into the US from Mexico and the total US highway fatality rate.
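Two series that both trend steadily over time will almost always correlate strongly, whatever their causes. The Python sketch below uses synthetic numbers, not the actual import or fatality statistics: one hypothetical series rises each year and the other falls, each with its own independent noise. Their R² comes out close to 1. Correlating the year-to-year changes instead (first differences), which strips out the shared time trend, leaves only the unrelated noise, and R² collapses.

```python
import random


def r_squared(x, y):
    """Squared Pearson correlation (R² of a straight-line fit of y on x)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


def first_differences(series):
    """Year-over-year changes; a steady linear trend becomes a constant."""
    return [b - a for a, b in zip(series, series[1:])]


rng = random.Random(1)
n_years = 30
# Synthetic, causally unrelated series that share only a time trend.
lemon_imports = [100 + 20 * k + rng.gauss(0, 5) for k in range(n_years)]    # rising
fatality_rate = [16 - 0.3 * k + rng.gauss(0, 0.1) for k in range(n_years)]  # falling

raw_r2 = r_squared(lemon_imports, fatality_rate)
diff_r2 = r_squared(first_differences(lemon_imports), first_differences(fatality_rate))
print(f"R² of raw series: {raw_r2:.2f}; R² of year-to-year changes: {diff_r2:.2f}")
```

Note that R² ignores the sign of the relationship, so a rising series against a falling one still scores near 1.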