A preliminary look at my activity on Facebook.

Because I have nothing better to do on a Friday night, I downloaded whatever data Facebook had on me. You can do the same by going to this part of Facebook and clicking on "Download a copy of your Facebook data". It might take a couple of minutes, but you'll eventually get a zipped file that contains a "wall.htm" file.

The contents of this "wall.htm" file are what I'll restrict myself to for now. Here's a small part of the file to give you an idea of the kind of information available in it.

As you can see above, between the <p></p> HTML tags there are <div></div> HTML tags whose contents are timestamps showing when each post was shared on your wall. We have the day of the week, date, month, year and time of day available, so let's see what we can do with that information.

I'm not going to get into the details of how I parsed this "wall.htm" file. In short, I used Python and the BeautifulSoup library to parse the htm file, which makes extracting the strings inside the <div></div> HTML tags easier. For the exact steps involved in reading the file, understanding its contents, extracting all of the timestamps and then grouping them by day, month, year and so on to display their frequency, you can take a look at this Jupyter Notebook file. If you want to download the Jupyter Notebook yourself, you can do so from my GitHub here. If you have no idea what a Jupyter Notebook is but are comfortable running Python scripts, you can download this Python script, which does pretty much the same thing as the Jupyter Notebook.
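For a rough idea of what that extraction looks like, here is a minimal sketch. It is not the notebook's code: it uses the standard library's `re` module instead of BeautifulSoup so that it runs without extra installs, and it assumes every timestamp sits in a `<div class="meta">` element and looks like `Wednesday, February 6, 2013 at 12:01am UTC+05:30`. The function names are made up for illustration.

```python
import re
from datetime import datetime

# Assumed markup: each timestamp lives in <div class="meta">...</div>.
META_RE = re.compile(r'<div\s+class="meta">\s*(.*?)\s*</div>', re.DOTALL)

def extract_timestamps(page):
    """Return the raw timestamp strings found in the given HTML text."""
    return META_RE.findall(page)

def parse_timestamp(raw):
    """Parse 'Wednesday, February 6, 2013 at 12:01am UTC+05:30' into a datetime.

    The UTC offset is dropped, so the result is the local wall-clock time
    exactly as it is printed in the file.
    """
    local_part = raw.split(" UTC")[0]
    return datetime.strptime(local_part, "%A, %B %d, %Y at %I:%M%p")

sample = """<p><div class="meta">
Wednesday, February 6, 2013 at 12:01am UTC+05:30
</div><div class="comment">
Happy birthday Rahul! Have a good one :)
</div></p>"""

stamps = [parse_timestamp(s) for s in extract_timestamps(sample)]
print(stamps[0])  # 2013-02-06 00:01:00
```

Once every timestamp is a `datetime`, the weekday, month, year, day and hour are all just attributes of the object.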

Enough talk. Let's get to the actual numbers.

347 348 361 297 339 278 397

is the total number of times I posted on Facebook on each day of the week, starting with Monday and ending with Sunday. As expected, I use Facebook the most on Sundays, with a total of 397 posts. It was also interesting to note that I used Facebook less frequently on Thursdays (297 posts) and even less on Saturdays (278 posts).
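Tallies like the one above can be produced with `collections.Counter`. This is a minimal sketch, assuming the timestamps have already been parsed into `datetime` objects. The three dates are only a tiny illustration, not the full data set, and `count_by_weekday` is just an illustrative name.

```python
from collections import Counter
from datetime import datetime

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

def count_by_weekday(stamps):
    """Return post counts ordered Monday through Sunday."""
    counts = Counter(s.weekday() for s in stamps)  # Monday == 0
    return [counts.get(i, 0) for i in range(7)]

# Illustrative timestamps only.
stamps = [
    datetime(2013, 2, 6, 0, 1),    # a Wednesday
    datetime(2013, 2, 3, 16, 28),  # a Sunday
    datetime(2013, 2, 2, 17, 31),  # a Saturday
]

for name, n in zip(WEEKDAYS, count_by_weekday(stamps)):
    print(name, n)
```

The same pattern works for months, years, days of the month and hours by swapping `s.weekday()` for `s.month`, `s.year`, `s.day` or `s.hour`.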

122 422 183 115 166 219 183 144 247 220 115 231

is the total number of times I've posted in each month on Facebook, starting with January and ending with December. What stands out in that set of numbers is how many posts appeared in February (422). For those of you who know me, the reason should be obvious: my birthday is in February, and the large number comes from the birthday wishes that people post on my wall. Setting that aside, you can see that the lowest activity is in April (115) and November (115), when the end-semester exams are held. Activity is high in June (219) and December (231), the peak holiday periods. It's also interesting to note that September (247) sees a lot of activity too, but I don't have an explanation for that. I'll have to go through the whole file to see if there's a particular reason or year causing this aberration.

51, 225, 474, 495, 661, 252, 204, 5

is the total number of posts per year, from 2009 to 2016. 2009 was when I joined Facebook, and also the year I joined IIT Madras. My activity rose sharply until 2013, after which it tanked in 2014 and 2015. There's a good enough reason for that as well: most of the people I hung out with were undergraduates, who graduated in 2013. This was easier to make into a histogram, so here's one:

Total number of posts per year on my wall
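A quick text version of that histogram can be sketched as follows, using the yearly totals quoted above. The bar width of 40 characters and the name `text_histogram` are arbitrary choices for illustration.

```python
years = range(2009, 2017)
totals = [51, 225, 474, 495, 661, 252, 204, 5]  # posts per year, 2009 to 2016

def text_histogram(labels, values, width=40):
    """Return one line per label, with a bar scaled to the largest value."""
    peak = max(values)
    lines = []
    for label, value in zip(labels, values):
        bar = "#" * round(width * value / peak)
        lines.append(f"{label} {bar} {value}")
    return lines

print("\n".join(text_histogram(years, totals)))
```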

Now, let's look at the day of the month. As can be seen in the following two charts, there's a peak in activity on the 6th. That's because my birthday is on the 6th. Ignoring that, my activity by day of the month seems stable, except for two weird lows around the 5th and 20th and two weird highs on the 17th and 26th. I can't explain those.

total number of posts by day of the month - a histogram

total number of posts by day of the month - a line plot

And finally, we get to my behaviour according to the time of the day. Again, the histogram looks as expected. Activity slowly starts rising after 5 AM, peaks at 7 AM and tanks at 8 AM, which is roughly when classes usually started. After that it keeps rising slowly until it hits a peak at around 6 PM, i.e. before dinner, and peaks again at 11 PM, just before I go to sleep, after which it falls off drastically.

total number of posts by the time of the day

And that's pretty much all I could think of doing with the "wall.htm" file. While I had expected to look at which of my friends appear the most on my wall, that information is not as easy to extract as the timestamps were. For example, here's a different portion of the "wall.htm" file that contains comments. See how there's no mention of who posted the comment and no special way of identifying friends. We can read this and understand that Srinikethan was a friend, that I am Rahul and that Sivaramakrishnan was a friend, but how do I tell the computer this? And how do I make it automatically extract such names from all of the <div></div> elements with class="comment" in the file? I don't know.

<p><div class="meta">
Wednesday, February 6, 2013 at 12:01am UTC+05:30
</div><div class="comment">
Happy birthday Rahul! Have a good one :)
</div></p><p><div class="meta">
Sunday, February 3, 2013 at 4:28pm UTC+05:30
</div><div class="comment">
epic! Srinikethan, you should check this out!
</div></p><p><div class="meta">
Saturday, February 2, 2013 at 5:31pm UTC+05:30
</div><div class="comment">
excerpt from the article - &quot;The basic problem can be stated very simply: A student&#039;s grandmother is far more likely to die suddenly just before the student takes an exam, than at any other time of year.&quot;
This article deserves an Ignobel via Sivaramakrishnan :D...
</div></p>
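One naive way to start, and only a sketch, is plain substring matching against a list of names typed in by hand. The list below is hypothetical (nothing in "wall.htm" provides it), `names_mentioned` is an illustrative name, and the standard library's `re` stands in for BeautifulSoup so that the example runs without extra installs.

```python
import html
import re

# Assumed markup: each comment lives in <div class="comment">...</div>.
COMMENT_RE = re.compile(r'<div\s+class="comment">\s*(.*?)\s*</div>', re.DOTALL)

# Hypothetical, hand-written list; the file itself does not supply it.
KNOWN_NAMES = ["Srinikethan", "Sivaramakrishnan"]

def names_mentioned(page, known_names):
    """Count how many comments mention each known name (substring match)."""
    counts = {name: 0 for name in known_names}
    for raw in COMMENT_RE.findall(page):
        text = html.unescape(raw)  # turn &quot; and &#039; back into characters
        for name in known_names:
            if name in text:
                counts[name] += 1
    return counts

sample = """<p><div class="meta">
Sunday, February 3, 2013 at 4:28pm UTC+05:30
</div><div class="comment">
epic! Srinikethan, you should check this out!
</div></p>"""

print(names_mentioned(sample, KNOWN_NAMES))
```

This only finds names I already know about, and it cannot tell the person who wrote a comment apart from a person who is merely mentioned in it, so the original question stays open.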

So this is where it stops for the moment. But next week, I'll take another dataset that also contains information on how frequently I used an online service. Say YouTube. Gmail. Hangouts. Twitter. Google Fit. GitHub. And maybe, just maybe, I'll be able to get a little more information out of those datasets. Until then,

PS : As always, any comments, suggestions or criticism are welcome and highly appreciated. Thank you.
Note : The highlighted HTML code was embedded using hilite.me.
