Personal Analytics with RSS Feeds

I am currently working on a paper on Academic Blogging, based on my own experience, and I wanted to do something similar to Stephen Wolfram’s personal analytics, applied to my own life. More specifically, I wanted to understand when I post my blog entries. If I post more entries during office hours, then it should mean that I do consider my blog a part of my job (which is something I believe, actually). On the other hand, if I post more in the evening, or in the middle of the night, then it could mean that my blog is only for fun, and somehow outside the official academic time schedule.

With the help of @3wen, we have here a function that can read RSS feeds and extract the publication date (and other pieces of information, actually).
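To give an idea of what extracting the publication dates involves, here is a minimal base-R sketch. It is not the function mentioned above: `extract_pub_dates` is a hypothetical name, the small feed is made up, and a regular expression stands in for a proper XML parser. It also assumes that every `pubDate` is expressed with a `+0000` offset, since the offset itself is ignored when parsing.

```r
# Hypothetical helper (not the function used in this post): pull the
# <pubDate> fields out of RSS text and parse them as date-times.
extract_pub_dates <- function(rss_lines) {
  rss_text <- paste(rss_lines, collapse = "\n")
  # Match every <pubDate>...</pubDate> element (regex, not a real XML parser)
  hits <- regmatches(rss_text,
                     gregexpr("<pubDate>[^<]*</pubDate>", rss_text))[[1]]
  stamps <- trimws(gsub("</?pubDate>", "", hits))
  # Day and month names are in English in RSS, so parse in the C locale
  old_locale <- Sys.getlocale("LC_TIME")
  on.exit(Sys.setlocale("LC_TIME", old_locale))
  Sys.setlocale("LC_TIME", "C")
  # Trailing characters (the "+0000" offset) are ignored by strptime;
  # the times are assumed to be GMT
  as.POSIXct(strptime(stamps, format = "%a, %d %b %Y %H:%M:%S", tz = "GMT"))
}

# Small inline feed, made up for illustration
feed <- c(
  "<rss><channel>",
  "<item><title>First</title><pubDate>Tue, 14 May 2013 04:01:00 +0000</pubDate></item>",
  "<item><title>Second</title><pubDate>Sat, 11 May 2013 21:30:00 +0000</pubDate></item>",
  "</channel></rss>"
)
pub_dates <- extract_pub_dates(feed)
format(pub_dates, "%Y-%m-%d %H:%M")
```

On a real feed, an XML parser would be more robust than a regular expression, in particular with CDATA sections or namespaced elements.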

The trick is that the page containing the RSS feed is truncated: you get only the 30 latest posts. With WordPress, you can easily go further (thanks @3wen) using

> df.freak2 <- baseRSS("http://freakonometrics.hypotheses.org/feed?paged=2")
Namespace prefix dc on creator is not defined
Namespace prefix content on encoded is not defined
Namespace prefix wfw on commentRss is not defined
Namespace prefix slash on comments is not defined
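To go through several pages, the same `?paged=` query can be generated in a loop. The sketch below only builds the URLs; `paged_feed_urls` is a hypothetical helper, and the commented-out last line assumes that `baseRSS()` returns a data frame with the same columns for every page.

```r
# Hypothetical helper: build the URLs of the first n_pages feed pages,
# using the ?paged=k query shown in the call above.
paged_feed_urls <- function(feed_url, n_pages) {
  paste0(feed_url, "?paged=", seq_len(n_pages))
}

urls <- paged_feed_urls("http://freakonometrics.hypotheses.org/feed", 3)
urls

# Assuming baseRSS() returns a data frame with identical columns for each
# page, the pages could then be stacked (not run here, needs network access):
# df.all <- do.call(rbind, lapply(urls, baseRSS))
```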

Just a short comment here. If you look at the code, there is a difference between 2013 and the years before. The reason is simple: in December 2012, I officially decided to migrate from my old blog to this new one. All the posts prior to December 2012 were initially published on the old blog, which was on Montréal (East Coast) time. And I have the feeling that my new blog is on European time. So I shifted the timestamps by 6 hours. But the problem might be more complicated, actually.
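One reason why it might be more complicated is that a fixed 6-hour shift and a proper time zone conversion do not always agree. The sketch below is an illustration with made-up timestamps, not the code used for this post. It assumes that the old timestamps are Montréal wall-clock times (`America/Toronto` is the tz database zone covering Montréal) and that the new ones are on `Europe/Paris` time.

```r
# A timestamp recorded on Montréal wall-clock time
montreal <- as.POSIXct("2012-06-15 14:30:00", tz = "America/Toronto")

# The same instant, displayed on Paris wall-clock time
format(montreal, "%Y-%m-%d %H:%M", tz = "Europe/Paris")

# Fixed shift: simply add 6 hours to the clock reading
fixed <- as.POSIXct("2012-06-15 14:30:00", tz = "UTC") + 6 * 3600
format(fixed, "%Y-%m-%d %H:%M")

# The two approaches agree most of the year, but not during the weeks
# when only one side has switched to daylight saving time (mid-March here)
march <- as.POSIXct("2012-03-15 14:30:00", tz = "America/Toronto")
format(march, "%H:%M", tz = "Europe/Paris")
```

For posts published in those few weeks of March and of late October or early November, the fixed shift is off by one hour.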

> hour(datarss(df.freak))
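Once the hours are extracted, counting posts per hour and per day of the week is a matter of tabulation. Here is a minimal base-R sketch on made-up timestamps (the real ones would come from the feed); the variable names are illustrative only.

```r
# Made-up timestamps standing in for the extracted publication dates
stamps <- as.POSIXct(c("2013-05-14 04:01:00", "2013-05-11 21:30:00",
                       "2013-05-12 09:15:00", "2013-05-13 09:45:00"),
                     tz = "UTC")
lt <- as.POSIXlt(stamps)

# Posts per hour of the day (0 to 23), keeping the empty hours
by_hour <- table(factor(lt$hour, levels = 0:23))

# Posts per weekday; wday runs from 0 (Sunday) to 6 (Saturday)
day_names <- c("Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat")
by_day <- table(factor(day_names[lt$wday + 1], levels = day_names))

by_hour
by_day
```

Using fixed English labels rather than `weekdays()` keeps the result independent of the locale.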

Now, a few comments. Looking at the days of the week, I find it a bit scary to see that I spend so much time during the weekends on my (supposedly) professional blog. As for the hours, I can explain 2013 easily. I usually spend most of my evenings working (on the blog, on my courses, or on my research), but I try to avoid posting an entry at 2 a.m. So I usually keep it until the morning; when I arrive at the office, I finalize the post and make it available.

To understand the difference with previous years, I should probably add a technical comment: the previous blog was on a dotclear platform. On dotclear, the publication time is not exactly the time the post went online; by default, it is the time the post was saved for the first time. So there might be some slight differences. I believe that previously, I would start working on a post in the afternoon, then spend some time on it in the evening, or even the day after; but when I published it, if I did not change the default settings, the publication time would be the afternoon when I first saved the post.

Let us try another blog… The problem is that it is quite difficult to get old entries from RSS feeds, except with WordPress. So I tried to run the previous code on http://economix.blogs.nytimes.com/. The extraction is simple here.

501 Tue, 14 May 2013 04:01:

But here again, I do have trouble with 2013. To be more specific, when I look at the feeds I get

We do observe an interesting dynamic here: I guess that previously, people were working during the day, and then posting at the end of the day. It looks like, now, people work during the day, sometimes late in the evening, but wait until the next morning to post the entry. Just as I did, in order to read it one last time with a fresh mind… Anyway, I still have to understand what happened in 2013, just to make sure that the data I extract can be used…


3 thoughts on “Personal Analytics with RSS Feeds”

I think you are lucky if you are able to write some blog posts in minutes. On the contrary, I do not post every day because I need much more time to prepare a post. So the publication date does not mean anything, apart from the fact that I have finished and re-read the work I did in the previous days.
Obviously, the matter is totally different for tweets, since they are instant thoughts.

Somehow, you’re right… I guess it would be better to work on a blog with several contributors (such as the NYT's). I should have mentioned that I thought almost the same as you: either it was purely random, or there might be some general trend, such as posting in the evening, once the post is finished – ’cause it takes me days to finalize a post, not minutes 🙂

For big blogs, I guess there is also a ‘strategy’: publishing in the morning means you can hit both the U.S. and Europe (at least, I can observe that on Twitter).
