
In two previous posts (post 1, post 2) we reported some metrics on a sample of 53 Wikipedia articles related to Climate Change, manually selected by Tommaso Venturini. The list probably contains the main articles on this topic; however, given the breadth of Wikipedia, we can suppose that many more articles concerning Climate Change exist, and collecting all of them manually is hardly feasible. In this post we explain how we expanded this list in a semi-automatic way, relying on Wikipedia’s category structure.

In Wikipedia, each article can be assigned to one or more categories, and each category can in turn be assigned to higher level categories. This can be achieved by any user just by inserting a special tag into a page.

Articles are usually not assigned directly to high level categories, to make the category structure usable: what if you had thousands of articles directly assigned to “Natural Sciences”? It would be impossible to make sense of the categories. Instead, most articles are only assigned to lower level categories, and these are in turn assigned to higher level categories.

So, it is natural to suppose that, starting from a given category, such as Climate Change, one could identify all the articles assigned to it or to its direct or indirect subcategories.
As the category graph is maintained by the community, and the sub-category relationship is interpreted in different ways (ontological, thematic…), this can be problematic: the graph contains over 500 thousand categories, and has been shown to contain inconsistencies and even loops (90 strongly connected components, according to [Farina et al., 2011]). Moreover, topics often overlap, so isolating a whole category is infeasible for broad categories such as Politics or Culture, whose boundaries with other categories such as History, Geography or Religion are fuzzy. However, for smaller categories corresponding to reasonably well-delimited topics, it is possible to follow this approach.

The following figure shows the Wikipedia page for Category Climate Change, where the hierarchy of subcategories has been partly expanded.

We automatically processed the sub-graph of the categories situated under Climate Change, and we collected all the articles belonging to these categories. In order to avoid including unrelated branches, we needed to remove a few categories:

Note that we had to remove these categories because they included subcategories or articles which are not related to Climate Change; however, the articles which are related to Climate Change are very likely to belong to other categories, which we include instead. For example, we exclude the category Climatologists, because not all climatologists have taken a position on Climate Change, but the ones who have are probably also included in other categories, such as Climate change environmentalists.
We also removed all the categories whose name follows the pattern “Energy in <country>”.
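
The collection step described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code we actually ran: the category graph is represented as two small in-memory dictionaries with made-up entries, and apart from "Climatologists" and the "Energy in <country>" pattern, all category and article names are hypothetical. The sketch walks the subcategories breadth-first, remembers visited categories so that loops in the graph do not cause endless traversal, and skips excluded branches.

```python
import re
from collections import deque

# Hypothetical toy data: category -> direct subcategories.
SUBCATS = {
    "Climate change": ["Climate change by country", "Climatologists",
                       "Effects of climate change"],
    # Points back to the root on purpose, to simulate a loop in the graph.
    "Climate change by country": ["Energy in Italy", "Climate change"],
    "Effects of climate change": [],
    "Climatologists": [],
    "Energy in Italy": [],
}

# Hypothetical toy data: category -> articles directly assigned to it.
ARTICLES = {
    "Climate change": ["Global warming"],
    "Climate change by country": ["Climate change in Italy"],
    "Effects of climate change": ["Sea level rise", "Global warming"],
    "Climatologists": ["Some climatologist"],
    "Energy in Italy": ["Some power plant"],
}

EXCLUDED = {"Climatologists"}                      # manually excluded branches
EXCLUDED_PATTERN = re.compile(r"^Energy in .+$")   # "Energy in <country>"


def collect_articles(root, subcats, articles, excluded, excluded_pattern):
    """Return (article titles, visited categories) reachable from root."""
    visited = set()
    found = set()
    queue = deque([root])
    while queue:
        cat = queue.popleft()
        # Skip categories already seen (loops) and excluded branches.
        if cat in visited or cat in excluded or excluded_pattern.match(cat):
            continue
        visited.add(cat)
        found.update(articles.get(cat, []))
        queue.extend(subcats.get(cat, []))
    return found, visited


titles, categories = collect_articles(
    "Climate change", SUBCATS, ARTICLES, EXCLUDED, EXCLUDED_PATTERN
)
print(sorted(titles))
```

Because an excluded category is never expanded, everything below it is pruned as well, unless it is also reachable through another, non-excluded category.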

This way, we collected 915 article titles. Furthermore, we found a Wikipedia page containing a manually compiled list of articles related to Climate Change; out of the 245 titles it contained, 105 were already in our list, while 140 were not, and we added them.
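
Merging the automatically collected titles with a manually compiled list comes down to a few set operations. The sketch below is a minimal illustration with made-up titles; the function name and the sample data are hypothetical, and it assumes titles are already normalized so that the same article always appears under the same string.

```python
def merge_title_lists(collected, manual):
    """Compare a manual list against collected titles and merge the two.

    Returns (already_present, newly_added, merged), all as sets.
    """
    collected, manual = set(collected), set(manual)
    already_present = manual & collected   # manual titles we already had
    newly_added = manual - collected       # manual titles missing from ours
    merged = collected | manual            # final expanded list
    return already_present, newly_added, merged


# Hypothetical sample data, for illustration only.
collected_titles = ["Global warming", "Sea level rise", "Carbon tax"]
manual_titles = ["Global warming", "Carbon tax", "Kyoto Protocol"]

already, added, merged = merge_title_lists(collected_titles, manual_titles)
print(len(already), len(added), len(merged))
```

With the real lists, the first two sets would correspond to the 105 titles already present and the 140 titles added.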

Finally, we compared the resulting list with the one prepared by Tommaso, and we found that out of 53 articles, 45 were already in our list, and 8 were missing:

As of May 23rd, 2012, only 495 of the articles in our list had received at least 2 comments. The following table lists the 58 articles with the largest number of discussion chains.

Several measures for Wikipedia articles related to the climate change controversy (in parentheses, the rank of the corresponding values in a set of 495 articles related to climate change). These results are extracted from the English Wikipedia as of May 23rd, 2012.

We notice several articles among the most discussed which were not covered by the previous list. Most notably: