How to censor the internet?

After exploring, analysing and making presentations on a network of political blogs, we decided it was time to demolish it like a neat little LEGO- house. How can we know whether the network is robust enough? Will it fall apart from one well-aimed blow or will we need to use our tooth and nail to dismantle it? What should a wicked little goblin do if it doesn’t want us to find connections and paths between websites expressing various views of the world? This is what our research group was trying to find out by comparing two strategies of a network attack.

To bombard our network of 747 blogs and news-sites connected with 1195 paths we adopted strategies described in the article of Réka Albert, Hawoong Jeong and Albert-László Barabási. The first strategy imitated random attacks. Since an error comes up randomly we also chose to pick a node the same way and to delete it with all of its connections, then move on to the next node, delete that one as well, and so on. When adopting the second strategy however we didn’t simply rely on mere chance but we targeted the most vulnerable parts of the network. Unlike Barabási’s demonstration where the nodes with the highest vertex degree– the most connected nodes– were eliminated, here we deleted the ones having the highest PageRank value. (We had tried it previously and PageRank proved to be a relatively more useful destruction tool.) We could carry on destroying nodes until all of them disappeared but our aim was to feed our destructive tendencies so we were happy if we could rip the whole network apart by deleting the minimum number of pages.

The following two videos show which strategy was the winner:

Both of them show the elimination of 100 sites following the two strategies respectively. As we might have expected when sites having the highest PageRank values were deleted the network suffered major injuries and it soon fell apart. What may be surprising though is that random attacks were like a mere trifle to the network with hardly any effect on its structure.

As Barabási’s article says it’s due to the fact that the network in question– just like the majority of real networks– is scale-free. It means that there are a lot of nodes in the network which only have few connections and there are only a few which has many. This is the reason why in case of a random attack we should most probably find a site with only few connections and hardly any effect on the structure of the whole network when deleted. When however, we adopt the PageRank value strategy to attack we can see that it’s exactly the most significant nodes in the graph structure which are destroyed.

This phenomenon is shown in the next chart from a different point of view. It shows how the average path length changes in the network as a result of random or PageRank attacks. In the original network the average path length was 3.26 which means that generally one could get from one site to another through approximately three pages. After the PageRank attack the average path length almost immediately begins to increase in the deteriorating network showing that significant connecting elements have been removed. By deleting not more than only one tenth of the nodes the whole network falls into pieces and the value of the average path length is decreasing too. Random attacks do not have a huge influence on the average path length. At least ¾ of the nodes need to be deleted in order to make the network fall apart.

To sum up: should we one day have the burning desire to rip a scale- free network to pieces the best to do is to terminate its most important parts. The level of importance can be established based on vertex degree, PageRank, betweenness centrality or other values.

Visualizing Star Wars Movie Scripts

A long time ago, in a galaxy far, far away data analysts were talking about the upcoming new Star Wars movie. One of them has never seen any episode of the two trilogies before, so they decided to make the movie more accessible to this poor fellow.

Hungarian sentiment dictionaries

You can find our Hungarian sentiment dictionaries on opendata.hu If you'd like to know more about these resources, check out our blog post.

Hungarian Teachers’ Protests: What happened on Facebook?

In the beginning of 2016 Hungarian teachers, students and supporters organized a mass movement to express their strong disagreement over the government's education policy. In Budapest and in other big cities notable demonstrations took place, one on 13 February and the other on 15 March. During these months many posts and comments were published and many opinions were expressed on Facebook. With the help of our Facebook scraper we scraped the posts, comments, likes, reactions and active persons of the two events. After that we created a Shiny dashboard working on this data, aiming to understand the community's Facebook activity and discourse through visualizations.

Anti-roma topics on kuruc.info

From the mid 2000's the number of anti-roma and racist utterances have been increasing in Hungary and this manner of speech has become accepted in the common discourse. Our research focused on the anti-roma topics in the far-right media over this period. Here you can discover the distribution of various topics over time along with the most important events that influenced each one of them.

Facebook scraper

You can find our little Python-based Facebook scraper on github.

Search for:

Precognox

This site is dedicated to our R&D activities, to see what we are doing for profit, check out our customers' page.