Category Archives: Coding

There are a lot of Wikipedia visualizations. Some concentrate on article contents, others on the links between articles and some use the geocoded content (like in my previous blog post).

This new visualization is novel because it uses the geographical content of Wikipedia in conjunction with the links between articles. In other words, if a geocoded article (that is, an article associated with a location like a city) links to another geocoded article, a line will be drawn between these two points. The result can be found on the map on the left.

A large number of Wikipedia articles are geocoded. This means that when an article pertains to a location, its latitude and longitude are linked to the article. As you can imagine, this can be useful to generate insightful and eye-catching infographics. A while ago, a team at Oxford built this magnificent tool to illustrate the language boundaries in Wikipedia articles. This led me to wonder if it would be possible to extract the different topics in Wikipedia.

This is exactly what I managed to do in the past few days. I downloaded all of Wikipedia, extracted 300 different topics using a powerful clustering algorithm, projected all the geocoded articles on a map and highlighted the different clusters (or topics) in red. The results were much more interesting than I thought. For example, the map on the left shows all the articles related to mountains, peaks, summits, etc. in red on a blue base map. The highlighted articles from this topic match the main mountain ranges exactly.

A good number of my consulting clients use the very useful and powerful survey tool Limesurvey. Unfortunately, since version 1.92+, it seems impossible to reimport deactivated responses tables into new response tables if the survey was modified. I’m sure this doesn’t matter for long form surveys and mainly static surveys, but some of my clients use this platform as a dynamic form engine. In that case, forms can and will change over the duration of a project.

To resolve this problem and enable the importation of old responses tables, I’ve written a quick Python script. It uses MySQLdb, but that library should be installed by default on most Linux boxes. The script also requires a MySQL database backend but it should be easily adaptable to other database engines.

Less visually striking than my last project, this visualization shows the voting patterns of Canadian Members of Parliament. It uses a Principal Component Analysis (or PCA) transformation to convert the multidimensional voting record of each MP to a 2D (or Cartesian) form.

Each point on the chart represents an MP. The color of every MP follows their party affiliation. They are tightly clustered because of party discipline : in Canada, MPs normally vote in accordance to directions given by the Prime Minister.

When I worked in a survey firm, I was tasked with building a VOIP system to cut costs and to raise productivity. The biggest productivity drain in an outbound call center is the dialing time and getting someone on the line. By implementing an Asterisk server, we could control and expand the server to our needs. Furthermore, this meant we could have remote workers. We saved a bundle of money in long distance and in fixed costs. The hosted server and the bandwidth itself cost about 80$ a month, while the connectivity to the phone network was negligible and, more importantly, flexible. In other words, if it was a slow month, the cost was low, and conversely, if it was a very busy month, the costs were higher but the money was coming in.

Incoming call popup

Since it was my server and I was billing the company for it, I figured I could use the same server for my personal phones. So I decided to connect my PSTN numbers to this system. I could now use the server as my private VOIP server.

After configuring the VOIP server to my liking, I started to explore the Asterisk API and related Java and Python bindings. My first module was an interactive IVR system to manage callbacks from the survey outbound number. The callers could know who called them and remove their number from our calling lists.

While Limesurvey is a very nice tool to create and manage web surveys, it’s a bit lacking in the reporting area. The functions are a limited and even if you want a quick and dirty, but presentable report, you must export the data and use other tools to format and present the answers to the survey.

Limesurvey results imported in Crystal Reports

The problem is the way how Limesurvey stores its data, more specifically the recorded answers. Every column in the table contains theanswer for the question identified by the column and each line (or record) contains a respondent. It seems like a very sensible way to store the respondent’s answers, but the trouble is that it’s very difficult to construct a generic report template in reporting tools with this kind of table schema. To be of any use, the table data have to be converted in a usable form. Furthermore, the schema used by Limesurvey is less than optimal and doesn’t use foreign keys to link tables. This complicates everything.

To convert the answer table data, I’ve written a little python script. It should be pretty plug-and-play. To use it, you have to change the database access variables. The script only takes one argument, the survey id. You can get the survey id in Limesurvey’s administration panel. It should follow the survey title when you select a survey in the administration panel. Continue Reading