inmensoSofa

Sunday, June 5, 2016

I'm very proud to release CubesViewer 2.0.1, a major overhaul of my database visual analytics application.

This release of CubesViewer features many improvements and new features, a rebranded look and feel, and a new code architecture that greatly eases development and paves the way for future versions.

CubesViewer has undergone a major upgrade. The code is now built upon AngularJS, and the UI framework has been migrated from jQueryUI to Bootstrap and Angular Bootstrap components. The HTML has been rewritten and split into templates that are easier to handle.

The application is now more responsive and mobile friendly and looks more stylish overall. CSS has been reworked and namespaced, easing integration into other web documents.

Migration to AngularJS has involved a comprehensive refactoring and review of every module, and we trust it's been for the better. Internally, the build pipeline now uses Less, Grunt and Bower, and a lot of dependencies have been removed. Together, these changes allow CubesViewer to be distributed as a single .js file (a minified version is also available) and an accompanying .css file. JSDoc has also been introduced.

Other additions include:

Printer friendly CSS.

Export charts as images.

New horizontal bars chart.

Line and area charts with curved lines.

Improved error reporting and user interface.

CubesViewer Server (optional) upgraded to Django 1.9.

Plugin for cube usage tracking via Google Analytics.

Improved documentation and tutorials.

I hope to publish an open data site using CubesViewer soon. In the meantime, it's open source! Download it, use it, share it and contribute :).

Saturday, January 19, 2013

What can we find if we download and generate some statistics about Wikipedia?

1) Overview: As of January 2nd 2013, Wikipedia has 13 057 082 entries (the Encyclopaedia Britannica has 228 274 entries, according to Wikipedia itself). There are almost as many redirections as actual articles:

Wikipedia Articles (blue: articles, green: redirections)

2) Articles: Let's look only at the contents of real articles (no redirections). That is more than 7.1 million articles:

The average article belongs to 2.6 categories and contains 2.31 links to pages outside the site and 33.86 links internal to Wikipedia. Thousands of articles feature thousands of links. In total, there are 16 413 888 external links and 240 751 315 internal links. Enough to get lost for a while!

The average size is 4 634 characters (roughly 110 words per article), but the total size of the article text is 32 GB. And this is only raw text; images are not included.
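The totals and averages quoted above can be cross-checked against each other. The following is a minimal sketch that uses only the figures from this post; the function name is hypothetical, and the result is reported in characters rather than bytes, since the two only coincide for text that takes one byte per character.

```python
# Figures quoted in the post (January 2nd 2013 dump).
EXTERNAL_LINKS_TOTAL = 16_413_888
INTERNAL_LINKS_TOTAL = 240_751_315
AVG_EXTERNAL_LINKS = 2.31
AVG_INTERNAL_LINKS = 33.86
AVG_ARTICLE_CHARS = 4_634


def implied_article_count(total: int, average: float) -> float:
    """Number of articles implied by a total and a per-article average."""
    return total / average


articles_from_external = implied_article_count(EXTERNAL_LINKS_TOTAL, AVG_EXTERNAL_LINKS)
articles_from_internal = implied_article_count(INTERNAL_LINKS_TOTAL, AVG_INTERNAL_LINKS)

# Total text size implied by the average article length, in billions of characters.
implied_text_billion_chars = articles_from_internal * AVG_ARTICLE_CHARS / 1e9

print(f"articles implied by external links: {articles_from_external:,.0f}")
print(f"articles implied by internal links: {articles_from_internal:,.0f}")
print(f"implied text size: {implied_text_billion_chars:.1f} billion characters")
```

Both link totals imply an article count slightly above 7.1 million, and the implied text size is close to the 32 GB figure.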

3) Content

This is one of the most striking results of all. I searched for certain words within the text of the articles and assigned a score to each. The following diagram shows how many articles are defined by a particular word (the word with most occurrences). This has been shown before, but the results still seem astonishing to me:

Perhaps we should start thinking of how airily we use the term "war".
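The word scoring described in this section could be sketched roughly as follows. This is an illustration, not the script used for the diagram: the word list, the function names and the whole-word matching rule are all assumptions.

```python
import re
from collections import Counter

# Hypothetical word list; the post does not say which words were searched.
CANDIDATE_WORDS = ["war", "love", "music", "science"]


def defining_word(text, candidates=CANDIDATE_WORDS):
    """Return the candidate word with most occurrences in text, or None."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(t for t in tokens if t in candidates)
    if not counts:
        return None
    return counts.most_common(1)[0][0]


def tally_defining_words(articles):
    """Count how many articles are defined by each candidate word."""
    tally = Counter()
    for text in articles:
        word = defining_word(text)
        if word is not None:
            tally[word] += 1
    return tally


# Made-up sample texts, only to show the shape of the result.
sample = [
    "The war ended, but the war songs and music lived on.",
    "Love and music: music is everywhere.",
    "A text about gardening.",
]
print(tally_defining_words(sample))
```

Articles that mention none of the candidate words are left out of the tally, which is one possible reading of "defined by a particular word".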

4) Geography

I can only report on articles last updated by anonymous users. But for the sake of it, this is how real article updates (by anonymous users) were distributed among the different continents. This population includes 490 080 articles:

5) Updates: This is a result I am pretty surprised by. The following graph shows the year and quarter in which articles were last updated (separating redirections, in yellow, from articles, in blue). Apparently, a huge percentage of articles were updated during the last quarter of 2012, which could mean that Wikipedia is very lively and is being updated frequently. This value seems too high to me, though, so I wonder whether some automatic process is updating Wikipedia articles.

Unfortunately, I can't get the "creation date" of articles as the normal Wikipedia dump doesn't include that information.
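Grouping last-update timestamps by year and quarter, as in the graph for this section, could be sketched like this. The function names are hypothetical, and the sketch assumes timestamps in the ISO 8601 form `2012-11-03T14:22:10Z`; the sample values are made up.

```python
from collections import Counter
from datetime import datetime


def year_quarter(timestamp: str) -> str:
    """Map a timestamp such as '2012-11-03T14:22:10Z' to a label like '2012-Q4'."""
    dt = datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%SZ")
    quarter = (dt.month - 1) // 3 + 1
    return f"{dt.year}-Q{quarter}"


def updates_per_quarter(timestamps):
    """Count how many timestamps fall in each year-quarter bucket."""
    return Counter(year_quarter(ts) for ts in timestamps)


# Made-up timestamps, only to show the shape of the result.
sample_timestamps = [
    "2012-11-03T14:22:10Z",
    "2012-12-30T08:00:00Z",
    "2011-02-14T23:59:59Z",
]
print(updates_per_quarter(sample_timestamps))
```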

6) Titles

The average title entry is 26.9 characters long. There are entries starting with every character you can think of: €, Ɣ, ¢, £, §... We can also see how entries are distributed along letters. Surprisingly, the digits 1 and 2 have more entries than the letters Q, X, Y or Z, and even more than U and V.
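The title statistics above boil down to an average length and a count per first character. A minimal sketch, with hypothetical function names and a made-up list of titles (grouping by upper-cased first character is also an assumption):

```python
from collections import Counter


def average_title_length(titles):
    """Mean number of characters per title."""
    return sum(len(t) for t in titles) / len(titles)


def first_char_distribution(titles):
    """Count titles by their first character, upper-cased; empty titles are skipped."""
    return Counter(t[0].upper() for t in titles if t)


# Made-up titles, only to show the shape of the result.
sample_titles = ["Quantum mechanics", "1984 (novel)", "1 (number)", "zebra", "§ sign"]
print(first_char_distribution(sample_titles))
print(average_title_length(sample_titles))
```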

Sunday, January 13, 2013

I have recently been working on a data exploration and visualization tool, and I am very happy to announce the release of this new project to the public domain.

It is called CubesViewer, and it is an Online Analytical Processing (OLAP) exploration tool. In everyday words, it allows people to design and produce reports and charts about many kinds of data that can be extracted from a database (like contracts, invoices, climate, demography, scientific production, Wikipedia articles, public spending, logistics...).
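To give a feel for what OLAP exploration means, here is a minimal sketch of the core operation: summing a measure over rows grouped by one or more dimensions. This is a plain Python illustration with made-up data and hypothetical names, not CubesViewer's code or API.

```python
from collections import defaultdict

# Hypothetical fact rows: two dimensions (year, country) and one measure (amount).
FACTS = [
    {"year": 2012, "country": "ES", "amount": 100.0},
    {"year": 2012, "country": "FR", "amount": 50.0},
    {"year": 2013, "country": "ES", "amount": 75.0},
    {"year": 2013, "country": "ES", "amount": 25.0},
]


def aggregate(facts, dimensions, measure):
    """Sum `measure` over `facts`, grouped by the given dimension names."""
    totals = defaultdict(float)
    for row in facts:
        key = tuple(row[d] for d in dimensions)
        totals[key] += row[measure]
    return dict(totals)


# Grand total, then a breakdown by year, then by year and country.
print(aggregate(FACTS, [], "amount"))
print(aggregate(FACTS, ["year"], "amount"))
print(aggregate(FACTS, ["year", "country"], "amount"))
```

Adding a dimension to the grouping corresponds to drilling down from a summary into a more detailed report.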