Tag: Data visualization

It all started with this tweet. Time was scarce indeed, and I had not created an interactive visualization until now. I was supposed to be doing some d3 these holidays but I never got round to it.

The dataset provided for the contest has 45k+ rows and that was slightly intimidating for a beginner like me.

So I initially decided to use a small subset of the data: the top 10 found and the top 10 fallen meteorites by size. Using Tilemill to geotag these was simple anyway. As you can clearly see above, the red and yellow dots represent the found and fallen meteorites respectively, and the size of each dot represents the size of the meteorite. The dots differ so much because the actual sizes vary that much, with the biggest weighing in at 60 tons. The problem with this graphic was that the dots were too big for you to pinpoint the targeted coordinate. Certain impact points hid or overlapped others, and no additional information about the meteorites could be presented.
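For anyone who would rather script that first cut than do it by hand, here is a minimal sketch of picking the heaviest meteorites in each category. The field names `name`, `fall` and `mass` and the sample rows are made up for illustration; they are not taken from the contest dataset.

```python
import heapq

def top_n_by_mass(rows, n=10):
    """Return the n heaviest rows for each distinct value of the 'fall' field."""
    groups = {}
    for row in rows:
        groups.setdefault(row["fall"], []).append(row)
    return {
        fall: heapq.nlargest(n, group, key=lambda r: r["mass"])
        for fall, group in groups.items()
    }

# Hypothetical sample rows (mass in grams); not real records.
sample = [
    {"name": "A", "fall": "Found", "mass": 60_000_000.0},
    {"name": "B", "fall": "Found", "mass": 1_200.0},
    {"name": "C", "fall": "Fell", "mass": 23_000.0},
    {"name": "D", "fall": "Fell", "mass": 500.0},
]

top = top_n_by_mass(sample, n=1)
print(top["Found"][0]["name"], top["Fell"][0]["name"])
```

With the real file you would load the rows first and convert the mass column to a number before calling the function.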

This led me to the second iteration. Importing a slightly better map into Illustrator (via Export as PDF from Tilemill; SVG didn’t work for me), I played around with labelling the meteorites, which resulted in the map below.

Things seemed a little clearer in the above version, but there were still some problems. The scale was skewed a lot if I considered only these values. The yellow dots were nearly invisible and the red ones overshadowed everything. About that time I came across this visualization. That prompted me to play with the impact sizes a bit more. Iterations were made to get the size right. It was a slow process because most of the dots overlapped and made sections obscure. Finally I came up with this before I slept just past midnight.

Playing with the opacity above gave me an idea about the denser areas. At around this time, when I was discussing these visualizations with @Rasagy, @Hashnuke asked if I wanted the reverse-geotagged locations so that I could perhaps map the countries in a sort of choropleth. He said he would write a script and run it using a reverse geotag service like Google. So we set Sunday as the day to do this.

I continued to work on Tilemill and decided to export the map and host it on Mapbox. I also decided to learn Wax so that I could build an interactive visualization. At that moment, most of the visualization that you see at the end was forming in my head. With 2 days left to go for submissions at Visualizing.org, I decided to give it a go.

Playing around with Wax led me to a bug when the map was fullscreen, which in turn led me to file my first issue at Github. After that I got stuck while trying to figure out pivot tables in MS Excel. @Sevenaces helped me realise my stupid mistake and I got the data I needed to plot the column charts.

Choosing a charting library was the next thing. I wanted something simple that did not need too much work, offered interactivity out of the box, etc. I went through my bookmarks to check dviz. I had been meaning to use this sometime soon and decided to for this project. The other option I had was dviz but the simplest examples looked like a lot of work. That tipped the scales in dviz’s favour. dviz is the simplest charting solution for people who don’t know JavaScript and won’t be bothered to learn it. So the column charts worked wonderfully. Stacked column charts were my first option, but they showed me that the fallen meteorites were far fewer and were easily hidden by the found meteorite data. Hence I decided to separate the two charts and show the data separately. Poking around Google’s Visualization API, I figured out how to customize the dviz charts some more. I used Flatuicolors for the colours. That done, I turned to Foundation 3 to build something simple.

Next, Akash walked me through setting up a github account and hosting a .io repo there. I installed Github for Windows and everything was simple and intuitive. Git, incidentally, was something I had been meaning to learn for a long time; this project gave me an opportunity to do that. The prototype visualization was up and online on Saturday night, but it was quite a long way from finished.

There were obviously problems with the data set. The section at 0,0 seemed to be awfully dense for a point in the middle of the ocean. This led me to review the dataset. I found that more than 10k rows didn’t have coordinate data and some of them had 0,0 instead. I decided to clean these rows out of the geotagging. They were bad locations and did not contribute anything. The dataset was now slightly above 32k rows. More Tilemill followed. Tilemill kept hanging every now and then and I had to close it every few minutes. Frustrating indeed. The huge dataset could be a possible reason. Figuring out the legend and tooltip design took me some more time. Finally the map was done. More hangups followed and I was finally able to export and host the final map on Mapbox.
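The cleanup described above can be sketched in a few lines of Python. This is only an illustration of the idea, not what I actually ran: the column names `lat` and `lon` and the sample rows are assumptions, and the range check is an extra guard I did not apply at the time.

```python
import csv
import io

def has_usable_coords(row, lat_key="lat", lon_key="lon"):
    """True if the row has numeric coordinates that are not the 0,0 placeholder."""
    try:
        lat = float(row[lat_key])
        lon = float(row[lon_key])
    except (KeyError, TypeError, ValueError):
        return False  # missing column, empty cell or non-numeric text
    if lat == 0.0 and lon == 0.0:
        return False  # 0,0 is a point in the ocean, used here as a filler value
    # Extra guard (an assumption, not part of the original cleanup).
    return -90 <= lat <= 90 and -180 <= lon <= 180

# Hypothetical sample data standing in for the real .csv file.
text = """name,lat,lon
A,12.5,77.6
B,,
C,0.0,0.0
D,-33.1,-70.2
"""

clean = [r for r in csv.DictReader(io.StringIO(text)) if has_usable_coords(r)]
print([r["name"] for r in clean])
```

For a file on disk, swap the `io.StringIO(text)` part for an opened file handle.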

The next problem came from the 32k row .csv file. The big file was throwing errors. We then split the dataset into 3 sections and Akash ran the script on Geonames via nitrous.io. He should really write a post on how he did all that. Here are the scripts and the processed data. There were about 40 bad locations in the dataset, which were removed.
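Splitting a big list of rows into a few roughly equal sections is easy to script. The sketch below is a hypothetical helper, not Akash’s actual script; it only shows the splitting step and leaves out the reverse geocoding calls entirely.

```python
def split_into_chunks(rows, parts=3):
    """Split rows into `parts` consecutive chunks whose sizes differ by at most one."""
    size, extra = divmod(len(rows), parts)
    chunks = []
    start = 0
    for i in range(parts):
        # The first `extra` chunks take one additional row each.
        end = start + size + (1 if i < extra else 0)
        chunks.append(rows[start:end])
        start = end
    return chunks

chunks = split_into_chunks(list(range(10)), parts=3)
print([len(c) for c in chunks])
```

Each chunk can then be written to its own file and processed separately, so one failure does not cost you the whole run.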

The output of the reverse geocoding was the country code. I wanted the country names. This is how I learnt about vlookup in MS Excel. I also learnt how to fill all the blanks in a table and how to divide a column by a number. These are not as straightforward as you think. Excel hung up on me as well. A lot of times. Remember, making everything a table helps a lot when doing Excel operations. I used the country name list from here (it’s missing SS = South Sudan). Finally everything seemed ready. Now all I needed was a good scatterplot example to borrow 😉
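Outside Excel, the vlookup step is just a dictionary lookup. Here is a minimal sketch, assuming a tiny hand-typed code list; the real list was much longer, and the `"Unknown"` fallback is my own choice for illustration.

```python
# Small stand-in for the country list; the real one had every country code.
COUNTRY_NAMES = {
    "IN": "India",
    "US": "United States",
    "AU": "Australia",
}
# Patch the entry that the list I used was missing.
COUNTRY_NAMES.setdefault("SS", "South Sudan")

def country_name(code, names=COUNTRY_NAMES):
    """Look up a country name by its two-letter code, like vlookup with a default."""
    return names.get(code.strip().upper(), "Unknown")

codes = ["IN", "us ", "SS", "ZZ"]
print([country_name(c) for c in codes])
```

Returning a visible default instead of failing makes it easy to spot the codes that still need an entry.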

A quick search of mbostock’s d3 gallery and I located a scatterplot that I could use. It was simple to understand. I promptly hacked the example to meet my demands. I learnt a bit of d3 along the way.
With the final changes all done, I was done with ‘Meteorites: Earth Impact’. In 3 days I learnt such a lot. It was indeed a wicked journey.

I was running through the book Visualizing Data by Ben Fry and came across his pincode example for the USA. I decided to replicate the example for India. Thus began my search for a geo-tagged dataset of Indian pincodes. Sadly, it does not exist. The best set of easily available data is hosted by datameet at http://pincode.datameet.org/

I ran the set through Tilemill and found large parts of India still untagged, especially almost all of Maharashtra and Bihar. I asked Arun and he told me that “there are no official public datasets available. But there is reasonably good coverage in the openstreetmap data. The simplest way to view the data is to probably use http://maperitive.net”, which I will try to explore for now.

You could help by mapping your own pincode on this website. Go on, it only takes a minute.

While browsing through the schedule for Open Data Camp 4, I came across an etherpad entry containing details of the ‘Data and Maps’ crash course on Day 2.

Tilemill (by Mapbox) fascinated me and I downloaded and installed it. It has a wonderful crash course to introduce beginners to the map designing scene. The map styling is done using CartoCss, which is similar to CSS. The interface has layers like Photoshop and good import and export features (Mapbox free accounts offer 3000 map views/month and 50 MB space). Poking around in the documentation provided, you will be able to locate data set sources and shape files, and find out how to import data.

Below I shall explain how I went about re-doing a class assignment using TileMill.

Earlier this semester we played around with Processing to visualize data sets. I decided to try visualizing a data set that I had prepared earlier. Not knowing any way to scrape data off a website, I had manually collected the statewise distribution of the US olympians by birth(excluding/clubbing athletes of foreign origin).

Before I could use the data, I had to geocode it. To do this I first uploaded my Excel sheet to Google Drive and then installed a script (Tools > Script Gallery > Search 'Geo' > Install 'Geo' by dhcole@gmail.com). This helps convert addresses into lat, long values which TileMill can identify. I used Mapquest to do the conversion and the result was far from satisfactory; maybe the Yahoo provider would work better. I then used Google Maps to manually get the co-ordinates. Alternatively you could use GetLatLong to get/verify co-ordinates.

Next I published the datasheet on the web via Drive (inbuilt) and used the csv format as a datasource for a layer in Tilemill.

I also used the US State border line shape file provided by the US Census.

With a bit of tweaking with the CartoCss, I was able to come up with a fairly decent looking visualization with minimal interactivity.

Statewise distribution of US Olympians 2012 (excluding foreign born US Citizens)

Do feel free to critique the visualization choices so that I can improve them. This was an experiment to see how TileMill works and what I can do with it.

Disclaimer: There could be manual errors from when I created the data set, hence I am not linking the dataset here.