I have been wanting to do something with the R package Shiny for a while, and I finally found the right dataset on The World Bank’s Doing Business website. For the dataset, I went with only the ‘Starting a Business’ portion.

The visualisation I had in mind was a heatmap, where the value (or, as some visualisation tools call it, the measurement) is shown in shades of a colour that deepen or lighten with the value. The thing is, I had this done in Tableau in minutes. But Tableau is, well, not free. So I thought it a good opportunity to compare how Shiny fares at building this sort of interactive visualisation.

For the design, I already knew I wanted to be able to select one of the indicators (aka measurements): Number of Procedures, Time in Days, Cost, and Paid-in Minimum Capital. Each row would represent an economy, the columns would show the value of the selected indicator in different years, and each cell would be filled with varying intensity based on the value.
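That design can be sketched with ggplot2’s geom_tile; the toy data below and its column names (economy, year, value) are my own illustration, not necessarily the actual structure of the Doing Business dataset.

```r
library(ggplot2)

# Made-up long-format data: one row per economy per year.
toy <- expand.grid(economy = c("A", "B", "C"), year = 2010:2012)
toy$value <- seq_len(nrow(toy))

# Heatmap: years across, economies down, fill intensity driven by the value.
p <- ggplot(toy, aes(x = factor(year), y = economy, fill = value)) +
  geom_tile() +
  scale_fill_gradient(low = "white", high = "steelblue") +
  labs(x = "Year", y = "Economy", fill = "Value")
# print(p) would render the heatmap
```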

As a tiny improvement, I added a slider so the user can select the range of the indicator to include in the table-cum-heatmap. And of course the slider’s maximum and minimum values depend on the max and min of the selected indicator.
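The slider-bounds logic boils down to taking the min and max of the selected indicator’s values; a minimal base-R sketch, with a hypothetical data frame standing in for the real one (in a Shiny app these bounds would feed `sliderInput()` or `updateSliderInput()`):

```r
# Hypothetical stand-in for the real dataset.
df <- data.frame(
  indicator = rep(c("Time in Days", "Cost"), each = 3),
  value     = c(5, 12, 30, 1.5, 4.0, 9.9)
)

# Derive the slider's min/max from whichever indicator is selected.
slider_bounds <- function(data, selected) {
  vals <- data$value[data$indicator == selected]
  c(min = min(vals), max = max(vals))
}

b <- slider_bounds(df, "Time in Days")
# b[["min"]] is 5, b[["max"]] is 30
```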

The first challenge was to cast the dataframe into the right shape for the table. The second was to use ggplot for the heatmap rendering. The last challenge, which took up most of my time, was subsetting the dataframe based on the selected indicator and input range. After scouring the web and the R Shiny Google group, I finally got a break from the “Stock” demo (source code here) on RStudio’s Shiny website.
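The subsetting step amounts to filtering rows on both the chosen indicator and the slider range; a base-R sketch under the same assumed column names (the real code in the app may differ):

```r
# Hypothetical stand-in for the real dataset.
df <- data.frame(
  economy   = c("A", "B", "C", "A", "B", "C"),
  indicator = rep(c("Time in Days", "Cost"), each = 3),
  value     = c(5, 12, 30, 1.5, 4.0, 9.9)
)

# Keep rows for the selected indicator whose values fall inside [lo, hi].
subset_indicator <- function(data, selected, lo, hi) {
  data[data$indicator == selected & data$value >= lo & data$value <= hi, ]
}

sub <- subset_indicator(df, "Time in Days", 10, 40)
# sub now holds economies B and C only
```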

The app looks like this. In the next post I will show the R code for the app.

I picked up sketchnotes recently to help me better understand the thoughts in my head. One of the fuzzy things I am trying to figure out is this: in the new big data and analytics hype, who is selling what, and who needs what?

It seems that, typically, the top and bottom layers are ill-defined: what data sources exist, and what kind of information or insights will be useful to you? Sure, in exploratory data analysis the questions are not well formulated up front, so those are effortful tasks.

The middle two layers are so jam-packed with products that the user, or a middle person like me, has to spend much time understanding and evaluating them, leaving less time for actual data analysis.

Maybe it’s the wrong role for now; I really, really want the right role asap though. Wish me luck please.

I’m not a regular expression expert; no, not even an amateur in that area, as is the case with Hadoop… sigh.

But never fear, there is always the Internet, without which I would be… of diminished value, until I save enough stuff on my local HDD and get my own search engine implemented on it. Alas, the day the world ends could be the day the Internet collapses. Is that even possible… I hope not.

So, regular expressions. It all started with a new dataset, given in Excel format. There were ALT-ENTER line breaks in some cells, for some wordy descriptions in some columns. Saving as a CSV file introduces all kinds of newlines: CR, LF, CRLF, etc. To see such non-printable characters, open the file in Notepad++ > View > Show Symbol > Show End of Line.
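A generic cleanup for such stray line endings in R looks like the sketch below; this is an illustration of the idea, not necessarily the exact call used on this dataset.

```r
# A string with mixed CRLF, bare CR, and LF line endings.
messy <- "line one\r\nline two\rline three\n"

# Normalise everything to plain LF: match CRLF first, then any lone CR.
clean <- gsub("\r\n|\r", "\n", messy)
# clean == "line one\nline two\nline three\n"
```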

In R, use gsub and grep to get rid of unwanted stuff. For my particular case, I used