Tuesday, 30 July 2013

Introduction

I couple of weeks ago I had an idea for a website where people can collaborate to create the first real Audio Atlas, using the power of the Google Maps API.
The problem was that I do some programming in R but I did know very few things about HTML and javascript.
However, I knew that having a project was a good way to get serious about learning a bit of these two languages.
So I started reading some books about how to use the Google API, I borrowed some code from other, more experienced, programmers
(whom I thank very very much!!!) and in the end I created a website called Audioramio.com.
Here you can come, record your voice and add it to the map. If everyone helps a tiny bit we can try and build the first Audio Atlas!!

While I was writing the code to build this site I realize that it would be very cool to be able to interface the Google maps API with R.
I thought about it because in the past it happened that I created some map of soil properties where on the same location I had multiple data to show.
The classic example is that you perform kriging and you end up with the actual estimation, plus the uncertainty.
Normally, what you do is showing two maps or if you are familiar with webGIS you can produce an interactive website where the user can select which of the two maps to show.
However, it would be good to have a way to perform some kind of analysis on these data “on the fly”.
So I start thinking at a way to do exactly that, to create a website where on one end you have the map, while on the other it shows some plot or at least a summary of the data.
I start looking into it and I found out that the team at RStudio had created a magnificent tool, called Shiny,
which is able to create an interactive webpage in HTML and javascript where the user can change interactively some parameters and the page react accordingly, changing the plot or the summary.
This is just one example: spark.rstudio.com/uafsnap/RV_distributions

To create a Shiny app you simply create two .R scripts, once for the user interface and one for the server side where you specify the analysis you want to perform and the variable the user can tweak.
For more info you can read the tutorial or take a look at this blog, by Matt Leonawicz.

The interesting thing is that Shiny is also able to work with the server.r script, plus a user interface completely created in HTML5 and javascript.
So potentially Shiny can be used for almost everything, and specifically it can be used to plot a Google map on the side of a Histogram, for example.
The problem with this idea is that it is difficult to let the two application share information.
Shiny has been built around an interactive console concept, meaning that the HTML pages are created for the user to interact with R inputs and generate dynamic outputs.
However, with the Google Maps API, as far as I know, it is not possible to extract data from the markers on the map.
You can add an infowindow with lots of information, but it is impossible to export these data to other applications.
I solved this by working around it. I created a .json file with an array of all the coordinates of the points.
Then in javascript I created a simple loop that generates, from the coordinates, a series of markers.
Instead of attaching an infowindow to them, I created a function that for each elements in the loop, when the point is clicked it updates a certain text form in the Shiny interface.
It seems complicated, but it is actually very basic stuff. I attached an image of the console, so that I can better explain.

As you can see, the Shiny console is very basic. There is a single text input and a single plot output.
The text input is connected to a subset call, and the input number is the row of the data file to be subseted. So when I click on the point, its ID goes to update the ID field (in the Shiny interface), and the focus goes on that.
Now, I need to click “enter” 2 times and the plot is updated.

Code Description

Now let’s take a closer look at the code, starting from the interface.
In Red you have the bits of code related to the Google maps API, while in Green you have the bits related to Shiny.

As you can see I have two Shiny elements: the plot of the right side and and ID text input on the top left.
These are the elements that interact directly with R.
To these, I simply added a map frame plus three buttons to show my data and clear the frame, if needed.
The communication between Shiny and the API is done purely by these lines:

What this says is that when I click on any marker the custom function SetValue will kick in. This function simply change the value in the ID text field to the one of i, which is the element of the loop and the ID of the marker.

Now, let's see what this ID text field controls. Here is the server.R code:

library(shiny)
data_file

The R code is extremely simple. I have one input, ID, which is used to subset the data_file data.frame and one output, which is the histogram plot.

Problems with this approach

Now let’s talk about the problems with this approach.

The first is that the plot does not update automatically, the user needs to click “enter” 2 times before this to happen.
However, this may not be a problem after all, simply because as soon as the R script becomes more complex, and its execution time longer, it is good to have a way to avoid updating the plot all the time I accidentally click a point on the map.

The second problem is related to the Google Maps API and the way it shares data. In general, when I plot markers on the map the only way to access their data is clicking on them and look at the infowindow.
This is not true for KMZ map layers. When I plot a raster layer on the map it is treated exactly as an image (it is in fact a .png file georeferenced inWGS84), and it is therefore impossible to access its data.
The only way to plot a map and give the user a way to access it is by plotting a marker layer on top of it, with invisible icons so that they do not disturb the visualization of the map.
When the user clicks of the map, he is clicking on the invisible markers layer that triggers the Id field update. However, if I increase the zoom level the markers get smaller and therefore it becomes more difficult to click on them.
So accessing the map is possible only at the zoom level at which it is presented. A way to solve this would be showing (by using a different icon, maybe a point) the markers, but when they are on a grid their visual impact is not very pretty at all.

Conclusions

I think this approach has the potential to be used for very cool mapping experiments.
It relies directly on all the packages available in R and therefore it can virtually visualize every sort of statistics from the map.
However, it is a bit difficult to set up, because you need to transform all of your data into KML, extract the coordinates of your cells into WGS84, and transform them into a .json array.
For these steps I used ArcGIS and Notepad++. They are not difficult to complete, but they may take half of your working day or more, depending on your dataset.

A possible, quicker, alternative is http://www.jstat.org/. However, I never used it and so I do not know how to set it up for working with the Google Maps API.
In addition, I do not think it has the same potential as R for performing mind blowing statistics.

Download

To run the App on your PC just open R, install the package shiny and run these two lines:

Thursday, 6 June 2013

Yesterday I wanted to create a box-plot for a small dataset
to see the evolution of 3 stations through a 3 days period. I like box-plots
very much because I think they are one of the clearest ways of showing trend in
your data. R is extremely good for this type of plot and, for this reason, I
decided to add a post on my blog to show how to create a box-plot, but also
because I want to use my own blog to help me remember pieces of code that I
might want to use in the future but that I tend to forget.

For this example I first created a
dummy dataset using the function rnorm() which generates random
normal-distributed sequences. This function requires 3 arguments, the number of
samples to create, the mean and the standard deviation of the distribution, for
example:

rnorm(n=100,mean=3,sd=1)

This generates 100 numbers (floats to be exact), which have
mean equal to 3 and standard deviation equal to 1.

To generate my dataset I used the following line of code:

data<-data.frame(Stat11=rnorm(100,mean=3,sd=2),

Stat21=rnorm(100,mean=4,sd=1),

Stat31=rnorm(100,mean=6,sd=0.5),

Stat41=rnorm(100,mean=10,sd=0.5),

Stat12=rnorm(100,mean=4,sd=2),

Stat22=rnorm(100,mean=4.5,sd=2),

Stat32=rnorm(100,mean=7,sd=0.5),

Stat42=rnorm(100,mean=8,sd=3),

Stat13=rnorm(100,mean=6,sd=0.5),

Stat23=rnorm(100,mean=5,sd=3),

Stat33=rnorm(100,mean=8,sd=0.2),

Stat43=rnorm(100,mean=4,sd=4))

This line creates a data.frame with 12 columns that looks
like this:

Stat11

Stat21

Stat31

Stat41

Stat12

Stat22

Stat32

Stat42

Stat13

Stat23

Stat33

Stat43

5

2

9

-3

10

4

1

1

4

1

5

9

6

13

8

3

7

3

10

10

10

5

9

8

4

4

6

0

10

6

7

6

6

8

2

7

6

7

6

3

9

1

7

0

1

0

6

0

0

2

8

1

6

8

0

8

3

10

9

8

0

19

10

0

11

10

5

6

5

8

10

1

7

4

5

-5

7

0

3

5

2

5

5

3

4

12

9

-4

7

1

9

0

7

2

1

7

7

3

9

0

11

0

8

1

7

0

7

7

6

19

8

3

10

10

9

6

0

2

8

2

6

13

6

-5

12

8

1

4

0

4

5

10

8

11

6

-1

11

4

4

1

4

6

6

10

8

13

5

-5

7

10

0

4

2

7

3

1

2

8

5

-2

5

7

4

2

7

0

3

1

8

11

7

3

11

1

0

9

2

3

5

8

4

19

5

-1

11

6

3

4

9

5

9

0

2

9

5

-3

12

7

6

4

8

2

6

8

7

10

5

-4

8

9

6

9

1

4

3

4

…

…

…

…

…

…

…

…

…

…

…

…

As I mentioned before, this should represent 4 stations for
which the measure were replicated in 3 successive days.

Now, for the creation of the box-plot the simplest function
is boxplot() and can be simply called by adding the name of the dataset as only
argument:

boxplot(data)

This creates the following plot:

It is already a good plot, but it needs some adjustments. It is in black and white, the
box-plots are evenly spaced, even though they are from 3 different replicates,
there are no labels on the axis and the names of the stations are not all
reported.

So now we need to start doing some tweaking.

First, I want to draw the names of the stations vertically,
instead of horizontally. This can be easily done with the argument las. So now the call to the function boxplot()
becomes:

boxplot(data, las =2)

This generates the following plot:

Next, I want to change the name of the stations so that they
look less confusing. For doing that I can use the option names:

Here I am specifying that I want the first 4 box-plots at
position x=1, x=2, x=3 and x=4, then I want to leave a space between the fourth
and the fifth and place this last at x=6, and so on.

If you want to add colours to your box plot, you can use the
option col and specify a vector with the colour numbers or the colour names.
You can find the colour numbers here, and the colour names here.

I just added the two arguments highlighted, but the result
is not what I was expecting

As you can see from the image above, the label on the Y axis
is place very well and we can keep it. On the other hand, the label on the X
axis is drawn right below the stations names and it does not look good.

To solve this is better to delete the option xlab from the
boxplot call and instead use an additional function called mtext(), that places
a text outside the plot area, but within the plot window. To place text within
the plot area (where the box-plots are actually depicted) you need to use the
function text().

The function mtext() requires 3 arguments: the label, the
position and the line number.

An example of a call to the function mtext is the following:

mtext(“Label”, side = 1, line = 7)

the option side takes an integer between 1 and 4, with these
meaning: 1=bottom, 2=left, 3=top, 4=right

The option line takes an integer with the line number,
starting from 0 (which is the line closer to the plot axis). In this case I put
the label onto the 7th line from the X axis.