the ramblings of a software technologist in the Boston area. focus on GIS, mapping, and 2D/3D visualization. sometimes i'll talk about running, music, and geeky tech stuff too. :D the list of links to the right are technologies i am currently tinkering with.

Adventures in R

After many attempts to broaden my knowledge of development languages and platforms, I have become keenly aware that most of the data, statistics and especially spatial worlds that operate beyond the enterprise rely on libraries, languages, and tools that feel a bit alien when you come from a primarily .NET/Java background.

A few months back I tried to get into Python and explore the scientific libraries NumPy and SciPy, with an eye toward handling future DEM interpolation and assembly. It turns out Python isn’t all that hard to pick up (despite the syntax being very different from the C-style syntax of Java, JavaScript, C++ and C#), but it took me a bit longer to figure out what to do with the libraries and how to integrate them.

While my Python exploration has taken a back seat for the time being, I was recently turned on to the R programming language, which was a whole new experience for me. I had only heard of it within the past year in connection with data analysis and statistics, but further exploration of R yielded some fascinating results.

First, learning how the heck R worked was a challenge. It took me a bit to figure it out, and downloading and running RStudio is what finally helped out immensely. RStudio is available for Windows, Mac or UNIX. I started off in Windows thinking it would be easier, but quickly migrated back to my native OS X.

The first thing to understand about R is that the language and the R environment come as a package deal, much as VBA is the scripting language for the Microsoft Office environment. R is a full programming language, but you don’t compile it into a stand-alone program: your code runs inside the R environment and is your interface for controlling it. While I have not used MATLAB, if I understand correctly, the relationship between R and the R environment is very similar to the one between MATLAB’s scripting language and its environment.

The second thing I had to grasp was the syntax of the language. I think the first thing that tripped me up was seeing a dot (.) inside variable names and assuming it referred to methods or properties, as if the variable were a class instance. This was completely untrue. In S and in early versions of R the underscore was an assignment operator, so it could not be used in variable names, and the convention became to use the dot instead (current versions of R do accept underscores in names; why camelCase didn’t win out here is beyond me, but let’s move on). If you want to access a named element of an object such as a list or data frame, instead of the dot, you use the dollar sign ($).

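The other really tricky thing was the assignment operator. By convention it’s not the equals sign (=), it’s ‘<-’ (= also works for assignment at the top level, but ‘<-’ is what you will see in nearly all R code), as in: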

myvar <- x * y

There are also no required line terminators: a newline ends a statement, and a semicolon is only needed to put two statements on one line. Once you get past a few of these basics, you can move on pretty quickly.
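To make those syntax points concrete, here is a minimal sketch in base R. The variable names are made up for illustration and do not come from any library.

```r
# A dot is just another character in a name; nothing here is a method call.
cell.size <- 2.5
grid.dim <- c(rows = 4, cols = 6)

# A list holds named elements; `$` pulls one out by name.
dem.info <- list(name = "sample", extent.m = cell.size * grid.dim)

# Each statement simply ends at the end of its line.
total.cells <- grid.dim[["rows"]] * grid.dim[["cols"]]

print(dem.info$extent.m)
print(total.cells)
```

Reading `dem.info$extent.m` is the closest thing here to property access in C# or Java, while `cell.size` is an ordinary variable whose name happens to contain a dot.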

So, let’s get moving. I wanted to actually SEE something pretty quickly, just to understand how easy it might be to use R. I opened RStudio and eventually cobbled together a few lines of code that used some built-in library data to test out the 3D plotting functions in the rgl library (which plots points using OpenGL).

These 4 lines do 3 things. The first two lines load the library functions in ‘rgl’ (an OpenGL visualization lib) and ‘akima’ (an interpolation lib). This is akin to an ‘import’ or ‘using’ statement in other languages. You can also substitute ‘require’ for ‘library’ and get the same result when the package is installed (if it is missing, ‘library’ stops with an error, while ‘require’ returns FALSE with a warning).

The third line un-lazily loads sample data from akima into a ‘DataSet’, which from what I understand is not too popular a convention in actual practice, but for demo purposes, I was OK with this.

The fourth line uses rgl to create blue spheres with a radius of 0.5 (the size argument of rgl’s sphere functions is a radius, not a diameter), with the x, y, and z’s mapped to the akima dataset’s x, y, and z values.
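The four lines described above can be sketched roughly as follows. This is a reconstruction from the description, not the original listing, and it assumes the rgl and akima packages are installed. It uses rgl's `spheres3d()`, which may not be the exact sphere function the original code called.

```r
library(rgl)    # OpenGL-based 3D plotting
library(akima)  # interpolation routines plus the sample data set

data(akima)     # loads a list named `akima` with x, y and z vectors

# One blue sphere per sample point; 0.5 is passed as the radius argument.
spheres3d(akima$x, akima$y, akima$z, radius = 0.5, color = "blue")
```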

After hitting enter in the console, you will notice a window pop up (in OS X, this will be running in X11). Hey look! Blue dots! Actually, if you zoom in you will see that they are spheres:

These correspond to x, y, z values in space based on the sample akima data. Want to see those data values? Type akima at the prompt and you will see the data as follows:

50 points total. Go ahead, count those spheres. OK, don’t do that, there are 50 spheres. So you’re thinking: great, I have 50 spheres in 3D space, now what? Well, I imported the akima library specifically to test out interpolation. It is important to note that this is (obviously) an irregular grid, so interpolation will work a little bit differently here.

Next, let’s interpolate some lines into a new vector and then display them in rgl as a surface for the points:

This is starting to look frighteningly like a DEM. Now what did we do here? Well, we created a new vector of lines that span the range of the x and y coordinates of our points in the akima data. We used the default of 40 evenly spaced points per line along each axis and interpolated a z value at each one. We then set these lines to form a surface mesh over the 3D points, painted them green and gave them some alpha channel transparency. The result kind of makes it look like a terrain draped over a DEM, and essentially, that’s pretty much what it is.
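As a rough sketch of that step (again a reconstruction, assuming the akima `interp()` function with its default 40 x 40 output grid and rgl's `surface3d()`; the name `akima.li` is only illustrative):

```r
library(rgl)
library(akima)
data(akima)

# Interpolate onto a regular grid; with no xo/yo given, interp() builds
# 40 evenly spaced values across the x range and 40 across the y range.
akima.li <- interp(akima$x, akima$y, akima$z)

# akima.li$z is a 40 x 40 matrix of heights (NA outside the data's hull).
surface3d(akima.li$x, akima.li$y, akima.li$z, color = "green", alpha = 0.5)
```

Passing explicit `xo` and `yo` sequences instead of relying on the defaults would give a finer or coarser mesh.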

OK, so what just happened now? We created a new vector called ‘points’ and we used the akima interpp function, which is a function for pointwise bivariate interpolation for irregular data. We passed in our existing x, y, and z values from the akima data along with 200 xo and yo values drawn uniformly at random across the x and y ranges (the ‘runif’ function has nothing to do with “RUNning” anything; it’s ‘r’ for random plus ‘unif’ for uniform, meaning random values from a uniform distribution across a range). Then the interpp function takes care of finding our z values for us. We pump the result into our rgl display as points, though we could just as easily have used spheres or some other geometry, set them to a size of 4 and paint them red. And voila, we now have 250 points, 200 of them interpolated via the akima bivariate algorithm.
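A minimal sketch of the pointwise step, assuming akima's `interpp()` and rgl's `points3d()`. The variable names and the seed are illustrative additions, not part of the original code.

```r
library(rgl)
library(akima)
data(akima)

set.seed(1)  # runif() is random; a seed makes the run repeatable

# 200 random locations spread uniformly over the x and y ranges.
xo <- runif(200, min(akima$x), max(akima$x))
yo <- runif(200, min(akima$y), max(akima$y))

# interpp() estimates a z value at each (xo, yo) location.
akima.p <- interpp(akima$x, akima$y, akima$z, xo, yo)

points3d(akima.p$x, akima.p$y, akima.p$z, size = 4, color = "red")
```

Locations that fall outside the hull of the sample data come back with a z of NA, so some of the 200 points may not be drawn.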

So after I played around with this for a while, I really wanted to try to get some “real data” in the mix, to see what that would look like. So, I set out to open a GeoTIFF that I had made from a set of LiDAR point cloud files. This is my DEM, and I wanted to see if I could make a mesh of the 3D points. Well, it turned out to be much easier than I had anticipated. Here’s what I ran:

Pretty cool eh? It’s now pretty easy to visualize the point mesh and see where over the 3D surface this was sampled. Please be aware I am not accounting for spatial projection at all right now; this is a simple viewing mechanism for “visualizing” the 3D points stored in our file. It reminds me a lot of how MeshLab works for visualization. I basically convert my raster to points, name the columns (they “may” have names already, but just to be sure), set the result as the active data frame and draw the points.
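The raster-to-points steps can be sketched as below. This is an assumption-heavy illustration: it supposes the raster package was used, and a tiny made-up 3 x 4 grid stands in for the GeoTIFF so the snippet runs without a file. It also reads columns with `$` rather than setting an active data frame.

```r
library(raster)
library(rgl)

# Hypothetical stand-in for the DEM; with a real file this would be
# something like raster("your-dem.tif") (placeholder path).
dem <- raster(matrix(c(0, 1, 2, 3,
                       1, 2, 4, 5,
                       2, 3, 5, 8), nrow = 3, byrow = TRUE))

pts <- as.data.frame(rasterToPoints(dem))  # one row per non-NA cell
names(pts) <- c("x", "y", "z")             # make sure the columns are named

points3d(pts$x, pts$y, pts$z, color = "blue")
```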

So this is cool, but what if I want to visualize this in 2D to see a height differential? And heck, what if I even account for the correct projection? As it turns out, it’s not all that hard. We can use spplot (alternatively, you can use ggplot, or a few other plotting mechanisms):

When I ran this against my file, I ended up with something that looked like this:

So now we’ve created a 2D heat map of elevations, and yes, the dark blue is 0 meters and that is sea level. This is a GREAT way to discern where buildings and other structures are located in your data, to help plan where data may need cleaning up during overlay, or better yet, to decide what kind of interpolation you may want to perform.
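For reference, a minimal 2D sketch along those lines, assuming the raster package's `spplot()` method and the same made-up stand-in grid instead of a real GeoTIFF. Projection handling is left out, and the color ramp is an arbitrary choice.

```r
library(raster)

# Hypothetical stand-in for the DEM (not real elevation data).
dem <- raster(matrix(c(0, 1, 2, 3,
                       1, 2, 4, 5,
                       2, 3, 5, 8), nrow = 3, byrow = TRUE))

# spplot() returns a lattice (trellis) object; printing it draws the map.
elev.map <- spplot(dem, col.regions = terrain.colors(100))
print(elev.map)
```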

So I know what you are probably thinking: this is great, but who cares? R is an isolated environment. How can I possibly integrate this into my existing C++/JavaScript/.NET/Python work and even remotely take advantage of it? Well, it’s quite simple: there are many R bindings available. For example, R.NET allows for evaluation of R functions and conversion of the results back to .NET for consumption. I will be looking at this for potential PROJ.4 conversions and interpolations in the near future, as an easy way to use both environments while sidestepping the hassle of integrating PROJ.4.NET plus all the other GDAL components. Plus, this could address the issue of finding a good .NET-accessible interpolation library.

I look forward to future experimentation with R, including integration into .NET and potential visualization for the pre-processing components I am working with in the 2D and 3D mapping space. I think there’s much to explore here.