Category: R

Leaflet is a great way to display spatial information interactively. If you want to display the differences between neighborhoods, you would usually get the proper shapefiles from the web and connect your data to them. But sometimes you do not need detailed shapefiles and want more abstraction to get your information across. So I came up with the idea of drawing my own little simplified polygons to get an abstract map of Hamburg.

There are some great and free tools on the web to create your own polygons. I was using click2shp. You simply draw your polygons on a Google map and afterwards export them as a shapefile to use from within R. Down below you will find a little R script to display your polygons in a Shiny app.
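A minimal sketch of such a Shiny app, assuming the shapefile exported from click2shp is saved as hamburg_polygons.shp (the file name is an assumption) and using the sf package to read it:

```r
library(shiny)
library(leaflet)
library(sf)

# read the shapefile exported from click2shp (path is an assumption)
polygons <- st_read("hamburg_polygons.shp")

ui <- fluidPage(
  leafletOutput("map", height = 600)
)

server <- function(input, output, session) {
  output$map <- renderLeaflet({
    # draw the simplified polygons on top of a default tile layer
    leaflet(polygons) %>%
      addTiles() %>%
      addPolygons(weight = 1, fillOpacity = 0.5)
  })
}

shinyApp(ui, server)
```

If your shapefile carries an attribute per neighborhood, you can map it to the fill color inside addPolygons to show the differences between areas.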

Recently I took part in Coding Durer, a five-day international and interdisciplinary hackathon for art history and information science. The goal of this hackathon is to bring art historians and information scientists together to work on data. It is kind of an extension of the cultural hackathon CodingDaVinci, in which I participated in the past. I also wrote an article about CDV on this blog.

At Coding Durer we developed a Shiny app to explore the genre of church interior paintings that emerged in the Netherlands in the middle of the 17th century. There are hundreds of church interior paintings scattered across collections around the world. Research on this subject to date has focused mainly on particular artists or churches, rather than the overall genre and its network of artists and places. This project, born during Coding Durer 2017, addresses this issue by providing a platform for further research on the paintings and creating an insight into the bigger picture of the genre for the first time. This visualization of over 200 paintings of 26 different churches by 16 different artists was created with the following research questions in mind:

In what places were the artists active, and in what places did they depict church interiors?

Did the artists have ‘favourite’ church interiors?

In what places and at what times might the artists have met?

What church interiors were depicted the most?

What church interiors were depicted by the most artists?

The starting point of the project was a spreadsheet listing the paintings, artists, collections, etc. that had been created for research purposes two years earlier. This re-purposed data needed cleaning and additional information, e.g. IDs (artists, churches, paintings), locations (longitude, latitude), and stable URLs for images. You can see an image of the Shiny app above and try it out yourself here.

You can get the whole code on my Github along with other data driven projects.

Recently I took part in Coding Durer, a five-day international and interdisciplinary hackathon for art history and information science. The goal of this hackathon is to bring art historians and information scientists together to work on data. It is kind of an extension of the cultural hackathon CodingDaVinci, in which I participated in the past. There is also a blog post about CDV. I will write another blog post about the result of Coding Durer another day, but this article is going to be a Twitter analysis of the hashtag #codingdurer. This article was a very good starting point for me to do the analysis.

First we want to get the tweets, and we are going to use the awesome twitteR package. If you want to know how to get the API key and related credentials, I recommend visiting this page here. If you have everything set up, we are good to go. The code down below does the authentication with Twitter and loads our packages. I assume you know how to install an R package or can at least find a solution on the web.
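A sketch of the setup step; the credential strings are placeholders you have to replace with your own keys from the Twitter developer dashboard:

```r
library(twitteR)
library(dplyr)
library(ggplot2)

# authenticate with the Twitter API (placeholder credentials, keep yours secret)
setup_twitter_oauth(consumer_key    = "YOUR_CONSUMER_KEY",
                    consumer_secret = "YOUR_CONSUMER_SECRET",
                    access_token    = "YOUR_ACCESS_TOKEN",
                    access_secret   = "YOUR_ACCESS_SECRET")
```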

We are now going to search for all the tweets containing the hashtag #codingdurer using the searchTwitter function from the twitteR package. After converting the result to an easy-to-work-with data frame, we remove all the retweets from our results because we do not want any duplicated tweets. I also removed the links from the tweet text as we do not need them.
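These steps can be sketched as follows (the value of n is an assumption; the regular expression for stripping links is one possible choice):

```r
# fetch tweets with the hashtag
tweets <- searchTwitter("#codingdurer", n = 1000)

# convert to a data frame and drop retweets to avoid duplicates
tweets_df <- twListToDF(tweets)
tweets_df <- tweets_df[!tweets_df$isRetweet, ]

# strip links from the tweet text
tweets_df$text <- gsub("https?://\\S+", "", tweets_df$text)
```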

Now we want to know the twenty most used words from the tweets. This is going to be a bit trickier. First we extract all the words being said. Then we remove all the stop words (and some special words like codingdurer, https …) as they are uninteresting for us. We also remove any Twitter account names from the tweets. Now we are almost good to go. We just do some singularization, and then we can save the top twenty words as a ggplot graphic in a variable called word.
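One way to sketch these steps, assuming the tidytext package for tokenization and stop word removal (the singularization step is left out here for brevity, and the special-word list is an assumption):

```r
library(dplyr)
library(tidytext)
library(ggplot2)

top_words <- tweets_df %>%
  # drop @mentions before tokenizing, so account names disappear
  mutate(text = gsub("@\\w+", "", text)) %>%
  # one row per word
  unnest_tokens(word, text) %>%
  # remove stop words plus some special words
  anti_join(get_stopwords(), by = "word") %>%
  filter(!word %in% c("codingdurer", "https", "t.co", "amp")) %>%
  count(word, sort = TRUE) %>%
  slice_head(n = 20)

# save the top twenty words as a ggplot graphic in a variable called word
word <- ggplot(top_words, aes(reorder(word, n), n)) +
  geom_col() +
  coord_flip() +
  labs(x = NULL, y = "count", title = "Most used words")
```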

The grid.arrange function lets us plot both of our graphics at once. Now we can see who the most active Twitter users were and what the most used words were. It is good to see words like art, data and project at the top.
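Assuming the word graphic from the previous step and an analogous plot of the most active users (built here from the screenName column of the tweet data frame), the side-by-side layout could look like this:

```r
library(dplyr)
library(ggplot2)
library(gridExtra)

# most active users as a second ggplot, analogous to the word plot
user <- tweets_df %>%
  count(screenName, sort = TRUE) %>%
  slice_head(n = 20) %>%
  ggplot(aes(reorder(screenName, n), n)) +
  geom_col() +
  coord_flip() +
  labs(x = NULL, y = "tweets", title = "Most active users")

# draw both graphics next to each other
grid.arrange(user, word, ncol = 2)
```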

Sometimes you have your data stored in multiple csv files and want to load them into a single data frame in R. There are several answers on the web to this question, and I recently found a fast solution to the problem.
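A minimal sketch of this approach, assuming the csv files live in a folder called data/ and share the same columns:

```r
library(data.table)

# list all csv files in the folder (the path is an assumption)
files <- list.files("data", pattern = "\\.csv$", full.names = TRUE)

# read each file with the fast fread() and bind the pieces into one table
all_data <- rbindlist(lapply(files, fread))
```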

The code above uses both lapply and the cool fread function from the data.table package to load in your data in a quite fast manner. I recommend trying out this approach if you are dealing with long import times.

Sometimes people create csv files that are just too huge to load into your R session, while most of the time you only need a subset of the data set anyway. Recently I ran into this problem: first I tried to import the whole file with functions like fread or the classic read.csv, but this did not help much as the file was just too big and my computer failed to import it. With the awesome read.csv.sql function from the sqldf package I found a good way to solve my problem. This function enables you to use SQL statements within the import function, which makes it possible to select only a subset of the file and thus reduce the import size.
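A sketch of this technique; the file name and the City column are assumptions, and note that read.csv.sql refers to the csv file as the table `file` inside the SQL statement:

```r
library(sqldf)

# import only the rows where the City column equals 'Hamburg';
# the rest of the file is filtered out before it ever reaches R
hh <- read.csv.sql("big_file.csv",
                   sql = "select * from file where City = 'Hamburg'")
```

If the imported values carry stray quote characters from the file's encoding, you may have to match them explicitly inside the SQL string, which leads to the ugly backslash-escaped quotes mentioned below.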

The code above loads only those lines of the file where the city is Hamburg. I still had trouble with the encoding, which is why I used this ugly string with backslashes in the SQL statement. I will leave it like this as you might run into the same problem.

A few years ago I started to use the R programming language more intensively while writing my master thesis. I used the wonderful arules package for mining association rules and frequent item sets by Michael Hahsler and others. I used this package in the field of forensic accounting. Forensic data analysis is a branch of digital forensics. It examines structured data with regard to incidents of financial crime. The aim is to discover and analyze patterns of fraudulent activities (Wikipedia). Below you will find an excerpt from my thesis.

A study by Schneider and John from 2013 shows that 37% of the surveyed companies in Germany report that they have already become victims of economic crime in the preceding twelve months. The literature research of the present master thesis has shown that a large part of the forensic analysis methods are used to uncover economic crimes on aggregated data (e.g. balance sheet positions). On the basis of various scientific studies, it can also be shown that there are currently only a few publications which use analytical methods to investigate unaggregated financial accounting transactions directly for economic crimes. A study by Debreceny and Gray from 2013 reveals that the analysis of a company's internal financial accounting data has great potential for detecting fraud. For these reasons, this master thesis applies the data mining methodology of association analysis directly to financial accounting data for the purposes of forensic accounting to investigate economic crimes.
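The thesis code itself is on Github, but a minimal sketch of association rule mining with arules looks like this; it uses the Groceries transaction data shipped with the package rather than accounting data, and the support and confidence thresholds are arbitrary example values:

```r
library(arules)

data(Groceries)  # example transaction data bundled with arules

# mine association rules above minimum support and confidence thresholds
rules <- apriori(Groceries,
                 parameter = list(supp = 0.01, conf = 0.5))

# inspect the three rules with the highest lift
inspect(head(sort(rules, by = "lift"), 3))
```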

Make sure you check out the code on my Github along with other projects.

Recently I wanted to create a Leaflet map with a specific map style but could not find an appropriate design on the web. I found out that you can use Mapbox Studio to easily design your own maps and use them from within the R leaflet package. With the code down below we will get an interactive map of Hamburg with our own little design.

First we have to visit the Mapbox website, sign up for an account and create our own map via Mapbox Studio. After creating your own style (it is best to start from a default style and adapt it to your needs), Mapbox will offer you a URL which can be used to display your style in Leaflet. You will find the URL under Styles in the dropdown menu of your created style (next to the edit button). If you haven't created any style yet, go to "New style" to create your first own map design.
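A sketch of how this could look; the username, style id and access token are placeholders, and the exact tile URL Mapbox shows for your style may differ slightly from the pattern below:

```r
library(leaflet)

# placeholder tile URL - paste the one Mapbox Studio shows for your style
style_url <- paste0(
  "https://api.mapbox.com/styles/v1/<username>/<style_id>/tiles/",
  "{z}/{x}/{y}?access_token=<your_access_token>"
)

# use the custom style as the tile layer and center the map on Hamburg
leaflet() %>%
  addTiles(urlTemplate = style_url,
           attribution = "© Mapbox © OpenStreetMap") %>%
  setView(lng = 9.99, lat = 53.55, zoom = 11)
```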

Make sure you check out the code on my Github along with other projects.