Land Registry Part 2

In my first land registry post I imported a month’s worth of land registry data, named the rows and had a go at using the ggplot2 package to produce a number of nice-looking charts. This time I want to progress a little further. My aims, using the same dataset, are to:

Look at the distribution of prices

Look at the prices by different factors, initially just using factors in the land registry data

The first aim is easy: plotting a histogram only requires the code

#Look at the distribution of prices
hist(landregistry$Price)

Which produces the chart

The problem is that this data contains sales from different times, not just 2016 sales. I only want to see the sales from 2016, so I first need to filter the sales and then produce a histogram of the filtered sales. This is done using the code
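As a rough sketch of that filter-then-plot step: the `Date` column name, the `landregistry2016` name and the toy rows below are assumptions for illustration, not taken from the real data, so swap in whatever the date column was called when the data was imported.

```r
# Toy stand-in for the landregistry data frame from the first post.
# The Date column name and every value here are made up for illustration.
landregistry <- data.frame(
  Price = c(125000, 250000, 410000, 180000, 95000),
  Date  = as.Date(c("2015-11-30", "2016-01-15", "2016-02-03",
                    "2016-02-20", "2014-06-12"))
)

# Keep only the rows whose transfer date falls in 2016
landregistry2016 <- landregistry[format(landregistry$Date, "%Y") == "2016", ]

# Histogram of the 2016-only prices
hist(landregistry2016$Price,
     main = "Sale prices, 2016 only",
     xlab = "Price")
```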

In the end it looks very similar to the histogram of the unfiltered data frame, but it was the right thing to do either way.

Now to move on to the house prices by the different factors. In the data there are a number of factors which could be worth looking at:

Property type

Old/new

Duration

County

Date of transfer (I’ll just look at the month)

There are details on these variables here. I could also look at details based upon the postcode but I don’t feel like looking at them at the moment. Throughout this section I’ll be using the 2016-only data.

Which is a bit better, but not much. Using the same process for property type and old/new we get:

Which shows, unsurprisingly, that detached houses are the most expensive, with flats, terraced houses and semi-detached houses being cheaper. It is interesting that on this view new builds are more expensive, but I think there is more to it than that. I haven't looked at the other variables I said I would because I have run out of time.
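For anyone wanting to try a price-by-factor comparison like the ones above, here is a minimal sketch. It uses base R box plots rather than ggplot2 so it has no dependencies, and the `sales2016` name, the `PropertyType` and `OldNew` column names and all of the values are assumptions made up for illustration, not real Land Registry figures.

```r
# Toy stand-in for the 2016-only data. Column names and values are
# illustrative assumptions, not real Land Registry figures.
sales2016 <- data.frame(
  Price        = c(450000, 520000, 210000, 190000,
                   160000, 175000, 140000, 230000),
  PropertyType = factor(c("Detached", "Detached",
                          "Semi-detached", "Semi-detached",
                          "Terraced", "Terraced", "Flat", "Flat")),
  OldNew       = factor(c("New", "Old", "Old", "New",
                          "Old", "Old", "Old", "New"))
)

# Median price within each property type
medianByType <- aggregate(Price ~ PropertyType, data = sales2016, FUN = median)
print(medianByType)

# One box per level of the factor; a log scale on the price axis stops
# a few very expensive sales from squashing the rest of the boxes
boxplot(Price ~ PropertyType, data = sales2016, log = "y",
        main = "Price by property type", ylab = "Price (log scale)")
boxplot(Price ~ OldNew, data = sales2016, log = "y",
        main = "Price by old/new", ylab = "Price (log scale)")
```

The same two calls should work for any of the other factors by swapping the column name on the right-hand side of the formula.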

Hopefully I'll have time to look into this data more and post what I find. I'll try to post my R code for what I do here.