
The City-to-City Housing Value Impact Modeler

Published March 11, 2018 by Jacob Kohlhepp

UPDATE 5/30/2018: The mortgage interest control was updated (previously the tool was only controlling for the portion of the mortgage representing realtor fees). All city pairs have been updated, and the graphs and resulting effects below have been updated to reflect this change. We have also added San Francisco to San Mateo County so that San Francisco can be compared with other San Mateo County cities. Previously San Francisco was not included because it is the only city in San Francisco County.

Hello, and thank you for taking the time to read! This is the first ever blog post for Intrepid Insight, our not-for-profit association of consulting, accounting, IT, data analytic and statistical professionals focused on solving interesting problems for good causes. In this article I will be discussing Intrepid Insight’s first freely available tool: The City-to-City Housing Value Impact Modeler (CTCHVIM for short). CTCHVIM is the first in a set of free tools Intrepid Insight is planning to provide. In the course of this blog post, I will briefly describe how to use CTCHVIM, how to interpret the results, and future applications of CTCHVIM.

First, without further ado, here is CTCHVIM:

It is also available (separate from the distractions of this blog post) here. If you would like to display the app itself in an iframe or something similar on your own website, the Shiny app is directly available here.

Before discussing how to practically use the tool, here is a quick summary of how it works. CTCHVIM uses Zillow Median Home Value per Square Foot data from the past several decades to estimate vector autoregressions (VARs) for every pair of cities within each California county. Each VAR controls for trend, mortgage interest rates, and seasonal effects. This means that, within any given county, CTCHVIM estimates how home values in each city respond to changes in every other city; it does not compare cities across counties. The resulting graphs are impulse response functions, which show the effect of an increase in the median home value in one city on the median home value in another. Bootstrapped standard errors are also computed. There are of course a number of subtleties, assumptions and methodological notes associated with this. I plan to write another blog post describing the process in more detail, including code excerpts and background literature. Hopefully the elevator-pitch version will suffice in the meantime! (But, as with everything, reach out if you have questions.) If you are super curious, the documentation for VAR() and irf(), the two main R functions used to perform the analysis, is here.
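To make the summary above more concrete, here is a minimal pure-Python sketch of the same idea for a single pair of cities. It is not the tool's actual code, which uses R's VAR() and irf(). The sketch assumes a VAR with one lag and an intercept, estimated equation by equation with ordinary least squares. For a one-lag VAR, the non-orthogonalized impulse responses are simply powers of the coefficient matrix. It deliberately leaves out the trend, mortgage-rate and seasonal controls and the bootstrap. The simulated series and the coefficients in A_true are hypothetical stand-ins for real Zillow data.

```python
import random


def solve(matrix, rhs):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x


def fit_var1(y1, y2):
    """OLS fit of y_t = c + A y_{t-1} + e_t for two series; returns (c, A)."""
    # Each regressor row is [1, y1_{t-1}, y2_{t-1}]; row index t predicts index t + 1.
    rows = [[1.0, y1[t - 1], y2[t - 1]] for t in range(1, len(y1))]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    c, A = [], []
    for y in (y1, y2):
        xty = [sum(r[i] * y[t + 1] for t, r in enumerate(rows)) for i in range(3)]
        coef = solve(xtx, xty)
        c.append(coef[0])
        A.append(coef[1:])
    return c, A


def matmul(a, b):
    """Product of two small matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def irf(A, horizon):
    """Non-orthogonalized VAR(1) responses: irf(A, H)[h][i][j] = response of
    series i, h months after a unit shock to series j (A**h, identity at h = 0)."""
    phi = [[float(i == j) for j in range(len(A))] for i in range(len(A))]
    out = [phi]
    for _ in range(horizon):
        phi = matmul(A, phi)
        out.append(phi)
    return out


# Simulated monthly series from hypothetical coefficients (not real housing data).
random.seed(0)
A_true = [[0.5, 0.2], [0.1, 0.6]]
y1, y2 = [0.0], [0.0]
for _ in range(5000):
    a, b = y1[-1], y2[-1]
    y1.append(1.0 + A_true[0][0] * a + A_true[0][1] * b + random.gauss(0, 1))
    y2.append(2.0 + A_true[1][0] * a + A_true[1][1] * b + random.gauss(0, 1))

c_hat, A_hat = fit_var1(y1, y2)
responses = irf(A_hat, 12)
print(responses[12][1][0])  # month-12 response of series 2 to a unit shock in series 1
```

Recovering coefficients close to A_true is a quick sanity check that the estimation step behaves as intended before the responses are interpreted.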

Using the tool

Now, back to how to use CTCHVIM. Simply decide which county you are interested in, and then select the two cities within that county you would like to see analyzed. It does not matter which city you select first and which second; the order only determines which city's graph appears on the top and which on the bottom.
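Why the order does not matter can be checked directly, because the tool reports non-orthogonalized impulse responses. In the sketch below, which assumes a one-lag VAR and hypothetical coefficients, reversing the city order only permutes the coefficient matrix. Every month's responses are permuted in the same way, so each city's response path is unchanged and only its position on the page moves. Orthogonalized (Cholesky-based) responses, by contrast, would generally depend on the ordering.

```python
import math


def matmul(a, b):
    """Product of two small matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def irf(A, horizon):
    """Non-orthogonalized VAR(1) impulse responses: A**h for h = 0..horizon."""
    phi = [[float(i == j) for j in range(len(A))] for i in range(len(A))]
    out = [phi]
    for _ in range(horizon):
        phi = matmul(A, phi)
        out.append(phi)
    return out


def swap_cities(A):
    """Coefficient matrix of the same two-city VAR(1) with the city order reversed."""
    return [[A[1][1], A[1][0]], [A[0][1], A[0][0]]]


# Hypothetical coefficients: row i is the equation for city i, column j its lag of city j.
A = [[0.5, 0.2], [0.1, 0.6]]
original = irf(A, 12)
swapped = irf(swap_cities(A), 12)
# City 2's response to a shock in city 1 is the same path under either ordering.
same = all(math.isclose(original[h][1][0], swapped[h][0][1]) for h in range(13))
print(same)  # True
```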

After this selection is done, two graphs will be displayed. As I mentioned earlier, these graphs are the (non-orthogonalized) impulse response functions derived from the vector autoregressions run on the home value data for the two cities. The last few paragraphs contained a lot of statistical jargon, so I will now launch into interpretation.

To explain how to interpret the tool, I will use the cities South Lake Tahoe and Placerville as examples. Both cities are within El Dorado County. Here is a quick map:

Entering them into the tool, we get the below pair of graphs:

A few things are important to notice. First, the two graphs share the same x-axis scale but not the same y-axis scale. This is an unfortunate issue that we will work on fixing so that each pair of graphs uses the same scale. For now, the viewer needs to pay attention to the magnitude of the y-axis tick marks. Second, the y-axis is in dollars per square foot and the x-axis is in months. The time series used in CTCHVIM are monthly, and all of the impulse response functions (these graphs) display the effect of the impulse out to 12 months ahead.

Each graph is essentially a single line surrounded by a shaded region. The dark line is the point estimate of the response in each month following the initial increase. The shaded region spans the lower and upper bounds around this point estimate and represents its uncertainty. The first thing I notice is the distinct upward slope of South Lake Tahoe's response to a $1 increase in Placerville. Placerville's response to South Lake Tahoe follows a much more moderate, arched trajectory. It is also notable that in the first graph the shaded region is always above the $0 line. In the second graph the shaded region initially stays above zero but eventually includes $0 as time goes on. This is preliminary evidence that, at least historically, increases in home values in Placerville have had a statistically significant impact on home values in South Lake Tahoe, and that this effect persists and grows over time. The reverse does not seem to be true: a home value increase in South Lake Tahoe yields only a temporarily significant effect on Placerville.
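The reading above ("the shaded region stays above $0") can be expressed as a simple rule: a month counts as statistically significant when its band excludes zero. The sketch below builds a band with the percentile method from a set of bootstrapped response paths and then lists the months whose band lies entirely above or below $0. Several details are assumptions, not details given in the post: the 95% level, the percentile method, and the hypothetical bootstrap draws.

```python
def percentile(sorted_vals, q):
    """Linearly interpolated percentile of an already sorted list, with q in [0, 1]."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def bootstrap_band(draws, level=0.95):
    """draws: bootstrapped response paths (one list per draw, one value per month).
    Returns (lower, upper) band values for each month."""
    alpha = (1 - level) / 2
    lower, upper = [], []
    for month in range(len(draws[0])):
        vals = sorted(d[month] for d in draws)
        lower.append(percentile(vals, alpha))
        upper.append(percentile(vals, 1 - alpha))
    return lower, upper


def significant_months(lower, upper):
    """Months whose band excludes $0 (entirely above or entirely below zero)."""
    return [m for m, (lo, hi) in enumerate(zip(lower, upper)) if lo > 0 or hi < 0]


# Hypothetical draws over three months: the third month's band straddles zero.
draws = [[1.0 + 0.01 * k, 0.5 + 0.01 * k, -0.5 + 0.01 * k] for k in range(100)]
lower, upper = bootstrap_band(draws)
print(significant_months(lower, upper))  # [0, 1]
```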

The magnitudes of the effects are also quite different. The maximum impact of a $1 increase in median home value per square foot in Placerville on South Lake Tahoe appears to occur at some point beyond 12 months, and this effect is likely more than $5 per square foot. (Due to computing constraints, the IRFs are not computed beyond 12 months, but South Lake Tahoe's graph suggests that a maximum has not yet been reached.) The maximum impact of a $1 increase in median home value per square foot in South Lake Tahoe on Placerville appears to occur after 4 months, with an estimated effect of $0.38 per square foot.
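The argument about where each maximum occurs rests on one check. If the largest response falls on the last month computed, we only know the true peak is at that month or later. A minimal sketch of that check follows. The response paths are hypothetical, and month 0 is assumed to be the month of the initial increase.

```python
def peak_response(path):
    """Month and size of the largest response, and whether it lands on the last month computed."""
    month = max(range(len(path)), key=lambda m: path[m])
    return month, path[month], month == len(path) - 1


# Hypothetical paths in $ per sq ft, not the tool's estimates.
arched = [0.0, 0.1, 0.2, 0.3, 0.25]   # peaks inside the window
rising = [0.0, 1.0, 2.0, 3.0]         # still climbing at the last month shown
print(peak_response(arched))  # (3, 0.3, False)
print(peak_response(rising))  # (3, 3.0, True)
```

A True in the last position is the signal that the 12-month window is too short to locate the maximum.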

The graphs display the non-cumulative impulse response, meaning that each point is the impact in month x, not the total impact after x months. It is also helpful to consider the cumulative impulse response function. We won't graph it here (we will in a post to come!), but we can compute it by summing the point estimates up to each month. If we do this, we find that:

Following a $1 increase in median housing value per square foot in Placerville, the estimated cumulative impact on median home value per square foot in South Lake Tahoe after 12 months is $31.00.

Following a $1 increase in median housing value per square foot in South Lake Tahoe, the estimated cumulative impact on median home value per square foot in Placerville after 12 months is $4.14.
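The cumulative figures above come from summing the monthly point estimates. The sketch below shows the running total that turns a non-cumulative response path into a cumulative one. The monthly values are hypothetical, not the Placerville or South Lake Tahoe estimates. The post does not say whether month 0 is included in its sums; this sketch simply includes every point passed in. R's irf() in the vars package also accepts cumulative = TRUE to produce this directly.

```python
from itertools import accumulate


def cumulative_response(path):
    """Turn a non-cumulative IRF (impact in month x) into a cumulative one (total through month x)."""
    return list(accumulate(path))


# Hypothetical monthly responses in $ per sq ft.
monthly = [1.0, 0.5, 0.25, 0.125]
print(cumulative_response(monthly))  # [1.0, 1.5, 1.75, 1.875]
```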

So far, all of these results have been presented in terms of their original units, namely median home values per square foot. In order to bring the results of this tool into the real world, it is useful to expand on what these units mean.

Interpretation

First, the input data are home values NOT prices. The difference here is that home values can be estimated, in this case by Zillow, for all homes in a location, whereas prices are only realized and recorded for homes that are actually sold. This makes the results generally applicable for all home owners in a community, not just those who are about to sell or buy a house. It also removes many of the selection bias problems with other home price indices. The trade-off is that home values have to be estimated, and aren't truly “real data.” Zillow does this using a number of proprietary methods which are partially described here.

Then there is the second part of the name, “median.” Of course most people know how to calculate a median, but it is important to tease out the implications in this context. Based on the way it is phrased, “median home values per square foot” seems to suggest that the median home value was found and then divided by the square footage of that home. I could not immediately find where Zillow's documentation lays this out, so for now I will assume this is the correct order of the calculation. Because these are “median home values,” the impacts we calculate in the graphs are most valid for homes at or near the 50th percentile of total value in a given geographic location. Although it is tempting to multiply the results by whatever number of square feet we want, there is no telling whether there are distributional differences in impact. In other words, because we only used median values, we cannot say for certain whether our conclusions are valid for homes at other points along the home value distribution. One of the biggest determinants of an individual home's value in a given location is the size of the house (its square footage). It is therefore highly likely that houses deviating from the median in square footage also deviate from the median in value.
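The order of the calculation matters, which is why the assumption above is worth flagging. A small illustration follows. The three homes are hypothetical, and the "that home" version assumes an odd number of homes so that a single median home exists. Dividing the median home's value by its own square footage can give a different number than taking the median of each home's value per square foot.

```python
from statistics import median


def median_value_then_divide(homes):
    """Median home value divided by that home's own square footage (assumes an odd number of homes)."""
    value, sqft = sorted(homes)[len(homes) // 2]  # tuples sort by value first
    return value / sqft


def median_of_ratios(homes):
    """Value per square foot computed for each home, then the median of those ratios."""
    return median([value / sqft for value, sqft in homes])


# Hypothetical (value in $, size in sq ft) pairs, not Zillow data.
homes = [(300_000, 1_500), (400_000, 1_600), (500_000, 2_500)]
print(median_value_then_divide(homes))  # 250.0
print(median_of_ratios(homes))          # 200.0
```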

Taking all of that into consideration, we can now scale up the impacts calculated by our tool for the median home in Placerville and the median home in South Lake Tahoe. Assume that “median home value per square ft” is the median home value divided by the square footage of that home. We can then back out the approximate square footage of the median home in each location. To do this, divide the median home value from the “ZHVI All Homes” data set (in the “Home Values” section of Zillow's data page here) by the “Median Home Value per Sq Ft” data set from the same source (the data we used for our analysis). Because the data only ran through December 2017 when we ran this analysis, we will use December 2017 as an example. This methodology yields 1,688 square feet for Placerville's median home and 1,369 square feet for South Lake Tahoe's median home. We multiply each $1 initial shock by the shocked city's median square footage. We multiply each cumulative impact ($31.00 and $4.14, calculated earlier) by the responding city's median square footage. Using December 2017 as our frame of reference, we get the following results:

An initial $1,688 increase in the median home value in Placerville is estimated to result in a $42,439 increase in the median home value in South Lake Tahoe after 12 months.

An initial $1,369 increase in the median home value in South Lake Tahoe is estimated to result in a $6,988 increase in the median home value in Placerville after 12 months.
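The scaling arithmetic behind these two results can be written out in a few lines. The sketch takes the square footages (1,688 and 1,369) and the cumulative impacts ($31.00 and $4.14) from the text. It relies on the same assumption as above: value per square foot is the median home's value divided by its own square footage. The implied_sqft helper is included only to show the back-out step. No Zillow inputs are reproduced here, so pass in the December 2017 values from the data page.

```python
def implied_sqft(median_home_value, median_value_per_sqft):
    """Back out the median home's size, assuming value per sq ft = median value / its sq ft."""
    return median_home_value / median_value_per_sqft


def scale_to_dollars(cumulative_per_sqft, shocked_sqft, responding_sqft):
    """Convert a $1-per-sq-ft shock and its cumulative response into whole-dollar amounts."""
    initial = 1.0 * shocked_sqft                         # $1 per sq ft in the shocked city
    response = cumulative_per_sqft * responding_sqft     # cumulative $ per sq ft in the responding city
    return round(initial), round(response)


PLACERVILLE_SQFT = 1_688
SOUTH_LAKE_TAHOE_SQFT = 1_369
print(scale_to_dollars(31.00, PLACERVILLE_SQFT, SOUTH_LAKE_TAHOE_SQFT))  # (1688, 42439)
print(scale_to_dollars(4.14, SOUTH_LAKE_TAHOE_SQFT, PLACERVILLE_SQFT))   # (1369, 6988)
```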

So, the tool exists, and I know how to interpret it. How is it helpful to me?

Well, to be honest, it might not be helpful for you. If your sole concern is predicting future housing prices, it would be more meaningful to choose a single geographic area and assess whether a multi-variable model (which is precisely what any VAR system is) even presents an improvement over a single variable model in terms of prediction. You would also need to do a more thorough examination of how to model housing prices for each location. I will discuss these issues at length in another post, with a test example.
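As a rough starting point for that kind of check, the sketch below compares out-of-sample, one-step-ahead forecast error for one city under two models. The first is a single-variable model with one lag (AR(1)). The second is the corresponding VAR equation, which also uses the other city's lag. Several choices here are assumptions for illustration only, not the method planned for the future post: one lag, an 80/20 train/holdout split, root mean squared error as the yardstick, and the simulated data. A VAR is worth its extra parameters for prediction only if it lowers holdout error in this kind of test.

```python
import math
import random


def ols(X, y):
    """Least-squares coefficients from the normal equations, solved by Gauss-Jordan elimination."""
    k = len(X[0])
    aug = [[sum(row[i] * row[j] for row in X) for j in range(k)]
           + [sum(row[i] * t for row, t in zip(X, y))] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(k):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * p for a, p in zip(aug[r], aug[col])]
    return [aug[i][k] / aug[i][i] for i in range(k)]


def holdout_rmse(target, regressors, split):
    """Regress target_t on an intercept plus each regressor lagged one month, fit on the
    first `split` observations, and return RMSE of one-step forecasts on the rest."""
    X = [[1.0] + [s[t - 1] for s in regressors] for t in range(1, len(target))]
    y = target[1:]
    beta = ols(X[:split], y[:split])
    errors = [yt - sum(b * x for b, x in zip(beta, row)) for row, yt in zip(X[split:], y[split:])]
    return math.sqrt(sum(e * e for e in errors) / len(errors))


# Simulated data (hypothetical): city 2 depends on city 1's previous month.
random.seed(1)
y1, y2 = [0.0], [0.0]
for _ in range(1000):
    y1.append(0.7 * y1[-1] + random.gauss(0, 1))
    y2.append(0.3 * y2[-1] + 0.8 * y1[-2] + random.gauss(0, 1))

ar_rmse = holdout_rmse(y2, [y2], split=800)        # single-variable model
var_rmse = holdout_rmse(y2, [y1, y2], split=800)   # VAR equation for city 2
print(ar_rmse, var_rmse)
```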

CTCHVIM is better tailored towards policymakers at the county and municipal level who want to take a cursory look at how their city's housing market might react to a policy change or demographic shift in a neighboring city that is expected to raise prices in that other city. Perhaps they are concerned about property tax revenue. Perhaps they need to understand whether skyrocketing home values are likely to spill over to their city. If you are a city manager or planner interested in more in-depth analysis of your city's situation, please reach out to us. As part of our 18 for 2018 initiative, we are offering to do the first 18 approved projects for free.
