analyze the home mortgage disclosure act (hmda) microdata with r and monetdb

back in 1975, congress had it up to here with discriminatory lending practices and decided to require financial organizations originating home mortgages to report some basic operational statistics publicly. the home mortgage disclosure act mandated a major ramp-up in the transparency of home-lending activity across the country. almost forty years later, the data are better than ever. the main downloadable file - the loan application record (lar) - contains one record per loan application (regardless of origination) and comprises upwards of ninety percent of all federal housing administration (fha) loans. there's also a one-record-per-lending-institution table (ins), but that'll be merged onto the lar for your convenience. you know, just in case you want to look at loan-by-loan bank activity in your neighborhood. like most thorough public data providers, the federal reserve provides its own summary report, so give it a skim before you start writing code.

the gregorians celebrate the new year in january, the chinese in february, but the federal financial institutions examination council (ffiec) drops its ball in times square with a data release every september. próspero año everybody, because the latest hmda (pronounced hum-duh) microdata have arrived. clocking in at between ten and thirty-five million records per year, this looks like a job for monetdb. it's sexy, it's free, it's the perfect companion for big public data. make learning a new language your resolution. this new github repository contains two scripts:

download all microdata.R

initiate a monetdb server on your local machine to house every table and every year of hmda

download and, without taking a breath, import every file into monetdb

merge the loan application record table with the institutional records table, for future easy access

construct some race and ethnicity variables to match those published by ffiec
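
the merge and recode steps above boil down to one sql join plus a case statement. here's a minimal sketch of that idea, not the repository's actual code: it swaps in an in-memory sqlite database (via the DBI and RSQLite packages) for the monetdb server so it runs anywhere, and every table name, column name, code value, and recode rule is made up for illustration rather than taken from ffiec's definitions.

```r
library(DBI)

# stand-in connection: in-memory sqlite instead of a monetdb server (assumption)
db <- dbConnect(RSQLite::SQLite(), ":memory:")

# toy loan application records; column names and codes are hypothetical
lar <- data.frame(
  respondent_id = c("A1", "A1", "B2"),
  agency_code = c(1, 1, 2),
  loan_amount = c(120, 250, 90),
  applicant_race = c(5, 3, 5),
  applicant_ethnicity = c(2, 2, 1),
  stringsAsFactors = FALSE
)

# toy institution records, one row per lender
ins <- data.frame(
  respondent_id = c("A1", "B2"),
  agency_code = c(1, 2),
  respondent_name = c("first toy bank", "second toy bank"),
  stringsAsFactors = FALSE
)

dbWriteTable(db, "lar", lar)
dbWriteTable(db, "ins", ins)

# merge the institution columns onto every loan record and
# build an illustrative combined race/ethnicity category in the same pass
dbExecute(db, "
  CREATE TABLE hmda AS
  SELECT
    a.*,
    b.respondent_name,
    CASE
      WHEN a.applicant_ethnicity = 1 THEN 'hispanic or latino'
      WHEN a.applicant_race = 5 THEN 'non-hispanic white'
      ELSE 'other'
    END AS race_ethnicity
  FROM lar AS a
  INNER JOIN ins AS b
    ON a.respondent_id = b.respondent_id
   AND a.agency_code = b.agency_code
")

# pull the merged table back into r to inspect it
hmda <- dbGetQuery(db, "SELECT * FROM hmda ORDER BY respondent_id, loan_amount")

dbDisconnect(db)
```

doing the join once and saving the result as its own table means later queries never have to repeat it, which is the "future easy access" part.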

replicate ffiec publications.R

open up and then connect to a monetdb server instance, like a champ

present a few simple sql queries so you can take it from here

reproduce a few sets of numbers published by the united states government
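
to give a flavor of the "simple sql queries" step, here's a hedged sketch of the kind of grouped counts and sums you'd run against the merged table. it again uses an in-memory sqlite database through DBI as a stand-in for a monetdb connection, and the table, the column names, and the toy values are all hypothetical. the sql itself is plain enough that it should carry over to monetdb with little or no change.

```r
library(DBI)

# stand-in connection: in-memory sqlite instead of a monetdb server (assumption)
db <- dbConnect(RSQLite::SQLite(), ":memory:")

# toy merged table; treat action_type = 1 as an originated loan (illustrative)
hmda_toy <- data.frame(
  data_year = c(2011, 2011, 2012, 2012, 2012),
  action_type = c(1, 3, 1, 1, 3),
  loan_amount = c(100, 200, 150, 250, 50)
)
dbWriteTable(db, "hmda_toy", hmda_toy)

# how many applications were filed each year?
apps_by_year <- dbGetQuery(db, "
  SELECT data_year, COUNT(*) AS applications
  FROM hmda_toy
  GROUP BY data_year
  ORDER BY data_year
")

# how many loans were originated each year, and for how much in total?
originated <- dbGetQuery(db, "
  SELECT data_year, COUNT(*) AS loans, SUM(loan_amount) AS total_amount
  FROM hmda_toy
  WHERE action_type = 1
  GROUP BY data_year
  ORDER BY data_year
")

dbDisconnect(db)

print(apps_by_year)
print(originated)
```

the database does the counting and only the small summary tables come back into r, which is why tens of millions of records per year stay manageable on a laptop.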