The Uniform Crime Report (UCR) are an collection of agency-level crime data published by the FBI. There are a number of different data sets included in the UCR including data on crime, arrests, hate crimes, arson, and stolen property. We’ll be using the Offenses Known and Clearances by Arrest data set (the “crime” data set), which is the most commonly used data set in the UCR and is sometimes used as a shorthand for UCR data. In this lesson we’ll use “UCR” and “Offenses Known and Clearances by Arrest” interchangeably but keep in mind that doing so is technically incorrect.

You can read more about the UCR program and all of the data sets it includes on the National Archive of Criminal Justice Data page here. You can also check out my site Crime Data Tool which visualizes several of the UCR data sets and has info in the FAQ pageexplaining the data.

Nearly every police agency in the United States - approximately 18,000 agencies including state, local (city, county, college), tribal, and federal police departments - now report their data to the FBI which compiles and releases the UCR data (some states have their agencies report to a state department which then sends the data to the FBI). This data is available since 1960 though early years have many fewer agencies reporting than do so in later years.

The data file has annual data on the number of crimes reported, the number of crimes cleared, the number cleared where all offenders are under age 18, and the number of unfounded crimes. We’ll discuss each of these a bit further as we dive into the data. Agencies report the monthly number of each crime though the data we’ll work with has aggregated that to annual counts.

Due to it’s longevity (it has data since 1960) and ubiquity (almost every agency reports and has done so for many years) it is a popular data set for criminologists.

19.1 Exploring the UCR data

We are going to look at data with the combined annual count of crimes for every year available - 1960-2017 - which I’ve made available here. The FBI releases the data as a single file per year and each file has monthly counts of crime. This data set does some cleaning for us by aggregating yearly and making it a single file for the whole time period. The first step when working with this UCR data is loading it into R. As with loading any data, make sure that your working directory path is correctly set using setwd() so R knows which folder the data is in.

load("data/offenses_known_yearly_1960_2017.rda")

Let’s start with a basic examination of the data. First, how big it is, what variables it has, and what the units are.

We can see this is a very big file - 159 columns and nearly a million rows! Normally we’d use the head() function to see the first 6 rows of every column but since this data has so many columns we won’t do that as it’d be hard to read. Instead we can use View() to open what’s essentially an Excel file showing our data. This is useful to quickly glance at the data but is limited as it can bias us to believe that the first several rows are representative of the data (an issue also present with head()). But, for a first glance it is useful and will be supplemented by better checks below.

View(offenses_known_yearly_1960_2017)

From looking at the data in View() we can see that the units are agency-years. Each row is a single agency for a single year. This is useful because it tells us we will have crime in agencies over time, which is a very common type of crime data. Let’s take a look at how many agencies report each year using the table() function which says how many times each value occurs for the column we select. This is also a useful check on if every year from 1960 to 2017 is actually available - don’t just trust that the data has what it says it has!

From these results it’s clear that there are huge differences in how many agencies report in early years compared to more recent years. Is this an issue in an analysis? From the above table it is concerning but not entirely clear there is an issue depending on our specific analysis. It we only care about recent years then it wouldn’t matter. If we only use large agencies, then knowing that relatively few agencies reported in 1960 doesn’t mean that few large agencies reported. For that you’d have to look closer at only the agencies you want to study - we won’t do that here but keep it in mind.

19.2 ORIs - Unique agency identifiers

In the UCR and other FBI data sets, agencies are identified using ORiginating Agency Identifiers or ORIs. These are unique ID codes used to identify an agency. If we used the agency’s normal name we’d end up with some duplicates For example, if you looked for the Philadelphia Police Department using the agency name, you’d find both the “Philadelphia Police Department” in Pennsylvania and the one in Mississippi.

Each ORI is a 7-digit value starting with the state abbreviation (for some reason the FBI incorrectly puts the abbreviation for Nebraska as NB instead of NE) followed by 5 numbers. In the NIBRS data (another FBI data set) the ORI uses a 9-digit code - expanding the 5 numbers to 7 numbers. When dealing with specific agencies, make sure to use the ORI rather than the agency name to avoid any mistakes.

For an easy way to find the ORI number of an agency, use this site. Type an agency name or an ORI code into the search section and it will return everything that is a match.

19.3 Hierarchy Rule

The UCR has what is called the Hierarchy Rule where only the most serious crime in an incident is reported (except for motor vehicle theft which is always included). For example if there is an incident where the victim is robbed and then murdered, only the murder is counted as it is considered more serious than the robbery.

How much does this affect our data in practice? Actually very little. Though the Hierarchy Rule does mean this data is an under-count, data from other sources indicate that it isn’t much of an under count. The FBI’s other data set, the National Inicident-Based Reporting System (NIBRS) contains every crime that occurs in an incident (i.e. it doesn’t use the Hierarchy Rule). Using this we can measure how many crimes the Hierarchy Rule excludes (Most major cities do not report to NIBRS so what we find in NIBRS may not apply to them). In over 90% of incidents, only one crime is committed. Additionally, when people talk about “crime” they usually mean murder which, while incomplete to discuss crime, means the UCR data here is accurate on that measure.

19.4 Which crimes are included

If you look back at the output when we ran names(offenses_known_yearly_1960_2017) you’ll see that it produced five broad categories of columns. The first was information about the agency including population and geographic info, then came four columns with the same values except starting with “actual”, “tot_clr”, “clr_18”, and “unfound”. Following these starting values was 30 crime categories. We’ll discuss what each of those starting values mean in a bit, let’s first talk about which crimes are included and what that means for research.

19.4.1 Index Crimes

The Offenses Known and Clearances by Arrest data set contains information on the number of “Index Crimes” (sometimes called Part I crimes) reported to each agency. These index crimes are a collection of eight crimes that, for historical reasons based largely by perceived importance in the 1920’s when the UCR program was first developed, are used as the primary measure of crime today. Other data sets in the UCR, such as the Arrests by Age, Sex, and Race data and the Hate Crime data have more crimes reported.

For a full definition of each of the index crimes see the FBI’s Offense Definitions page here.

Arson is considered an index crime but is not reported in this data - you need to use the separate Arson data set of the UCR to get access to arson counts. The ninth crime on that list, simple assault, is not considered an index crime but is nevertheless included in this data.

Each of the crimes in the list above, and their subcategories, are included in the UCR data. In most reports, however, you’ll see them reported as the total number of index crimes, summing up categories 1-7 and reporting that as “crime”. These index crimes are often divided into violent index crimes - murder, rape, robbery, and aggravated assault - and property index crimes - burglary, theft, motor vehicle theft.

19.4.2 The problem with using index crimes

The biggest problem with index crimes is that it is simply the sum of 8 (or 7 since arson data usually isn’t available) crimes. Index crimes have a huge range in their seriousness - it includes both murder and theft.This is clearly wrong as 100 murders is more serious than 100 thefts. This is especially a problem as less serious crimes (theft mostly) are far more common than more serious crimes (in 2017 there were 1.25 million violent index crimes in the United States. That same year had 5.5 million thefts.). So index crimes under-count the seriousness of crimes. Looking at total index crimes is, in effect, mostly just looking at theft.

This is especially a problem because it hide trends in violent crimes. San Francisco, as an example, has had a huge increase in index crimes in the last several years. When looking closer, that increase is driven almost entirely by the near doubling of theft since 2011. During the same years, violent crime has stayed fairly steady. So the city isn’t getting more dangerous but it appears like it is due to just looking at total index crimes.

Many researchers divide index crimes into violent and nonviolent categories, which helps but is still not entirely sufficient. Take Chicago as an example. It is a city infamous for its large number of murders. But as a fraction of index crimes, Chicago has a rounding error worth of murders. Their 653 murders in 2017 is only 0.5% of total index crimes. For violent index crimes, murder makes up 2.2%. What this means is that changes in murder are very difficult to detect. If Chicago had no murders this year, but a less serious crime (such as theft) increased slightly, we couldn’t tell from looking at the number of index crimes.

19.4.3 Rape definition change

Starting in 2013, rape has a new, broader definition in the UCR to include oral and anal penetration (by a body part or object) and allow men to be victims. The previous definition included only forcible intercourse against a woman. As this revised definition is broader than the original one post-2013 rape data is not comparable to pre-2013 data.

19.5 Actual offenses, clearances, and unfounded offenses

For each crime we have four different categories indicating the number of crimes actually committed, the number cleared, and the number determined to not have occurred.

19.5.1 Actual

This is the number of offenses that occurred, simply a count of the number of crimes that month. For example if 10 people are murdered in a city the number of “actual murders” would be 10.

19.5.2 Total Cleared

A crime is cleared when an offender is arrested or when the case is considered cleared by exceptional means. When a single offender for a crime is arrested, that crime is considered cleared. If multiple people committed a crime, only a single person must be arrested for it to be cleared, and as the UCR data is at the offense level, making multiple arrests for an incident only counts as one incident cleared. So if 10 people committed a murder and all 10 were arrested, it would report one murder cleared not 10. If only one of these people are arrested it would still report one murder cleared - the UCR does not even say how many people commit a crime.

A crime is considered exceptionally cleared if the police can identify the offender, have enough evidence to arrest the offender, know where the offender is, but is unable to arrest them. Some examples of this are the death of the offender or when the victim refuses to cooperate in the case.

Unfortunately this data does not differentiate between clearances by arrest or by exceptional means. For a comprehensive report on how this variable can be exploited to exaggerate clearance rates, see this report by ProPublica on exceptional clearances with rape cases.

19.5.3 Cleared Where All Offenders Are Under 18

This variable is very similar to Total Cleared except is only for offenses in which every offender is younger than age 18.

19.5.4 Unfounded

An unfounded crime is one in which a police investigation has determined that the reported crime did not actually happen. For example if the police are called to a possible burglary but later find out that a burglary did not occur they would put it down as 1 unfounded burglary. This is based on police investigation rather than the decision of any other party such as a coroner, judge, jury, or prosecutor.

19.6 Number of months reported

UCR data is reported monthly though even agencies that decide to report their data may not do so every month. As we don’t want to compare an agency which reports 12 months to one that reports fewer, the variable number_of_months_reported is way keep only agencies that report 12 months, or deal with those that report fewer.

From our table() output it seems that when agencies do report, they do so for all 12 months of the year. However, this variable is seriously flawed, and its name is quite misleading. In reality this variable is actually just whichever the last month reported was. If an agency reported every month of the year, meaning December is the last month, they would have a value of 12. If the agency only reported in December, they would also have a value of 12. While there are ways in the monthly data to measure actual number of months reported, these ways are also flawed. So be cautious about this data and particularly the value of this variable.