Summary

This study used crime count data from the Pittsburgh, Pennsylvania, Bureau of Police offense reports and 911 computer-aided dispatch (CAD) calls to determine the best univariate forecast method for crime and to evaluate the value of leading indicator crime forecast models.

The researchers used a rolling-horizon experimental design, which maximizes the number of forecasts obtained from a given time series at different times and under different conditions; the design is described in detail under Study Design below.

A total of 15 statistical datasets and 3 geographic information systems (GIS) shapefiles resulted from this study.

Output Data from Regression Forecast Program for Part One Property Crimes: Forecast Errors (Dataset 14) with 4,936 cases.

Output Data from Regression Forecast Program for Part One Violent Crimes: Forecast Errors (Dataset 15) with 4,936 cases.

The GIS Shapefiles (Dataset 16) are provided with the study in a single zip file. Included are polygon data for the 4,000-foot, square, uniform grid system used for much of the Pittsburgh crime data (grid400); polygon data for the 6 police precincts, alternatively called districts or zones, of Pittsburgh (policedist); and polygon data for the 3 major rivers in Pittsburgh: the Allegheny, Monongahela, and Ohio (rivers).

Citation

Gorr, Wilpen L., and Olligschlaeger, Andreas. Crime Hot Spot Forecasting with Data from the Pittsburgh [Pennsylvania] Bureau of Police, 1990-1998. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor], 2015-08-07. https://doi.org/10.3886/ICPSR03469.v1

Smallest Geographic Unit

Distributor(s)

Inter-university Consortium for Political and Social Research

Time Period(s)

1990-1998

Date of Collection

Study Purpose

This study had two purposes: 1) To determine the best univariate forecast method for crime, and 2) To evaluate the value of leading indicator crime forecast models using the best univariate forecast model as the benchmark of comparison.

This study design is based on the rationale that in order to be a candidate for practical use, a leading indicator model must forecast more accurately than the simpler, but best univariate model.

Study Design

This study used the rolling-horizon experimental design, a design that maximizes the number of forecasts for a given time series at different times and under different conditions. Under this design, several forecast models are used to make alternative forecasts in parallel. For each forecast model included in an experiment, the researchers estimated models on training data, forecasted one month ahead to new data not previously seen by the model, and calculated and saved the forecast error. Then they added the observed value of the previously forecasted data point to the next month's training data, dropped the oldest historical data point, and forecasted the following month's data point. This process continued over a number of months.

For univariate forecast methods, the researchers used a five-year rolling horizon. For multivariate, leading indicator models estimated by least squares regression, the researchers used a three-year moving window. The researchers made forecasts over a 36-month period (January 1996 through December 1998) in order to generate an adequate sample size of forecast errors for statistical testing purposes.
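The rolling-horizon loop described above can be sketched as follows. This is a minimal illustration on synthetic monthly counts with a placeholder mean forecaster standing in for the study's actual models; the 60-month window and 36 one-step-ahead forecasts mirror the five-year univariate window and 36-month evaluation period.

```python
# Minimal sketch of a rolling-horizon forecast experiment (synthetic data;
# the mean forecaster is a placeholder, not one of the study's methods).
import numpy as np

rng = np.random.default_rng(0)
series = rng.poisson(lam=20, size=96)   # 8 years of monthly crime counts

window = 60                             # five-year (60-month) rolling window
errors = []
for t in range(window, len(series)):
    train = series[t - window:t]        # most recent `window` months only
    forecast = train.mean()             # placeholder univariate forecast
    errors.append(series[t] - forecast) # one-step-ahead forecast error
    # the next iteration adds the observed point and drops the oldest one

print(len(errors))  # 36 forecast errors, one per evaluation month
```

Each pass estimates on training data only, forecasts a month the model has not seen, records the error, then rolls the window forward one month.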

The researchers took the following steps:

They collected all offense reports and 911 computer-aided dispatch (CAD) calls from the Pittsburgh, Pennsylvania, Bureau of Police for the years 1990 through 1998.

They aggregated the crime data over space and over time to form the time series used in the forecast experiments.

They conducted two major sets of forecast experiments with these data: 1) a study based on precincts to determine the best univariate forecast method for crime, and 2) a study based on 4,000-foot, uniform grid cells to evaluate the value of leading indicator forecast models with the best univariate forecast model as the benchmark of comparison.

To compare forecast accuracy of competing univariate methods, they used pair-wise (matched comparisons) t-tests of forecasts for significance testing.
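A matched comparison of this kind can be sketched with SciPy's paired t-test. The error series below are synthetic stand-ins for two competing methods' absolute forecast errors over the same 36 forecast months.

```python
# Pair-wise (matched) t-test on two methods' absolute forecast errors,
# using synthetic errors as a stand-in for the study's forecast output.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
err_a = np.abs(rng.normal(scale=2.0, size=36))  # method A absolute errors
err_b = np.abs(rng.normal(scale=4.0, size=36))  # method B absolute errors

# Paired test: each month contributes one matched pair of errors.
t_stat, p_value = stats.ttest_rel(err_a, err_b)
```

Because both methods forecast the same months, the paired test removes month-to-month variation in crime levels from the comparison.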

They used a form of Granger causality testing (Granger 1969) to determine the relative value of leading indicator models.
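A Granger-style test asks whether lagged values of a leading indicator improve prediction of the target series beyond the target's own lags. The sketch below is an assumed illustration on synthetic data, not the study's code: it fits a restricted least-squares model (own lags only) and an unrestricted model (own lags plus indicator lags) and F-tests the reduction in residual sum of squares.

```python
# Granger-style F-test via restricted vs. unrestricted least squares
# (illustrative sketch on synthetic data; x leads y by two periods).
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n, lags = 120, 2
x = rng.normal(size=n)                  # leading indicator series
y = np.zeros(n)
for t in range(lags, n):
    y[t] = 0.8 * x[t - 2] + rng.normal(scale=0.5)

# Lagged design matrices for t = lags .. n-1
Y = y[lags:]
own = np.column_stack([y[lags - k:n - k] for k in (1, 2)])
lead = np.column_stack([x[lags - k:n - k] for k in (1, 2)])
ones = np.ones((len(Y), 1))

def rss(X):
    beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
    resid = Y - X @ beta
    return resid @ resid

rss_r = rss(np.hstack([ones, own]))     # restricted: own lags only
X_u = np.hstack([ones, own, lead])
rss_u = rss(X_u)                        # unrestricted: adds indicator lags

q = lead.shape[1]                       # number of restrictions tested
dof = len(Y) - X_u.shape[1]
F = ((rss_r - rss_u) / q) / (rss_u / dof)
p = stats.f.sf(F, q, dof)               # small p: x "Granger-causes" y
```

A small p-value indicates the leading indicator carries predictive information beyond the target's own history, which is the sense in which the study values leading indicator models.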

To develop benchmark accuracy measures, they first carefully optimized over univariate methods to get the most accurate forecasts (Gorr, Thompson, and Olligschlaeger 2000).

Rather than assess accuracy based on the performance of individual point forecasts for each grid cell, they examined forecast performance within ranges of changes for both decreases and increases.

Using contingency tables they contrasted forecasts and actual outcomes within each range and designated correct forecasts as true positives and true negatives, and incorrect forecasts as false negatives and false positives.
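A minimal sketch of this classification, using a hypothetical change threshold and synthetic forecasted and actual changes:

```python
# Cross-tabulating forecasted vs. actual change for one range of change
# (hypothetical threshold; synthetic data, not the study's grid cells).
import numpy as np

rng = np.random.default_rng(3)
actual_change = rng.normal(size=200)
forecast_change = actual_change + rng.normal(scale=0.8, size=200)

threshold = 0.5                         # flag a "large increase"
actual_pos = actual_change > threshold
pred_pos = forecast_change > threshold

tp = np.sum(pred_pos & actual_pos)      # correctly forecast increases
fp = np.sum(pred_pos & ~actual_pos)     # false alarms
fn = np.sum(~pred_pos & actual_pos)     # missed increases
tn = np.sum(~pred_pos & ~actual_pos)    # correct non-events
```

The same cross-tabulation applies symmetrically to ranges of decrease, with "positive" redefined as a change below the corresponding negative threshold.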

They applied pair-wise comparison t-tests within classes to determine if leading indicator forecasts were significantly better than univariate forecasts.

Within actual change categories, they identified the corresponding sets of actual and forecasted values; each point thus had both a univariate and a multivariate leading-indicator forecast.

They computed the difference of squared or absolute forecast errors for each matched pair in the same change category.

To evaluate the relative performance of the multivariate method within a change category, they asked whether the mean error over all matched pairs in the category was significantly different from zero. If they subtracted the univariate absolute error from the multivariate absolute error, then a mean error that is significantly different from zero in a negative direction would indicate that the multivariate forecast is more accurate (i.e., has smaller forecast errors).
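This matched-pair test can be sketched as follows, again on synthetic errors. A mean difference (multivariate minus univariate absolute error) that is significantly below zero would favor the leading indicator model.

```python
# Matched-pair test of multivariate vs. univariate absolute forecast errors
# within one change category (synthetic stand-in errors).
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
abs_err_uni = np.abs(rng.normal(scale=4.0, size=50))    # univariate errors
abs_err_multi = np.abs(rng.normal(scale=3.0, size=50))  # leading-indicator errors

diff = abs_err_multi - abs_err_uni       # one difference per matched pair
t_stat, p_value = stats.ttest_rel(abs_err_multi, abs_err_uni)
# diff.mean() significantly < 0 would indicate the multivariate forecast
# is more accurate in this category
```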

Sample

Crime counts from all offense reports and 911 computer-aided dispatch (CAD) calls in electronic form from the Pittsburgh, Pennsylvania, Bureau of Police for the years 1990 through 1998.

Unit(s) of Observation

Data Source

Offense reports and 911 computer-aided dispatch (CAD) call records from the Pittsburgh, Pennsylvania, Bureau of Police for the years 1990 through 1998.

Data Type(s)

administrative records data

aggregate data

experimental data

Mode of Data Collection

record abstracts

Description of Variables

The Univariate Forecast Data by Police Precinct (Dataset 1) contain 11 variables comprised of 1 unique identification variable, 2 variables indicating time (month, year), 1 aggregate crime code variable, and 7 crime count variables (1 variable for each of the 6 police precincts in the city of Pittsburgh, plus 1 variable for the city of Pittsburgh as a whole).

Response Rates

Presence of Common Scales

Original Release Date

2015-08-07

Version Date

2015-08-07

Version History

2015-08-07 ICPSR data undergo a confidentiality review and are altered when necessary to limit the risk of disclosure. ICPSR also routinely creates ready-to-go data files along with setups in the major statistical software formats as well as standard codebooks to accompany the data. In addition to these procedures, ICPSR performed the following processing steps for this data collection:

Created variable labels and/or value labels.

Notes

The public-use data files in this collection are available for access by the general public. Access does not require affiliation with an ICPSR member institution.