
The objective of the project was to use a series of Alteryx workflows and predictive models to generate predicted crime figures for metropolitan areas in England over the next three years. Additionally, we wanted to test the validity of our forecasts using data for the first three months of 2018.

Describe the business challenge or problem you needed to solve:

Climber recently became an Alteryx Preferred Partner and was keen to demonstrate its Alteryx capabilities to clients. To do this, we looked for publicly available datasets to work with, which led us to the Crime Statistics for England and Wales. We thought that if we could predict crime rates for the next few years, the information could be used to optimise spending on staffing, allocate resources within specific time frames, and identify groups of similar areas to understand why particular levels of crime occur in the first place. Crime today is a sensitive topic: it is on the rise, and we need to understand its implications to work out how best to deal with it.

Describe your working solution:

We loaded crime statistics datasets from the previous three years and filtered out unnecessary fields and rows. We manipulated the LSOA (Lower Layer Super Output Area) field to extract the local area names, which we then mapped to wider area names using the Local Authority District to Region Lookup in England dataset. Crime categories were reclassified into higher-level groups to keep them consistent, as the classifications have varied in recent years.
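These preparation steps were built in Alteryx, but the same logic can be sketched in Python with pandas. The data, column names, and category mapping below are hypothetical stand-ins for the real street-level crime extracts and the region lookup:

```python
import pandas as pd

# Minimal sketch with hypothetical data; the real street-level crime extracts
# include columns such as "Month", "LSOA name", and "Crime type".
crimes = pd.DataFrame({
    "Month": ["2017-01", "2017-01", "2017-02"],
    "LSOA name": ["Leeds 001A", "Leeds 001A", "Manchester 054C"],
    "Crime type": ["Bicycle theft", "Burglary", "Shoplifting"],
})

# Strip the trailing LSOA code (e.g. "001A") to recover the local area name.
crimes["Local area"] = crimes["LSOA name"].str.replace(r"\s\d{3}\w$", "", regex=True)

# Map local areas to wider regions (a stand-in for the Local Authority
# District to Region Lookup dataset).
lookup = pd.DataFrame({
    "LAD name": ["Leeds", "Manchester"],
    "Region": ["Yorkshire and The Humber", "North West"],
})
crimes = crimes.merge(lookup, left_on="Local area", right_on="LAD name", how="left")

# Reclassify crime categories into broader, consistent groups
# (illustrative mapping only); unmapped categories pass through unchanged.
category_map = {"Bicycle theft": "Theft", "Shoplifting": "Theft"}
crimes["Crime group"] = crimes["Crime type"].replace(category_map)
```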

Time Series Forecasting: We performed time series forecasting to predict crime rates against a test dataset (January–March 2018). We used the “TS Model Factory” and “TS Forecast Factory” tools from the Alteryx Gallery to run ARIMA (Auto-Regressive Integrated Moving Average) and ETS (Exponential Smoothing) models.

Validation: We then validated the predictions against the real crime numbers for January-March 2018.

The validation results show errors across the different regions:

The error for the “ETS” model was 12.2%, suggesting our forecasts were 87.8% accurate. The error for the “ARIMA” model was 12.1%, suggesting forecasts were 87.9% accurate. Since the “ARIMA” model performed slightly better than the “ETS” model, we decided to use its output for our analysis.
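An error figure of this kind is typically a mean absolute percentage error (MAPE), with accuracy reported as 100% minus the error. The numbers below are made up purely to show the calculation:

```python
# Hypothetical actual vs. forecast counts for three regions, used only to
# illustrate the metric (mean absolute percentage error, MAPE).
actuals = [4200, 3100, 5600]
forecasts = [4500, 2800, 5900]

# Absolute error of each forecast as a fraction of the actual value.
errors = [abs(a - f) / a for a, f in zip(actuals, forecasts)]
mape = 100 * sum(errors) / len(errors)
accuracy = 100 - mape
print(f"MAPE: {mape:.1f}%  accuracy: {accuracy:.1f}%")
```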

Enhancing the data: We were interested in performing cluster analysis to see the impact that deprivation has on crime levels by combining in the ‘Indices of Deprivation 2015’ dataset. The data contains seven relative measures of deprivation for small areas (Lower-layer Super Output Areas) across England.

We took the indicators for each domain, excluding crime (as that is what we were forecasting), and used them for our clustering analysis. To do this effectively, we needed to find out which of these factors were more likely to impact crime and exclude those that were not. The Association Analysis tool produced a correlation matrix:

Based on the results from the correlation matrix, five fields correlated with crime at a high confidence level, and we reduced these further using several techniques to select only those likely to increase the accuracy of predictions. We then applied the K-Centroids Diagnostic tool to determine the optimal number of clusters, and the K-Centroids Cluster Analysis tool to assign each area to a cluster.
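The same two steps, indicator selection by correlation followed by clustering, can be sketched in Python, with scikit-learn's k-means standing in for the K-Centroids tools. The indicator names, synthetic data, correlation threshold, and fixed k below are all illustrative assumptions, not the values used in the actual workflow:

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Synthetic deprivation indicators per area (stand-ins for the Indices of
# Deprivation domains, excluding crime), plus a crime rate driven by two of them.
rng = np.random.default_rng(1)
df = pd.DataFrame(rng.normal(size=(200, 5)),
                  columns=["income", "employment", "education", "health", "housing"])
df["crime_rate"] = 2 * df["income"] + df["employment"] + rng.normal(0, 0.5, 200)

# Correlation of each indicator with crime; keep the strongly correlated ones.
corr = df.corr()["crime_rate"].drop("crime_rate")
selected = corr[corr.abs() > 0.3].index.tolist()

# K-means on the selected, standardised indicators. A diagnostic step
# (e.g. silhouette scores) would choose k; it is fixed at 3 here for brevity.
X = StandardScaler().fit_transform(df[selected])
df["cluster"] = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
```

Each area then carries a cluster label that can be joined back to the forecasts and mapped in the visualisation layer.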

We exported the data from Alteryx and loaded all the relevant files into Qlik Sense for visualisation, including the KML files for the maps ([3],[4]).

Describe the benefits you have achieved:

Using Alteryx allowed us to combine vast amounts of unstructured public data from multiple sources and easily prepare it for advanced analytics. We applied advanced techniques to produce meaningful insights, so that we could understand and be ready for what is highly likely to happen: the foundation for delivering valuable, actionable insight and high-quality decision making. We predicted crime rates with a high degree of accuracy and demonstrated how predictive analytics can be applied to real-world scenarios. We showcased our results in Qlik Sense, a highly accessible tool that lets the audience follow the story we are telling. Our findings were presented to police force staff and press in a webinar, and we received overwhelmingly positive feedback.

This is a great example of hands-on predictive modelling. Thank you for sharing it. While going through the case study I faced some challenges. The objective set out for the second part of the problem was to see the impact deprivation had on crime. Given that, crime is our dependent variable and deprivation is the cause of that outcome (the predictor). The deprivation index, in turn, was calculated from the indicators underlying each of the six domains of deprivation. To study crime and understand which indicators are most responsible for it, the correlation between these indicators and crime needed to be explored. In your solution, I struggled to understand how you established this correlation between crime and the indicators of deprivation. Which indicators did you finally consider? Was your cluster analysis based on these indicators of deprivation? If not, what was your objective for clustering, and which variables were used as predictors? Thanks again, and looking forward to your reply.