When simple arithmetic doesn't cut it

DNAInfo made a set of interesting maps using crime data from New York.

The analyst headlined the counter-intuitive insight that the richest neighborhoods in New York are the least safe. In particular, the analysis claimed that Midtown ranks 69th out of 69 neighborhoods while Greenwich Village/Meatpacking is second to last.

According to the analyst, there is no magic -- one only needs to assemble the data, and the insight just drops out:

The formula was simple: divide the number of reported crimes in a neighborhood by the number of people living there, for a per capita crime rate.

By definition, a statistic is an aggregation of data. Aggregation, however, is a tricky business. And this example is a great illustration.

***

DNAInfo’s finding is captured in the following map of “major crimes”. The deeper the color, the higher the per-capita crime rate. The southern part of Manhattan apparently is less safe than areas to the north like Harlem, which has the reputation of being seedy. Greenwich Village has 1,500 crimes for about 62,000 residents (240 per 10,000) while East Harlem has 900 for about 47,000 residents (190 per 10,000). East Harlem is not marginally safer than Greenwich Village – it is 20% safer according to these crime statistics.
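To make the arithmetic concrete, here is a minimal sketch of the per-capita calculation using only the rounded figures quoted above. The function and variable names are mine, not DNAInfo's, and the inputs are the approximate counts from this post rather than the underlying data set.

```python
def rate_per_10k(crimes, residents):
    """Reported crimes per 10,000 residents (the 'simple' formula, rescaled)."""
    return crimes / residents * 10_000


# Rounded figures quoted in the post; not the raw DNAInfo data.
greenwich_village = rate_per_10k(1_500, 62_000)
east_harlem = rate_per_10k(900, 47_000)

# How much lower East Harlem's rate is, relative to Greenwich Village's.
relative_gap = 1 - east_harlem / greenwich_village

print(f"Greenwich Village: {greenwich_village:.0f} per 10,000")
print(f"East Harlem:       {east_harlem:.0f} per 10,000")
print(f"East Harlem's rate is lower by {relative_gap:.0%}")
```

The two rates round to roughly 240 and 190 per 10,000, and the gap between them is about a fifth, which is where the "20% safer" figure comes from.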

The "major crimes" figure is the aggregate of individual classes of crime. The following set of maps shows the geographical distribution of each class. It seems rather odd that the south side would bleed deep red in the aggregate map above when, by most measures, it is very safe (light hues almost everywhere in the maps below).

Greenwich Village registers among the lowest for rapes, assaults, shooting incidents, murders, etc. The only category for which it has a poor record is "grand larceny". I had to look that one up on Wikipedia. Grand larceny is "the common law crime involving theft". In New York, apparently, "grand" means $1,000 or more. That sounds like stealing to me.

***

How is it that a precinct that is safe from most types of crimes and safe for people who don't carry around $1,000 or more ends up at the bottom of the safety ranking?

The “simple” formula assigns equal weight to every kind of crime, whether it is a murder or a theft. As shown below, murders occur in single-digit frequency while hundreds of thefts happen each year. It turns out that most of the other crime types also occur in small numbers, so this ranking really only tells us where one is most likely to get robbed if one is carrying $1,000 or more.
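A small sketch may help show how one high-volume category can drive an equal-weight total. Everything in it is hypothetical: the two neighborhoods "A" and "B", their populations, and their crime counts are made-up numbers chosen for illustration, not DNAInfo's data.

```python
# HYPOTHETICAL yearly counts for two made-up neighborhoods (not real data).
counts = {
    "A": {"murder": 1, "rape": 5, "felony assault": 40, "grand larceny": 1200},
    "B": {"murder": 6, "rape": 20, "felony assault": 250, "grand larceny": 300},
}
population = {"A": 60_000, "B": 50_000}  # also hypothetical


def rate_per_10k(crimes, residents):
    return crimes / residents * 10_000


def summarize(name):
    """Equal-weight total rate, larceny's share of it, and the rate without larceny."""
    by_type = counts[name]
    total = sum(by_type.values())
    larceny = by_type["grand larceny"]
    return {
        "total_rate": rate_per_10k(total, population[name]),
        "larceny_share": larceny / total,
        "non_larceny_rate": rate_per_10k(total - larceny, population[name]),
    }


for name in counts:
    s = summarize(name)
    print(
        f"{name}: total {s['total_rate']:.1f} per 10,000, "
        f"larceny share {s['larceny_share']:.0%}, "
        f"excluding larceny {s['non_larceny_rate']:.1f} per 10,000"
    )
```

With these invented numbers, "A" looks far worse on the aggregate rate even though "B" has several times the rate of every other crime type. The aggregate ranking is effectively a grand larceny ranking.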

Comments

Another thing that the analysis doesn't take into account is the fact that a relatively small percentage of the people in Midtown at any given moment actually live there. The neighborhoods that get the worst scores by this formula are all heavily trafficked by tourists, and many of them have more office buildings than dwellings. Calculating the "per capita" crime rate for such areas based on the number of residents is very misleading.

Wealthy areas often rank high for larceny: that's where the criminals find the good stuff to steal!

This reminds me of the annual lists of the Most Stolen Cars in America (for example, http://www.cnbc.com/id/42786485/America_s_Most_Stolen_Cars?slide=9). The top cars stolen are usually popular cars like the Camry, Corolla, Accord, and Civic, instead of expensive cars like Lexus and BMW. As the CNBC website states, "What makes them attractive to thieves is that these makes and models are easy to steal and the parts don’t change much from year to year.... As a result, [they are] a popular target for thieves all across America."