San Francisco Crime Data Analysis Part 1

Hello. Today I will analyze the San Francisco Crime Data, which can be found on Kaggle. In another post, I will plot the data on a map of San Francisco. Are you ready to discover how crime takes place in this beautiful city?

NOTE: In my heat maps, if the plotted values are calculated within a particular variable, across all values of the other variable (rather than over both variables jointly, which is the usual case in heat maps), I draw grey horizontal or vertical lines to mark the variable that is held fixed. I prefer this kind of heat map over, say, faceted plots when analyzing the joint behavior of two variables, since in some cases a heat map is easier to read and gives a clearer big picture.
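To make the difference between the two kinds of percentages concrete, here is a minimal sketch using only the Python standard library. The records and the function names are my own illustration, not the real data or the code behind the plots.

```python
from collections import Counter

# Hypothetical toy records of (resolution, district); not taken from the real data.
records = [
    ("ARREST", "SOUTHERN"), ("ARREST", "SOUTHERN"), ("ARREST", "MISSION"),
    ("NONE", "SOUTHERN"), ("NONE", "MISSION"), ("NONE", "MISSION"),
    ("NONE", "RICHMOND"), ("NONE", "RICHMOND"),
]


def joint_percentages(pairs):
    """Percent of ALL records in each (row, column) cell; the whole table sums to 100."""
    counts = Counter(pairs)
    total = sum(counts.values())
    return {cell: 100.0 * n / total for cell, n in counts.items()}


def within_row_percentages(pairs):
    """Percent of each row's records in each column; every row sums to 100."""
    counts = Counter(pairs)
    row_totals = Counter(row for row, _ in pairs)
    return {(row, col): 100.0 * n / row_totals[row]
            for (row, col), n in counts.items()}


joint = joint_percentages(records)
within = within_row_percentages(records)
```

In the within-row version each resolution is a row whose cells sum to 100%, which is the situation the grey lines are meant to signal.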

There are 39 crime categories and 879 unique descriptions. The descriptions are like categories with finer detail. It could be useful to break the descriptions down into words and use those words as predictors.
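One possible way to turn descriptions into word predictors is a simple bag-of-words indicator matrix. The sketch below is only an illustration: the description strings are made-up examples in the style of the dataset, and the helper names are hypothetical.

```python
import re
from collections import Counter

# Hypothetical description strings, written in the style of the dataset.
descriptions = [
    "GRAND THEFT FROM LOCKED AUTO",
    "PETTY THEFT FROM LOCKED AUTO",
    "STOLEN AUTOMOBILE",
]


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


# Vocabulary: every distinct word seen in any description, in sorted order.
vocab = sorted({word for d in descriptions for word in tokenize(d)})


def to_indicator_row(text):
    """One 0/1 entry per vocabulary word: 1 if the word occurs in the text."""
    words = set(tokenize(text))
    return [1 if word in words else 0 for word in vocab]


# One row per description, one column per vocabulary word.
matrix = [to_indicator_row(d) for d in descriptions]

# How often each word occurs overall; rare words may be worth dropping.
word_counts = Counter(word for d in descriptions for word in tokenize(d))
```

Each column of `matrix` could then serve as a binary predictor. With 879 descriptions the vocabulary would be much larger, so some filtering by `word_counts` would probably be needed.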

Let’s now have a look at police districts, resolution and some heat maps with previous variables.

We can see that in most cases each resolution is distributed evenly over the districts. Exceptional clearance and psychopathic case resolutions happen the most in the Southern District. I have added horizontal grey lines to make it clear that the percentages over the districts sum to 100% for a particular resolution.

Does the day affect crime type and volume? First let's order the days of the week, starting from Monday, and then plot a heat map.
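Ordering the days matters because alphabetical sorting would put Friday first. A minimal sketch of the idea, assuming the day names are stored as plain English strings (the sample values are invented):

```python
from collections import Counter

# Fixed calendar order, starting from Monday.
DAY_ORDER = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

# Hypothetical day-of-week values, one per incident.
days = ["Friday", "Monday", "Friday", "Sunday", "Wednesday", "Friday", "Monday"]

counts = Counter(days)

# Walk the fixed order rather than sorting the keys, so the result is
# Monday-first and days with no incidents still appear with a count of 0.
ordered_counts = [(day, counts.get(day, 0)) for day in DAY_ORDER]
```

The rows or columns of the heat map can then follow `ordered_counts` directly.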

Again we can confirm that the Southern District is the least safe place, while Richmond and Park are the safest in terms of crime. Friday and Saturday seem to be the highest-crime days of the week.

Above, we check whether, on a given day, the crime rate differs across the districts. As shown before, the Southern District has the most crime. We can also see whether the ordering of the districts changes on certain days. The distribution of crime over the districts does not change with the day of the week and is quite stable.

With the above plot we can see whether the crime rate within a police district changes with the day. Friday is the day with the highest crime rate, and Sunday has the lowest. Wednesday also appears to be high. Another interesting fact is that Tenderloin is a relatively safer neighborhood to be in on Fridays.

How about the relation between day and crime type? Do you think the day would affect the behavior of criminals? Let's see.

The distribution of crime categories stays the same on each day, with Fridays and Saturdays being a little busier. Also, some classes behave opposite to each other. Theft and 'other' crimes, for example, have a negative relationship: theft seems to be more common on Fridays and Saturdays, and 'other' crimes on the remaining days.

Do the crime rates with respect to categories change over the days? It is quite possible that on some days certain crimes are preferred over the rest. To check this, I'll calculate the percentages within each category.

We can see that overall the number of crimes committed per year stayed about the same. However, 2015 is very low. What suddenly happened in 2015? This information is already given on the data site, but let's find out ourselves. Suspecting that less data was collected in 2015, I created a table of each month and year combination, and then counted the number of months with data in each year. Indeed, the 2015 data covers only the first 5 months. Taking that into account, 2015 is about the same as the other years.
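The month-coverage check can be sketched as follows. This is an illustration rather than the code used for the table: the timestamps are invented, and the `YYYY-MM-DD HH:MM:SS` format is an assumption about how the dates are stored.

```python
from collections import Counter, defaultdict
from datetime import datetime

# Hypothetical incident timestamps; the format is an assumption.
timestamps = [
    "2014-01-05 08:30:00", "2014-01-20 22:10:00",
    "2014-02-11 13:45:00", "2014-02-28 01:15:00",
    "2014-03-03 17:00:00", "2014-03-19 09:20:00",
    "2015-01-07 11:05:00", "2015-01-22 19:40:00",
]

months_seen = defaultdict(set)  # year -> set of months that have any data
incidents = Counter()           # year -> number of incidents

for ts in timestamps:
    dt = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S")
    months_seen[dt.year].add(dt.month)
    incidents[dt.year] += 1

# Number of distinct months covered in each year.
months_per_year = {year: len(months) for year, months in months_seen.items()}

# Incidents per covered month, so partial years can be compared fairly.
per_month_rate = {year: incidents[year] / months_per_year[year]
                  for year in incidents}
```

In this toy data the second year has a third of the incidents but also a third of the months, so the per-month rate is the same. That is the kind of adjustment meant by "taking that into account".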

Now let's see whether crime categories changed over the years or stayed the same. 2015 will be taken out of the data since it has only 5 months. I will also take out Trea, since it appears only in recent years and has very few occurrences.

By holding the category constant and calculating the percentage of crime over the years, we see that some categories changed over time. Loitering was higher during 2007 and 2008, at close to 18%, and decreased to 2% in 2014. Vehicle theft also decreased by more than 50%, from 17% in 200 to 7% in 2014.

The above plot lets us see the percentage of each crime type within a given year and spot any trends. It can quickly be seen that drug offenses, burglary, assault, and larceny/theft are the most common types of crime. Also, the share of larceny/theft within a year increased from 18% in 2003 to 26% in 2014. This can help the SFPD look into the situation and take more precautions. Vehicle theft, on the other hand, decreased from 13% in 2005 to 5% in 2014.
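A change in share can be read in percentage points or as a relative change, and the two can give quite different impressions. A small sketch using the shares quoted above (the function name is my own):

```python
def relative_change_percent(start, end):
    """Relative change from start to end, as a percentage of the starting value."""
    return 100.0 * (end - start) / start


# Shares within a year, in percent, as quoted in the text.
larceny_change = relative_change_percent(18, 26)       # 2003 -> 2014
vehicle_theft_change = relative_change_percent(13, 5)  # 2005 -> 2014

print(f"larceny/theft: {larceny_change:+.1f}%")
print(f"vehicle theft: {vehicle_theft_change:+.1f}%")
```

So the 8-point rise in larceny/theft is a relative increase of about 44%, and the 8-point fall in vehicle theft is a relative decrease of about 62%.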

I am also very much interested in seeing the relation between the crime types and districts. It might be that some crime types are more prevalent in certain areas.

When the category is held constant, we can see that prostitution happens most in the Mission district, with 50% among all the districts, drug crime happens most in Tenderloin with 33%, and loitering in Southern with 35%. So some crime types are clearly concentrated in certain areas more than in others.

By checking the distribution of crime types within each area, we can see that they are mostly stable, with about the same percentage in each district. However, larceny/theft is significantly more prevalent in Central, Northern, and Southern.

In this post I have analyzed the San Francisco Crime Data. Although this analysis can provide insight into the relation between crime, district, weekday, and some other variables, it is by no means complete without maps. That’s exactly what I’ll be doing in my next post. See you then!