Saturday, July 19, 2014

Guns are cool - time effects

September last year I made a post using the shootingtracker data. It is attempted in shootingtracker to register all shootings with at least four victims, be they wounded or dead. The data starts January 1st 2013, which means that by now the amount of data has almost doubled. This surely is a dataset where I hope the makers find less and less data to add. Analysis shows Sundays in summer have the highest number of shootings on a day. Three to four shootings on Sunday in July and August.

Effect of day and month

Effect of day of the week is pretty easy to plot, just add the day and run qplot(weekday). In this case complexities arise because I want the days in a specific order, Monday to Sunday. Second is that not all days occur equally often in the test period. This is not enough to invalidate the plot, but since I had to correct for occurrence of months for a similar plot, I decided to reuse that code for weekdays. The data.frame alldays is used to calculate the number of days in the data set. I am not going to over analyze this, Sundays stick out in a negative way.

library(ggplot2)

r7$weekday <- factor(format(r7$date,'%a'),

levels=format(as.Date('07/07/2014',format="%m/%d/%Y")+0:6,'%a'))

r7$month <- factor(format(r7$date,'%b'),

levels=format(as.Date('01/15/2014',format="%m/%d/%Y")+

seq(0,length.out=12,by=30),'%b'))

alldays <- data.frame(

date=seq(min(r7$date),max(r7$date),by=1))

alldays$weekday <- factor(format(alldays$date,'%a'),

levels=levels(r7$weekday))

alldays$month <- factor(format(alldays$date,'%b'),

levels=format(as.Date('01/15/2014',format="%m/%d/%Y")+

seq(0,length.out=12,by=30),'%b'))

ggplot(data=data.frame(prop=as.numeric(table(r7$weekday)/

table(alldays$weekday)),

weekday=factor(levels(r7$weekday),

levels=levels(r7$weekday))),

aes(y=prop,x=weekday)) +

geom_bar(stat='identity') +

ylab('Shootings per day') +

xlab('Day of the week')

In terms of months, it seems summer is worse than the other seasons and winter is best.

ggplot(data=data.frame(prop=as.numeric(table(r7$month)/

table(alldays$month)),

month=factor(levels(r7$month),

levels=levels(r7$month))),

aes(y=prop,x=month)) +

geom_bar(stat='identity') +

ylab('Shootings per day') +

xlab('Month')

A model

It is not obvious, given the unequal distributions of weekdays over months, how significant a month effect is. To examine this, I have reorganized the data to display shootings per day. Data frame alldays is used again, now to ensure data with no shootings are correctly represented. The modeling shows a clear effect of months and the interaction of days and months on the brink of significance.

r7$one <- 1

ag <- aggregate(r7$one,

by=list(date=r7$date),FUN=sum)

counts <- merge(alldays,ag,all=TRUE)

counts$x[is.na(counts$x)] <- 0

g0 <- glm(x ~ weekday ,data=counts,

family='poisson')

g1 <- glm(x ~ weekday + month,data=counts,

family='poisson')

g2 <- glm(x ~ weekday * month,data=counts,

family='poisson')

anova(g0,g1,g2,test='Chisq')

Analysis of Deviance Table

Model 1: x ~ weekday

Model 2: x ~ weekday + month

Model 3: x ~ weekday * month

Resid. Df Resid. Dev Df Deviance Pr(>Chi)

1 548 653.86

2 537 624.86 11 29.008 0.002264 **

3 471 538.99 66 85.870 0.050718 .

---

Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Predictions

To understand the interaction, a plot is made of the expected values by day and month. Looking at this plot, the Sunday effect is most pronounced in Summer. In addition I would not be surprised if next Sunday has four shootings again.

pred1 <- expand.grid(weekday=factor(levels(r7$weekday),

levels=levels(r7$weekday)),

month=factor(levels(r7$month),

levels=levels(r7$month)))

preds <- predict(g2,pred1,

type='response',

se=TRUE)

pred1$fit <- preds$fit

pred1$se.fit <- preds$se.fit

limits <- aes(ymax = fit+se.fit, ymin=fit-se.fit)

p <- ggplot(pred1, aes(fill=weekday, y=fit, x=month))

dodge <- position_dodge(width=0.9)

p + geom_bar(position=dodge, stat="identity") +

geom_errorbar(limits, position=dodge, width=0.25) +

theme(legend.position = "bottom") +

ylab('Shootings per Day') +

guides (fill=guide_legend('Day'))

Follow the example?

One might ask if shootings which have much media attention give cause to copycats. This is not easy to analyze, given that clearly time effects from day and month exist. Besides, which shootings get a lot of media attention? Yet we can look at the number of shootings over time and at least add the shootings with most victims in the plot. The number of victims to be marked has arbitrarily been chosen as 18 or more. In this plot I cannot see the connection.

r7$Victims <- r7$Killed+r7$Wounded

table(r7$Victims)

4 5 6 7 8 9 10 12 13 14 18 19 20 21

322 104 25 26 11 5 4 2 2 2 1 1 1 1

p <- ggplot(counts, aes(y=x,x=date))

p + stat_smooth(span=.11) +

geom_point() +

geom_vline(xintercept=as.numeric(r7$date[r7$Victims>15])) +

ylab('shootings per day')

An ACF

Mostly because making an ACF is integral part of analyzing time series. Here it is, in case anybody doubted a week effect. I chose this fairly long lag, because 10 weeks seemed nice to me.

No comments:

Post a Comment

Wiekvoet

Wiekvoet is about R, JAGS, STAN, and any data I have interest in. Topics range from sensometrics, statistics, chemometrics and biostatistics. For comments or suggestions please email me at wiekvoet at xs4all dot nl.