Donor analysis in R – Smith for Congress

In a previous post I introduced the Smith for Congress data set. The data is 49k contributions made by individuals to a congressional campaign for the 2006-2010 electoral cycles. Smith for Congress is not the name of the actual campaign.

Individual contributions are not required to be disclosed by a campaign unless the individual donates more than $200 during a single electoral cycle. The Smith for Congress campaign has, for their own reasons, published every individual contribution. This disclosure allows us an unprecedented look into how a modern campaign raises money. I’ve collected and scrubbed these contributions and published them for research use. In this post I will perform a detailed donor analysis with R to better understand how the Smith for Congress campaign financed its 2010 election. Full code and graphs can be found on the simple-analysis GitHub repository for this post:

Preparation

# Latest Smith for Congress data as of this writing is March 23, 2011.
cd <- read.csv("smithforcongress-03232011.csv")
# Subset the data to just the 2010 cycle.
cd0 <- cd[cd$cycle == 2010, ]
# Clean up the date variable on the subset we actually analyze.
cd0$contribution_date <- as.Date(cd0$contribution_date, format = "%m/%d/%Y")
# Drop amounts < $1. Indexing on the rows to keep avoids emptying the
# data frame when no amount is below $1, which -which() would do.
cd0 <- cd0[which(cd0$amount >= 1), ]

Data for the 2010 electoral cycle consists of 11,721 contributions made by 6,949 individuals, totaling over $770,000. Here is a sample:

personid                   amount  ctd_aggregate  contribution_date  cycle
9zvlnzw1qj9bvq7k1x47v486a      10             20  2009-04-01          2010
iy8xcopedihv9vwqpg3iwmal       15             35  2009-04-01          2010
1f0lct995ckygk6y4vaxk2q44      20             20  2009-04-01          2010
bf2d43vdjdg07pgfmph6ghy7o      20             20  2009-04-01          2010
7sj05z74r8y10fcctvx4a38pn      20             20  2009-04-01          2010

Data Summary

Since the number of individual donors (6,949) is so much lower than the number of contributions (11,717), we can guess that a good portion of those donors gave multiple times. The long-form contribution data is somewhat difficult to work with when looking at multiple contributions from the same person. We’ll generate a summary data frame to help with our analysis. The following variables will be captured per individual donor:

Date of first contribution

The total value of all contributions by this individual

The total number of contributions by this individual

The amounts of the first three contributions, with NA for any contribution the donor never made.

The time between consecutive contributions for the first few contributions, with NA where there is no later contribution to measure against.
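The code that builds this summary is not reproduced in the post, so here is a minimal base R sketch of one way to get a table of this shape. It runs on a few made-up rows rather than the real file, the helper name summarize_donor is hypothetical, and it assumes dt1 to dt3 are the gaps in days between consecutive contributions, which is how the sample rows read.

```r
# Toy long-form data: made-up rows with the same columns as the real file.
cd0 <- data.frame(
  personid = c("a", "b", "b", "c", "c", "c", "c"),
  amount = c(25, 35, 25, 50, 50, 50, 100),
  contribution_date = as.Date(c("2010-10-18", "2010-03-25", "2010-09-07",
                                "2009-12-11", "2010-03-12", "2010-09-13",
                                "2010-11-15")),
  stringsAsFactors = FALSE
)

# Hypothetical helper: collapse one donor's contributions into a single row.
summarize_donor <- function(d) {
  d <- d[order(d$contribution_date), ]
  # Days between consecutive contributions (assumed meaning of dt1..dt3).
  gaps <- as.numeric(diff(d$contribution_date))
  data.frame(
    personid = d$personid[1],
    first.contribution = d$contribution_date[1],
    num.contributions = nrow(d),
    # Indexing past the end of a vector yields NA, which gives the
    # NA entries for donors with too few contributions.
    dt1 = gaps[1], dt2 = gaps[2], dt3 = gaps[3],
    am1 = d$amount[1], am2 = d$amount[2], am3 = d$amount[3],
    total.value = sum(d$amount),
    stringsAsFactors = FALSE
  )
}

# One summary row per donor, stacked into a single data frame.
cd0s <- do.call(rbind, lapply(split(cd0, cd0$personid), summarize_donor))
rownames(cd0s) <- NULL
print(cd0s)
```

On the full data set, a split/lapply over several thousand donors is slower than a grouped aggregation, but it keeps the per-donor logic in one readable function.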

Now the cd0s data frame holds our summary table, which looks like this:

personid                   first.contribution  num.contributions  dt1  dt2  dt3   am1  am2  am3  total.value
1023ryaqqbvz76kh3yq0r2ngq  2010-10-18                          1   NA   NA   NA    25   NA   NA           25
1036lg58hd4skceuyqrr2peb4  2010-03-25                          2  166   NA   NA    35   25   NA           60
106f366ysq6xe9ci731wejh0k  2009-12-11                          4   91  185   63    50   50   50          250
1081wyujzkgninrt1srf79tbo  2009-08-27                          3   58  114   NA    25   30   10           65
1094yhx62fcdx3c012mlpxnex  2009-10-15                          1   NA   NA   NA  1000   NA   NA         1000

Giving Levels

With detailed giving levels we can infer a lot of information about a campaign, and about how the fundraisers are doing their jobs. If most of the giving is in the $15-20 range, we can assume they focus on small donors and maybe online contributions. If most of the giving is in the $100-250 range, then maybe the campaign throws lots of medium-sized dinners. If most of the donations are close to the legal maximum of $4,800, then the campaign is focused on major donors and might be ignoring smaller donors altogether.

Plotting a histogram of total donation amount per individual will give us better insight into the giving levels.
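As a rough sketch of this step, the snippet below summarizes and plots total giving per donor. The totals are made-up toy values, not the campaign data, and plotting on a log10 scale is my own choice here to keep a few large totals from flattening the rest of the histogram.

```r
# Toy per-donor totals: made-up values standing in for cd0s$total.value.
cd0s <- data.frame(
  total.value = c(10, 20, 25, 25, 50, 50, 60, 100, 250, 1000, 4800)
)

# Quartiles, median, mean and maximum of total giving per donor.
giving.summary <- summary(cd0s$total.value)
print(giving.summary)

# Share of donors whose total giving is $100 or less.
share.100 <- mean(cd0s$total.value <= 100)
print(share.100)

# Histogram of totals on a log10 scale (an assumption, see above).
hist(log10(cd0s$total.value), breaks = 10,
     main = "Total donated per individual",
     xlab = "log10(total value in dollars)")
```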

In 2010, 75% of contributors gave $100 or less total to the campaign. The summary table shows us the median total value donated was $50, while the overall average was $111. The maximum was $4800, which is also the maximum allowed by law for 2010. We can infer that while there was certainly some major-donor solicitation, the fundraisers were focused on much smaller donors.

Repeat donors

Now that we know more about giving levels, it would be helpful to better understand giving frequency. The amount of repeat giving may give us insight into how involved the fundraisers are getting, and maybe even how often they are asking for money.
We’ll use a histogram and a cross-tab of the total number of contributions by individuals to help us with this analysis:
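A minimal sketch of the cross-tab and histogram, using a made-up vector of contribution counts in place of the real cd0s$num.contributions column:

```r
# Toy counts of contributions per donor (made-up values).
cd0s <- data.frame(num.contributions = c(1, 1, 1, 1, 1, 1, 2, 2, 3, 4))

# Cross-tab: how many donors gave once, twice, three times, and so on.
freq <- table(cd0s$num.contributions)
print(freq)

# The same table as a percentage of all donors.
pct <- round(100 * prop.table(freq), 1)
print(pct)

# Histogram with one bar per whole number of contributions.
hist(cd0s$num.contributions,
     breaks = seq(0.5, max(cd0s$num.contributions) + 0.5, by = 1),
     main = "Contributions per individual",
     xlab = "Number of contributions")
```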

Our plot and table show that about three fifths (61%, or 4,242) of the contributors to Smith for Congress gave only once, leaving 2,707 people who gave more than once. Most of the people who gave more than once gave twice, but there were still several hundred people who gave 3 or 4 times each.

To understand how important repeat giving might be we need more detailed information. We need to look at the total amount donated by each group of contributors; we’ll also include the cumulative total, cumulative percentage, and individual percentage of total for each group.
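One way to build that table in base R is sketched below. The rows are made-up toy values, and the column names donors, pct.of.total, cum.total and cum.pct are my own labels rather than names taken from the original code.

```r
# Toy per-donor summary (made-up values).
cd0s <- data.frame(
  num.contributions = c(1, 1, 1, 2, 2, 3),
  total.value = c(25, 50, 1000, 60, 100, 65)
)

# Total raised from each group of donors, grouped by number of contributions.
by.freq <- aggregate(total.value ~ num.contributions, data = cd0s, FUN = sum)

# Number of donors in each group (same ascending order as aggregate()).
by.freq$donors <- as.vector(table(cd0s$num.contributions))

# Each group's share of the total, then running totals and running shares.
grand.total <- sum(by.freq$total.value)
by.freq$pct.of.total <- round(100 * by.freq$total.value / grand.total, 1)
by.freq$cum.total <- cumsum(by.freq$total.value)
by.freq$cum.pct <- round(100 * by.freq$cum.total / grand.total, 1)
print(by.freq)
```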

We see the campaign raised $284,000 (36.8% of the total raised) from the 4,242 contributors who gave only once, and $212,000 (27.5% of the total raised) from the 1,599 contributors who gave two times. We also see the campaign raised $487,378 from 2,707 repeat donors; that is almost 64% of the total value raised from individuals for the entire cycle. It is clear the Smith for Congress campaign is good at attracting small-dollar donors, more than one-third of whom gave more than once. This is a pretty impressive repeat donor rate.

Finally I’d like to look at what kind of donations make up each level of giving. We know repeat donors gave $487,000, but we don’t know if that was mostly in $50 donations or in $250 donations. We can use a box and whisker plot to break down each giving level. I’m leaving off contribution levels 8 – 14 since giving was so sparse at those levels. We’ll draw this plot with a log transform on the y axis, since a few very large values would otherwise skew the graph and render it mostly useless. I used a trick from this stack overflow thread to get the formatting correct on the Y axis:
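The sketch below shows one way to draw such a plot in base R. It uses made-up toy values, and the axis formatting is simply a manual axis() call with dollar labels at tick positions I picked, which may not be the same trick the thread describes.

```r
# Toy per-donor summary (made-up values).
cd0s <- data.frame(
  num.contributions = c(1, 1, 1, 2, 2, 2, 3, 3, 9),
  total.value = c(25, 50, 1000, 60, 100, 4800, 65, 150, 900)
)

# Keep giving levels 1 to 7 and leave off the sparse higher levels.
plot.data <- cd0s[cd0s$num.contributions <= 7, ]

# Box and whisker plot of total giving per level, log-scaled y axis,
# with the default y axis suppressed so we can draw our own labels.
boxplot(total.value ~ num.contributions, data = plot.data, log = "y",
        yaxt = "n", xlab = "Number of contributions",
        ylab = "Total donated")

# Plain dollar labels instead of scientific notation (tick positions assumed).
ticks <- c(10, 100, 1000, 5000)
tick.labels <- paste0("$", format(ticks, big.mark = ",", trim = TRUE))
axis(2, at = ticks, labels = tick.labels, las = 1)
```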

This latest plot and table are both incredibly text heavy, but this is the critical intelligence required to start a fundraising plan.

We see the average total contribution increases with the giving frequency, which makes sense. The average increases in an approximately linear fashion, which suggests the individual contribution amounts are staying constant. This may be a function of some campaign fundraising tactic, like “donate $35 now for a free T-shirt.” We can also get a sense of how much success the Smith for Congress major donor program enjoys. An individual can legally donate $2,400 each for a primary and a general election per cycle. We can count how many individuals have maxed out at $4,800 and measure how much impact the major donors have on the total amounts raised:

# how many individuals gave the max for one election
nrow(cd0s[cd0s$total.value == 2400, ])
nrow(cd0s[cd0s$total.value == 4800, ])

We see 7 individuals who gave the maximum for one election, and only 2 individuals who maxed out for the entire cycle. The two donors who maxed out for the cycle make up only 1.2% of total giving, which is very low compared with the average campaign. This tells us major donors aren’t the most important segment to Smith for Congress, but it could also mean that the campaign isn’t able or isn’t willing to ask the max amount from large donors.
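The share of total giving can be computed the same way as the counts. This is a small sketch on made-up totals; the last line redoes the arithmetic with the post's own figures, taking the total raised as $770,000.

```r
# Toy per-donor totals (made-up values).
cd0s <- data.frame(total.value = c(25, 60, 250, 2400, 4800, 4800))

# Donors at the one-election limit and at the full-cycle limit.
n.one.election <- sum(cd0s$total.value == 2400)
n.full.cycle <- sum(cd0s$total.value == 4800)

# Fraction of all money raised that came from full-cycle maxed-out donors.
maxed.total <- sum(cd0s$total.value[cd0s$total.value == 4800])
maxed.share <- maxed.total / sum(cd0s$total.value)
print(c(n.one.election, n.full.cycle, round(100 * maxed.share, 1)))

# The post's figures: 2 donors at $4,800 out of roughly $770,000 raised,
# which works out to about 1.2 percent.
post.share <- 2 * 4800 / 770000
print(round(100 * post.share, 1))
```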

Take Away

We can take away the following facts from our analysis:

About 39% of individual donors (2,707 of 6,949) gave more than once to Smith for Congress

80% of donors gave $100 or less to the campaign

Repeat donors gave $487,000 total to the campaign

Two out of 6,949 donors (about 0.029 percent) gave the maximum amount allowable by law, accounting for 1.2% of the total amount raised

From all this we can infer that Smith for Congress is running a very strong repeat donor program, and isn’t focused on only high-dollar donors. This information could be very useful in a number of different ways. A treasurer for Smith for Congress could use this information to design a 2012 fundraising plan and campaign budget. A candidate similar to Smith, or running in a similar district, could use this same information to plan their own campaign. Or a rival campaign could use this during opposition research and financial planning. Or researchers could use this to build better generic models of US House individual fundraising. I hope this shows that detailed campaign finance analysis is pretty simple when you’ve got access to the relevant data, which unfortunately is very uncommon.