Is a company’s Fortune 500 ranking correlated with its contributions to House and Senate election campaigns?

(Adapted from a final project report for a Data Science class at Brown University in Spring 2016. I worked in a group of three students to clean, analyze, and visualize the data. Please note that this page is not mobile-friendly.)

One of the main issues in the 2016 presidential race was campaign contributions: where do they come from, and who receives them? We were interested in the role money plays in elections. By making massive donations during campaign cycles, are companies essentially “buying off” the government for their own ends?

We began by analyzing publicly available datasets (from OpenSecrets.org) to examine financial contributions towards candidates running for the House of Representatives and the Senate. We explored which sectors donated the most to elections, and found the top 15 industries that contributed to certain campaigns.

In the bubble chart above, each node corresponds to an industry that made contributions during the 2014 House and Senate elections. The size of a node corresponds to the total value of all contributions by that industry.

In the next visualization, we displayed the top 15 industries that contributed to campaigns in the 2014 election cycle.

Our initial explorations were rather broad; we started out with the goal of discovering trends between campaign contributions and the outcomes of close elections. After receiving some feedback on our ideas, we decided to narrow our focus.

We wanted to know whether company contributions might be correlated with company performance. In other words, if a company contributes to at least one winning candidate in the House or Senate, is that contribution correlated with an increase in the company's own financial success?

As an indicator of company performance, we decided to use the yearly Fortune 500 rankings. We chose this indicator because 1) it was easily accessible and 2) the Fortune 500 methodology ranks companies by total revenues for their respective fiscal years and takes into account profits after taxes. For our purposes, this seemed to be sufficiently representative of an American company's annual financial success.

Data Collection Methodology

Data Collection

Our project required data from 2 different sources: Fortune 500 rankings from 2012 and 2013, and OpenSecrets election data from 2012. While there was data about Fortune 500 company profits in 2012 and 2013, we couldn't obtain any for 2014 and so we could not factor profit changes between 2013 and 2014 into our analysis. However, we were able to incorporate Fortune 500 rankings from 2014.

OpenSecrets Data

To obtain the election data on campaign contributions, we used the OpenSecrets API to gather the legislators who won in the 2012 election cycle and the top 10 industries and companies (and their breakdowns) that donated to each of them, and wrote that information to a JSON file. That way, we could read from our own file for any analysis and avoid making API calls every time we wanted information (API calls were limited to 200 per day). Pulling the data from the API proved to be more work than we anticipated because of many inconsistencies in the available data. For example, when we tried to match the legislators' IDs to the companies that contributed to them in an election cycle, some of the IDs didn't exist in the candidate contribution API call. We had to manually check for these specific IDs and make sure we didn't request them, which took up the majority of our data collection time.
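The caching approach described above can be sketched roughly as follows. This is a minimal illustration, not our actual getData.py: the function names, the cache layout, and the idea of passing in a fetch callable are all assumptions made for the example. The fetch callable stands in for whatever code performs the real OpenSecrets request, and it is assumed to raise LookupError when a legislator ID is unknown to the API.

```python
import json
import os


def load_cache(path):
    """Return previously saved API results, or an empty dict if none exist."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {}


def fetch_with_cache(ids, fetch, cache_path, max_calls=200):
    """Fetch data for each id at most once, staying under a call budget.

    `fetch` is a hypothetical callable wrapping the real API request; it is
    assumed to raise LookupError for ids the API does not recognize.
    """
    cache = load_cache(cache_path)
    calls = 0
    for cid in ids:
        if cid in cache:
            continue  # already fetched (or known to be missing) on an earlier run
        if calls >= max_calls:
            break  # budget used up; rerun later to pick up the rest
        calls += 1
        try:
            cache[cid] = fetch(cid)
        except LookupError:
            cache[cid] = None  # remember the bad id so it is never requested again
    with open(cache_path, "w") as f:
        json.dump(cache, f, indent=2)
    return cache
```

Running the function again with the same cache file picks up where the previous batch stopped, and IDs that failed once are skipped automatically instead of being tracked by hand.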

Fortune 500 Data

We found the Fortune 500 rankings for 2012, 2013, and 2014 from a website that pulled data from the Forbes website. We also manually pulled in information from the Fortune 500 website. Furthermore, we were able to get information on companies’ profits for the years 2012 and 2013.

Data Cleaning

Once we had our data, we needed to cross-reference the companies in the 2012 Fortune 500 ranking list with the list of companies that donated to legislators in the OpenSecrets data. We then divided the 500 companies into 2 groups: companies that donated in 2012 to at least one legislator and companies that didn’t donate (or at least weren’t on the OpenSecrets list).
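As a rough sketch of the grouping step, assuming company names in both sources have already been normalized to the same form, the split comes down to a set membership check. The function name and the placeholder company names below are made up for illustration.

```python
def split_by_donation(fortune_names, donor_names):
    """Split Fortune 500 names into (donated, not_donated) lists.

    Assumes both inputs use the same normalized spelling for each company.
    """
    donors = set(donor_names)
    donated = [name for name in fortune_names if name in donors]
    not_donated = [name for name in fortune_names if name not in donors]
    return donated, not_donated


# Placeholder names, not real data from the project.
fortune = ["company a", "company b", "company c"]
donors = ["company b", "law firm x"]
donated, not_donated = split_by_donation(fortune, donors)
print(donated)      # ['company b']
print(not_donated)  # ['company a', 'company c']
```

Donors that are not on the Fortune 500 list (such as the placeholder law firm) simply fall out of both groups.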

Our process for getting and cleaning the data was as follows:

getData.py pulls in data about the top 10 industries that contributed to each winning candidate in an election cycle and writes it out as a JSON file.

getNewData.py pulls in data about the top 10 companies that contributed to each winning candidate in the 2012 election cycle and writes it out as a JSON file.

(We had to run our script in batches of legislators because we were limited to 200 calls per API key and because some legislator IDs didn't match any in the candContrib method. Each time this happened, we had to manually rerun the script and keep track of which IDs didn't exist.)

mergeData.py merges the JSON files that getData.py and getNewData.py write out, which was necessary because we ran those scripts in batches.

cleanData.py writes CSV files for our midterm report visualizations from the legislators.json file (which contains the data mapping the top 10 industries to the winning candidates).

cleanCompanies.py writes a CSV file, clean_companies2012.csv, from the comapnies2012Final.json file, which contains the mapping of the top 10 companies that contributed to each winning candidate in 2012.

getMoreRankings.py cleans the additional 501-700 rankings for 2013 and 2014, which we manually copied and pasted from the Fortune 500 website, so that they can be merged into our primary Fortune 500 CSV files.

divideFortune500.py is our main data processing script. It reads in the Fortune 500 data files (fortune_500_2012.csv, fortune_500_2013.csv, and fortune_500_2014.csv) as well as the clean_companies2012.csv file. Then it performs entity resolution on the company names across all the files and statistical analysis of the changes in rankings from 2012 to 2013. It also writes out the CSV files for our final project visualizations: donated_percentages.csv, notDonated_percentages.csv, profitVsDonations.csv, rankingsVsAmount.csv, donatedCompanies.csv, and notDonatedCompanies.csv.

Integration and Entity Resolution

Integrating the company names from the 2 data sources proved to be a significant challenge that we dealt with in our divideFortune500.py file. Many of the company names were not consistent across datasets. Some of the more annoying challenges we faced looked like this:

Walt Disney, Walt Disney Co

WilmerHale LLP, Wilmerhale Llp

The Gap, Gap

Metlife, Metlife Inc

Merck, Merck & Co

McDonald’s, McDonald’s Corp

Macy’s, Macys

Lowe’s, Lowe’s Companies

Humana, Humana Inc.

Guardian Life Ins. Co. of America, Guardian Life Insurance

H.J. Heinz, HJ Heinz Co

Estée Lauder, Estee Lauder

Goldman Sachs, Goldman Sachs Group

Coca-Cola, Coca-Cola Co, Coca-Cola Enterprises (We learned that Coca-Cola and Coca-Cola Enterprises are different companies. Accordingly, we treated them as separate entities.)

To resolve these differences, we performed the following on company names:

Lowercased everything

Removed stop words such as “the”, “co”, “company”, etc. with a list of stop words we came up with.

Removed the following punctuation: apostrophes, commas, ampersands, and periods.

Stripped the whitespace before and after the company names

Turned slashes into spaces.
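The normalization steps above can be sketched as a single function. This is an illustrative reconstruction rather than the code in divideFortune500.py: the stop-word list here is a guess (only “the”, “co”, and “company” are named above; the rest are assumptions), and punctuation is stripped before stop words are removed so that forms like “Inc.” match.

```python
import re

# Assumed stop-word list; only "the", "co", and "company" come from the text.
STOP_WORDS = {"the", "co", "company", "corp", "inc", "group"}


def normalize_name(name):
    """Normalize a company name so spelling variants compare equal."""
    name = name.lower()
    name = name.replace("/", " ")        # slashes become spaces
    name = re.sub(r"['’,&.]", "", name)  # drop apostrophes, commas, ampersands, periods
    tokens = [t for t in name.split() if t not in STOP_WORDS]
    return " ".join(tokens).strip()      # join also collapses repeated whitespace


print(normalize_name("Merck & Co"))   # merck
print(normalize_name("H.J. Heinz"))   # hj heinz
print(normalize_name("HJ Heinz Co"))  # hj heinz
print(normalize_name("Macy’s") == normalize_name("Macys"))  # True
```

Note that this sketch leaves “Coca-Cola Co” and “Coca-Cola Enterprises” distinct, which matches how we treated them, and that it does not handle accented characters, so a pair like Estée Lauder and Estee Lauder would still need separate handling.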

There were also more subtle issues that we had to resolve manually when we found companies that were suspiciously dropping off the list from 2012 to 2013. Some companies changed their names between 2012 and 2013. For example, Limited Brands became L Brands. In addition, we removed companies that were acquired or merged, as we had no way of keeping track of their performance under their new names. One company (Catalyst Health Solutions) became private and was not included on the Fortune list the next year as a result, so we also excluded that data point. Finally, one company (Aon) dropped off the list because it relocated to London, and the Fortune 500 includes only companies based in the US. That data point was removed as well.

Results and Analysis

Once we had our data integrated and resolved, we analyzed it to see whether companies’ contributions were correlated with their financial growth.

To do this, we ran a t-test comparing the average movement in rankings from 2012 to 2013 of Fortune 500 companies that donated to winning candidates in the 2012 election (Group 1) with the average movement in rankings of companies that didn’t donate (Group 2).

We conducted a 2-sided t-test for the difference of means between Group 1 and Group 2.

Difference between means:

M1 - M2 = 0.4755 - (-1.1922) = 1.66766

sd = 66.328; se = 3.0217

95% CI of difference:

-4.2547 < 1.66766 < 7.59

t-statistic: 0.552

df: 481.3; cumulative probability P(T ≤ t) = 0.7094

(two-sided p: 0.5812)
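As a sanity check on the figures above, the t-statistic, confidence interval, and two-sided p-value can be recomputed from the reported summary values alone. The sketch below uses only the Python standard library and integrates the t density numerically with Simpson's rule. The helper names are made up for this example, and the use of 1.96 as the critical value for the interval is an assumption that happens to agree with the reported bounds.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def two_sided_p(t_stat, df, steps=2000):
    """Two-sided p-value, integrating the density from 0 to |t| (Simpson's rule)."""
    upper = abs(t_stat)
    if upper == 0:
        return 1.0
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_mass = total * h / 3  # P(0 <= T <= |t|)
    return 1 - 2 * central_mass


# Summary values reported above.
mean_diff = 0.4755 - (-1.1922)
se = 3.0217
df = 481.3

t_stat = mean_diff / se
p_value = two_sided_p(t_stat, df)
ci_low = mean_diff - 1.96 * se   # assumed normal critical value
ci_high = mean_diff + 1.96 * se

print(f"t = {t_stat:.3f}, two-sided p = {p_value:.4f}")
print(f"95% CI: ({ci_low:.4f}, {ci_high:.4f})")
```

The recomputed values should land very close to the reported ones, with small differences in the last digits attributable to rounding in the inputs.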

With a two-sided p-value of 0.5812, we found no significant difference, at an alpha level of 0.05, in the change in rankings from 2012 to 2013 between companies that contributed to the 2012 election cycle and companies that did not. In other words, we found no evidence that short-term company performance is related to whether a company contributed to winning candidates in an election. This was bad news for our hypothesis, but we feel that it’s good news overall. Perhaps we can still hold some hope for democracy, despite the fact that many companies hold an extraordinarily large amount of influence in Congress through lobbyists and donations. However, our project only takes into account money that can be tracked publicly. There is still the matter of "dark money," which refers to money given to nonprofits or PACs where the donors and amounts cannot be tracked.

Another reason we may not have found a statistically significant difference is our relatively small sample size. The standard deviation of our data was extremely high (about 66 ranking positions), so with roughly 500 data points the standard error was large compared to the difference we observed. With far more data, say 500,000 data points instead of 500, a difference of this size might be detectable.

We visualized the companies’ rankings from 2012 to 2014. For the purposes of the visualization, if a company drops off the list, we show it as dropping to 501. We thought that it was cool to see that a lot more companies from the top 50 donated money than companies in the bottom 100 of the list. There are also a lot more companies that dropped off the Fortune 500 list from the bottom half of the list than the top half.

Below, we break down by ranking bucket the number of companies that donated and the number that didn’t donate. Some of the rows don’t add up to their bucket size because they had a faulty data point, as explained in the Integration and Entity Resolution section above.

Ranking | # of companies that donated | # of companies that didn’t donate
1-10 | 8 | 1
11-20 | 7 | 3
21-30 | 8 | 2
31-40 | 8 | 2
41-50 | 8 | 2
51-100 | 29 | 21
101-500 | 140 | 261
Total | 204 | 281

Analyzing the data further, we discovered that of the companies that donated (Group 1), 44% of them rose in rankings, 4% stayed the same, and 51% dropped in rankings. Of the companies that didn’t donate (Group 2), 47% rose in rankings, 3% stayed the same, and 49% dropped. This was even more surprising to us as we thought that a greater percentage of companies that donated would rise in rankings than companies that didn’t donate.
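The rose/same/dropped breakdown can be computed along these lines. This is a simplified sketch with a made-up function name, not the code from divideFortune500.py. A company “rises” when its rank number gets smaller, so the change is taken as the old rank minus the new rank. The first two pairs in the example are the Energy Transfer Equity and Calpine movements mentioned in this report; the other two are invented to round out the example.

```python
def movement_shares(rank_pairs):
    """Return the percentage of companies that rose, stayed, or dropped.

    `rank_pairs` is a list of (old_rank, new_rank) tuples.
    """
    if not rank_pairs:
        raise ValueError("rank_pairs must not be empty")
    counts = {"rose": 0, "same": 0, "dropped": 0}
    for old_rank, new_rank in rank_pairs:
        change = old_rank - new_rank  # positive means the company moved up
        if change > 0:
            counts["rose"] += 1
        elif change < 0:
            counts["dropped"] += 1
        else:
            counts["same"] += 1
    total = len(rank_pairs)
    return {key: round(100 * value / total) for key, value in counts.items()}


pairs = [(312, 161), (364, 459), (10, 10), (5, 4)]  # last two pairs are invented
print(movement_shares(pairs))  # {'rose': 50, 'same': 25, 'dropped': 25}
```

Because each share is rounded independently, the three percentages for a group can add up to 99 or 101 rather than exactly 100.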

We found the companies that dropped and rose the most in rankings to be really interesting. In Group 1, Calpine (a natural gas and geothermal energy company) donated to 3 legislators and dropped 95 spots. The biggest winner out of Group 1 was also an energy company: Energy Transfer Equity (natural gas company in Texas), which went from 312 to 161 and donated to a single legislator in Texas.

In Group 2, the Great Atlantic & Pacific Tea Company dropped the most, as it went bankrupt. Rock-Tenn, a packaging manufacturer, rose the most, from 449 to 291.

 | Group 1 | Group 2
Min | -95 spots, Calpine, 364 -> 459 | -383 spots, Great Atlantic Pacific Tea, 317 -> bankrupt
Max | +151 spots, Energy Transfer Equity, 312 -> 161 | +158 spots, Rock-Tenn, 449 -> 291

In addition, we compared companies’ profit changes from 2012 to 2013 with the amount they contributed.

This was quite interesting: companies that donated lower amounts tended to lose more money, probably because companies that donated more were doing better overall. AIG and HP were 2 of the biggest losers, and both donated less than $25,000.

We were also curious whether a company's ranking was correlated with how much it donated.

As you can see in the visualization above, almost all of the companies that donated $400,000 or more were top 100 companies, which makes sense as they probably had more resources to spend. Northrop Grumman, Comcast, and Goldman Sachs were the top 3 donors overall.

Overview of Process, Moving Forward

Below is a summary of our process.

Get all Fortune 500 data for 3 years (2012 - 2014)

Get top 10 campaign contributors (companies) for each candidate for 2 election cycles (2012, 2014)

Divide Fortune 500 into two groups: (1) companies that donated to winning candidates and (2) companies that didn’t donate to winning candidates

Do a T-test on the average movement in rankings of group 1 versus average movement in rankings in group 2

Keep track of the amount donated and the number of candidates donated to, since these factors should impact our analysis

Further exploration might include analysis by country: do the results change in different countries, where corruption might be higher? We could also examine how donations influence politicians' stances on issues. For example, if a politician receives a contribution from a biased party, is that correlated with changes in the politician's stance on certain issues?