Analysis Methodology

Statistical analyses of legislation and legislators provide context for the legislative process.
Of the more than 10,000 bills pending at any given time, our analyses help GovTrack visitors see which ones are relevant and worth paying attention to.

Ideology

The Ideology Analysis compares the sponsorship and cosponsorship patterns of Members of Congress to put them on a scale roughly from liberal to conservative.

Prognosis

The Prognosis Analysis looks at the factors that help or hurt a bill’s chance of getting out of committee and being enacted. It is based on a regression model.

Leadership

The Leadership Analysis looks at who is cosponsoring whose bills to see who the legislative leaders are. It asks, roughly: if you scratch my back, will I scratch yours? The analysis is based on PageRank, the algorithm Google uses to order search results.

Ideology Analysis of Members of Congress

The ideology analysis assigns a liberal–conservative score to each Member of Congress based on his or her pattern of cosponsorship.

In a nutshell, Members of Congress who cosponsor similar sets of bills will get scores close together, while Members of Congress who cosponsor different sets of bills will have scores far apart. Members of Congress with similar political views will tend to cosponsor the same set of bills, or bills by the same set of authors, and conversely Members of Congress with different political views will tend to cosponsor different bills.

You can find this analysis on the pages for current Members of Congress.

The charts to the right plot the ideology score on the horizontal axis and the leadership score on the vertical axis. Look at the extremes. For instance, Sen. Jim Inhofe appears as the most extreme Republican in the Senate chart and he is widely regarded as one of the most conservative senators.

Overview

The data that goes into this analysis is a list of who sponsored or cosponsored which bills. The process doesn’t look at the content of the bills or the party affiliation or anything else about the Members of Congress, but it is able to infer underlying behavioral patterns, some of which correspond to real-world concepts like left-right ideology.

You’ll see in the charts on the right that the ideology analysis does a good job at separating the Democrats from the Republicans, and within each party the moderates from the extremes. If you wanted to know how your representatives stood in relation to their peers ideologically, this chart is a good place to start.

We first began publishing this analysis in 2004, then calling it a political spectrum. A similar analysis by Professor Keith Poole using voting records rather than cosponsorship produces similar results: see voteview.com. (As far as we know, we were the first to apply this sort of analysis to cosponsorship behavior.)

Methodology

The statistical method behind this analysis is Principal Components Analysis, a form of dimensionality reduction. Principal Components Analysis is a statistical technique that reveals underlying patterns in data.

Here’s how it works: Form a matrix (a grid of numbers) with columns representing Members of Congress and rows also representing Members of Congress. Do this for the House and Senate separately. We include (co)sponsorship from the current and previous two Congresses, so between four and six years of data. For the Senate, you have a 100x100 table. In each cell of the table, put the number of times the senator for the row cosponsored a bill introduced by the senator for the column. Or if it's the same senator in the row and column, put in the number of bills he or she introduced. Then compute the singular value decomposition of the matrix (which is how Principal Components Analysis is often done).
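The matrix construction described above can be sketched in a few lines of Python. This is only an illustration: the bill records, the member names, and the variable names are made up for the example, and the real input format is not described here.

```python
# Hypothetical records: (sponsor, [cosponsors]) for each bill.
bills = [
    ("A", ["B", "C"]),
    ("A", ["B"]),
    ("B", ["A"]),
    ("C", []),
]

# Give every member who appears anywhere a fixed row/column position.
members = sorted({m for sponsor, cosponsors in bills for m in [sponsor, *cosponsors]})
index = {name: i for i, name in enumerate(members)}

# P[row][col] = number of times the row member cosponsored a bill introduced
# by the column member; the diagonal holds the number of bills introduced.
n = len(members)
P = [[0] * n for _ in range(n)]
for sponsor, cosponsors in bills:
    col = index[sponsor]
    P[col][col] += 1
    for cosponsor in cosponsors:
        P[index[cosponsor]][col] += 1

for name, row in zip(members, P):
    print(name, row)
```

For the Senate the same loop would produce the 100x100 table, one row and one column per senator.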

Every matrix has a singular value decomposition, whether or not it is square. The magic is in how you interpret it. The singular value decomposition takes one matrix and gives you back three, called u, s, and v-transpose. V-transpose can be interpreted as a set of scores for each Member of Congress on a new set of dimensions. The dimensions are ranked in order by how much of the original data they explain. We have found that the second dimension best corresponds with ideology. We use the scores from that dimension in our charts.

Each score is a number. It’s entirely arbitrary whether liberal or conservative is positive or negative: the original matrix is blind to information like that. In fact, there’s no guarantee that these numbers have anything to do with being liberal or conservative at all. All the analysis tells us is how to separate Members of Congress into two groups, or more precisely how to spread them out along a spectrum in a way that explains their record of cosponsorship. But in practice it captures ideology very well.

(In the original version of this analysis called the political spectrum, the rows were Members of Congress and the columns were bills. That is, form a matrix with a 1 in each cell where the Member of Congress corresponding to the row sponsored or cosponsored the bill corresponding to the column. The change was made only to reuse the source code with the leadership analysis, which needs a member-member matrix.)

Data

The ideology scores can be found in two CSV files sponsorshipanalysis_h.txt and sponsorshipanalysis_s.txt (House and Senate) over here.

Source Code

Running this analysis is pretty simple in Python. It is literally two lines. Assuming you have the cosponsorship matrix in P:
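A minimal sketch of those two lines, assuming P is a NumPy array laid out as in the methodology section (rows are cosponsors, columns are sponsors). It uses numpy.linalg.svd and takes the second row of v-transpose, following the description above; it is an illustration and not necessarily GovTrack’s exact source. The toy matrix is invented for the example.

```python
import numpy as np

# Hypothetical toy cosponsorship matrix (rows: cosponsor, columns: sponsor).
# Members 0-1 and members 2-3 form two blocs that mostly cosponsor each other.
P = np.array([
    [5.0, 4.0, 0.0, 1.0],
    [4.0, 6.0, 1.0, 0.0],
    [0.0, 1.0, 5.0, 4.0],
    [1.0, 0.0, 4.0, 6.0],
])

# The two lines: decompose, then take the second row of v-transpose.
u, s, vh = np.linalg.svd(P)
ideology = vh[1, :]  # one score per member (column of P); the sign is arbitrary

print(ideology)
```

In this toy case the two blocs land on opposite sides of zero. Which side is which is arbitrary, as the methodology section notes.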

Leadership Analysis of Members of Congress

A leadership score is computed for each Member of Congress by looking at how often other Members of Congress cosponsor their bills — more or less. The analysis is based on PageRank, Google’s algorithm for ranking pages on the web.

The idea behind a leadership score is that if X cosponsors Y’s bills but Y does not cosponsor X’s bills, then X is a follower relative to Y being a leader.

You can find this analysis on the pages for current Members of Congress.

The charts to the right plot the leadership score on the vertical axis and the ideology score on the horizontal axis.

There are some interesting things in this chart. There’s a distinct V-shape. Congressional leaders appear to be more extreme. There are some confounding effects to consider here. Leaders tend to be more senior members of Congress, they tend to be older, and they have had more time to participate in legislating. But somewhere among those factors there’s an interesting correlation to having an extreme political ideology.

These leadership and ideology scores give us a view into Congress that is normally hidden to us. We can’t observe leadership. We’re not there, in Congress, to see it. We’re not in the meetings where you can see relationships form. But those relationships are known to the representatives and senators. It’s obvious to them. They know whether they lead or follow. Their staff know. This is a sort of social knowledge that is locked within the institution of Congress, unless we get a little creative with how we try to observe it.

Overview

The data that goes into this analysis is a list of who sponsored or cosponsored which bills. The process doesn’t look at the content of the bills or anything else about the Members of Congress, but it is able to infer underlying behavioral patterns, some of which correspond to real-world concepts like leadership.

We first began publishing leadership scores in 2010. As far as we know, this analysis is unique to GovTrack.

Methodology

The inspiration for this analysis comes from Google’s PageRank algorithm, which governs how Google ranks the order of pages in its search results. Google’s method is widely known: the more links you get to your website from other websites, and the more links those other websites have, the higher your PageRank and the higher up in search results you appear.

Here’s how we apply it to Congress: the more Members of Congress that cosponsor Member X’s bills, and the more cosponsors those other Members of Congress have, the higher X’s leadership score.

We start by forming a matrix (a grid of numbers) with cosponsorship data. It is the same matrix as in the ideology analysis, so see the methodology section there for details. Then we run the PageRank algorithm on the matrix, which yields a new number for each Member of Congress. That is the leadership score.

This analysis came from a suggestion from Joseph Barillari (who GovTrack’s creator knew in college). (The original formulation of the score for Member of Congress X was the mean across all other Members of Congress Y of the log of the number of bills sponsored by X and cosponsored by Y divided by the number of bills sponsored by Y and cosponsored by X.)

Data

The leadership scores can be found in two CSV files sponsorshipanalysis_h.txt and sponsorshipanalysis_s.txt (House and Senate) over here.

Source Code

Here is pseudo-code in Python. Assuming you have the cosponsorship matrix in P:
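A minimal sketch of PageRank by power iteration, assuming P is laid out as in the ideology analysis (P[i, j] counts how often member i cosponsored member j’s bills). The damping value of 0.85 is the conventional PageRank default and the decision to ignore the diagonal is a simplification; neither is stated in the text above, so treat both as assumptions rather than GovTrack’s exact settings.

```python
import numpy as np

def pagerank(P, damping=0.85, iterations=100):
    """Power-iteration PageRank over a cosponsorship matrix.

    A cosponsorship by member i of member j's bill acts like a link from
    i to j, so members whose bills attract cosponsors accumulate rank.
    """
    A = np.array(P, dtype=float)
    np.fill_diagonal(A, 0.0)  # assumption: ignore a member's own bills
    n = A.shape[0]
    out = A.sum(axis=1, keepdims=True)
    # Row-normalize; members who cosponsor nothing spread weight evenly.
    T = np.where(out > 0, A / np.where(out > 0, out, 1.0), 1.0 / n)
    rank = np.full(n, 1.0 / n)
    for _ in range(iterations):
        rank = (1 - damping) / n + damping * (T.T @ rank)
    return rank

# Hypothetical toy data: members 0 and 2 cosponsor member 1's bills,
# while member 1 cosponsors nobody else's.
P = np.array([
    [3, 2, 0],
    [0, 1, 0],
    [0, 2, 2],
])
scores = pagerank(P)
print(scores)
```

In the toy data member 1 receives the highest leadership score, which matches the idea that a member whose bills are cosponsored without reciprocating is a leader relative to the others.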

Text Incorporation

An analysis we incorporated into GovTrack in 2016 reveals when provisions of bills are incorporated into other bills. Our new tool will reveal much more about what Congress is doing, and what laws are being made, than has ever been known to the general public.

All too often Congress cuts bills apart and pastes them back together — sometimes into an “omnibus.” The bills that finally get a vote are an amalgam of provisions from other bills that either can’t or won’t get a standalone vote themselves. The most important legislation is crafted this way.

Congress and the President may not be enacting many new laws by the numbers, but those new laws come from an intricate web of connections that the general public has not been able to see until now. This isn’t just a matter of discovery. It is a window into how Congress really works, the processes that only insiders are normally able to see.

Our text incorporation analysis finds provisions of bills that are incorporated into enacted legislation. You can trace enacted bills back to the original legislation where provisions were introduced, and you can now see when bills that appear to have died have instead been incorporated into other legislative vehicles.

Only about 3% of bills will be enacted through the signature of the President or a veto override. Another 1% are identical to those bills, so-called “companion bills,” which are identified by the Congressional Research Service. Our new analysis reveals almost another 3% of bills which had substantial parts incorporated into an enacted bill in 2015–2016. To miss that last 3% is to be practically 100% wrong about how many bills are being enacted by Congress.

The following information pertained to our prognosis analysis until October 2016, when we began showing predictions by Skopos Labs. You may find the description of our old analysis below informative, but it is no longer the methodology used on GovTrack.

Bill Prognosis Analysis

GovTrack computes a prognosis for each bill, which is the probability that the bill will be enacted. Our computation is based on factors that are correlated with successful or failed bills in the past, such as whether the sponsor is a committee chair.

What is the point of this?

More than 10,000 bills will be considered by each Congress. About 7% will become law. Which bills should we focus on?

Representatives and senators, their staff, and lobbyists all know what bills are important because they have the institutional knowledge of what makes a bill important. The prognosis highlights the factors that make a bill successful.

The prognosis scores can be found on the pages for bills throughout the site.

Overview

The data that goes into this analysis consists of factors that we compute for bills, such as whether the sponsor is a committee chair (see right for a full list), together with whether each bill was successful. We “train” the model on bills from the 113th Congress (2013-2015) to compute probabilities for bills in the current Congress.

We first began publishing prognosis scores in 2012. As far as we know, we were the first to apply this analysis to Congressional bills.

Methodology

This analysis is based on a logistic regression. Logistic regression is similar to simple linear regression but it is more appropriate when modeling probabilities. We create eight separate models: For each of the four types of legislative measures (bills, joint resolutions, concurrent resolutions, and simple resolutions), we compute one model that predicts whether the bill/resolution will get out of committee and a separate model that computes, for bills/resolutions out of committee, whether the bill/resolution will be enacted or agreed to.

The independent variables are the binary factors mentioned above and listed in the factors table at the right.

The dependent variable is how successful the bill or resolution was. When predicting whether a bill or resolution will make it out of committee, it is a binary variable. When predicting whether a bill will be enacted or a resolution agreed to, this is a continuous variable computed as the percentage of paragraphs in the bill that appear in any enacted bill (and similarly for resolutions). We do this because there are often identical bills in Congress (so-called companion bills) and often bills are incorporated into other bills (such as omnibus bills), and we want to give the original bills credit for being successful even if the original bill itself is not enacted per se.
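The continuous success measure can be sketched as follows. How paragraphs are actually compared is not described here, so this example assumes exact matching after trimming whitespace; the function name and the sample text are hypothetical.

```python
def enacted_fraction(bill_paragraphs, enacted_bills):
    """Share of a bill's paragraphs that appear verbatim in any enacted bill."""
    enacted_text = {p.strip() for bill in enacted_bills for p in bill}
    paragraphs = [p.strip() for p in bill_paragraphs if p.strip()]
    if not paragraphs:
        return 0.0
    matched = sum(1 for p in paragraphs if p in enacted_text)
    return matched / len(paragraphs)

# Hypothetical bill whose second and fourth paragraphs were picked up
# by two different enacted bills.
bill = [
    "Section 1. Short title.",
    "Section 2. Grants.",
    "Section 3. Reports.",
    "Section 4. Funding.",
]
enacted = [
    ["Section 2. Grants.", "Other text."],
    ["Section 4. Funding."],
]
score = enacted_fraction(bill, enacted)
print(score)
```

A bill that was enacted itself, or whose companion was enacted with identical text, would score 1.0 under this measure.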

The output of each logistic regression model is a set of weights assigned to the factors, called β in the table at the right. The prognosis score for a bill is computed by combining the weights of all of the factors that apply to the bill (more or less: the weights are summed and passed through the logistic function; see logistic regression on Wikipedia for details). The result is a number that can be interpreted as a probability.
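In a standard logistic regression with binary factors, the score is the logistic function of an intercept plus the sum of the weights of the factors that apply. A minimal sketch follows. The two weights are taken from the results table for bills sent out of committee, but the intercept is not given anywhere in the text, so the value used here is purely hypothetical and the resulting numbers are illustrative only.

```python
import math

def prognosis(weights, factors_present, intercept=0.0):
    """Logistic function of the intercept plus the weights of applicable factors."""
    log_odds = intercept + sum(weights[f] for f in factors_present)
    return 1.0 / (1.0 + math.exp(-log_odds))

# Weights (β) from the results table for bills sent out of committee.
weights = {
    "Sponsor is a relevant committee chair.": 1.8,
    "Has cosponsors from both parties.": 0.6,
}
HYPOTHETICAL_INTERCEPT = -2.5  # assumption: the real intercept is not published here

baseline = prognosis(weights, [], HYPOTHETICAL_INTERCEPT)
with_chair = prognosis(
    weights, ["Sponsor is a relevant committee chair."], HYPOTHETICAL_INTERCEPT
)
print(baseline, with_chair)
```

Positive weights push the probability up and negative weights push it down, which is why higher β values in the tables correspond to better chances of success.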

In choosing the factors for the model, we select from a large set of plausible factors those which appear to be statistically significant on their own (using a binomial distribution). After the logistic regression, we remove factors that appear statistically non-significant and re-compute the model.
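One way such a screening step could look is a binomial tail test: given the overall success rate, how surprising is the number of successes among bills with a given factor? The text says only that a binomial distribution is used, so the one-sided test and the function name here are assumptions. The example counts are approximate, derived from the results table (27 bills with the factor, about 48% successful, against an overall rate of about 15%).

```python
from math import comb

def binomial_tail(successes, n, base_rate):
    """P(X >= successes) for X ~ Binomial(n, base_rate)."""
    return sum(
        comb(n, k) * base_rate**k * (1 - base_rate) ** (n - k)
        for k in range(successes, n + 1)
    )

# Approximate counts for the factor: Title starts with "A bill to designate the".
n_with_factor = 27
successes = 13          # roughly 48% of 27
overall_rate = 0.15     # about 15% of all bills got out of committee

p_value = binomial_tail(successes, n_with_factor, overall_rate)
print(p_value)
```

A very small tail probability suggests the factor is worth including as a candidate before the regression is run.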

Results

The following tables show how various factors help or hurt a bill or resolution’s chance of making it out of committee and getting enacted (or agreed to). Two tables are given for each of the four bill types.

In the tables, N is the number of bills/resolutions in the training corpus that had the indicated factor; %S is the percentage of those bills/resolutions that were successful (got past committee or were enacted); and β is the regression coefficient (weight) from the prognosis analysis. Higher weights increase the bill or resolution’s probability of success.

Bills sent out of committee to the floor

Overall, about 15% of the 8,905 bills in 2013-2015 were sent out of committee to the floor. The following factors help or hurt that:

N        %S     β       Factor
67       72%    2.5     Title starts with "To designate the facility of the United States Postal".
27       48%    1.9     Title starts with "A bill to designate the".
534      55%    1.8     Sponsor is a relevant committee chair.
286      60%    1.6     Got past committee in a previous Congress.
30       53%    1.6     Referred to Senate Appropriations (incl. companion).
799      46%    1.4     A cosponsor is a relevant committee chair.
70       49%    1.2     Referred to Senate Indian Affairs (incl. companion).
158      22%    1.0     Referred to House Appropriations (incl. companion).
802      34%    0.9     Referred to House Natural Resources (incl. companion).
151      23%    0.9     On a companion bill: A cosponsor is a relevant committee chair.
412      34%    0.7     Referred to Senate Energy and Natural Resources (incl. companion).
999      28%    0.7     A cosponsor is a relevant committee ranking member.
439      23%    0.6     Has a companion bill sponsored by a member of the other party.
2,312    20%    0.6     Has cosponsors from both parties.
725      28%    0.5     Sponsor is in majority party and 1/3rd+ of cosponsors are in minority party.
2,497    26%    0.5     Sponsor is on a relevant committee & in majority party.
1,650    24%    0.3     Cosponsor has high leadership score (majority party).
3,335    20%    -0.2    2 or more cosponsors are on a relevant committee.
345      11%    -0.4    Introduced in the last 90 days of the Congress (incl. companion bills).

Simple resolutions agreed to

Overall, about 96% of the 634 simple resolutions that got past committee in 2013-2015 were agreed to. The following factors help or hurt that:

N        %S     β       Factor
54       82%    -1.9    Sponsor is a relevant committee chair.
96       84%    -2.3    2 or more cosponsors are on a relevant committee.

Joint resolutions sent out of committee to the floor

Overall, about 19% of the 178 joint resolutions in 2013-2015 were sent out of committee to the floor. The following factors help or hurt that:

N        %S     β       Factor
21       57%    4.6     Sponsor is a relevant committee chair.
38       50%    3.8     Sponsor is on a relevant committee & in majority party.
57       2%     -4.2    Introduced in the first 90 days of the Congress (incl. companion bills).
21       0%     -37.9   A cosponsor is a relevant committee ranking member.
56       0%     -40.3   Title starts with "Proposing an amendment to the Constitution of the United".

Concurrent resolutions sent out of committee to the floor

Overall, about 39% of the 169 concurrent resolutions in 2013-2015 were sent out of committee to the floor. The following factors help or hurt that:

N        %S     β       Factor
15       93%    3.1     Got past committee in a previous Congress.
65       18%    -1.7    Sponsor is a member of the minority party.
18       11%    -2.0    Has a companion bill in the other chamber.
17       0%     -35.6   Referred to House Oversight and Government Reform (incl. companion).
25       0%     -36.6   Title starts with "Expressing the sense of Congress that".

Concurrent resolutions agreed to

Overall, about 83% of the 66 concurrent resolutions that got past committee in 2013-2015 were agreed to. The following factors help or hurt that:

There were no statistically significant factors in the model.

Joint resolutions enacted or passed

Overall, about 42% of the 33 joint resolutions that got past committee in 2013-2015 were enacted or passed. The following factors help or hurt that:

There were no statistically significant factors in the model.

Did it work?

The following charts compare the prognoses computed for bills to their actual rate of success. The prognosis model for these charts was trained on the 112th Congress and tested on the 113th Congress.

For each regression model, the bills are divided into 10 bins by prognosis. The median prognosis is plotted on the horizontal axis and the percentage of successful bills in the bin is plotted on the vertical axis.
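The binning behind these charts can be sketched as below. The text does not say whether the 10 bins hold equal numbers of bills or cover equal ranges of prognosis; this example assumes equal-sized bins after sorting by prognosis, and the function name and data are hypothetical.

```python
from statistics import median

def calibration_bins(prognoses, outcomes, bins=10):
    """Return (median prognosis, success rate) for each equal-sized bin."""
    pairs = sorted(zip(prognoses, outcomes))
    points = []
    for b in range(bins):
        chunk = pairs[b * len(pairs) // bins : (b + 1) * len(pairs) // bins]
        if not chunk:
            continue  # fewer bills than bins
        scores = [p for p, _ in chunk]
        hits = [o for _, o in chunk]
        points.append((median(scores), sum(hits) / len(hits)))
    return points

# Hypothetical data: 20 bills where only the higher-prognosis half succeeded.
prognoses = [i / 20 for i in range(20)]
outcomes = [0] * 10 + [1] * 10
points = calibration_bins(prognoses, outcomes)
for x, y in points:
    print(round(x, 3), y)
```

Each pair is one point on a chart: a well-calibrated model produces points close to the diagonal, where the median prognosis equals the observed success rate.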

The prognosis closely estimates the actual chances of a bill getting out of committee. Though the accuracy is much lower for the other predictions, the rough upward slope in most of the charts shows that the prognosis was often predictive of a bill’s future.

[Charts, one per model: Bills sent out of committee to the floor; Simple resolutions sent out of committee to the floor; Bills enacted; Simple resolutions agreed to; Joint resolutions sent out of committee to the floor; Concurrent resolutions sent out of committee to the floor; Concurrent resolutions agreed to; Joint resolutions enacted or passed.]

Here are some additional charts for machine learning researchers.

The charts below show precision vs. recall plotted parametrically for various values of a success-fail threshold t. Bills with a prognosis above t are predicted successes for the purposes of these charts. The prognosis model for these charts was trained on the 112th Congress and tested on the 113th Congress.
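For readers who want to reproduce this kind of chart, here is a minimal sketch of computing precision and recall at a threshold t. The function name and the data are hypothetical, and returning a precision of 1.0 when nothing is predicted to succeed is a common convention rather than something stated above.

```python
def precision_recall(prognoses, outcomes, t):
    """Precision and recall when bills with prognosis above t are predicted successes."""
    predicted = [p > t for p in prognoses]
    tp = sum(1 for pred, o in zip(predicted, outcomes) if pred and o)
    fp = sum(1 for pred, o in zip(predicted, outcomes) if pred and not o)
    fn = sum(1 for pred, o in zip(predicted, outcomes) if not pred and o)
    precision = tp / (tp + fp) if tp + fp else 1.0  # convention for no predictions
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Hypothetical prognoses and actual outcomes for six bills.
prognoses = [0.9, 0.8, 0.6, 0.4, 0.2, 0.1]
outcomes = [True, True, False, True, False, False]

# Sweep the threshold to trace out the curve parametrically.
for t in (0.0, 0.5, 0.7):
    print(t, precision_recall(prognoses, outcomes, t))
```

Raising t generally trades recall for precision, which is the trade-off the charts display.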
