These Are The Phrases That Sanders And Clinton Repeat Most

Ahead of the most recent GOP presidential debate, I created a drinking game (OK, linguistic analysis) using a metric called tf-idf that identified each candidate’s most oft-repeated lines. I hope I didn’t cause any hospitalizations — John Kasich delivered a whopping 29 drink lines at the March 10 debate in Miami, many directed at his constituents in the great state of Ohio. The barrage may have helped his campaign efforts in his home state — strategists, take note — and now Kasich is trying to expand his appeal beyond states where he is a popular two-term governor.

And while it’s hard to match the theatricality of Donald Trump firing contestants on “America’s Next Top Republican,” the sober policy discussions on the Democratic side seem hardly less scripted. We have a bit less data to work with, since the Democratic National Committee green-lit fewer episodes than the Republican National Committee, but the eight Democratic debates to date give us a fairly good sense of what’s scrawled on these candidates’ whiteboards and index cards.

The differences here are stark, and they line up well with the dominant narrative of this race. Clinton is the candidate of action, stressing strong verbs and first-person pronouns; Sanders is the protest candidate, listing problems that he thinks need more attention. Read from top to bottom, the former secretary of state’s talking points resemble a cover letter put through a blender; the Vermont senator’s list reads like a socialist stream of consciousness of American problems with Scandinavian solutions.

For all the talk of Clinton’s stilted performances and Sanders’s from-the-gut delivery, the latter is the more repetitive candidate by far. I expected to see “millionaires and billionaires” in Sanders’s top 20 but had to scroll through dozens of higher-scoring phrases like “corporate America” and “Goldman Sachs” before finding it at No. 72.

As I mentioned in my previous article, tf-idf is a relative measure, so a score of 25 means only “higher than 20.” Longer and more repeated phrases score higher, and phrases that have also been used by other candidates score lower.
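For readers who want to see the moving parts, here is a rough Python sketch of a phrase-level tf-idf score with the three properties described above. It is not the exact formula behind the numbers in this article, which isn't spelled out here. The length multiplier, the smoothed `log(N / df) + 1` weighting, the four-word phrase limit and the names `ngrams` and `phrase_scores` are all assumptions made for illustration.

```python
import math
from collections import Counter


def ngrams(words, max_len=4):
    """Yield every phrase of 1..max_len consecutive words."""
    for n in range(1, max_len + 1):
        for i in range(len(words) - n + 1):
            yield " ".join(words[i:i + n])


def phrase_scores(transcripts, speaker, max_len=4):
    """Score each of `speaker`'s phrases; higher means more distinctive.

    `transcripts` maps a candidate's name to everything they said.
    The weighting below is an illustrative assumption, not the
    article's actual formula.
    """
    # Count how often each candidate used each phrase (term frequency).
    counts = {
        name: Counter(ngrams(text.lower().split(), max_len))
        for name, text in transcripts.items()
    }
    n_speakers = len(counts)

    scores = {}
    for phrase, tf in counts[speaker].items():
        # Document frequency: how many candidates used this phrase at all.
        df = sum(1 for c in counts.values() if phrase in c)
        # Phrases shared with other candidates get a smaller weight.
        idf = math.log(n_speakers / df) + 1.0
        # Assumed length bonus: longer phrases score higher.
        length = len(phrase.split())
        scores[phrase] = tf * length * idf
    return scores
```

Calling `phrase_scores(transcripts, "Sanders")` on a dictionary of debate transcripts would return a score for every phrase he used. As in the article, only the ordering of those scores is meaningful, not the raw values.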

Of all the phrases by the Democratic candidates that scored at least 10,[1] Sanders is responsible for 67.2 percent.

Which is to say that if Sanders had a delegate for every time he said “income and wealth” in a debate, he’d still be losing by a nearly insurmountable margin.

Footnotes

I used 20 as the cutoff for the Republicans; the discrepancy is a result of the smaller corpus size here. In both cases, I used a cutoff that was about as low as you can go without including long phrases that were said only once — yes, unrepeated phrases get scored too.

Milo Beckman is a freelance writer for FiveThirtyEight. His work can be found at milobeckman.com. He also constructs crossword puzzles for The New York Times. @milobela