After more than 3/4 of a mil­lion guesses, in over 50,000 games played in 67 coun­tries, the results are clear: Sci­ence sounds like gobbledygook.

arXiv vs. snarXiv has been live for 6 months now, and it’s time to take a look at the results. Here’s how the game works. The user sees two titles: one is the title of an actual the­o­ret­i­cal high energy physics paper on the arXiv, and the other is a com­pletely fake title ran­domly gen­er­ated by the snarXiv. The user guesses which one is real, finds out if they’re right or wrong, and then starts over with a new pair of titles.

I’ve been record­ing the result of each guess, orig­i­nally just out of curios­ity. I never expected to get rea­son­able sta­tis­tics on the over 120,000 high energy the­ory papers on the arXiv. But after more than 750,000 guesses, that’s exactly what I’ve got, which means we can do some fun stuff.

The Most Fake-Sounding Papers

First, let’s take a look at the most fake-sounding papers on the arXiv. These are the papers whose titles get the low­est per­cent­age of cor­rect guesses when users try to dis­tin­guish them from a ran­domly gen­er­ated title. I designed arXiv vs. snarXiv to cycle such papers through the game more often, gen­er­at­ing bet­ter sta­tis­tics for them. Here are the 15 most fake-sounding papers with at least 30 guesses.

A tip for future arXiv vs. snarXiv play­ers: the snarXiv is more gram­mat­i­cal than the arXiv. If you see “Het­erotic on Half-flat” and think “uh… Half-flat what?” then you can be nearly cer­tain it’s a real sci­en­tific paper that was writ­ten to advance the bound­aries of human knowledge.

As a bonus, here are some up-and-comers: papers with between 10 and 30 guesses, a spec­tac­u­larly low per­cent­age of which were correct.

Stump the Experts

Peo­ple with all sorts of back­grounds play arXiv vs. snarXiv. My guess is that non-physicists are extremely sus­pi­cious of ridiculous-sounding words that have been co-opted for tech­ni­cal pur­poses — “‘Mirage Medi­a­tion?’ that can’t be real!” High energy physi­cists, on the other hand, are used to their own unfor­tu­nate ver­biage. Unfor­tu­nately for them, how­ever, there are still plenty of papers on the arXiv that sound like they were writ­ten by a computer.

Let’s define an “expert” game to have at least 5 guesses and a score of 80% or higher. So far, there have been 3,916 expert games out of 49,258 total (as of September 16, 2010). Tallying up all the guesses in these games, we can get a sense for which papers stymie even those who excel in arXiv vs. snarXiv. Here are the top 10.

I have to say, some of these def­i­nitely sound like they came straight from the snarXiv. I really don’t know what Glasma is, though. The snarXiv does not have Glasma.
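For the curious, here is a minimal Python sketch of the “expert game” tally described above. It is an illustration, not the code behind arXiv vs. snarXiv: the `Guess` record and its fields (`game_id`, `paper_id`, `correct`) are hypothetical names, and only the two thresholds (at least 5 guesses, a score of 80% or higher) come from the definition in the text.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Guess:
    """One recorded guess (hypothetical layout, for illustration only)."""
    game_id: int
    paper_id: str
    correct: bool


def expert_game_ids(guesses, min_guesses=5, min_score=0.80):
    """Return the ids of games with enough guesses and a high enough score."""
    totals = defaultdict(lambda: [0, 0])  # game_id -> [n_correct, n_total]
    for g in guesses:
        totals[g.game_id][0] += int(g.correct)
        totals[g.game_id][1] += 1
    return {
        gid
        for gid, (n_correct, n_total) in totals.items()
        if n_total >= min_guesses and n_correct / n_total >= min_score
    }


def paper_stats_in_expert_games(guesses):
    """Tally guesses per paper, counting only guesses made in expert games.

    Returns (paper_id, n_correct, n_total) tuples, with the papers that
    experts get wrong most often listed first.
    """
    experts = expert_game_ids(guesses)
    stats = defaultdict(lambda: [0, 0])  # paper_id -> [n_correct, n_total]
    for g in guesses:
        if g.game_id in experts:
            stats[g.paper_id][0] += int(g.correct)
            stats[g.paper_id][1] += 1
    rows = [(pid, c, n) for pid, (c, n) in stats.items()]
    return sorted(rows, key=lambda row: row[1] / row[2])
```

Taking the first 10 rows of `paper_stats_in_expert_games(...)` would give a “top 10” in the sense used here, although a real ranking would probably also want a minimum number of guesses per paper.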

Famous Physi­cists

Ok, so the papers above sound espe­cially ridicu­lous. How­ever, the aver­age across all 750,000 guesses on all papers is still only 59% cor­rect. While bet­ter than a mon­key, this is not par­tic­u­larly good. Who’s respon­si­ble? Surely not the world’s top minds?

Here’s a ranking of some of the most highly cited physicists on the arXiv (h-index of 40 or higher, with a few other notable folks thrown in), according to the percentage of correct guesses on their papers.[1] A smaller percentage means that their papers sound more like complete flapdoodle. I should note for the sake of my career that this has absolutely nothing to do with the quality of said papers.

I’d espe­cially like to con­grat­u­late Fred­erik on his anom­alously low 49 per­cent. You make us all worse than a mon­key, Fred­erik.[2]

The Blo­gos­phere

Now let’s turn to even more famous peo­ple: physics blog­gers (and authors). Here are some of the most promi­nent, ranked from most fake-sounding papers (small­est per­cent­age) to least fake-sounding papers (largest per­cent­age). I think there’s a les­son here some­where, though it’s hard to be sure in some cases, due to small statistics.

Fake-Sounding and Real-Sounding Words

Suppose you’re writing a scientific paper, and you want to ensure that the general public doesn’t think it’s complete malarkey. How do you do it? Here are the 10 words with the lowest percentage of correct guesses (most fake-sounding) for titles containing those words. To ensure no single paper dominates this percentage, I’m requiring that each word appear in at least 5 titles.

Saturn: 174/521 (33%)
half-flat: 66/196 (33%)
charging: 69/189 (36%)
caustic: 54/147 (36%)
highlights: 80/208 (38%)
multiskyrmions: 76/195 (38%)
secret: 100/252 (39%)
perturbing: 99/249 (39%)
pollution: 78/194 (40%)
enough: 87/214 (40%)

Avoid these words! Turns out people don’t believe in “multiskyrmions.” Also, you shouldn’t mention “Saturn,” or use normal English words like “secret” or “enough.” By contrast, here’s a list of the 10 words with the highest percentage of correct guesses (most realistic-sounding) for titles containing those words.

cp-even: 76/100 (76%)
Argon: 75/101 (74%)
two-particle: 140/191 (73%)
self-coupling: 74/102 (72%)
unusual: 84/117 (71%)
spin-spin: 87/122 (71%)
anomaly-free: 90/127 (70%)
atlas: 127/180 (70%)
supersymmetry-breaking: 70/100 (70%)
naked: 128/183 (69%)

In other words, if you want to be taken seri­ously as a sci­en­tist, you should call your next paper Unusual Naked, but Anomaly-Free.
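The word rankings above can be sketched in a few lines of Python. This is a guess at the recipe rather than the actual analysis code: the input format (one `(title, n_correct, n_total)` tuple per real paper), the function name `word_stats`, and the tokenizer (lowercase, keeping hyphenated words like “half-flat” together) are all assumptions. Only the “at least 5 titles” cut comes from the text.

```python
import re
from collections import defaultdict

# Assumed tokenizer: lowercase alphanumeric runs, with internal hyphens kept.
WORD_RE = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def word_stats(title_results, min_titles=5):
    """Pool guesses over all titles containing each word.

    title_results: iterable of (title, n_correct, n_total) tuples.
    Returns (word, n_correct, n_total) tuples, most fake-sounding first,
    keeping only words that appear in at least `min_titles` titles.
    """
    agg = defaultdict(lambda: [0, 0, 0])  # word -> [correct, total, n_titles]
    for title, n_correct, n_total in title_results:
        # set(): a word repeated within one title counts once for that title
        for word in set(WORD_RE.findall(title.lower())):
            entry = agg[word]
            entry[0] += n_correct
            entry[1] += n_total
            entry[2] += 1
    rows = [
        (word, c, n)
        for word, (c, n, k) in agg.items()
        if k >= min_titles and n > 0
    ]
    return sorted(rows, key=lambda row: row[1] / row[2])
```

The first 10 rows would be the most fake-sounding words, and the last 10 the most realistic-sounding ones.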

Inci­dence of Appar­ent Hooey in Var­i­ous Subfields

Papers on the arXiv can be asso­ci­ated with one or more physics sub­fields. Here’s a rank­ing of sub­fields with at least 50 guesses from most fake-sounding to least fake-sounding.

Adaptation and Self-Organizing Systems: 39/81 (48%)
Popular Physics: 48/98 (48%)
Data Analysis, Statistics and Probability: 172/340 (50%)
History of Physics: 212/414 (51%)
Operator Algebras: 167/307 (54%)
Rings and Algebras: 136/248 (54%)
Disordered Systems and Neural Networks: 101/184 (54%)
Pattern Formation and Solitons: 282/512 (55%)
Classical Analysis and ODEs: 128/227 (56%)
Representation Theory: 295/523 (56%)
Probability: 39/69 (56%)
Geophysics: 67/118 (56%)
Number Theory: 138/242 (57%)
Strongly Correlated Electrons: 757/1327 (57%)
Quantum Algebra: 4724/8190 (57%)
Exactly Solvable and Integrable Systems: 2696/4666 (57%)
Superconductivity: 766/1317 (58%)
Computational Physics: 88/151 (58%)
Algebraic Geometry: 1763/3011 (58%)
Mathematical Physics: 8306/14161 (58%)
Statistical Mechanics: 2072/3511 (59%)
Chaotic Dynamics: 279/472 (59%)
Analysis of PDEs: 58/98 (59%)
Plasma Physics: 99/167 (59%)
Combinatorics: 134/226 (59%)
Other: 638/1075 (59%)
Accelerator Physics: 116/195 (59%)
Soft Condensed Matter: 140/235 (59%)
Differential Geometry: 1898/3172 (59%)
Group Theory: 114/190 (60%)
Fluid Dynamics: 195/324 (60%)
Functional Analysis: 216/358 (60%)
Dynamical Systems: 94/155 (60%)
Algebraic Topology: 96/158 (60%)
Atomic Physics: 500/820 (60%)
Geometric Topology: 180/295 (61%)
Symplectic Geometry: 206/337 (61%)
Symbolic Computation: 41/67 (61%)
Classical Physics: 103/168 (61%)
Complex Variables: 54/88 (61%)
Mesoscopic Systems and Quantum Hall Effect: 758/1229 (61%)
Materials Science: 50/81 (61%)
Category Theory: 50/80 (62%)
K-Theory and Homology: 76/120 (63%)
Instrumentation and Detectors: 59/93 (63%)
Spectral Theory: 55/86 (63%)
Optics: 82/121 (67%)

Per­for­mance by Country

These last few statistics have less to do with the arXiv, and more to do with arXiv vs. snarXiv itself. I have location data for the most recent quarter-million guesses.[3] So let’s look at how performance varies across the globe. Here’s a ranking of correct guesses from countries with at least 2000 total guesses.[4]

Austria: 1400/2256 (62%)
Germany: 10665/17467 (61%)
Israel: 3111/5183 (60%)
Spain: 1658/2825 (58%)
Italy: 4134/7080 (58%)
France: 5804/9960 (58%)
Switzerland: 2355/4071 (57%)
Australia: 3485/6083 (57%)
Sweden: 1690/2958 (57%)
Japan: 2393/4189 (57%)
United States: 89632/158059 (56%)
United Kingdom: 12137/21471 (56%)
Canada: 7233/13053 (55%)
India: 2281/4183 (54%)
Finland: 1652/3037 (54%)
Russian Federation: 2270/4266 (53%)
Netherlands: 3496/6593 (53%)
Argentina: 1061/2001 (53%)

It looks like hav­ing Eng­lish as a first lan­guage is not par­tic­u­larly helpful.
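A ranking like the one above boils down to a filter and a sort. Here is a rough Python sketch, assuming the tallies are already available as a dictionary from group name to `(n_correct, n_total)`. The function name `rank_groups` and that input shape are made up for illustration. One detail is inferred from the numbers rather than stated anywhere: the percentages quoted in this post look truncated rather than rounded (1658/2825 is about 58.7% but is listed as 58%), so the sketch uses integer division to match.

```python
def rank_groups(counts, min_total):
    """Rank groups (countries, schools, ...) by fraction of correct guesses.

    counts: dict mapping name -> (n_correct, n_total).
    Returns (name, n_correct, n_total, percent) tuples, best first,
    dropping groups with fewer than `min_total` guesses.
    """
    rows = [
        # 100 * c // n truncates the percentage (assumed to match the post)
        (name, c, n, 100 * c // n)
        for name, (c, n) in counts.items()
        if n >= min_total
    ]
    # Sort on the exact fraction so ties in the truncated percent are resolved.
    rows.sort(key=lambda row: row[1] / row[2], reverse=True)
    return rows


example = {
    "Austria": (1400, 2256),
    "United States": (89632, 158059),
    "Argentina": (1061, 2001),
    "Tinyland": (10, 12),  # hypothetical entry, dropped by the 2000-guess cut
}
for name, c, n, pct in rank_groups(example, min_total=2000):
    print(f"{name}: {c}/{n} ({pct}%)")
```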

Per­for­mance by School

Finally, uni­ver­si­ties account for about 1/8th of the total num­ber of guesses on arXiv vs. snarXiv. Alto­gether, their per­for­mance is almost exactly aver­age (59%). How­ever, there are vari­a­tions… Here’s a rank­ing of schools with at least 400 total guesses.

University of Colorado at Boulder: 1145/1388 (82%)
University of Regensburg: 553/785 (70%)
University of Washington: 317/481 (65%)
Penn State: 718/1097 (65%)
Berkeley: 1001/1538 (65%)
Princeton University: 549/849 (64%)
MIT: 1277/1981 (64%)
Imperial College London: 444/691 (64%)
Monash University: 349/544 (64%)
University of Illinois at Urbana-Champaign: 376/597 (62%)
Hebrew University of Jerusalem: 287/457 (62%)
The University of Edinburgh: 284/461 (61%)
Boston University: 261/435 (60%)
The University of Chicago: 560/963 (58%)
UC Santa Barbara: 278/481 (57%)
Madison: 264/461 (57%)
University of Cambridge: 542/967 (56%)
Cornell University: 237/426 (55%)
UC Davis: 476/861 (55%)
Columbia University: 471/855 (55%)
California Institute of Technology: 859/1577 (54%)
Harvard University: 393/723 (54%)
Stanford University: 363/671 (54%)
Yale University: 219/423 (51%)
University of Minnesota: 308/599 (51%)
University of Warwick: 281/551 (50%)

Con­grat­u­la­tions to the Uni­ver­sity of Col­orado at Boul­der, which is the clear win­ner here.[5] Also, I just wanted to say: seri­ously Har­vard? Seriously?

Dis­claimer

Finally, before head­ing into the com­ments, let me do a crapload of dis­claim­ing. This is obvi­ously the least sci­en­tific sur­vey of sci­ence ever con­ducted. The rank­ing of a paper as “fake-sounding” or “realistic-sounding” has as much to do with the pecu­liar­i­ties of the snarXiv as with the arXiv itself.[6] Also, although 750,000 guesses is a lot in total — such that I’m fairly cer­tain that the 59% over­all aver­age isn’t going any­where — the sta­tis­tics get dicey when chopped into small bits (see what I did there?). To be sure of any­thing, I guess we’ll just have to wait until the blo­gos­phere writes more papers.
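To put a rough number on “dicey,” one can attach a binomial standard error to any of the fractions in this post. The sketch below is only an illustration under simple assumptions: every guess is treated as an independent trial with the same success probability, and the helper name `fraction_with_error` is made up. For very small samples the normal-style error bar is crude, so read it as a rough guide.

```python
import math


def fraction_with_error(n_correct, n_total):
    """Return (p, se): the fraction correct and its binomial standard error."""
    if n_total <= 0:
        raise ValueError("need at least one guess")
    p = n_correct / n_total
    se = math.sqrt(p * (1.0 - p) / n_total)
    return p, se


# Two fractions quoted in this post, plus a tiny sample for contrast.
for label, n_correct, n_total in [
    ("Harvard University", 393, 723),
    ("Probability", 39, 69),
    ("one short game", 3, 4),
]:
    p, se = fraction_with_error(n_correct, n_total)
    print(f"{label}: {100 * p:.1f}% +/- {100 * se:.1f}%")
```

With only 4 guesses the error bar is over 20 percentage points wide, which is why the rankings above come with minimum-guess cuts.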

[1] I couldn’t find an h-index ranking of physicists that was more recent than this one from 2005. I’m probably missing lots of names. Let me know and I’ll add them.

[2] In addition to being a top-notch physicist, Frederik also happens to be one of the world’s best arXiv vs. snarXiv players.

[3] When I told my software-startup friend about arXiv vs. snarXiv and mentioned that I wasn’t logging IP addresses, he looked at me very seriously and said: “You’re not logging IP addresses? You should always log IP addresses.”

[4] The high-scores leader “Ed” is from Portugal, which would have done very well in the rankings had I included his guesses. Unfortunately, “Ed” was cheating (probably already obvious to everyone, though I can also prove it with certitude), so I’ve removed all his guesses from this analysis.

[5] Or rather, congratulations to the two dudes at the University of Colorado at Boulder who together played over 118 games with a total of 1215 guesses and an average score of 85%.

[6] I already got slammed for this on marginalrevolution.com, and I’ll surely get slammed again.

Excellent study. One thing as an experimentalist: Why don’t you add some statistical error to the individual scores on arxiv vs. snarxiv? While I was playing it, I felt my score was slowly converging on a particular value, and thus I felt it would be nice to see the uncertainty. I am assuming that my skill is indeed measurable. The best estimator would indeed be n_correct/n_total. At the very least we can assume binomial errors on this quantity. So when the game reports 3/4=75%, it could instead say 75±21%.

Also given that you have the “stump the experts” sec­tion, you must be keep­ing data that allows extract­ing more detailed sta­tis­tics of the play­ers? Would you mind pub­lish­ing his­tograms of how many guesses per game are played? Dis­tri­b­u­tion of the results? (ie. do the scores dis­trib­ute like a Gauss­ian? what is the rms? etc.)

Any­way, cool stuff. I am upset that I heard about the whole thing so late…

You might want to check the dates on the sub­mis­sions from Boul­der, I sus­pect you’ll find that they occurred dur­ing TASI 2010. There was a lit­tle bit of a con­test going on to see who could get the longest streak of con­sec­u­tive cor­rect guesses.

The first 10 attempts I answered wrong, but after a while you start to see a pat­tern and improve. Maybe you can only show the score after every N answers, and not show right/wrong to pre­vent learn­ing the pat­tern of fake paper names.

At least for the ones I received, it seemed the real arti­cle titles tended to be more declar­a­tive of exper­i­men­tal results and spe­cific quan­ti­ties than the­o­ret­i­cal dis­cus­sion. I also leaned more towards sim­pler titles with less jar­gon or jargon-y words (not that I’d have any chance of dis­tin­guish­ing what are and aren’t real terms). While the sam­pling isn’t ran­dom accord­ing to the stats page, it makes intu­itive sense to me that the fake sub­mis­sions would strongly tend towards long, jargon-filled titles. Not that physics papers are known for sim­ple, con­cise titles, but I’d also think that the real titles would be at least slightly biased towards sim­plic­ity and concision.

This line of thought makes me won­der if some famil­iar­ity with the sub­ject mat­ter (but below actual exper­tise) might actu­ally hurt someone’s per­for­mance as they might focus more on the titles’ sub­ject mat­ter rather than fac­tors that might be bet­ter indica­tive of if an indi­vid­ual title is real or not.

It’s prob­a­bly just cog­ni­tive bias, but I think I might be able to at least keep up with the actual physi­cists (or at least beat that damn mon­key) as a layper­son over a larger num­ber of trials.

This is hilar­i­ous! After my first 30 or so tries I was a “Physics Grad” but soon dropped off to dumber than a mon­key :-) … I wanted to see if there was a pat­tern and then just picked ran­domly. Great fun! I’m a retired sci­ence librar­ian with BA in mam­malian zool­ogy but loved physics once I was work­ing in a sci/tech library. I am happy to see I still have some tal­ent at pick­ing out snark. Must be thanks to the astro­physi­cists I worked for at the end of my career :-)