On “Geek” Versus “Nerd”

To many people, “geek” and “nerd” are synonyms, but in fact they are a little different. Consider the phrase “sports geek” — an occasional substitute for “jock” and perhaps the arch-rival of a “nerd” in high-school folklore. If “geek” and “nerd” are synonyms, then “sports geek” might be an oxymoron. (Furthermore, “sports nerd” either doesn’t compute or means something else.)

In my mind, “geek” and “nerd” are related, but capture different dimensions of an intense dedication to a subject:

geek – An enthusiast of a particular topic or field. Geeks are “collection” oriented, gathering facts and mementos related to their subject of interest. They are obsessed with the newest, coolest, trendiest things that their subject has to offer.

nerd – A studious intellectual, although again of a particular topic or field. Nerds are “achievement” oriented, and focus their efforts on acquiring knowledge and skill over trivia and memorabilia.

Both are dedicated to their subjects, and sometimes socially awkward. The distinction is that geeks are fans of their subjects, and nerds are practitioners of them. A computer geek might read Wired and tap the Silicon Valley rumor-mill for leads on the next hot-new-thing, while a computer nerd might read CLRS and keep an eye out for clever new ways of applying Dijkstra’s algorithm. Note that, while not synonyms, they are not necessarily distinct either: many geeks are also nerds (and vice versa).

An Experiment

Do I have any evidence for this contrast? (By the way, this viewpoint dates back to a grad-school conversation with fellow geek/nerd Bryan Barnes, now a physicist at NIST.) The Wiktionary entries for “geek” and “nerd” lend some credence to my position, but I’d like something a bit more empirical…

“You shall know a word by the company it keeps” ~ J.R. Firth (1957)

To characterize the similarities and differences between “geek” and “nerd,” maybe we can find the other words that tend to keep them company, and see if these linguistic companions support my point of view?

Data and Method

(Note: If you’re neither a geek nor a nerd, don’t be scared by the math. It’s not too bad… or you can probably just skip to the “Results” subsection below…)

I analyzed two sources of Twitter data, since it’s readily available and pretty geeky/nerdy to boot. This includes a background corpus of 2.6 million tweets via the streaming API from between December 6, 2012, and January 3, 2013. I also sampled tweets via the search API matching the query terms “geek” and “nerd” during the same time period (38.8k and 30.6k total, respectively). Yes, yes, yes… I collected all the data six months ago but just now got around to crunching the numbers. It’s been a busy year!

A great little statistic for measuring how much company two words tend to keep is pointwise mutual information (PMI). It’s commonly used in the information retrieval literature to measure the co-occurrence of words and phrases in text, and it also turns out to be a good predictor of how humans evaluate semantic word similarity (Recchia & Jones, 2009) and topic model quality (Newman et al., 2010).

For two words w and v, the PMI is given by:

$$\mathrm{PMI}(w, v) = \log \frac{p(w, v)}{p(w)\,p(v)} = \log p(w \mid v) - \log p(w),$$

where in this case $p$ is the probability of the word(s) in question appearing in a random tweet, as estimated from the data. For instance, if we let v = “geek,” we compute the log-probability of a word w in the “geek” search corpus, and subtract the log-probability of w in the background corpus.
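To make the calculation concrete, here is a minimal sketch in Python. It is not the original analysis code: the function names, the whitespace tokenization, the use of natural logarithms, and the toy tweets are all assumptions made for illustration.

```python
import math
from collections import Counter


def tweet_probs(tweets):
    """Estimate p(word) as the fraction of tweets that contain the word."""
    counts = Counter()
    for tweet in tweets:
        # Count each word at most once per tweet (set of lowercase tokens).
        counts.update(set(tweet.lower().split()))
    return {word: n / len(tweets) for word, n in counts.items()}


def pmi(word, search_tweets, background_tweets):
    """PMI(w, v) = log p(w | v) - log p(w), using natural logs (an assumption).

    search_tweets stands in for tweets matching the query term v;
    background_tweets stands in for the random background sample.
    Returns None when the word is missing from either corpus.
    """
    p_w_given_v = tweet_probs(search_tweets).get(word)
    p_w = tweet_probs(background_tweets).get(word)
    if p_w_given_v is None or p_w is None:
        return None
    return math.log(p_w_given_v) - math.log(p_w)


# Toy data, invented purely for illustration.
geek_tweets = ["geek collection of toys", "proud comic geek",
               "geek toys rule", "new geek gadget"]
background = ["toys for sale", "eat your vegetables", "walking the dog",
              "rainy day", "new phone", "love this song", "so tired",
              "monday again"]

toys_pmi = pmi("toys", geek_tweets, background)
```

In this toy example “toys” appears in half of the “geek” tweets but only one in eight background tweets, so its PMI with “geek” is log(0.5) − log(0.125) = log 4, a positive score.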

Results

The PMI statistic measures a kind of correlation: a positive PMI score for two words means they “keep great company,” a negative score means they tend to keep their distance, and a score close to zero means they bump into each other more or less at random.

With that in mind, here is a scatterplot of various words according to their PMI scores for both “geek” and “nerd” on different axes (ignoring words with negative PMI, and treating #hashtags as distinct):

Many people have asked for a high-res PDF of this plot, so here you go.

Moving up the vertical axis, words become more geeky (“#music” → “#gadget” → “#cosplay”), and moving left to right they become more nerdy (“education” → “grammar” → “neuroscience”). Words along the diagonal are similarly geeky and nerdy, including social (“#awkward”, “weirdo”), mainstream tech (“#computers”, “#microsoft”), and sci-fi/fantasy terms (“doctorwho,” “#thehobbit”). Words in the lower-left (“chores,” “vegetables,” “boobies”) aren’t really associated with either, while those in the upper-right (“#avengers”, “#gamer”, “#glasses”) are strongly tied to both. Orange words are more geeky than nerdy, and blue words are the opposite. Some observations:

Collections are geeky. All derivatives of the word “collect” (“collection,” “collectables”, etc.) are orange. As are “boxset” and “#original,” which imply a taste for completeness and authenticity.

The science & technology words differ. General terms (“#computers,” “#bigdata”) are on the diagonal — similarly geeky and nerdy. As you splay up toward more geeky, though, you see products, startups, brands, and more cultish technologies (“#apple”, “#linux”). As you splay down toward more nerdy you see more methodologies (“calculus”).

#Hashtags are geeky. OK, sure, hashtags are all over the place. But they do tend toward the upper-left. And since hashtags are “#trendy,” I take it to mean that geeks are into trends. (I take this one back. The average PMI score for all hashtags is 0.74 with “geek” but 0.73 with “nerd.” The difference isn’t statistically significant using a paired t-test or Wilcoxon test, or practically significant using a common-sense test.)

Hobbies: compare the more geeky pastimes (“#toys,” “#manga”) with the more nerdy ones (“chess,” “sudoku”).

Brains: the word “intelligence” may be geeky, but “education,” “intellectual,” and “#smartypants” are nerdy.

Reading: “#books” are nerdy, but “ebooks” and “ibooks” are geeky.

Pop culture vs. high culture: “#shiny” and “#trendy” are super-geeky, but (curiously) “cellist” is the nerdiest…

The list goes on. If you want to poke around yourself, download the raw PMI scores (4.2mb) and let me know in the comments what you find. Since many people have asked: I computed PMI for all words appearing in the search tweets with “geek” and “nerd” (millions) and then manually scanned roughly 7,500 words with positive PMI scores for both. The scatterplot contains about 300 words that I hand-picked because they made sense.

(Update: I learned that Olivia Culpo — a self-described “cellist nerd” — was crowned Miss Universe on December 20, 2012. The event was heavily tweeted smack in the middle of my data collection, so that probably explains the correlation between “cellist” and “nerd” here. It also underscores the limitations of time-sensitive data.)

Conclusion

In broad strokes, it seems to me that geeky words are more about stuff (e.g., “#stuff”), while nerdy words are more about ideas (e.g., “hypothesis”). Geeks are fans, and fans collect stuff; nerds are practitioners, and practitioners play with ideas. Of course, geeks can collect ideas and nerds play with stuff, too. Plus, they aren’t two distinct personalities as much as different aspects of personality. Generally, the data seem to affirm my thinking.

I wonder how similar the results would be if you applied this method to the Google Books Ngrams corpus, or something more general instead of a niche medium like Twitter. I also wonder what other questions might be answered with this kind of analysis (for example, my wife and I have a perennial disagreement over which word is wetter: “moist” or “damp”).

Finally, when I mentioned to a friend that I was going to write up this post, she said “Well, I guess we know which one you are.” But do we really? I may be a science nerd, but I’m probably a music geek…

Update (June 25, 2013): Woah. This has gotten more attention than I ever anticipated. A few impressions. (1) Prior to writing this, I had no idea there was a “geek vs. nerd” holy war in certain corners of the Internet; fueling these flamewars was certainly not my intent. Lighten up! (2) I fear I’ll be better known for this diversion than for any of my “real” research. To be clear: this was a fun way to kill a few hours on a Saturday afternoon, not necessarily my best science. I think the writeup here is sound and self-evident, but I’m the first to acknowledge that there are better corpora, methods, and analysis techniques — which could use a grant, grad student, and/or more than an afternoon — for uncovering this all-important “Truth.” (3) For those interested in the etymologies of “geek” and “nerd,” I found this cool writeup.

True, this is not generalizable to the entire population since there is a selection bias going on. But on another level, people who aren’t geeks/nerds probably don’t care about the difference to begin with!

That would not be as useful as the scatter plot is. It shows trends upward into geeky and forward into nerdy. But it’s impossible in this case to definitively say one word is geek only and another is nerd only, etc. Even the demarcation that appears in this chart due to the color choices is not entirely accurate. It would be better served with a gradient from top left to lower right, where in the middle along the slope the words are most likely to be associated with either geek or nerd.

The real problem starts with your definitions of geek and nerd. They both use inaccurate and overlapping words which don’t uniquely define one or the other. Perhaps you should try to redefine geek and nerd with precision and not make dubious statistical associations without a one to one correspondence. I have redefined about 7000 words in the English language using logical definitions with clear one to one correspondences. I have eliminated synonyms, antonyms, and multiple meanings and the words do not need ballpark statistical associations to give them meaning. The book is called SCIENTIFIC THESAURUS and it currently has no definition of nerd or geek. If you come up with a logical accurate definition I will add it to my list of logical words and give you credit for it. Best wishes. Uldis

Because I absolutely loved this blog post, I did a bit more research on the differences between the two words & used it as the basis for a video version of the discussion on BBC’s headsqueeze over on youtube. Thought you may like to see it:

To see how the PMI relates to relative probabilities, suppose we choose the word “genetics,” which has PMI(nerd)=5.4 and PMI(geek)=4 (I am reading off the graph here, to the left of the word). Then the relative probability that a tweet containing “genetics” is a nerd tweet rather than a geek tweet is P(nerd)/P(geek)=exp(PMI(nerd)-PMI(geek)), where exp() is the exponential function (which all nerds will know and love). So for “genetics,” P(nerd)/P(geek)=exp(1.4)≈4, meaning it is about 4 times more likely to come from a nerd tweet than from a geek tweet (assuming there are equal numbers of nerd and geek tweets). If you want to do these sums and don’t have a calculator handy, then type “exp(1.4)” into Google and it will give you the answer. Nerds of course know this already.

So basically, for every PMI interval of 1 you move up or across, you are increasing the relative probability, on that axis, by a factor of exp(1)≈2.7.

If you understand all that then you are probably a nerd – if you are a geek and don’t understand that I suggest you print it out, frame it and add it to your collection.
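For anyone who would rather check the arithmetic in the comment above with code than with a search box, here is a small sketch. It assumes the PMI scores use natural logarithms, which the comment takes for granted but the post does not state; the function name is made up for illustration. Because both scores subtract the same background term log p(w), the difference leaves the ratio p(w | nerd) / p(w | geek), which matches the ratio in the comment only if nerd and geek tweets are equally common.

```python
import math


def relative_likelihood(pmi_a, pmi_b):
    """Ratio p(w | a) / p(w | b) implied by two natural-log PMI scores for the same word w."""
    return math.exp(pmi_a - pmi_b)


# Values the commenter read off the plot for "genetics".
ratio = relative_likelihood(5.4, 4.0)
print(round(ratio, 2))  # exp(1.4), roughly 4.06
```

Moving one PMI unit along either axis multiplies the corresponding likelihood by `math.e`, about 2.72.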

In my corner of the world (Santa Cruz, CA), a geek is a technophile with an extreme interest in scifi and/or fantasy and science who is fairly cool and hip. A nerd may have similar interests but isn’t cool or hip. A dork is someone who is not into these interests and is also not cool or hip.

I’m trying to recreate your experiment for two other words (apple and windows). I have the background corpus, so I have the raw information, and I’m trying to figure out how you got to the PMI…
I have a JSON file of about 500,000 tweets. Now how do I do the search for my word (apple) to get a PMI for that word and another word?
Did you use a regex script?

OK, so I just read through all the comments, you used your own software to compute the PMI. Is there any way you’d consider sending me your tool so I can compute the PMI for other words?
Of course, I’ll give you credit.

I’m not quite sure where the Python scripts are that I used to do this, after some hard drive reorganization. But I’ll have a look and throw them up on Github if I find them.

At any rate, I computed the log p(w|v) and log p(w) terms separately by processing the “search” and background corpora (respectively) with different scripts. These converted JSON files from the Twitter API to a tab-delimited “word & log-prob” format, one word per line. Then I ran another script to glue these together into the PMI values in the spreadsheet linked to above.

A simple script in Python using defaultdict objects as counters should work fine.
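Something along these lines might do it. To be clear, this is a reconstruction of the general idea rather than the lost scripts: the function names are hypothetical, it assumes one JSON object per line with a "text" field, and it tokenizes naively on whitespace.

```python
import json
import math
from collections import defaultdict


def word_logprobs(json_lines):
    """Map each word to log(fraction of tweets containing it)."""
    counts = defaultdict(int)
    total = 0
    for line in json_lines:
        tweet = json.loads(line)
        total += 1
        # Count a word once per tweet, however often it repeats.
        for word in set(tweet["text"].lower().split()):
            counts[word] += 1
    return {word: math.log(n / total) for word, n in counts.items()}


def to_tsv(logprobs):
    """Render the tab-delimited "word & log-prob" format, one word per line."""
    return "\n".join(f"{word}\t{lp:.4f}" for word, lp in sorted(logprobs.items()))


def glue(search_lp, background_lp):
    """Subtract background log-probs from search log-probs for shared words."""
    return {w: search_lp[w] - background_lp[w]
            for w in search_lp if w in background_lp}


# Invented records; real Twitter API payloads carry many more fields.
search = ['{"text": "apple geek"}', '{"text": "geek toys"}']
background = ['{"text": "apple pie"}', '{"text": "toys"}',
              '{"text": "rain"}', '{"text": "sun"}']

scores = glue(word_logprobs(search), word_logprobs(background))
```

Words that never show up in the background sample (here, “geek” itself) are simply dropped in the glue step, since their background log-probability is undefined; a real script might smooth the counts instead.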

as a word nerd, I’ve always felt that “geek” referred to a “nerd” whose subject(s) is(are) either pop culture or new technology related. in essence: all geeks are nerds, but not all nerds are geeks. this clears up the tendency to use “tech” and “geek” interchangeably. it also clears up the fact that I am not a practitioner in a scientific field, but consider myself to be “nerdy”, not “geeky”, about several scientific areas of study. I can’t claim to be “geeky” about anything. it just feels wrong–even when applied to contemporary poetry or politics (both of which I follow, am a practitioner of, and have collections related to).

I believe that while geek and nerd have negative connotations, neither of them is a bad thing. Most of the time they are just used as insults directed at someone who tends to do well in certain fields. I do not believe it is important to know the difference between these two words, because both of them should be seen as compliments saying that you excel in that field.

In re your follow-up indicating stoking the Geeks vs. Nerds flame wars, I’ve never understood this whole XBox VS. Playstation, Northern California VS. SoCal, Patriots VS. Giants thing. I grok it intellectually and understand that it has to do with ego and projection identification (if your self-image sucks, there’s an easy solution: just identify with a team or device or region or movement that already has a huge following, and WHAM, instant ego bolster!), but it always seems so arbitrarily, needlessly limiting and wasteful. With the exception of “cellist” (although I would have been a violinist if I had the talent), I identify with/am represented by all of these words. Being non-exclusionary when it comes to a home console means I can have twice (oh, wait, there’s Nintendo, too – thrice) the fun of the warring, raging, flaming fanboys around me. Can’t we all just geek/nerd along?

I find it interesting that your data shows people more likely to either self-identify or not feel bad when identified as a geek rather than a nerd. I’m the opposite – I do consider myself a nerd but am more ambivalent on the possibility of being a geek, and being called a geek rubs a little wrong in a way that being called a nerd doesn’t. It seems that my main social circles – Washington State and my corner of Tumblr – agree that “nerds” are better than “geeks”.

That said, I’ve heard some explain “nerd” as primarily fandom-oriented and “geek” as primarily real-life-science oriented (in which case, I’m both). This seems to be the definition my social circles tend to use. But I’ve never seen any data, let alone this extensive sort of thing, applied to that assertion.

Out of curiosity, how do you think a person who creates fanworks would fall? An author who writes fanfiction, for example? Would that be a nerd, because we “practice” our interest in the form of making new stories about our favorite characters?

“Or, to put it pictorially à la The Simpsons”… ROTFL. That’s pretty much how I’ve always envisioned it. But honestly, I don’t think one is better or worse than the other… who cares what you like??? As long as you’re comfortable being your true self.