Chemistry likes to think of itself as the “central science.” Is that true? Intuitively it makes sense. But how can we measure that more rigorously? In comes the Stanford Dissertation Browser:

The Stanford Dissertation Browser is an experimental interface for document collections that enables richer interaction than search. Stanford’s PhD dissertation abstracts from 1993-2008 are presented through the lens of a text model that distills high-level similarity and word usage patterns in the data. You’ll see each Stanford department as a circle, colored by school and sized by the number of PhD students graduating from that department.

When you click a department, it becomes the focus of the browser and every other department moves to show its relative similarity to the centered department. The similarity scores are computed using a supervised mixture model based on Labeled LDA: every dissertation is taken as a weighted mixture of unigram language models, one associated with each Stanford department. This lets us infer that, say, dissertation X is 60% computer science, 20% physics, and so on. These scores are averaged within a department to compute department-level statistics (the similarities shown), and need not be symmetric. For instance, Economics dissertations at Stanford use more words from Political Science than vice versa. Essentially, the visualization shows word overlap between departments measured by letting the dissertations in one department borrow words from another department. Which departments borrow the most words from which others? The statistics are computed for each year in the data.
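To make the averaging step concrete, here is a minimal sketch of how per-dissertation mixtures could roll up into department-level scores that are not symmetric. It does not perform the Labeled LDA inference itself; the mixture weights, the `dissertations` list, and the `department_similarity` function are all made-up assumptions for illustration, not the browser's actual data or code.

```python
from collections import defaultdict

# Hypothetical toy data: (home department, inferred mixture over departments).
# In the real browser these weights would come from the Labeled LDA-based model.
dissertations = [
    ("Economics", {"Economics": 0.7, "Political Science": 0.3}),
    ("Economics", {"Economics": 0.5, "Political Science": 0.5}),
    ("Political Science", {"Political Science": 0.9, "Economics": 0.1}),
    ("Political Science", {"Political Science": 0.8, "Economics": 0.2}),
]


def department_similarity(docs):
    """Average the mixtures of each department's dissertations into one row."""
    totals = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(int)
    for home, mixture in docs:
        counts[home] += 1
        for dept, weight in mixture.items():
            totals[home][dept] += weight
    return {
        home: {dept: total / counts[home] for dept, total in row.items()}
        for home, row in totals.items()
    }


similarity = department_similarity(dissertations)

# How much Economics "borrows" from Political Science, and the reverse.
econ_to_polisci = similarity["Economics"]["Political Science"]
polisci_to_econ = similarity["Political Science"]["Economics"]
print(f"Econ -> Poli Sci: {econ_to_polisci:.2f}")
print(f"Poli Sci -> Econ: {polisci_to_econ:.2f}")
```

With these invented numbers, the Economics row puts more weight on Political Science than the Political Science row puts on Economics, which is the kind of asymmetry the description mentions.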

You can play around with the browser here. I’m assuming that at some point in the near future this sort of analysis is going to get much, much easier, because of the sea of data from which powerful software can extract and visualize patterns. Below the fold are five screenshots I thought were of interest: Genetics, Biology, and Chemistry dissertations in 2008, and Anthropology in 2007 and 1998.

Awesome, truly awesome. The comparison of anthro ’98 and anthro ’07 is just what one would expect.

I only wonder whether this really tells us what you seem to be implying it does. For example, taking poli sci and econ, if econ borrows more words from poli sci, does this really mean that poli sci is a more fundamental field than econ, or does it mean that the two fields overlap by necessity but that the candidates in econ are more conversant with the relevant poli sci than the poli sci candidates are conversant in the relevant econ? If the latter, does this just mean that econ is a more rigorous and demanding topic, attracting better students? Or could it reflect changes in the fields? For example, is the change in anthro an ideological change within that discipline or does it reflect advances in biology?

http://jbashir.wordpress.com Bashir

This is excellent. I’m sure there are issues regarding the data and what conclusions to draw, but it seems like a good start with a lot of potential. One issue is that departmental research focus varies across institutions. It would be great to see this averaged over a few more schools.

I’d never heard that “central science” comment. I would have said biology.

omar

Very cool. But what is going on between civil engineering and biology (and genetics and developmental biology)?

http://rxnm.wordpress.com/ miko

That thing thinks neurobiology is closer to electrical engineering than to biology. It is easy to see why that might be so based on key vocabulary terms (voltage, potential, conductance, ion), but this “closeness” would certainly fall apart immediately based on something like “who is interested in whose seminar.”

omar

Aha, so there are terms that are common between civil engineering and biology but not between civil engineering and religion or art history? Just out of curiosity, what would those terms be?

Brian Too

I always thought physics was the most fundamental science. Not sure about the term “central”, I’ve never heard it expressed that way before.

None of this is a statement of the value of any particular discipline of course.

Gene Expression

This blog is about evolution, genetics, genomics and their interstices. Please beware that comments are aggressively moderated. Uncivil or churlish comments will likely get you banned immediately, so make any contribution count!

About Razib Khan

I have degrees in biology and biochemistry, a passion for genetics, history, and philosophy, and shrimp is my favorite food. In relation to nationality I'm an American Northwesterner, in politics I'm a reactionary, and as for religion I have none (I'm an atheist). If you want to know more, see the links at http://www.razib.com