Jobs Kill, BIG Time

I’ve saved the most interesting result in Ken Lee’s thesis till today. The subject is how death rates vary with jobs. The big result: death rates depend on job details more than on race, gender, marital status, rural vs. urban, education, and income combined! Now for the details.

The US Department of Labor has described each of 807 occupations with over 200 detailed features on how jobs are done, skills required, etc. Lee looked at seven domains of such features, each containing 16 to 57 features, and for each domain he did a factor analysis of those features to find the top 2-4 factors. This gave Lee a total of 22 domain factors. Lee also found four overall factors to describe his total set of 225 job and 9 demographic features. (These four factors explain 32%, 15%, 7%, and 4% of total variance.)

Lee then used these 26 job factors, along with his other standard predictors (age, race, gender, married, rural, education, income), to predict deaths among the 302,890 people for whom he had job data. The risk ratios for his standard predictors didn’t change much, and he found these job factor risk ratios (Table 34, column 2):

Ten of the 26 estimates are significant at the 5% level, and five at the 1% level; this isn’t random noise (in the table, *** marks p<0.01, ** p<0.05, * p<0.1). Each factor is scaled to range in value from 0 to 1 across the 806 occupations; its risk ratio is the estimated ratio of death rates when that factor takes its maximum value of one, relative to death rates when it takes its minimum value of zero. And these are huge risk ratios!
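A quick way to see why ten hits out of 26 is not noise: if none of the 26 factors had any real effect, about 26 × 0.05 ≈ 1.3 estimates would clear the 5% bar by chance, and about 0.26 would clear the 1% bar. The sketch below computes the chance of doing at least as well as Lee did under that null. It assumes the 26 tests are independent, which they are not exactly (the factors come from overlapping factor analyses), so treat the result as a rough sanity check rather than a formal test.

```python
from math import comb


def prob_at_least(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p): chance of k or more false positives."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


N_FACTORS = 26  # job factors in Lee's model

# Expected number of "significant" estimates if every true effect were zero
print(f"expected hits at 5%: {N_FACTORS * 0.05:.2f}")
print(f"expected hits at 1%: {N_FACTORS * 0.01:.2f}")

# Chance of seeing at least as many hits as reported, assuming independent tests
p_ten_at_5pct = prob_at_least(10, N_FACTORS, 0.05)
p_five_at_1pct = prob_at_least(5, N_FACTORS, 0.01)
print(f"P(>=10 of 26 at 5%) = {p_ten_at_5pct:.1e}")
print(f"P(>=5 of 26 at 1%)  = {p_five_at_1pct:.1e}")
```

Both probabilities come out far below one in ten thousand, which is the sense in which "this isn’t random noise."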

If you take all of Lee’s standard non-age predictors (race, gender, married, rural, education, income) and multiply together their risk ratios, you’ll find that a poor, badly schooled, unmarried, urban black male has a death rate 17.7 times that of a rich, well-educated, married, rural Asian woman of the same age, with a lifespan roughly thirty years shorter on average. (A risk ratio of 1.57 costs roughly five years of life.)
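The post doesn’t say how risk ratios were turned into years of life. One common rule of thumb, assumed here and not taken from Lee’s thesis, is the Gompertz law: adult death rates roughly double every eight years of age. Multiplying death rates at every age by a ratio RR is then about the same as being 8 × log2(RR) years older, which gives about 5.2 years for 1.57 and about 33 years for 17.7, in line with the figures above.

```python
import math

# Assumption: Gompertz mortality, with age-specific death rates doubling about
# every 8 years of adult age. This is a rule of thumb, not a figure from the thesis.
DOUBLING_TIME_YEARS = 8.0


def years_lost(risk_ratio: float, doubling_time: float = DOUBLING_TIME_YEARS) -> float:
    """Approximate life lost when death rates at every age are multiplied by risk_ratio."""
    return doubling_time * math.log2(risk_ratio)


for rr in (1.57, 17.7):
    print(f"risk ratio {rr:>5}: about {years_lost(rr):.1f} years of life")
```

A shorter assumed doubling time shrinks these numbers proportionally, so the conversion is only as good as that one parameter.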

Yet big as this effect is, the top five job factor risk ratios multiply to a total ratio of 19.7, bigger than all the other non-age effects put together! And the top ten job factor ratios multiply to a total risk ratio of over 100! (All twenty-six factors together give a total risk ratio of 563.) Jobs are clearly a huge and neglected influence on who lives and who dies.
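The post doesn’t spell out how the totals were computed. A minimal sketch, assuming each total is the product of individual risk ratios with any protective ratio below one counted as its reciprocal (so that a factor’s "size" is always its worse-to-better ratio), multiplies by summing logs. Working backwards from the reported totals also shows the typical per-factor ratio they imply; `oriented`, `total_ratio`, and `per_factor_geometric_mean` are illustrative helper names.

```python
import math
from typing import Iterable


def oriented(rr: float) -> float:
    """Express a risk ratio as worse-extreme over better-extreme, so it is always >= 1."""
    return rr if rr >= 1.0 else 1.0 / rr


def total_ratio(ratios: Iterable[float]) -> float:
    """Multiply oriented risk ratios by summing their logs (avoids overflow for many factors)."""
    return math.exp(sum(math.log(oriented(r)) for r in ratios))


def per_factor_geometric_mean(total: float, n_factors: int) -> float:
    """Typical (geometric-mean) ratio per factor implied by a reported total."""
    return total ** (1.0 / n_factors)


# Totals as reported in the post; "over 100" is treated as exactly 100, a lower bound.
for label, total, n in [("top five", 19.7, 5), ("top ten", 100.0, 10), ("all 26", 563.0, 26)]:
    print(f"{label:>8}: about {per_factor_geometric_mean(total, n):.2f} per factor")
```

The top five work out to roughly 1.8 each and all 26 to roughly 1.3 each, so the large totals come from compounding many moderate ratios rather than from any single enormous one.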

If you cared about preventing death, rather than just signaling your concern, these results suggest you stop wasting your efforts on tiny effects like medical insurance, auto accidents, crime, recreational drugs, radiation, or food safety, and focus on: jobs. Yes, a lot of job-death variation must come from different types of people doing different types of jobs, but a great deal of this variation is also likely causal: some jobs kill folks much more than others.

At the very least we should try to tell people about the huge life and death consequences of their job choices. Then workers could demand higher wages for more deadly jobs, which should induce employers to seek ways to substitute less deadly for more deadly jobs. Alas, I suspect most folks will just shrug their shoulders; these sorts of effects seem too abstract to elicit much concern. If you look at a person doing a job they don’t look like they are dying. Not like if snakes were killing people on planes …

FYI, here are some sample jobs rated high and low on the four overall job factors (from Table 49):

If jobs kill, why not prove your point with statistics for mortality by job type? Going through the factor analysis to get to your conclusion is the convoluted route; the table for what is implied to be ‘the most dangerous jobs’ is unconvincing for that reason. Physicists have the most dangerous jobs? Really?

http://cephalicfurrow.wordpress.com PeterW

No; there is a negative correlation between complexity and death.

I’d like to see this controlled for IQ, which is known to have a high correlation with life expectancy.

The people vs. things makes sense, and it’s nice to have experimental verification of that correlation.

http://daedalus2u.blogspot.com/ daedalus2u

snarles is right. Factor analysis like this with so many degrees of freedom is nonsense. It is indeterminate. There is no unique solution; there are arbitrarily many solutions.

Usual practice when fitting data to a model is to use part of the data to set up the model and then use the balance of the data to test the model. In this case, split the data by alternating years. Use half of it to generate the model and then the other half to see how well it fits. When there is no known underlying theoretical understanding of the correlation and causation between the data and the outcomes, this is especially important.
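The holdout idea can be sketched in a few lines. This is a toy version under stated assumptions: each record has a hypothetical `year`, a single binary `exposed` flag standing in for one job factor, and a death indicator; the data are synthetic, with exposure built to double the death rate. A real replication would refit Lee’s whole model on even years and score it on odd years, not just one rate ratio.

```python
import random
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Record:
    year: int       # calendar year of the observation (hypothetical field)
    exposed: bool   # hypothetical binary job attribute, e.g. a factor above its median
    died: bool


def split_by_year(records: List[Record]) -> Tuple[List[Record], List[Record]]:
    """Even years go to the fitting half, odd years to the held-out half."""
    fit = [r for r in records if r.year % 2 == 0]
    held_out = [r for r in records if r.year % 2 == 1]
    return fit, held_out


def rate_ratio(records: List[Record]) -> float:
    """Death rate among exposed people divided by the death rate among unexposed people."""
    exposed = [r for r in records if r.exposed]
    unexposed = [r for r in records if not r.exposed]
    exposed_rate = sum(r.died for r in exposed) / len(exposed)
    unexposed_rate = sum(r.died for r in unexposed) / len(unexposed)
    return exposed_rate / unexposed_rate


# Synthetic demo data (not Lee's): exposure truly doubles the death rate.
random.seed(0)
records = []
for _ in range(20_000):
    is_exposed = random.random() < 0.5
    p_death = 0.02 if is_exposed else 0.01
    records.append(Record(year=random.randint(2000, 2009),
                          exposed=is_exposed,
                          died=random.random() < p_death))

fit, held_out = split_by_year(records)
print(f"fitting half: {rate_ratio(fit):.2f}, held-out half: {rate_ratio(held_out):.2f}")
```

If an effect found in the fitting half is real, it should show up at a similar size in the held-out half; an artifact of overfitting usually shrinks toward 1.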

I think if that is done without bias, these correlations will disappear. Because factor analysis is indeterminate, weighting factors can be chosen that will produce these correlations, but weighting factors can also be chosen that will produce the opposite of these correlations.

If the correlations are real, then they should show up as high death rates in specific jobs. Coat room attendants, maids, and housekeeping workers should have very high death rates. Do they? The life insurance industry would certainly have a very strong incentive to incorporate differential death rates like this in their actuarial instruments. Do they charge coat room attendants much higher premiums? I don’t think so.

http://hanson.gmu.edu Robin Hanson

The factor analysis is based on data that doesn’t include health measures.

Captain Oblivious

Agreed – Robin, pretend for a moment that we aren’t all statisticians who have nothing better to do than analyze statistical significance calculations and/or read/understand 273-page theses.

What are the “safe” jobs, and what are the “dangerous” jobs?

http://webtrough.wordpress.com DW

Socially challenging is the most bizarre one up there by far.

If that means sales is super-dangerous to your health, I completely buy it. But it’s still shocking.

Sister Y

It seems bizarre until you see the literature on ostracism/social pain, e.g. the work of Kip Williams and Naomi Eisenberger.

http://www.nancybuttons.com Nancy Lebovitz

I’d want to check on exposure to noise/loud sounds.

And sales is socially challenging, but it’s also at least partially paid on commission. I wonder if erratic income is a factor.

Would I live longer without a job? Or is there a particular kind of job I should be aiming for?

William H. Stoddard

This business of multiplying together the top five job risk factors seems like meaningless statistical manipulation. The only way multiplication makes sense is if all the things you’re multiplying have zero correlation with each other; and since they derive from separate factor analyses of seven different domains (not from a single factor analysis of the whole set of variables together), I don’t see how we could know that.

But there’s a simple check on this. You get a ratio of 19.7 for the top five, or over 100 for the top ten? Okay. Let’s look at individual jobs. Are there any two jobs in the occupational table for which the more dangerous has 100x the mortality of the less dangerous? Or even 19.7x? If you can’t get two jobs that differ by 100:1, then multiplying the various factors can’t be meaningful, which casts doubt on the meaningfulness of the 19.7 ratio as well. What are the actual data on the actual jobs?
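Stoddard’s check is easy to run once you have a death rate per job. The largest ratio over all pairs of jobs is just the highest rate divided by the lowest, so no pairwise loop is needed. The post doesn’t provide per-job rates, so the sketch takes them as input; they should be age-standardized, since Lee’s ratios compare people of the same age. The function names are illustrative.

```python
from typing import Dict, Tuple


def widest_job_gap(death_rates: Dict[str, float]) -> Tuple[str, str, float]:
    """Return (riskiest job, safest job, ratio of their death rates).

    The largest ratio over all pairs of jobs is max rate / min rate,
    so there is no need to compare every pair explicitly.
    """
    riskiest = max(death_rates, key=death_rates.__getitem__)
    safest = min(death_rates, key=death_rates.__getitem__)
    return riskiest, safest, death_rates[riskiest] / death_rates[safest]


def passes_check(death_rates: Dict[str, float], claimed_ratio: float) -> bool:
    """True if some pair of real jobs differs by at least the claimed ratio (e.g. 19.7 or 100)."""
    return widest_job_gap(death_rates)[2] >= claimed_ratio
```

Feeding in age-standardized rates for all 800-odd occupations and testing against 19.7 and 100 would answer the question directly.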

On the other hand, if you do get that 100:1 ratio, I will put my doubt aside.

http://hanson.gmu.edu Robin Hanson

The point is just to have a measure of the overall importance of a set of influences; it’s not important how many people have all the extreme values together.

William H. Stoddard

But if multiplying all the risk ratios together is not valid, as I believe it would not be if the different risk factors are correlated, then your “overall importance” obtained by that multiplication is going to be exaggerated.

I’m not proposing looking for risk ratios for actual pairs of jobs because the jobs themselves are important, but as a way of checking whether the order of magnitude of effect you project even remotely makes sense.

http://hanson.gmu.edu Robin Hanson

It is just not true that regression analysis is invalid if variables are correlated with each other.

William H. Stoddard

I certainly did not suggest otherwise. But for proper analysis you need to compensate for situations where several of your variables measure the same underlying dimension of variation. It’s not clear to me that you’ve done that in your “100:1” figure.

Of course I could be entirely wrong. Can you find any two jobs in the American economy that have 100:1 differential mortality, or even close to it? If there is no such pair, that would strongly suggest that your number manipulation is empirically meaningless. If there are a bunch, then likely enough I’m wrong.

Eric Falkenstein

The categories I find hard to intuit: business knowledge is positively correlated with mortality? Intuitively, these are middle managers, a rather low-risk (I think) occupation. A ‘cooperative’ job could be different depending on how well one does it: say, a taxicab driver who is indifferent to his passengers would be uncooperative, while an actuary who is really good works cooperatively with his customers and colleagues. Starbucks baristas actually have rather cognitively demanding jobs, because of orders like “tall half-skinny half-1 percent extra hot split latte with whip”. Might there be other groupings, such as ‘construction, taxicab, retail clerk, middle manager’, that would be similarly small (22ish) but more meaningful?

Burger Flipper

Instead of predicting this will get ignored, why not work to keep this prediction from becoming self-fulfilling by giving some more concrete examples and adding a little layman-level explanation?

People of note do follow this blog. A comment I made here a few years ago about a poker player determined to beat the election market on Intrade led to a NYT story (granted, a puff piece).

This sounds important. It could also generate a lot of interest. But you are almost certainly right that it will not as is.

http://hanson.gmu.edu Robin Hanson

I’ve given what concrete examples I have.

http://pancrit.org/ Chris Hibbert

In the first table, Ken lists some job attributes and their risk ratios, but they all have the same sign. I would have expected some of these attributes to be beneficial and others to be harmful. How do we tell which is which?

And echoing earlier sentiments, if you have five factors which produce a cumulative ratio of 19.7, can you go down the list and find jobs which have the beneficial alternative for all five, and another that has the deleterious option on all five? If so, you should have the jobs that have the comparison you mention. Otherwise, we can see that even if the factors have the separate effects Ken finds, they never actually line up to produce the huge effects.

http://hanson.gmu.edu Robin Hanson

The table entries are death risk ratios. Ratios above one hurt, while ratios below one help.
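To make Hibbert’s question concrete: in a log-linear model (assumed here, as in the usual proportional-hazards or Poisson form), a factor scored x between 0 and 1 multiplies the death rate by rr ** x. The model’s predicted ratio between two actual jobs is then the product of rr ** (x_A − x_B) over all factors. Ratios below one automatically count as helpful, and the full product of the ratios is reached only if the two jobs sit at opposite extremes on every factor. `predicted_ratio` is an illustrative name, and the job scores would have to come from Lee’s factor tables.

```python
import math
from typing import Sequence


def predicted_ratio(risk_ratios: Sequence[float],
                    scores_a: Sequence[float],
                    scores_b: Sequence[float]) -> float:
    """Model-implied death-rate ratio of job A to job B (same age and demographics).

    Assumes a log-linear model: a factor scored x in [0, 1] multiplies the death
    rate by rr ** x. Ratios above one hurt and ratios below one help.
    """
    if not (len(risk_ratios) == len(scores_a) == len(scores_b)):
        raise ValueError("need one score per factor for each job")
    log_ratio = sum(math.log(rr) * (a - b)
                    for rr, a, b in zip(risk_ratios, scores_a, scores_b))
    return math.exp(log_ratio)
```

Since real jobs rarely sit at 0 or 1 on every factor, predicted gaps between actual jobs will usually be well below the headline products, which is consistent with reading those products as a measure of overall importance rather than a gap between two real jobs.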

Margus Niitsoo

This might have been emphasized a bit more, because on first read, the table made no sense whatsoever, and I had to look into the comments section to figure out this fact.

Granted, I have no economics background, but I did just get my PhD in theoretical computer science, so I thought it worth mentioning if you were indeed aiming for the general audience.

Mitchell Porter

“If you cared about preventing death, rather than just signaling your concern, these results suggest you” […]

The headline “Jobs Kill” is easily determined by looking at the risk ratios for particular jobs. If you want to educate people about the mortality risks of job choice, you could do so by just giving them such numbers. What interventions are helped by this factor analysis?

http://www.isteve.blogspot.com Steve Sailer

Robin writes:

“At the very least we should try to tell people about the huge life and death consequences of their job choices.”

There’s a whole genre of reality TV programs that follow men doing dangerous jobs, such as “Deadliest Catch” about Alaska deep sea fishermen. I’ve read dozens of articles about death rates in different jobs and what the wage premiums are. USA Today runs that kind of article frequently.

http://hanson.gmu.edu Robin Hanson

Douglas and Steve, most discussion of jobs killing considers only on-the-job deaths, which are only a tiny fraction of the overall effect. Estimating fixed effects for 800 jobs could easily run into data limitations and let people just assume some hidden selection of people into jobs explains those fixed effects. With just five or ten factors, we have a much better chance of estimating and understanding the effects, and so also better understanding the degree to which they might be explained by selection.

Captain Oblivious

a poor badly-schooled unmarried urban black male dies 17.7 times as often as a rich well-educated married rural asian woman (of the same age), with a lifespan roughly thirty years shorter on average.

WTF? A lifespan 30 years shorter would seem to correlate to dying maybe two or at worst 3 times as often (per capita) as someone else. If rich well-educated asian women live to, say, 100, do poor badly-schooled urban black males live, on average, to the ripe old age of 100/17.7 = 5.6 years old? As harsh as poor urban life can be, I find it hard to believe it’s THAT bad!
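The 17.7 is a ratio of death rates, not of lifespans: at every age, the chance of dying that year is 17.7 times higher. Death rates start tiny and climb steeply with age, so multiplying them all by 17.7 moves most deaths decades earlier rather than dividing lifespan by 17.7. The sketch below checks this numerically under an assumed Gompertz hazard h(t) = m · a · e^(bt). The parameters are illustrative only, not from Lee’s thesis: rates double every 8 years, and the baseline level is chosen to give a life expectancy near 80.

```python
import math


def life_expectancy(a: float, b: float, hazard_multiplier: float = 1.0,
                    max_age: float = 150.0, step: float = 0.01) -> float:
    """Life expectancy at birth under a Gompertz hazard h(t) = m * a * exp(b * t).

    Survival is S(t) = exp(-(m * a / b) * (exp(b * t) - 1)); life expectancy is the
    integral of S(t) from 0 to infinity, approximated by the trapezoidal rule up to max_age.
    """
    c = hazard_multiplier * a / b
    n_steps = int(max_age / step)
    total = 0.0
    prev = 1.0  # S(0) = 1: everyone is alive at birth
    for i in range(1, n_steps + 1):
        s = math.exp(-c * math.expm1(b * i * step))
        total += 0.5 * (prev + s) * step
        prev = s
    return total


# Assumed, illustrative parameters: death rates double every 8 years.
A, B = 5e-5, math.log(2) / 8
base = life_expectancy(A, B)
high = life_expectancy(A, B, hazard_multiplier=17.7)
print(f"baseline {base:.1f} y, 17.7x death rates {high:.1f} y, gap {base - high:.1f} y")
```

With these assumptions the baseline is about 79 years and the 17.7× group lives about 47, a gap of roughly thirty-odd years: a large effect, but nowhere near 100/17.7.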

Again, please assume we’re not all statisticians and/or actuaries, and tell us what numbers like “17.7” actually mean!

http://shagbark.livejournal.com Phil Goetz

Interesting – but the amount of data needed to fit a model scales as some base to the power of the number of variables in the model. How could he possibly have had enough data to compute risk ratios for 26 variables?

I’m guessing he just individually looked into correlation without controlling for secondary factors?

P.S. The article’s language is confusing; for example, Black men and Asian women die equally frequently: once per person. (I know what you’ll say, but see two posts above for why your first reaction to that line wasn’t the right one either.) It could really be helped by providing clear, real-life examples of what is being discussed.
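On Goetz’s data question: the exponential scaling applies to a fully crossed table, which needs a separate death rate for every combination of factor levels. A regression that is additive on the log scale, as standard proportional-hazards and Poisson models are, needs only one coefficient per predictor. The post doesn’t give Lee’s exact specification, so the count below assumes that additive form and, for the crossed table, just two levels per job factor.

```python
N_PEOPLE = 302_890   # people with job data, as reported in the post
N_JOB_FACTORS = 26   # job factors in Lee's model


def crossed_cells(n_predictors: int, levels: int = 2) -> int:
    """Cells in a fully crossed table: one separate death rate per combination of levels."""
    return levels ** n_predictors


def additive_parameters(n_predictors: int) -> int:
    """Coefficients in an additive log-linear model: one per predictor plus an intercept."""
    return n_predictors + 1


print(f"fully crossed cells:   {crossed_cells(N_JOB_FACTORS):,}")
print(f"additive coefficients: {additive_parameters(N_JOB_FACTORS)}")
print(f"people per crossed cell: {N_PEOPLE / crossed_cells(N_JOB_FACTORS):.4f}")
```

Even with only two levels per factor, a crossed table would have over 67 million cells for about 300,000 people, while the additive model estimates 27 numbers. That is why this kind of model can be fit at all; whether its additivity assumption is right is a separate question.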

Bryan Lundeen

Looks statistically correct, but it’s not a reason to ask for higher wages. The market determines the wages for such jobs. If it is a dangerous job, then people also need to consider where that job leads in the future: will they get promoted to a desk? Should they get education for a less stressful position? You can’t use these statistics to justify asking for higher wages, especially in a bad economy where it is an employer’s market.

So in this table… does “higher ranked” = “more likely to die on the job” or “less likely…”?
…and what is the significance of having 4 categories? What conclusions are laypeople supposed to draw from that?
Forgive the stupid questions, but is it saying that some professions cause people to “reason themselves to death” or “people themselves to death” or “attention to detail themselves to death”?!
(the physical category makes some kind of sense… I suppose the others are sort of talking about stress killing people or something?)

Help anyone? A translation? Imagine you have an audience of postgrads from every other department in your uni, and try and convert all this stats-talk into something they can relate to.