Archive for data

My latest video is from a talk I gave back in July at the RightOnline conference. I had 5 minutes to give a talk and I had something all planned out… until President Obama gave this speech in Cleveland. In this speech he stated:

Our businesses have gone back to basics and created over 4 million jobs in the last 27 months — (applause) — more private sector jobs than were created during the entire seven years before this crisis — in a little over two years

I decided to check him on his jobs claims and I summarized my findings in my talk, which I reproduced for this video.

There is a more comprehensive jobs number (employment) that tells a very different story.

Deception Through Selection

And here is where I give a little more detail on what numbers I used. First a little background:

President Obama gave this speech on June 14, 2012, so at that time we were using the most recent BLS jobs report, which had numbers up to May. Counting backward from there, that means Obama was counting from March 2010 to May 2012.

March 2010 – 106,914,000 private sector payrolls

May 2012 – 111,040,000 private sector payrolls (revised up 32,000 in later reports to 111,072,000)

I was assuming that when Obama said “before the crisis” he meant before we started losing jobs. That would put the “7 year” number from February 2001 to February 2008.

February 2001 – 111,623,000 private sector payrolls

February 2008 – 115,511,000 private sector payrolls

Difference in 7 years – 3.88 million private sector payrolls
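For anyone who wants to check the arithmetic, here is a minimal sketch in Python. The payroll levels are the ones quoted above; the dictionary keys and the function name are just labels chosen for illustration.

```python
# Private sector payroll levels quoted above (number of jobs).
payrolls = {
    "2001-02": 111_623_000,
    "2008-02": 115_511_000,
    "2010-03": 106_914_000,
    "2012-05": 111_040_000,          # as first reported
    "2012-05-revised": 111_072_000,  # after the later 32,000 upward revision
}

def change(start, end):
    """Net change in payrolls between two of the months above."""
    return payrolls[end] - payrolls[start]

obama_window = change("2010-03", "2012-05")  # the 27-month window
pre_crisis = change("2001-02", "2008-02")    # one reading of "seven years before this crisis"

print(f"March 2010 to May 2012: {obama_window:,}")
print(f"Feb 2001 to Feb 2008:   {pre_crisis:,}")
```

The first figure is the "over 4 million" in the speech and the second is the 3.88 million above. Moving either endpoint by a few months changes the comparison considerably, which is the point of this section.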

As you can see, the Obama graph is a nice simple upward slope including only the part of his presidency where he gained jobs. In fact, he starts counting only after the jobs number completely bottomed out. If we look at the jobs record during his entire time in office, we get this chart:

Is there anything wrong with not counting those initial job losses? I don’t think so. I think it is a perfectly reasonable thing to do to say “let’s look at the strength of the recovery alone” and use that metric to count. But it is incredibly disingenuous of the Obama team to completely discount job losses for themselves but then turn around and count them in the comparison data point.

In the video, I point out that using “6 years before the crisis” or “5 years before the crisis” results in vastly larger numbers (6.4 million and 7.1 million respectively), but what I’m really interested in here (and what I’d like to expand upon) is comparing the private sector payroll growth that Obama is touting to the private sector payroll growth under Bush.

I looked at this a couple months ago and was a little shocked to see the following chart, but here it is. Starting at the low point of private sector jobs growth, if we chart what I will (for simplicity’s sake) call the Bush recovery (starting in July 2003) and the Obama recovery (starting in March 2009) using the latest data, we get:

As you can see… the weird thing about this current recovery is how closely it is tracking to the previous recovery in terms of private payroll increases. For Obama to pretend he is substantially better than Bush on this metric is nothing short of fantasy.

The Larger Jobs Number (Employment)

The first one is the establishment data (B Tables) and this is a survey that counts jobs by industry. Think of it as someone calling a bunch of businesses and asking “How many people do you have on payroll?” They directly sample over 100,000 businesses and it has a margin of error of about 100K jobs.

The second one is household data (A Tables) and this is a survey of households. Think of it as someone calling a bunch of people and asking “Do you have a job?” It samples about 60,000 households and has a much larger margin of error (400K jobs).

The establishment data is usually used for month-to-month job counts in part because it tends to be a much less volatile metric (household data can swing somewhat wildly). That’s why, when you hear about “X jobs gained last month”, they use the number from the establishment survey.

However, a weird thing happened in the 00’s with the household survey. If we take the private payrolls and compare them to what I’m going to call “private employment” (the A table employment number minus government jobs), we see a massive difference in the job count.

That’s a 3 million job difference between private payrolls and private employment. This is way outside the margin of error. Something happened there, although I’m not sure what. Maybe self-employment increased, or people made ends meet w/ irregular non-payroll income, or farm employment jumped. I honestly don’t know and anything I say here is pure speculation. But there it is, clear as day.

This is why Obama focuses so much on private payrolls as the metric he uses. Most fact-check organizations are not savvy enough to notice that there is this huge discrepancy in the jobs data from survey to survey. They only think to check Obama’s statements against the private payrolls data, not the overall employment.

As you can see, the changes in both jobs numbers are nearly identical. If we add in government job losses, we actually get a negative number on employment change since his inauguration. This shows that something was happening in the last recovery that isn’t happening in this one.

And I’m damn tired of picking it apart 140 characters at a time, so I put together this sarcastic infographic showing exactly how sloppy this piece really is.

(Correction: An earlier version of this infographic incorrectly identified the $3.8 trillion 2013 figure as a CBO projection. That is the spending request from President Obama’s 2013 budget.)

UPDATED (05/24/12, 3PM):

There are three things in this infographic that should be called out more explicitly.

First, much of the debate here centers around who exactly should catch the blame for FY 2009 spending. This is actually a very tricky question and I think compelling cases can be made for both sides of this debate.

My personal position is that it’s really complicated. But one thing is for certain: in hindsight the CBO January 2009 estimate is so obviously wrong that using it should be called out and mocked.

The January 2009 CBO estimate might have been a “best estimate of what Obama inherited”, but only in January 2009 when spending data was *very* hard to predict. January 2009 marked the worst part of the recession and the uncertainty was very high. Only a few months later, Obama’s budget estimated 2009 spending would be $400 billion higher than the CBO estimate.

But now we can look at the data, not the estimates. And we should. The spending data ended up $20 billion lower than the CBO estimate… and that included the stimulus spending (which Nutting says was $140 billion, but I’m still trying to track that number down). If that is the case, the high-end estimate for Bush’s fiscal year is $3.38 trillion. If we compare that to Obama’s 2013 budget proposal ($3.80 trillion), that’s an increase of 12.5% (3.1% annualized). Which isn’t that high, but it’s also using a baseline that is still filled with a lot of what were supposed to be one-time expenses (TARP, Cash for Clunkers, the auto bailout, the housing credit, etc).
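A rough sketch of that percentage calculation, using only the two rounded dollar figures above. The four-year span (fiscal 2009 to fiscal 2013) and the variable names are assumptions made for illustration. With these rounded inputs the total lands slightly under 12.5%, presumably because the unrounded figures differ a little, and the annualized figure matches a simple division by four; a compounded rate comes out slightly lower.

```python
bush_fy2009_high_end = 3.38  # trillions of dollars, high-end estimate from above
obama_2013_request = 3.80    # trillions of dollars
years = 4                    # assumed span: fiscal 2009 to fiscal 2013

ratio = obama_2013_request / bush_fy2009_high_end
total_increase = ratio - 1
simple_annual = total_increase / years      # total divided evenly across the years
compound_annual = ratio ** (1 / years) - 1  # constant yearly growth rate

print(f"total increase:    {total_increase:.1%}")
print(f"simple per year:   {simple_annual:.1%}")
print(f"compound per year: {compound_annual:.1%}")
```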

Second, Nutting uses the CBO baseline in place of Obama’s spending. This is easily verified and I can’t think of a serious economic pundit who would say this is OK. I can think of two reasons for doing this: Either a) Nutting is a monstrously biased ass who (rightly) figured no one in the liberal world would fact check him so he could use whatever the hell number he wanted to use or b) Nutting had no idea that the CBO baseline isn’t a budget proposal. I’m actually leaning toward the second explanation. Nutting uses so many disparate sources it seems clear he doesn’t know his way around federal finance.

Finally, my biggest goal here was to point out the inconsistencies in the analysis. Nutting wants to use the 2009 CBO estimates, but only one column (only for attacking Bush on spending). He wants to compare estimates from one year to actual spending from other years to the CBO baseline from this year. And, as if he is a magical cherry-picking elf, he manages to pick just the right numbers to give him just the right data. This could be an accident. Stranger things have happened. But it seems more likely that he intended to squash a talking point by any means necessary and he went looking for the best data to do that.

I will be accused of massaging the data by people who don’t understand what I’m doing here. I’m pointing out the data massaging on Nutting’s side and calling him on it. I’m saying “If you’re going to use the CBO estimate, use the f***ing CBO estimate!” Don’t use just the part you want and then pretend like the rest of it doesn’t exist. Commit yourself to the data you’re using and follow it, even if it doesn’t go where you want it to go.

Every month when the BLS releases the employment report, I dig into the data and tweet about it at length using the hashtag #BLSFriday. (Follow me on Twitter to catch this incredibly exciting data dive. The next one is on June 1st.)

If you’ve been following the job numbers closely, you’ll know that in this recession we’ve seen a particularly sharp drop in labor force participation. Labor force participation measures how many people either have a job or are looking for a job as a percentage of the population. As of March 2012 labor force participation has dropped to 63.6%, the lowest point since December 1981.

Because the unemployment rate doesn’t measure people who aren’t in the Labor Force, many (especially conservatives) have noted that the unemployment rate is “artificially” low and that many have left the labor force, basically giving up even looking for a job.

One Twitter friend, @rizzuhjj, pointed out that the Chicago Fed has a paper that claims that half of the post-1999 decline in the labor force is due to long-term demographic trends, specifically, Baby Boomers aging.

Here is a chart of the labor force participation rate since the last time it was this low. You can see that we’re at the point where Boomers are starting to retire, so surely that would be driving the massive drop in labor force participation and not the recession, right?

To test this, I decided to sift through the employment data by age, as provided by the BLS. In January 2008, the participation rate by age looked like this.

(The outline is a rough approximation of where Baby Boomers land in the data. Which is OK because the Baby Boomers are an approximate age group anyway.)

You can see that the boomers are largely entering the age ranges where participation in the labor force drops off significantly. So, on the surface, this explanation makes sense.

This was my test: Take the participation rates for post-Baby Boomers (16-49 year olds) and multiply them by the corresponding populations for those ages. That way we’ve isolated just the post-Baby Boomer labor force and can see if it is smaller now than it was 3 years ago. This is what I found.

Or, to make it a little clearer, this is the change in labor force participation by age since January 2008.

Apply the January 2008 participation rates to current population and this means we are missing 3.4 million post-Baby Boom workers from the labor force. These post-Boomers account for 68% of the “missing” work force.

If labor force participation was dropping only due to Baby Boomer retirement, the rate should have dropped from 66.2% in January 2008 to 64.8% today. Instead, it is 63.6%. There is certainly a good deal of room for improvement to get younger people back into the labor force. We shouldn’t simply push the problem off to being Boomer retirement or we risk ignoring a whole generation that is unemployed and flying under the radar.
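The test described above boils down to a few multiplications per age group. Here is a minimal sketch of that calculation. The age groups, populations, and rates below are made-up placeholders, not BLS figures, so the output only illustrates the method and says nothing about the real totals.

```python
# Hypothetical illustration only: these populations and rates are made up.
# Each entry: (age group, population now,
#              participation rate in January 2008, participation rate now).
groups = [
    ("16-24", 38_000_000, 0.59, 0.55),
    ("25-34", 41_000_000, 0.83, 0.82),
    ("35-49", 62_000_000, 0.84, 0.83),
]

def missing_workers(groups):
    """Workers 'missing' if each group still participated at its old rate."""
    total = 0.0
    for _, population, rate_then, rate_now in groups:
        expected = population * rate_then  # labor force at the January 2008 rate
        actual = population * rate_now     # labor force at the current rate
        total += expected - actual
    return total

print(f"{missing_workers(groups):,.0f} missing workers (illustrative numbers)")
```

Because the population is held fixed at its current level, any difference comes from the change in participation rates alone and not from a group simply getting bigger or smaller.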

I wanted to give an informal critique of this infographic. I honestly believe creating infographics is a form of art and that we need to give deep and careful consideration to all aspects of this art.

who is the target audience?

What they should want out of this infographic is to have the viewer see themselves in the family budget. They should be targeting a) people who are independents and b) people who might care about the federal budget.

I’m going to go out on a limb and say that the average family of four making under $25K a year doesn’t give a crap about the federal deficit. And complaining about it to them is probably not the best tactic to win their vote.

make the numbers mean something to the audience

On a quick look, the median income for a family of four in the US is about $67K. This is going to be a number people are a little more familiar with. People who do care about the deficit are going to look at the numbers in the infographic and feel a certain disconnect because the income is so far away from what they are familiar with.

When a typical man or woman supporting a family of four sees this infographic, they will start this train of thought:

“Well, if I had an income of $24,686, we’d have to move to another house. Gosh, where would we go? Probably rent somewhere, it would have to be under $700 a month. We’d have to sell a car and the kids… wow, we’d have to cancel most of their activities. Would I even be able to afford my iPhone? I’m under contract for another year, so I’d have to wait that out but I don’t think I can function properly without a smartphone…”

Can you see what they’re not thinking about?

THE FEDERAL BUDGET!

Instead, they should have realized that you want the audience to slip easily into the role of the family. To this end, recalculate all the numbers for a median family of 4. I’ve done it here:

Family income – $65,500

Family spending – $100,708

New Debt – $35,208

Total Debt – $434,081

Note: My first calculation was for $65,000, but I saw that this number brought the “spending” number to just under $100,000, which is a psychologically important hump. So I bumped the income up another $500 to hit that psychological mark. These kinds of details should be in the mind of every infographic creator.
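The rescaling can be sketched in a few lines of Python. This assumes every line item scales in simple proportion to income, which is one plausible way to get these numbers and not necessarily how the originals were produced; the function and constant names are invented for the example.

```python
# Figures from the recalculation above, for a family income of $65,500.
INCOME = 65_500
SPENDING = 100_708
TOTAL_DEBT = 434_081

def scale_budget(income):
    """Scale spending and debt in proportion to a different family income."""
    factor = income / INCOME
    spending = SPENDING * factor
    return {
        "income": income,
        "spending": round(spending),
        "new_debt": round(spending - income),
        "total_debt": round(TOTAL_DEBT * factor),
    }

print(scale_budget(65_000))  # spending falls just short of $100,000
print(scale_budget(65_500))  # spending clears $100,000
```

Running it with 65,000 shows spending landing just below the six-figure line, which is why the extra $500 matters.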

These numbers are going to target an audience that cares about the topic at hand, and ultimately make more of the impact we want.

the graphic is not “share-sized”

What you see above is only 25% the size of the original. The original version of this thing is a half megabyte and comes in at 2112 x 3731 resolution. Holy cow.

Everyone knows the new iPad has a monster resolution, right? Here’s how this graphic would look at full resolution on a new retina-display iPad.

And on an iPad 2

A lot of viewing these days is done on mobile devices with screen sizes much smaller than an iPad 2. By having such a monster infographic, we’ve cut our potential viewing audience way down.
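To put a number on the mismatch, here is a small sketch that computes how far the graphic has to shrink to fit on a screen. The infographic size is the one quoted above. The screen sizes are my assumptions (2048 x 1536 for the retina iPad and 1024 x 768 for the iPad 2, both held in portrait), and `fit_scale` is just an illustrative helper.

```python
def fit_scale(image, screen):
    """Largest scale factor at which image (w, h) fits inside screen (w, h)."""
    image_w, image_h = image
    screen_w, screen_h = screen
    return min(screen_w / image_w, screen_h / image_h)

infographic = (2112, 3731)  # resolution quoted above

# Assumed portrait screen sizes in pixels (width, height).
screens = {
    "new iPad": (1536, 2048),
    "iPad 2": (768, 1024),
}

for name, screen in screens.items():
    scale = fit_scale(infographic, screen)
    print(f"{name}: fits at {scale:.0%} of full size")
```

Under those assumptions the whole graphic only fits at a bit over half size on the retina iPad and a bit over a quarter size on the iPad 2, before we even get to phones.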

And they have no options for sharing it at a smaller size. There is a link to “download and print” it, but who is going to do that? Infographics are seen online. If you’re going to print them, fit them onto an 8.5 x 11 piece of paper. This infographic does neither.

I’m glad the Romney team has made infographics a part of their media platform. But they have a long way to go to create infographics that make the kind of impact that they potentially can make.

I’ve decided to do the same thing with government finance data. This data is actually much easier to find, but getting all the relevant information into the same spreadsheet can be something of a chore. So I went ahead and did it in a “once and for all” sort of way, combining separate data sets into one handy spreadsheet.

Before you go ahead and download this, please know that this information is free because a) it didn’t cost me anything but also b) I think it is important that everyone is able to look through government finance data. That being said, it does take significant time and effort to grab the data, re-format it properly, and combine it in the most useful way. If you’re using this data in any kind of professional capacity or if you appreciate the work it takes to do this, please consider donating to my efforts.

Raw Monthly Data 1947 – Present

This data has not been changed in any way. It is a straight translation of the monthly official government data for:

January 2012 BLS Files

Brief Interruption To Beg

This took a not-insignificant amount of time and if you use it in anything resembling a professional capacity, I’d really appreciate a beer as a way of saying thank you.

BLS-To-Excel Application

For those of you who are a little more interested in the data and willing to follow a lot of directions, I’ve decided to publish the program I use for this so that you’re not reliant on me to publish this every month. I do mostly Microsoft development, so you’ll need Windows to run the project.

The code is a disaster in large part because the BLS data is something of a disaster. However, the app itself contains some helpful tutorials on how to get the data and make everything work.

It looks awful. But if you follow the directions, it works.

This will never be a professional application, but I’ll update it as I can. If you happen to have any talent in design, my “thing” is translating designs to reality. So if you want to send me even a screenshot of how you think this app should work, I’m happy to incorporate that into the next version.

Yglesias points to a couple of charts, but I’ve helpfully replicated his data set into a single chart, because that’s just the kind of guy I am.

As you can see, using January 2009 as our point of reference, private jobs have rebounded from a drop of 3.79% in 2010 to a drop of 1.63% in August (my data is slightly out of date, but good enough for gov’t work… get it?!?). Local gov’t employment has fallen 3.6% in that same time frame. I also added federal gov’t employment (which has fallen 2.75% since January 2009) for the heck of it.
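For readers who want to build the same kind of chart, the calculation is just each month's level relative to the reference month. A minimal sketch follows; the employment levels in it are hypothetical placeholders, not the BLS series behind the chart, and `pct_change_from` is a name made up for the example.

```python
def pct_change_from(series, baseline_key):
    """Percent change of every point relative to the baseline point."""
    base = series[baseline_key]
    return {key: (value / base - 1) * 100 for key, value in series.items()}

# Hypothetical employment levels in thousands; not actual BLS data.
private_jobs = {"2009-01": 110_000, "2010-02": 106_000, "2011-08": 108_500}

changes = pct_change_from(private_jobs, "2009-01")
for month, pct in changes.items():
    print(f"{month}: {pct:+.2f}%")
```

Picking a different baseline key shifts every number on the chart, which is worth keeping in mind for the cherry-picking argument that follows.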

In the comments section, Peter Schaeffer complains that Yglesias is cherry picking the data and points out that gov’t employment saw +10% gains in the decade leading up to the crash and 3-4% losses from the peak while the private sector saw slightly less than 5% gains in that time period and slightly more than 5% losses from the peak.

I thought that Schaeffer had a good point, but needed some visuals to drive it home, so I thought I’d show Yglesias’ jobs data in Schaeffer’s context.

As you can see, Yglesias’ data starts at a really handy place for his argument, since it begins measuring job losses and growth at a time when we had already seen drastic private sector losses, but no public sector losses.

Of course, the funny aspect to this data is that one could use it to say that President Obama is reining in the public sector that George W. Bush let grow out of control. I think the only reason no one is saying this is because everyone on President Obama’s side would consider that a bad thing and everyone who opposes President Obama would consider that a good thing. Neither side really wants to attribute this trend to President Obama. In fact, President Obama is working actively to reverse this trend.

Ah, the little ironies of life.

Note: In the spirit of “never attribute to malice what can be explained by incompetence”, I wouldn’t be surprised if Yglesias unwittingly cherry-picked the data. “The Obama years” is a perfectly rational place to start looking at data and, if that was the only data you looked at, it would support his conclusion. On the other hand, Yglesias has always had a better grasp of the data than this particular post suggests, so I suspect he kind-of-sort-of knew that this was a cherry picked sample set but was OK with using it because it bolstered his argument.

Every time a national unemployment report comes out, I tweet the many details from @politicalmath. Frequently I get a lot of the same questions, so I thought I’d jot down a quick summary on unemployment reports and numbers and where they come from.

There are 2 kinds of employment numbers, summarized here:

Establishment Data (Current Employment Statistics or CES) – this survey covers 400,000 businesses and counts the number of payroll positions that are filled.

Household Data (Current Population Survey or CPS) – this survey covers 60,000 households and counts the number of people who are employed and unemployed.

When an employment report comes out from the Bureau of Labor Statistics (BLS), they usually report:

The unemployment rate, which is calculated using household data

The number of jobs added, which comes from the establishment data

Sometimes this data can seem contradictory. For example, between March and June 2011, we gained 290,000 jobs but the unemployment rate went up 0.4 percentage points (from 8.8% to 9.2%).

There can be a couple of reasons for this. The first one is that the “jobs added” number comes from subtracting last month’s establishment jobs number from this month’s establishment jobs number, but we never use either of those numbers to calculate the unemployment rate.

Why?

Because the essence of the establishment jobs number is asking employers: “How many people work for you?” It gives a nice accurate number, but it doesn’t tell us anything about how many people don’t work for them. We don’t have any number on the unemployed, only a number for jobs.

For unemployment, we have to go to individuals and ask them: “Are you employed or unemployed?” Then we take the unemployed number and divide it by the total number of people who are in the labor force, which counts both the employed and the unemployed.
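As a quick sketch of that division, with made-up household counts (in thousands) that are not actual BLS figures: the rate can rise even while the number of employed people rises, as long as the number of people looking for work grows faster.

```python
def unemployment_rate(employed, unemployed):
    """Unemployed as a share of the labor force (employed + unemployed)."""
    labor_force = employed + unemployed
    return unemployed / labor_force

# Hypothetical household-survey counts in thousands; not actual BLS data.
before = unemployment_rate(employed=139_000, unemployed=13_400)

# Employment rises by 300, but the count of unemployed job seekers rises by 900.
after = unemployment_rate(employed=139_300, unemployed=14_300)

print(f"before: {before:.1%}, after: {after:.1%}")
```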

But even the differences between the establishment jobs number and the household jobs number can be big. According to the household jobs number (which is supposed to exclude farm workers and the self-employed), we had 139.6 million jobs in August 2011. According to the establishment jobs number, we had 131.1 million.

That’s a difference of 8.5 million jobs, and that kind of spread is pretty normal. The variation changes a little month-to-month, but we could get a report of jobs created from the household number and jobs lost from the establishment number. In fact, we saw something similar in August where the household number said we gained 331,000 jobs, but the establishment number said we gained 0.

So why is the establishment number reported?

Because the establishment survey is so much larger, more reliable and gives more consistent results. In the graph below, we can see that even though the establishment data counts fewer jobs, it is a less erratic count.

So… that is a quick explanation of the employment report. I dig into this data once a month, so I’m pretty familiar and I’m delighted to answer questions or explain in greater detail in the comments.

It all boils down to some pretty interesting data points (what do white/black/Asian/Hispanic people like or have in common, what is the focus of men and women of different races, etc). The one thing I was a little shocked by was this graphic at the end.

I was a little surprised that the Protestant reading level was so low. And when I am surprised by data, I try to see if I can replicate it. (This strikes me as an eminently scientific thing to do and yet I am bemused when people think I’m “attacking” their data. Whatever.)

Note: I should mention at the outset that it does kind of irk me that OKCupid displays this data and then kind of assumes that it holds for all people across the board when it is obvious to anyone who devotes more than 5 seconds to thinking about it that the data really only holds for OKCupid users who (I’m going out on a limb here) are probably disproportionately young, tech savvy and single.

Moving along. To determine reading levels they ran the Coleman-Liau Index on the profiles so I went and typed up two sample religious profile summaries, one Christian and one atheist. They’re only a couple sentences long which I figure is fine since OKCupid profile summaries aren’t exactly known for their complex narrative arcs.

Here are the profiles that I typed up, attempting to mimic what I thought would be a fair religious summary from a similar reading level.

Atheist: I am an atheist. I believe that there is no God and that most people only believe religion because they are taught to do so by society and possibly also their parents.

Christian: I am a Christian. I believe that Jesus died on the cross for my sins and that he was raised again on the third day. I think that the Bible teaches us the truth and that God loves us very much.

Really? Those little blurbs are so radically different that the Christian one is 4.68 grades stupider based on nothing more than a readability analysis? Sounds like BS to me.
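For anyone who wants to try this at home, here is a minimal sketch of the Coleman-Liau Index using the standard published formula. The way it splits words and sentences is a simple assumption of mine and almost certainly differs from whatever tool OKCupid or I used, so it will not reproduce the exact scores in this post. It is only meant to show how little the formula looks at: letters per word and words per sentence, nothing else.

```python
import re

def coleman_liau(text):
    """Coleman-Liau index: 0.0588 * L - 0.296 * S - 15.8.

    L is letters per 100 words and S is sentences per 100 words.
    The tokenization here is a simple assumption, not OKCupid's.
    """
    words = re.findall(r"[A-Za-z]+", text)
    letters = sum(len(word) for word in words)
    sentences = len(re.findall(r"[.!?]+", text))
    L = letters / len(words) * 100
    S = sentences / len(words) * 100
    return 0.0588 * L - 0.296 * S - 15.8

atheist = ("I am an atheist. I believe that there is no God and that most "
           "people only believe religion because they are taught to do so "
           "by society and possibly also their parents.")
christian = ("I am a Christian. I believe that Jesus died on the cross for "
             "my sins and that he was raised again on the third day. I think "
             "that the Bible teaches us the truth and that God loves us "
             "very much.")

print(round(coleman_liau(atheist), 2), round(coleman_liau(christian), 2))
```

Even with this crude tokenizer the atheist blurb scores noticeably higher, simply because it has longer words and one long sentence where the Christian blurb has shorter words spread over more sentences.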

Let’s try adding some evolution in there:

Atheist: Same as before + “When it comes to the world around us, evolution is the most likely explanation for everything.”

Score: 12.23

Christian: Same as before + “When it comes to the world around us, I think there are probably gaps in evolutionary theory and that evolution can’t explain everything.”

Score: 8.90

Well, that closes the gap by 1.35 points, but we’re still looking at a 3.33 grade gap between two positions that are transparently written with an identical textual style.

I progressively added more and more to the Christian profile to counteract the low score I got from starting with the basics of Christian belief.

Finally I ended up with:

I am a Christian. I believe that Jesus died on the cross for my sins and that he was raised again on the third day. I think that the Bible is true and that God loves us very much. I think there are probably gaps in evolutionary theory and that evolution can’t explain everything. Additionally, the philosophical underpinnings for views that argue against Christianity frequently neglect to apply the same standard of ideological rigor to their own faith based assumptions. Consequently, they hold Christianity to a double standard assuming that their position is the default one and that there is no need to defend it.

Score: 12.33

This “Christian profile view” scores about as well as an absurdly simple statement of atheism with a supporting line about evolutionary theory. Basically, the algorithm they used translates “6th grade atheism” at the same level of textual complexity as “basic Christian beliefs + a philosophy degree”. (I flatter myself somewhat, but the final two lines are clearly a college level writing style.)

Here is not what I’m saying: I don’t think there is any level of conspiracy theory behind any of this. No one designed the algorithm so that Christians would look stupid.

However!

It seems likely that a simple statement of Christian belief like the one entered above anchors the score at the low end. The more someone communicates their Christian belief in the language that has been familiar in churches for centuries, the less likely they are to score well regardless of the remaining textual analysis of their profile. This anchoring effect might get lost if the profile was a three page essay. But profiles tend toward being short, simple statements meant to clearly indicate basic beliefs, inclinations, or personality traits.

Note: I promise I’ll pull back on religious topics. I just get irritated when people pull “evidence” of religious people being inferior in some way, shape, or form. It usually strikes me as hackery that the creators or purveyors of whatever data set are perfectly happy to accept and so they neglect to do any sort of skeptical follow-up.

You can actually see the same thing with a lot of war-based data. Half the time, the people pointing to the data didn’t even get the data right and a good chunk of the remaining examples strip context out of the data. Bugs the hell out of me.

Charles Blow’s most recent New York Times op-ed is something of a boon for visualization enthusiasts. He replaces almost his entire article with a visualization. This illustrates that he recognizes the power of visual communication to make and reinforce a point in a way that is self-obvious and can stick with the reader better than words.

Unfortunately, he has decided to use data that misleads his audience to such an extent that I can only conclude that he is unconcerned with the truth insofar as it undermines his desired objective.

Blow’s main point is that the US is an outlier in the world because we’re religious but also rich while “religiosity was highly correlated to poverty”.

I’ve reproduced the chart in question below.

Now, keep in mind that this is not charting religion as it is listed in the CIA World Factbook, but according to the specific question: “Is religion an important part of your daily life?” That will be important in a little bit.

This chart seems to prove his point. Until you realize what isn’t on the map.

Here is a list of the countries that didn’t manage to make their way onto the map due to the fact that Gallup didn’t poll them:

Problem number one – Charles Blow has a duty to inform his audience of these omissions. The countries without data represent nearly 25% of the world population and skew heavily toward non-religious. They are too large and too important to the data set and visual reference to simply ignore. Yet Mr. Blow doesn’t seem interested in mentioning them.

Problem number two – Mr. Blow heavily implies that there is a causal relationship between religiosity and wealth. But (as we all know) correlation doesn’t imply causation. Western European countries (and countries filled with people from Western Europe) are richer, as are developed Asian countries. Eastern European and South American countries are less rich. Middle Eastern and African countries tend to be much poorer. There’s a correlation in geo-political histories here that is stronger than religion.

Of course Mr. Blow could always go to rural India and inform them that their poverty is related to their devotion to Hinduism and has nothing to do with British imperialism. Or perhaps to the Deep South where he can proclaim to the +90% Christian black population that their economic woes are related to their religious tendencies.

Problem number three – But the final problem is the worst one because it involves an outright lie:

Singapore is more religious and richer than the United States. And Mr. Blow didn’t map it. At all.

It’s possible that Mr. Blow is actually so numerically illiterate that he didn’t know he was supposed to tell people about key missing data points. But taking out data that doesn’t align with his point is disgusting manipulation. The end result of his deception (conscious or otherwise) is “If you take out all the poor atheists and take out all the rich religious people, then this pattern emerges…”

Mr. Blow should put Singapore back into the data set and add a correction to his article that announces how his data set has enormous gaping holes. And he should probably never be allowed to touch charting software again.

* The CIA Factbook has Taiwan listed at 93% Buddhist, but I’m not sure how they would answer the specific question that Gallup asked. I’ve heard some atheists claim Buddhism as an “atheistic religion” (no personal god) so it could be that the citizens of Taiwan wouldn’t say that religion plays a big role. I simply don’t know.