Friday, May 31, 2013

Data visualization can serve two purposes and two audiences, says Lisa Strausfeld, Global Head of Data Visualization at Bloomberg LP. For the novice, it can serve as an explanation; for the expert, it can guide exploration. Bloomberg Billionaires - that's a screenshot above, with billionaires plotted by industry - does a bit of both. It's interactive: you can change the day and filter sets of data. For example, the screenshot below shows the net worth rankings as of March 14, 2012:

Those handy little pop-up flags link to recent stories.

You can also filter by industry, citizenship, gender, and source of wealth (inherited or self-made). Interestingly, Bloomberg himself does not appear on the list.

Thursday, May 30, 2013

This spectacular image comes from Infinity Imagined and shows a day's worth of weather. The photos are from the GOES-14 weather satellite, taken last week, on May 22nd. The individual pictures are here. Take a look at the full blog for more spectacular images.

Tuesday, May 28, 2013

One more effect of climate change is becoming evident: plants that were frozen under ice for centuries are reviving and growing. Biologists from the University of Alberta have discovered bryophytes that last grew before the Little Ice Age (1550-1850). Newly uncovered at the glacier's edge, they appeared to be growing in the wild - and they grew in the lab, too. The abstract and article are here. A news report from the BBC is here. Here's what Catherine La Farge, the lead biologist, had to say:

"We ended up walking along the edge of the glacier margin and we saw these huge populations coming out from underneath the glacier that seemed to have a greenish tint," said Catherine La Farge, lead author of the study.

. . .

"When we looked at them in detail and brought them to the lab, I could see some of the stems actually had new growth of green lateral branches, and that said to me that these guys are regenerating in the field, and that blew my mind," she told BBC News.

"If you think of ice sheets covering the landscape, we've always thought that plants have to come in from refugia around the margins of an ice system, never considering land plants as coming out from underneath a glacier."

Friday, May 24, 2013

I had mentioned in a post that I was looking forward to reading "Big Data: A Revolution That Will Transform How We Live, Work, and Think" by Viktor Mayer-Schonberger and Kenneth Cukier. Mayer-Schonberger and Cukier live (or at least write) by the law of threes, and they have three good points to make about the big data culture whose development we are witnessing:

We will have so much more data that we won't need to sample;

More data means that we won't need to worry so much about exactitude; and

We will make decisions based on correlation, not causality.

Naturally, each of these ideas, when it gets a chapter of its own, gets developed a little further. (Each chapter has a clever one-word title.) In the chapter "More" the authors point out that, while a truly random sample can be quite small, it can be difficult to obtain one. Systematic biases, for example, often taint the collection process. Further, if you are analyzing a small, random sample, you often do not have enough data points to drill down further into the data. Modern computing power and the huge amount of data now available mean that analysts don't have to limit themselves to samples. Much bigger datasets allow us to "spot connections and details that are otherwise cloaked in the vastness of the information." So, they conclude, with data, bigger really is better.
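
The sampling point is easy to see in a toy example. This little sketch is mine, not the authors': it builds a hypothetical million-record dataset in which one percent of the records belong to a rare subgroup. A classic random sample estimates the overall rate just fine, but leaves far too few rare records to drill into.

```python
import random

random.seed(0)

# Hypothetical population: 1,000,000 records, about 1% of which
# belong to a rare subgroup we might want to drill down into.
population = [{"rare": random.random() < 0.01} for _ in range(1_000_000)]

# A random sample of 1,000 estimates the overall 1% rate well...
sample = random.sample(population, 1_000)
rare_in_sample = sum(r["rare"] for r in sample)
print(f"Rare records in a 1,000-record sample: {rare_in_sample}")

# ...but roughly ten rare records is far too few to analyze that
# subgroup on its own - the authors' point about drilling down.
rare_in_population = sum(r["rare"] for r in population)
print(f"Rare records in the full dataset: {rare_in_population}")
```

With the full dataset you have on the order of ten thousand rare records to work with instead of a handful.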

Large datasets, they go on in a chapter entitled "Messy," will have several types of errors: some measurements will be wrong; combining different datasets that don't always match up exactly will give approximations, rather than exact numbers. But the tradeoff, say the authors, is worth it. They provide as an example language translation programs - simple programs and more data are better at accurate translation than complex models with less data. They are careful to add that the results are not exact. "Big data transforms figures into something more probabilistic than precise."

The chapter "Correlation" explains why it's not so important to know "why" when you can know, through correlations, "what" happens, or, to put it more precisely, what is more likely to happen. As the authors put it, with correlations, "there is no certainty, only probability." As a result, we need to be very chary of coincidence. (We often think we see causality when in fact we have observed correlation. Or coincidence.) They add that correlations can point the way to test for causal relationships.
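
Here's a toy illustration of the distinction - my sketch, not the book's. Two made-up series (the variable names are invented for the example) correlate strongly only because a hidden common cause drives both; neither causes the other.

```python
import random

random.seed(1)

def corr(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

# A hidden common cause (say, temperature) drives both series.
temperature = [random.gauss(20, 5) for _ in range(10_000)]
ice_cream_sales = [2 * t + random.gauss(0, 3) for t in temperature]
beach_visits = [5 * t + random.gauss(0, 10) for t in temperature]

# Strong correlation, zero direct causation in either direction.
print(f"corr = {corr(ice_cream_sales, beach_visits):.2f}")
```

As the authors suggest, a correlation like this is a fine predictor and a useful pointer toward a causal test - but it tells you "what," not "why."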

So far, so good. The authors go on to chapters about the turning of information into data, and the creation or capture of value. The book is written in a breezy, accessible style; it never mentions the term "Bayesian," for example, although that is clearly what the authors are talking about. But towards the end the energy peters out, and the final chapters feel like filler. The chapter "Risks," which raises some entirely speculative concerns - that we might be punished simply for our "propensity" to behave in a certain way, for example - feels rushed and empty. Its over-simplification of the US criminal justice system made me wonder what else might have been altered beyond recognition. So read the first part of the book for its useful outline of what big data entails, but go elsewhere for a more serious discussion of the policy implications.

Image via Amazon.com

Wednesday, May 22, 2013

That screenshot? It's the tracks of all the tornadoes over the last 60 years - at least those that caused enough damage to be recorded. It's intimidating, but a little misleading, since it shows every tornado since 1951 at once. You can see a video of the tracks by year below, produced by IDV Solutions. Each trail (or dot) is an individual tornado; the fiercer the winds, the brighter the trail.

Tuesday, May 21, 2013

Thanks to our friends at Climate Central for this handy map showing active wildfires in the US. If you click in you can get the name of the fire, the fire's size in acres, and other information. The map is updated daily.

Monday, May 20, 2013

The graph comes from Thomson Datastream via Derek Thompson of TheAtlantic.com, and it shows that the US economy's performance over the last five years was better than that of comparable developed countries: a shallower recession with a faster recovery. Thompson attributes this performance to the fact that the US was able to:

(a) control our own currency and (b) used aggressive monetary policy to save the banks and lower interest rates while running high deficits.

Thursday, May 16, 2013

The New York Times ran an article in today's paper describing the difference between the methodology used to develop suicide rates for the military and the methodology used to develop the rate for the civilian population. The Times says that Pentagon medical statisticians use

a total population figure that includes all Guard members or reservists who spent any period of time on active duty in a given year, even if it was only a few days. According to that approach, the total active military population was about 1.67 million for all of 2009, a review of Pentagon data shows.

But at almost any given moment, the United States military is much smaller than that. Another office of the Pentagon, the Defense Manpower Data Center, the personnel record-keeping office, used a total population number of about 1.42 million service members in 2009. That figure was calculated by including only National Guard and reserve troops who had been on active duty for at least six months in a given year.

Therefore, because the denominator is too large, the military has been understating the suicide rate. (You can find a reasonable explanation of how a rate is calculated here.) Why is this important? Because if the understated military rate appears comparable to the civilian rate, the problem looks smaller than it really is.
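
A quick sketch of the arithmetic, using the two Pentagon population figures from the article. The suicide count here is a hypothetical number I've made up purely for illustration, not a reported figure.

```python
def per_100k(deaths, population):
    """Rate per 100,000: deaths divided by population at risk."""
    return deaths / population * 100_000

suicides = 300  # hypothetical count, for illustration only

# Same deaths, two denominators from the article:
rate_large_denominator = per_100k(suicides, 1_670_000)  # anyone with any active duty
rate_small_denominator = per_100k(suicides, 1_420_000)  # six-plus months active duty

print(f"{rate_large_denominator:.1f} vs {rate_small_denominator:.1f} per 100,000")
```

The inflated denominator yields roughly 18.0 per 100,000 instead of roughly 21.1 - the identical deaths look like a meaningfully lower rate.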

Tuesday, May 14, 2013

The Guardian is running a powerful series on the impact of global warming on life in the indigenous villages of Alaska. Nearly 200 villages are under threat - the threat of washing away:

A study by the US Army Corps of Engineers on the effects of climate change on native Alaskan villages, the one that predicted the school would be underwater by 2017, found no remedies for the loss of land in Newtok.

The land was too fragile and low-lying to support sea walls or other structures that could keep the water out, the report said, adding that if the village did not move, the land would eventually be overrun with water.

The second screenshot shows the extent of Arctic sea ice melt. Climate change is happening fast in Alaska - in addition to villages at risk, animal habitats are changing. The series continues tomorrow.

Update, May 15: See this article in Scientific American about the possible impact of sea level rise along the East Coast: a five-foot rise in sea level over the next century would mean that a storm with Sandy's impact could occur much more often.

Monday, May 13, 2013

Update, May 14: You can read Climate Central's take on why this is an important measure here.

Two weeks ago I wrote a post about the Keeling Curve, which tracks the concentration of carbon dioxide in the atmosphere, and explained why that's important. If you're wondering why NOAA reported that the Earth reached the threshold level of 400 ppm of carbon dioxide in the atmosphere on Friday, but the Scripps Institution of Oceanography did not, there's a simple explanation - time zones. As a note on the Scripps site puts it:

May 10 Comment: NOAA has reported 400.03 for yesterday, but Scripps has reported 399.73. The difference is similar to other differences we have reported. The difference partly reflects time zone differences. NOAA uses UTC, whereas we use local time in Hawaii to define the start and stop of a given day. Changing to UTC excludes the lower CO2 period from the baseline on May 9, shifting it to May 10.

399.73 or 400.03 - both are bad. There's a good roundup of this and other climate news on the blog "Scrapbook of a Climate Hawk," here.
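
You can see how a day-boundary convention changes a daily mean with a toy example (mine, not Scripps's or NOAA's): the same stream of hourly readings, averaged over "days" that start ten hours apart.

```python
# Illustrative hourly readings: a flat 400.0 ppm baseline with one
# lower-CO2 stretch (hours 20-29) straddling the first day boundary.
# The numbers are invented to make the effect visible, nothing more.
readings = [399.5 if 20 <= h < 30 else 400.0 for h in range(48)]

def daily_mean(start):
    """Mean over one 24-hour 'day' beginning at the given hour."""
    day = readings[start:start + 24]
    return sum(day) / 24

# A day boundary at hour 24 splits the low stretch in two...
utc_day = daily_mean(0)     # catches only hours 20-23 of the dip
# ...while a boundary shifted 10 hours (Hawaii is UTC-10) keeps it whole.
local_day = daily_mean(10)  # catches all of hours 20-29

print(f"{utc_day:.2f} vs {local_day:.2f} ppm")
```

Same instrument, same readings, different daily means - just as the Scripps note describes the low-CO2 period being shifted from May 9 to May 10.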

Friday, May 10, 2013

Nutritionists tell you that an easy way to make sure you are eating a balanced diet is to make sure you have a lot of colors on your plates. Here, from the photography website Fstoppers, is a look at a week's worth of - often colorful - groceries from around the world. It's fascinating. The US and UK families have a lot of colors - in their packaging. (That's a screenshot of the UK family, above.)

Wednesday, May 8, 2013

McKinsey (free when you register) has posted an interesting interview with Eric Schmidt, Executive Chairman of Google, on disruptive technologies - those "likely to have the greatest impact on economies, business models, and people." (You can also read a transcript if you prefer.)

Schmidt points out that the main issue is the explosion in knowledge technology:

We’re going, in a single lifetime, from a small elite having access to information to essentially everyone in the world having access to all of the world’s information. That has huge implications for privacy, communications, security, the way people behave, the way information is spread, censorship, how governments behave, and so forth.

The McKinsey editors focus on four areas of Schmidt's discussion:

1. Biology is going digital - in the past few years, much of what was analog in biology, like how proteins fold or how DNA works, has become possible to model. Proteins are one example: they have complex structures that are hard to predict. (If you haven't seen it, the website Foldit challenges users to find the best way to fold different structures and predict the most likely structure of a particular protein.) Digital tools in biology should improve health care, though medical care is likely to keep changing rapidly.

2. New materials, new ways of manufacture - Schmidt points out that new materials can now be manufactured at a large scale, and new means of production, like 3-D printers, are rapidly becoming available. He's not making specific predictions, but the general statement he makes is compelling:

So that revolution, plus the arrival of three-dimensional printing, where you can essentially build your own thing, means that—during the rest of our lifetimes, anyway—it’ll be possible to build very interesting things from very interesting, new materials, which have all sorts of new properties.

This might be both good and bad - there have been reports recently of guns made using 3-D printers - but it is worth thinking about.

3. Using computers to support decision-making. We can think about using computers in all sorts of ways beyond gaming and communicating. Schmidt talks about different interfaces (Siri, anyone?) but captures the essence when he says this:

And the ultimate model is that the computer does what it does well, which is these complicated, analytical needle-in-a-haystack problems, and has perfect memory. And humans do what we do well, which is judgment, and having fun, and thinking about things. The relationship is symbiotic. The computer is making suggestions that are pretty good, they’re pretty helpful, but you’re ultimately in charge.

4. Education is important - machines are taking over what low-wage workers once did - Schmidt's example is supermarket checkouts. That leaves plenty of formerly low-wage workers without jobs. They need better education, Schmidt argues. He follows that with a pitch for more immigration of high-skilled workers. "[Y]ou want an unfair share of highly educated people."

It's an interesting interview and a great starting point for thinking about the issues Schmidt raises. What do you think of his points? My examples?

Monday, May 6, 2013

These might seem like unrelated subjects, but they're not entirely - both are about how to apply statistical analysis in a context that might not seem like a reasonable candidate. The first, "The Evolution of King James" by Kirk Goldsberry, is about LeBron James and his improvement in scoring:

Over the years, James has attempted thousands of field goals, but those shots are going in at much higher rates recently. In James's rookie year he shot 42 percent from the field and 29 percent from beyond the arc. This year those numbers are 56 percent and 39 percent, respectively. There are two reasons for that substantial improvement in his field goal percentage: (1) He's a much better shooter now, and (2) also a larger share of his shots are close to the basket now.

How did he make the change? And almost more important, how did he know that he wanted to make a change and what change to make? James listened to some commentary. He thought hard about his game. And he changed it, going from a 3-point and wing shooter to a post shooter. Here are his most common shot locations, during James's first and second years in Miami:

The story is about hard work - grueling work - and about using numbers, and context, to guide that work.
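
As a back-of-the-envelope check on those numbers (my arithmetic, not Goldsberry's), the crudest possible measure - expected points per two-point attempt - shows how big the jump from 42 to 56 percent really is. Free throws and the two-vs-three attempt mix are ignored here.

```python
# Field-goal percentages quoted in the Grantland excerpt.
rookie_fg, current_fg = 0.42, 0.56   # overall field-goal percentage
rookie_3p, current_3p = 0.29, 0.39   # three-point percentage

# Crude measure: expected points if every attempt were a two.
rookie_pts = 2 * rookie_fg
current_pts = 2 * current_fg

print(f"rookie:  {rookie_pts:.2f} points per attempt")
print(f"current: {current_pts:.2f} points per attempt")
print(f"improvement: {(current_fg / rookie_fg - 1) * 100:.0f}%")
```

Even ignoring the three-point gains, that's a third more scoring per shot - the kind of margin that separates a good scorer from a dominant one.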

The other story, "Solving Equation of a Hit Film Script, With Data," by Brooks Barnes in today's New York Times, has generated a large number of comments and shot almost to the top of the most emailed list. When I read the article this morning I was initially in the camp of "you can't measure art" until I got to these paragraphs:

Mr. Bruzzese emphasized that his script analysis is not done by machines. His reports rely on statistics and survey results, but before evaluating a script he meets with the writer or writers to “hear and understand the creative vision, so our analysis can be contextualized,” he said.

But he is also unapologetic about his focus on financial outcomes. “I understand that writing is an art, and I deeply respect that,” he said. “But the earlier you get in with testing and research, the more successful movies you will make.”

The service actually gives writers more control over their work, said Mark Gill, president of Millennium Films and a client. In traditional testing, the kind done when a film is almost complete, the writer is typically no longer involved. With script testing, the writer can still control changes.

One Oscar-winning writer who, at the insistence of a producer, had a script analyzed by Mr. Bruzzese said his initial worries proved unfounded.

“It was a complete shock, the best notes on a draft that I have ever received,” said the writer, who spoke on the condition of anonymity, citing his reputation.

It's partly the comment about context. But it's also partly, I think, the acknowledgment that Bruzzese is doing some interpreting too. What do you think?

Friday, May 3, 2013

Did the water at your East Coast beach seem warmer than usual last summer? That's because it was: sea surface temperatures for the Northeast Shelf Ecosystem, which reaches from Cape Hatteras, North Carolina to the Gulf of Maine, reached a record high of 14 degrees Celsius in 2012, higher than the average of 12.4 degrees Celsius for the past 30 years. That's according to a new report from NOAA's Northeast Fisheries Science Center.

And it's not just the surface temperatures that are increasing - the warm water thermal habitat was at a record high, while cold water habitat was at a record low. Warm water went deeper than usual, and the habitat is changing. What is the impact? According to NOAA,

Temperature is also affecting distributions of fish and shellfish on the Northeast Shelf. The advisory provides data on changes in distribution, or shifts in the center of the population, of seven key fishery species over time. The four southern species - black sea bass, summer flounder, longfin squid and butterfish - all showed a northeastward or upshelf shift. American lobster has shifted upshelf over time but at a slower rate than the southern species. Atlantic cod and haddock have shifted downshelf.

You can see the movement in the chart at the top of the post. Or, as Grist.org puts it, "record-breaking temperatures . . . are driving the fish away from fast-heating waters to more hospitable depths and latitudes."

The warming won't affect the appearance of mung seaweed on Cape Cod, at least not according to this National Park Service information sheet. That apparently drifts in from points farther north.

I am quite taken with the way the chart incorporates geographical information to show the movement of species. Do you agree?

Thursday, May 2, 2013

Here's a little more information about the Oregon Health Study's first published results, reported today in the New York Times. Unfortunately, the full article, in the New England Journal of Medicine, here, is behind a paywall. But here's the best takeaway, from the study's web site:

For uninsured low-income adults, Medicaid significantly increased the probability of being diagnosed with diabetes, though it had no statistically significant effect on measured blood pressure or cholesterol. Medicaid reduced observed rates of depression by 30 percent and increased self-reported mental health. Medicaid virtually eliminated out-of-pocket catastrophic medical expenditures, and increased use of physician services, prescription drugs, and hospitalizations.

This is not nothing, and those commenters who think it is are overstating. But as always, it's important to interpret statistical studies carefully. To its credit, the Times Economix blog has a post, "What the Oregon Health Study Can't Tell," by reporter Annie Lowrey, that does so:

Where it says something, it says a lot: it provides strong evidence that Medicaid recipients will spend more, use more tests, experience less depression, have fewer bills sent to collection agencies, and so on. It shows health insurance working just the way insurance is supposed to work: protecting the financial stability of the people purchasing it.

The biometric results are compelling, too. The authors chose a handful of conditions that were common, important, easy to test for and treatable to include in the study. Medicaid does not seem to do much to improve health outcomes related to those conditions in two years.

But there are many more questions that the Oregon Health Study simply cannot answer, despite the overheated rhetoric out there today. Does Medicaid improve health over a decade? What might Medicaid do for lifetime health costs? We do not know, even if the study provides some clues. Nor could this study answer the question of whether the Medicaid expansion will be “worth it,” and why. What study could?