This blog is about statistics, evolution, nutrition, lifestyle, and health issues, and about combinations of these issues. The focus is on quantitative research and how it can be applied in practice. But you may see other types of posts here (e.g., recipes, ideas, concepts, theories) from time to time.

Monday, October 28, 2013

The table below is from a study by Hayat and colleagues. It illustrates one common trend regarding cancer: it increases dramatically in incidence among those who are older. With some exceptions, such as Hodgkin's lymphoma, there is a significant increase in risk particularly after 50 years of age.

So I decided to get state data from the US Census web site on the percentage of seniors (age 65 or older) by state and on cancer diagnoses per 1,000 people. I was able to get some recent data, for 2011.

I analyzed the data with WarpPLS (version 4.0 has just been released), generating the types of coefficients that would normally be reported by researchers who wanted to make an effect appear very strong.

In this case, the effect would be essentially of population aging on cancer incidence (assessed indirectly), summarized in the graph below. The graph was generated by WarpPLS. The scales are standardized, and so are the coefficients of association in the two segments shown. As you can see, the coefficients of association increase as we move along the horizontal scale, because this is a nonlinear relationship. The overall coefficient of association, which is a weighted average of the two betas shown, is 0.84. The probability that this is a false positive is less than 1 percent.

A beta coefficient of 0.84 essentially means that a 1 standard deviation variation in the percentage of seniors in a state is associated with a variation of 0.84 standard deviations in cancer diagnoses; that is, an overall 84 percent increase when the standardized unit of the number of cancer diagnoses is taken as the baseline. This sounds very strong and would usually be presented as an enormous effect. Since the standard deviation for the percentage of seniors in the various states is 1.67, one could say that for each increment of 1.67 percentage points in the percentage of seniors in a state the number of cancer diagnoses goes up by 84 percent.
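To see what a standardized coefficient implies in the original units, one can multiply it by the standard deviation of the outcome. The sketch below does that in Python. The beta (0.84) and the standard deviation of the percentage of seniors (1.67) come from the post; the standard deviation of cancer diagnoses is not reported here, so the value used is a made-up placeholder, and the function name is hypothetical. The conversion also treats the relationship as linear, which is a simplification of the nonlinear model.

```python
def unstandardized_change(beta, sd_outcome):
    """Change in the outcome, in its original units, associated with a
    1 standard deviation change in the predictor (linear approximation)."""
    return beta * sd_outcome


BETA = 0.84        # standardized coefficient reported in the post
SD_SENIORS = 1.67  # SD of the percentage of seniors, from the post

# NOT from the post: a placeholder value used only for illustration.
SD_DIAGNOSES_HYPOTHETICAL = 0.5  # diagnoses per 1,000 people

change = unstandardized_change(BETA, SD_DIAGNOSES_HYPOTHETICAL)
print(
    f"Per {SD_SENIORS} percentage points of seniors: "
    f"{change:.2f} more diagnoses per 1,000 (with the hypothetical SD)"
)
```

With any plausible value for the outcome's standard deviation, the "84 percent" translates into a fraction of one diagnosis per 1,000 people.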

Effects expressed in percentages can sometimes give a very misleading picture. For example, let us consider an increase in mortality due to a disease from 1 to 2 cases for each 1 million people. This essentially is a 100 percent increase! Moreover, the closer the baseline is to zero, the more impressive the effect becomes, since the percentage increase is calculated by dividing the increment by the baseline number. As the baseline number approaches zero, the percentage increase from the baseline approaches infinity.
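The arithmetic behind this can be sketched in a few lines of Python. The function name is only illustrative, and the shrinking baselines in the loop are arbitrary values chosen to show the trend, not data.

```python
def percent_increase(baseline, new):
    """Relative increase from baseline to new, in percent."""
    return (new - baseline) / baseline * 100.0


# The example from the text: 1 to 2 cases per 1 million people.
print(percent_increase(1, 2))  # one extra case per million, reported as a percentage

# Same absolute increment (1 case), smaller and smaller baselines.
# The baselines are arbitrary illustration values.
for baseline in (1.0, 0.1, 0.01, 0.001):
    pct = percent_increase(baseline, baseline + 1)
    print(f"baseline={baseline}: {pct:,.0f} percent increase")
```

The absolute increment never changes in the loop; only the baseline does, and the percentage grows without bound as the baseline shrinks.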

Now let us take a look at the graph below, also generated by WarpPLS. Here the scales are unstandardized, which means that they refer to the original measures in their respective original scales. (Standardization makes the variables dimensionless, which is sometimes useful when the original measurement scales are not comparable – e.g., dollars vs. meters.) As you can see here, the number of cancer diagnoses per 1,000 people goes from a low of 3.74 in Utah to a high of 6.64 in Maine.

One may be tempted to explain the increase in cancer diagnoses that we see on this graph based on various factors (e.g., lifestyle), but the percentage of seniors in a state seems like a very good and reasonable predictor. You may say: This is very depressing. You may be even more depressed if I tell you that controlling for state obesity rates does not change this picture at all.

But look at what these numbers really mean. What we see here is an increase in cancer diagnoses per 1,000 people of less than 3. In other words, there is a minute increase of less than 3 diagnoses for each group of 1,000 people considered. It certainly feels terrible if you are one of the 3 diagnosed, but it is still a minute increase.
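As a quick check, the same Utah-to-Maine difference can be expressed both ways. This is a minimal Python sketch using only the two figures quoted above (3.74 and 6.64 diagnoses per 1,000 people); the variable names are my own.

```python
LOW_PER_1000 = 3.74   # Utah, diagnoses per 1,000 people (from the post)
HIGH_PER_1000 = 6.64  # Maine, diagnoses per 1,000 people (from the post)

# Absolute difference: extra diagnoses per 1,000 people.
abs_diff_per_1000 = HIGH_PER_1000 - LOW_PER_1000

# The same difference as a share of the whole population, in percentage points.
abs_diff_pct_points = abs_diff_per_1000 / 1000 * 100

# The same difference relative to the lower baseline, in percent.
rel_diff_pct = abs_diff_per_1000 / LOW_PER_1000 * 100

print(f"Absolute: {abs_diff_per_1000:.2f} per 1,000 "
      f"({abs_diff_pct_points:.2f} percentage points)")
print(f"Relative to the lower baseline: {rel_diff_pct:.1f} percent")
```

The relative figure looks large and the absolute one looks small, yet both describe exactly the same gap of under 3 diagnoses per 1,000 people.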

Also note that one of the scales, for diagnoses, refers to increments of 1 in 1,000; while the other, for seniors, refers to increments of 1 in 100. This leads to an interesting effect. If you move from Alaska to Florida you will see a significant increase in the number of seniors around, as the difference in the percentage of seniors between these two states is about 10 percentage points. However, the difference in the number of cancer diagnoses will not be even close to the difference in the presence of seniors.

The situation above is very common in medical research. An effect that is fundamentally tiny is stated in such a way that the general public has the impression that the effect is enormous. Often the reason is not to promote a drug, but to attract media attention to a research group or organization.

When you look at the actual numbers, the magnitude of the effect is such that it would go unnoticed in real life. By real life I mean: John, since we moved from Alaska to Maine I have been seeing a lot more people of my age being diagnosed with cancer. An effect of the order of 3 in 1,000 would not normally be noticed in real life by someone whose immediate circle of regular acquaintances included fewer than 333 people (about 1,000 divided by 3).
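The 333 figure is simply the circle size at which an effect of this order adds up to about one extra diagnosis. A small Python sketch of that back-of-the-envelope calculation follows; the function name is hypothetical, and the second call uses the 2.90 difference between the Utah and Maine figures quoted earlier instead of the rounded 3.

```python
def circle_size_for_one_extra_case(extra_cases_per_1000):
    """Number of acquaintances needed before the effect amounts to
    roughly one additional diagnosis among them."""
    return 1000 / extra_cases_per_1000


print(round(circle_size_for_one_extra_case(3)))     # the rounded effect used in the text
print(round(circle_size_for_one_extra_case(2.90)))  # using 6.64 - 3.74 = 2.90
```

Someone with fewer regular acquaintances than this would, on average, see less than one additional diagnosis attributable to the move.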

But thanks to Facebook, things are changing … to be fair, the traditional news media (particularly television) tends to increase perceived effects a lot more than social media, often in a very stressful way.

5 comments:

Very nice analysis and interpretation. One thing that is interesting, though perhaps it is not a big effect (?) is that the states above the regression line in the second graph tend to be northern states (e.g., Alaska, Maine, etc.), whereas the states that fall below the regression line tend to be southern (e.g., Florida, Utah, Texas, etc.). In fact, comparing states that are roughly comparable in terms of number of seniors per 1000 individuals, latitude appears to have the largest effect on rates of cancer. Is this possibly an effect of sunlight or photoperiod?

Interesting point Aaron, thanks. This could be tested by including state latitude in the model (north-south only might lead to collinearity), and checking the multivariate coefficients after the adjustment (also making sure that collinearity has not gone up to unacceptable levels).


Ned Kock

About Me

I strongly believe that lifestyle, nutrition and exercise habits that are compatible with our evolutionary past are the key to optimal health. On the other hand, I do not believe that closely mimicking life in the Paleolithic is optimal for health, or even viable. I am a researcher, software developer, consultant, and college professor. Two of my main areas of research are nonlinear variance-based structural equation modeling, and evolutionary biology as it applies to the study of human-technology interaction. My degrees are in engineering (B.E.E.), computer science (M.S.), and business (Ph.D.). I am interested in the application of science, statistics, and technology to the understanding of human health and behavior. I blog about evolution, health, statistics, and technology. My personal web site contains links to my contact information and freely available articles related to the topics of my blogs: nedkock.com.

Copyright

The contents of this blog may be used with proper attribution. This blog makes limited use of copyrighted material (including tables and figures) for commentary, always with proper attribution and in ways that comply with fair use law.