As anyone with an eye for detail knows, statistics can be very slippery if you don't know how to interpret them. Read the steps below to learn how to spot tricky and misleading statistics and use that knowledge to your advantage.

Steps

Method 1

Lying with Averages

1

Understand the terminology. The word “average” gets thrown around an awful lot when statistical data are being discussed. At first glance, the term sounds straightforward enough: the average is the amount that falls roughly in the middle. However, there are actually a few different types of averages, all of which can be misleading if not properly understood.

The mean average is reached by adding up all the numbers in a data set and dividing the sum by the number of entries in the set. In other words, if you have the numbers 3, 3, 5, 4, and 7, the mean average can be reached by adding them together (to get 22) and then dividing the sum by 5 (since there are 5 numbers in the set).

In this example, the mean average is 4.4.

The median average is the number that falls in the middle of a data set once the numbers are sorted from lowest to highest. Using the same data as before (sorted: 3, 3, 4, 5, and 7), the median average is 4, since 2 of the numbers fall below it and 2 fall above it.

The mode average is a representation of the most common number in the data set. Using our example set, the mode average is 3, since it appears twice.
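If you'd like to check the three averages for yourself, here is a minimal sketch using Python's built-in statistics module on the same example set. The variable name data is just a label chosen for this illustration.

```python
from statistics import mean, median, mode

# The example set used in this step
data = [3, 3, 5, 4, 7]

print(mean(data))    # 4.4  (22 divided by 5)
print(median(data))  # 4    (middle value once sorted: 3, 3, 4, 5, 7)
print(mode(data))    # 3    (the only value that appears twice)
```

Three different "averages" from the same five numbers is exactly what makes the word so easy to abuse.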

2

Lie with mean averages. The mean average might seem like the most foolproof of all the methods described above, but that actually isn't the case. This is because abnormally high or low numbers in the data set can significantly swing the average. To lie with a mean average, gather outlying data and include it in your calculation.

For example, imagine you survey 50 households in a neighborhood for their income. Most households make between $40,000 and $60,000 a year, but one household makes $5 million a year. When you compute the mean average, the number will be significantly higher than the “real” average income in that area, because the $5 million number is so much bigger than the others.

In a similar way, if you had data showing that 9 people each had $1,000 in their bank accounts, but a tenth person had only $1, the mean average would work out to $900.10, almost 10% less than the most common amount.

Reputable surveys often throw out the very highest and very lowest numbers before computing the mean average. However, not every survey you see in the news is reputable. Unless you either have access to the entire data set yourself, or see a written assurance that the outliers were removed, it's safer to assume they weren't.
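The bank-account numbers above are easy to verify. The sketch below also shows one simple way of throwing out the extremes before averaging. The trimmed_mean helper is a hypothetical function written for this illustration, not a standard library call, and real surveys may trim their data differently.

```python
from statistics import mean, median

# 9 people with $1,000 each and a tenth person with $1
balances = [1000] * 9 + [1]

print(mean(balances))    # 900.1
print(median(balances))  # 1000.0

def trimmed_mean(values, k=1):
    """Mean after dropping the k lowest and k highest values (illustrative helper)."""
    ordered = sorted(values)
    if len(ordered) <= 2 * k:
        raise ValueError("not enough values left after trimming")
    return mean(ordered[k:len(ordered) - k])

print(trimmed_mean(balances))  # 1000
```

With the single $1 account trimmed away (along with one of the $1,000 accounts), the mean lands back on the amount nearly everyone actually has.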

3

Lie with median averages. The median average is actually the toughest number to “lie” with, because extreme values can't drag it far from where most of the data sit. It must lie in the center by necessity. However, you can use the median average to hide a very large or small number. For instance, if your data set is 1, 1, 2, 3, 4, 5, 3000, the median average is 3.

When you have an even number of entries, you can reach the median average by finding the mean of the two entries in the middle. This still doesn't account for outliers.

Beware of median averages being used to describe changes over time. A company that normally raises the price of its services by 3% every year could raise it by 20% this year and hide the jump by presenting a median average of 3% over the last 9 years.
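To see how thoroughly a median can bury an outlier, here is a short sketch using the data set from this step. The list of price increases assumes eight ordinary years at 3% followed by this year's 20%, which is one reading of the scenario above.

```python
from statistics import mean, median

data = [1, 1, 2, 3, 4, 5, 3000]
print(median(data))          # 3
print(round(mean(data), 2))  # 430.86

# Assumed price history: eight years at 3%, then 20% this year
increases = [3] * 8 + [20]
print(median(increases))  # 3
print(max(increases))     # 20
```

The median reports a calm 3 in both cases, and nothing in that single number hints at the 3000 or the 20% hike.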

4

Lie with mode averages. In some cases, mode averages are almost impossible to lie with: the average number of tickets purchased per person for a ball game, for example, is almost always going to be accurately reflected by the mode. Nevertheless, mode averages, too, can exclude important data, especially in smaller data sets.

For instance, if you have a data set of every whole number from 1 to 100, but the number 1 is included 3 times, 1 will be the mode average of the set, even though the mean (and in this case, more sensible) average is much closer to 50.

Any survey that rates on a broad scale can be manipulated to emphasize the mode. If you survey 100 people on a scale of 1 to 10 about their feelings on a subject, and more people rate it “10” than any other number, then even if only one more person gave a 10 rating than gave a 1 rating, 10 is the mode average.
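Both mode examples can be checked in a few lines. The first data set follows the description above exactly. The survey counts in the second are made up for this illustration, chosen only so that "10" beats "1" by a single vote out of 100.

```python
from collections import Counter
from statistics import mean, mode

# Every whole number from 1 to 100, with the number 1 appearing 3 times in total
data = list(range(1, 101)) + [1, 1]
print(mode(data))            # 1
print(round(mean(data), 2))  # 49.53

# Hypothetical survey of 100 people on a 1-to-10 scale (invented counts)
counts = Counter({1: 30, 2: 5, 3: 5, 4: 5, 5: 5, 6: 5, 7: 5, 8: 5, 9: 4, 10: 31})
ratings = list(counts.elements())
print(len(ratings))             # 100
print(mode(ratings))            # 10
print(round(mean(ratings), 2))  # 5.51
```

In the invented survey, reporting "the average rating was 10" is technically true of the mode, even though almost as many people gave the lowest possible score.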

5

Lie with representational numbers. If you have a set of data that's defined by abstract rather than concrete numbers (for example, a customer satisfaction survey), it's almost frighteningly easy to lie with that set. If you ask people to rate their satisfaction on a scale from 1 to 3, that doesn't necessarily prove that customers who chose 3 are three times as happy as those who chose 1. This fact is used to skew mean averages in particular, but it can also be applied to median and sometimes even mode averages.
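One way to see the problem is to notice that the numbers on a satisfaction scale are only labels for "low, middle, high". The sketch below uses two invented sets of answers (store_a and store_b are hypothetical) and re-scores them with different spacings that keep the same order. Whichever spacing you pick decides which store "wins" on the mean.

```python
from statistics import mean

# Invented answers on a 1-to-3 satisfaction scale
store_a = [1, 3, 3]
store_b = [2, 2, 3]

def rescored_mean(answers, scores):
    """Mean after replacing each answer with the score assigned to it."""
    return mean(scores[a] for a in answers)

as_given = {1: 1, 2: 2, 3: 3}     # the scale as printed on the survey
top_heavy = {1: 1, 2: 2, 3: 10}   # "3" treated as far happier than "2"
mid_heavy = {1: 1, 2: 9, 3: 10}   # "2" treated as nearly as happy as "3"

for name, scores in [("as given", as_given), ("top heavy", top_heavy), ("mid heavy", mid_heavy)]:
    a = rescored_mean(store_a, scores)
    b = rescored_mean(store_b, scores)
    print(f"{name}: store A {a:.2f}, store B {b:.2f}")
```

With the scale as given the two stores tie, the top-heavy scoring puts store A ahead, and the mid-heavy scoring puts store B ahead, all from the same answers.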

Method 2

Lying With Data Sets

1

Use a small set. Any good statistician knows that the only way to approach a useful average or spot a real trend is to gather data from as broad a set as possible. If you can get information from 100 people, that's good; 10,000 is even better. The more items of information you put into your data set, the more likely you are to end up with accurate averages. By using a set of, say, 3 or 5 data points, you can produce results that don't accurately reflect the state of affairs.

For instance, if you find two people who have recently been injured by something silly – like a pillow – and use them as your whole data set, you can make an argument that pillows are categorically dangerous to everybody. No matter which averages you choose to show, as long as you don't reveal your sample size of only 2 people, there's no clear way to refute your claim.
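A small simulation can show why tiny samples are so easy to abuse. This is only a sketch under made-up assumptions: the "population" is random values spread evenly between 0 and 100, the same survey is repeated 200 times at each sample size, and the seed is fixed so the run is repeatable.

```python
import random

rng = random.Random(0)  # fixed seed so repeated runs give the same numbers

def sample_mean(size):
    """Mean of `size` draws from a made-up population spread evenly over 0 to 100."""
    return sum(rng.uniform(0, 100) for _ in range(size)) / size

def spread_of_means(size, repeats=200):
    """Gap between the highest and lowest mean across many repeated surveys."""
    means = [sample_mean(size) for _ in range(repeats)]
    return max(means) - min(means)

small = spread_of_means(3)
large = spread_of_means(1000)
print(f"n=3:    sample means range over {small:.1f} points")
print(f"n=1000: sample means range over {large:.1f} points")
```

With only 3 data points per survey, the means swing widely from one survey to the next, so someone who runs a few small surveys can simply report whichever result suits them. With 1,000 data points, the means stay close to the true value of about 50.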

2

Use a controlled set. The most accurate data sets are not only large, they're also broad. A geologist surveying the types of minerals in a desert will have a more accurate list if she collects many samples from every part of the desert, rather than collecting 1,000 samples from the same spot. By limiting the scope of your data set, you can significantly influence the results.

Sometimes, this is useful and done on purpose. People who research using demographic data, for example, might want to find out specifically about the types of jobs that men tend to hold, and therefore will only survey men. As long as this is clearly stated in the data, there's nothing shady about it.

Data from small college research projects in particular tends to get misused, with results from a controlled data set presented as though they applied to everyone. This is because many research projects at the college level don't have the time or resources to use a broad, random sample of average citizens, and rely only on college students instead. Again, this is fine as long as that information is clearly stated, but news organizations looking for sensational headlines have often obscured the details of a small college study to make it seem much more sweeping.

3

Use an imbalanced set. This technique is especially sly, as it can lie even with a lot of detail provided for the viewer. The trick here is to use data that can't be fairly compared, and treat them as though they're on equal footing. For example, if you have a city of 100,000 that gained 10,000 residents in 10 years, and you compare it to a town of 10 that gained 10 more residents over the same 10 years, the percentages for each gain (10% versus 100%) will seem to show that the small town grew much more rapidly.

This is sometimes used by people who analyze market data to present a misleading picture of sales figures. Let's say you're tracking sales of apples and oranges, but halfway through the study, there aren't any oranges left because there's a shortage. If you continue to compare data for the rest of the study, there'll be a huge spike in apple sales relative to orange sales, even though apples probably didn't suddenly get more popular.
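The city-and-town comparison comes down to two divisions. Here is a minimal sketch; percent_growth is a helper name made up for this illustration.

```python
def percent_growth(start, gained):
    """Growth as a percentage of the starting population."""
    return 100 * gained / start

city = percent_growth(100_000, 10_000)
town = percent_growth(10, 10)

print(f"City: +10,000 residents = {city:.0f}% growth")  # 10%
print(f"Town: +10 residents     = {town:.0f}% growth")  # 100%
```

Quoting only the percentages makes the town look like the boomtown, even though the city added a thousand times as many people.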

Method 3

Lying With Graphics

1

Leave the y-axis blank. Nothing gives a clearer picture of data than a graph or chart, but even those can be subtly manipulated to give different effects. This is because people tend to look at the shapes and sizes on the graphs before bothering to check the numerical specifics attached to them. The simplest way to manipulate the y-axis is to leave it unlabeled.

If you have a set of 5 bars on the x-axis, but no indicator of how tall they are relative to one another, there's no way to gauge whether or not there's any actual significant difference between them.

2

Use very large or small numbers on the y-axis. Let's say your data set ranges between 1 and 50. To hide the differences, measure your y-axis in increments of 100; to accentuate them unfairly, measure the y-axis in increments of 1/10th. A difference between 3 and 10 looks huge when measured in tenths (it's 70 units apart!), but is barely even noticeable on a graph where 100 is the first increment (it's much, much less than 1 unit apart!).
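The arithmetic behind the two axis choices is simple enough to check directly. This sketch uses exact fractions to avoid rounding noise; units_apart is a hypothetical helper that counts how many axis increments separate two values.

```python
from fractions import Fraction

def units_apart(low, high, increment):
    """Number of axis increments between two data values."""
    return (Fraction(high) - Fraction(low)) / Fraction(increment)

print(units_apart(3, 10, Fraction(1, 10)))  # 70
print(units_apart(3, 10, 100))              # 7/100
```

The same gap of 7 takes up 70 tick marks on one chart and 7% of a single tick mark on the other.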

3

Start the y-axis partway through the range. If your data ranges from 11 to 51, you can make the lowest number look even lower, and the highest number look even higher, by labeling your y-axis so that it starts at 10. This makes the bar representing 11 just barely higher than the x-axis. It'll appear as almost nothing unless someone is savvy enough to actually look closely and see that the graph was started from 10 instead of 0.

The bar representing 51 becomes 41 times higher than the bar representing 11 on such a graph, since the smaller bar is only 1 unit high and the larger is 41 units high. If the graph had started at 0, the bar representing 51 would have been less than 5 times the height of the bar representing 11.
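You can work out how much a truncated axis exaggerates a difference by comparing drawn bar heights. In this sketch, bar_height is a made-up helper that returns how tall a bar is drawn for a given axis starting point.

```python
def bar_height(value, axis_start=0):
    """Drawn height of a bar when the y-axis begins at axis_start."""
    return value - axis_start

low, high = 11, 51

truncated_ratio = bar_height(high, 10) / bar_height(low, 10)
honest_ratio = bar_height(high) / bar_height(low)

print(truncated_ratio)         # 41.0
print(round(honest_ratio, 2))  # 4.64
```

Starting the axis at 10 turns a difference of under 5 times into one that looks 41 times as large.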

4

Use improper scaling. Any time you see the words “not to scale” in the fine print, chances are you've run across an example of this. It isn't always done maliciously; sometimes, the numbers involved are so vastly different that there isn't a way to accurately represent them on the same page. However, it can easily be used for unsavory purposes.

For example, a visual representation of size could be drawn to height scale but not width scale, making a taller object (such as a building) also seem much thinner or wider than it actually is.

5

Use graphics to omit data. This is commonly seen in broad surveys that divide results by certain categories, such as the famous chart showing which term for a carbonated beverage is most popular in each county across the United States. At first glance, such information seems very detailed, but questions soon arise: how broad is the survey data? What is the threshold for determining the result? Is the mean, median, or mode average used?

If you were to use only one result from every area you surveyed, and throw out all the rest, you could easily control the results by area without ever divulging that your sample size per area was tiny. Again, it's a lack of concrete information that makes the results so hard to evaluate.