A big part of statistics is making comparisons and, perhaps more importantly, figuring out what to compare things to. Perspective changes with the baseline.

The left lane on the freeway, commonly known as the fast lane, is sometimes mistaken for the slowest lane on the planet. It's especially weird when there aren't many cars on the road, and you're driving in the fast lane only to find yourself slowed down by the person in front of you who moves at 60% of the speed limit.

The decent thing to do is for the slow person to switch to the right lane so that you can pass. But instead, he carries on with his slowness, so after a mile or two, you switch lanes to pass. You speed up, and then all of a sudden he speeds way up, only to make you the slow one.

I bet I'm the slow one in front more than I am the annoyed one in the back.

Some people are just jerks, but speaking from experience, I think this happens because your baseline for the speed limit is composed of inanimate objects, such as the road, trees, and signs. It feels like you're going fast until you see someone driving much faster behind you. The baseline for your speed moves up, and your current speed suddenly feels slow.

I've noticed this baseline shift a lot recently with a baby in the house. I used to sleep around 1am, and now 9:30pm seems late; a television with the volume at 15 seemed just right, and now the baseline is at 9; and an everyday errand like grabbing donuts morphed into an adventure.

Nothing changed physically. The clock still ticks at the same speed, the television volume hasn't gone haywire, and the donut shop is in the same place as it's always been. But everything looks and feels different.

It's kind of like the classic Powers of Ten clip that starts tiny and zooms out farther and farther. Everything looks significant when you look at it from the right angle.

When you look at data, it's important to consider this baseline, the imaginary place or point you want to compare to. The source data in the examples that follow is time series, but the idea applies to other data types too. Of course, the right baseline differs across datasets and contexts, but let's look at some practical examples in R.

You don't have R on your computer yet? You can just follow along loosely, or you can download and install R, download the source linked above, and follow the code snippets.

So first you have to load the data, which is in CSV format. Use read.csv() to bring it in. We're going to look at the cost of gas, eggs, and the Consumer Price Index, as published by the Bureau of Labor Statistics.
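Here is a minimal sketch of the loading step. The column names (year, month, gas, eggs, cpi) and the file name in the comment are assumptions for illustration, and the rows are made-up placeholders, not Bureau of Labor Statistics figures. Swap in the downloaded file to work with the real numbers.

```r
# Placeholder rows standing in for the downloaded CSV file.
# These values are made up for illustration; they are NOT BLS data.
csv_text <- "year,month,gas,eggs,cpi
2000,1,1.00,1.25,100
2000,2,1.20,1.50,120
2000,3,1.50,2.00,150
2000,4,3.00,2.00,200"

# read.csv() normally takes a file path, e.g. read.csv("prices.csv")
# (hypothetical file name). The text argument reads from a string instead.
prices <- read.csv(text = csv_text)

str(prices)   # check column names and types
head(prices)  # peek at the first rows

# Basic line chart of the raw gas price over time.
plot(prices$gas, type = "l",
     xlab = "Month index", ylab = "Gas price (dollars per gallon)")
```

Checking the structure right after loading is a cheap way to catch a column that came in as text instead of numbers.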

As you might expect, the price rises with a dip in the 2000s. Your concept of the current dollar and historical prices make up your baseline.

Maybe you care more about the monthly percentage changes than you do about the actual price. You want to shift the baseline to zero and look at percentages. The code below takes the gas prices except for the first value (curr), then the prices except for the last value (prev), and then subtracts prev from curr and divides the result by prev. If the change is negative, meaning the price dropped from the previous month, the bar is colored green. Bars are gray otherwise.
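The steps just described can be sketched like this. The gas vector holds made-up placeholder prices, and the variable names other than curr and prev are my own. Treat it as one way to do it rather than the exact original snippet.

```r
# Made-up monthly gas prices, for illustration only.
gas <- c(2.00, 2.20, 2.09, 2.09, 2.30)

curr <- gas[-1]             # every price except the first
prev <- gas[-length(gas)]   # every price except the last

# Percentage change from the previous month; the baseline is now zero.
pct_change <- 100 * (curr - prev) / prev

# Green when the price dropped from the previous month, gray otherwise.
bar_col <- ifelse(pct_change < 0, "green", "gray")

barplot(pct_change, col = bar_col, border = NA,
        ylab = "Change from previous month (%)")
abline(h = 0)  # draw the zero baseline
```

Note that pct_change is one element shorter than gas, because the first month has no previous month to compare to.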

Or make the most recent price the baseline and plot each month's difference from it. Black bars, or a positive difference, show when gas was more expensive relative to the present.
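A comparison against the present can be sketched as follows. This assumes the baseline is the most recent price in the series and that the difference is in plain dollars rather than percent. The prices are made-up placeholders, and the gray for non-positive bars is my choice.

```r
# Made-up monthly gas prices, for illustration only.
gas <- c(2.00, 2.50, 3.10, 2.40, 2.60)

# Assumption: "the present" is the last value in the series.
present <- gas[length(gas)]

# Difference from the present price; the baseline is zero again,
# but now zero means "same as today".
diff_from_present <- gas - present

# Black when gas cost more than it does now, gray otherwise.
bar_col <- ifelse(diff_from_present > 0, "black", "gray")

barplot(diff_from_present, col = bar_col, border = NA,
        ylab = "Difference from current price (dollars)")
abline(h = 0)
```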

There's a problem though. When you compare historical prices, you have to account for inflation. The baseline is not only how much gas costs now, but how much a dollar is worth. A dollar today isn't worth the same as a dollar thirty years ago.

This is where the Consumer Price Index comes into play. It represents how much households have to pay for goods and services. Divide the CPI today by the CPI during a different time and you get a multiplication factor to estimate the adjusted price per gallon of gas. In other words, you want to know how much a gallon of gas during a past year would cost in today's dollars.
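In code, the adjustment is one division and one multiplication. Here is a small sketch with made-up placeholder values for both gas and cpi, assuming the last value in the series counts as "today". The numbers are picked so the arithmetic is easy to check by hand.

```r
# Made-up values, for illustration only (not BLS data).
gas <- c(1.00, 1.20, 1.50, 3.00)   # nominal price per gallon
cpi <- c(100, 120, 150, 200)       # price index for the same periods

# Assumption: the last value in the series is "today".
cpi_now <- cpi[length(cpi)]

# CPI today divided by CPI then gives the multiplication factor.
multiplier <- cpi_now / cpi

# Past prices expressed in today's dollars.
gas_adjusted <- gas * multiplier

# Adjusted price as a solid line, nominal price dashed for comparison.
plot(gas_adjusted, type = "l", ylim = range(c(gas, gas_adjusted)),
     xlab = "Period", ylab = "Price per gallon (dollars)")
lines(gas, lty = 2)
```

With these toy numbers, the nominal price climbs from 1.00 to 1.50 over the first three periods, but the adjusted price is 2.00 each time. All of that apparent increase was inflation.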

The price per gallon of gas is relatively higher these days, but now you see something else in previous decades. Gas was relatively more expensive for a short while. Price hasn't been just a steady increase.

Inflation adjustment isn't the only way to gauge the magnitude of change though. You just need something to compare against. The price of gas increased. Did everything else increase in cost? Try a comparison of the price of gas and the price of a dozen eggs.
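One way to set up that comparison is a simple ratio. This sketch assumes the ratio is the price of a gallon of gas divided by the price of a dozen eggs, so a value of 1.0 means they cost the same. The prices are made-up placeholders.

```r
# Made-up prices, for illustration only.
gas  <- c(1.00, 1.50, 2.40, 3.00)   # dollars per gallon
eggs <- c(1.25, 1.50, 2.00, 2.00)   # dollars per dozen

# Ratio above 1 means a gallon of gas cost more than a dozen eggs.
gas_to_eggs <- gas / eggs

plot(gas_to_eggs, type = "l",
     xlab = "Period", ylab = "Gas price / egg price")
abline(h = 1, lty = 2)  # the 1.0 baseline: equal prices
```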

The 1.0 baseline makes it easy to spot when gas was more expensive and vice versa.

Wrapping up

Whether you work with temporal data, categorical data, rankings, etc., always consider your baseline. Does it make sense? Are your comparisons valid? The wrong baseline can lead to exaggerated results or underrepresented ones, so you must be careful. And we haven't even touched on uncertainty yet.

About the Author

Nathan Yau is a statistician who works primarily with visualization. He earned his PhD in statistics from UCLA, is the author of two best-selling books — Data Points and Visualize This — and runs FlowingData. Introvert. Likes food. Likes beer. Follow him @flowingdata.
