Designing scientific experiments requires a lot of thought if you want to arrive at useful results. Most studies of bicycles fall into two categories, which I’ll call observation-based and measurement-based.

Observation-based studies start with an observation. In our case, we ride a lot of bikes. Then it goes like this:

We begin to identify trends. For example, the bikes that perform best for us have flexible tubing.

We form a hypothesis: Flexible tubing makes the bike “plane” and thus perform better for us.

We design a test for the hypothesis. We have three bikes made: two that are flexible and one that is slightly stiffer. We ride them in a double-blind test. If we can reliably tell the stiffer bike from the others based on its inferior performance alone, then we have proven the hypothesis. Why three bikes and not two? Because we must make sure that another factor, such as frame alignment or bearing tightness, isn’t influencing the results. If the two flexible frames feel the same and the stiffer one feels different, it’s unlikely that these other factors are the cause.

Repeat measurements: We repeat the measurements until we are certain that the riders did not identify the bikes by pure chance.
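The “pure chance” question can be answered with a binomial calculation. A minimal sketch, assuming that each blinded trial hides the stiffer bike among three and the rider names one, so a pure guess is right with probability 1/3, and that trials are independent. The function name and the 8-of-10 example are hypothetical.

```python
from math import comb


def p_value_by_chance(correct: int, trials: int, p_guess: float = 1 / 3) -> float:
    """Probability of at least `correct` hits in `trials` if the rider is only guessing.

    Assumes independent trials, each with the same chance `p_guess` of a lucky hit
    (1/3 when the stiffer bike is hidden among three).
    """
    return sum(
        comb(trials, k) * p_guess**k * (1 - p_guess) ** (trials - k)
        for k in range(correct, trials + 1)
    )


# Hypothetical example: a rider picks out the stiffer bike in 8 of 10 blinded trials.
print(f"{p_value_by_chance(8, 10):.4f}")
```

For 8 correct picks out of 10, the chance of doing at least that well by guessing is 201/59049, about 0.0034. How small a value counts as “certain” is a threshold the experimenters should fix before testing.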

Now we have proven that the flexible bikes perform better and thus that “planing” exists. Two riders can experience it, so it’s a real phenomenon. (In fact, a single rider would suffice to prove the existence of a phenomenon.) The fact that one rider could not tell the slightly different frames apart indicates that riders have different thresholds for detecting differences in frame stiffness.

The next question is how many riders prefer a more flexible frame, how many a stiffer frame, and for how many does it not matter at all? To determine this, we would repeat the double-blind tests with more riders. Unfortunately, such a study would require hundreds of participants, which is beyond our budget.
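A rough sense of why “hundreds” is the right order of magnitude: the standard normal-approximation formula for estimating a proportion p to within a margin E at 95% confidence is n = z²·p(1 − p)/E², with z ≈ 1.96. A minimal sketch, assuming the worst case p = 0.5 and a simple random sample of riders; the function name is hypothetical.

```python
import math


def riders_needed(margin: float, p: float = 0.5, z: float = 1.96) -> int:
    """Sample size to estimate a proportion p within +/- margin (normal approximation).

    p = 0.5 is the worst case (largest variance); z = 1.96 corresponds to 95% confidence.
    """
    return math.ceil(z**2 * p * (1 - p) / margin**2)


for margin in (0.10, 0.05):
    print(f"+/-{margin:.0%}: {riders_needed(margin)} riders")
```

Pinning down the share of riders who prefer flexible frames to within ±5% would take 385 riders under these assumptions; ±10% would still take 97.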

Measurement-based studies start with measurements. For example, we want to measure which tires are faster. When we design a way of measuring, we need to consider:

Model validation: We need a measurement that replicates real-world conditions. We could easily weigh each tire, and rank them by weight. However, we would have to prove that weight is the determining factor of tire performance on the road. We’d do that by riding the tires on the road and measuring their speed. Does speed correlate with weight? If yes, then we can use weight as a test for speed. Unfortunately, weight and speed are not always related, so we need to find a better test.

Repeatability: We need to make sure that we are measuring tire resistance, and not something else. We do this by running repeat experiments. If our test is good, then the same tires will always produce similar results.

Accuracy: The repeat experiments tell us how accurate our measurements are. If repeat measurements of the same tires always fall within 2% of each other, then our measurements have a “margin of error” of about ±1%.
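The arithmetic behind “a 2% spread means about ±1%” is the half-width of the band of repeat results, expressed relative to their mean. A minimal sketch, assuming a handful of repeat runs on the same tire; the run values are hypothetical placeholders, and the standard deviation is reported alongside as a common alternative measure of scatter.

```python
from statistics import mean, stdev


def repeatability(runs: list[float]) -> dict[str, float]:
    """Summarize repeat measurements of the same tire.

    spread_pct: total range of the runs as a percentage of their mean
    margin_pct: half of that range, i.e. the "+/-" figure
    stdev_pct:  sample standard deviation as a percentage of the mean
    """
    m = mean(runs)
    spread = (max(runs) - min(runs)) / m * 100
    return {
        "spread_pct": spread,
        "margin_pct": spread / 2,
        "stdev_pct": stdev(runs) / m * 100,
    }


# Hypothetical repeat runs (e.g. rolling-resistance readings in watts) for one tire:
print(repeatability([20.0, 20.2, 19.8, 20.1, 19.9]))
```

With these placeholder numbers the runs span 2% of their mean, which gives the ±1% margin.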

It’s interesting to compare two common methods of testing tires:

Drum tests use a large steel drum. The tire is pushed onto the drum with a force replicating the weight of the rider and bike. By measuring the additional power required to spin the drum, you can measure the resistance of the tire on the drum.
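The drum reading turns into a tire number with two textbook relations: power equals force times speed, so the extra power divided by the drum’s surface speed gives the rolling-resistance force; dividing that force by the load pressing the tire onto the drum gives a dimensionless rolling-resistance coefficient (Crr). A minimal sketch; the input values are hypothetical, and it assumes the baseline power of the empty, spinning drum has already been subtracted.

```python
G = 9.81  # gravitational acceleration, m/s^2


def drum_crr(extra_power_w: float, surface_speed_ms: float, load_kg: float) -> tuple[float, float]:
    """Convert a drum-test reading into rolling-resistance force and coefficient.

    extra_power_w:    power beyond the empty-drum baseline (assumed already subtracted)
    surface_speed_ms: speed of the drum surface under the tire, m/s
    load_kg:          mass whose weight presses the tire onto the drum
    """
    force_n = extra_power_w / surface_speed_ms  # P = F * v
    crr = force_n / (load_kg * G)               # F = Crr * N
    return force_n, crr


# Hypothetical reading: 12 W extra at 8 m/s (28.8 km/h) with a 50 kg load on this wheel.
force, crr = drum_crr(12.0, 8.0, 50.0)
print(f"F = {force:.2f} N, Crr = {crr:.4f}")
```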

Drum tests are performed under carefully controlled conditions, so their repeatability is excellent. However, the underlying model has not been validated under real-world conditions. If we were to assign grades, we’d say:

Repeatability/Accuracy: A
Model Validation: F

Roll-down tests use a bike and rider on a short hill. The rider coasts down the hill. Measuring how quickly the bike slows down on the flat “rollout” section allows you to compare the resistance of different tires.
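The roll-down reduces to Newton’s second law: if bike and rider (mass m) slow from v1 to v2 over a time t on the flat rollout, the average retarding force is m(v1 − v2)/t. That force lumps rolling resistance together with air resistance and bearing losses, which is why the comparison only works if those are held constant between runs. A minimal sketch; the numbers are hypothetical, and it ignores the small extra inertia of the spinning wheels.

```python
def average_resisting_force(mass_kg: float, v_start_ms: float, v_end_ms: float, seconds: float) -> float:
    """Average retarding force during a coast-down on level ground (F = m * a).

    Includes every loss acting on the bike (tires, air, bearings); wheel rotational
    inertia is ignored for simplicity.
    """
    deceleration = (v_start_ms - v_end_ms) / seconds
    return mass_kg * deceleration


# Hypothetical runs on two tires, same rider and hill, 85 kg total:
tire_a = average_resisting_force(85.0, 10.0, 7.0, 10.0)
tire_b = average_resisting_force(85.0, 10.0, 7.4, 10.0)
print(f"tire A: {tire_a:.1f} N, tire B: {tire_b:.1f} N")
```

The lower force on tire B would suggest it rolls faster, provided repeat runs show that the gap is larger than the run-to-run scatter.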

Roll-down tests on actual pavement, with a rider on board, occur in real riding conditions, so validation is not a problem. However, many other factors can influence the results: wind, rider position, temperature. We have to show that we were able to keep these other factors constant. We do this by running repeat measurements. If the same tires always produce the same results, while different tires produce different results, then we know that we are measuring tires and not the speed of crosswinds. Even so, we’ll probably never get up to the accuracy of a lab experiment. Thus, the grades are:

Repeatability/Accuracy: B-
Model Validation: A
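The roll-down criterion, that the same tires should always score the same while different tires score differently, is a comparison of signal and noise: the gap between two tires’ average results relative to the scatter within each tire’s repeat runs. A minimal sketch using Welch’s t statistic; the run values are hypothetical, and a full analysis would convert t into a p-value (for example with SciPy’s `scipy.stats.ttest_ind(..., equal_var=False)`) rather than judging it by eye.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(runs_a: list[float], runs_b: list[float]) -> float:
    """Welch's t statistic: difference of means divided by its standard error.

    Large |t| means the tires differ by much more than the run-to-run scatter;
    |t| near 1 or below means the "difference" could just be wind, position, etc.
    """
    se = sqrt(variance(runs_a) / len(runs_a) + variance(runs_b) / len(runs_b))
    return (mean(runs_a) - mean(runs_b)) / se


# Hypothetical coast-down forces (N) from repeat runs:
tire_a = [25.5, 25.1, 25.8, 25.3]
tire_b = [22.1, 22.6, 21.9, 22.4]
print(round(welch_t(tire_a, tire_b), 1))
```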

Without real-road validation, drum tests are nearly useless. Roll-down tests can provide the validation that drum tests lack. If the same tires perform well on the drum and on the road, then the drum tests could be used to obtain greater accuracy in the measurements. Unfortunately, the real-road tests show that drum tests overlook a crucial component: suspension losses that occur in the rider’s body. So we use roll-down tests instead. They may require very careful testing and multiple runs, but at least they provide useful data.
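Whether “the same tires perform well on the drum and on the road” can be checked by comparing the two rankings. A minimal sketch using Spearman’s rank correlation with the textbook formula for untied ranks, ρ = 1 − 6Σd²/(n(n² − 1)), where d is the difference in a tire’s rank between the two tests; the tire names and rankings are hypothetical.

```python
def spearman_rho(rank_a: dict[str, int], rank_b: dict[str, int]) -> float:
    """Spearman's rank correlation for two complete rankings without ties.

    rank_a and rank_b map each tire to its position (1 = fastest) in each test.
    Returns 1.0 for identical orderings and -1.0 for exactly reversed ones.
    """
    if rank_a.keys() != rank_b.keys():
        raise ValueError("both rankings must cover the same tires")
    n = len(rank_a)
    d_squared = sum((rank_a[t] - rank_b[t]) ** 2 for t in rank_a)
    return 1 - 6 * d_squared / (n * (n**2 - 1))


# Hypothetical rankings of four tires:
drum = {"tire A": 1, "tire B": 2, "tire C": 3, "tire D": 4}
road = {"tire A": 3, "tire B": 1, "tire C": 4, "tire D": 2}
print(spearman_rho(drum, road))
```

With these made-up rankings ρ comes out at 0.0, meaning the drum order says nothing about the road order; a ρ close to 1 across many tires would be the kind of agreement needed before relying on drum results alone.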

When looking at studies of bicycle performance, I am often surprised that many don’t go the extra step to make their results truly useful:

All tests need repeat measurements. An insider told me once that at the Texas A&M wind tunnel, which was used for much bicycle research, changes in air temperature result in very poor repeatability. The time of day has almost as much influence on your results as the actual aerodynamic performance.

Other studies test bicycles without riders. To make those results useful, the model first must be validated by proving that the rider has no influence on the results.

Rims and other components are designed to improve laminar airflow, yet there is a lot of evidence that the vibrations of real-road riding make it impossible to achieve laminar flow. Again, the model must be validated, for example, by putting a vibrating wheel in the wind tunnel.

That makes it all the more exciting when we see tests that are done well (and there are plenty of them). After all, most of us do not have superfluous time and money to spend on “improved” bicycles and components based on testing that may be well-intentioned, but is too flawed to produce reliable results.

11 Responses to Science and Bicycles: Designing Experiments

Your first experiment resulted not in proof of “planing” but in proof that you can identify more flexible tubing in a bike and that you prefer the results from it. In your description of the test you did nothing to prove that the planing (if you accept as a given that it happens) was the cause of the preference. Nor did you prove whether planing happens and whether it truly affects performance and comfort.

I know this is a long running debate but your first example is flawed and reveals some weaknesses by jumping to conclusions.

You are right, the way I outlined our experiment in the post, we just proved that we can identify frame stiffness. Describing the entire testing procedure and results would have gone way beyond the scope of this blog. However, in the actual test, the flexible frames performed better.

In one of the tests, we did repeat sprints up a constant hill. After a few repeats, our maximum power output decreased with fatigue. However, on the stiffer frame, this decrease was more rapid than on the more flexible frame. There also was a difference in the type of exhaustion: On the stiffer frame, our legs hurt and limited our performance. On the more flexible frame, our legs did not hurt, and our cardiovascular system provided a (higher) limit to our performance.

While some may think that “maximum leg hurting” is not a very accurate measurement, recent studies have shown that “perceived exertion” in fact is quite accurate and replicable in trained athletes.

This means that “planing” is a real, measurable phenomenon and not just something we imagined (“placebo effect”).

I’m wondering: For the hill repeats, were the differences in performance for the rider who could not identify the frames in the previous test the same as for the two riders who could? Once a rider is able to identify the frames the study is no longer blinded (neither single nor double).

Another point: experiments need not only be repeatable between runs but should also be replicable by other researchers. Unfortunately, you and BQ seem to be among the only ones doing proper experiments on the performance of bikes. Hopefully that’ll change at some point.

And a final point (rather OT for this post): can you provide a citation for the army tank seat/suspension loss study you frequently refer to? I’ve been wanting to read it for a while but haven’t been able to find it or any other articles on the topic. Thanks!

The hill repeats were done only by the two riders who could tell the frames apart. The third rider in the test, who could not tell the frames apart, was more of a touring cyclist who does not do maximum-effort sprints.

The testing still was blinded, and riders went out by themselves, so they did not have the relative performance of the other bike/rider as a comparison. I remember riding the first bike and thinking: “This is pretty good. I think it’s the superlight bike.” Then I rode the second bike, and realized that it was even better. After a few runs, it became apparent that the first bike I rode was the stiffer bike. When the test was unblinded a few weeks later, that was indeed correct.

If I were given only the stiffer bike on a given day, I don’t think I could reliably detect it. However, the difference was very noticeable back-to-back with the superlight ones. Our second tester, Mark, appears to be more sensitive. He seems to be able to tell whether a bike is what we call “superlight” or “standard” even if he doesn’t have a direct comparison.

I agree that the replicability by other researchers is important. We have published our testing methodology, so anybody can do their own testing.

Another issue to try to grok is “testing for what circumstances” – the bike that rider Alice likes best for touring or rather long day rides might be different (softer, longer, whatever) than the bike rider Bob likes for 1hr to 3hr sport rides. Likewise, cyclocross bikes and tires are better for cyclocross than road bikes with road tires, but would be kind of an odd choice for a really long road ride.

Further, my personal observation is that a bike can be “set up” to favor one thing or another, and it serves the rider well to set the bike up for either (a) the thing they do the most or (b) the thing they are worst at or struggle with the most. [So I want bikes that help me climb hills as much as possible, because I am very heavy.]

Finally, sometimes the really key thing is to remind people that there are sweet spots for every attribute – tires CAN be too narrow, frames CAN be too stiff, the riding position CAN be too aggressive or too upright.

You are absolutely right – the most important part of any study is the readers, who have to apply it to their own lives. The wind tunnel and the experience of racers all over clearly show that a skinsuit, aero helmet and super-low position are fastest, but if I tried to adopt those for a 1200 km brevet, I’d overheat and my back would give out after less than an hour on the road…

We need to remember that a lot of what is called scientific testing is in reality just put out by the advertising department of a manufacturer. Alas, this is even true (maybe especially true) in the field of medications. Pharmaceutical companies ‘employ’ doctors with impressive credentials to sign on as the researcher for some very dubious studies proving their product’s worth. Similarly, bicycle companies will pay well-known athletes for endorsements that have little real meaning for the rest of us riders. Tire companies will publish ‘scientific studies’ that lack ‘real world’ value. Thanks for doing your part to “keep ’em honest” and for helping the rest of us to be cautious about much of what passes for ‘scientific evidence’.

Perfect or not, tests like that are more credible and beneficial than the typical useless “The bike is noticeably faster” forum / magazine nonsense when describing a new “lighter, stiffer, more efficient” frame/bike or a wheel set.

Re: Paul Glassen. While I generally agree, I also believe that not everyone is trying to rip us off and there are people who are genuinely curious about how bikes work and do an excellent job of myth busting.
For example, even the leading bike tyre manufacturers admit, although very quietly, that wider tyres have lower rolling resistance. Schwalbe is the best example, and they do so against their own interest, as most of the cycling community is still stuck in the ’90s in its understanding of how tyres work.

If only the wheel manufacturers could admit that inertia is almost irrelevant and the idea of fast (climbing) wheels is nonsense. Time for BQ to step in and bust the biggest myth in the universe🙂

The inertia discussion is interesting. Like so many things in cycling, the physics are clear: A lighter wheel “spins up” faster than a heavier one. The question is how much faster?

In our “Lightweight” issues, where we looked at bicycles and weight (Vol. 6, No. 4), we used a bicycle acceleration model to look at the effect of weight. We found that on a superlight wheelset, you’d be half a wheel length ahead after a quarter mile sprint, compared to the heaviest wheelset we could imagine. Other factors will make a bigger difference.
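The physics behind “a lighter wheel spins up faster” is that a wheel’s mass both moves forward and spins, so its kinetic energy is ½mv² + ½Iω². If the rotating mass is treated as sitting at the rim (a thin hoop, I = m·r², ω = v/r), the rotational term equals the translational one and rim mass effectively counts twice when accelerating. A minimal sketch of that textbook relation; the masses are hypothetical, spokes and hubs are ignored, and this is not the acceleration model used in the magazine.

```python
def effective_mass_kg(system_kg: float, rotating_rim_kg: float) -> float:
    """Mass that must effectively be accelerated in a straight-line sprint.

    system_kg:       bike + rider, including the wheels
    rotating_rim_kg: rim, tire and tube of both wheels, treated as a thin hoop
                     (I = m * r^2), so its rotational energy equals its translational energy
    Kinetic energy = 0.5*system*v^2 + 0.5*I*omega^2 = 0.5*(system + rim)*v^2.
    """
    return system_kg + rotating_rim_kg


# Hypothetical: 80 kg rider + bike on light rims/tires (1.2 kg for both wheels),
# versus the same setup with rims/tires that are 0.8 kg heavier.
light = effective_mass_kg(80.0, 1.2)
heavy = effective_mass_kg(80.8, 2.0)
print(f"effective mass differs by {100 * (heavy / light - 1):.1f}%")
```

For the same propulsive force, acceleration is inversely proportional to effective mass, so in this hypothetical case the 0.8 kg of extra rim mass slows acceleration by about as much as 1.6 kg of frame weight would.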

For climbing at constant speed, wheel weight matters as much (or as little) as the overall weight of bike and rider.
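At constant speed the wheels are not speeding up, so rotational inertia drops out and only total mass matters: the power to lift bike and rider is m·g times the vertical speed. A minimal sketch; the speed, grade and masses are hypothetical, and rolling and air resistance are left out to isolate the effect of weight.

```python
import math

G = 9.81  # gravitational acceleration, m/s^2


def climbing_power_w(total_kg: float, speed_kmh: float, grade_pct: float) -> float:
    """Power needed just to lift the total mass at constant speed (no rolling/air losses)."""
    v = speed_kmh / 3.6                    # road speed in m/s
    angle = math.atan(grade_pct / 100)     # grade is rise over run
    return total_kg * G * v * math.sin(angle)


# Hypothetical: 10 km/h up an 8% grade; 0.5 kg saved in the wheels or anywhere else.
base = climbing_power_w(80.0, 10.0, 8.0)
lighter = climbing_power_w(79.5, 10.0, 8.0)
print(f"{base:.0f} W vs {lighter:.0f} W")
```

The saving is proportional to mass alone: 0.5 kg out of 80 kg is about 0.6% of the lifting power, whether it comes from the wheels or the frame.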

Jim Papadopoulos once told me: “Cyclists are very good at determining which factors might be important, but then, they tend to argue over meaningless differences.”

That said, when I was racing, I was convinced that my racing wheelset (which had about half as much rotational inertia as my training wheelset) accelerated faster. Was it all a placebo effect? Or is there something that makes lighter wheels feel faster?

Lighter and stiffer, whether wheels or frames, will always feel faster. Lighter and stiffer components transmit road vibration to the rider more efficiently. Since road vibration is proportional to speed on a given bike, more road vibration across bikes gives the sense of more speed.

I think another factor that favors lighter components’ feel is the non-linear nature of wind resistance. Going down a hill, the terminal velocity of two bikes doesn’t feel very proportional to their weight. However, going up a hill, the cyclist is doing all the work to lift the extra mass and so feels it every pedal stroke. In other words, one experiences the negative difference going uphill far more than one experiences the positive difference going downhill. Uphill is a double whammy: it’s both longer in duration (due to slower speed) and harder in effort (pedaling instead of coasting).

A double-blind experiment could look like this: Build two identical wheelsets, and put some weights under the rim tape and tube on one. The difference would be invisible. You could time the rider, measure their power output and ask about their subjective impressions of riding with each wheelset.