Thursday, October 22, 2009

E-Hub Marketing : An Important Lesson In Statistics


As in the past 100 years, cycling products still come and go today. And with them, so do their marketing sound bites...

Any intelligent cyclist must carefully inspect the marketing data handed to him, and question what is missing and why it's missing. Weak data can lead to weak correlations, spurious percentage differences and other logical fallacies. Until the missing numbers are accounted for, I don't advise anyone to put their money down on faith.

When James posted a small article yesterday on the E-Hub at the Bicycle Design blog, I got very amused and decided to take a peek at the product website. I spent a little time looking at the interesting item proudly displayed but then had an itching desire to see the numbers behind the invention. Not just plain numbers. I wanted to see if they're meaningful numbers.

This page has a statement from Dr. Alen Orbanić (a university mathematician from Slovenia) telling us that the designers behind the innovation carried out a surefire experiment to prove without doubt that cycling with the E-hub produced the following :

1) Increased average power output when compared to cycling with a conventional rear hub.

2) A 4% reduction in average and maximal heart rates in cyclists using this product, when compared to the same figures for cycling with a conventional hub.

3) A 10-15% reduction in blood lactate using the E-hub versus using a conventional hub.


So What Was The Experiment?


Well, I'll tell you the part of it they conducted outdoors. They brought together a population of cyclists from 20-60 years of age. How many? Not specified. Then they categorized them as "Professionals", "Recreational" and "Amateurs". How did they define who belonged where? No indication. What were their weights, fitness levels etc? No indication.

These cyclists were fitted with a Polar heart rate measuring system and then rode MTBs equipped with Ergomo powermeters up a 2km track (1.24 miles) with 14 degrees of average inclination. Apparently, they did this twice, once with the E-hub and once with a classic hub, with 24 hours of rest between the two runs. Levels of lactic acid were measured twice, immediately after each run.

Fig 1 : A snippet showing how things were measured by the authors. Typos abound.

I'm a tad surprised by two things. First, 14 degrees of average inclination? Wow. That is an average grade of about 25%. Second, I'm surprised recreational cyclists could manage this effort. Either Slovenian humans are exceptional, or the drive train was really dumbed down for spinning, or something is just plain wrong with the number presented to us. I have written in the past about the W/lb required to maintain a certain speed on a given grade.
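For anyone who wants to check the degrees-to-grade arithmetic, here is a minimal sketch. It assumes the 2km figure is measured along the road surface, which the source does not state, and the function names are my own.

```python
import math

def grade_percent(angle_deg):
    """Convert a slope angle in degrees to percent grade (rise over run)."""
    return math.tan(math.radians(angle_deg)) * 100

def elevation_gain_m(road_distance_m, angle_deg):
    """Vertical gain over a distance measured along the road surface.

    Assumption: the quoted track length is road distance, not horizontal distance.
    """
    return road_distance_m * math.sin(math.radians(angle_deg))

print(f"{grade_percent(14):.1f}% grade")            # roughly 25%
print(f"{elevation_gain_m(2000, 14):.0f} m gained")  # vertical gain over the 2km track
```

If the quoted angle is right, the riders would have climbed several hundred metres in 2km, which is why the number looks suspicious for recreational cyclists.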


So What Does The Data Look Like?


The authors go on to claim they gathered a "vast quantity of data" but for the sake of the reader's reading convenience, they picked 3 'random' data points corresponding to 3 cyclists, for each class of cyclist. I guess this is a solid example of where you can't really thank people for their kindness :).

Here are the numbers :

Fig 2 : 3 randomly selected cyclists in each class showed the above numbers with and without an E-hub. And how were they randomly chosen? No indication, so could we not say this is an example of data mining?

Fig 3 : % differences in heart rate and power between the two hubs.

Fig 4 : % differences in average blood lactate between the two hubs.


Right off the bat, I see this is poorly presented data, at least by professional standards. On the surface, I can come up with 4 weaknesses :

1. Sample Points & Averages : There's a rule of thumb in good statistics. You need a minimum of about 30 sample points before you do descriptive analysis on them to explain trends.

Take a look at the amount of power these cyclists are producing on this so-called 25% grade, 1.2 mile track. Professionals are producing puny average power outputs while recreational and amateurs are easily rivaling them, not only in power but also in speed.

This leads me to question firstly how the authors classified and defined these cyclists. It seems to me from this meager amount of data that all three classes were almost equal in their cycling abilities?

I also have to say that averages can fool you if data jumps all around the place wildly. For the meager sample points presented above, you can see that the average power is pretty sensitive to outliers.

In fact, if we had been handed 30 sample points or more for each class of cyclist, it is possible the data would have shown a lower average power, which would have reduced the resultant power differences between the E-Hub and the classic hub. Any guarantee that's not the case? The authors haven't proven it here but go on to artificially bump up the averages using just 3 data points mined from here and there. Furthermore, their conclusions about the apparent efficiency increase with the E-Hub are only relevant for these 3 sample points.

2. Spread : Closely following the absence of more samples is the question, what's the spread and deviation of this "vast amount" of data? I don't have any idea of it as there's no indication of standard deviation. The data is meaningless. How can I tell if a majority of data points in this experiment are close to the average power output or not? What if outliers are pushing the average up?

3. Range : Because only one sample data point (for power, HR and lactic acid) has been presented to us for each cyclist, we have no idea of the true range, or the true maximum and minimum values that would be observed. The data point presented to us is just one of what could be many and they are all bound to vary, because that's how all processes are... they vary! Hence, the range could turn out to be pretty significant if we had more tests on the same individual.

4. Instrument & Measurement Error : Lastly, what about the instruments used? Were they calibrated properly and checked for accuracy against other power measurement systems? What's the bias in the system, if any? Are these numbers just the result of random variability or regression to the mean? It is often taken for granted by some that measurement systems (instrument+human operator) that produce such outstanding numbers are always pin-point accurate.
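To put a rough number on the sample size and spread concerns above, here is a small simulation sketch. The population figures (a 250 W mean with a 40 W standard deviation) are purely hypothetical values chosen for illustration, not data from the E-hub study, and the function name is my own.

```python
import random
import statistics

def spread_of_sample_means(n, trials=2000, mean_w=250.0, sd_w=40.0, seed=1):
    """Standard deviation of the average power across many samples of n riders.

    mean_w and sd_w describe a hypothetical population (assumed values).
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    means = []
    for _ in range(trials):
        sample = [rng.gauss(mean_w, sd_w) for _ in range(n)]
        means.append(statistics.fmean(sample))
    return statistics.stdev(means)

for n in (3, 30):
    print(f"n={n:2d}: average power wobbles by about {spread_of_sample_means(n):.1f} W")
```

Under these assumptions, the average of 3 riders scatters roughly three times as widely as the average of 30 (the spread shrinks with the square root of the sample size), so a difference of a few percent between two hubs can easily sit inside the noise of a 3-rider average.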

I simply have to conclude that this data, so far, is just meaningless to me. The rest of the data that follows on the webpage, gathered on an indoor ergometer, suffers from exactly the same types of weaknesses I have mentioned. These are basic rules to follow in statistics and I'm surprised they weren't followed in this case.

The product itself may be great. I cannot disagree for certain there. But the numbers don't show me much so far. Thus, I think the declaration that this hub system really improves the efficiency of a cyclist compared to what we usually use must be taken with a handful of salt.



* * *

9 comments:

Phil said...

Ron,

The great Nobel Prize winning chemist Ernest Rutherford once remarked : "If your experiment needs statistics, you ought to have done a better experiment."

Classic!

Smudge said...

Insightful post as always. From the look of the power and HR numbers for these cyclists, it seems to me as if they were already pretty efficient and fit to start off with, even before they put the hub in there. To believe that recreational cyclists maintain their HR at a mere 167 bpm on a 25% grade track is surely way off the charts for me.

Stan said...

"Take a look at the amount of power these cyclists are producing on this so-called 25% grade, 1.2 mile track. Professionals are producing puny average power outputs while recreational and amateurs are easily rivaling them, not only in power but also in speed."

This is true, except you overlooked that last column with lactate values shows pros produce much lesser amounts than recreational and amateur cyclists. Which means they are inefficient while producing the same power. I don't know how the e-hub affects all this though, no certainty as you say.

Stan said...

Ah, excuse. I meant recreational and amateur cyclist are more inefficient than pro's.

Anonymous said...

This is all pretty hilarious to me. The efficiency claims are true for just 3 sample data points. What's the real story, Dr. Who? Plug in all those 100's of data points you say you've got then give us the percentages. That's the real deal. :-))

Jenny said...

I think you've been a little too harsh on this product. Have you tried it yourself?

T. Myers said...

I'm a doubting Thomas when I see hoopla like this so this post comes as a great lesson! Correlation is not always causation.

tOM Trottier said...

Would seem much simpler to have the crankarm be a spring.

tOM

Anonymous said...

This reminds me of the crazy cranks & components I'd see in the Taiwan section of InterBike. One set of cranks had curved crankarms...Nothing would actually change in bb to pedal axle position/alignment (ie: pedals are still in the same alignment as normal). This didn't stop them from claiming some outrageous increase in power output. Good for a laugh, and I'm still waiting to see a set... anywhere.