So as you all should know, today there has been a lot of talk about the new hottest pepper in the world. The most prestigious hot pepper institute in the world, the New Mexico State University Chili Pepper Institute, had a sample of the Trinidad Scorpion Morouga Blend test at over 2M SHU. So what does that mean? What should you expect? Is it the hottest pepper in the world?

Well before we break this down, let's take a little peek at the numbers. Here they are:

We need to know how they got these numbers and what they mean before we analyze them. Here is how the experiment was carried out:

Four plants from each of the five super hot strains were selected at random. 25 pods from each of these plants were picked, dried, and ground. This leaves four samples of each strain, each containing the dried powder from 25 peppers. Each sample was divided up and analyzed at three different labs. One sample of the Trinidad Scorpion Morouga Blend tested at around 2 million SHU at one of the labs. To better understand what's happening here, I created a sample graph showing the results of the experiment. I estimated the data points based on the given max, min, and mean values, as well as past known results. This graph is only for the TSMB plants.

I want you to note a few things from this graph:

1. Each plant has three data points, one from each lab (estimated). They are 100 SHU apart for a total spread of 200 SHU.
2. This graph goes from 0 to 2M SHU. Note how incredibly far apart all of these data values are, and how small 100,000 SHU looks on the graph! This data is ALL OVER THE PLACE.
3. There are two means on the graph. The top (red) one is the TSMB mean, and the bottom one is the Chocolate 7 mean. Note how close together they are in comparison to the rest of the data points.

Based on these data points we can come to ONLY ONE conclusion, and a weak one at that:

The TSMB has produced the hottest chili sample ever tested by HPLC.

That's it. Nothing more. All of the other results are inconclusive, but they are fun to look at. The CPI took four samples, and used three labs to clarify the SHU reading of each one. This reduces the amount of error that they can have by using four standardized SHU measurements, but they still failed to use enough samples to make their results look worthwhile. Simply put, they accounted for the error of the HPLC measurements, but did NOT account for the large standard deviation of pepper heat between plants/pods.

What happens to our means if we add a fifth sample? All we need to drop the mean of the TSMB measurement down below the Chocolate 7 mean is one perfectly reasonable new data point. In fact, if a new sample of TSMB came in at 900,000 SHU (average from each lab) and all of the other means stayed the same, the Chocolate 7 would have the new highest mean.
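To make that arithmetic concrete, here is a minimal Python sketch. The two means below are illustrative placeholders I picked so the example runs; they are assumptions, not the figures reported by the CPI. Only the 900,000 SHU fifth sample and the count of four existing samples come from the discussion above.

```python
def mean_with_extra_sample(current_mean, n_samples, new_value):
    """Return the mean after adding one more sample to n_samples existing ones."""
    return (current_mean * n_samples + new_value) / (n_samples + 1)


# Hypothetical means in SHU (assumptions, NOT the study's reported values).
tsmb_mean = 1_200_000
choc7_mean = 1_170_000

# One new TSMB sample averaging 900,000 SHU across the three labs.
new_tsmb_mean = mean_with_extra_sample(tsmb_mean, 4, 900_000)

print(f"TSMB mean with a fifth sample: {new_tsmb_mean:,.0f} SHU")
print("Chocolate 7 now has the higher mean:", new_tsmb_mean < choc7_mean)
```

With only four samples, each new point carries a fifth of the weight of the new mean, which is why a single ordinary reading can flip the ranking.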

With only four samples, who's to say that if they tested 10 more plants that the Chocolate 7 wouldn't have tested 2.1M in one sample? Frankly, with a standard deviation this high we can't say anything about which one is hotter and which one can create hotter pods. However, this test study does give us a good idea of how hot these pods are. The data isn't useless. I would say that we can say with relative certainty that the Bhut Jolokia, 7 pot, and possibly even Trinidad Scorpion are not the hottest pepper in the world. I wouldn't say that we can be 100% certain, but we can at least have a decent amount of confidence. There's always the possibility that the lowest sample of the TSMB was a fluke and it in fact averages much higher and CAN be called the hottest in the world. Of course we will have to wait for future testing to see that.

For future tests, I would recommend a data set containing samples from at LEAST a dozen plants, with tests done on the Chocolate 7, TSMB, Trinidad Scorpion, and the Douglah. There should also be testing done confirming that the Brain Strain and TSMB are the same strain, otherwise the Brain Strain should be tested as well. This will give us a MUCH better picture of which pepper can in fact be called the hottest in the world. Once we have tested enough samples we can form a bell curve for each of the hottest strains and more accurately describe the range and average heat a chili plant will produce under a given set of conditions. It's a nice thought, but a lot of work.

Hopefully this helps you guys understand this study and helps you in finding the hottest chilies for yourselves. Quite frankly there are a lot at the top and we can't call any one pod the hottest at the moment. The only way to see what you like for heat and flavor is to try your own.

cool I didn't realize they were doing this, and yes, I would think if they really were testing for the hottest pepper in the world, and wanted to be anywhere close to credible (in order to have a more conclusive result) I would go with what you said and at the very least test a dozen plants.. I don't know why they wouldn't test 10 pods from 20+ plants or so.. but oh well.. and I don't know why they wouldn't test a few more strains.. either they were lazy or they didn't have the funding.. which really doesn't make too much sense because it doesn't seem that it would cost a lot to add a few dozen peppers to the mix lol

it also would have been really cool if they had the funds to get pepper plants shipped from different parts of the world that were known for super hot peppers!

also.. IS the TSMB the same as the brain strain?, because if it isn't I have some pepper seed searching to do this spring.. I want a few more of these rare/unique super hots lol.. especially the TS chocolate!

ps. oh and last question that I was going to ask you before, do they lose heat when you dry them?

I would say funding might have had something to do with it as I think the HPLC is expensive and/or time consuming to perform? Never done more than simple paper chromatography so I can't say but I've heard that it costs around $40-50 to send out a sample for testing. They sent out 40 for testing so that's a lot of money.

The TSMB is theorized by many to be identical to the brain strain. I might take samples of powder from both the TSMB and the Brains that I grow this year and send them to the CPI for genetic testing... that'd be cool as heck!

oh well that explains a lot, I didn't know how they exactly do the tests, (just googled it) and yeah I've done a few of those and if I remember correctly they didn't take too long doing only small samples, so I would think large samples would take a while, but mainly I think the machines are crazy expensive.. like enough where I would rather buy a car lol, at least that is what the professor made it sound like.. but anyways that makes sense.. that is interesting that they do it like that, but it makes sense.. I only wish that I was into peppers when I was at college at Eastern because I would have been growing a dozen or so chili plants in the huge/new greenhouse setup they have now in the new science building and then using the chem lab's HPLC tester to do my own tests haha.. wouldn't have gotten much other work done though, but it would have been pretty sweet.. oh well, maybe at the end of this summer I can go up there with some pods and ask my bio and chem professors if I could give it a shot.. if they are set up for it, if at all possible.. I was already planning on giving my bio professor who is a botanist one of the chili plants if he wanted for his big greenhouse

nice, thanks..I was thinking when walking around the house that I remember taking a while to extract what we put into the machines at school.. it's been a few years lol..

and Matt, (MRZ), is there a link to the article, I was looking for it online but it said that it was unavailable, and couldn't find it after a quick search.. thought it would be pretty easy to find.. but I guess not

bottom line, what you are saying is that the methods used in testing would not result in a statistically valid answer...

these tests, as you say, are indicators and not set in stone...

don't get me wrong here, because I am sure the folks performing these tests are a lot smarter and better trained than I am, but many questions about the testing come to mind...

5 different varieties, four different plants/variety, 25 pods/plant/variety that were ground into powder and split three ways for testing at three different labs...lots of room for error here IMO...and did they save any of the original sample powder?

Did each lab follow the exact same procedure? Did each lab use the exact same equipment from A to Z during their process? These two questions are what I would key in on for test evaluation, but as you know, I would never be able to get all this information...

I have to admit that it has been a very long time since I had to set sample size for testing and evaluation and the testing I did involved "Black Boxes" on fighter aircraft and probability of failure...but, I will tell you this, sample size of 4 is nowhere near enough for a statistically valid test...I have not taken the time to calculate an acceptable sample size for statistically valid testing but would be surprised if the sample size would be less than 70...I worked at the University of Florida's Dairy Research Unit back in the early 80s performing radioimmunoassays during some swine reproductive physiology research and believe we used a sample size of 100 and three different tests...all using the same equipment and performed by the same technician...results were not only consistent but were consistent between samples...
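For anyone who wants to play with the sample size question, here is a rough Python sketch using the textbook normal-approximation formula n = (z * sigma / E)^2. It assumes the plant-to-plant standard deviation (sigma) is known and that heat is roughly normally distributed, neither of which has been shown for these peppers. The 300,000 SHU standard deviation and the margins of error are hypothetical inputs chosen for illustration, not values from the study.

```python
import math
from statistics import NormalDist


def required_sample_size(std_dev, margin, confidence=0.95):
    """Samples needed to estimate a mean to within +/- margin.

    Normal-approximation formula; assumes std_dev is known in advance.
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # two-sided critical value
    return math.ceil((z * std_dev / margin) ** 2)


# Hypothetical inputs in SHU (assumptions, not measured values).
assumed_std_dev = 300_000
for margin in (100_000, 50_000):
    n = required_sample_size(assumed_std_dev, margin)
    print(f"+/- {margin:,} SHU at 95% confidence: about {n} plants per variety")
```

The point of the sketch is the scaling: halving the margin of error roughly quadruples the number of plants needed, so tight estimates get expensive fast.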

***as a side note, the results of my testing showed that a sow spikes progesterone (I think that's what it was) at exactly 30 days to the hour of pregnancy and that spike indicates the number of pigs she will have...i.e. the higher the spike, the more piglets running around****

I think the HPLC testing costs over $200 each from an independent lab...when you start talking 70 samples of 5 different varieties at three labs, you are talking bunches of money...using my guesstimated numbers results in 1050 tests, and even at $20 each that comes to $21,000...so you can see the magnitude of cost involved...and I would be surprised if any university would spend $20K on testing alone...

IMO, these numbers should be used as indicators of the potential for the different plants/pods and not cut and dried "fact"...it is a fact that one sample tested at 2 mil but so what, another sample tested a lot lower...

I have rambled enough...either way, Congrats to NMSU CPI for their continued research into the chili pepper business...

Mrz thanks for your analysis. It does put things in perspective. I wonder if the Trini Scorp tested was the Butch T?

There is one word in AJ's post that I would like to emphasize and that is "potential". They all have the potential of being number one on the heat scale. So many different variables involved in growing these monsters that I would dare someone to actually say their pepper is the hottest. Truth in advertising would insist that a seller of seeds say their particular product is capable of reaching record levels.

On a personal note I've grown, should say attempted to grow, the TS Morouga Blend the past two years with little success. Just can't seem to get many pods off of them. This year I decided to try a little harder and planted more seed than ever before. Now it's getting the label of the new king. I find that kind of strange. Cool, but strange.

Thanks AJ. This is exactly what I was getting at as a conclusion. I trust the CPI and I think that they are by far the leading experts in Scoville unit testing. I like HPLC for testing, and I think that the numbers that it gives can be very valuable. I also like that they used three labs instead of one to account for error, so I think their readings of each sample were pretty spot on and I could be okay with them using the mean of the three readings as a standardized measurement. I'd be even more comfortable with that if I knew that the labs frequently spit back similar readings, like one lab always had the hottest reading while another lab always had the least hot reading of the same sample. Then you can more accurately say that it is the manner in which the testing is run and not the accuracy of the testing that is skewing their results.

My problem doesn't lie with the HPLC, which I find to be a nice measurement; it is the wide variation between pods/plants that I think should be better accounted for with a larger sample size. But, like you said, it's all about the cost of running an experiment like that, which is just not worth the money. I'd almost rather have them run hundreds of data points in their own lab than waste money trying out other labs. It might be better to run 3 measurements of each sample in the CPI lab. Having a 2M SHU result might bring more money into the school though, and hopefully that helps fund more precise tests in the future.

I would say that the capability of reaching record levels is exactly what this test showed. Some fun numbers to look at too, but that is the main conclusion that we can draw here. Heck they had samples of the Bhut that beat the Butch T test from several months back. I would never call the TSMB the new king. I grew it last year and while there were a couple pods on there that made me want to claw out my tongue in pain there were others that I'd have trouble calling super hot. Just a confusing plant and I think the results emphasize that fact too.

Max-- I got most of my information from this article and from straining to read some of the poster. I managed to read some of it, which is how I learned how they did the testing and got their results:

I don't have good answers to any of those questions because I don't have access to the precise lab procedures at each location or even the concrete data from each lab. I'll try to answer them anyway:

1. Probably not but I think they are all calibrated using the 16M benchmark for pure capsaicin.
2. Not sure what you mean here by 'validated'. I would say if you get 3 tests back within a decent margin of error then you can say that your results are pretty conclusive, but I'm not sure how spread out their results were. There is no standardized lab to 'validate' the results.
3. I can't answer this one but I would like to think that they had a good cleaning procedure to avoid contamination. I'd like to think that the folks at the leading chili pepper research institute don't have their heads up their asses.

Without seeing the concrete data from each lab and each sample I can't really answer your questions that well but when the hort science paper comes out we might be able to answer more of these questions.

Method validation is a process by which you demonstrate the ability of your test method to perform to a set of agreed upon acceptance criteria. You do this by successfully executing a validation protocol, which sets out specific acceptance criteria. For quantitative methods like this one, it means setting up agreed upon acceptance criteria for things like ruggedness, robustness, repeatability, linearity, and limit of quantitation (limit of detection is also typical, but maybe not so relevant here, since we're more worried about the high end of things.)

The way it usually works, a method is written up, tried out to get a feel for its expected performance, and then a protocol is written up, with a series of tests designed to evaluate that performance, typically using "synthetic samples" to show it gives the values predicted. The protocol is executed, a report is written up. At every step, the work is checked by additional subject matter experts and others to ensure the quality and reliability of the work. This is a very rough overview of method validation following Good Laboratory Practices, or 'GLP'. Once you've done this in one lab and have it well documented, you then proceed to "Method Transfer", where you show that another lab can deliver similar performance. It's a royal pain in the ass, but it's also the only way to understand how reliable your data is.

There's no evidence that they did anything like that. Method validations are tedious and expensive, and unless there's a commercial interest/regulatory requirement, you're not gonna just do it for fun.

(That shouldn't be interpreted as me saying they're bad people, or that they're doing bad science. It just means that their testing hasn't been run through the paces that you'd expect in a pharma lab.)

I really want to thank you, mrz1988, for a really interesting topic -- anything that gets people thinking critically about analytical data is a great thing.

As soon as I read about a measurement of 2 million SHU, my spidey senses started tingling that there were a few details missing. It seems dishonest to use whatever the highest value in a dataset is as the official value, since the mean is still quite a bit lower. I'd feel better about it if they'd actually published all their values, so that the statistical significance could really be evaluated. They mention that they used Duncan's multiple range test, but for chromatography results, I would rather see the use of an F-test to evaluate the claim that one variety is hotter than another. A Student's T-test would also be handy to evaluate whether or not that 2 million value is an outlier. It might be, then again, it might not be.
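Since the raw values were not published, here is only a sketch of what a two-variety comparison could look like once they are. It computes Welch's t statistic (the unequal-variance variant of Student's t-test, swapped in because the scatter looks very different between varieties) using only the Python standard library. The per-plant readings are made-up numbers in thousands of SHU, purely for illustration, and the function name is hypothetical.

```python
import math
import statistics


def welch_t(a, b):
    """Return (t statistic, approximate degrees of freedom) for two samples."""
    # Squared standard error of each sample mean.
    se2_a = statistics.variance(a) / len(a)
    se2_b = statistics.variance(b) / len(b)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (len(a) - 1) + se2_b ** 2 / (len(b) - 1)
    )
    return t, df


# Made-up per-plant means in thousands of SHU (NOT the study's data).
tsmb = [800, 1000, 1200, 2000]
choc7 = [900, 1100, 1200, 1400]

t_stat, dof = welch_t(tsmb, choc7)
print(f"t = {t_stat:.3f}, df = {dof:.2f}")
```

With only about four degrees of freedom, |t| would need to be somewhere around 2.8 or higher to reach the usual 5% two-sided threshold, so a difference in means of this size, with this much scatter, would not come close.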

By the scatter, we can come up with some ideas for how many datapoints would be needed in a second round of testing to really test these claims. At this point, they've done enough work to start a decent DOE, or at the very least, a second round of more detailed testing. I really hope Paul Bosland & Co. will make their poster available as a PDF soon, and will publish something more extensive (and peer reviewed) in the near future.

I'm always annoyed by studies like this. Yes, a lot of sample design comes down to monetary concerns, but there are certainly other questions to be asked. First of all, I don't know what the growing conditions were like here, but if we're going to be comparing these plants I would like to see the individual cultivars randomly assigned within the same plot. Sending the samples off to several labs is nice, but it only accounts for analytical error and not so much experimental uncertainty.

We've begun to answer the question of the heat distribution between each variety, yes, which is what the sellers care about. However, at least as important, I also ask: What is the heat distribution within each variety? Within each plant? Is the data normally distributed? If not, I'd also be curious to know the median values.

Now let's go to a real-world scenario. Most of us, save the crazies out there (highly concentrated in this community), are not heavily controlling the growing conditions of all of our plants. Let's take samples of each variety from around the world from many growers, professional and amateur, and see what the distribution is like. Maybe then we can begin teasing out the answer to the question of what truly makes for a hotter pepper.