PhytoChemia Acta

Popularization

100% identification?

26 January 2019

Popularization – Alexis St-Gelais, M. Sc., chemist

Analysis reports for essential oil profiles usually state the total percentage of the oil that was identified. In our reports, as in the scientific literature, this total almost never reaches 100%. However, many other testing laboratories routinely report that they identified 100% of their oils. How is that possible?

It all comes down to a few editorial choices made by laboratories. These pertain to three things: how the total is calculated, how the chromatogram is integrated, and how the oil is injected. But let us start with some illustrations, since they are far more telling than a long discussion.

Here is a portion of a chromatogram of an orange essential oil.

Integrating a peak means “picking” it so that it is considered in the analysis. In the software we use to do that (Unichrom), integrated peaks show in green, and the apex of each is linked to the time axis at the bottom by a small line. So each small line you see means that a peak has been integrated. Looking at the above chromatogram, we can say that I have been pretty thorough in integrating. See how I picked even very small, barely visible peaks? I have a total of 14 peaks, and I am confident we could identify them all. Yay, 100% identification!

But then, what happens if we zoom in a bit?

Drat! More peaks appear. So I picked them, too – after all, they are part of the sample. Now I am at 29 peaks. Perhaps I could still identify them all, though. So the 100% identification is still within reach.

But hey, why not zoom in even more?

Ouch, now there are even more peaks… I counted 65 of them on this image! And we could zoom in even more, although you can see that we are closing in on the signal noise – that is, where things get blurry because the instrument’s signal is not completely flat (despite what we could think based on the first chromatograms). Here is one last zoom to show the process, with about 88 peaks:

If I were not limited by the power of my detector, it seems this trend would never stop. If I plot the data observed above (number of peaks as a function of “zoom”), I obtain this kind of trend:

You can see a rough S shape in this trend. At the lowest number of peaks, zooming out further simply leaves you with the couple of main peaks of the essential oil, and you can hardly go lower without making the analysis useless. If you zoom in more, you tend to reach a cap on the number of peaks, because the smallest ones progressively get lost in the detector’s noise.

In the above examples, the zoom can be seen as a metaphor for the integration parameters. When dealing with chromatograms, analysts use tools built into their software to pick peaks automatically. To do so, they choose a series of parameters that guide the picking algorithm. By tweaking these parameters, you can end up with more or fewer peaks. At PhytoChemia, we then add some manual intervention to correct discrepancies in the integration process, at a fixed “zoom” level. Labs will normally define a setpoint below which peaks are simply ignored. If that setpoint is high enough, chances are that all integrated peaks will be identified. We can then reach 100% identification, because smaller, non-identified peaks are simply dismissed. This is how integration parameters can influence an identified total.
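To make the effect of the setpoint concrete, here is a minimal sketch in Python. The peak table is entirely hypothetical (the areas and the identified/unidentified flags are invented for illustration and do not come from the orange oil shown above), and `identified_total` is just a helper name chosen for this example, not a function of any chromatography software.

```python
# Hypothetical peak table: (relative area, identified?).
# Illustrative values only, not measured data.
PEAKS = [
    (9500.0, True),
    (190.0, True),
    (120.0, True),
    (60.0, True),
    (25.0, True),
    (8.0, False),
    (5.0, True),
    (2.0, False),
    (1.0, False),
]


def identified_total(peaks, min_area):
    """Return (number of peaks kept, percent of kept area that is identified).

    Peaks with an area below min_area are ignored, as with a lab's setpoint.
    """
    kept = [(area, known) for area, known in peaks if area >= min_area]
    total = sum(area for area, _ in kept)
    if total == 0:
        return 0, 0.0
    identified = sum(area for area, known in kept if known)
    return len(kept), 100.0 * identified / total


for setpoint in (20.0, 4.0, 0.5):
    n, pct = identified_total(PEAKS, setpoint)
    print(f"setpoint {setpoint}: {n} peaks integrated, {pct:.2f}% identified")
```

With the highest setpoint, only the five largest peaks are kept and all of them happen to be identified, so the total reads 100%. Lowering the setpoint brings in small unassigned peaks, and the very same sample drops slightly below 100%.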

Furthermore, the more peaks you try to identify, the more “insignificant” the added peaks become. Taking the same orange oil chromatogram portion as above, I drew a graph showing this trend. Keeping only the 65 largest peaks, I set the sum of these 65 peaks as 100%. Then, I plotted the contribution of the first peak (limonene) to the total identified: 95.5% of the total signal for these 65 peaks is due to limonene alone. This is quite expected in orange oil. Adding the second largest peak brings the total to 97.4%, and so on until the 65th peak, where 100% is identified.

Added peaks quickly contribute only marginally to the total: 99% identification is reached by the 7th peak, 99.5% by the 13th, and 99.9% by the 38th (keep in mind that this is only a portion of an orange oil chromatogram; I discarded the rest for the sake of simplicity, so the example is not representative of a whole orange oil). But to truly reach 100%, I must identify all 65 peaks. This is hard to do, because at some point we are dealing with extremely weak signals. Let us say we can identify 25. If I had defined 100% as the sum of the 25 largest peaks, I would have reached 100%. At PhytoChemia, however, we “over-integrate” relative to what we can identify, always leaving a number of small unassigned peaks. With all 65 integrated peaks as the reference, the same 25 identified peaks give a total identified of 99.77%. This is how the way you calculate the total identified comes into play: change your frame of reference, and the numbers shift. To some extent, this is a matter of taste.
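The arithmetic behind this can be sketched in a few lines of Python. The areas below are hypothetical: I chose the first two so that the running total starts at 95.5% and 97.4% as in the orange oil example, but every other value is invented, and the list has 8 peaks rather than 65. The function names are likewise only illustrative.

```python
from itertools import accumulate

# Hypothetical peak areas, largest first (illustrative only, sum = 1000).
AREAS = [955.0, 19.0, 10.0, 6.0, 4.0, 3.0, 2.0, 1.0]


def cumulative_percent(areas):
    """Running share of the total signal as peaks are added one by one."""
    total = sum(areas)
    return [100.0 * running / total for running in accumulate(areas)]


def identified_percent(areas, n_identified, n_reference):
    """Share of the n_reference largest peaks covered by the n_identified largest."""
    ordered = sorted(areas, reverse=True)
    return 100.0 * sum(ordered[:n_identified]) / sum(ordered[:n_reference])


print([round(p, 1) for p in cumulative_percent(AREAS)])

# The same five identified peaks, measured against two different references:
print(identified_percent(AREAS, 5, 5))           # reference = identified peaks only
print(identified_percent(AREAS, 5, len(AREAS)))  # reference = every integrated peak
```

Nothing about the sample or the identifications changes between the last two lines. Only the denominator does, and that alone is enough to move the result from exactly 100% to a little less.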

Finally, if I were using an older GC-FID instrument with the very same injection parameters, the detector noise would be higher and the smallest peaks would go missing. If, using the same instrument, I injected a smaller volume of oil onto the column, the smallest peaks would also “vanish” into the instrument noise. Conversely, injecting more sample and using more sensitive instruments would increase the number of peaks I could theoretically integrate should I want to (it would give more height to the S graph shown earlier). This is how the way the oil is injected has an impact on the identified total.
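Here is a rough Python sketch of that idea, under simplifying assumptions of my own: peak height is taken to scale linearly with the amount injected, and a peak counts as detectable when it rises at least three times above the noise level (a common rule of thumb, not a parameter taken from our method). The heights and noise values are hypothetical, in arbitrary units.

```python
# Hypothetical peak heights in arbitrary detector units (illustrative only).
HEIGHTS = [10000.0, 200.0, 50.0, 12.0, 4.0, 1.5, 0.8]


def detectable_peaks(heights, noise, injection_scale=1.0, min_snr=3.0):
    """Count peaks whose height, scaled by the injected amount, clears the noise.

    Assumes heights scale linearly with injection_scale and that a peak must be
    at least min_snr times the noise level to be picked.
    """
    return sum(1 for h in heights if h * injection_scale >= min_snr * noise)


print(detectable_peaks(HEIGHTS, noise=0.2))                       # sensitive detector
print(detectable_peaks(HEIGHTS, noise=1.0))                       # noisier detector
print(detectable_peaks(HEIGHTS, noise=0.2, injection_scale=0.1))  # ten times less oil
```

In this toy case the sensitive detector sees all seven peaks, the noisier one loses the two smallest, and injecting ten times less oil on the sensitive detector loses three. The oil is the same every time.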

We have decided at PhytoChemia to be relatively ambitious in the number of peaks we integrate (hence the relatively long lists of compounds in our reports, especially since early 2018). This means we also integrate lots of peaks that we cannot assign, which explains why we never reach 100% identification. Orange oil is one of the simplest cases, and we regularly integrate over 250 peaks in a single oil, many of which are unknowns. Our opinion is that these unassigned peaks are still part of the sample and, despite escaping efficient routine identification, should not be omitted for the sake of a perhaps misleading “100%” identified result. Other laboratories have different approaches to this. What do you think is the best technique?