Analogous to chemometrics, geochemometrics can be defined as the science resulting from the combination of statistics, mathematics and computation with geochemistry. This term in Spanish geoquimiometría has already been explicitly used in the literature. Here I elaborate on the numerous basic subjects or areas that the geochemometrics should cover. These include, but are not limited to, the following research topics: data quality, regressions, robust methods, outlierbased methods, significance tests, error or uncertainty propagation in diagrams through Monte Carlo simulation, correlation coefficient, petrogenetic modeling, and geothermometers. Equations for uncertainty propagation in analytical work have also been proposed; similarly, new indications are provided on how to calculate and report the sensitivity and limit of detection of analytical experiments. The conventional linear correlation coefficient, though useful for noncompositional data, is not recommended to be used for interpreting geochemical data. Because compositional data represent a closed unitsum constrained system and ternary diagrams impose a further unitsum constraint on any experimental data, these diagrams become statistically unsuitable to handle experimental data, whether compositional or of continuous variable type. Error propagation through Monte Carlo is reported for the first time to illustrate the inconvenience in using such ternary diagrams for compositional data, and an alternative logtransformed bivariate diagram is proposed to replace (or at least complement) ternary diagrams. Topics of further research have been identified, in particular, those applicable to all science and engineering fields.

According to Wikipedia, the free encyclopedia, the term chemometrics can be defined as "the science of extracting information from chemical systems by datadriven means. It is a highly interfacial discipline, using methods frequently employed in core dataanalytic disciplines such as multivariate statistics, applied mathematics, and computer science, but to investigate and address problems in chemistry, biochemistry and chemical engineering. In this way, it mirrors several other interfacial 'metrics' such as psychometrics and econometrics."

The term chemometrics was first coined by Wold almost 40 years ago in 1972. Several reviews have been written on the subject (e.g., Geladi and Esbensen, 1990; Esbensen and Geladi 1990; Lavine and Workman, 2008). Furthermore, numerous books on chemometrics (e.g., Otto, 1999; Miller and Miller, 2005; Bruns et al., 2006) and the journals "Journal of Chemometrics" and "Chemometrics and Intelligent Laboratory Systems" are dedicated to this subject. Other journals, such as "Analytica Chimica Acta", have a section on chemometrics.

Analogous to chemometrics, geochemometrics can be defined as the science resulting from the combination of statistics, mathematics and computation with geochemistry. This term in Spanish geoquimiometría has already been explicitly used in the literature (see Verma, 2005).

In the present paper, my aim was to identify the main areas of thrust to illustrate geochemometrics and point out future investigations that could lead to improvements in Earth sciences. The most important topics are as follows: Monte Carlo simulation, data quality, instrumental calibration, sensitivity and limits of detection, error propagation in ternary diagrams, discrimination diagrams, and geothermometers.

MONTE CARLO SIMULATION

The term "Monte Carlo" was first coined in 1940s after the Monte Carlo casino in Monaco. Monte Carlo methods are a class of computational algorithms that rely on repeated random sampling to compute their results. According to Hammersley and Handscomb (1964) the name and systematic development of Monte Carlo methods date back to about 1944. Because of their reliance on repeated computation of random or pseudorandom numbers, these methods are most suited to calculations by a computer. The only quality usually necessary to make good simulations is for the pseudorandom sequence to appear "random enough" as discussed by Law and Kelton (2000) and exemplified by Verma and QuirozRuiz (2006a). Actually, Monte Carlo methods began to be investigated only after the availability of electronic computers from 1945 onwards. In the 1950s the early development took place in relation to the hydrogen bomb and soon became incorporated in the areas of physics, physical chemistry and operational research as well as many other scientific and engineering fields. Because Monte Carlo methods require very long sequences of random numbers, pseudorandom number generators began to be developed, which were quicker to use than the tables of random numbers previously used for statistical sampling. Güell and Holcombe (1990) presented an account of Monte Carlo techniques (experiments on random numbers) that might be useful for analytical applications.

In the area of small size sampling (up to 30) from a normal distribution, Dixon (1950, 1951, 1953) and Grubbs (1950) pioneered the field of Monte Carlo simulation by estimating critical values for their respective discordancy algorithms (Barnett and Lewis, 1994; Verma, 1997, 2005). This initial work (Dixon, 1950, 1951; Grubbs, 1950) was carried out in U.S.A. for military purposes. Nevertheless, those critical values were obviously approximate and were quoted to two to three decimal places. Later, with the availability of faster computers Grubbs and Beck (1972) extended the earlier critical values of Grubbs test to larger sample sizes of up to 147. Other workers (e.g., Rosner, 1975, 1977; Prescott, 1979; Jain, 1981) also simulated critical values for single to multipleoutlier discordancy tests.

More recently in Mexico, Monte Carlo simulation has provided significant advancement for obtaining new critical values of discordancy tests for very large sample sizes up to 30,000. Thus, in a series of papers (Verma and QuirozRuiz, 2006a, 2006b, 2008, 2011; Verma et al., 2008a), Verma and coworkers reported highly precise and accurate critical values for 33 discordancy test variants.

]]>
Monte Carlo methods should be useful for regressions, instrumental calibrations, and other purposes. As an example, EspinosaParedes et al. (2010) applied this approach for the evaluation of functioning of nuclear reactors. Another novel application of Monte Carlo simulation is presented below in the section of evaluation of errors in "Ternary diagrams".

DATA QUALITY

Data quality in Earth sciences in general and geochemistry in particular should be an important area of research that could be considered an integral part of the science of geochemometrics. When we talk of data quality, we are in fact considering two main aspects precision and accuracy of results. The first parameter depends on the quality of instrumental calibration, for which different regression models have been used, as well as of the measurement of the "unknown" material. These aspects will be further discussed in the next section. The second parameter (accuracy), on the other hand, requires adequate reference frame for its quantification. The other important parameters such as uncertainty and traceability are closely related to the precision and accuracy. The uncertainty of a parameter depends on the analytical error expressed in terms of the standard deviation, the number of measurements used for computing the standard deviation, and the Student t value for the appropriate degrees of freedom of the experiment. The section of "REGRESSIONS" below gives more information on this topic. The traceability refers to ongoing validations that the measurement of the final product conforms to the original standard of measurement; therefore, an adequate reference material or materials or synthetic standards are required.

Because geological materials, in general, represent the most complex matrices to be analyzed (theoretically, for example, rocks contain all stable and unstable or longlived radioactive elements of the Periodic Table), inherent in the data quality are the analytical aspects such as instrumental sensitivities and limits of detection (LODs) for different elements. Sensitivities are seldom reported by researchers, but LOD reports are more common. Unfortunately, there is no consensus regarding how to quantify the LOD values (e.g., see IUPAC, 1978; Ferrús and Egea, 1994; Faber and Kowalski, 1997; Kump, 1997; Mocak et al., 1997; Zorn et al., 1999; del Río Bocio et al., 2003; Miller and Miller, 2005). According to the Vocabulary of Metrology (VIM), LOD can be defined as follows: "Measured quantity value, obtained by a given measurement procedure, for which the probability of falsely claiming the absence of a component in a material is p, given a probability a of falsely claiming its presence. IUPAC recommends default values for a and P equal to 0.05". Nevertheless, the estimation of LODs is generally not carried out chemometrically, i.e., a sufficiently large number n of measurements are not involved (at least n=30 has been recommended  see the following paragraph).

During the first decade of this century, Verma and coworkers, in a series of papers (Verma et al., 2002, 2009a; Santoyo and Verma, 2003; Verma and Santoyo, 2003a, 2003b, 2003c, 2005; Santoyo et al., 2007), demonstrated that the LODs for practically the entire Periodic Table, especially the rareearth elements, show a systematic behavior likely governed by elemental abundance in the universe and indirectly by the well known oddeven effect of nuclear stability (the OddoHarkins rule; see Kaplan, 1963 or Verma et al., 2009a), and that the LOD should be determined from n=30 or more measurements. Thirty measurements as a minimum are justified from the consideration of the Student t value (see table A1 in Verma, 2005), which must be taken into account for obtaining uncertainty estimates  see the section of "Sensitivity and limit of detection (LOD)".

The oddeven systematic behavior of LODs has been independently confirmed by other workers (e.g., Tsakanika et al., 2004; RodríguezRíos et al., 2007) and discussed in a review article by Bacon et al. (2006). More work on these lines particularly that leading to a theoretical explanation of such a systematic behavior, is highly desirable. The suggestions on how to calculate LOD are given in the next section.

To obtain accuracy estimates or traceability in the analysis of geological materials, it is mandatory to analyze appropriate, preferably certified, reference materials. Unfortunately, in spite of a large number of reference materials available for geochemical analysis (e.g., see Potts et al., 1992; Govindaraju, 1994; Jochum and Bruckner, 2008; Jochum and Nohl, 2008), none of them is certified for all chemical components, not even for all components of interest in a geochemical study. In fact, this was the objective set forth by one of the pioneers K. Govindaraju during more than two decades (19701995), but unfortunately, this aim was never achieved. The international community should pay proper attention to this geochemometrics aspect of fundamental research. In my opinion, instead of proposing new reference materials, we should first concentrate on a few selected reference materials to try to certify them for practically the entire Periodic Table, or at least all those chemical elements that would be useful in geochemometrics, for example, petrogenetic modeling, geothermics, or multidimensional discrimination diagrams.

To start with, if we could count on at least a few wellcertified geochemical reference materials for most elements of geochemometric interest, we might be able to make it mandatory that the Earth science community report their analytical data adjusted to some of these certified reference materials as is customary in isotope geochemistry for reporting, for example, Sr and Nd isotopic compositions. At least, we could reach a consensus on a few geochemical reference materials, which should always be analyzed and reported in geochemical studies for quality control purposes. The very diverse matrices to be analyzed in geochemistry make this simple proposal less viable. Nevertheless, if, for example, all laboratories analyzing basic and ultrabasic magmas were invited to report data (mean, standard deviation, and number of measurements) on just one reference material out of BIR1, BHVO1, or BHVO2 from the U.S. Geological Survey (U.S.A.), or JB1a or JB3 from the Geological Survey of Japan (Japan) and if that particular material were well characterized (certified) for most, if not all, elements of geochemical interest, this would be a very important step forward in geochemometrics for minimizing systematic errors and adequately handling large databases such as those attempted by Agrawal et al. (2004) or Verma et al. (2006). Thus, the individual data from different laboratories or publications could be adjusted for possible bias before their use in interpretations.

]]>
REGRESSIONS

Geochemometrics could also address the topic of regressions related to instrumental calibrations for analysis of geological materials such as rock, ash, soil, mineral, water, and gas, among others. Traditionally, such calibrations have been achieved through an ordinary leastsquares linear regression (OLR) model (for more details on regression techniques see the classic book by Draper and Smith, 1998). However, weighted leastsquares linear regression (WLR) models should be considered more appropriate for this purpose (e.g., Mahon, 1996; Baumann, 1997; Zorn et al., 1997; Asuero and González, 2007). Alternatively, robust regressions might be more desirable than OLR models (e.g., Hinich and Talwar, 1975; Rousseeuw and Leroy, 1987).

More recently, this adverse situation of calibrating instruments through OLR is changing especially for the calibration stage of data collection, because WLR models are being increasing applied prior to geochemical analyses (e.g., Santoyo and Verma, 2003; Guevara et al., 2005).

Instrumental calibrations for chemical concentration measurements can generally be expressed as the following general linear equation:

where a is the intercept term with sea being its standard error and b is the slope (or sensitivity term) with seb being its standard error. The concentration term (xaxis, Conc) will also have standard error of sec, or standard deviation of sc associated to its estimation for individual calibrator or reference material (RM), which should always be quantified. Similarly, the response term (yaxis, Resp) will have standard error of ser, or standard deviation of sr for each calibrator, which can also be estimated.

I propose that we should try to switch from the error concept to uncertainty or confidence limits in all such considerations although it is difficult to do so, because error propagation is a very commonly used term; in reality, we are instead dealing with uncertainty propagation.

Weighted leastsquares linear regression (WLR)

Our aim is to present the equations useful for calibration in the concentrationresponse (CR) space, where C and R are, respectively, the x (independent) and y (dependent) variables of xy regression line. Let us assume that we have a total number n of calibration reference materials (RMs) with concentration Ci and standard deviation SCi estimated from mi measurements (or replicates).

I recall that before the calculation of the central tendency (e.g., here Ci) and dispersion (e.g., SCi) parameters, it is mandatory to ascertain that all replicate (in this case, mi) measurements be free from discordant outliers, which can be easily done by computer program DODESSYS (Discordant Outlier DEtection and Separation SFStem; Verma and DíazGonzález, 2012). Both normal and lognormal distributions can be handled by the present version of DODESSYS that can handle data arrays of sizes up to 1000; future version of DODESSYS will be able to handle larger sample sizes of up to 30,000. This applies to all such situations described in this paper.

]]>
The uncertainty uCi in the concentration of ith RM can be calculated as follows:

where t(mi1) is the Student t critical value for (mi1) degrees of freedom for the desired confidence level (generally 99% or 95%, twosided), or significance level of 1% or 5% (a of 0.01 or 0.05). If the t value for the required degrees of freedom were not tabulated, it can be estimated from the interpolation equations put forth by Verma (2009), or obtained from other sources such as R Development Core Team (2009).

Similarly, let the response Ri for the ith RM have standard deviation of SRi estimated from qi measurements. Its uncertainty uRi can then be calculated from:

where t(qi1) is the Student t critical value for (qi1) degrees of freedom and the chosen confidence or significance level (99% or 95%, but this level should be the same as that used for equation (2) and other equations below).

In this way, we have n values of RM concentrations (Ci) and responses (Ri) with their respective uncertainties (uCi and uRi ). First, we obtain the equation using OLR model as follows:

where a and b are the intercept and slope (sensitivity) terms, with respective uncertainties ua and ub.

Now, our aim is to estimate the total uncertainty ui of each data point CiRi used in the calibration. The basic idea is to assign the uncertainty in the xaxis variable to the yaxis variable, thus making the xaxis variable as "errorfree" and yaxis variable as having the total uncertainty in the data point under evaluation. Therefore, this can be achieved approximately as follows:

]]>

Because the OLR (equation 4) is not the statistically appropriate model (for the reasons see e.g., Guevara et al., 2005 or Verma, 2005), the weighing factors for the WLR model can be estimated from the following equation:

That is, the sum of all weighing factors is equal to the total number of paired data n (CiRi). Thus, the WLR differs from the OLR in such a way that the weighing factors are redistributed as inversely of the total variance of the respective paired data, i.e., the data point with the lowest uncertainty receives the highest weight and vice versa.

The WLR equation is:

where the slope bw, its uncertainty ubw, intercept aw, and its uncertainty uaw can be calculated as follows:

For using the above equations, the following three equations are additionally required.

The weighted centroid of the concentration variable,

]]>

The weighted centroid of the response variable,

The response Rw of the ith concentration data point (CiRi) used in the calibration, which would actually correspond to the WLR equation,

The WLR calibration can then be used for measuring the response Rd of an "unknown" sample and its total uncertainty uRd and calculating its concentration Cd from the regression equation as well as its total uncertainty uCd, and not just its replication error, as is erroneously customary in chemistry or geochemistry. We recall that if we carry out r measurements of the response Rd and obtain its standard deviation as SRd, the total uncertainty will have to be first estimated from equation (16).

where t(r1) is the Student t critical value for (r1) degrees of freedom and the chosen confidence level, generally twosided 99% or 95%. This level should be the same as that used for WLR calibration.

Equation (17) below is postulated for the unknown and rearranged as equation (18) to obtain Cd and equation (19) or (20) for its total uncertainty uCd.

]]>
A better, more appropriate alternative to the uncertainty propagation equation (19) would be to resort to Monte Carlo simulation of equation (18). This is due to the fact that the covariance terms, difficult to determine, are not included in equations (19) or (20)  see the approximate equality sign (≈) in these equations. Thus, once the complete experiment involving instrumental calibration and measurement of the unknown is geochemometrically done, the approximate error propagation equations (Bevington and Robinson, 2003; Verma, 2005) are no longer required in this geochemometric proposal based on Monte Carlo approach.

The statistical significance of the total uncertainty can be expressed in the following inequality, i.e., the population mean µCd of the unknown sample would lie within the confidence interval of inequality expression (21) at the chosen confidence level (99% or 95%), as follows:

For interpreting geochemical data mostly OLR has been used, except in geochronology, for which WLR models, with errors on both variables, have been generally applied (e.g., York, 1966, 1969; McIntyre et al., 1966; Brooks et al., 1972; Mahon, 1996) and are in wide use even today at the data acquisition and calculation stages. If the above equations, for example, equation (8) for WLR calibration and equation (18) for the unknown sample, were routinely used along with Monte Carlo simulations, we should document a significant advancement in geochemometrics, because then we can use appropriate WLR models for data interpretation as well.

On the other hand, in some applications, such as for interpolation or extrapolation purposes, more complex quadratic to higherorder polynomial models might provide better fit to the data. Besides, logtransformation of xaxis (independent or explanatory variable) may be useful in some applications as has been documented for interpolation and extrapolation of Student t critical values, in which the xaxis was the degrees of freedom (Verma, 2009). This kind of transformation could become a more common technique to be used in regressions.

Sensitivity and limit of detection (LOD)

If WLR model is appropriately applied for calibrations, the sensitivity bw and its uncertainty ubw can be routinely reported from the regression equation (8).

The estimation of LOD from WLR can now be suggested as the most appropriate geochemometric method, modified after Mocak et al. (1997) and Miller and Miller (2005). Let us assume that a "blank" sample (containing no or "very little amount" of analyte) is used for estimating LOD. In most analytical instruments the blank need not contain any preestablished amount of the analyte of interest (i.e., its concentration can be assumed to be zero, C0=0 ), but in some instruments, such as chromatography, the software does not generally allow the integration of a background or blank signal, therefore, in may be necessary to use a solution containing a small amount of analyte (C0>0); nevertheless, only the smallest amount or concentration that will give a measurable signal should be used. Let the blank be measured k times (where k is recommended to be at least 30 by Verma and Santoyo, 2005; for more explanation see Student t value in table A1 of Verma, 2005 and the uncertainty equations in this work, for example, equation 22; the basic idea is that the uncertainty should be obtained from a large number k of measurements so that the t value approaches to that of infinity). The response R0 and standard deviation SR0 are estimated in the instrument under the same conditions as the calibration experiment (remember that R0 and SR0 should be calculated from discordant outlierfree data array).

The uncertainty uR0 of the blank response R0 can be calculated as follows:

]]>
Then, the uncertainty uC0 of the blank concentration C0 can be calculated as follows:

Because we are dealing with the uncertainty concept, I propose that the LOD could be defined as follows:

As stated earlier, for most instruments, C0 can be assumed as zero, in which case, LOD will be simply estimated from the total uncertainty uC0. Once again, instead of using the approximate equation (23) Monte Carlo simulation of equation (18) can be undertaken to better determine the LOD. Finally, a comparison of this newly proposed method with the conventional methods already in wide use should be undertaken, which will reinforce the new science of Geochemometrics.

ROBUST METHODS

Robust methods theoretically provide means of handling experimental data in the presence of discordant outliers, because they are considered robust against them (e.g., Barnett and Lewis, 1994; Maronna et al., 2006). Proponents of robust methods always claim their superiority over the outlierbased methods. For central tendency parameter estimates, many different robust statistics have been proposed such as median, mode, mean quartile, Gastwirth mean, trimean, trimmed mean, and Winsorized mean, among others (e.g., Verma, 2005).

It is not clear which robust parameters should be used for a particular application. When these different statistics provide "consistent" estimates for the central tendency parameter, it is immaterial which one is used, but in practice, they may differ significantly from each other, in which case no simple answer can be given as to which statistic is better than the other. Nevertheless, if robust estimators are used for central tendency parameter, adequate robust estimators such as median deviation (also called MADmedian absolute deviation) or interquartile range should also be used for the dispersion parameter as well. The relationship of robust dispersion estimates with the respective "population" standard deviation should also be established.

Thus, what is really required is an objective evaluation of the performance of robust and outlierbased methods, so that the user can independently decide which method or statistic to rely upon and under what circumstances. In other words, is it immaterial or does it matter to use robust or outlierbased methods for handling experimental data of truly or apparently continuous variable type? Not only geochemometrics but also all other scientific and engineering areas will benefit from this proposal. Unpublished preliminary Monte Carlo simulation results by Verma and coworkers point to serious problems in indiscriminately using the median as an unbiased estimate of central tendency of asymmetrically contaminated smallsized statistical samples; these findings will be documented elsewhere.

]]>

OUTLIERBASED METHODS

Such methods have been proposed to handle experimental data as alternative means to robust methods (Barnett and Lewis, 1994). For their statistically correct application, discordancy tests provide an important tool, such as the multipletest method (MTM) proposed and practiced by Verma (1997). Unfortunately, most people are not even aware of the fact that the most popular statistical parameters of mean and standard deviation belong to this outlierbased category. Therefore, the common practice of calculating these two parameters, without ascertaining that the statistical sample be drawn from a normal population, should be discouraged, and the readers should be made acquainted with the advantages of MTM of Verma (1997) and convenience of a suitable computer program (Verma et al., 1998; Verma and DíazGonzález, 2012).

Discordancy tests with new critical values

A large number of statistical tests have been proposed in the literature to detect discordant outliers in univariate data (Barnett and Lewis, 1994). An outlier is an observation that is extreme in an ordered array of a set of univariate observations. For structured data, an outlier is similarly defined as a structurebreaking observation. Nevertheless, an outlier can be a legitimate observation pertaining to the distribution of the rest of the data in the array, or it could be evaluated as discordant from certain statistical criterion. This is the reason why Verma and DíazGonzález (2012) have called their computer program as Discordant Outlier £>£tection and Separation SYStem (DODESSYS), which permits the application of 33 discordancy test variants at the strict confidence level of 99%. They also emphasized that DODESSYS should prove an important tool for applying outlierbased methods to experimental data, viz., DODESSYS must always be applied to the experimental data arrays before estimating the mean and standard deviation values. In fact, DODESSYS provides these estimates both before and after applying the selected discordancy tests, and for this reason alone, it is a highly convenient tool for correctly applying outlierbased methods. It also allows the user to have statistical estimates of discordant outliers, which can therefore be separately interpreted.

Besides masking and swamping effects (Barnett and Lewis, 1994), the detection of discordant outliers may depend on a series of factors such as error of type I when the null hypothesis is inappropriately rejected (e.g., Gawlowski et al., 1998; Efstathiou, 2006) and of type II when the null hypothesis is inappropriately retained (e.g., Miller and Miller, 2005).

What to do with a discordant outlier? In field studies, outliers when detected as discordant can be interpreted separately as manifestation of side events. In laboratory data, they can also arise from faulty instrumentation, inappropriate calibration, systematic errors, and large random errors, among other causes. Therefore, it is advisable to look for the actual cause of such discordancy. When an outlier is evaluated as discordant in a very small sample such as of three observations, I propose that one should carry out more experiments before fully accepting the outlier as discordant. If, by any chance, the new observations are compatible with the discordant outlier, the discordancy is once again evaluated, and either the first apparently discordant observation should be accepted as legitimate and the other observations are consequently declared as suspect values, or the experiment is further repeated to make sure the statistical outcome. If, on the other hand, the new observations are consistent with the initially dominant two observations, which should be more likely, the suspect observation can be definitely declared as discordant and excluded from further considerations. Thus, the cause of discordancy may also become clear. The discordant outliers should be isolated and not actually rejected but interpreted separately from the dominant distribution of data. The latter are thus used separately from the interpretation of the main event under study.

The discordancy procedure includes single as well as multiple outlier tests. The first group is called so, because the tests evaluate one observation at a time for its discordancy. Multipleoutlier tests for which new critical values are available (Verma and QuirozRuiz, 2006a, 2006b, 2008, 2011; Verma et al., 2008a) are designated as k=2, 3, or 4 types (they evaluate for discordancy two, three, or four observations at a time, respectively). Discordancy tests can be used consecutively until no more outliers are detected as discordant. Multiple outlier discordancy tests for more than four observations (k > 4; Barnett and Lewis, 1994) are also known, but new precise critical values have not been simulated.

I describe in detail the use of discordancy tests. Let us assume that we have an array of n univariate data xi for a parameter x, which can be rearranged in ascending order as an ordered array x(i) where (i) varies from 1 (lowest value) to n (highest value). The observation being tested by a singleoutlier test is either x(n) (upper outlier test  the highest observation or datum is being tested for discordancy) or x(1) (lower outlier test the lowest observation is being tested for discordancy), or any one of the two extreme observations, x^) or x(n) depending on which one is considered more distant from the central tendency parameter (extreme outlier test the extreme observation is being tested for discordancy). The upper or lower outlier tests were also called by Barnett and Lewis (1994) as onesided and the extreme outlier types as twosided. These authors, contrary to the current practice in chemistry, also opined that it is the type of test (onesided or twosided) that is more important rather than the onesided or twosided critical values. For multipleoutlier types, the observations being tested can be on one or both ends of the ordered array, for example, for k=2 type, we can test both x(1) and x(2), or x(n) and x(n1), or even x(1) and x(n). In the case of geochemistry, however, distinction should be maintained concerning the type of analytical method used, and it is not a good idea to test for x(1) and x(n) together as a group when these two observations were obtained by different analytical methods. I further suggest that in most geochemical applications, it would be safer to use singleoutlier tests instead of multipleoutlier types if one wishes to be conservative in declaring outliers as discordant.

For any statistical test, generally two hypotheses are set  null hypothesis H0 that means that the value(s) being tested was(were) derived from a single normal distribution as the remaining observations in the ordered array and alternate hypothesis H1 that the observation(s) being tested is(are) discordant, derived from a distribution different from that of the remaining, dominant or more numerous observations. The statistic corresponding to the given test is calculated and compared with the critical value at the chosen confidence level, being 99% according to my suggestion (see also Verma, 1997, 2005; Verma and QuirozRuiz, 2006a, 2006b) or 95% according to most books in chemistry such as Miller and Miller (2005). Most discordancy tests are considered significant for "greater than", i.e., if the calculated statistic is greater than the critical value, H0 is rejected and, consequently, H1 is accepted; in other words, the observation(s) being tested is (are) discordant outlier(s). But when the calculated statistic is smaller than the critical value, H0 is accepted and H1 is rejected, i.e., the observation(s) tested is (are) declared as legitimate. Unfortunately, "inverse" tests also exist because they are considered significant for "smaller than" (see Barnett and Lewis, 1994; Verma, 2005). The user must therefore be careful in using discordancy tests.

]]>
Critical values are required to apply any discordancy test (Barnett and Lewis, 1994; Verma, 1997, 2005; Verma et al., 1998). I recommend the use of new highly precise and accurate values published by Verma and QuirozRuiz (2006a, 2006b, 2008, 2011) and Verma et al. (2008a) for 33 discordancy test variants. As an innovation, these new critical values were reported along with individual standard error estimates. These authors also proposed new regression equations for computing critical values for those sample sizes that were not tabulated, thus permitting the application of these discordancy tests for all sample sizes up to 30,000. Artificial neural network (ANN) was used by Verma et al. (2008a) for arriving at bestfitted regression equations. Soon afterwards, Verma and QuirozRuiz (2008) used Statistica software to investigate alternative polynomial fitting in conjunction with natural logarithm transformation of sample sizes. Bestfitted polynomial equations were thus reported and favorably compared with the equations obtained by ANN. More recently, Verma and QuirozRuiz (2011) clarified that the critical values for skewness test N14 published earlier by them were of onesided type and reported more precise and accurate twosided critical values for this test.

I present the variation of critical values with sample size for onesided Grubbs type test N1 and Dixon test N7 in Figure 1 and for powerful twosided skewness and kurtosis tests N14 and N15 in Figure 2. The dependence of critical value on the sample size for all tests is so high (Figures 1a, 1c, 2a, and 2c) that polynomial regression in these diagrams does not provide any satisfactory fit to the data (see Verma and QuirozRuiz, 2008). Natural logarithmtransformation of the xaxis (sample size) results in significant "smoothing" of the curves (compare the earlier diagrams with Figures 1b, 1d, 2b, and 2d, respectively), which enabled Verma and QuirozRuiz (2008) to propose bestfit polynomial equations for interpolation and extrapolation of critical values from 100 up to 30,000. For all smaller sample sizes up to 100, precise critical values have been simulated, so there is no need for interpolation equations. Note, however, that the complex nature of curves (Figures 2b and 2d) even after logtransformation will not allow bestfit polynomial equations to be proposed for the entire range of sample sizes from 5 to 30,000. These are the reasons why Verma and QuirozRuiz (2008) proposed equations for sample sizes of 100 to 30,000 (and not for 5 to 30,000). Nevertheless, it is now possible to apply any of the 33 discordancy tests to practically any kind of experimental data without having any limitation on the sample sizes.

Another important point to note concerns the horizontal dotted lines denominated as 2s and 3s in Figure 1a. They represent, respectively, the two and threestandard deviation (2s and 3s) methods also used in the literature as discordancy methods based on "population" criteria, according to which all observations lying outside the range of mean±2s or mean±3s (2s and 3s methods, respectively) are simply rejected as discordant. The correct statistical procedure for small size sampling as carried out in most experiments would be equivalently the Grubbs test (N1) applied, respectively, at 95% and 99% confidence levels. Unfortunately, such statistically erroneous methods (see Barnett and Lewis, 1994), for example, 2s, have been applied in the literature (e.g., Gladney et al., 1992; Imai et al. 1995). Their use should, however, be abandoned in favor of Grubbs test N1 as has already been suggested by Verma (1998a), Verma and QuirozRuiz (2006b), and Verma et al. (2008a). Hayes et al. (2007) have also independently criticized and discarded such standard deviation methods based on population criteria. The 1s method sometimes practiced for handling experimental data should be considered worse and statistically erroneous than the 2s or 3s methods.

Discordancy tests without new critical values

A few more discordancy tests, viz., Tietjen and Moore's statistic (Tietjen and Moore, 1972), Shapiro and Wilk's statistic (Shapiro and Wilk, 1965; Shapiro et al., 1968), twosided test for extreme outlier using a robust estimator of standard deviation (Iglewicz and Martínez, 1982; no tabulated critical values are available), and consecutive or recursive test of multiple outliers (Rosner, 1975, 1977; Jain, 1981), have also been proposed, but only old less precise critical values (generally accurate to two decimal places only) are available for their application.

In fact, Tietjen and Moore's procedure (Tietjen and Moore, 1972) is similar to the Grubbs tests of S2(n)/S2 to S2(n), (n1), (n2), (n3)/S2 types (test N4 k=1 to k=4 types; see Verma, 2005 for more details on test N4 statistics), and the new critical values simulated by Verma and QuirozRuiz (2006b, 2008, 2011) and Verma et al. (2008a) are applicable when the outlying observations being tested are on either end of the ordered data array. For k=510, only approximate critical values are at present available (Tietjen and Moore, 1972). The multiple or many outlier test RST of Rosner (1975, 1977) is also similar to the Grubbs statistics N2 and N3 (see Verma, 2005), with the difference that RST should be computed from trimmed mean and trimmed standard deviation values.

Therefore, new more precise and accurate critical values are required for completing the MTM of Verma (1997) and significantly improving this line of geochemometric research.

It is not clear if we should apply the concept of discordant outliers to raw compositional data without any transformation as done by Verma and coworkers (e.g., Verma, 1997, 1998a, 2005; Velasco and Verma, 1998; Velasco et al., 2000; Guevara et al., 2001; VelascoTapia et al., 2001, Verma and QuirozRuiz, 2008; MarroquínGuerra et al., 2009; Verma et al., 2009a). Verma and Agrawal (2011) and Verma S.K. et al. (2012), on the other hand, used discordancy tests to evaluate natural logarithm of element ratios for discordant outliers, prior to the application of linear discriminant analysis to their compiled data.

]]>
For evaluation of compositional data, it is possible that some kind of transformation is required prior to the application of discordancy tests. Although this should be the subject of future research in geochemometrics, for now may I suggest that the logtransformed ratio data, rather than the element concentrations, should be evaluated for discordancy.

Finally, discordancy tests can also be applied during the data acquisition stage of mass spectrometric determinations. Such an application of Dixon tests was reported by DoughertyPage and Bartlett (1999), although unfortunately it is not a widespread practice explicitly reported in the literature. As an example of unpublished cases, mass spectrometric software in the Geochemistry department of the MaxPlanckInstitut für Chemie in Mainz, Germany, allows the application of Dixon tests before the data are printed out from the instrument. More importantly, for correcting interlaboratory bias in isotopic determinations, all laboratories are supposed to report results of isotopic measurements on established reference materials, such as Eimer & Amend Sr carbonate and more recently, National Bureau of Standards NBS 987 for 87Sr/86Sr (Faure, 2001). Similarly, for Nd isotopic measurements it is customary to report 143Nd/144Nd values obtained on the La Jolla standard (e.g., Verma, 1992). This practice helps eliminate the systematic errors (interlaboratory bias) in Sr and Nd isotopic determinations, especially when using data from different laboratories for interpretation of geological processes.

It is also customary in isotopic studies that the analytical errors on isotopic data be individually reported (see, e.g., Verma, 1992). Nevertheless, from the geochemometrics point of view, the shortcoming seems to reside in the fact that the individual analytical errors are reported as two times the standard error of the mean (2σE; note the wrong notation based on population, it should actually be 2sE) and not as total withinrun uncertainty based on the Student t value. Reporting of simply the standard deviation value without mentioning the total number of measurements is even a more severe mistake. The statistically correct report cannot be easily prepared for the literature data, because the total number of measurements, from which the standard error was calculated, is seldom reported in published literature.

I propose that the geochemometrically correct way to report these individual withinrun errors as the uncertainty usamp would be as follows:

where ssamp is the standard deviation based on p determinations of isotopic ratio in that particular sample and t(p1) is the twosided Student t value at 95% or 99% confidence limits. It is needless to say that the isotopic mean ratio and its standard deviation should be calculated only after ascertaining the absence of discordant outliers in the original data array. Only then, this branch of geochemistry will be fully consistent with geochemometrics.

SIGNIFICANCE TESTS

Significance tests (Student t, Fisher F, oneway ANOVA, and twoway ANOVA) are not routinely applied for the interpretation of geochemical data, although some books on geosciences do recommend their use (e.g., Jensen et al., 1997; Verma, 2005). If we were to accept geochemometrics as the emerging science, significance tests should become an integral part of data evaluation in Earth sciences. However, note that these tests require that individual statistical samples be normally distributed. Therefore, DODESSYS (Verma and DíazGonzález, 2012) should prove an important tool for the application of significance tests, i.e., for assuring that the basic assumption of normal distribution is complied.

We are attempting to make geochemometrics a reality by reinterpreting some of the published data (e.g., HernándezMartínez and Verma, 2009) so that the Earth science community could compare and contrast these new geochemometric interpretations with those put forth in the respective original papers. Precise critical values are helpful in this respect. These can be obtained from freely available software R after proper programming, or can be consulted directly in Verma (2009) for interpolation equations.

]]>
Once again, it is not clear if significance tests should be applied to crude compositional data or logtransformed ratios, although it might be desirable to do so to transformed variables.

DIAGRAMS IN GEOCHEMISTRY

Numerous bivariate, ternary or multielement diagrams are used in geochemistry. However, they should be evaluated from geochemometrics point of view.

Bivariate diagrams

First, I discuss problems with conventional bivariate diagrams (diagrams with two axes) in geochemistry and point out statistical solution to these problems. Such diagrams have been widely used in geochemistry. However, for compositional variables there may be problems when these diagrams are used for geochemical concentrations of chemical elements to draw statistical conclusions. Long ago, Chayes (1960, with more than 210 cites in international journals as judged from the Institute for Scientific Information database) had pointed out difficulties in the use of crude compositional variables. These problems of compositional variables (closure problem and constant sum effect) were later stressed by Aitchison (1982, 1984; these papers with more than 280 cites in international journals). Aitchison, in his pioneering work (Aitchison, 1986; cited more than 940 times in international journals), also proposed solutions to overcome the difficulties of constant sum and closed compositional space of crude variables. He noted that, instead of using crude compositions, one must think in terms of a multivariate approach by calculating compositional ratios having a common denominator and then working in logarithms of these ratios. The division eliminates the compositional units, which may be wt% or %m/m, or ug/g, and renders the compositions as simple numbers opening up the space, i.e., absolute magnitudes are converted in relative magnitudes. The logtransformation of ratios opens up the space theoretically to infinity, in the positive or negative direction, or both, depending on the nature of the common denominator used. The natural logarithm or other kind of logarithm can be used for logtransformation.

Unfortunately, the most well known among the bivariate diagrams are the so called Harker diagrams based on silica as the xvariable and other major and traceelements as the yaxis variables, or Harker type diagrams, in which a compositional variable other than SiO2, e.g., MgO, might be used as the xvariable. The geochemical literature is full of such diagrams, which are used to draw statistical inferences. The basic problem with these diagrams is that there is an inherent negative correlation of other chemical variables with SiO2 because of the constant or unit sum constraints (e.g., Chayes, 1978). In fact, this author additionally showed that even a positive correlation of some variables with SiO2 is also possible. The existence of negative or positive correlation in these diagrams is routinely used to infer about geological processes, and the inherent statistical correlation from closed sum concept pointed out above is not even mentioned, nor is it taken into account. The reader can, therefore, readily see from the above discussion that the Harker or Harkertype diagrams should not be used any more to draw statistical inferences, or should be used with caution.

Other diagrams obviously unfit for the purpose of drawing statistical inferences are those in which a common variable in both axes is used (see Reyment and Savazzi, 1999, for more discussion), such as AA/B or AB/A type diagrams where A and B are two chemical elements.

The new science of geochemometrics should emphasize on this shortcoming of diagrams and popularize the statistically correct solutions.

]]>
Correlation coefficient in bivariate diagrams

The simple concept of Pearson's linear correlation coefficient (r) for compositional data again seems to be irrelevant for interpretation in geochemistry, and should be replaced by the proportionality concept (Aitchison, 1986; Reyment and Savazzi, 1999). For normal, i.e., fullspace data there is no problem in using the conventional r. The concept of logratio variance is of use in this respect. If we estimate the variance of logtransformed ratios of two elements i and j for a set of samples, we can use this "relative variance" value (var{log(xi/xj)}) as an indicator of correlation. If the relative variance value approaches zero, there is a perfect relationship between A and B. Remember here the sizes of the samples or specimens is probably irrelevant. In other words, we can replace the concept of perfect correlation by that of perfect proportionality. Greater values of relative variance express greater departure from the perfect proportionality between parts or components under study (A and B). When the relative variance approaches ∞, the concept of a complete lack of proportionality becomes applicable.

To make this new concept amenable to everyone, Aitchison (1997 cited in Reyment and Savazzi, 1999) introduced a finite scaling transformation as a measure of the relationship between two parts. This scale runs from 0, which signifies a lack of proportional relationship, to 1 that corresponds to a perfect proportional relationship. It requires at least three parts for the computation of the proportionality measure, but has the disadvantage that association cannot be identified as negative or positive. These concepts have yet to be incorporated in geoscientific research. In the mean time, I suggest that the conventional r can be used for evaluating logratio transformed compositional data, and not crude compositions.

Some workers such as Chayes (1960) and Aitchison (1986) discouraged the use of simple geochemical compositions in bivariate diagrams, but ironically recommended the use of ternary diagrams (Chayes, 1965, 1985; see also Aitchison, 1986). If the proposed ternary diagrams are constructed by the way they are, i.e., by recalculating proportions to 100% sum of the three variables, they are likely to be bound by the problems point out in the present work even if they are based on logtransformed variables. The only application in which these adverse effects are not of much significance will be when the experimental errors or uncertainties in the three ternary variables are exceedingly small or negligible, which is not likely for the compositional variables generally used in such diagrams, particularly traceelements. Even with the modern analytical techniques, the total propagated uncertainties (combined calibration and measurement uncertainties for the "unknown" samples) for these elements in geological materials are likely to be large enough for serious consequences in ternary diagrams. Effects of analytical errors or uncertainties on individual samples in a ternary diagram are not known. This may be the reason why some workers (e.g., Presnall, 1969; Chayes, 1985) have proposed the use of ternary diagrams as the only choice to visualize and interpret certain kinds of data.

Ternary diagrams are so frequently used that there are tens of thousands of references in published literature. This generalised use is disappointing in view of some existing studies (Butler, 1979; Philip et al., 1987; Howard, 1994), which point out problems with such diagrams. Statistical summary in ternary diagrams is modified, and genetic inferences may be biased from interactions of other factors that cannot be easily separated from petrogenetic controls (Butler, 1979), for which they are frequently used, for example, the well known AFM (alkalisironmagnesium) diagram (Rollinson, 1993) in geochemistry and igneous petrology. The use of ternary diagrams for the comparison of sample sets must also be viewed with caution (Philip et al., 1987). It has also been suggested that instead of error polygons, confidence intervals representing total uncertainty estimates should be used to visualize statistically significant differences between means (Howard, 1994).

To the best of my knowledge, however, no study has yet been reported on correctly propagated errors (or uncertainties) from three individual errors (or uncertainties) of the variables or components used to construct such ternary diagrams. Existing studies (Howard, 1994) on error propagation have been even incorrect because of the unaccounted covariance terms that result from the basic mathematics to construct these ternary diagrams (Bevington and Robinson, 2003; Verma, 2005). Using Monte Carlo simulation, involving very large repetitions of 100,000, I demonstrate, for the first time, the inherent problem of error distortion and visual amplification or reduction in these very frequentlyused ternary diagrams.

I report as examples the results of two case studies or error models involving a total of 25 data points with heteroscedastic errors (unequal standard deviations) characterized by an equal relative standard deviation (RSD) simple model and a more realistic, unequal RSD, complex model (see models 1 and 2, respectively, in Table 1). Cases of homoscedastic errors (equal standard deviations independent of the mean values), being unrealistic in Earth sciences and chemistry, are not considered.

Construction of a ternary diagram ABC from three measured variables Am, Bm and Cm, with their respective standard deviation estimates of SAm, SBm and SCm, involves three analogous equations. I present only one such equation (equation 26). The first ternary variable A can be calculated from:

]]>

Even when the initial variables (Am, Bm and Cm) are not correlated, i.e., their covariance can be neglected, the recalculated ternary variables A, B and C will necessarily be correlated (Bevington and Robinson, 2003), i.e., their covariance terms must be taken into account to estimate final uncertainties of the transformed variables that are constrained within the closed triangular space of ternary diagrams. Unfortunately, the equations to handle the propagated errors are strictly valid only for population standard deviation (σ), provided all terms (quadratic, cubic, etc.) are taken into consideration. For the sample standard deviation (s) being an estimate of σ, the equations are even more approximate (Verma, 2005). One such highly approximate equation for calculating the variance (sA2) of the ternary variable A, which takes into consideration the covariance terms (S(Am)(Bm)(Cm))2 and (S(Am)(AmBmCm))2 and, is as follows:

Use of such complex approximations is not recommended (Verma, 2005).

Therefore, to achieve the objective of correctly and efficiently evaluating ternary diagrams and proposing a statistically viable alternative, I resorted to Monte Carlo simulation, in which, as a first step, a series of independent and identically distributed (IID) random variates U(0, 1)uniformly distributed in the space (0, 1), were generated (Law and Kelton, 2000). These were tested for randomness, transformed to normal random variates N(0, 1) (Verma and QuirozRuiz, 2006a), and used for simulation of total propagated errors during the construction of a ternary diagram.

As examples, model 1 (equal RSD) assumes initial errors in the measured variables Am, Bm and Cm as 5% RSD. Model 2 (unequal RSD) is based on a more realistic case of 1%, 3% and 5% RSD, respectively and additionally, on equation 28 (Thompson, 1988) for the first variable Am as follows:

where the limit of detection (LODA) is three times the standard deviation at zero concentration (IUPAC, 1978) of the variable Am and cvA is taken to be a constant (being the coefficient of variation for relatively large values of Am as compared to LODA). I note that the use of the new equations for LOD proposed in this work will not change the inferences.

For model 2, the following values (arbitrary units) were set: LODA = 0.015, LODB = 0.025, LODC = 0.030, cvA = 0.01, cvB = 0.03, cvC = 0.05 and These LOD values were assumed so that the quantification limits, being approximately three times the LODs, could be less than the smallest experimentally measured values of Am, Bm and Cm, used to illustrate these findings. The cvA, cvB, cvC values correspond to the RSD values of 1%, 3%, and 5%, respectively. This allowed me to reasonably model the low concentration values of the variables corresponding to the 25 data, especially those involving 0.1 unit of a measured variable (the last three rows in Table 1).

Values of sAm (equation 28) and sBm and sCm (analogous equations) for each of the 25 data were then calculated and used in the simulations for model 2. For model 1, however, these calculations were straight forward. Further, to facilitate a visual comparison of the initial data, the three measured variable mean values for all 25 data were assumed to sum up to 100 (see the first three columns in Table 1). The results would not change even if these measured data summed up to values significantly different from 100, because they will necessarily sum up to 100 after the ternary transformations (equation 26 and analogous ones not presented).

]]>
The results (Table 1 and Figures 3a and 3b) show significant error distortion in ternary diagrams. The size of the symbols would correspond approximately to 99% confidence limits (total uncertainty estimates) provided each variable were measured about 10 times. If the individual data (mean values of the 25 data under consideration; Table 1) were obtained from a smaller number of measurements (<10), the uncertainty estimates would be greater than those represented in Figures 3a and 3b, depending on the corresponding Student t critical values.

Very large repetitions of 100,000 were used to best represent the size and shape of the data symbols, which would remain practically the same for any smaller or larger repetitions, such as 10,000 or 1,000,000. Only the density of the simulated data symbols will accordingly change.

For model 1 with equal RSD of 5%, the recalculations essential for constructing ternary diagrams (equation 26 and analogous ones not presented) result in totally different RSD values (0.01% to 7.11% in Table 1; see the three columns listed under "Model 1 (Figure 3a)", in which none of the calculated RSD values is equal to the initial 5% RSD), with considerable distortion of symbol shapes (Figure 3a). When two of the three measured variables (Am, Bm and Cm ) are equal for a given ternary datum, the new propagated RSD for these two variables would also be the same, but different from the initial RSD values (e.g., see the rows identified by * and ** and the first 9 rows of results in Table 1, the latter correspond to the data plotted in the vertical direction in Figure 3 a). Only for the exceptional case of equal Am, Bm and Cm (each about 33.33%  the fourth row of data in Table 1; the centroid of a ternary diagram  Figure 3a), all three new RSD values are equal (about 4.10%). For all other cases, the new RSD values are totally different from each other and also from the initial RSD of 5%. For data lying close to the apexes or to the boundaries, the shapes of the symbols are even more distorted and their sizes become much smaller (Figure 3a; see the last three data rows in Table 1) than for those lying in the central region of ternary diagrams.

For model 2 with unequal RSD of 1%, 3% and 5% and a realistic error structure (Thompson, 1988), the recalculations inherent in ternary diagrams result in even greater modification of RSD values (0.02% to 11.47% in Table 1; see the three columns listed under "Model 2 (Figure 3b) and consequent greater distortion of symbol shapes (Figure 3b). For all cases, the new RSD values are totally different from the initial RSD values (Table 1), symbol shapes are distorted and their sizes become smaller for data lying close to the apexes or to the boundaries.

In ternary diagrams, for a given datum the relative mean value determines the region where it will actually plot, whether central or near the apexes or boundaries, and its total uncertainty estimates containing covariance terms indicate its final shape. Traditionally, small symbols are used to represent the data. However, the symbols occupy a much smaller area near the apexes or boundaries to such an extent that the two data plotting very close to the apexes A and C and one close to the AB boundary are hardly even visible, whereas the symbols are much larger in the central region (Figures 3a and 3b). Analytical errors of the order of 1% to 5% are reasonable estimates for many applications, but may even be underestimates of total analytical uncertainties in some areas of Earth sciences, such as igneous or sedimentary petrography and trace element geochemistry.

In numerous ternary diagrams proposed in the literature for data interpretation (Rollinson, 1993; Verma, 2010), the variables are generally modified by "suitable" multiplication or dividing factors, e.g., Ti/100Zr3Y, so that the "useful" region would lie in the central part of the diagrams, away from the apexes and boundaries. The present Monte Carlo simulation procedure shows that this is precisely the "unwanted" area where the data symbols, when plotted with their respective total uncertainty estimates, would occupy the largest part of the diagram, thus rendering all such proposals of ternary diagrams in Earth sciences and chemistry statistically less powerful and probably even meaningless for the interpretation of experimental data, especially those characterized by large analytical errors.

Furthermore, the existence of constant or unitsum constraint or closure problem in handling compositional data has long been recognized (Chayes, 1960, 1978). Even if the initial data in ternary diagrams are of truly "continuous" variables (and not of compositions), the methodology to construct such diagrams would result in the closure problem that is very similar to the problem of compositional data in most bivariate diagrams, such as the well known Harker type diagrams frequently used in geochemistry (Rollinson, 1993).

One statisticallycorrect solution to resolve this problem and open the sample space (theoretically from ∞ to +∞) is the natural logarithm transformation of ratios ("logratio") using a common denominator (Aitchison, 1986; Buccianti et al., 2006), although it is not clear to me why Aitchison (1986) used ternary diagrams to illustrate his innovated procedure for compositional data handling. Aitchison and Egozcue (2005) commented that ternary diagrams, complemented with centring and scaling techniques (von Eynatten et al., 2003) are one of the most important and practical tools to represent compositional data. Error propagation in such diagrams was, however, not attempted by any of these authors.

I propose and show that the best statisticallycorrect alternative to statistically erroneous ternary diagrams is to use bivariate diagrams based on the two logratios of the three measured variables Am, Bm and Cm (Figures 3c and 3d). The equal 5% RSD values (model 1) are projected as a constant standard deviation value (Figure 3c; Table 1). Because the transformed logratio variables ln (Am/Cm) and ln (Bm/Cm), can take any value from ∞ to +∞ (see the two columns of "Transformed experimental data" in Table 1 where both negative and positive mean values result from the 25 data), it would be meaningless to use RSD as a parameter to express the dispersion estimate; instead, standard deviation or total uncertainty estimates (confidence intervals) should directly be used. The standard deviation values for unequal RSD and complex error structure (model 2) are also similarly small (Figure 3d). They increase, as expected from the statistical and chemical principles, when one or two variables in a threecomponent datum approach the respective LOD values (see the data in the final two columns corresponding to three rows in Table 1; these data plot in Figure 3d as extreme values).

In the present work, although only two models were used to illustrate these critical findings, the use of any other uncertainty values or error structure, or actual error or uncertainty estimates when available, would provide essentially the same conclusion that ternary diagrams are statistically erroneous, because they are characterized by distortion of initial errors and unusually large errors visible in the central region of the closed triangular space, away from the apexes and boundaries. I conclude that ternary diagrams should better be abandoned (Figures 3a and 3b) or, at least, their use minimized. Logratio transformed bivariate diagrams (Figures 3c and 3d) could henceforth be adopted to handle three variables in two dimensions. This would not only facilitate correct statistical treatment, but also provide new ways of interpreting data in Earth sciences and chemistry.

]]>

DISCRIMINATION DIAGRAMS

Ever since the advent of plate tectonics, this graphic technique came into existence for deciphering the tectonic setting of igneous and sedimentary rocks (Rollinson 1993). A recent comprehensive review by Verma (2010) focused first on the statistical evaluation of traditional bivariate and ternary diagrams, and then presented the advantages of using the more recent (20042011) multidimensional diagrams. In the section of "Ternary Diagrams" I have provided further evidence against the indiscriminate use of ternary diagrams.

Aitchison (1986) proposed logratio transformation as the solution for correct handling of compositional data. Reyment and Savazzi (1999) presented detailed account of multivariate techniques, including some computer programs, to take Aitchison's recommendation into account.

As a more recent example, Verma et al. (2006) used majorelements in their diagrams by logratio transformation with (SiO2)adj as a common denominator (thus, obtaining ten ratios from eleven majorelements). Because a more abundant component was used as the denominator, all majorelement ratios are likely to result in numbers smaller than 1, and therefore their logratio transformation will result in negative numbers, thus having opened the space from zero to Had they chosen a less abundant majorelement such as (MgO)adj or (P2O5)adj, the logtransformed space might have opened in both positive and negative directions.

Agrawal et al. (2008), on the other hand, for proposing their diagrams based on immobile traceelements (La, Sm, Yb, Nb, and Th), chose Th as the common denominator. After logratio transformation, they worked in the fourdimensional space of ln(La/Th), ln(Sm/Th), ln(Yb/Th), and ln(Nb/Th). Here both positive and negative values in the transformed space are likely to occur, opening, thus, the space theoretically in both ∞ and +∞ directions.

Similarly, Verma and Agrawal (2011), in their attempt to propose new discrimination diagrams based on immobile elements (TiO2)adj, Nb, V, Y, and Zr opted for the more abundant (TiO2)adj as the common denominator. They, therefore, also worked in an open space of logtransformed variables from zero to They further assured through DODESSYS that the logtransformed ratios be normally distributed.

Finally, Verma S.K. et al. (2012) used majorelements after logratio transformation with (SiO2)adj as a common denominator and proposed new multidimensional diagrams for acid magmas. They also followed the same methodology of Verma and Agrawal (2011) for ascertaining discordant outlierfree samples. Verma and DíazGonzález (2012) have documented additional application cases to confirm the usefulness of DODESSYS and new multidimensional discrimination diagrams.

In all of these papers, the authors proposed new discriminant function based diagrams after linear discriminant analysis (LDA) of logtransformed data. Strictly speaking, however, the variables used in LDA should be drawn from a multivariate normal distribution rather than a number of univariate normal distributions. New critical values are required to test for the former, which if achieved will be a significant progress not only for geochemometrics but also for other science and engineering fields.

]]>
With the exception of Verma et al. (2011b) diagrams for acid magmas, all other multidimensional diagrams are meant for tectonic discrimination of basic and ultrabasic magmas. This clearly demonstrates that new multidimensional discrimination diagrams are very much needed for intermediate magmas as well as additional diagrams based on immobile elements for acid magmas. The available diagrams were extensively evaluated by the original authors. Four tectonic settings have been successfully discriminated, which are island arc, continental rift, oceanisland, and midocean ridge. More recently, Verma et al. (2011b) have evaluated these diagrams from independent datasets and documented high success rates not only for these four tectonic settings, but also for the continental arc of the Andes and the Central American Volcanic Arc interpreted as similar to the island arc setting. Verma et al. (2011b) also applied these diagrams to evaluate the dominant tectonic setting of the Mexican Volcanic Belt, which was inferred as continental rift.

PETROGENETIC MODELING

The knowledge of chemical equilibrium constants was incorporated in geochemistry in terms of solidliquid partition coefficients, and a long history of development exists for modeling magmatic processes of partial melting, fractional crystallization, magma mixing, and assimilation with or without fractional crystallization. Rollinson (1993) is a good source for more detailed information on this topic.

Partial melting of a source region is governed by equations presented by several researchers (e.g., Schilling and Winchester, 1967; Shaw, 1970, 1978; Consolmagno and Drake, 1976; Hertogen and Gijbels, 1976; Langmuir et al. , 1977; Wood, 1979). Inversion of partial melting equations was also proposed (Minster and Allègre, 1978; Albarède, 1983; Hofmann and Feigenson, 1983) and used more recently in Mexico by VelascoTapia and Verma (2001, in press) for Sierra Chichinautzin, by Verma (2004) for eastern Mexican Volcanic Belt, and by Verma (2006) for Los Tuxtlas volcanic field. For modeling of fractional crystallization, one can resort to the details presented by Schilling and Winchester (1967), Allègre et al. (1977), Yanagi and Ishizaka (1978), Villemant et al. (1981), and Le Roex and Erlank (1982), among others. Combined or decoupled processes of assimilation and fractional crystallization have been invoked to explain magmatic evolution (DePaolo, 1981; Powell, 1984; Cribb and Barton, 1996).

Other more complex petrogenetic models were presented by O'Hara (1977, 1980, 1993, 1995). Similarly, more complex energyconstrained models have been advocated by Spera and Bohrson (2001, 2002, 2004) and Bohrson and Spera (2001, 2003). "In situ" threedimensional combined thermal and chemical modeling of magma chambers has also been initiated (e.g., Verma and Andaverde, 2007; Verma et al., 2011c, 2011d), but it is still at its infancy stage.

Uncertainty propagation in these petrogenetic models has not been extensively covered. Verma (1998b, 2000) presented approximate error propagation equations for use in geochemical modeling. There is ample room to carry out Monte Carlo simulation for uncertainty propagation in petrogenetic modeling, because most variables, including solidliquid partition coefficients (e.g., TorresAlvarado et al., 2003), in the petrogenetic equations have uncertainties associated to them. Finally, Aitchison's recommendations (Aitchison, 1986) should be incorporated in this field of geochemistry to reinforce the new science of geochemometrics.

GEOTHERMOMETERS

Solute geothermometers have been widely used in geothermal exploration and exploitation. They have been in use now for nearly forty years. A recent review by Verma et al. (2008b) summarized all available equations and reported a new computer program SolGeo for use in such studies. Among solute geothermometers, the use of more recent proposal of silica geothermometers (M.P. Verma, 2008) requires special care. The geothermometers based on Na/K are more commonly used, for which statistically improved equations have been reported by Verma and Santoyo (1997), DíazGonzález et al. (2008), and Verma and DíazGonzález (2012). All these regression equations for the Na/K geothermometer report standard errors in the regression coefficients. Therefore, error propagation in such equations can also be correctly handled by Monte Carlo simulations.

]]>
Although gas geothermometry has been proposed in geothermics (D'Amore and Panichi, 1980; Giggenbach, 1980; Arnórsson and Gunnalugsson, 1985; Henley et al., 1985), these geothermometers have been less used than solute geothermometers.

In this area of geochemistry, correct statistical handling of compositional data recognizing multivariate nature of fluid composition is yet to take place. Incidentally, most Na/K geothermometers seem to comply with Aitchison's procedure of logratio transformation (see Verma et al., 2008b for review on geothermometers) but only as a bivariate procedure for Na and K (and not a multivariate transformation involving other chemical elements as well). I suggest that additional work should be carried out in the field of geothermometry to make this tool more reliable in the exploration and exploitation of geothermal resources.

FINAL CONSIDERATIONS

Undoubtedly, there are other important areas of research that would reinforce the new science of geochemometrics. Among the currently available fields, I once again cite most of the topics covered in this paper, i.e., data quality, discordancy and significance tests, regressions, error propagation in ternary diagrams through Monte Carlo simulation, and multivariate techniques for correct compositional data handling.

There is some vague notion in the literature that, with the availability of more sophisticated instrumental techniques, the data quality has improved over the years. Is it really true? Is it the precision or the accuracy that has been improved? Or both have been improved? The new science of geochemometrics should answer these crucial questions. My own impression based on unpublished compilations of literature data, without having done yet a systematic research and interpretations, is that it is the precision, and not the accuracy, that has probably improved.

As an integral part of the data quality research in geochemometrics, the systematic behavior of LODs should be further investigated, and a clear theoretical explanation should be put forth. The instrumental sensitivities should always be reported. Total uncertainty estimates would be vital for this line of research. Certified geochemical reference materials for most elements of the Periodic Table are also highly desirable for geochemometric purposes.

Overall performance estimates of discordancy tests in terms of relative efficiency criterion (Verma et al., 2009b; GonzálezRamírez et al., 2009) should be complemented by the five individual probability calculations as suggested by Barnett and Lewis (1994) and achieved by Hayes and Kinsella (2003) for two discordancy tests. Although new precise and accurate critical values have recently been proposed for 33 test variants, precise values are still required for several other discordancy tests proposed in the literature.

Correct statistical treatment prior to the application of discordancy tests to compositional data has yet to be proposed and its use generalized. Does the application of discordancy tests to logratios (Verma and Agrawal, 2011) represent such a correct statistical treatment? Or should we explore discordancy tests for multivariate normal distribution?

Significance tests combined with discordancy tests should be routinely used for interpreting geochemical data. This statistical approach will therefore become an integral part of geochemometrics.

]]>
Monte Carlo simulations should be used for an objective comparison of different regression techniques currently available. Are these techniques directly applicable to compositional data? Or is some kind of transformation required to make them suitable for geochemometrics?

Similar to the ternary diagrams evaluated in this work, uncertainty propagation through Monte Carlo simulation in other diagrams such as discriminant functionbased multielement (multidimensional bivariate) diagrams should prove useful for future proposals of discrimination diagrams.

An objective comparison of robust and outlierbased methods is urgently needed. This will provide indications of appropriate statistical methods for use in the interpretation of geochemical data. And it will solve the controversies that are central to this important aspect of geochemometrics.

The new science of geochemometrics should warn against the erroneous use of numerous bivariate and ternary diagrams in Earth sciences and facilitate the use of statistically correct methodology for data interpretation. In the light of the simulation results documented in this paper, ternary diagrams should probably be abandoned or at least their use minimized, and bivariate plots involving natural logarithmratio transformed variables be adopted as the best, statisticallycorrect alternative to handle three variables in two dimensions.

Other multivariate techniques, such as principal component analysis and cluster analysis, should also be explored, although to geochemometrics these techniques may not prove more efficient than the linear discriminant analysis. Nevertheless, the statistically correct procedures should be made available to all those interested in using geochemometrics in the interpretation of geological and geochemical data.

Are there suitable methods other than the logratio transformation to handle compositional data? Aitchison (1999) seems to have demonstrated that there is none else, other than his extensive discussion on logratio transformation using a common denominator (Aitchison, 1986).

Work related to "in situ" thermal and chemical modeling of heat sources, if carried out in combination with Monte Carlo simulations, is likely to provide an important progress in petrogenetic modeling and is therefore highly recommended to reinforce the new science of geochemometrics.

ACKNOWLEDGEMENTS

I am grateful to Edgar Santoyo, Ignacio TorresAlvarado and K. Pandarinath for the invitation to deliver a plenary lecture on geochemometrics during XX Congreso Nacional de Geoquímica, October 1015, 2010, Temixco, Morelos, and to prepare full paper for its possible publication in the Special Section of Revista Mexicana de Ciencias Geológicas dedicated to this event. I also sincerely thank the three official reviewers of the earlier version of this paper and one of the guest editors Edgar Santoyo; their comments helped me to improve my presentation.

Verma, S.P., 2009, Evaluation of polynomial regression models for the Student t and Fisher F critical values, the best interpolation equations from double and triple natural logarithm transformation of degrees of freedom up to 1000, and their applications to quality control in science and engineering: Revista Mexicana de Ciencias Geológicas, 26(1), 7992. [ Links ]