When developing the protein regression in R, one of the outputs is the set of regression coefficients, which are used to calculate the protein value of every new sample. There is one regression coefficient for every wavelength, so we can plot the regression coefficients. This may help us understand how the calibration works, but not always: sometimes the coefficients are very difficult or almost impossible to interpret.
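To make the role of the coefficients concrete: each prediction is the intercept plus the sum, over all wavelengths, of each coefficient multiplied by the spectral value at that wavelength. The following R sketch uses simulated coefficients and spectra. All values are hypothetical, and the 850 to 1048 nm grid in 2 nm steps is an assumption. With the pls package, the real coefficients and intercept of a 4-term model could be extracted with coef(Prot_plsr_r1, ncomp = 4, intercept = TRUE).

```r
# Linear calibration: y = b0 + sum_j b[j] * x[j]
# Simulated values only; real coefficients could come from
# coef(Prot_plsr_r1, ncomp = 4, intercept = TRUE) in the pls package.
set.seed(2)
wavelengths <- seq(850, 1048, by = 2)                      # assumed 100-point grid
b  <- rnorm(length(wavelengths), sd = 5)                   # hypothetical coefficients
b0 <- 40                                                   # hypothetical intercept
X_new <- matrix(runif(3 * length(wavelengths)), nrow = 3)  # three new spectra

# One protein prediction per row (sample) of X_new
protein <- b0 + drop(X_new %*% b)
print(protein)

# Regression coefficients plotted against wavelength
plot(wavelengths, b, type = "l",
     xlab = "wavelengths", ylab = "regression coefficient")
abline(h = 0, lty = 2)
```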

For the regression developed (with 4 terms) for soy meal in transmittance, the coefficient values look like this.

29 Jan. 2018

This is the last post about analyzing soy meal in transmittance. The last option we tried was a special selection of samples from the original database, to see whether the performance of the calibration improved. However, this option introduces a slope, so I prefer the first option, where the statistics were quite good apart from a bias.

With the whole database, the statistics for the PLS regression (with 4 terms) are:

RMSEP: 1,12
SEP: 0,529 (prediction error, bias corrected)
RSQ: 0,898

and the XY plot is:

With the samples selected using the approach described in part 9, the statistics are:

RMSEP: 1,12
SEP: 0,633 (prediction error, bias corrected)
Sres: 0,547 (prediction error, slope/intercept corrected)
RSQ: 0,898

and the XY plot is:

As we can see, the first option is the best, and it gives quite acceptable errors for protein in intact soy meal in transmittance.
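The statistics above can be computed directly from the reference and predicted values. This is a sketch using common chemometric definitions, which are my assumption here. Win ISI and the Monitor function may differ in details such as the sign of the bias or the degrees of freedom. The definitions are:

- RMSEP is the total error.
- SEP is the standard deviation of the residuals once the bias is removed.
- Sres is the error left after a slope/intercept adjustment.
- RSQ is the squared correlation.

```r
# Validation statistics from lab (ref) and NIR (pred) values.
# Conventions assumed here: residual = ref - pred, SEP with n - 1,
# Sres with n - 2 (two parameters estimated in the slope/intercept fit).
val_stats <- function(ref, pred) {
  res   <- ref - pred
  n     <- length(res)
  bias  <- mean(res)
  rmsep <- sqrt(mean(res^2))                       # total prediction error
  sep   <- sqrt(sum((res - bias)^2) / (n - 1))     # bias-corrected error
  fit   <- lm(ref ~ pred)                          # slope/intercept adjustment
  sres  <- sqrt(sum(residuals(fit)^2) / (n - 2))   # slope/intercept-corrected error
  rsq   <- cor(ref, pred)^2
  c(RMSEP = rmsep, Bias = bias, SEP = sep, Sres = sres, RSQ = rsq)
}

# Example: predictions with a constant offset of 1 unit (simulated)
ref  <- c(44, 45, 46, 47, 48)
pred <- ref - 1
round(val_stats(ref, pred), 3)   # a pure bias: RMSEP and Bias are 1, SEP and Sres are 0, RSQ is 1
```

Since RMSEP² = Bias² + (n - 1)/n · SEP², a large gap between RMSEP and SEP, as in the first option above, points to a bias rather than to random error.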

For the soy meal, we have a validation sample set (from an Infratec NOVA), and we obtain its predictions from a calibration developed with samples from an Infratec 1241. As we saw in "Analyzing Soy meal in transmittance (part 8)", the predictions are fine but have a bias. The idea was to merge the samples from the Infratec NOVA with the samples from the Infratec 1241 and develop a new calibration.

But before doing that, we want to try another approach, using a Win ISI option: "Select local samples from a product file".

With this option, we project the validation samples onto the PC space obtained from the calibration samples, and we search for calibration samples that are neighbors within a certain cutoff.

Those samples are set apart in a file, to develop a calibration exclusively for the validation samples.

In this case I am going to use a cutoff of 0,2 (Mahalanobis distance).
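The idea of selecting local samples can be approximated in R. This is a hypothetical sketch, not Win ISI's algorithm. The steps are:

1. Build the PC space from the calibration spectra.
2. Project the validation spectra into it with predict().
3. Keep every calibration sample that lies closer than the cutoff to at least one validation sample.

The distance measure (standardized scores divided by the square root of the number of PCs) and the choice of five PCs are assumptions. A cutoff of 0,2 here is therefore not directly comparable with the Win ISI value. The spectra are simulated.

```r
# Hypothetical local-sample selection (not Win ISI's exact method).
select_local <- function(X_cal, X_val, ncomp = 5, cutoff = 0.2) {
  pca   <- prcomp(X_cal)                         # PC space from calibration only
  s     <- pca$sdev[1:ncomp]                     # score SD of each PC
  Z_cal <- sweep(pca$x[, 1:ncomp, drop = FALSE], 2, s, "/")
  Z_val <- sweep(predict(pca, newdata = X_val)[, 1:ncomp, drop = FALSE], 2, s, "/")
  # squared distances between every calibration and every validation sample
  D2 <- outer(rowSums(Z_cal^2), rowSums(Z_val^2), "+") - 2 * Z_cal %*% t(Z_val)
  D2[D2 < 0] <- 0                                # guard against tiny negative round-off
  D  <- sqrt(D2 / ncomp)                         # assumed scaling
  which(apply(D, 1, min) <= cutoff)              # indices of selected calibration samples
}

set.seed(11)
X_cal <- matrix(rnorm(200 * 20), 200, 20)        # stand-in calibration spectra
X_val <- X_cal[1:5, ] + 0.01                     # validation samples close to cal 1-5
selected <- select_local(X_cal, X_val)
length(selected)
```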

Of the 657 samples, 527 were selected and exported to R to develop the calibration. There are some clear outliers, but the statistics improve considerably.

Now we remove the samples whose numbers are shown in red, because they are outside the action limits, and recalculate.

28 Jan. 2018

Continuing from the post "Analyzing Soy meal in transmittance (part 7)", we are going to remove the four samples that are outside the action limit (residual higher than 3·RMSEP) and recalculate the model:

library(pls)
soy_ift_prot1r1<-soy_ift_prot1[-c(183,107,108,267),]
Prot_plsr_r1<-plsr(Prot~X_msc, ncomp=16, data=soy_ift_prot1r1, validation="LOO")
summary(Prot_plsr_r1)
predictions<-(Prot_plsr_r1$fitted.values[,,9])
soy_ift_prot2r1<-cbind(soy_ift_prot1r1$Sample, soy_ift_prot1r1$Prot, predictions)
monitor_prot_r1<-monitor10c24xyplot(soy_ift_prot2r1)

With this code we get the new X-Y plot and the new statistics, and we are going to keep this model.
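The action-limit rule (|residual| > 3 × RMSEP) can be sketched in base R as follows. This is my own minimal version, not the Monitor function, and the data are simulated.

Two details are worth noting:

- which() returns row positions, which are not necessarily the values in the Sample column.
- Subsetting with an empty negative index removes every row. The hypothetical helper drop_rows() guards against this.

```r
# Flag samples outside the action limit |ref - pred| > k * RMSEP (k = 3 as in the text)
flag_action <- function(ref, pred, k = 3) {
  res   <- ref - pred
  rmsep <- sqrt(mean(res^2))
  which(abs(res) > k * rmsep)          # row positions, not Sample IDs
}

# Drop flagged rows safely: df[-integer(0), ] would return zero rows,
# so only subset when something was flagged.
drop_rows <- function(df, idx) if (length(idx) > 0) df[-idx, , drop = FALSE] else df

ref  <- c(rep(45, 19), 55)   # simulated: sample 20 is a gross outlier
pred <- rep(45, 20)
out  <- flag_action(ref, pred)
out                          # 20
```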
I don't consider it necessary to remove more samples, and the Monitor function gives us the distribution of the residuals across the different regions:

This calibration was developed with data from an Infratec 1241 (IFT1241). There is a new instrument called Infratec NOVA, and an exercise was carried out to check whether the calibration developed on an Infratec 1241 can be used routinely on an Infratec NOVA. For this purpose, a set of external validation samples was analyzed on an Infratec NOVA using the same transmittance path length as on the Infratec 1241.

Once the samples had been analyzed, the spectra were exported. As reference values, we added the predictions obtained on NIR reflectance instruments calibrated against the official reference methods.

We will use these data to check the model, or to adjust it if necessary. We can validate using different numbers of terms to see whether the model is overfitted for this external data set. We will see that this is the case, that the best results are obtained with 4 terms, and that there is a bias, probably due to some differences between the instruments themselves.
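Validating with different numbers of terms amounts to computing the statistics once per model size. In this sketch the predictions are assumed to be in a matrix with one column per number of terms. The small matrix below is simulated. With the pls package, such a matrix could come from predict(model, newdata = val, ncomp = 1:16)[, 1, ], where val is a hypothetical data frame holding the NOVA spectra.

```r
# RMSEP, bias and SEP on an external set for each number of PLS terms.
# pred_mat: one row per sample, one column per number of terms.
rmsep_by_terms <- function(ref, pred_mat) {
  res <- ref - pred_mat              # ref is recycled down every column
  data.frame(terms = seq_len(ncol(pred_mat)),
             RMSEP = unname(sqrt(colMeans(res^2))),
             Bias  = unname(colMeans(res)),
             SEP   = unname(apply(res, 2, sd)))
}

ref <- 1:10
pred_mat <- cbind(ref + 2, ref, ref + c(1, -1))   # simulated: pure bias, perfect, noisy
tab <- rmsep_by_terms(ref, pred_mat)
tab
tab$terms[which.min(tab$RMSEP)]                  # model size with the lowest RMSEP
```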

With 4 terms the validation (with the Monitor function) is:

The actual predictions are shown in red, and a bias adjustment is recommended. With the bias adjustment, we would obtain the yellow dots. As we can see, there is a bias, but the bias-corrected error is quite good (SEP = 0,529). If we add more terms, the statistics get worse. The best option may be to add these samples to the database and recalibrate, so that the new variability is included in the model.
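A simple bias (offset) adjustment can be sketched as follows. This assumes the offset is estimated as the mean residual (reference minus prediction) of the NOVA validation set. After the adjustment the mean residual is zero, and what remains is essentially the bias-corrected SEP, up to the n versus n - 1 divisor. The values are simulated.

```r
# Bias adjustment: add the mean residual (ref - pred) to every prediction
bias_adjust <- function(ref, pred) {
  bias <- mean(ref - pred)
  list(bias = bias, adjusted = pred + bias)
}

ref  <- c(44, 45, 46, 47)
pred <- ref - 0.8 + c(0.1, -0.1, 0.2, -0.2)   # simulated predictions, biased low
adj  <- bias_adjust(ref, pred)
adj$bias                                      # estimated offset
mean(ref - adj$adjusted)                      # (numerically) zero after adjustment
```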

27 Jan. 2018

Some of the constituent values are zeros, so these values must not be used during the calibration. With a large data set, we can check the minimum and maximum values. If the minimum is zero, we can remove the samples with that value before developing the quantitative models.
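Checking the range and removing zeros can be done per constituent, so that a sample with a missing fat value can still be used for the protein model. The data frame and column names below are hypothetical.

```r
# Hypothetical lab table: a zero means "no value" for that constituent
lab <- data.frame(Sample = 1:6,
                  Prot = c(46.1, 0, 45.3, 47.0, 44.8, 46.5),
                  Fat  = c(1.8, 2.0, 0, 1.9, 2.1, 1.7))

rng <- sapply(lab[, c("Prot", "Fat")], range)   # row 1 = minimum, row 2 = maximum
rng

# Remove zeros separately for each constituent
prot_data <- lab[lab$Prot != 0, ]
fat_data  <- lab[lab$Fat  != 0, ]
nrow(prot_data); nrow(fat_data)
```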

Here are the histograms of the data sets without the samples with zeros.

We can develop a PLS regression with all the samples and afterwards remove the clear outliers we find. For protein, the PLS regression would be:

The LOO (leave-one-out) cross-validation will help us decide the best number of terms for the regression. We can look at the validation plot, which shows how the RMSEP decreases as the number of terms increases. Beyond a certain number of PLS terms, however, the RMSEP stays stable or even increases, so we must not choose more terms than necessary, in order not to overfit the model.

plot(Prot_plsr,"validation",estimate="CV")

If we look at the regression summary,

summary(Prot_plsr)

we can see that the best number of terms for the regression is nine.
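The leave-one-out procedure behind these numbers can be sketched in base R. This is a minimal NIPALS PLS1 written for illustration, not the pls package's implementation (which is what plsr(..., validation = "LOO") uses in these posts), and the data are simulated. Each sample is left out in turn, the model is fitted on the rest, and the squared prediction errors are accumulated for every number of terms. The number of terms is then chosen where RMSECV stops decreasing.

```r
# Minimal PLS1 (NIPALS), for illustration only
pls1 <- function(X, y, ncomp) {
  xm <- colMeans(X); ym <- mean(y)
  E  <- sweep(X, 2, xm)                     # centred predictors
  f  <- y - ym                              # centred response
  W <- P <- matrix(0, ncol(X), ncomp); q <- numeric(ncomp)
  for (a in seq_len(ncomp)) {
    w  <- drop(crossprod(E, f)); w <- w / sqrt(sum(w^2))   # weights
    tv <- drop(E %*% w); tt <- sum(tv^2)                    # scores
    P[, a] <- drop(crossprod(E, tv)) / tt                   # X loadings
    q[a]   <- sum(f * tv) / tt                              # y loading
    W[, a] <- w
    E <- E - tv %o% P[, a]                                  # deflate X
    f <- f - q[a] * tv                                      # deflate y
  }
  # coefficients for 1..ncomp terms: B_a = W_a (P_a' W_a)^-1 q_a
  B <- sapply(seq_len(ncomp), function(a) {
    Wa <- W[, 1:a, drop = FALSE]
    Wa %*% solve(crossprod(P[, 1:a, drop = FALSE], Wa), q[1:a])
  })
  list(xm = xm, ym = ym, B = B)
}

# Leave-one-out RMSECV for 1..ncomp terms
loo_rmsecv <- function(X, y, ncomp) {
  press <- numeric(ncomp)
  for (i in seq_len(nrow(X))) {
    m     <- pls1(X[-i, , drop = FALSE], y[-i], ncomp)
    pred  <- m$ym + drop((X[i, ] - m$xm) %*% m$B)   # one prediction per model size
    press <- press + (y[i] - pred)^2
  }
  sqrt(press / nrow(X))
}

set.seed(3)
X <- matrix(rnorm(40 * 6), 40, 6) %*% diag(c(3, 1, 1, 1, 1, 1))   # simulated predictors
y <- drop(X %*% c(1, -2, 0.5, 0, 0, 0)) + rnorm(40, sd = 0.1)
rmsecv <- loo_rmsecv(X, y, ncomp = 6)
round(rmsecv, 3)
```

With all six terms, PLS1 on these six predictors reproduces ordinary least squares. This is a convenient check that the coefficients are computed correctly.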

Let's see the statistics in an XY plot. For this, I am going to use a Monitor function I developed in R some time ago.

predictions<-(Prot_plsr$fitted.values[,,9])
soy_ift_prot2<-cbind(soy_ift_prot1$Sample, soy_ift_prot1$Prot, predictions)
monitor10c24xyplot(soy_ift_prot2)

As we can see, we must remove some outliers that are outside the action limit (numbers in red), and decide what to do with the samples that are outside the warning limit (numbers in orange).

The Monitor function sets those samples apart, so we can remove them from the data frame and recalculate.

24 Jan. 2018

A good way to look at the variance explained by the PCs is a 2D score plot, which shows the scores projected onto the PC axes. This way we can see in which PC there is discrimination, or where the outliers are. For the soy meal data, we can see the distribution of the scores in the plane formed by the first and second principal components.

Now imagine projecting the dots perpendicularly onto the axes (PCs). These projections are the dots in the next plot for the first and second PCs. In the projections onto the first PC, samples 298 and 296 clearly stand out. If we zoomed in on the projections onto the second PC, samples 373 and 298 would clearly stand out.

As we can see in this plot, practically all the variance of the data is explained by the first three PCs.
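The score plot, the perpendicular projections onto each axis, and the cumulative explained variance can be sketched in R as follows. The spectra are simulated from three latent factors plus a little noise, an assumption made only so that three PCs dominate. rug() draws the projection of each sample onto the PC1 and PC2 axes.

```r
set.seed(1)
n <- 60; p <- 100
S <- matrix(rnorm(n * 3), n, 3) %*% diag(c(5, 2, 1))   # three latent factors
L <- matrix(rnorm(3 * p), 3, p)
X <- S %*% L + matrix(rnorm(n * p, sd = 0.01), n, p)   # simulated spectra

pca    <- prcomp(X)              # centred PCA
scores <- pca$x
plot(scores[, 1], scores[, 2], xlab = "PC1", ylab = "PC2")
rug(scores[, 1], side = 1)       # perpendicular projections onto the PC1 axis
rug(scores[, 2], side = 2)       # perpendicular projections onto the PC2 axis

cum_var <- cumsum(pca$sdev^2) / sum(pca$sdev^2)
round(cum_var[1:5], 4)           # cumulative fraction of variance explained
```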

23 Jan. 2018

Behind all these posts about analyzing soy meal in transmittance, there is the excuse to work with R and to explore many of the chemometric functions that the available R packages offer. Continuing with this, this is the fifth post in the series. We are more used to seeing the Mahalanobis distance as ellipses, so let's look at the same data as in the previous post with the drawMahal function of the chemometrics package. First we use the NIPALS algorithm to calculate the score matrix T and the loading matrix P from the X matrix with the MSC math treatment (Multiplicative Scatter Correction):

library(chemometrics)
X_msc_nipals<-nipals(X_msc,a=2)
T_msc<-X_msc_nipals$T
P_msc<-X_msc_nipals$P
drawMahal(T_msc,center=apply(T_msc,2,mean), covariance=cov(T_msc), quantile=0.975,col="blue", xlab="PC1",ylab="PC2")
identify(T_msc)

As we can see we have the same outliers, but we see them in a different way.
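For readers without the chemometrics package, a similar ellipse can be drawn in base R. This sketch is not drawMahal's code. It assumes roughly normal scores and uses the chi-square quantile with 2 degrees of freedom as the squared Mahalanobis radius. The ellipse is obtained by mapping a circle through the Cholesky factor of the score covariance. The scores are simulated.

```r
# Tolerance ellipse in the PC1-PC2 plane (chi-square limit, normality assumed)
mahal_ellipse <- function(scores, quantile = 0.975, m = 200) {
  ctr <- colMeans(scores)
  S   <- cov(scores)
  r   <- sqrt(qchisq(quantile, df = 2))          # radius in Mahalanobis units
  ang <- seq(0, 2 * pi, length.out = m)
  circle <- r * cbind(cos(ang), sin(ang))
  sweep(circle %*% chol(S), 2, ctr, "+")         # map the circle onto the ellipse
}

set.seed(3)
sc <- cbind(rnorm(100, sd = 2), rnorm(100, sd = 0.5))   # simulated PC1/PC2 scores
sc[100, ] <- c(7, 2)                                    # one artificial outlier
plot(sc, xlab = "PC1", ylab = "PC2")
lines(mahal_ellipse(sc), col = "blue")
d2 <- mahalanobis(sc, colMeans(sc), cov(sc))
which(d2 > qchisq(0.975, df = 2))                       # samples outside the ellipse
```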

22 Jan. 2018

This is the fourth post about analyzing unground soy meal in an Infratec, placing the sample directly into the conveyor in the same way we do with wheat or barley. The wavelength range of the Infratec is 850 to 1050 nm, in steps of 2 nm, giving a total of 100 data points.

I usually look for outliers using the Mahalanobis distance (MD), which is based on the score values of the samples in the principal component space.

Several R packages compute the MD. One of them is the "chemometrics" package, so we load this package and run a script with the values we got in the previous post.

We can add two ablines to the MD plot, one for the warning limit with a value of 3.00 and another for the action limit with a value of 4.00. The warning line is orange and the action line is red.
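Here is a base-R sketch of the same idea. It computes the Mahalanobis distance of each sample from its first PC scores with stats::mahalanobis() and adds the warning and action lines. The original script and its distance scaling are not shown here, so several choices are assumptions:

- The limits of 3 and 4 are applied to the plain (square-rooted) distance on two PCs.
- The spectra are simulated.
- The action line is drawn in red, the colour used for the action limit in the Monitor plots.

```r
# Mahalanobis distance of every sample in the space of its first k PC scores
md_scores <- function(X, k = 2) {
  sc <- prcomp(X)$x[, 1:k, drop = FALSE]
  sqrt(mahalanobis(sc, center = colMeans(sc), cov = cov(sc)))
}

set.seed(7)
X <- matrix(rnorm(50 * 20), 50, 20)   # simulated spectra
X[50, ] <- X[50, ] + 8                # one artificial extreme sample
md <- md_scores(X, k = 2)
plot(md, type = "h", xlab = "Sample", ylab = "Mahalanobis distance")
abline(h = 3, col = "orange")         # warning limit
abline(h = 4, col = "red")            # action limit
which(md > 4)
```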

21 Jan. 2018

This is the third post about analyzing unground soy meal in an Infratec, placing the sample directly into the conveyor in the same way we do with wheat or barley. The wavelength range of the Infratec is 850 to 1050 nm, in steps of 2 nm, giving a total of 100 data points.

Now we can compare these samples with all the samples in the data set (including the extremes).

We can clearly identify sample 298 in both plots. Sample 373 is not easy to see in the left plot, but it lies in the direction of sample 298. We can check the constituent values of these samples to get an idea of why they are extreme in the second PC.

20 Jan. 2018

This is the second post about analyzing unground soy meal in an Infratec, placing the sample directly into the conveyor in the same way we do with wheat or barley. The wavelength range of the Infratec is 850 to 1050 nm, in steps of 2 nm, giving a total of 100 data points.

Once we have chosen a math treatment to work with, we can apply a principal component analysis to the data, which helps us better understand its structure:

X_msc_prcomp<-prcomp(X_msc)

This way we obtain two important matrices, the score matrix and the loadings matrix (we have talked about these matrices in other posts). In this post we are going to check the loadings, which we can see graphically in two ways: as spectra or in the principal component space. To see them as spectra (first three loadings), run this script in R:

matplot(wavelengths,X_msc_prcomp$rotation[,1:3],type="l", xlab="wavelengths",ylab="loadings")

Or we can see them in the principal component space, where we can see the range of variation.
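In prcomp() output, the loadings are in $rotation and the scores are in $x. The mean spectrum is in $center, which is not a loading. The sketch below uses simulated stand-in spectra and an assumed 850 to 1048 nm grid. It plots the first three loadings as spectra and the PC1 versus PC2 loadings. It also checks that the scores are the mean-centred spectra multiplied by the loadings.

```r
set.seed(10)
wavelengths <- seq(850, 1048, by = 2)                    # assumed 100-point grid
X <- matrix(rnorm(30 * length(wavelengths)), nrow = 30)  # stand-in for X_msc
pc <- prcomp(X)
P      <- pc$rotation      # loadings: one row per wavelength, one column per PC
scores <- pc$x             # scores: one row per sample

# Loadings as spectra (first three PCs)
matplot(wavelengths, P[, 1:3], type = "l", lty = 1,
        xlab = "wavelengths", ylab = "loading")
# Loadings in the principal component space (PC1 vs PC2)
plot(P[, 1], P[, 2], xlab = "PC1 loading", ylab = "PC2 loading")

# Scores = mean-centred spectra %*% loadings
all.equal(unname(scale(X, scale = FALSE) %*% P), unname(scores))   # TRUE
```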

18 Jan. 2018

One of the common applications of NIR analysis is the measurement of soy meal to predict moisture, protein, fat and fiber. As we know, protein is the most important parameter, and it is important to get an accurate prediction.

What about measuring soy meal in a transmittance instrument like the Infratec? The Infratec has a narrower wavelength range, but this range (850 to 1050 nm) penetrates deeper into the sample. This means we can measure in transmittance with a suitable path length to avoid saturation. For this purpose, a number of samples with known reference values for these parameters were analyzed on the instrument, placing the unground soy meal directly into the conveyor.

The spectra of the samples were exported to a spectra file and the lab values were added.

The spectra file was imported into R as raw spectra, and a multiplicative scatter correction (MSC) was applied.
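MSC removes additive and multiplicative scatter effects. Each spectrum is regressed on a reference spectrum (usually the mean spectrum of the calibration set), x = a + b·ref, and the corrected spectrum is (x - a)/b. Below is a minimal base-R sketch on simulated spectra. The function name msc() is my own, not from a package. For new samples, such as the Infratec NOVA spectra, the reference should be the calibration mean spectrum, e.g. msc(X_new, reference = colMeans(X_cal)), where X_new and X_cal are hypothetical matrices.

```r
# Multiplicative scatter correction against a reference spectrum
msc <- function(X, reference = colMeans(X)) {
  t(apply(X, 1, function(x) {
    ab <- unname(lm.fit(cbind(1, reference), x)$coefficients)  # x = a + b * reference
    (x - ab[1]) / ab[2]
  }))
}

set.seed(5)
base <- 2 + sin(seq(0, 3, length.out = 100))           # simulated "true" spectrum
a <- runif(10, -0.2, 0.2); b <- runif(10, 0.8, 1.2)    # additive / multiplicative effects
X_raw <- outer(a, rep(1, 100)) + outer(b, base)        # ten scattered spectra
X_msc <- msc(X_raw)                                    # all rows collapse onto the mean spectrum
matplot(t(X_msc), type = "l", lty = 1, ylab = "MSC spectra")
```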