Sunday, February 23, 2014

Approximately a year ago I made a post graphing unemployment in Europe and other locations. I have always wanted to do this again, not because the R code would be so interesting, but just because I wanted to see the plots. As time progressed I tried to do this in Julia rather than in R. I could not get the result good enough in Julia, so this is, alas, the R version.

Data

Data from Eurostat. Or, if you are lazy, Google une_rt_m, which is the name of the table. There is a bit of pre-processing of the data, mostly getting names of countries decent for plotting. The plots shown are unemployment and its first derivative, both smoothed.
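As a rough illustration of "unemployment and its first derivative, both smoothed", the sketch below smooths a made-up monthly series with a spline and reads the first derivative off the same fit. The series, the variable names and the choice of smooth.spline are all assumptions for this example; the smoother used for the actual plots may well be a different one.

```r
# Synthetic monthly unemployment rate (illustrative numbers, not Eurostat data)
set.seed(123)
month <- 1:48
rate <- 8 + 2 * sin(month / 8) + rnorm(length(month), sd = 0.2)

# Fit a smoothing spline; the smoothing parameter is chosen by the default criterion
fit <- smooth.spline(month, rate)

# Smoothed level and smoothed first derivative at the observed months
rate_smooth <- predict(fit, month)$y
rate_slope <- predict(fit, month, deriv = 1)$y

head(data.frame(month, rate, rate_smooth, rate_slope))
```

Taking the derivative from the fitted spline avoids differencing the noisy raw series, which would amplify the noise.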

Sunday, February 16, 2014

Last week I extended my Bayesian model, so this week I wanted to test it with different data. There is one other data set with profiling data in R: french_fries in the reshape package. 'This data was collected from a sensory experiment conducted at Iowa State University in 2004. The investigators were interested in the effect of using three different fryer oils had on the taste of the fries'. In this post I try to analyze the data; however, the documentation I found is so limited that I had to guess at the objectives of the experiment.

Data

The data includes a treatment and a time variable. The time is in weeks and is crossed with the other variables, suggesting that it was systematically varied and hence is itself a factor of interest. In the end I decided to cross treatments and time to generate a product factor with 30 levels. In addition I crossed time with a repetition factor so this effect could function as a random variable. As a bonus, there are some missing data, which Stan does not like, so these are removed before analysis.

data(french_fries,package='reshape')
head(french_fries)
vars <- names(french_fries)
fries <- reshape(french_fries,
    idvar=vars[1:4],
    varying=list(vars[5:9]),
    v.names='Score',
    direction='long',
    timevar='Descriptor',
    times=vars[5:9])
head(fries)
fries$Product <- interaction(fries$treatment,fries$time)
fries$Session <- interaction(fries$time,fries$rep)
fries$Descriptor <- factor(fries$Descriptor)
fries <- fries[complete.cases(fries),]
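To make the crossing of factors and the removal of missing scores concrete, here is a small sketch on a made-up design (2 time points, 3 treatments, 2 repetitions). The toy data frame and its scores are invented for illustration; only interaction() and complete.cases() mirror the code above.

```r
# Toy design: 2 times x 3 treatments x 2 reps = 12 rows (values are made up)
toy <- expand.grid(time = factor(1:2),
                   treatment = factor(1:3),
                   rep = factor(1:2))
toy$Score <- c(5.1, NA, 4.8, 6.0, 5.5, 5.9, 4.2, 4.4, NA, 5.0, 5.3, 6.1)

# Crossing two factors gives one level per combination
toy$Product <- interaction(toy$treatment, toy$time)
toy$Session <- interaction(toy$time, toy$rep)
nlevels(toy$Product)   # 3 treatments * 2 times = 6
nlevels(toy$Session)   # 2 times * 2 reps = 4

# Drop rows with a missing score before handing the data to a sampler
toy_complete <- toy[complete.cases(toy), ]
nrow(toy_complete)     # 12 rows minus 2 missing = 10
```

With the real data the same crossing of 3 treatments and 10 time points gives the 30 product levels mentioned above.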

Results

Traceplot is not shown, but the general plot provides some overview.

The profile plot has been adapted to the design. This plot should convey the message the data contains. As time progresses the flavor becomes more rancid and painty, and less potato. This process starts at circa 4 weeks. Treatment 3 seems least affected, but the difference is minute.

Sunday, February 9, 2014

Last week I made the core of a Bayesian model for sensory profiling data. This week the extras need to be added. That is, there are a bunch of extra interactions and the error is dependent on panelists and descriptors.
Note that where last week I pointed to the influence of Procrustes and STATIS in these models, I probably should have mentioned Per Brockhoff's work too.

Data

See last week

Model

A few features were added compared to last week:
The round effect was averaged over all descriptors. It is now dependent on descriptor. As normalization, the sum of the round effects within a descriptor is fixed to be 0.
Similarly, a shift effect was defined per panelist. It is now per panelist*descriptor combination, again normalized to sum to 0 by descriptor.
Residual error was defined as one variable for all data. It is now descriptor and panelist dependent. I decided to add variances there.
It turned out to be quite a complex model. Running time was about 100 samples per minute on my hardware: too long to sit there waiting for it, but it could fit in during a meeting or lunch.

model1 <- '
data {
    int<lower=0> npanelist;
    int<lower=0> nobs;
    int<lower=0> nsession;
    int<lower=0> nround;
    int<lower=0> ndescriptor;
    int<lower=0> nproduct;
    vector[nobs] y;
    int<lower=1,upper=npanelist> panelist[nobs];
    int<lower=1,upper=nproduct> product[nobs];
    int<lower=1,upper=ndescriptor> descriptor[nobs];
    int<lower=1,upper=nround> rounds[nobs];
    real maxy;
}
parameters {
    matrix<lower=0,upper=maxy> [nproduct,ndescriptor] profile;
    vector<lower=-maxy/3,upper=maxy/3>[npanelist] shift[ndescriptor];
    vector<lower=-1,upper=1> [npanelist] logsensitivity;
    vector<lower=-maxy/3,upper=maxy/3> [nround] roundeffect[ndescriptor];
    real<lower=0,upper=maxy/5> varr;
    vector [npanelist] lpanelistvar;
    vector [ndescriptor] ldescriptorvar;
}
transformed parameters {
    vector [nobs] expect;
    vector[npanelist] sensitivity;
    real mlogsens;
    real mlpanelistvar;
    real mldescriptorvar;
    real mroundeff[ndescriptor];
    real meanshift[ndescriptor];
    vector [nobs] sigma;
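The "sum to 0 by descriptor" normalization can be illustrated outside Stan. The sketch below, in plain R with made-up numbers, centres a panelist by descriptor matrix of shifts so that each descriptor column sums to zero. The matrix and its names are assumptions for this example; it mirrors the idea of subtracting a per-descriptor mean (the meanshift and mroundeff variables declared above), not the exact Stan code.

```r
# Hypothetical raw shifts: 5 panelists (rows) by 3 descriptors (columns)
set.seed(2)
npanelist <- 5
ndescriptor <- 3
raw_shift <- matrix(rnorm(npanelist * ndescriptor),
                    nrow = npanelist,
                    dimnames = list(paste0("panelist", 1:npanelist),
                                    paste0("descriptor", 1:ndescriptor)))

# Subtract each descriptor's mean shift from its column
centred_shift <- sweep(raw_shift, 2, colMeans(raw_shift))

# Each column now sums to zero, up to floating point error
colSums(centred_shift)
```

Centring this way keeps the overall level of a descriptor in the profile, so the shifts only describe how panelists deviate from each other.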

Results

I do not think there is much point in showing all printed output. However, the summary plot is interesting. There is something odd with the eighth level of some of the factors; a few extra samples would not be unwelcome.
The error depends more on panelist than on descriptor; panelists 7, 20 and 29 might benefit from some training.

profile

The code for the profile has been slightly modified; last week I only used a few of the samples. To me the intervals look nice and sharp. Other than that, choc3 is very different from the others: more sweet and milk, less cocoa. Choc1 is very bitter and cocoa.

rounds

It is premature to conclude this from one data set, but rounds do seem to have minimal effect. If this held across more data sets, this term might be removed.

Panelists' shift

The plot shows that shift is important and this is a factor which should be in the model. Getting this under control might cost more than it is worth.

Sunday, February 2, 2014

I looked at Bayesian analysis of sensory profiling data in May and June 2012. I do remember not being totally happy with the result and computations taking a bit more time than I wanted. But now it is 2014, I can use STAN and I have been thinking about the model I want a bit more. Hence a fresh start.

Data

Data is the chocolate data from SensoMineR. I find it more convenient to have the data in long format, so the first action is a reshape. Score gets an as.numeric, since it was integer and I do not know how that affects STAN calculations.

data(chocolates,package='SensoMineR')
vars <- names(sensochoc)
choc <- reshape(sensochoc,
    idvar=vars[1:4],
    varying=list(vars[5:18]),
    v.names='Score',
    direction='long',
    timevar='Descriptor',
    times=vars[5:18])
choc$Descriptor <- as.factor(choc$Descriptor)
choc$Score <- as.numeric(choc$Score)
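For readers who do not have SensoMineR installed, the same wide-to-long step can be tried on a tiny made-up table. The data frame below (two panelists, two products, two descriptors) is invented for illustration; the reshape() call has the same arguments as the one above.

```r
# Made-up wide data: one column per descriptor, integer scores
wide <- data.frame(
  Panelist = factor(c(1, 1, 2, 2)),
  Product  = factor(c("a", "b", "a", "b")),
  Sweet    = c(3L, 7L, 4L, 6L),
  Bitter   = c(8L, 2L, 7L, 3L)
)
vars <- names(wide)

# Stack the descriptor columns into a single Score column
long <- reshape(wide,
                idvar = vars[1:2],
                varying = list(vars[3:4]),
                v.names = "Score",
                direction = "long",
                timevar = "Descriptor",
                times = vars[3:4])
long$Descriptor <- as.factor(long$Descriptor)
long$Score <- as.numeric(long$Score)

str(long)
```

Each row of the long table is now one score of one panelist for one product on one descriptor, which is the unit the model works with.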

Model

sensory panel data

To explain the model, I first need to explain about sensory panel data. The objective of the sensory experiment is to obtain a profile, a table which shows how strong food tastes on selected descriptors (pre-defined properties). To obtain this, panelists taste food stuffs and score the strength of the food stuff on the descriptors. The order of tasting the food stuffs is registered as rounds. Unfortunately panelists have 'errors' in their scores. To minimize the effect of the errors, 10 to 20 panelists are used, possibly with repetitions.

standard analysis

In general these data are analyzed by descriptor with ANOVA models such as:
score ~ product + panelist + round + session + error,
possibly with second order interaction terms, of which panelist*product would be the largest.
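As an illustration of such a per-descriptor ANOVA, the sketch below simulates scores for a single descriptor and fits the main-effects model with aov(). The design (3 products, 10 panelists, 2 sessions), the effect sizes and all variable names are assumptions made up for this example.

```r
set.seed(1)
# Hypothetical design: every panelist scores every product in every session
design <- expand.grid(product = factor(paste0("choc", 1:3)),
                      panelist = factor(1:10),
                      session = factor(1:2))

# Random serving order (round) within each panelist and session
design$round <- factor(ave(seq_len(nrow(design)),
                           design$panelist, design$session,
                           FUN = function(i) sample(length(i))))

# Simulated scores: product effect + panelist effect + noise
product_effect <- c(choc1 = 1, choc2 = 0, choc3 = -1.5)
panelist_effect <- rnorm(10, sd = 0.5)
design$score <- 5 +
  unname(product_effect[as.character(design$product)]) +
  panelist_effect[as.integer(design$panelist)] +
  rnorm(nrow(design), sd = 0.7)

# Main-effects ANOVA for this one descriptor
fit <- aov(score ~ product + panelist + round + session, data = design)
anova_table <- summary(fit)[[1]]
anova_table
```

A panelist*product interaction could be added to the formula as product:panelist; with repeated sessions there are enough observations to estimate it.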

Alternatively, methods such as Procrustes or STATIS are used. These methods try to find commonality in the n-dimensional configuration which products occupy in the descriptor space. In particular, Procrustes tries to find a common configuration by shifting, scaling (size change) and rotating individual configurations such that they all align. Note that this process destroys the link between products and descriptors.
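The shift, scale and rotate steps can be sketched in a few lines of base R. The function below aligns one configuration to one target with an ordinary orthogonal Procrustes rotation obtained from an SVD. It is a simplified illustration, not the generalized Procrustes analysis used on a whole panel, and the function name and the example data are made up.

```r
# Align `config` to `target` (both products x dimensions) by shift, scale, rotation
procrustes_align <- function(target, config) {
  standardise <- function(m) {
    m <- scale(m, center = TRUE, scale = FALSE)  # shift: centre each column
    m / sqrt(sum(m^2))                           # scale: unit total size
  }
  x <- standardise(target)
  y <- standardise(config)
  s <- svd(t(y) %*% x)
  rotation <- s$u %*% t(s$v)                     # orthogonal matrix minimising ||x - y R||
  list(target = x, aligned = y %*% rotation, rotation = rotation)
}

# Hypothetical consensus for 4 products in 2 dimensions
set.seed(42)
consensus <- matrix(rnorm(8), nrow = 4, ncol = 2)

# One panelist sees the same configuration rotated, stretched and shifted
theta <- pi / 6
rot <- matrix(c(cos(theta), sin(theta), -sin(theta), cos(theta)), nrow = 2)
panelist_config <- 1.7 * consensus %*% rot + matrix(c(2, -1), 4, 2, byrow = TRUE)

res <- procrustes_align(consensus, panelist_config)
max(abs(res$aligned - res$target))  # close to zero: the configurations coincide
```

Because the rotation mixes the columns, the aligned axes are no longer the original descriptors, which is the loss of the link between products and descriptors mentioned above.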

building a model

The aim of this post is to build the bare bones of a model which combines the good parts of Procrustes with the features of a linear model. The model must analyze all descriptors in one go. Hence one part of the model is the profile. Panelist effects are covered with a shift and shrinking of the profile. At this point the shift and scale is covered by one parameter per panelist, this will probably be changed in a second version of the model. A rounds effect has been added, common for all descriptors and all panelists, another spot for model extensions.
A few priors have been added on scale usage, the values seemed reasonable for a trained panel.

model1 <- '
data {
    int<lower=0> npanelist;
    int<lower=0> nobs;
    int<lower=0> nsession;
    int<lower=0> nround;
    int<lower=0> ndescriptor;
    int<lower=0> nproduct;
    vector[nobs] y;
    int<lower=1,upper=npanelist> panelist[nobs];
    int<lower=1,upper=nproduct> product[nobs];
    int<lower=1,upper=ndescriptor> descriptor[nobs];
    int<lower=1,upper=nround> rounds[nobs];
    real maxy;
}
parameters {
    matrix<lower=0,upper=maxy> [nproduct,ndescriptor] profile;
    vector<lower=-maxy/3,upper=maxy/3>[npanelist] shift;
    vector<lower=-1,upper=1> [npanelist] logsensitivity;
    vector<lower=-maxy/3,upper=maxy/3> [nround] roundeffect;
    real<lower=0,upper=maxy/5> sigma;
}
transformed parameters {
    vector [nobs] expect;
    vector[npanelist] sensitivity;
    // the following variables are not strictly needed but
    // greatly increase speed and # effective samples
    real meanshift;
    real mlogsens;
    real mroundeff;
    meanshift <- mean(shift);
    mlogsens <- mean(logsensitivity);
    mroundeff <- mean(roundeffect);
    // end of definitions of variables for speed
    for (i in 1:npanelist) {
        sensitivity[i] <- pow(10,logsensitivity[i]-mlogsens);
    }
    for (i in 1:nobs) {
        expect[i] <- profile[product[i],descriptor[i]]
            *sensitivity[panelist[i]]
            + shift[panelist[i]]-meanshift
            + roundeffect[rounds[i]]-mroundeff;
    }
}
model {
    logsensitivity ~ normal(0,0.1);
    shift ~ normal(0,maxy/10);
    roundeffect ~ normal(0,maxy/10);
    y ~ normal(expect,sigma);
}
'
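To see what the expectation in this model does with a score, the calculation can be repeated in plain R. The function below is a sketch that follows the expect[i] formula term by term; the function name, argument names and the small example values are made up for illustration.

```r
# Expected score under the model: profile * sensitivity + centred shift + centred round effect
expected_score <- function(profile, logsensitivity, shift, roundeffect,
                           product, descriptor, panelist, round) {
  # Sensitivity is a multiplier centred around 1 on the log10 scale
  sensitivity <- 10^(logsensitivity - mean(logsensitivity))
  profile[cbind(product, descriptor)] * sensitivity[panelist] +
    shift[panelist] - mean(shift) +
    roundeffect[round] - mean(roundeffect)
}

# Hypothetical profile: 2 products (rows) by 2 descriptors (columns)
profile <- matrix(c(4, 6, 2, 8), nrow = 2)

# Panelist 1 uses a compressed scale and scores low, panelist 2 the opposite
logsensitivity <- c(-0.1, 0.1)
shift <- c(-1, 1)
roundeffect <- c(0, 0)

# Product 1 on descriptor 1, tasted in round 1 by each panelist
expected_score(profile, logsensitivity, shift, roundeffect,
               product = c(1, 1), descriptor = c(1, 1),
               panelist = c(1, 2), round = c(1, 1))
```

Both panelists taste the same product, yet their expected scores differ: one multiplies the profile value of 4 by 10^-0.1 and subtracts 1, the other multiplies by 10^0.1 and adds 1.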

Wiekvoet

Wiekvoet is about R, JAGS, STAN, and any data I have interest in. Topics include sensometrics, statistics, chemometrics and biostatistics. For comments or suggestions please email me at wiekvoet at xs4all dot nl.