Recently I have been analyzing my protein microarray data. I chose the vsn method to process the raw data, as many related studies did. My data look like this:

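| protein name     | T1    | ~   | T87   | N1   | ~   | N62  |
|------------------|-------|-----|-------|------|-----|------|
| 1                | 2789  | ... | 2760  | 1980 | ... | 1800 |
| 2                | 3360  | ... | 4080  | 5260 | ... | 4800 |
| 3                | 10068 | ... | 12000 | 5500 | ... | 6200 |
| ~                | ...   | ... | ...   | ...  | ... | ...  |
| 190              | 16032 | ... | 18000 | 6500 | ... | 7350 |
| Negative control | 990   | ... | 1120  | 2010 | ... | 780  |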

There are two groups of samples and 190 proteins. T denotes the tumor group (n=87) and N the normal group (n=62). The last row is the control spot with 'No DNA'.

First, I construct an ExpressionSet from my data. Then I can call justvsn to normalize the whole data set, as described in section 3, "Running VSN on data from multiple arrays (single colour normalisation)", of the vignette 'Introduction to robust calibration and variance stabilisation with VSN' (Wolfgang Huber, April 16, 2015).
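A minimal sketch of that first step, for reference. The intensity matrix is simulated, so the values, the seed and the object names (raw, pd, eset) are assumptions for illustration and not the real data; only the layout (190 proteins plus one control row, 87 T and 62 N samples) follows the description above.

```r
library(Biobase)
library(vsn)

set.seed(1)
n_protein <- 190
n_row <- n_protein + 1                      # proteins plus one "No DNA" row
samples <- c(paste0("T", 1:87), paste0("N", 1:62))
n_col <- length(samples)

# Simulated raw intensities (illustrative only): a per-protein level with
# multiplicative noise, plus an additive background term.
level <- rlnorm(n_row, meanlog = 8, sdlog = 1)
raw <- level * exp(matrix(rnorm(n_row * n_col, sd = 0.2), nrow = n_row)) +
  matrix(rnorm(n_row * n_col, mean = 200, sd = 50), nrow = n_row)
dimnames(raw) <- list(c(seq_len(n_protein), "Negative control"), samples)

# Sample annotation: one row per array, row names matching colnames(raw).
pd <- data.frame(group = factor(rep(c("T", "N"), times = c(87, 62))),
                 row.names = samples)
eset <- ExpressionSet(assayData = raw, phenoData = AnnotatedDataFrame(pd))

# Fit vsn on all rows and return the transformed ExpressionSet.
norm <- justvsn(eset)
dim(exprs(norm))
```

With real data, raw would be read from file instead of simulated; the rest should carry over unchanged.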

Now I want to normalize the data the way Suman Sundaresh et al. did in their article "From protein microarrays to diagnostic antigen discovery: a study of the pathogen Francisella tularensis". The "Data preprocessing and normalization" section of the article says: "Since the dataset contains expression profiles of 244 of the 1741 F. tularensis antigens that generated some immune response, only the seven known true-negative intra-array control signals (cell-free expression reactions lacking template gene) are used as ‘house-keeping’ probes to obtain the scale and offset parameters. The transformation function ‘vsn’ is then applied to the whole dataset using these parameters. This method calibrates the measurements and renders the variance relatively independent of the mean signal."

So I want to do the same: use the negative controls to obtain these parameters and then normalize the whole data set. But I don't know how to write the R commands. I hope someone can tell me how to do it. Thanks a lot.

Using only 7 probes as input to fit the vsn model is pushing the limits of identifiability though (i.e. the result could be rather noisy). My recommendation would be to try using more probes (say, at least 40), even at the price of some small bias. You could experiment with these choices; hopefully it doesn't make a big difference.
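As a rough sketch of what fitting on a subset of probes and applying the result to everything could look like: fit with vsn2 on the chosen rows, then call predict with newdata set to the full matrix, which is the pattern the vsn vignette uses for spike-in probes. The data are simulated, and ref_rows (the first 50 rows) is a purely hypothetical stand-in for whichever probes you consider suitable as a reference.

```r
library(vsn)

set.seed(1)
n_row <- 191                                # 190 proteins plus one control row
n_col <- 149                                # 87 T + 62 N samples

# Simulated raw intensities (illustrative only, not real data).
level <- rlnorm(n_row, meanlog = 8, sdlog = 1)
raw <- level * exp(matrix(rnorm(n_row * n_col, sd = 0.2), nrow = n_row)) +
  matrix(rnorm(n_row * n_col, mean = 200, sd = 50), nrow = n_row)

# Hypothetical reference set: replace with the indices of the probes you
# assume to be (mostly) unchanged between samples.
ref_rows <- 1:50

# Estimate the per-array offset and scale from the reference rows only.
# lts.quantile = 1 uses all reference rows, with no robust trimming.
fit <- vsn2(raw[ref_rows, ], lts.quantile = 1)

# Apply the fitted transformation to every row of the data.
norm <- predict(fit, newdata = raw)
dim(norm)
```

Rerunning this with a few different choices of ref_rows and comparing the resulting norm matrices is one way to do the experimentation suggested above.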

I thought Chapter 7 was all about positive controls, or features that are differentially expressed. Now I understand that it can also apply to negative controls. But the example in the chapter takes features 100 to 200 as spike-in controls and then normalizes the whole data set. My understanding is that "features" corresponds to the proteins in my data. However, my data set has only one "No DNA" probe, measured in 149 samples (87 T + 62 N), which means there is only one feature. I'll show my data in R:

No, you cannot fit the vsn parameters based on data from one probe only.

I recommend calling vsn in the standard way, and visualising the values of the "mean-Rv-Negative" probe to see whether there are, e.g., any remaining large trends or individual outliers (i.e. use the probe for quality assessment).
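One way this check could be sketched: run justvsn on the whole matrix, pull out the control row and plot it across samples. The data are simulated (95 proteins plus one low-intensity control row, 149 samples), and the object names, the seed and the control level of 100 are assumptions for illustration.

```r
library(vsn)

set.seed(1)
n_row <- 96                                 # 95 proteins plus one control row
n_col <- 149                                # 87 T + 62 N samples
group <- factor(rep(c("T", "N"), times = c(87, 62)), levels = c("T", "N"))

# Simulated raw intensities (illustrative only); the last row plays the
# role of the low-intensity "mean-Rv-Negative" control.
level <- rlnorm(n_row, meanlog = 8, sdlog = 1)
level[n_row] <- 100
raw <- level * exp(matrix(rnorm(n_row * n_col, sd = 0.2), nrow = n_row)) +
  matrix(rnorm(n_row * n_col, mean = 200, sd = 50), nrow = n_row)
rownames(raw) <- c(paste0("protein", 1:95), "mean-Rv-Negative")

# Standard call: fit and transform using all rows.
norm <- justvsn(raw)

# Extract the control row by position and look at it across samples.
ctrl_row <- which(rownames(raw) == "mean-Rv-Negative")
ctrl <- norm[ctrl_row, ]

plot(ctrl, col = as.integer(group), pch = 19,
     xlab = "sample index", ylab = "normalised control intensity")
abline(h = median(ctrl), lty = 2)           # reference line for trends
boxplot(ctrl ~ group, ylab = "normalised control intensity")
```

A drift along the sample index, a shift between the T and N boxes, or isolated points far from the median would be the kind of thing to look for.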

So you recommend that I treat "mean-Rv-Negative" as just another protein? That is, my data set would include 96 proteins (in fact 95 proteins expressed from 85 ORFs plus one "no DNA" control), and I would use justvsn to transform all 96 rather than leaving the "no DNA" control out? I'm not sure whether vsn can transform the real probes together with the "no DNA" control; I always thought the method could only handle real probes with expressed proteins. I tried both ways and used meanSdPlot() to check the transformation. Transforming the 95 probes gives better variance stabilisation than transforming all 96.
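For anyone who wants to repeat this comparison, it could be sketched roughly as follows. The data are simulated, and the object names, the seed and the control level are assumptions, so the plots will not reproduce the result described above; the sketch only shows the two calls side by side.

```r
library(vsn)

set.seed(1)
n_row <- 96                                 # 95 proteins plus one control row
n_col <- 149

# Simulated raw intensities (illustrative only); last row is the control.
level <- rlnorm(n_row, meanlog = 8, sdlog = 1)
level[n_row] <- 100
raw <- level * exp(matrix(rnorm(n_row * n_col, sd = 0.2), nrow = n_row)) +
  matrix(rnorm(n_row * n_col, mean = 200, sd = 50), nrow = n_row)
ctrl_row <- n_row

# Two fits: with and without the "no DNA" control row.
norm_all    <- justvsn(raw)
norm_noctrl <- justvsn(raw[-ctrl_row, ])

# Row standard deviation against rank of the row mean, one plot per fit.
meanSdPlot(norm_all)
meanSdPlot(norm_noctrl)

# Direct comparison of per-protein standard deviations on the 95 shared rows.
sd_all    <- apply(norm_all[-ctrl_row, ], 1, sd)
sd_noctrl <- apply(norm_noctrl, 1, sd)
plot(sd_noctrl, sd_all,
     xlab = "row sd, fit without control", ylab = "row sd, fit with control")
abline(0, 1, lty = 2)
```

If the points in the last plot sit close to the diagonal, the single control row has little influence on the fit for the other 95 probes.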