I have a microarray expression dataset (46 samples, thousands of attributes) and I want to perform feature selection first and then, based on this subset of features (no more than 4 or 5, given my small number of samples), build a classifier model. Because 46 samples leave limited options, I would like to ask for advice from those who have already faced this type of problem.

In your experience, what selection strategies worked best (filter vs. wrapper methods? any in particular)?

Do you use cross-validation in feature selection? With such a small dataset I can't split it and use one part for feature selection and the other for building the classifier.

I've been using the Weka software until now. Given this situation and the possibilities this software offers, can I do the feature selection separately (Select Attributes panel), then remove the "useless" attributes from the ARFF file (Preprocess panel), and build the classifier afterwards (Classify panel)?

Or should I use a meta classifier (e.g. AttributeSelectedClassifier)?

I'm not sure when this last option is recommended. Would it guard against overfitting better than the former approach?

Sorry if the post isn't precise enough, but any experience with this kind of problem, or any suggested pipeline (in Weka, if possible), would be appreciated.
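To make the "meta classifier" idea concrete: the key point of something like Weka's AttributeSelectedClassifier is that feature selection is redone on each training fold, so the held-out fold never influences which attributes are kept. A minimal sketch of the same idea in scikit-learn, on synthetic data (the classifier choice, the univariate filter, and k=5 are all assumptions for illustration, not the asker's actual setup):

```python
# Embedding feature selection inside cross-validation, analogous to
# Weka's AttributeSelectedClassifier. Synthetic data mimicking a small
# microarray setting: 46 samples, 2000 features.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=46, n_features=2000,
                           n_informative=10, random_state=0)

# The Pipeline refits SelectKBest on each training fold only, so the
# test fold never leaks into the choice of features.
clf = Pipeline([
    ("select", SelectKBest(f_classif, k=5)),
    ("svm", LinearSVC(max_iter=10000)),
])
scores = cross_val_score(clf, X, y, cv=5)
print("CV accuracy: %.2f +/- %.2f" % (scores.mean(), scores.std()))
```

Selecting attributes once on the full dataset and then cross-validating only the classifier (the first workflow described above) would instead produce an optimistically biased estimate, because every test fold already contributed to the selection.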

Feature selection should be part of the cross-validation scheme. You will probably find useful tips by browsing our cross-validation tag.
– chl Jul 31 '13 at 14:21

Here I get a bit confused between the cross-validation used to estimate the performance of the feature selection and the CV used to estimate the performance of the classifier. They're two different CVs, aren't they? You can't train the model until you know your attributes, and not until you've trained your model can you test it and estimate its performance. Sorry if I'm missing something too obvious.
– PGreen Jul 31 '13 at 14:31


Besides this confusion of mine, @chl, I've seen stats.stackexchange.com/questions/2306/…. I'll have a look at the references you give (using RF for this kind of problem). Thank you.
– PGreen Jul 31 '13 at 14:52

Yes, usually you would need two CV loops (unless using ensemble methods), with feature selection embedded in the outer loop; otherwise there's a risk of over-fitting/over-optimism. See The Elements of Statistical Learning (§7.10.2 in the latest electronic version--2nd ed., print 10) and work by A.-L. Boulesteix, for example.
– chl Jul 31 '13 at 15:33
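The two loops chl describes can be sketched as nested cross-validation: an inner loop tunes the feature-selection step (here, how many features to keep), and an outer loop gives an honest estimate of the entire selection-plus-fitting pipeline. A scikit-learn illustration on synthetic data; the grid of k values and the classifier are assumptions chosen for the sketch:

```python
# Nested CV: GridSearchCV is the inner loop (choosing k), and
# cross_val_score around it is the outer loop (performance estimate).
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=46, n_features=2000,
                           n_informative=10, random_state=0)

pipe = Pipeline([
    ("select", SelectKBest(f_classif)),
    ("svm", LinearSVC(max_iter=10000)),
])
# Inner loop: pick k among a few small values (few samples -> few features).
inner = GridSearchCV(pipe, {"select__k": [2, 3, 4, 5]}, cv=3)
# Outer loop: every step (selection, tuning, fitting) is repeated per fold.
outer_scores = cross_val_score(inner, X, y, cv=5)
print("Nested CV accuracy: %.2f" % outer_scores.mean())
```

Only the outer-loop scores should be reported as the performance estimate; the inner loop's best score is itself optimistically biased.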