On 3/14/2013 2:46 PM, Aaronne wrote:

Hi Smart Guys,

I have got the data (it can be downloaded [here][1]) and tried to run a simple LDA-based classification using the 11 features stored in the dataset, i.e. F1, F2, ..., F11.

Here is the code I wrote in Matlab, using only 2 of the features. May I ask some questions based on this code, please?

    clc; clf; clear all; close all;

    %% Load the extracted features
    features = xlsread('ExtractedFeatures.xls');
    numFeatures = 23;                              % 23 observations (15 'Good' + 8 'bad')

    %% Define ground truth
    groundTruthGroup = cell(numFeatures, 1);
    groundTruthGroup(1:15)   = cellstr('Good');
    groundTruthGroup(16:end) = cellstr('bad');

    %% Select features
    featureSelcted = [features(:,3), features(:,9)];

    %% Run LDA (resubstitution: train and test on the same data)
    [ldaClass, ldaResubErr] = classify(featureSelcted(:,1:2), featureSelcted(:,1:2), ...
        groundTruthGroup, 'linear');
    bad = ~strcmp(ldaClass, groundTruthGroup);
    ldaResubErr2 = sum(bad)/numFeatures;
    [ldaResubCM, grpOrder] = confusionmat(groundTruthGroup, ldaClass);

    %% Scatter plot
    gscatter(featureSelcted(:,1), featureSelcted(:,2), groundTruthGroup, 'rgb', 'osd');
    xlabel('Feature 3');
    ylabel('Feature 9');
    hold on;
    plot(featureSelcted(bad,1), featureSelcted(bad,2), 'kx');   % mark the misclassified points
    hold off;

    %% Leave-one-out cross validation
    leaveOneOutPartition = cvpartition(numFeatures, 'leaveout');
    ldaClassFun = @(xtrain, ytrain, xtest) (classify(xtest, xtrain, ytrain, 'linear'));
    ldaCVErr = crossval('mcr', featureSelcted(:,1:2), ...
        groundTruthGroup, 'predfun', ldaClassFun, 'partition', leaveOneOutPartition);

    %% Display the results
    clc;
    disp('______________________________________ Results ______________________________________________________');
    disp(' ');
    fprintf('Resubstitution error of LDA (training error calculated by the Matlab built-in): %g\n', ldaResubErr);
    fprintf('Resubstitution error of LDA (training error calculated manually): %g\n', ldaResubErr2);
    disp(' ');
    disp('Confusion matrix:');
    disp(ldaResubCM);
    fprintf('Cross validation error of LDA (leave one out): %g\n', ldaCVErr);
    disp(' ');
    disp('______________________________________________________________________________________________________');

I. My first question is how to do feature selection, for example forward or backward feature selection, or a t-test based method?

I have checked that Matlab has the `sequentialfs` function, but I am not sure how to incorporate it into my code. (My rough attempt is sketched right after question II below.)

II. How do I use the Matlab `classify` function to do a classification with more than 2 features? Should I perform PCA first? For example, we currently have 11 features; should we run PCA to produce 2 or 3 PCs and then run the classification? (I am expecting to write a loop that adds the features one by one to do a forward feature selection, not just run PCA as a dimension reduction.)
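For questions I and II, here is a rough sketch of what I have in mind for wiring `sequentialfs` around `classify` (I am not at all sure this is the proper setup, and I am assuming the first 11 columns of `features` really are F1, ..., F11):

    % Rough sketch for questions I/II: forward feature selection over all 11
    % features with sequentialfs, using classify as the underlying classifier.
    % (Not sure this is the correct way to set it up.)
    allFeatures = features(:, 1:11);            % assuming columns 1..11 are F1..F11
    cvp = cvpartition(numFeatures, 'leaveout'); % same leave-one-out idea as above

    % Criterion: number of misclassified test observations in each fold;
    % sequentialfs sums these over the folds and divides by the total test count.
    critFun = @(xtrain, ytrain, xtest, ytest) ...
        sum(~strcmp(ytest, classify(xtest, xtrain, ytrain, 'linear')));

    opts = statset('Display', 'iter');
    [selectedMask, history] = sequentialfs(critFun, allFeatures, groundTruthGroup, ...
        'cv', cvp, 'direction', 'forward', 'options', opts);

    % classify itself accepts any number of feature columns, so once the mask
    % is known the selected columns can be fed straight in (no PCA needed?):
    featureSubset = allFeatures(:, selectedMask);
    [ldaClassSel, ldaResubErrSel] = classify(featureSubset, featureSubset, ...
        groundTruthGroup, 'linear');

My understanding is that `classify` takes any number of columns, so once `sequentialfs` has picked a subset I could pass those columns in directly without doing PCA first. Is that right?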
III. I have also tried to run a ROC analysis. I referred to [this webpage][2], which has an implementation of a simple LDA that produces the LDA linear scores, and then `perfcurve` can be used to get the ROC curve.

IIIa. However, I am not sure how to use the `classify` function together with `perfcurve` to get the ROC. (My rough attempt is in the PS at the bottom of this post.)

IIIb. Also, how do I do a ROC with cross-validation?

IIIc. After we have got `OPTROCPT`, which is the best cut-off point, how can we use this cut-off point to produce a better classification?

    %% ROC Analysis
    featureSelcted = [features(:,3), features(:,9)];
    groundTruthNumericalLable = [zeros(15,1); ones(8,1)];

    % Calculate linear discriminant coefficients (LDA() is the function from [2])
    ldaCoefficients = LDA(featureSelcted, groundTruthNumericalLable);

    % Calculate linear scores for the training data
    ldaLinearScores = [ones(numFeatures,1) featureSelcted] * ldaCoefficients';

    % Calculate class probabilities
    classProbabilities = exp(ldaLinearScores) ./ repmat(sum(exp(ldaLinearScores),2), [1 2]);

    % Compute the ROC curve from the class-0 probabilities
    figure;
    [FPR, TPR, Thr, AUC, OPTROCPT] = perfcurve(groundTruthNumericalLable(:,1), classProbabilities(:,1), 0);
    plot(FPR, TPR, 'or-');
    xlabel('False positive rate (FPR, 1-Specificity)');
    ylabel('True positive rate (TPR, Sensitivity)');
    title('ROC for classification by LDA');
    grid on;

IV. Currently I calculate the training and cross validation errors with the `classify` and `crossval` functions. May I ask how to get those values in a summary by using `classperf`? (A rough attempt is in the PPS at the bottom.)

V. If anyone knows a good tutorial on using the Matlab Statistics Toolbox for a machine learning task, with a full example, please tell me. Some of the Matlab Help examples are really confusing to me, because they are presented in pieces, and I am really a novice to machine learning. Sorry if some of my questions are not properly asked. Thanks very much for your help.

A.

  [1]: http://ge.tt/6eijw4b/v/0
  [2]: http://matlabdatamining.blogspot.co.uk/2010/12/linear-discriminant-analysis-lda.html
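PS. For questions IIIa-IIIc, this is roughly what I have been trying with the posterior output of `classify` fed into `perfcurve` instead of the hand-rolled LDA scores. The assumption that column 1 of the posterior matrix corresponds to 'Good' is mine and I have not verified the class/column ordering:

    % PS (questions IIIa-IIIc): rough attempt, not sure it is correct.
    % I am assuming column 1 of the posterior matrix is the 'Good' class;
    % the class/column ordering still needs to be double-checked.
    [ldaClass, ~, posterior] = classify(featureSelcted, featureSelcted, ...
        groundTruthGroup, 'linear');
    [FPR, TPR, Thr, AUC, OPTROCPT] = perfcurve(groundTruthGroup, posterior(:,1), 'Good');

    % IIIc: recover the threshold belonging to the optimal operating point
    % and re-classify with it.
    optThreshold = Thr((FPR == OPTROCPT(1)) & (TPR == OPTROCPT(2)));
    newClass = repmat({'bad'}, numFeatures, 1);
    newClass(posterior(:,1) >= optThreshold) = {'Good'};

    % IIIb (very rough idea): build the ROC from leave-one-out posteriors so
    % the curve is cross-validated rather than resubstitution based.
    cvScore = zeros(numFeatures, 1);
    for i = 1:leaveOneOutPartition.NumTestSets
        trainIdx = training(leaveOneOutPartition, i);
        testIdx  = test(leaveOneOutPartition, i);
        [~, ~, p] = classify(featureSelcted(testIdx,:), featureSelcted(trainIdx,:), ...
            groundTruthGroup(trainIdx), 'linear');
        cvScore(testIdx) = p(:,1);   % again assuming column 1 = 'Good'
    end
    [cvFPR, cvTPR, ~, cvAUC] = perfcurve(groundTruthGroup, cvScore, 'Good');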
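PPS. For question IV, is something like this the intended use of `classperf` (from the Bioinformatics Toolbox, if I understand correctly)?

    % PPS (question IV): rough idea of summarising the results with classperf.
    cp = classperf(groundTruthGroup, ldaClass);   % ground truth vs. predicted labels
    correctRate = cp.CorrectRate;                 % should match 1 - ldaResubErr2
    errorRate   = cp.ErrorRate;
    countMatrix = cp.CountingMatrix;              % confusion-matrix style counts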