Different results with leave-one-out X-Val

First of all, thanks for your fantastic data mining tool RM. I'm using version 5.0.008 and I've got a problem:

When I run an X-Validation with leave-one-out on my data set (the random seed in the main process is still 2001), I get different results. I've found that when the leave-one-out option is not set, you can set and unset the "use local random seed" option. Okay so far. When I set "use local random seed" to, let's say, 1000 and then set the leave-one-out option again, I get an accuracy of 69%. But if I leave "use local random seed" unset and then set the leave-one-out option again, I get about 74% accuracy.

How can that be? It seems a bit absurd to me, since these options shouldn't even take effect once the leave-one-out option is set...(?)

Hi, in fact you are right. This behavior results from the way the cross-validation sets are built: instead of treating the case x=n differently, n random sets are simply built, each consisting of one single example. The result is the same unless you are using an algorithm that incorporates randomness, as LibSVM does. Since the XValidation then consumes the first random numbers of the global random number sequence, LibSVM behaves differently because it receives different numbers...
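
To illustrate the effect described above, here is a minimal Python sketch. It is not RapidMiner's actual code: `build_loo_folds` and `learner_draws` are hypothetical stand-ins for the validation operator and for a randomized learner such as LibSVM, and the seeds 2001 and 1000 are simply the ones mentioned in this thread. It only shows that drawing from a shared generator while building the folds changes the numbers a later consumer receives, even though the folds themselves are equivalent.

```python
import random

def build_loo_folds(n, rng):
    """Build n single-example folds in a random order drawn from rng."""
    indices = list(range(n))
    rng.shuffle(indices)  # consumes numbers from whichever generator is passed in
    return [[i] for i in indices]

def learner_draws(rng, k=3):
    """Stand-in for a randomized learner: the first k numbers it would see."""
    return [rng.random() for _ in range(k)]

# Case A: the validation has its own local seed, so the global
# sequence is untouched when the learner starts drawing from it.
global_rng = random.Random(2001)
local_rng = random.Random(1000)
folds_a = build_loo_folds(5, local_rng)
draws_a = learner_draws(global_rng)

# Case B: no local seed, so building the folds consumes the first
# numbers of the global sequence before the learner gets any.
global_rng = random.Random(2001)
folds_b = build_loo_folds(5, global_rng)
draws_b = learner_draws(global_rng)

# Leave-one-out always yields the same set of folds, only the order differs.
same_folds = sorted(folds_a) == sorted(folds_b)
# The learner, however, receives different random numbers in the two cases.
same_draws = draws_a == draws_b

print("same folds:", same_folds)
print("same learner draws:", same_draws)
```

With a deterministic learner the two cases would give identical results, because the set of folds is the same; only a learner that draws random numbers itself is affected.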

Hi Sebastian, that's because my group and I are trying to achieve the best results (accuracies) in the classification of our data. First we do a grid search for the best SVM parameters (gamma and C), and after applying these we run an X-Val again. (I know about overfitting the model, but in this case it doesn't matter...) That is the point at which I noticed the effect with the leave-one-out option. By the way, we also have the same problem as in the thread http://rapid-i.com/rapidforum/index.php/topic,214.msg831.html#msg831 but the solution given there doesn't work at all. (But that doesn't matter either.)

So I just want to know which accuracy I should choose, because I don't know which one is the right one. We need to know this in order to finish our study...

Hi, I guess it doesn't matter. As long as you optimize without a valid performance estimation, the accuracy in a subsequent validation would increase anyway. So go ahead with the higher value. But to the point: you can't say. It's just an estimate, and the differences seem to come from the randomness of the process itself... So you might repeat it several times with varying random seeds / settings and average to get a valid estimate...
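
The repeat-and-average suggestion could be sketched in Python roughly as follows. This is only an illustration under stated assumptions: `run_validation` is a hypothetical placeholder for one complete validation run with a given seed, and the toy body (100 examples, each correct with probability 0.7) is an arbitrary choice that has nothing to do with the data in this thread. In practice it would be replaced by the real process.

```python
import random
import statistics

def run_validation(seed):
    """Hypothetical placeholder for one full validation run.

    Returns an accuracy in [0, 1]. The toy body simulates a randomized
    process; the value 0.7 and the 100 examples are arbitrary assumptions.
    """
    rng = random.Random(seed)
    correct = sum(rng.random() < 0.7 for _ in range(100))
    return correct / 100

def averaged_estimate(seeds):
    """Run the validation once per seed; return mean and standard deviation."""
    accuracies = [run_validation(seed) for seed in seeds]
    return statistics.mean(accuracies), statistics.stdev(accuracies)

mean_acc, std_acc = averaged_estimate(range(10))
print(f"mean accuracy: {mean_acc:.3f}, standard deviation: {std_acc:.3f}")
```

Reporting the spread next to the mean shows how much of a gap between two single runs can be explained by the seed alone.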