
Target Shuffling

Is there any quick way to implement "Target Shuffling" in RM?

In a target shuffling model evaluation, performance should be measured on the actual dataset as well as on a number of datasets with randomly rearranged label values. Using random labels is not enough: the actual labels should be used and reassigned to different examples.


Answers

The way I understand it is as follows:

Step 1. You train a classifier and observe that it has X percent accuracy.
Step 2. You then randomize your labels, train another classifier, and observe that it has Y percent accuracy.
Step 3. You repeat Step 2 multiple times and find Z = best(Y).

When X is sufficiently better than Z, you claim that the model underlying X is not caused by noise.
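For readers who want to see the three steps outside RM, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the one-feature synthetic data, the toy nearest-centroid classifier, and the names `fit_centroids`, `accuracy` and `target_shuffling` are not from this thread. For brevity it scores each model on its own training data; in a real evaluation you would likely use the same validation scheme (e.g. cross-validation) for X and for every Y.

```python
import random

def fit_centroids(xs, ys):
    """Toy classifier: remember the mean feature value of each class."""
    sums, counts = {}, {}
    for x, y in zip(xs, ys):
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}

def accuracy(centroids, xs, ys):
    """Fraction of examples whose nearest class centroid matches their label."""
    hits = 0
    for x, y in zip(xs, ys):
        predicted = min(centroids, key=lambda label: abs(x - centroids[label]))
        hits += predicted == y
    return hits / len(xs)

def target_shuffling(xs, ys, repeats=200, seed=0):
    """Return (X, Z): accuracy with the real labels, best accuracy with shuffled labels."""
    rng = random.Random(seed)
    x_acc = accuracy(fit_centroids(xs, ys), xs, ys)      # Step 1
    shuffled_accs = []
    for _ in range(repeats):                             # Step 3: repeat Step 2
        shuffled = ys[:]                                 # keep the original labels intact
        rng.shuffle(shuffled)                            # Step 2: same labels, new order
        model = fit_centroids(xs, shuffled)
        shuffled_accs.append(accuracy(model, xs, shuffled))
    return x_acc, max(shuffled_accs)                     # Z = best(Y)

# Synthetic two-class data (an assumption, only here to make the sketch runnable).
data_rng = random.Random(42)
xs = [data_rng.gauss(0.0, 1.0) for _ in range(50)] + [data_rng.gauss(3.0, 1.0) for _ in range(50)]
ys = [0] * 50 + [1] * 50

x_acc, z_acc = target_shuffling(xs, ys)
print("X (real labels):", x_acc)
print("Z (best shuffled):", z_acc)
```

If X is clearly above Z, the real labels carry signal that the shuffled ones do not. How large the gap must be is a judgment call that this sketch does not settle.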

Target Shuffling works as you describe. The only restriction is that step 2 should take care not to bias the label distribution. That is why the original set of labels is used with randomized order (hence the term shuffling instead of randomizing).
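The difference between shuffling and randomizing is easy to check with a few lines of Python. This is only an illustrative sketch: the imbalanced "yes"/"no" labels are made up, and drawing uniformly from the two classes is just one assumed way of "randomizing".

```python
import random
from collections import Counter

rng = random.Random(1)
labels = ["yes"] * 10 + ["no"] * 90      # made-up, imbalanced original labels

# Shuffling: a permutation of the original labels, so the class counts cannot change.
shuffled = labels[:]
rng.shuffle(shuffled)

# Randomizing: fresh uniform draws, which ignore the original class balance.
randomized = [rng.choice(["yes", "no"]) for _ in labels]

print("original  :", Counter(labels))
print("shuffled  :", Counter(shuffled))
print("randomized:", Counter(randomized))
```

The shuffled copy always has exactly the same label counts as the original, while the randomized copy drifts toward a 50/50 split, so a model trained on it is no longer a fair baseline for an imbalanced problem.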

Such a shuffled dataset can easily be constructed using R or even Excel, but how could one implement the whole process in RM and get one final result?

Sometimes the top-n random models are required for comparison, along with some dataset similarity measures. This is to avoid using too many repeats in step 3 relative to the number of examples, which would produce a large number of datasets that are not truly shuffled.
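To make the concern about repeats concrete, here is a small Python sketch. It rests on my own assumptions about what is meant: `distinct_arrangements` counts how many different label orderings exist at all, `unchanged_fraction` is one hypothetical similarity measure (the share of examples that kept their original label), and `top_n` simply keeps the n best shuffled scores. None of these names come from RM.

```python
import heapq
import math
import random
from collections import Counter

def distinct_arrangements(labels):
    """Number of distinct orderings of the labels (multinomial coefficient)."""
    total = math.factorial(len(labels))
    for count in Counter(labels).values():
        total //= math.factorial(count)
    return total

def unchanged_fraction(original, shuffled):
    """Hypothetical similarity measure: share of examples that kept their label."""
    return sum(a == b for a, b in zip(original, shuffled)) / len(original)

def top_n(scores, n):
    """Keep the n best scores among the shuffled-label models."""
    return heapq.nlargest(n, scores)

labels = [1, 1, 0, 0, 0, 0]                   # tiny made-up label column
print("distinct arrangements:", distinct_arrangements(labels))

rng = random.Random(0)
seen = set()
for _ in range(10):                           # 10 repeats on only 6 examples
    shuffled = labels[:]
    rng.shuffle(shuffled)
    key = tuple(shuffled)
    if key in seen or shuffled == labels:     # duplicate or identical to the original
        continue
    seen.add(key)
    print(shuffled, "unchanged:", unchanged_fraction(labels, shuffled))
```

With few examples, the number of distinct arrangements is small, so many repeats just reproduce datasets already seen (or the original itself). Checking the count up front, and dropping shuffles that are too similar to the original, is one way to keep the comparison honest.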

I implemented Target Shuffling in RM. I saved it as a Building Block for easy inclusion in projects. The enclosed code is for a building block. Save it in a file called "Target Shuffling.buildingblock" in your repository directory.

I hope you find it useful.

I'll be happy to get any comments.

Sincerely, Amnon Khen

Target Shuffling
Shuffles the labels of the input example set. Be sure to define the label and id attribute names.
sort_up_down.png
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!-- This achieves "target shuffling". I don't know if it is the most elegant way.

It does so by:
1) multiplying the example set
2) in one copy:
2.1) leave only the label
2.2) shuffle examples (which are only the labels)
3) in the other:
3.1) remove the label
3.2) rename the id to old_id
3.3) make it a regular attribute
4) add a "fake" id column to both copies
5) join copies
6) clean up:
6.1) remove fake id
6.2) rename old_id to id
6.3) make it an id attribute