Largest Source Subset Selection for Instance Transfer

Abstract

Instance-transfer learning has emerged as a promising framework for boosting the performance of prediction models on newly arrived tasks. The success of the framework depends on the relevance of the source data to the target data. This paper proposes a new approach to source data selection for instance-transfer learning. The approach selects the largest subset S^* of the source data whose relevance to the target data is statistically guaranteed to be the highest among all supersets of S^*. The approach is formally described and theoretically justified. Experimental results on real-world data sets demonstrate that the approach outperforms existing instance selection methods.