Séminaire Données et APprentissage Artificiel

Statistical significance testing on graphs

26/06/2009
Speaker: Gemma C. Garriga (Helsinki Institute for Information Technology)

Mining graph data is an active research area. Several data mining methods and algorithms have been proposed to identify structures in graphs; still, the evaluation of those results has not been considered. Within the framework of statistical hypothesis testing, we focus on randomization techniques for unweighted undirected graphs. Randomization is an important approach to assessing the statistical significance of data mining results. Given an input graph, our randomization method samples from the class of graphs that share certain structural properties with the input graph. We will describe three alternative algorithms based on local edge swapping and Metropolis sampling. We test our framework with various graph data sets and mining algorithms for two applications, namely graph clustering and frequent subgraph mining. We will also study how the randomization techniques can be used for multiple hypothesis testing on graphs, and how to assess the results of queries on multi-relational data.
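As an illustration of the general idea, the following is a minimal sketch of randomization by local edge swapping, assuming the preserved structural property is the degree of every node and the mined result is summarized by a single number (here, the triangle count). It is not one of the three algorithms of the talk, and the Metropolis variant is not shown. All function names (`swap_randomize`, `count_triangles`, `empirical_p_value`) and the toy graph are hypothetical, chosen only for this example.

```python
import random


def normalize(u, v):
    """Store an undirected edge with its endpoints in sorted order."""
    return (u, v) if u < v else (v, u)


def swap_randomize(edges, n_swaps, rng):
    """Attempt n_swaps local edge swaps on an undirected simple graph.

    A swap replaces edges (a, b) and (c, d) with (a, d) and (c, b), so every
    node keeps its degree. Proposals that would create a self-loop or a
    duplicate edge are rejected and the graph is left unchanged.
    """
    edge_list = [normalize(u, v) for u, v in edges]
    edge_set = set(edge_list)
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(edge_list)), 2)
        a, b = edge_list[i]
        c, d = edge_list[j]
        if rng.random() < 0.5:  # pick one of the two possible rewirings
            c, d = d, c
        if a == d or c == b:  # would create a self-loop
            continue
        e1, e2 = normalize(a, d), normalize(c, b)
        if e1 in edge_set or e2 in edge_set:  # would create a duplicate edge
            continue
        edge_set.discard(edge_list[i])
        edge_set.discard(edge_list[j])
        edge_set.update((e1, e2))
        edge_list[i], edge_list[j] = e1, e2
    return edge_list


def count_triangles(edges):
    """Example statistic: number of triangles in the graph."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    # Each triangle is seen once from each of its three edges.
    return sum(len(adj[u] & adj[v]) for u, v in edges) // 3


def empirical_p_value(edges, statistic, n_samples=200, n_swaps=None, seed=0):
    """One-sided empirical p-value of statistic(edges) under randomization."""
    rng = random.Random(seed)
    if n_swaps is None:
        n_swaps = 10 * len(edges)  # assumed chain length, not a tuned value
    observed = statistic(edges)
    hits = sum(
        statistic(swap_randomize(edges, n_swaps, rng)) >= observed
        for _ in range(n_samples)
    )
    # Adding 1 to both counts keeps the estimate strictly positive.
    return (hits + 1) / (n_samples + 1)


if __name__ == "__main__":
    # Toy graph: two triangles joined by an edge, plus a short tail.
    toy = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4),
           (4, 5), (3, 5), (5, 6), (6, 7)]
    print("observed triangles:", count_triangles(toy))
    print("empirical p-value:", empirical_p_value(toy, count_triangles))
```

A small p-value suggests that the observed statistic is unlikely among graphs with the same degrees. In this sketch each sample restarts from the input graph, and the number of attempted swaps per sample is an arbitrary choice that would need checking for convergence on real data.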