Using structure to select features in high dimension

AI seems impossible to dissociate from Big Data, usually taken to mean hundreds of thousands of training samples, if not more. But what if what is large about your data is the number of features? This setting poses different statistical and computational challenges, and traditional feature selection methods fall short. The field of structured sparsity offers solutions when a structure (e.g. groups, a tree, or a graph) is given over the features and the selected features should respect it. Structured sparsity methods aim to make good predictions using a small number of features (sparsity) that are consistent with the given structure; for instance, the selected features belong to a small number of predefined groups, or are connected on a predefined graph.
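
To make the group case concrete, here is a minimal sketch of block soft-thresholding, the shrinkage step commonly associated with the group lasso. It is one well-known structured sparsity technique, chosen here for illustration; the abstract does not say which methods the talk covers. The function name, the example weights, the groups, and the threshold are all hypothetical, and the groups are assumed not to overlap.

```python
import math

def group_soft_threshold(weights, groups, threshold):
    """Shrink each group of weights toward zero as a block.

    weights:   list of floats, one per feature
    groups:    list of lists of feature indices (assumed non-overlapping)
    threshold: non-negative shrinkage level (hypothetical tuning parameter)
    """
    shrunk = list(weights)
    for group in groups:
        # Euclidean norm of the weights in this group
        norm = math.sqrt(sum(weights[i] ** 2 for i in group))
        # A group whose norm is at or below the threshold is zeroed entirely;
        # otherwise every weight in the group is scaled by the same factor.
        scale = max(0.0, 1.0 - threshold / norm) if norm > 0 else 0.0
        for i in group:
            shrunk[i] = scale * weights[i]
    return shrunk

# Illustrative values only: six features in three predefined groups.
weights = [3.0, 4.0, 0.1, -0.2, 0.0, 0.3]
groups = [[0, 1], [2, 3], [4, 5]]
result = group_soft_threshold(weights, groups, threshold=1.0)
selected_groups = [g for g in groups if any(result[i] != 0.0 for i in g)]
```

With these values only the first group survives: features are kept or discarded group by group rather than one at a time, which is the sense in which the selected features respect the given structure.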

This talk is motivated by applications in genetics, where it is common to have orders of magnitude more features than samples and where prior knowledge is available as structure over the features. Genetics is not, however, the only setting in which these methods apply.