Abstract

Graph data such as chemical compounds and XML documents are becoming
increasingly common in many application domains.
A main difficulty of graph data processing
lies in the intrinsic high dimensionality of graphs:
when a graph is represented as a binary feature vector
of indicators of all possible subgraph patterns,
the dimensionality becomes too large for standard statistical methods.
We propose an efficient method to select a small number of salient
patterns by regularization path tracking.
The generation of useless patterns is minimized by progressive extension of
the search space.
Experiments show that our technique is considerably more
efficient than a simpler approach based on frequent substructure mining.