Abstract

This report details the implementation of a fragment extraction algorithm using
an average-case linear-time tree kernel. Given a treebank, the algorithm
extracts all fragments that occur at least twice, along with their frequencies.
Evaluation shows a 70-fold speedup over a quadratic-time fragment extraction
implementation. Additionally, we add support for trees with discontinuous
constituents.