q-gram Matching Using Tree Models

By Prahlad Fogla and Wenke Lee

Abstract

q-gram matching is used for approximate substring matching in a wide range of application areas, including intrusion detection. In this paper, we present a tree-based model that performs fast, linear-time q-gram matching. All q-grams present in the text are stored in a tree structure similar to a trie. We use a tree redundancy pruning algorithm to reduce the size of the tree without losing any information, and we use suffix links for fast q-gram search during query matching. We compare our work with the Rabin-Karp based hash-table technique commonly used for multiple q-gram search, and we present results of experiments on system call sequence data used for intrusion detection.
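As a rough illustration of the basic idea only, the following Python sketch stores every q-gram of a training sequence in a trie built from nested dictionaries and then counts how many q-grams of a query sequence are absent from it. It is not the model presented in this paper: it omits both redundancy pruning and suffix links, so each lookup restarts at the root and costs O(q) per window. The function names (`build_qgram_trie`, `contains_qgram`, `count_mismatches`) and the sample system call names are hypothetical and chosen for illustration.

```python
def build_qgram_trie(sequence, q):
    """Insert every length-q window of `sequence` into a nested-dict trie.

    Illustrative assumption: a trie node is a plain dict mapping a symbol
    to its child node. No pruning or suffix links are built here.
    """
    root = {}
    for i in range(len(sequence) - q + 1):
        node = root
        for symbol in sequence[i:i + q]:
            node = node.setdefault(symbol, {})
    return root


def contains_qgram(trie, gram):
    """Return True if `gram` (assumed to have length q) is in the trie."""
    node = trie
    for symbol in gram:
        node = node.get(symbol)
        if node is None:
            return False
    return True


def count_mismatches(trie, query, q):
    """Count the q-grams of `query` that do not occur in the trie."""
    return sum(
        not contains_qgram(trie, query[i:i + q])
        for i in range(len(query) - q + 1)
    )


# Hypothetical system call traces; any sequence of hashable symbols works.
training = ["open", "read", "mmap", "read", "close"]
query = ["open", "read", "mmap", "read", "write"]

trie = build_qgram_trie(training, q=3)
mismatches = count_mismatches(trie, query, q=3)
```

In this sketch, a query whose q-grams all appear in the training sequence yields a mismatch count of zero, and a larger count suggests a sequence that deviates from the stored behavior.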