

LARGE-SCALE AND HIGH-THROUGHPUT PATTERN MATCHING
ON PARALLEL ARCHITECTURES
by
Yi-Hua Edward Yang
A Dissertation Presented to the
FACULTY OF THE USC GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(ELECTRICAL ENGINEERING)
December 2011
Copyright 2011 Yi-Hua Edward Yang

Large-scale pattern matching has many applications, ranging from text processing to deep packet inspection (DPI), where hundreds or thousands of pre-defined strings or regular expressions (regexes) are matched concurrently and continuously against a high-bandwidth data input. The large number of patterns and the high matching throughput make large-scale pattern matching both compute- and memory-intensive.

In this thesis, we propose novel algorithms, constructions, and optimizations to accelerate large-scale pattern matching on two prominent classes of parallel architectures: Field Programmable Gate Arrays (FPGAs) and general-purpose multi-core processors. We focus our studies on string pattern matching (SPM) and regular expression matching (REM) in the context of DPI for network intrusion detection. We utilize various design methodologies, including pipelining, partitioning, parallel processing, aggregation, and modular composition, to improve the performance of our SPM and REM solutions on both FPGA and multi-core architectures.

For SPM, we analyze various real-life dictionaries as lexical trees and identify the "double power-law" distribution commonly present in the tree nodes. We then propose a head-body partitioning algorithm that partitions a dictionary tree into a small "head" and a memory-efficient "body" running in parallel. The "head" part is mapped either to a pipelined binary search tree on FPGA or to a small deterministic finite automaton (DFA) on a processor core; the "body" part is implemented as a compact, variable-stride branch data structure. Together, the head and body parts achieve high-bandwidth, attack-resilient matching throughput with good memory efficiency.

For REM, we propose a modified version of the classic McNaughton-Yamada construction. Our modified construction converts an arbitrary regex into a modular nondeterministic finite automaton (NFA) suitable for implementation on FPGA.
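The head-body idea above can be illustrated with a minimal sketch. This is not the dissertation's exact algorithm or data layout; the fixed-depth cutoff, the plain dict-based trie, and the function names are simplifications chosen for clarity only.

```python
# Illustrative sketch of head-body partitioning: build the dictionary as a
# lexical tree (trie), keep the densely shared prefixes up to depth k as the
# "head", and hand the long sparse tails to a separate "body" structure.

def build_trie(words):
    """Build a nested-dict trie; '$' marks end of a word/prefix."""
    root = {}
    for w in words:
        node = root
        for ch in w:
            node = node.setdefault(ch, {})
        node["$"] = True
    return root

def head_body_split(words, k):
    """Split a dictionary at depth k: return (head trie of prefixes,
    map from each depth-k prefix to its remaining suffixes)."""
    head = build_trie(w[:k] for w in words)
    body = {}
    for w in words:
        if len(w) > k:
            body.setdefault(w[:k], []).append(w[k:])
    return head, body

words = ["attack", "attach", "attic", "bomb", "bot", "botnet"]
head, body = head_body_split(words, 3)
# head covers the short, heavily shared prefixes ("att", "bom", "bot");
# body holds the long tails that a compact structure can match separately.
```

In the double power-law view, most tree nodes cluster near the root (many words share short prefixes), so a small head captures most transitions while the body stays sparse and compressible.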
We also design a spatial stacking technique to easily construct multi-character matching circuits, a BRAM-based character classification scheme to improve resource efficiency, and a 2-dimensional staged pipeline to operate a large number of REM circuits in parallel on FPGA. On a multi-core system, we transform the modular NFA into a segmented NFA, where each segment is mapped to a (64-bit) word processed as a unit by the processor core. Various techniques are applied to improve the memory and computation efficiency of segment processing.

To handle frequent and dynamic pattern updates, we provide algorithms for fast compilation of large dictionaries, as well as automated construction of large-scale REM circuits. For each of the proposed solutions, we evaluate our designs and optimizations using real-life DPI patterns and data streams with a wide range of characteristics.

Computationally, a DFA is more efficient than an NFA. However, converting an NFA to an equivalent DFA can sometimes cause exponential state explosion, making the resulting DFA significantly larger and practically infeasible to implement. In the final part of this thesis, we introduce a novel semi-deterministic finite automaton (SFA) which lies between the NFA and the DFA in terms of computation and memory complexity. We propose state convolvement tests and compatible state grouping algorithms to convert an NFA into an SFA with a controlled space-time tradeoff. Although constructing a minimum-sized SFA is shown to be NP-complete, we develop a greedy heuristic that quickly constructs a near-optimal SFA in time and space quadratic in the number of states in the original NFA.
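The exponential state explosion that motivates the SFA can be demonstrated with the textbook regex family (a|b)*a(a|b)^(n-1), whose NFA has n+1 states while its equivalent DFA must remember the last n symbols and therefore needs 2^n states. The sketch below runs the standard subset construction on that NFA and counts reachable DFA states; it is a generic illustration, not code from the dissertation.

```python
# Count the DFA states produced by subset construction on the NFA for
# (a|b)*a(a|b)^(n-1): NFA states 0..n, where state 0 loops on a/b,
# 0 -a-> 1, i -a/b-> i+1 for 1 <= i < n, and state n is accepting.

def subset_construction_size(n):
    """Return the number of reachable subsets (DFA states)."""
    def step(subset, ch):
        out = set()
        for s in subset:
            if s == 0:
                out.add(0)          # self-loop on (a|b)*
                if ch == "a":
                    out.add(1)      # start a new candidate match
            elif s < n:
                out.add(s + 1)      # advance on either symbol
        return frozenset(out)

    start = frozenset({0})
    seen, frontier = {start}, [start]
    while frontier:
        sub = frontier.pop()
        for ch in "ab":
            nxt = step(sub, ch)
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return len(seen)

# NFA size grows linearly (n+1 states) while the DFA grows as 2^n:
for n in (3, 4, 5):
    print(n + 1, subset_construction_size(n))
```

An SFA-style automaton targets exactly this gap: it tolerates a bounded amount of nondeterminism (a few active states per step) in exchange for a state count far below the deterministic 2^n.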
