Google releases RE2 Regular Expression Library for C++

Google has released RE2, a regular expression library for C++ that is based on automata theory and offers search times guaranteed to be linear in the size of the input, limited stack use and higher performance. Regular expressions were introduced to programmers in the 1960s by Ken Thompson as a way to describe patterns of text in his text editor QED. They have since become deeply embedded in Unix culture through tools such as ed, sed, grep, egrep, awk and lex, and are built into the core of languages such as Perl, Python and JavaScript. They have also become part of geek culture.

Google makes use of regular expressions throughout its infrastructure and applications, but noted that the common implementations are based on a backtracking search. With such implementations, matching time can rise exponentially as the size of the input increases, and there is no limit on how much stack can be used. These issues made them ill suited to Google's need for a thread-friendly, predictable regular expression matcher within applications such as Code Search, Sawzall and Bigtable. To address the problem, Google developers created RE2, a "mostly drop-in replacement" for PCRE's C++ bindings. RE2 is suited to larger inputs and offers a fast, safe and thread-friendly regular expression engine. The BSD-style licensed C++ code for RE2 is available to download from the project's Google Code page.
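
To make the difference between the two approaches concrete, the following is a minimal Python sketch. It is not RE2's code or API. It is a toy matcher for a deliberately tiny pattern language (literal characters and optional characters such as "a?"), and every name in it (backtracking_match, state_set_match, pathological) is hypothetical. It counts the work done by a naive backtracking search and by a simulation that tracks the set of reachable automaton states, using the pattern a?a?...aa... against a run of "a" characters.

```python
# Toy illustration only: a pattern is a list of (char, optional) items,
# so ("a", True) stands for "a?" and ("a", False) for a plain "a".


def backtracking_match(pattern, text):
    """Return (matched, steps) using naive recursive backtracking."""
    steps = 0

    def go(i, j):
        nonlocal steps
        steps += 1
        if i == len(pattern):
            return j == len(text)
        char, optional = pattern[i]
        fits = j < len(text) and text[j] == char
        # First try to consume a character with this item.
        if fits and go(i + 1, j + 1):
            return True
        # An optional item may also be skipped; a mandatory one may not.
        return optional and go(i + 1, j)

    return go(0, 0), steps


def state_set_match(pattern, text):
    """Return (matched, steps) by tracking all reachable pattern positions."""
    m = len(pattern)
    steps = 0

    def close(states):
        # Follow "skip" edges: position i leads to i + 1 if item i is optional.
        nonlocal steps
        for i in range(m):
            steps += 1
            if states[i] and pattern[i][1]:
                states[i + 1] = True

    current = [False] * (m + 1)
    current[0] = True
    close(current)
    for ch in text:
        following = [False] * (m + 1)
        for i in range(m):
            steps += 1
            if current[i] and pattern[i][0] == ch:
                following[i + 1] = True
        close(following)
        current = following
    return current[m], steps


def pathological(n):
    """Build the pattern a?{n}a{n} and the text a{n}."""
    pattern = [("a", True)] * n + [("a", False)] * n
    return pattern, "a" * n


if __name__ == "__main__":
    for n in (4, 8, 12, 16):
        pattern, text = pathological(n)
        _, slow = backtracking_match(pattern, text)
        _, fast = state_set_match(pattern, text)
        print(n, slow, fast)
```

In this sketch the backtracking step count roughly doubles each time n grows by one, while the state-set count is bounded by the pattern length times the text length. That is the kind of predictability the automata-based approach is meant to provide, though a production engine such as RE2 is considerably more sophisticated than this toy.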