Originally written by Peter Bumbulis (peter@csg.uwaterloo.ca)
Currently maintained by Brian Young (bayoung@acm.org)

The re2c distribution can be found at:

    http://www.tildeslash.org/re2c/index.html

The source distribution is available from:

    http://www.tildeslash.org/re2c/re2c-0.9.1.tar.gz

This distribution is a cleaned-up version of the 0.5 release
maintained by me (Brian Young). Several bugs were fixed, and the
code was cleaned up for warning-free compilation. It has been
developed and tested with egcs 1.0.2 and gcc 2.7.2.3 on Linux x86.
Peter Bumbulis' original release can be found at:

    ftp://csg.uwaterloo.ca/pub/peter/re2c.0.5.tar.gz

re2c is a great tool for writing fast and flexible lexers. It has
served many people well for many years and it deserves to be
maintained more actively. re2c is on the order of 2-3 times faster
than a flex-based scanner, and its input model is much more
flexible.

Patches and requests for features will be entertained. Areas of
particular interest to me are porting (a Solaris and an NT version
will be forthcoming) and wide character support. Note that the code
is already quite portable and should be buildable on any platform
with minor makefile changes.

Version 0.5 Peter’s original ANNOUNCE and README:

re2c is a tool for generating C-based recognizers from regular
expressions. re2c-based scanners are efficient: for programming
languages, given similar specifications, an re2c-based scanner
is typically almost twice as fast as a flex-based scanner with
little or no increase in size (possibly a decrease on CISC
architectures). Indeed, re2c-based scanners are quite competitive
with hand-crafted ones.

Unlike flex, re2c does not generate complete scanners: the user
must supply some interface code. While this code is not bulky
(about 50-100 lines for a flex-like scanner; see the man page
and examples in the distribution) careful coding is required for
efficiency (and correctness). One advantage of this arrangement
is that the generated code is not tied to any particular input
model. For example, re2c-generated code can be used to scan
data from a null-byte terminated buffer as illustrated below.

Given the following source:

    #define NULL ((char*) 0)
    char *scan(char *p) {
        char *q;
    #define YYCTYPE char
    #define YYCURSOR p
    #define YYLIMIT p
    #define YYMARKER q
    #define YYFILL(n)
    /*!re2c
        [0-9]+      {return YYCURSOR;}
        [\000-\377] {return NULL;}
    */
    }

re2c will generate:

    /* Generated by re2c on Sat Apr 16 11:40:58 1994 */
    #line 1 "simple.re"
    #define NULL ((char*) 0)
    char *scan(char *p) {
        char *q;
    #define YYCTYPE char
    #define YYCURSOR p
    #define YYLIMIT p
    #define YYMARKER q
    #define YYFILL(n)
    {
        YYCTYPE yych;
        unsigned int yyaccept;
        goto yy0;
    yy1:    ++YYCURSOR;
    yy0:
        if((YYLIMIT - YYCURSOR) < 2) YYFILL(2);
        yych = *YYCURSOR;
        if(yych <= '/') goto yy4;
        if(yych >= ':') goto yy4;
    yy2:    yych = *++YYCURSOR;
        goto yy7;
    yy3:
    #line 10
        {return YYCURSOR;}
    yy4:    yych = *++YYCURSOR;
    yy5:
    #line 11
        {return NULL;}
    yy6:    ++YYCURSOR;
        if(YYLIMIT == YYCURSOR) YYFILL(1);
        yych = *YYCURSOR;
    yy7:    if(yych <= '/') goto yy3;
        if(yych <= '9') goto yy6;
        goto yy3;
    }
    #line 12
    }

Note that most compilers will perform dead-code elimination to
remove all YYCURSOR, YYLIMIT comparisons.
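
To show how calling code might drive a scanner like this over a
null-byte terminated buffer, here is a minimal sketch. It makes two
assumptions: scan_digits() is a hand-written stand-in with the same
contract as the scan() function above (so the sketch compiles without
running re2c), and count_numbers() is a hypothetical helper that is
not part of the distribution.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hand-written stand-in for the re2c-generated scan() (an assumption,
 * not re2c output): returns a pointer just past a leading run of
 * digits, or NULL if p does not start with a digit.
 */
const char *scan_digits(const char *p)
{
    if (*p < '0' || *p > '9')
        return NULL;
    while (*p >= '0' && *p <= '9')
        ++p;
    return p;
}

/*
 * Hypothetical helper: count the digit runs in a null-byte terminated
 * buffer by calling the scanner repeatedly. The terminating null byte
 * is the only end-of-input check, so no YYLIMIT-style bound is needed.
 */
int count_numbers(const char *buf)
{
    int count = 0;

    while (*buf != '\0') {
        const char *end = scan_digits(buf);
        if (end != NULL) {
            ++count;        /* matched [0-9]+, resume after the match */
            buf = end;
        } else {
            ++buf;          /* no match here, skip one character */
        }
    }
    return count;
}
```

The loop never hands the scanner an empty string, which matters for
a real re2c-generated scan(): the catch-all rule advances the cursor
by one character before returning.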
re2c was developed for a particular project (constructing a fast
REXX scanner of all things!) and so while it has some rough edges,
it should be quite usable. More information about re2c can be
found in the (admittedly skimpy) man page; the algorithms and
heuristics used are described in an upcoming LOPLAS article
(included in the distribution). Probably the best way to find out
more about re2c is to try the supplied examples. re2c is written in
C++, and is currently being developed under Linux using gcc 2.5.8.

Peter