Lexical Analyzer

Hi everyone !!
I'm trying to code a Lexical Analyzer in C++, so can anyone tell me where to start ...

A word of caution about using Spirit, though: for complex grammars, long compilation times and slow parsing performance often result. The former is mostly due to the deeply nested template instantiations, and the latter because Spirit was designed foremost to be a feature-rich, flexible library; efficiency was not the primary focus. That isn't to say it's terribly bloated, either. With something as complicated as C++, it's probably unfair to expect anything other than a hand-coded implementation to run blazingly fast.

I have read through a good portion of the Spirit source, though, and can honestly vouch that it's an extremely well-designed piece of software; a true masterpiece, actually. Considering all that it has to offer, it's hard to imagine it could have been implemented with a much smaller footprint. So the bottom line is that, yes, it's certainly well-equipped to handle C++ parsing, but probably not efficient enough to be used in something like a production-level compiler.

Spirit is a full-blown parser language. Lexical analysis, while technically a form of parsing, is a lot simpler than full parsing. Applying Spirit at the character level seems like overkill, and it has always bothered me that it's not very straightforward to hook Spirit up to a more traditional lexer. You can do it, but you need to be a Spirit expert, and who wants to be one of those?

That's sort of the point of using something like Spirit, though: to do away with the old-style lexer altogether. Rather than mess with clunky (and error-prone) finite-state mechanics and specialized subroutines, you simply express the grammar in pretty much its most direct form, using chains of reusable (and often passive) token-recognizers coupled with semantic actions that do the actual work of organizing the information as you see fit. This could even entail generating a meta-grammar that is further parsed at a higher level, for instance. So the traditional lexer really isn't necessary at all (or rather, use one or the other, but not both).