Hi Cafe,
In one of my projects I have a lexer that seems to be taking an
inordinate amount of time and space. The lexer is generated by Alex
using the lazy ByteString (with position information) template. I
compiled and ran with profiling enabled and I get a report like this:
-------------------------------------------------------------------------------
        Tue Jun 21 16:56 2011 Time and Allocation Profiling Report  (Final)

           ViewCallGraph +RTS -p -RTS gnutls.bc

        total time  =       51.80 secs   (2590 ticks @ 20 ms)
        total alloc = 9,482,333,244 bytes  (excludes profiling overheads)

COST CENTRE      MODULE                              %time %alloc

alexScanTokens   Data.LLVM.Private.Lexer              24.1    4.5
alex_scan_tkn    Data.LLVM.Private.Lexer              21.2   32.7
tokenAs          Data.LLVM.Private.Parser.Primitive    6.7    2.9
alexGetChar      Data.LLVM.Private.Lexer               6.5   22.7
-------------------------------------------------------------------------------
The entries below these four are marginal. The third entry is from my
code and isn't a big deal (yet), but the other three seem to indicate
that the lexer is responsible for about 50% of my runtime and memory
allocation. For reference, this particular input is about 18M of
text, though the ratios are just as bad for smaller inputs.
My uneducated suspicion is that Alex is constructing separate
ByteStrings that it passes to each of my token constructors, and that
this is responsible for a large part of this allocation. Most of my
token constructors just ignore this ByteString - assuming there really
is an allocation for each token, is there any way to avoid it? I was
looking at alexScanTokens, alex_scan_tkn, and alexGetChar but didn't
see any obvious ways to improve them.
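To make the suspicion concrete, here is a minimal stand-in for what I think the generated code does (this is not the actual Alex output; Token, commaAction, and identAction are invented for illustration): every match produces a fresh lazy ByteString slice via take/drop, which gets allocated even when the token action throws it away.

```haskell
import qualified Data.ByteString.Lazy.Char8 as BL

data Token = TComma | TIdent BL.ByteString
  deriving (Show, Eq)

-- Hypothetical token actions in the style of the posn-bytestring
-- wrapper: each receives the matched slice of the input.
commaAction :: BL.ByteString -> Token
commaAction _s = TComma      -- slice allocated, then discarded

identAction :: BL.ByteString -> Token
identAction s = TIdent s     -- slice actually retained

main :: IO ()
main = do
  let input = BL.pack "foo,bar"
  -- BL.take/BL.drop on a lazy ByteString build a new chunk spine per
  -- token, which is where I suspect much of the allocation comes from.
  print (identAction (BL.take 3 input))
  print (commaAction (BL.take 1 (BL.drop 3 input)))
```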
Alternatively, does anyone have lexing performance tips that might
help?
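One direction I've been wondering about (sketched below under the assumption that the whole input fits in memory): with a strict ByteString, take/drop share the original buffer rather than building a new chunk list, so per-token slices should cost only a small constant allocation. The tokenSlice helper here is my own, purely illustrative.

```haskell
import qualified Data.ByteString.Char8 as B
import Data.ByteString.Unsafe (unsafeDrop, unsafeTake)

-- A hypothetical token slice: O(1) and zero-copy, just a view
-- (offset + length) into the original input buffer.
tokenSlice :: Int -> Int -> B.ByteString -> B.ByteString
tokenSlice off len input = unsafeTake len (unsafeDrop off input)

main :: IO ()
main = do
  let input = B.pack "define i32 @main"
  print (tokenSlice 0 6 input)   -- prints "define"
  print (tokenSlice 7 3 input)   -- prints "i32"
```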
Thanks