<div>Hi Anakim,</div>
<div> </div>
<div>Nice to see someone else working in this space.</div>
<div> </div>
<div>I have also been working on a set of parallel parsing techniques, which can use small Parsec parsers for local context sensitivity.</div>
<div> </div>
<div>See the second set of slides in <a href="http://comonad.com/reader/2009/iteratees-parsec-and-monoid/">http://comonad.com/reader/2009/iteratees-parsec-and-monoid/</a> for an overview of how I&#39;m doing something similar to feed Parsec independent chunks. Note that this approach bypasses the need for a separate sequential scan, which otherwise floods your cache, and lets you get closer to the performance limit imposed by Amdahl&#39;s law. </div>

<div> </div>
<div>The code in the second set of slides can be adapted to your case: load everything into a lazy ByteString or a fingertree of strict ByteStrings, then, for each strict chunk in parallel, scan for the first newline and start an iteratee-based Parsec parser from that point. I use iteratee-based Parsec parsers so that, when gluing the partial parses together, I can feed the unparsed data to the left of each chunk&#39;s first newline to the parser I&#39;m joining on the left. I provide a monoid that encapsulates this gluing of partial parses. </div>
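<div>Roughly, the gluing has the following shape. This is only a toy sketch with names of my own invention: plain Strings stand in for bytestrings, and a line-at-a-time parse function stands in for the iteratee-based Parsec parser, but the monoidal structure is the same.</div>

```haskell
-- Each chunk parses independently to either a fragment containing no
-- newline, or: the prefix before its first newline, results for its
-- complete lines, and a trailing partial line.
data Partial a
  = Flat String                 -- chunk contained no newline at all
  | Chunked String [a] String   -- prefix, complete-line results, suffix
  deriving Show

-- Split on newlines, keeping empty segments.
splitNl :: String -> [String]
splitNl s = case break (== '\n') s of
  (a, "")       -> [a]
  (a, _ : rest) -> a : splitNl rest

-- Parse one chunk on its own (in the real thing, each chunk in parallel).
parseChunk :: (String -> a) -> String -> Partial a
parseChunk f s = case splitNl s of
  [only]       -> Flat only
  (pre : rest) -> Chunked pre (map f (init rest)) (last rest)

-- The associative glue: the right chunk's leading fragment completes
-- the left chunk's dangling line, which can then be parsed.
glue :: (String -> a) -> Partial a -> Partial a -> Partial a
glue _ (Flat s)           (Flat t)           = Flat (s ++ t)
glue _ (Flat s)           (Chunked l p r)    = Chunked (s ++ l) p r
glue _ (Chunked l p r)    (Flat t)           = Chunked l p (r ++ t)
glue f (Chunked l1 p1 r1) (Chunked l2 p2 r2) =
  Chunked l1 (p1 ++ f (r1 ++ l2) : p2) r2

-- Close off the combined partial parse for a complete input.
finalize :: (String -> a) -> Partial a -> [a]
finalize f (Flat s)        = [f s]
finalize f (Chunked l p r) = f l : p ++ [f r]
```

Because glue is associative, the chunks can be combined in any grouping, which is what lets the per-chunk parses run independently.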

<div> </div>
<div>
<div>The fingertree case is particularly nice: the same machinery supports cheap incremental reparsing in response to out-of-band updates to the input.</div></div>
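<div>As a rough, self-contained illustration of the incremental idea (names are mine, and a simple newline count stands in for the real partial-parse summary): cache each chunk alongside its measured summary, and an edit only re-measures the chunk it touches. A fingertree goes further by also caching the combined measure of every subtree, so splicing in an edit touches only O(log n) cached values instead of re-folding a list as below.</div>

```haskell
import Data.Monoid (Sum (..))

-- Cache each chunk together with its measured summary.
type Cached = (String, Sum Int)

-- Toy measure: newline count, standing in for a full partial parse.
measure :: String -> Sum Int
measure = foldMap (\c -> if c == '\n' then Sum 1 else Sum 0)

cache :: String -> Cached
cache s = (s, measure s)

-- An out-of-band edit replaces chunk i; only that chunk is re-measured.
edit :: Int -> String -> [Cached] -> [Cached]
edit i new cs = take i cs ++ [cache new] ++ drop (i + 1) cs

-- Recombine the cached summaries monoidally.
total :: [Cached] -> Int
total = getSum . foldMap snd
```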
<div> </div>
<div>This approach is sufficient to parse a lot of interesting languages. As you have noted with Makefiles, it can handle indentation-based control structures, and with a variation on a Dyck-language monoid it can be extended to Haskell-style layout or to parenthesis matching/Lisp parsing, applying the same techniques at a higher level.</div>
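<div>For the parenthesis-matching case, the Dyck monoid can be sketched as follows (a toy with names of my own choosing, not the code from the slides): each chunk reduces to how many closers it cannot match locally and how many openers it leaves dangling, and those summaries compose associatively.</div>

```haskell
-- Summary of a chunk: unmatched ')' count, then unmatched '(' count.
data Dyck = Dyck !Int !Int
  deriving (Eq, Show)

instance Semigroup Dyck where
  Dyck c1 o1 <> Dyck c2 o2
    | o1 >= c2  = Dyck c1 (o1 - c2 + o2)     -- left openers absorb right closers
    | otherwise = Dyck (c1 + (c2 - o1)) o2   -- excess closers propagate left

instance Monoid Dyck where
  mempty = Dyck 0 0

fromChar :: Char -> Dyck
fromChar '(' = Dyck 0 1
fromChar ')' = Dyck 1 0
fromChar _   = mempty

-- Since (<>) is associative, chunks can be reduced in parallel and
-- combined in any grouping; the whole input balances iff the combined
-- summary is empty.
balanced :: String -> Bool
balanced s = foldMap fromChar s == Dyck 0 0
```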

<div> </div>
<div>-Edward Kmett </div>
<div> </div>
<div class="gmail_quote">On Wed, Sep 9, 2009 at 10:42 AM, Anakim Border <span dir="ltr">&lt;<a href="mailto:akborder@gmail.com">akborder@gmail.com</a>&gt;</span> wrote:<br>
<blockquote class="gmail_quote" style="PADDING-LEFT: 1ex; MARGIN: 0px 0px 0px 0.8ex; BORDER-LEFT: #ccc 1px solid">
<div class="im">&gt; Very interesting idea!<br>&gt;<br>&gt; I think the big thing would be to measure it with GHC HEAD so you can<br>&gt; see how effectively the sparks are being converted into threads.<br>&gt;<br>&gt; Is there a package and test case somewhere we can try out?<br>
<br><br></div>At this point the parser is just a proof of concept. For those brave<br>enough, however, I&#39;ve put the code on github:<br><br><a href="http://github.com/akborder/HsMakefileParser/" target="_blank">http://github.com/akborder/HsMakefileParser/</a><br>
<br>The &quot;test.mk&quot; file provides some test cases. To get an input big<br>enough to measure multi-threaded performance, you can concatenate<br>that file a few thousand times: the timings in my previous message<br>
were obtained parsing a 3000x concatenation (final size: 1.1 MB).<br>
<div>
<div></div>
<div class="h5">_______________________________________________<br>Haskell-Cafe mailing list<br><a href="mailto:Haskell-Cafe@haskell.org">Haskell-Cafe@haskell.org</a><br><a href="http://www.haskell.org/mailman/listinfo/haskell-cafe" target="_blank">http://www.haskell.org/mailman/listinfo/haskell-cafe</a><br>
</div></div></blockquote></div><br>