Sunday, May 28, 2006

Continue on yesterday's work, I finished converting MiniPerl6 module so that it now work happily with ratchet.

The problem is that I used \w+ to match variable names but I should use [<alpha>|_]\w* instead (since the variables named with numbers are numbered captures, a special category). Another problem was hidden, or bypassed, by changing the order of alternations after I fixed the rule-using problem. It happened to be that the named capture parsing code in Pugs::Emitter::Rule::Perl5::Ratchet used a reference of boolean value in an "if" statement. I have to add another pair of ${} around it, at least before switching to Perl 6.

Saturday, May 27, 2006

The Pugs::Grammar::MiniPerl6 module was broken since the default setting of parsing option of Pugs::Compiler::Rule several days ago so that the ratchet option was turned off in it. Currently compile_p6grammar.pl generates a module used no ratchet feature which is identical to that it used to do. So that I focused on it these days.

I fixed the ratchet Perl 5 emitter on backslashes and named capture yesterday. And, I've almost finished fixing the MiniPerl6 module itself.

The main parts are to revise the grammar and update the translator compile_p6grammar.pl. The former one including changing some rules into "token" and rewrite the others to fit the :sigspace. And the later part is to make it accept "token" and "regex" as well as "rule."

Another big change is a modifier ":p" on rule ProductionRule. Since ProductionRule is the top rule in the grammar, the match of it should start from the very begin of input string. Correspondingly, compile_p6grammar.pl also translate it correctly.

I said that I "almost" finished fixing it. That means some of them are still broken and if you tried "make test" you will find the last test failed. I'll fix the temporary variable declaration as soon as possible. Stay tuned!

Synopsis This is a subproject of Pugs (http://pugscode.org). One of the goals of Pugs is to give an implementation of Perl 6 and help it bootstraping. A new bootstrap plan has been proposed shortly. It takes four steps and I propose to complete the first two steps in this subproject. It consists of two new perl 5 modules. One translates a very restrictive subset of Perl 6, MiniPerl6, to Haskell and the another translates the Perl 6 Rule in to Parsec (a Haskell parsing module) parsing code.

This is the easiest way to bootstrap we currently found. Bootstrapfrom pure Perl 5 has been proven unbelievable hard due to the lack ofgood parsing components. Pugs itself is written in Haskell because ofthe Parsec module. It makes the parsing easier but still the parserhas grown too large to debug and maintain. Since the Perl 6 Rule hasbeen discussed for a long time and became stable and powerful enough,the Perl 6 seems better be parsed by Perl 6 Rule itself.

Another feature of this project is that there is no "mixed" componentsin the whole process. Bootstrap plans often consist of some partsmixing two languages to get over the language gap before the targetcan be used. But we carefully designed the plan and languages to beused making that every components are written in a pure Perl 5 or Perl6. This make the project reusable.

Deliverables Module Pugs::Grammar::MiniPerl6 This module includes a Perl 6 Rule file and a Perl 5 script translates the rule into Perl 5 parser module with the help of Pugs::Compiler::Rule.

Module Pugs::Emitter::Rule::Parsec This is the module used in the second step of bootstrap. It emits Haskell code from the AST producted by Pugs::Compiler::Rule.

Module Pugs::Compiler::Rule This is an existing module mainly written by Flavio S. Glock (FGLOCK). Since it's heavily used is the above two modules and is not completed, I may spend some time improving it to support more Perl 6 Rule syntax.

1. This module, Pugs::Grammar::MiniPerl6, uses Pugs::Compiler::Rule to read a special *mixed* Perl 6 Rule whose production rules are written in Perl 5 (the current requirement of P::C::Rule). 1a. The rule is used to translate a subset of Perl 6, MiniPerl6, to haskell.

2. Then, Pugs::Compiler::Emit::Parsec lands and uses this module to translate the full Perl 6 grammar into a parser. The full grammar can now write production rules in MiniPerl6 since P::C::E::Parsec can use this module to translate such production rules to haskell and makes the final output be pure Haskell.

3. When compiling Pugs, the .hs preprocessor will use Pugs::Compiler::Emit::Parsec to accept the full Perl 6 grammar generating Parser.hs. Then GHC will compile it to executable.

4. The executable can now read the full Perl 6 grammar again generating compiler in PIL. Then self hosting is done.

I propose to finish the first two steps. Note that the first step seems contain a mixed-language file, but since the Perl 5 code only appear in production rules and are marked with 'use v5,' it's still valid Perl 6.

The most challenging parts of this project are:

1. The parsed grammar saved in Perl 5 hash-and-array tree-like data structure is extremely complex and nearly unreadable. Since the correctness has not been tested much, debugging modules using it is a difficult job.

2. We don't know if the Perl 5 Rule parser is powerful and efficient enough to handle the full grammar. Currently translating a sample grammar file with five small rules takes one minute CPU time on a Xeon 2.8GHz machine (may be unprecise due to HTT).

3. We don't know if Parsec is powerful enough to handle all functionality provided by Perl 6 Rules. Hopefully, we should be able to use only those which can be translated easily in the first version of the full grammar of Perl 6. And since Parsec code was copied into Pugs' source tree for the ease of modifying, it's not that serious.

And the second part will appear as misc/pX/Common/Pugs-Compiler-Rule/lib/Pugs/Emitter/Rule/Parsec.pm

The whole project will be licensed under the same licenses of the Pugs source tree, which are currently dual licensed under GPL-2 and Artistic-2.0b5.

Project Schedule The following is the planned schedule:

May 31st -- Pugs::Grammar::MiniPerl6 has complete MiniPerl6 grammar with correct production rules. Jun 31st -- Pugs::Compiler::Rule be able to accept all syntax we want to use in the full Perl 6 grammar. Jul 1st -- Start organizing full Perl 6 grammar. Jul 31st -- Pugs::Compiler::Emitter::Rule::Parsec can translate most rule constructions. Aug 1st -- Start rewriting exist parsing functions to Perl 6 rules. Aug 20st -- Parser/*.hs can be replaced by the generated ones. Sep 15th -- Parser.hs can be replaced by the generated one.

Bio I am a undergraduated student interested in pragramming language field as well as one of the commiter of Pugs. I use Perl a lot in daily works (data processing, crontab tasks, etc.), and with the PL interest, I noticed and took a look at Perl 6 about two years ago. I joint Pugs project and became a commiter right after one month of the birth of it. I focused mainly on parser and added some language constructions.

Tuesday, May 23, 2006

These days the Pugs repository was unusually silent, so did the channel which is full of joining and leaving messages. The main reason might be the disappearance of Audrey: her last commit was on May 18th, five days ago.

These days I spent most of development time on understanding and fixing ratchet. The goal was to make it handle compile_p6grammar.pl correctly. I achieved it several hours ago by properly handle match objects generated by quantified subrules. The result is:

Sunday, May 14, 2006

As I mentioned several days ago, the Pugs::Grammar::MiniPerl6 was almost finished so that I turned my attention to Pugs::Compiler::Rule. fglock++ just committed the first version of ratchet parser which will not backtrack by default and will give an approximately 10x+ speedup by eliminating backtracking.

Currently the ratchet support is far from complete. 5 of 24 test cases failed before yesterday. I fixed one and found a serious bug on $pos resetting. I added two test cases for it so that it now fails 6 of 26 test cases.

fglock++ marked the failed tests as TODO and I found that I made a mistake on the meaning of "no backtracking," thanks to him and pmichaud++ correcting my understanding. I'll spend some time on the (?>) and section of backtracking of the perlre manual page.

Wednesday, May 10, 2006

As I mentionedyesterday, Makefile.PL and a basic test file were added. But there were still something missing. I fixed the t/basic.t so that the test really passes. And the last two tests are corrected by updating the compile_p6grammar.pl to the one I used which deal with backslashes correctly.

On the another hand, I originally planned to support statements more than a single "return." But I dropped the idea due to two reasons:

I was not familiar enough to rules to write the grammar.

Translating general statements to Haskell is not straight forward.

But I implemented an alternative: temporary variables.

Assignment is not provided so variables can be translated to a simple "let ... in ..." expression, pure lambda one. The old yada example can now be written as

Tuesday, May 09, 2006

It was really a long weekend and Monday. And a progressive Monday night and Tuesday. However, I forgot to write the blog entry last night. The numbered captures ($0, $1, ... ) are supported yesterday as well as escaped characters in string. But it didn't work until I commited the part of code dealing with character classes in Pugs::Emitter::Rule::Perl5 several minutes before. And the grammar now accept the array constructor "[1, 2, 3]".

clkao++ added the Makefile.PL and test suit making Pugs::Grammar::MiniPerl6 more like a module instead of the sank box of my own. I have to admit that I haven't written any perl module before and just studied the Makefile.PL in Pugs::Compiler::Rule.

The building process includes an invokation of compile_p6grammar.pl to translate the grammar file to the .pm file. I tried to use lrep instead but found character classes are not supported (just as what happened in Pugs::Compiler::Rule). But fglock++ stop me from switching to lrep and got offline right after that -- could you tell me why should I stay using compile_p6grammar.pl? :)

Currently all language constructions I planned to support in MiniPerl6 are implemented (except the "(a, b)" one but it's not necessary). I'll add more tests for it then move my concentration on Pugs::Compiler::Rule.

Updated: fglock++ left an IRC message to me "lrep is under a rewrite - compile_p6grammar.pl has less features, but it is more up-to-date". Ok, I'll use you script then :)

Friday, May 05, 2006

The last post was on midnight of Sunday, 4 days ago. There were three in class midterms in 37 hours from Tuesday morning to noon of Wednesday. They are "special topics on graph algorithms," PDE, and compiler. Yes, I'm taking a compiler course this semester, which is a required course for junior. The exam problem is quite cool: given a grammar of Java-like (or may it's really a snip of java spec) array-newing syntax, write a parser accepting it, 100 points. No other constrains on algorithm. And here comes the bonus: write another one, 50 points. It took me some time to convince myself that write a parser in Perl 6 Rule will not count.

Then I wrote the proposal of summer of code and submitted it in evening. There are some more detail thought about the new plan with not many new ideas. But the may-faced challenging part is one of them:

The parsed grammar saved in Perl 5 hash-and-array tree-like data structure is extremely complex and nearly unreadable. Since the correctness has not been tested much, debugging modules using it is a difficult job.

We don't know if the Perl 5 Rule parser is powerful and efficient enough to handle the full grammar. Currently translating a sample grammar file with five small rules takes one minute CPU time on a Xeon 2.8GHz machine (may be unprecise due to HTT).

We don't know if Parsec is powerful enough to handle all functionality provided by Perl 6 Rules. Hopefully, we should be able to use only those which can be translated easily in the first version of the full grammar of Perl 6. And since Parsec code was copied into Pugs' source tree for the ease of modifying, it's not that serious.

clkao asked my plan on that if the second one really happened.

clkao> scw: what are you going to do if perl5 based rule parser is too slow?scw> clkao: tune it :p* clkao grins

I love the idea.

And some real-work reports. I fixed the variable and function naming rule so that it doesn't consume too many things in the afternoon. That makes function application as a operand of summation works. Then a little rewrite of rules of function application with parameters makes it work, too.

The performance problem seems affect my development more and more. I now do changing only on the generated Perl 5 module and only when everything works I'll merge the change to the grammar file and run the translation script again to see if the result is still correct. The reason is that I found the rule I used in the afternoon took one minutes and twelve seconds to translate and the one I committed several minutes before took nearly two minutes. It seems time to hack into Pugs::Compiler::Rule.

On the another hand, fglock++ revised the benchmark between P::C::R, inlined matching functions and Perl 5 regular expression. The performance seem to be ten times better one by one from the last one (P5regex) to the first one (P::C::R). Maybe I'll then write my own translation script in P5regex (a *lot* to do then) or inlined P::C::R structure instead of current one which uses P::C::R.