2006.02.27

The backchannel #perl6, with Larry getting accustomed to the hivemind (and reportedly feeling the reality dissonance when we logged off for lunch), has seen many constructive design discussions, and the Specs are being updated on the fly.

It's very exciting, and I'd like to journal about the process and the production so far, but my keynote will start in 5 hours at 9am, and I haven't even finished half of the slides, so I'd better get back to JIT slide compilation now... Stay tuned for more updates later. :-)

2006.02.25

The entry I just wrote in the past 2.5 hours is lost again -- this time due to a disconnected wireless network -- so here are some very brief recaps of what we've accomplished:

Quasiquoting and Macros

Previously, Pugs only supported macros that return strings (aka source filters), because the macro-returning-AST form was not specified. We populated the Macro section of the Subroutine Spec (S06), and bsb implemented them in Pugs (note that the :COMPILING flag makes free variables in the quasiquote bind to the macro user's site):

This paves the way for desugaring require, use and other special forms into regular macros. Also I came up with a pretty cute way of interpolating an expression inside a quasiquote ("unquoting") -- just repeat the quote delimiters three times:

where the $ast is a string ("code snippet"), or another AST object constructed by quasiquotes. Also, because q:code is just sugar for q:code<perl6> (which is just sugar for q:code(:lang<perl6>)), it leaves room for q:code<perl5> or even q:code<TT2>...

Core Speedups

Test.pm is now precompiled to Test.pm.yml at the beginning of "make smoke" (and installed along with "make install"), thanks to gaal, lumi and bsb's work. On Larry's computer, the smoke cycle now takes 79 minutes, instead of ~100 minutes as before.

Nothingmuch has proposed a per-user cache system that will allow us to reuse parse trees for individual test files as well, and gaal thinks it can be implemented in Perl 6 itself, but it's not currently implemented.

The compile time is also much reduced, by splitting the two bottlenecks (Pugs.AST.Internals and Pugs.Parser) into further submodules and moving all DrIFTed instances into separate files. To automate this effort, anatoly, gaal and lumi worked on a call graph visualizer and refactoring splitter for Haskell programs -- implemented in Perl 5.

Result Objects from Matches

Matches can now return an arbitrary object as their "result" (think $& in Perl 5), instead of just a simple string, as specified in v11 of the Rules spec (S05). These are accessed by dereferencing the match object as a closure as $match() -- the default match object $/ has a result object called $(). This allows Parsec-style rules:
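For readers more at home outside Perl 6, the idea of a match carrying an arbitrary result object can be sketched with a few parser combinators in Python. This is only a rough analogue, not the S05 rules syntax and not Pugs code: Match, token, seq and Add are hypothetical names made up for the illustration.

```python
import re
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class Match:
    """A successful match: where it ended, plus an arbitrary result object."""
    end: int
    result: Any

    def __call__(self) -> Any:
        # Loosely mirrors dereferencing a match object as a closure.
        return self.result


Parser = Callable[[str, int], Optional[Match]]


def token(pattern: str, build: Callable[[str], Any] = lambda s: s) -> Parser:
    """Match a regex at a position; `build` turns the matched text into a result."""
    compiled = re.compile(pattern)

    def parse(text: str, pos: int) -> Optional[Match]:
        m = compiled.match(text, pos)
        return Match(m.end(), build(m.group())) if m else None

    return parse


def seq(build: Callable[..., Any], *parsers: Parser) -> Parser:
    """Run sub-parsers in order and build a result from their result objects."""

    def parse(text: str, pos: int) -> Optional[Match]:
        results = []
        for parser in parsers:
            m = parser(text, pos)
            if m is None:
                return None
            results.append(m())  # use the submatch's result, not its text
            pos = m.end
        return Match(pos, build(*results))

    return parse


@dataclass
class Add:
    """A tiny AST node (hypothetical) built directly by the parser."""
    left: int
    right: int


number = token(r"\d+", int)
plus = token(r"\s*\+\s*")
addition = seq(lambda left, _op, right: Add(left, right), number, plus, number)

match = addition("12 + 30", 0)
print(match())
```

The point of the sketch is that the outer parser never re-inspects matched text: it assembles its AST node from the result objects its submatches already produced.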

With rule interpolation (<{...}>) and lookaround (<(...)>), this allows a rule to match some subrules, and conditionally invoke more subrules based on the result objects of the earlier ones. This is critical for writing a Perl 6 parser in rules, as it allows a rule to construct AST objects, using result objects from its submatches as material. It allows us to port the existing Parsec subparsers in a direct style, rather than using separate tree-transformation rules that operate on a huge Match object.

Perl 6 on Perl 5

The pX subproject, initiated by putter and contributed to by tewk and fglock, is an attempt at building a self-hosting Perl 6 implementation using Perl 5 as a virtual machine. In addition to tewk's OpTable port, tonight fglock committed a working Perl 6 Rules Parser, which uses itself to parse the rules syntax.

The engine is interesting as it can be extended at runtime, and allows dynamic interpolation of rules, as well as calling back to Perl 5 code -- something that's quite difficult to do with Pugs's native PCRE bindings.

GetOpt

Ran and migo started porting Getopt::Std to Perl 6. They then worked with nothingmuch to parse even more styles of options -- long, mixed, stacked and whatnot -- and generalized it into Getopt::Process, which also serves as a nice showcase of the state of the art of Perl 6 syntax.

The eventual goal is to massage the command line arguments into argument lists, so you can use subroutine signature syntax as option specification, including plurality, types, validation, and it'd even work for positional parameters. Stay tuned!

Syntax Generalizations

Lots of corners in Perl 6 syntax that made Pugs's Parser slow and/or difficult to implement have been normalized in the Synopses. For example, braces at the end of a line can now terminate statements without using semicolons:

try { ... }
try { ... }

It also makes control structures and macro declarators very easy to parse, because this is now invalid:

sub f {...} sub g {...}

There needs to be a semicolon between the two named-sub expressions, though the end-of-line semicolon is still optional.
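The two termination rules above can be sketched as a toy statement splitter. This is only an illustration in Python, not the Pugs parser: split_statements is a hypothetical helper, and it deliberately ignores strings, comments and anything else a real lexer must handle.

```python
from typing import List


def split_statements(source: str) -> List[str]:
    """Split source into top-level statements.

    A statement ends at a top-level ';', or at a top-level closing brace
    that is the last non-blank thing on its line.
    """
    statements: List[str] = []
    current: List[str] = []
    depth = 0

    def flush(drop_last: bool = False) -> None:
        # Emit the collected characters as one statement (minus a trailing ';').
        text = "".join(current[:-1] if drop_last else current).strip()
        if text:
            statements.append(text)
        current.clear()

    for i, ch in enumerate(source):
        current.append(ch)
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            rest_of_line = source[i + 1:].split("\n", 1)[0]
            if depth == 0 and not rest_of_line.strip():
                flush()
        elif ch == ";" and depth == 0:
            flush(drop_last=True)
    flush()
    return statements


print(split_statements("try { ... }\ntry { ... }"))
print(split_statements("sub f {...} sub g {...}"))
print(split_statements("sub f {...}; sub g {...}"))
```

The first input splits into two statements because each closing brace ends its line. The second stays as one run-on statement, which is the form a parser following the new rule would reject; adding the semicolon, as in the third input, separates the two declarations again.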

Perhaps more importantly, thanks to Robrt's help, all these Synopses changes now get posted to the perl6-language mailing list, for all to review and discuss. Hopefully this will alleviate the "Perl 6 is changing under us and we can't see where it's going" problem, and encourage a more constructive feedback loop. Have fun! :-)

The P6Doc sub-project has started again, building on the hierarchy sketched out by
Skud and Andrew Savige a while ago. We are taking material from the design
documents, various quick references and other documents in the Pugs docs tree,
and presenting them in an accessible fashion.

Currently, they are divided into five categories: Overview, Tutorial, FAQ, Spec
and API. Moving quickrefs into Overview/Tutorial/FAQ is an ongoing task --
commits are most welcome. :-)

The Spec category contains both the normative Synopses, such as
Perl6::Spec::Operator (aka S03), and various drafts, such as
Perl6::Spec::Concurrency (previously known as S17draft). Non-binding drafts are
clearly marked as such -- a conforming Perl 6 implementation does not need to
implement them until the design team promotes them into normative specs.

I hope that using words (the Rules Spec and Perl6::Spec::Rules) instead of
numbers (the 5th Synopsis or Perl6::Bible::S05) will make lookup and
discussion easier, and free the partitioning of language specification from the
Camel book's chapters.

Also, I have removed references to the historical Apocalypse documents from the
Synopses text, to make them more self-contained. Consequently, if some parts of
the Apocalypses are missing from the specs, they need to be copied over, or be
considered non-normative.

While the Spec documents talk about the language (aka requirement
specification for Perl 6 implementors), the API documents spell out how the
core objects are exposed on the language level, and are thus aimed at people
writing Perl 6 programs.

For example, Perl6::API::Code and Perl6::API::Sigs will enumerate
the public methods of a &code object and its signatures, including lexical pad,
references to outer scopes, and so on. Similarly, Perl6::API::Object and
Perl6::API::Class would define the metaobject protocol; Perl6::API::Rule
and Perl6::API::Grammar would define the interfaces to the rules engine.

In other words, the API documents capture our current approach in building a
self-hosting Perl 6 implementation, and give Pugs backend authors a concrete
set of APIs, so the backends can implement them as primitives to replace the
portable-but-slower bootstrap code written in Perl 6.

It is important to stress that currently all Perl6::API documents within
the Pugs tree are considered non-binding drafts. There may very well be
other Perl 6 implementations with a better API, for example using Parrot-native
objects instead of Perl 6 objects to represent core types, and they will still
be conforming implementations. However, we still choose to call them (non-normative)
Perl6::API instead of Pugs::API, as the bootstrap code can run on any Perl 6
implementation conforming to the Synopses.

Currently, the P6Doc files are installed along with Pugs, into Perl 5's sitelib
path and formatted as manpages. We plan to release them separately on CPAN
and replace the current Perl6::Bible distribution, and work on the bundled
p6doc program to provide additional features, such as full-text search and
"explain this Perl 6 code snippet".

2006.02.23

Too many interesting subprojects going on simultaneously, plus I lost a half-formed journal due to accidental battery draining, so here are just some random sketches. (Also see my preliminary Hackathon-Plan for some starting points.)

11 months ago, I noted that getting Leo on IRC was the wisest course of action I had taken in Pugs development up to that day -- now I think it's appropriate to express that sentiment again, as Larry is now on #perl6 and #parrot as TimToady!

In other news, I've been ghost-writing for Larry, committing various Synopses changes that cleared up many of the warnocked ambiguities raised on p6l and #perl6, which were blocking proper code generation to Perl 5 and Parrot:

"my $x = sub foo { ... }" is now legal, because named declaration forms are now expressions, just like their anonymous counterparts. Therefore the dreaded a/an anonymizer is gone for good -- just use "my $x = my Int sub foo { ... }".

"my Dog $fido .= new" is fully explained -- $fido is bound to ::Dog in compile time as soon as it's declared. The ::Dog prototype object is both undefined and false (but has an .id and .isa(Dog)), and dispatches .new as any other Dog instance would.

The above works because we cleared up the semantics of return inside KEEP/UNDO, made require capable of importing into lexical scopes, plus the capability of doing quasiquoting via CODE.

"./pugs -O" can close the packages and disallow runtime introduction of new symbols, which affects GLOBAL as well. This allows early binding for almost everything, though the object code needs to be saved under a different extension, as the linker cannot handle linking of mixed optimised/unoptimised modules.

"self.foo" now has context-hinted shorthands: "$.foo", "@.foo", "%.foo" etc. This works because "accessors" are dispatched exactly like methods now, so they are made into methods.

Consequently, "@.foo(1,2,3)" means the same thing as "list self.foo(1,2,3)".

There's more -- much much more, as the sheer productivity of ten lambdamøøse is unstoppable (and nearly unbackloggable) -- but I'll split them into separate journal entries. :-)

2006.02.21

Despite sleeping 11+ hours each day, I did get plenty of design and discussion done with gaal, in particular a pdd20-inspired refactoring for lexical pads, which I'll write about in another entry. The recent pX spike project of implementing Perl 6 rules on Perl 5 -- and using them to parse Perl 6 programs -- is very much worth journaling about as well, but that'd take another entry too.

However, most of our pair-coding time was spent on tackling the most egregious showstopper to would-be Pugs hackers -- namely, that the "make; make test" cycle simply took too much time.

This issue was brought to #perl6's attention as part of chromatic's poignant rant, citing that Pugs took 8 hours to finish building and running all its tests. And because he is a devout TDD follower, he'd like to run all tests after making any change to the Pugs internals, which would take (gasp) 16 hours.

Most of his other points in the rant can be resolved directly:

Test::Builder did fail its tests for a while, but was repaired along with other OO modules as part of release engineering before 6.2.11. Adopting a regular release cycle will fix that.

Hooking up to Parrot as a runtime will no doubt bring more contributors (and get us faster-than-C performance), but hooking up to Perl 5 will obviously bring even more. Fortunately, we are doing both, plus JavaScript (and maybe CLR too, now that I'm going to YAPC::NA in Chicago and will probably visit LINQ folks en route.)

Because the Synopses are user-level requirements, Pugs would need its own PDD-like set of documentation that discusses the design of various compiler components. I'd like to resume the Pugs Apocrypha series of documents with nothingmuch et al during the Hackathon.

But the most pressing "Pugs is slow" issue demands a technical solution: the cycle takes 4 hours on my laptop -- 30 minutes to compile and 210 minutes to finish testing, which is simply too much, even if we take into account that we have 616 test files and 11070 test cases.

The current situation was mainly caused by Prelude.pm, a module with built-ins (such as printf) implemented in Perl 6 itself and loaded for each Pugs run. The problem is, compiling Prelude.pm takes 15 seconds here, and it will add another 2.5 hours to the test cycle if it has to be reloaded for each test file.
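A quick back-of-the-envelope check of these figures, as a small Python sketch. The numbers are the ones quoted above; the variable names are made up for the illustration.

```python
# Figures quoted in the entry.
TEST_FILES = 616
PRELUDE_COMPILE_SECONDS = 15
COMPILE_MINUTES = 30
TEST_MINUTES = 210

# Full build-and-test cycle, in hours.
cycle_hours = (COMPILE_MINUTES + TEST_MINUTES) / 60

# Cost of recompiling the Prelude once per test file, in hours.
prelude_overhead_hours = TEST_FILES * PRELUDE_COMPILE_SECONDS / 3600

print(f"full cycle:       {cycle_hours:.1f} hours")             # 4.0 hours
print(f"prelude overhead: {prelude_overhead_hours:.2f} hours")  # 2.57 hours
```

So 616 reloads at 15 seconds each come to roughly two and a half hours, which matches the estimate above.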

Many moons ago (July 2005), gaal hacked in support for precompiled Prelude, using the ./pugs -CPugs backend to turn Perl 6's parse tree into huge Haskell expressions, and rebuilding the Pugs executable with the Prelude statically linked. This shaved the startup time from 15 seconds to 0.5 seconds.

The tradeoff is that this makes compilation of Pugs.Run awfully slow (20+ minutes for optimised builds) and consumes a lot of RAM (curiously, even more so on unoptimised builds). One can turn precompilation off by tweaking config.yml to say precompile_prelude: false, but that will make tests unbearably slow to finish.

Gaal set forth to fix this problem once and for all, by using YAML as the cached intermediate format, much like Python's .pyc/.pyo bytecode files. We wrote a rule for DrIFT that can generate fromYAML and asYAML instance methods for all Haskell types, which provides roundtrip serialization to our Syck bindings.

The upshot is that the new ./pugs -CParse-YAML backend can turn Perl 6 into a YAML syntax tree, which can be loaded back during runtime using the Pugs::Internals::eval_p6y($file) primitive. Thanks to Syck's speedy parser, the startup time is now 0.7 seconds without any additional time penalty to compiling Pugs.Run, bringing the total compilation time down to 8 minutes (optimised build; unoptimised takes 4 minutes).
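The general shape of this trick, caching a parse tree in a serialised file and loading it back instead of reparsing, can be sketched in a few lines of Python. This is only an illustration under loud assumptions: JSON stands in for YAML because it ships with Python, parse is a trivial stand-in for an expensive parser, and load_tree and the ".json" cache suffix are hypothetical names, not anything Pugs uses.

```python
import json
import os
import tempfile
from typing import Tuple


def parse(source: str) -> dict:
    """Stand-in for an expensive parser; returns a serialisable tree."""
    lines = [line.strip() for line in source.splitlines() if line.strip()]
    return {"type": "program", "statements": lines}


def load_tree(path: str) -> Tuple[dict, bool]:
    """Return (tree, from_cache), reusing the cached tree when it is fresh."""
    cache_path = path + ".json"  # hypothetical cache naming scheme
    if (os.path.exists(cache_path)
            and os.path.getmtime(cache_path) >= os.path.getmtime(path)):
        with open(cache_path, encoding="utf-8") as fh:
            return json.load(fh), True

    # Cache missing or stale: parse the source and write a fresh cache.
    with open(path, encoding="utf-8") as fh:
        tree = parse(fh.read())
    with open(cache_path, "w", encoding="utf-8") as fh:
        json.dump(tree, fh)
    return tree, False


with tempfile.TemporaryDirectory() as workdir:
    source_path = os.path.join(workdir, "Prelude.pm")
    with open(source_path, "w", encoding="utf-8") as fh:
        fh.write("sub hello { ... }\nsub world { ... }\n")

    first_tree, first_cached = load_tree(source_path)    # parses, writes cache
    second_tree, second_cached = load_tree(source_path)  # served from cache

print(first_cached, second_cached)
```

The modification-time comparison is the simplest possible freshness check; a content hash would be sturdier when files are copied around or clocks disagree.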

This goes a long way toward solving the compilation time problem; moving the DrIFT instances away from e.g. Pugs.AST.Internals to another module will probably save another minute or two.

Tomorrow we will apply the same technique to Test.pm (as well as other .pm files). Seeing that each test file currently takes 5 seconds to load Test.pm, yamlizing it will likely save another hour off the test cycle. And if we start making use of cached .t.yml.gz files next to each .t program, the entire build-test cycle can probably be reduced to 30 minutes or less. That will be lovely indeed. :-)

After a 19-day pause, I'm finally journaling again. Sorry for the hiatus -- I was massively distracted by the sudden exposure in Taiwanese mainstream media about my runtime typecasting. Despite my never granting any interviews, they simply translated most of my private blog's content and put a sensational (albeit mostly positive) spin on it. Oh well.

Last night's dinner in Jerusalem with Larry and Israeli hackers was also quite enjoyable; much to my surprise, I discovered that Larry follows #perl6 irc logs very closely, which explains the recent phenomenon of Larry posting on p6l (e.g. the goto and macro specs) shortly after a topic has been discussed on #perl6.

I finally got my .de visa today, which means I'd be able to show up at GPW. Unfortunately I had to skip the first day of the conference, due to conflicting scheduling with OSDC.il.

The five-day Pugs Hackathon starts tomorrow at Mount Arbel, with a healthy mix of 10+ lambdafolks and camelfolks. I plan to give Perl6-on-Perl5 a serious push during this week, and carry the same technique over to Perl6-on-Parrot in early March with Leo, right after the GPW. More details later -- if we can get internet working at the hackathon site, that is. :-)

I'm very grateful to the lambdacamels for carrying Pugs forward in the
past three months, which has been a difficult period of transition for me.

Now that I'm back to reality under a new-yet-original identity, I look
forward to meeting fellow Pugs hackers again at the upcoming hackathons:
Israel, Germany, Austria, Japan, Taiwan, and undoubtedly many other
places.