September 2004 Archives

SOAP has great hype; portable, simple, efficient, flexible, and open, SOAP
has it all. According to many intelligent people, writing a web service with
SOAP should be a snap, and the results will speak for themselves. So they do,
although what they have to say isn't pretty.

Two years ago I added a SOAP interface to the Bricolage open source content
management system. I had high expectations. SOAP would give me a flexible and
efficient control system, one that would be easy to develop and simple to
debug. What's more, I'd be out on the leading edge of cool XML tech.

Unfortunately, the results haven't lived up to my hopes: the system I built is
fragile and a real resource hog. In this article I'll explore what went wrong
and why.

Last year, I led the development of a new content-management system called
Krang, and I cut SOAP out of the mix.
Instead, I created a custom XML file-format based on TAR. Performance is up,
development costs are down, and debugging is a breeze. I'll describe this
system in detail at the end of the article.

What is SOAP?

In case you've been out to lunch, SOAP (Simple Object Access Protocol) is a
relatively new RPC (Remote Procedure Call) system that works by exchanging XML
messages over a network connection, usually over HTTP. In an RPC system, a
server offers routines (procedures) that clients may call over a network
connection. SOAP surpasses its direct predecessor, XML-RPC, with an enhanced
type system and an improved error-handling system. Despite the name, SOAP is
neither particularly simple nor object-oriented.
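To make the verbosity concrete, a SOAP request travels as an XML envelope along
these lines. This is a hypothetical sketch: the envelope/body structure is
standard SOAP 1.1, but the method and element names are invented, not
Bricolage's actual vocabulary.

```xml
<soap:Envelope
    xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <!-- hypothetical RPC call: look up a story by ID -->
    <lookup xmlns="http://example.com/bric">
      <story_id>42</story_id>
    </lookup>
  </soap:Body>
</soap:Envelope>
```

All of that wrapping carries a single method call with one integer argument.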

Bricolage Gets SOAP

When I joined the Bricolage project, it lacked a good way to control the
application aside from the browser-based GUI. In particular, we needed a way to
import data and trigger publish runs. Bricolage is a network application, and
some useful tasks require interaction with multiple Bricolage servers. SOAP
seemed like an obvious choice. I read "Programming Web Services with Perl" and
I was ready to go.

I implemented the Bricolage SOAP interface as a set of classes that map
SOAP requests to method calls on the underlying objects, with some glue code to
handle XML serialization and deserialization. I used XML Schema to describe an
XML vocabulary for each object type, which we used to validate input and output
for the SOAP methods during testing.

By far the most important use-case for this new system was data import. Many
of our customers were already using content management systems (CMSs), and we
needed to move their data into Bricolage. A typical migration involved
processing a database dump from the client's old system and producing XML
files to load into Bricolage via SOAP requests.

The SOAP interface could also move content from one system to another, most
commonly when moving completed template changes into production. Finally, SOAP
helped to automate publish runs and other system maintenance tasks.

To provide a user interface to the SOAP system, I wrote a command-line
client called bric_soap. The bric_soap script is a
sort of Swiss Army knife for the Bricolage SOAP interface; it can call any
available method and pipe the results from command to command. For example, to
find and export all the story objects with the word foo in their
title:
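Something like the following; treat the flag names as illustrative, since the
real client's options may have differed:

```
# hypothetical invocation -- the actual bric_soap flags may differ
$ bric_soap story list_ids --search "title=%foo%" \
    | bric_soap story export -
```

The first command asks the server for matching story IDs; the second reads
those IDs from standard input and exports each one as XML.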

Later we wrote several single-purpose SOAP clients, including
bric_republish for republishing stories and
bric_dev_sync for moving templates and elements between
systems.

What Went Right

The well-documented XML format for Bricolage objects made developing data
import systems straightforward. Compared to previous projects that attempted
direct-to-SQL imports, the added layer of abstraction and validation was an
advantage.

The interface offered by the Bricolage SOAP classes is simpler and more
regular than the underlying Bricolage object APIs. This, coupled with the
versatile bric_soap client, allowed developers to easily script
complex automations.

What Went Wrong

SOAP is difficult to debug. The SOAP message format is verbose even by XML
standards, and decoding it by hand is a great way to waste an afternoon. As a
result, development took almost twice as long as anticipated.

The fact that all requests happened live over the network further hampered
debugging. Unless the user was careful to log debugging output to a file it was
difficult to determine what went wrong.

SOAP doesn't handle large amounts of data well. This became immediately
apparent as we tried to load a large data import in a single request.
Since SOAP requires the entire request to travel in one XML document, SOAP
implementations usually load the entire request into memory. This required us
to split large jobs into multiple requests, reducing performance and making it
impossible to run a complete import inside a transaction.

SOAP, like all network services, requires authentication to be safe against
remote attack. This means that each call to bric_soap required at
least two SOAP requests — one to log in and receive a cookie and the
second to call the requested method. Since the overhead of a SOAP request is
sizable, this further slowed things down. Later we added a way to save the
cookie between requests, which helped considerably.

Network problems affected operations that needed to access multiple
machines, such as the program responsible for moving templates and elements
— bric_dev_sync. Requests would frequently time out in the
middle, sometimes leaving the target system in an inconsistent state.

At the time, there was no good Perl solution for validating object XML
against an XML Schema at runtime. For testing purposes I hacked together a
wrapper around a command-line validator built on Xerces-C++. Although this is
not a deficiency in SOAP itself, the lack of runtime validation let bad data
pass through the SOAP interface and end up in the database, where we often had
to perform manual cleanup.

Round Two: Krang

When I started development on Krang, our new content management system, I
wanted to find a better way to meet our data import and automation needs. After
searching in vain for better SOAP techniques, I realized that the problems were
largely inherent in SOAP itself. SOAP is a network system tuned for small
messages, and it carries complexity that resists easy debugging.

On the other hand, when I considered the XML aspects of the Bricolage
system, I found little to dislike. XML is easy to understand and is sufficiently
flexible to represent all the data handled by the system. In particular, I
wanted to reuse my hard-won XML Schema writing skills, although I knew that I'd
need runtime validation.

In designing the new system I took a big step back from the leading edge. I
based the new system on the TAR archive file format, which dates back to the
mid-70s!

Figure 1.

I named the file format "Krang Data Set" (KDS). A KDS file is a TAR archive
containing a set of XML files. A special file, index.xml, contains
data about all the files contained in the KDS file, providing class names and
IDs. To reduce their size, it's possible to compress KDS files using
gzip.
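I don't have a real KDS file in front of me, so the listing below is only a
guess at the shape of such an archive — the file names and index attributes
are invented for illustration:

```
$ tar -tzf templates.kds
index.xml
template_1.xml
template_2.xml

$ tar -xzOf templates.kds index.xml
<index>
  <file name="template_1.xml" class="Krang::Template" id="1" />
  <file name="template_2.xml" class="Krang::Template" id="2" />
</index>
```

The point is that every tool for inspecting the format already exists: tar,
gzip, and a text editor.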

I wrote two scripts, krang_import and
krang_export, to read and write KDS files. Each object type has
its own XML Schema document describing its structure. Krang classes implement
their own deserialize_xml() and serialize_xml()
methods. For example, to export all templates into a file called
templates.kds:

$ krang_export --templates --output templates.kds

To import those templates, possibly on a different machine:

$ krang_import templates.kds

If the object being exported has any dependencies, the KDS file will include
them. In this way a KDS file generated by krang_export is
guaranteed to import successfully.

By using a disk-based system for importing and exporting data I cut the
network completely out of the picture. This alone accomplishes a major
reduction in complexity and a sizable performance increase. Recently we
completed a very large import into Krang comprising 12,000 stories and 160,000
images. This took around 4 hours to complete, which may seem like a long time
but it's a big improvement over the 28 hours the same import required using
SOAP and Bricolage!

For system automation such as running publish jobs from cron, I
decided to code utilities directly to Krang's Perl API. This means these tools
must run on the target machine, but in practice this is usually how people used
the Bricolage tools. When an operation must run across multiple machines,
perhaps when moving templates from beta to production, the administrator simply
uses scp to transfer the KDS files.

I also took the opportunity to write XML::Validator::Schema, a pure-Perl XML
Schema validator. It's far from complete, but it supports all the schema
constructs I needed for Krang. This allows Krang to perform runtime schema
validation on KDS files.

What Went Right

The new system is fast. Operating on KDS files on disk is many times faster
than SOAP network transfers.

Capacity is practically unlimited. Since KDS files separate objects into
individual XML files, Krang never has to load them all into memory at once.
This means that a KDS file containing 10,000 objects is just as easy to process
as one containing 10.

Debugging is much easier. When an import fails the user simply sends me the
KDS file and I can easily examine the XML files or attempt an import on my own
system. I don't have to wade through SOAP XML noise or try to replicate network
operations to reproduce a bug. Separating each object into a single XML file
made working on the data much easier because each file is small enough to load
into Emacs.

Because Krang's design accounted for the XML system from the start, the XML
handling is much more closely integrated with the overall system. This gives
it greater coverage and stability.

What Went Wrong

Operations across multiple machines require the user to manually transfer
KDS files across the network.

Users who have developed expertise in using the Bricolage SOAP clients must
learn a new technology.

Conclusion

SOAP isn't a bad technology, but it does have limits. My experience
developing a SOAP interface for Bricolage taught me some important lessons that
I've tried to apply to Krang. So far the experiment is a success, but Krang is
young and problems may take time to appear.

Does this mean you shouldn't use SOAP for your next project? Not
necessarily. It does mean that you should take a close look at your
requirements and consider whether an alternative implementation would help you
avoid some of the pitfalls I've described.

The best candidates for SOAP applications are lightweight network
applications without significant performance requirements. If your application
doesn't absolutely require network interaction, or if it will deal with large
amounts of data then you should avoid SOAP. Maybe you can use TAR instead!

The Perl 6 Summary for the week ending 2004-09-24

This is my last summary before I start my teaching practice. Hopefully I've
things set up so writing the summary isn't going to interfere with that, and
vice versa.

This week in perl6-compiler

State of Rules

Discussion of the state of the Perl 6 compiler (with particular reference to
the rules engine) continued. People wanted to make sure that the rules engine
that Luke and Patrick are working on would be flexible enough to cope with
different languages in closures.

Ed Peschko asked that there be some way of "turning the rules engine inside
out" to make something which, given a rule, would generate strings that could
match against it. Actually, this is the second time Ed asked for this, as Luke
reminded him. Luke went on to implement a generator in hypothetical perl 6,
which seemed to please everyone but Ed.

Meanwhile, in perl6-internals

Matt Diephouse fell afoul of problems with the compile and
compreg opcodes in his efforts to reimplement Parrot Forth in PIR.
Steve Fink made some suggestions for workarounds based on his work on writing a
regular expression compiler.

From further discussion, it seems that, if you're implementing a stack-based
language, you'd do well to manage the language's stack yourself rather than
using Parrot's User stack which is interestingly scoped.

Will Coleda had some problems with a class's __init function
not being called magically. Nobody else could reproduce the problem. After a
certain amount of confusion, Will did make realclean; perl Configure.pl;
make; make test and all was well again.

If you're experiencing a weird problem, it's probably best to do the rebuild
before posting to the list. Or you could fix the build system to have more
accurate dependencies...

Jeff Clites had some questions about how finalizers interact with the work
that Leo's done while implementing an incremental garbage collector for Parrot.
Leo had some thoughts, but noted that there's still a problem with ordered
finalization and destruction.

[Your summarizer is really starting to understand why old school GC types
really don't like finalizers...]

Dan asked for volunteers to finish the task of getting Python bytecode
working on Parrot; he reckoned that the work was mostly done, but that neither
he nor Leo have the time to go the last couple of yards.

Will Coleda posted a progress report on his Parrot TCL implementation which
is progressing gradually towards being a full blown TCL implementation; he's
working towards using special Tcl* PMCs with real TCL semantics instead of the
current scheme which uses Perl PMCs.

Namespaces, Part 1

Dan posted the first part of his Namespaces spec. There was, of course, much
discussion. Inevitably, there was another revision, and further discussion.

Leo posted an overview of the work he was doing on Parrot's internals to put
a faster calling scheme in place (as discussed endlessly). The usual
perl6-internals discussion and revision process swung into action.

Remember the French teacher that Leo mentioned? Well, the man himself,
Christian Aperghis-Tramoni, popped up on the list and pointed everyone at his
work so far, and asked for help in finding more information.

If anyone would like to translate Christian's work from French to
English...

Michele Dondi wondered if sigils could be "(sort of special) operators ...
thus allowing attributes/methods or even adverbs". Larry's response was
superbly deadpan.

The usual footer

Hmm... maybe I should try doing the perl6-language summary like that
every week; it's certainly quicker to write like that. Let me know what you
think.

If you find these summaries useful or enjoyable, please consider
contributing to the Perl Foundation to help support the development of Perl.
You might also like to send feedback or contributions to a getting Piers to OSCON 2005 fund.

This week in perl6-compiler

The current state of the compiler

Discussion of the current state of the nascent perl 6 compiler and how best
to contribute to its development even before code has been released continued.
The best way to contribute right now is "Write tests". Don't worry about what
the test harness should look like; simple tables of rules, test strings and
expected matches will be very welcome.

The status discussion also touched on how to handle different languages in
the closures embedded in rules.

Uri Guttman had some thoughts on bootstrapping Perl 6's grammar. He hoped
that his suggested approach would enable lots of people to work on the thing at
once without necessarily getting in each other's way. Adam Turoff pointed
everyone at a detailed description of how Squeak (a free Smalltalk) got
bootstrapped.

This week on perl6-internals

Matt Diephouse fell afoul of a problem with IO buffering when he was taking
a look at Parrot Forth, so he asked the list for help. Leo supplied the help,
so Matt supplied a patch to Parrot Forth which made it print its prompts
correctly when run under modern (CVS) Parrot.

Stéphane Payrard was bemused to discover that parrot routines
declared with the @LOAD pragma don't automatically execute if
they're in the main segment. He suggested that the issue be either fixed or
documented.

Leo documented it.

NCI basics

Charles Somebody tried to crash the monomonikered big leagues by failing to
vouchsafe his surname when he posted a question about getting
NCI to work with wxWindows. For reasons that escape me, the answers (and,
indeed, Charles's surname — Lowell) appeared in a different thread.

Who says Perl 6 is the only language that's taking a long time to appear on
Parrot? Amir Karger posted his first annual update on his attempt to get Parrot
to emulate the Z-machine. Hopefully subsequent updates will be more
frequent.

Meanwhile, in perl6-language

Michele Dondi wondered how to write pack-like functions in Perl
6, where the first argument is a string which specifies the signature of the
rest of the function call. The proposal stumped me, but maybe you all can make
something of it.

While peacefully reading Synopsis 5 (Rules & Grammars), Dave Whipp
noticed that grammatical inheritance wasn't as flexible as the Role based
compositions that can be used when working with classes. Larry wondered allowed
about having grammar roles, but I don't think they've been officially mandated
yet...

Michele Dondi continues to make my head hurt with zir proposals. In part
it's because I've still not worked out whether zie is male or female, and in
part because, well, zir proposals are challenging. In this particular proposal
zie wondered if there would be a way to magically write recursive anonymous
functions without having to introduce a new symbol of some sort.

Luke and Larry think there will be such a way, but the precise syntax hasn't
settled just yet.

Jonathan Scott Duff wasn't happy with the new range quantifier syntax in
Synopsis 5. He posted a bunch of questions that were nagging at him. Larry had
some good answers (if you're interested in contributing to the design of Perl 6
you should really read Larry's replies).

Announcements, Apologies, Acknowledgements

And so ends another summary. I hope you liked it. Sorry for the delay if
you're reading this on the mailing list; this teacher training malarkey is
remarkably tiring.


I am converting some articles from MS Word to HTML by hand. I often use
bulleted outlines so I face a lot of work creating lists with nested sub-lists.
It didn't take opening and closing many <ul> and
<li> tags to realize I wanted to automate the task.

It's very easy to use filters with my text editor and I've written several
in Perl to speed up my HTML formatting. I decided to create a filter to handle
this annoying chore. After a little thought I decided the logic involved was
just tricky enough to use a finite state machine (FSM) to keep things
straight.

Large programs always require some thought and planning, while most small
programs can be written off the top of your head. Once in a while I run into a
small problem with just enough complexity that I know "off the top" will
quickly become "not so quick, very dirty, and probably wrong." Finite state
machines can help to solve these knotty problems cleanly and correctly.

There have been several times I wanted to use an FSM but didn't because I
lacked a good starting skeleton. I would do it some other way or hand code an
FSM using a loop, nested if statements, and a state variable. This
ad hoc approach was tedious and unsatisfying.

I knew there had to be a Perl module or two implementing FSMs and decided it
was time to add this tool to my toolbox.

What Are Finite State Machines?

A finite state machine is a set of states, one of which is the starting
state. Each state has a set of transitions to other states, based on the next
input token. A transition has a condition and an action. If the condition is
met, the FSM performs the action and then enters the new state. This cycle
repeats until the machine reaches its end state.

Finite state machines are important in the theory of computation. A
Wikipedia article
on finite state machines goes into the computer science aspect. FSMs are
valuable because it is possible to identify precisely what problems they can
and cannot solve. To solve a new class of problems, add features to create
related machines. In turn, you can characterize problems by the kind of machine
needed to solve them. Eventually you work up to a Turing machine, the most
theoretically complete computing device. It is a finite state machine attached
to a tape with potentially infinite capacity.

The formal and highly abstract computer science approach to FSMs does not
hint at their usefulness for solving practical problems.

The Bubble Diagram

Figure 1 shows the parts of a finite state machine. The circles represent
states. A double circle indicates the start state. The arcs represent the
transitions from one state to another. Each arc label gives the transition's
condition; a slash separates the condition from its action.

Figure 1. Finite state machine elements.

I like FSMs because they are graphical. I can grab a pen and sketch the
solution to a problem. The bubble diagram helps me see the combinations of
states and input conditions in a logical and organized way.

Figure 2. Initial rough diagram.

Figure 2 is the initial diagram I drew for the HTML list problem. The
diagram turned out to have an omission — it does not handle blank
lines. Once I had discovered this bug, it was easy to make sure every state
considered blank lines.

Figure 3. The final diagram.

Figure 3 is a formal diagram of the final FSM, complete with blank-line
handling, prepared for this article. Normally, I would never be so elaborate
for such a small problem. You can see that it is an excellent way to
communicate a solution to others.

Looking at DFA::Simple

I have yet to look for a Perl module to do something and come up totally
empty-handed. That's absolutely my favorite thing about Perl.

I looked for existing FSM modules and found DFA::Command and
DFA::Simple.
DFA::Command looks good but was more than I needed. (By the way, the simplest
FSM is also a "deterministic finite automaton," thus the DFA in the module
names.)

The "simple" in DFA::Simple sounded promising, but the module's
documentation is poor. The sample program is difficult to follow and does not
process input from a file. Using the module in the real world is left as an
exercise for the reader: the docs do not mention handling errors and the
end-of-file condition.

While FSMs start as circles and arcs, they end up as tables of states and
transitions. My interest was in exactly how the packages created these tables.
Having clean, easy-to-look-at code is very important to me. I'm probably the
one who'll have to change it someday!

I was very unhappy that DFA::Simple defines its sample FSM in a
position-dependent way. The state being defined depends upon its textual
position in the source. It uses numeric constants for state IDs within the
tables, and you must synchronize them with any changes. Comments indicate
which state number you think you are working on, but they can easily fall out
of sync as you add, drop, and move states.

This was a big negative. I have run into enough bugs caused by
position-dependency that I refused to consider using this code. I almost
decided to write my own module, but first went back and took a last look at
DFA::Simple to see if there was any way I could make it work.

I realized that an FSM could use symbolic constants for state definitions. Array
references using these symbolic constants would provide the self-documenting
and maintainable approach I demanded. It was easy to develop a way to process
input (handling errors and EOF) that works in most cases. I put it together,
gave it a try, and it worked!

The Input and Output

The input comes from copying an MS Word document and pasting it
into my text editor. My bulleted outlines look like this:

o Major bullet points
  - each line is a point
  - a leading "o " is removed
o Minor bullet points
  - indicated by a leading "- "
  - may not start a list
  - only one level of minor points is supported
o Future enhancements
  - numbering of points
  - handling of paragraphs
o Start a new group of points
  - a blank line indicates end of old group
  - old group is closed, new one opened

This outline actually documents key features of the program.

A correctly nested HTML list must come entirely within the parent list item.
This is what made the filter tricky enough that I wanted to use a finite state
machine. When you open a top-level list element you have to wait for the next
input to see if you should close it or start a sub-list. When you look at an
input, you have to know if you have an open list element pending. The current
state and the input determine what to do next, exactly fitting the finite state
machine model.

Not being an HTML expert, I initially tried to insert my sub-list between
two higher-level list elements. The page rendered correctly but would not validate, revealing my error and starting
this project.
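The difference looks like this (a minimal illustration, not the program's
actual output):

```html
<!-- Invalid: the sub-list floats between two list items -->
<ul>
  <li>Major point</li>
  <ul>
    <li>minor point</li>
  </ul>
</ul>

<!-- Valid: the sub-list nests inside its parent item -->
<ul>
  <li>Major point
    <ul>
      <li>minor point</li>
    </ul>
  </li>
</ul>
```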

The Parts of the Machine

The first step in building the machine was deciding how to represent state
names symbolically rather than numerically. The natural choice is the
constant pragma.

I defined the machine as two arrays. The first, @actions, defines what to do
when entering and exiting a state. You might use it to initialize a data
structure on entry and save it on exit, or to allocate resources on entry and
de-allocate them on exit.

Each element of @actions is a two-element array. The first is
the enter action and the second the exit action. My program logs a message on
entry and leaves the exit action undefined.

Each element of the second array, @states, is an array of
transitions for a state. A transition has three elements: the new state; the
test condition; and the action. The condition is a subroutine returning true or
false, or undef, which always succeeds.

The machine tests the transitions of the current state in sequence until a
condition returns true or is undefined. Then it takes the action and enters the
new state. If the new state is different from the old one, the respective exit
and entry actions execute.
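DFA::Simple drives these tables for you, but the machinery is small enough to
sketch in plain Perl. This stand-alone toy — my own illustration, not the
article's program — uses the same shape: an @actions table of enter/exit
handlers and a @states table of [new_state, condition, action] transitions,
with an undef condition acting as a catch-all.

```perl
use strict;
use warnings;

use constant { OUTSIDE => 0, INSIDE => 1 };    # hypothetical states

my @out;    # collected trace, so the example is checkable

# Enter/exit actions, indexed by state: [ enter_sub, exit_sub ].
my @actions = (
    [ sub { push @out, 'enter OUTSIDE' }, undef ],
    [ sub { push @out, 'enter INSIDE'  }, sub { push @out, 'exit INSIDE' } ],
);

# Transitions, indexed by current state: [ new_state, condition, action ].
# An undef condition always matches, like a final else.
my @states = (
    [ [ INSIDE,  sub { $_[0] eq 'begin' }, sub { push @out, 'open'  } ],
      [ OUTSIDE, undef,                    sub { push @out, 'text'  } ] ],
    [ [ OUTSIDE, sub { $_[0] eq 'end'   }, sub { push @out, 'close' } ],
      [ INSIDE,  undef,                    sub { push @out, 'more'  } ] ],
);

my $state = OUTSIDE;

# The moral equivalent of DFA::Simple's Check_For_NextState():
# scan the current state's transitions, fire the first that matches.
sub step {
    my ($input) = @_;
    for my $t ( @{ $states[$state] } ) {
        my ( $new, $cond, $action ) = @$t;
        next if defined $cond && !$cond->($input);
        $action->($input) if defined $action;
        if ( $new != $state ) {
            $actions[$state][1]->() if defined $actions[$state][1]; # exit old
            $state = $new;
            $actions[$state][0]->() if defined $actions[$state][0]; # enter new
        }
        return;
    }
}

step($_) for qw(hello begin stuff end);
print join( ', ', @out ), "\n";
```

Running it traces the enter/exit calls around each state change, which is
exactly the behavior the two tables are meant to encode.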

As with the final else in a nested if statement,
consider having an undef at the end of the transitions. If nothing
else, log a "can't happen" error message to catch holes in your logic as well
as holes caused by future changes.

This uses the symbolic constants for state IDs in both tables. The states'
numerical values are irrelevant. I followed convention and numbered the states
consecutively from zero, but I tested the program using random state numbers.
It worked fine.

This is much better than the definition of the sample FSM in DFA::Simple.
Now adding, deleting, and shuffling states will not introduce
position-dependent bugs.

Final Assembly

This creates a new DFA::Simple object using references to the
@actions and @states arrays. The State
method retrieves or sets the machine's state, invoking entry and exit routines
as needed. Here, the call sets the initial state.

Putting the Machine in Motion

The major goal of this project was to build a skeleton for creating FSMs
quickly, with an emphasis on filters. I wanted a way of handling file input
and processing errors that would work for future projects.

This loop accomplishes that:

while (<>) {
    chomp;

    # log input line for debugging
    print LOG "Input <$_>\n";

    # trim input and get rid of leading "o "
    s/^\s+//;
    s/\s+$//;
    s/^o\s+//;

    # process this line of input
    $fsm->Check_For_NextState();

    # get out if an error occurred
    last if $error;
}

The while (<>) loop is standard for filters. The
application cleans up input as needed. In this case, it chomps each line and
trims and strips it of some text bullet characters.

The log message is for initial debugging. This, along with the new state
message, gives a clear picture of what is happening.

The most important statement is $fsm->Check_For_NextState();,
which does the major work — scanning the current state's transition table
to see what to do next.

The FSM must be able to exit early if it spots an error. This requires the
use of a global $error switch. It starts out as false. If
$error is true, the program ends and returns the value of
$error.

End-of-file handling also uses a global switch, $done. It is
initially false and changes to true when the program exhausts standard input.
One last cycle runs to give the machine a chance to clean up. Transition
conditions checking for EOF can simply return $done.

The program does not support early normal termination. You can enter a state
with no transitions and the program will read (but ignore) any remaining input.
However, there is no way to avoid reading all of the input without setting
$error. You can implement early exit by adding a
$exit switch and handling it similarly to $error.

Appropriately for filters, this program is line oriented. Many FSMs read
input a character at a time: lexical scanners and regular expression parsers, for
example. This program could easily change to handle character input.

The Action Routines

Now that the machine is purring along, it needs to do something. This is
the purpose of the actions. Those in my program are typical of a finite state
machine. They invoke small subroutines that form a mini-language for building
lists with sub-lists. The subroutines are very simple, but running them in the
right order does the job.

Thus, this approach breaks a complex problem into smaller pieces, each of
which is simply solvable. This is what programming is all about.

Conclusion

The next time you have a knotty little problem like this one, try sketching
a solution using a bubble diagram and implementing it with DFA::Simple. You'll
have added a great new tool to your toolbox.

This week on perl6-compiler

Yes you read that right; development of the Perl 6 compiler now has its own
mailing list. Hopefully, in the coming weeks, the current perl6-internals list
will get renamed parrot-internals to reflect that split.

As I write this, groups.google.com hasn't picked up the new list, but I'm
sure it will do eventually, so I'm going to continue to use Google URLs here
on the theory that they'll work soon. If you're desperate to read the whole
threads before then, start at perl6-compiler on
Perl.org — there's all of 35 messages there as I type this...

In a thread that migrated over from perl6-language Gregory Keeney continued
discussion of whether Perl 6 will/should have an officially blessed bundling
system, or whether it should simply have good enough hooks that someone could
write one. There was some discussion about whether perl6-compiler is actually
the correct list for the discussion...

Herbert Snorrason wondered what state the Perl 6 compiler was in. (I think
we all did to be honest, but Herbert got to ask the question.)

Patrick said that he and Luke (I think) are at the beginning stages of
building a basic perl 6 grammar engine that compiles to parrot and handles some
basic optimizations. In tandem with that, Patrick was also working on writing a
Perl 6 grammar. The plan is (and no plan survives contact with the enemy) to
get a working basic grammar engine up in the next month or so then use that to
build the Perl 6 parser.

As soon as there's code, announcements will be made in all the right places
(and who knows? maybe even on Slashdot).

If you want to help with the process of writing the grammar engine, your
best bet (at least until there's some running code) is to snag the appropriate
apocalypse and use that to write tests and post 'em to the list.

Meanwhile, in perl6-internals

Last week Leo had discussed unhooked ops without definite opcode numbers and
asked for comments on what to do with them. Apparently Dan finds many of the
boolean functions useful, so they're going to be kept, probably hidden behind a
namespace, once we know how namespaces work.

Right now, Parrot's configuration system palms off a lot of the work to the
local Perl installation by querying the perl config. Which isn't ideal in the
long run. So Dan suggested that now's the time to make a list of what info
configure.pl grabs from the perl installation and instead to write (appropriate)
probing code ourselves so we can do perl-free configuration. The usual
"autoconf!" "No! Metaconfig!" "No! Something else entirely!" discussion
occurred. (For the record, Dan seems to favour the "something else entirely"
option).

The discussion of required semantics to implement a regular expression
engine continued. Chip Salzenberg chipped in with some experience from the
Topaz project (a brave project to reimplement Perl 5 in C++ that sadly failed,
whilst teaching Chip a great deal). I confess that the thought of an
open_up_and_say_aaah method on Perl Scalars delighted your
summarizer.

(Am I the only person who wants to repeat Namespaces! with the same
intonation used for "Monorail!" in the Simpsons?)

As Dan pointed out, Parrot needs namespaces, more than that, it needs them
to be designed. So, like the excellent designer he is, he laid out his plan for
namespaces and invited comments. And comments there were. We can probably
expect a PDD soon. Ish.

Dan restated his position on using Autoconf in the parrot build system. It's
really rather simple:

We're not using autoconf. Period.

In the discussion, Herbert Snorrason suggested we institute a "Rule One" for
Dan. In other words, Dan is always right. Unless he changes his mind ("Rule
Two"). Or is overridden by Larry ("The One True Rule One").

It works for me.

Constant strings

Dan posted a discussion of const_string, a Parrot function for
making use of string constants from C. It's very useful, but it doesn't seem to
be as useful as Dan would like it to be, so he's extended the API.

Personally, I'd like Symbols as first class data structures in Parrot, but
not enough that I'm likely to implement the darned things.

Self-described newbie Richard Jolly asked for clarification on what mixing
languages will look like. Dan pointed out that it hadn't actually been
specified yet. Besides, the internals list didn't do syntax.

He then outlined what he thought the syntax would look like. He doubts that
people will end up mixing languages in single source files though. Further
discussion pretty much agreed with Dan, generally arguing that most uses of
multiple languages would come from using other languages' libraries.

Jeff "mod_parrot" Horwitz revived a thread from a year ago. He
wanted to be able to use the running process image (say httpd) and grab its
various API functions using dlfunc. Sadly, although this works for
functions that Apache uses from shared libraries, it doesn't work for those
functions that are statically linked into the binary. He wondered if this was a
bug, or if he was going to have to stick to using the workaround he'd come up
with.

It turns out that it isn't a bug, but it can be done by passing a NULL as
the filename from which you're dynamically loading a function. Thanks Leo.

Meanwhile, in perl6-language

Synopsis 9 covers Perl's data types. And, from such a small acorn grows one
mighty oak of a thread. Some of the discussion was even about Perl's data types.
There was also a snide remark about summaries and the correct pronunciation of
"Z" — in revenge for which I plan to make sure that David Green is never
mentioned in a Perl 6 Summary.

Last week, Larry pointed out that even existing keywords could be overridden
with macros. Piers Cawley pointed out that they'd be no fun if they didn't.
Larry added the caveat that macros couldn't work with precompiled library code
(which is a shame, but understandable). This thread developed into the
(occasionally intemperate) discussion of Bundles that later migrated to
perl6-compiler.

David Wheeler wondered if the (Smalltalk by way of) Rubyish style of
eschewing for in favour of appropriate iterators and code blocks
would eventually become good Perl 6 style. (I expect it will in my code). Larry
didn't think it would, but pointed out that the syntax had been modified
recently to make it relatively easy to work that way if you wanted to. Instead
of writing, say:

@foo.each({...})

we'll be able to write

@foo.each:{...}

Announcements, Apologies, Acknowledgements

If you find these summaries useful or enjoyable, please consider
contributing to the Perl Foundation to help support the development of Perl.
You might also like to send feedback or contributions to a getting Piers to OSCON 2005 fund.

The expression "Embedded Database" requires an explanation. A
"database" is an application that allows the targeted retrieval
of stored data -- a log-file is not a database. By
"embedded" I mean a database that does not run in a separate
process, but instead is directly linked ("embedded") into the
application requiring access to the stored data. This is in
contrast to more conventional database management systems (such
as Postgres or Oracle), which run as a separate process, and to
which the application connects using some form of Inter Process
Communication (such as TCP/IP sockets, for instance).

The prime advantage of embedded database systems lies in their
availability and ease of administration. Since the data is kept
in ordinary files in the user's space, there is no need to
obtain special permissions to connect to the database process
or to obtain a database account. Furthermore, since embedded
databases require nothing more than a normal library, they can
be useful in constrained environments (think shared web hosting),
where no "proper" database is available. They can even be linked
to an application and shipped with it.

Sequential Access: Tie::File

The simplest database is still a flat file of records, separated
by a record separator. If the separator is chosen to be the newline character "\n", as it usually is, then a record
corresponds to a line.

Flat files allow only sequential access. This can be both a blessing
and a curse: if the application primarily requires the lookup of
individual records in arbitrary order, then flat files become
impractical once the number of records becomes too large. On the
other hand, if the typical access pattern is sequential for at least a
substantial subset of records, then a flat file can be very practical.

Here is a typical (if a little contrived) example: Assume that
we have several thousand very long strings (say, each string
having more than one million characters) and we want to
determine whether any two strings are equal. To do so, we have
to compare each string to all remaining ones, using two nested
loops. However, the total amount of data is too large to be all
kept in memory. On the other hand, we access all the
records in a predictable, sequential order.

The Perl module Tie::File allows us to do so in a
particularly convenient fashion: it ties a Perl
array to the file in such a way that each record corresponds to
an element of the array.

Here then is a code snippet that demonstrates the use of
Tie::File to our string comparison problem
introduced above:
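Something like the following will do. (A minimal sketch: the filename strings.dat is an illustrative assumption, and the strings are assumed to be stored one per line.)

```perl
use strict;
use warnings;
use Tie::File;

# Tie the array @strings to the data file: each line of the file
# becomes one element of the array, fetched from disk on demand.
tie my @strings, 'Tie::File', 'strings.dat'
    or die "Cannot tie strings.dat: $!";

# Compare each record against every later one. At no point does
# Tie::File load the entire file into memory.
for my $i ( 0 .. $#strings - 1 ) {
    for my $j ( $i + 1 .. $#strings ) {
        print "Records $i and $j are equal\n"
            if $strings[$i] eq $strings[$j];
    }
}

untie @strings;
```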

There is no question that this program requires heavy disk
access, and if we had additional information about the data in
question, we should try hard to use it and avoid the brute force
approach of comparing "everything to everything". However, if
we have no choice, then Tie::File makes code like
the above as efficient as can be: It does not attempt to load
the entire file into memory, so that arbitrarily large files can
be processed. The first time we iterate over the file,
Tie::File builds up an index containing the offset
of each record from the beginning of the file, so that
subsequent accesses can go directly to the proper position in
the file using seek(). Finally, accessed records
are cached in memory, and the cache size can be adjusted by passing
the memory option to the tie
command.

However, any changes to the array (and its underlying file), in
particular insertions and deletions in the middle, will always
be slow, since all the subsequent records will have to be moved.
If your application requires such capabilities, don't use
Tie::File, use a Berkeley DB!

Fast Lookup: Berkeley DB

A Berkeley DB is almost the exact opposite of a flat file in regards
to the access patterns it supports. Scanning through a large number
of successive records is difficult if not entirely impossible, but
reading, inserting, updating, and deleting random records is very,
very fast!

Conceptually, a Berkeley DB is a hash, saved to a file. Each record
is found by its key, and the value of the record is treated as a
string. It is up to the application to interpret this string - for
instance, to break it up into parts, representing different fields.

The following code opens a Berkeley DB (creating a new file, if
necessary) and writes a single record, containing information
suitable for the /etc/passwd file. We then look up
the same record, using the login name as key, and split
the data into its constituent fields.
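A sketch of such a program (the login name, field values, and the filename passwd.db are made-up sample data):

```perl
use strict;
use warnings;
use BerkeleyDB;

# Open the database file, creating it if it does not yet exist.
my $db = BerkeleyDB::Hash->new(
    -Filename => 'passwd.db',
    -Flags    => DB_CREATE,
) or die "Cannot open passwd.db: $BerkeleyDB::Error";

# Write a single record: the login name is the key, and the remaining
# /etc/passwd-style fields are joined into one string value.
my $login = 'jdoe';
my @info  = ( 'x', 1001, 1001, 'John Doe', '/home/jdoe', '/bin/bash' );
$db->db_put( $login, join( ':', @info ) );

# Look the record up again by its key and split the value back
# into its constituent fields.
my $value;
$db->db_get( $login, $value );
my ( $passwd, $uid, $gid, $gecos, $home, $shell ) = split /:/, $value;
print "$login lives in $home and logs in with $shell\n";
```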

The code above uses the Perl interface to the Berkeley DB
modelled after its C API. The BerkeleyDB module
also supports a tie interface, tying the database
file to a Perl hash, making code like the following possible:
$hash{ $login } = join( ':', @info );.
The API-style interface, on the other hand, follows the native
interface of the Berkeley DB quite closely, which means that
most of the (quite extensive) original C API documentation also
applies when programming in Perl - which is good, since the perldoc
for BerkeleyDB is a bit sketchy in places. Try
man db_intro to get started on the native Berkeley
DB interface, or visit the Berkeley DB's homepage.

Complex data structures can be stored in a Berkeley DB provided
they have been serialized properly. If a simple solution using
join and split as demonstrated above
is not sufficient, we can use modules such as Data::Dumper
or (better) Storable to obtain a flat string
representation suitable for storage in the Berkeley DB. There is
also the MLDBM ("Multi-Level DBM") module, which
provides a convenient wrapper interface, bundling serialization
and storage.

For suitable problems, solutions based on Berkeley DBs can work
very well. However, there are a couple of problems. One is that
the Berkeley DB must be available - it has to be installed
separately (usually as /usr/lib/libdb.so).
The bigger limitation is that although there are quite a few
problems out there for which a single-key lookup is sufficient,
for many applications such a simple access pattern will not work.
This is when we require a relational database -- such as SQLite.

Complex Queries: SQLite

There are situations when neither of the aforementioned
straightforward access patterns applies. Berkeley DBs are great
if we know the key for the record - but what happens if we only
know part of the key, or we are interested in a set of records,
all matching some pattern or logical expression? Alas, the
Berkeley DB does not allow wildcard searches. Another frequently
arising problem occurs when each record has several fields or
attributes and we want to be able to look up records by either
one of them (such as being able to search a list of books by
author as well as title). This can be achieved in a Berkeley DB
using multiple lookups, but one can't help thinking that there
has to be a better way! Finally, the simple key/value
mapping of the Berkeley DB occasionally forces duplication of
data, since all the relevant information has to be present in
each record - this both wastes storage space and creates
maintenance problems.

Relational databases address all these points. They permit wildcard
and logical expression searches, multiple searchable columns, and joins
to represent relationships. Unfortunately, search capabilities such
as these used to be restricted to big, standalone relational database
engines (such as Postgres or Oracle), often making them unavailable for
the quick-and-dirty Perl project.

This is where the SQLite project comes in: Founded in 2000, SQLite
tries to combine the convenience of the Berkeley DB architecture with
the power of a relational (SQL) database. Its latest release is version
2.8.14, with the next major revision, version 3.0, currently in beta.
SQLite implements most of standard SQL (with some exceptions, such
as correlated subqueries).

However, with added power comes additional complexity: a relational
database no longer ties neatly to one of Perl's
built-in data structures. Instead, one has to use the
DBI (DataBase Interface) module and code the queries
in SQL. Furthermore, since the structure of the data is no
longer predetermined, the programmer has to define and construct
the required tables explicitly.

This is not the place for a full discussion of the Perl-DBI, but
the example below is enough to get started. First, SQLite has to
be installed on the system. Conveniently, the DBI-driver for
SQLite, DBD::SQLite, already contains the database
engine itself as part of the module - so installing this module
is all that is required to be able to use SQLite from Perl.
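The following sketch shows the whole round trip; the table layout and the sample rows are illustrative assumptions:

```perl
use strict;
use warnings;
use DBI;

# Start from a clean file for this demonstration.
unlink 'data.dbl';

# Connect to the SQLite database in the file "data.dbl"; the file is
# created automatically if it does not exist.
my $dbh = DBI->connect( 'dbi:SQLite:dbname=data.dbl', '', '',
                        { RaiseError => 1 } );

# Create and populate two tables. SQLite is mostly typeless, so the
# column definitions need no data types.
$dbh->do( 'CREATE TABLE authors ( id, name )' );
$dbh->do( 'CREATE TABLE books ( title, author_id )' );

$dbh->do( q{INSERT INTO authors VALUES ( 1, 'Larry Wall' )} );
$dbh->do( q{INSERT INTO books VALUES ( 'Programming Perl', 1 )} );
$dbh->do( q{INSERT INTO books VALUES ( 'Perl Cookbook', 2 )} );

# Join the two tables, restricting the titles with an SQL wildcard.
my $results = $dbh->selectall_arrayref( q{
    SELECT authors.name, books.title
    FROM   books, authors
    WHERE  books.author_id = authors.id
    AND    books.title LIKE '%Programming%'
} );

# $results is a reference to an array; each element is itself an
# array holding the column values.
for my $row ( @$results ) {
    my ( $name, $title ) = @$row;
    print "$name wrote $title\n";
}

$dbh->disconnect;
```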

We begin by connecting to the SQLite database
contained in the file "data.dbl". Note that the
connect method loads the appropriate database
driver - all we need to do is specify use DBI;. We
create and populate two tables, and then run a query joining the
two tables and using an SQL wildcard. The results are returned
as a reference to an array. Each array element is an array
itself, containing the column values in its elements.

The books table references the authors
table via a foreign key. This is a one-to-many relationship: each
book has only a single author, but an author can have written
multiple books. What would we have done if books could have been
co-authored by multiple writers? We would have used a cross-reference
table to represent the resulting many-to-many relationship. Try
that with Berkeley DBs!

SQLite shares with the Berkeley DB the approach of being mostly
typeless and treating data as simple strings. In our table
definitions above, we did not specify the data types of any of
the columns, as would be required by Oracle or Postgres. SQLite
provides a bit more structure, in that it provides separate
columns, so that distinct values do not have to be glued
together to form a string, as was required when using a Berkeley
DB. Furthermore, SQLite will try to convert strings to
floating point numbers if numerical comparisons or operations
(such as SUM or AVG) are performed on
them.

Since SQLite is a relational database, the data architecture
(the "schema") can be arbitrarily complex, with data distributed
across multiple tables and with foreign keys to represent
relationships. This is a big topic -- any book on the relational
model will provide plenty of information, including issues such
as database normalization and tools such as Entity/Relationship-Models.

Summary

In this paper, we considered three ways to add database capabilities to
a Perl program, without requiring major external database installations
or administrative support. Depending on the typical access patterns,
flat files or Berkeley DBs may be suitable choices. For more complex
queries, the SQLite project provides all the power of relational
database architecture in a single Perl DBD module.

Library loading

Pathological register allocation scenarios

Gregor N Purdy had asked Dan if his work compiler could be made to spit out
structurally equivalent C code to the Parrot code that was breaking IMCC. His
idea being that we could then see how C compilers dealt with such nastiness.
Dan thought that, whilst this was a good idea, it would be too much work to
implement. Gregor wasn't so sure.

PMC Instantiation

Last bits of the basic math semantics

If you believe Barbie, "Math is hard". She's right, up to a point. The
list's spent a couple of weeks now sorting out the design of Parrot's underlying
mathematical and numeric systems to make sure that maths works right (for
sufficiently useful values of "right"). This particular line of discussion
covers rotations and stuff, where you're actually treating a number as a bit
field.

Cross-compiling parrot

And you thought compiling Parrot on a Win32 box was hard. Robert Schwebel
wants to cross compile Parrot and isn't having a good time. Dan wasn't
surprised because the Parrot build process still gets most of its information
from the local perl installation which will generally be wrong when you're
cross compiling.

Dan noted that part of the problem is that we don't have people on the team
with a need or the experience of doing cross compilation and added that he'd be
thrilled if this were to change. Any patches to make things better for cross
compilers will, of course, be gratefully received.

Proposal for a new PMC layout and more

Leo's concerned that the current PMC layout isn't the Right Thing, and laid
out a proposal describing some changes he thinks would be worthwhile. In
essence, he argues for removing the requirement for fixed sized PMC headers and
separate variable sized buffers in favour of unifying buffers and PMCs so that
PMCs become variable sized, thus eliminating some time consuming indirection,
and space consuming overheads.

Nicholas Clark thought the proposal was interesting, but said that, since
the proposed changes would be invisible to the user, he'd be far happier with a
functionally complete implementation of parrot with stable, useful APIs.

Dan rejected the proposal (both for technical reasons and because he agreed
with Nicholas). I don't think Leo was convinced by the technical reasons, but
the "Let's get the interfaces finished!" argument clinched it.

Semantics for regexes

Dan appears to have opened an entertaining can of worms when he outlined his
view of the minimum string semantics required to support a regular expression
engine and asked for comments. Boy did he get them. And boy did they run off in
all sorts of strange directions. Interesting directions mind. Anyway, further
down the thread, Dan, Chip Salzenberg and Patrick Michaud seemed to reach
something approximating agreement about the low level semantics required.

TODOs and calls for volunteers

Leo came up with a list of things that need fixing/implementing and asked
for volunteers. These include sorting out what happens with the experimental
ops, implementing new_extended for every PMC class and finishing
up BigInt's MMD and vtable functions.

He also had some proposals for how we should get the Integer classes
properly implemented now we know what the semantics will be.

Meanwhile, in perl6-language

Roles trying to be nice

Abhijit Mahabal had some questions about making roles work. Luke, Patrick,
and Jonathan Scott Duff set about answering them. I'm not entirely sure that
any of the answers so far are enough for Abhijit, but then I'm not entirely
sure that any answer could be enough. At some point you have to take things on
trust and hope that nothing breaks at runtime.

Pipeline performance

Rod Adams brought up some issues with the performance of "pipelining" arrays
in Perl 5 — in general doing say grep {...} map {...} @ary
is rather slower than writing an explicit loop. He wondered if Perl 6 would be
faster. Larry's answer that all lists function lazily if they can in Perl 6
seems to imply that yes, Perl 6 will be faster.

Synopsis 9 draft 1

"Synopsis 9?" I hear you ask "But we haven't seen Apocalypse 9 yet!".
Indeed we haven't, but that's not stopped Larry writing it. Synopsis 9 gives an
overview of Perl 6's data structures (hopefully enough for anyone who happens
to be starting work on a rough cut of a Perl 6 implementation) which will be
covered in more detail when the Apocalypse itself comes out.

The usual storm of discussion and general proofreading goodness went on.

The range operator

Joe Gottman wondered if there would be a sufficiently lazy way to generate
ranges. In particular, he wanted to know if he'd be able to write reverse
(1..5) and have Perl handle that lazily, or if he could do 5 .. 1
:by(-1). Larry thought that, if range objects were made sufficiently
smart, there would be no reason why the reverse approach couldn't
be lazy.

Can PRE and POST be removed from program flow?

John Siracusa wondered if it would be possible to turn off the PRE and POST
Design By Contract hooks in production code to improve performance. (Leaving
aside arguments about whether this is sensible; personally I reckon that a
production environment is where you should be most worried about values that
might fail the PRE/POST hooks). Larry reckoned it would be possible to simply
clobber the global definitions of PRE and POST to make them no-ops. This wasn't
enough for John, who wanted to be able to get rid of them entirely so even the
call to the no-op didn't happen. So Damian showed him the obvious macros...

Announcements, Apologies, Acknowledgements

It looks like the Google groups gateway is working again, so I'll keep with
the Google style linking.

If you find these summaries useful or enjoyable, please consider
contributing to the Perl Foundation to help support the development of Perl.
You might also like to send feedback or contributions to a getting Piers to OSCON 2005 fund.

Simon Cozens

Serendipity -- it means those occasions when things come together to give you
moments of inspiration. While preparing perl.com one week, I was editing an
article on how to give lightning talks by Mark Fowler and at the same time
I was dealing with another author who said he was having difficulty stretching
out an article -- a very good article, on a topic I wanted to see covered -- to
a full 2,500-words-or-so length.

I then realized there were probably a load of people out there with interesting
things to say about what they're doing with Perl, but who couldn't or
didn't want to write a full-sized article. This is, after all, the principle
that makes lightning talks so popular. Maybe we could create a forum where people
could have short, informal articles published on an interesting Perl topic of
their choice -- lightning articles.

In the same vein as lightning talks, they can be both about an interesting
use of the technology, or social commentary on Perl and its community, or just
a bit of fun. If you've got something you want to get off your chest, or you've
got an interesting new idea you want to talk about, but don't think you
could fill 2,500 words, try writing a lightning article. You have an absolute
maximum of 500 words -- as measured by wc -w (or perl -0lne
'print scalar split/\s+/') on your POD or plain text file -- to
say what you need to say.

Send them to chromatic@oreilly.com, and when we've got a good
batch of five or six together, we'll publish them here.

Mark Fowler

How many times have you shipped something, and then as soon as it's gone
live you've spotted a glaring mistake that, despite staring you in the face the entire time you were developing the code, you've somehow overlooked?

One thing we have problems with at work is double-encoded HTML entities in
our web pages. We often use HTML entities to encode letters that aren't ASCII, since this way we then don't have to worry about the text encoding we're
using.

For example, we want to render the string Hello Léon
into our HTML document. So, instead of including the é into our document
directly we transform it into its entity form, replacing it with &eacute;:

<p>Hello L&eacute;on</p>

This can be done automatically by Perl in numerous ways, e.g. with HTML::Entities:
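A minimal sketch of the HTML::Entities route:

```perl
use strict;
use warnings;
use HTML::Entities;

# encode_entities replaces characters unsafe for HTML with their
# entity form, so the document's character encoding no longer matters.
my $name = "L\x{e9}on";    # "Léon"
printf "<p>Hello %s</p>\n", encode_entities($name);
```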

Or if you're using the Template Toolkit through the html_entity
filter:

<p>Hello [% name | html_entity %]</p>

The root of the troubles we were experiencing was that entity encoding could
occur in our code in multiple, different places depending on where the string
we were about to render came from. And if we accidentally did it more than once
then we ended up with HTML that looked like this:

<p>Hello L&amp;eacute;on</p>

This means the web browser sees the &amp; and converts it
to an ampersand and then renders the rest of the eacute; as normal
text:

Hello L&eacute;on

Not exactly what we wanted. Of course, these things are fairly trivial to fix
once you spot them. The real problem we were having is that these errors kept
repeatedly popping up, and having our testing department coming back to us every
release with another one of these errors was getting embarrassing. We'd
gone blind to the errors -- working so closely with the web site we'd glance
at the page and not notice what should have been staring us in the face.

So we decided to automatically test for the problem.

In the end I decided to write Test::DoubleEncodedEntities, a Test::Builder
module that would test for these errors and run under Test::Harness like
all our other tests. The ok_dee function relies on the fact that none of our web sites would ever use strings like &amp;eacute;
(this is true of most web sites - the only web sites that do are ones like this
that feature articles on how to encode HTML).

Ricardo Signes

Everyone has seen a problem too boring to solve. Rather than keep a stiff upper
lip and trudge forward, you head to the CPAN and find the pre-packaged solution
that probably already exists. It's just another display of your complete
impatience and laziness, and that's great: with the CPAN at your side, you
can solve boring problems effortlessly.

The problem, of course, is that CPAN isn't always at your side. Sure, a
simple install Class::DBI might be enough to implement half of
your project, but when you're offline and stuck on the plane, good luck
getting to your usual mirror. I've found myself in that position a number
of times, and usually when I've most wanted to get some work done. On the
way home from conventions, I've sat in cars and planes, wishing I'd
remembered to install Test::Smoke or DBD::SQLite before
leaving home.

The solution, of course, is to just mirror the whole CPAN. It's only three gigs, and if you've got a week to spare on your dial-up, that's just
fine. After all, later rsyncs are just a few hours a week!

Other problems loom, not the least of which is the possibility of losing those
three gigs when your drive crashes on the road. You can always back up the mirror
to a DVD in case you need to re-mirror it quickly... but by this point the solution
to your problem has become tedious, and I know how you feel about solving tedious
problems.

A better solution to this problem was published a few years ago by Randal Schwartz:
mini-CPAN. Its guiding principle is an old programmer standard: "You aren't
going to need 90 percent of that crap."

Randal's script only retrieves the CPAN indices, and then the modules listed
in the package index -- basically, only the newest version of every distribution
-- the only files ever used by CPAN.pm and CPANPLUS to install modules. On subsequent
runs, only those distributions that have changed are updated.

With this miniature CPAN, you've cut CPAN down to about 400 MB. Not only
does it take a fraction of the time to mirror, but it fits nicely onto a CD.
You can stick it in your bag, right behind your rescue disk, and know that no
matter what happens, the CPAN will be right by your side.

With the script configured and run, you'll have your own personal CPAN
sitting on your machine, ready to be used. Pointing CPAN.pm at it is easy as
can be:

cpan> o conf urllist unshift file:///home/japh/minicpan

Alternately, just edit your CPAN::Config or CPANPLUS::Config.

The only problem left with mini-CPAN is that it was so hard to find. It's
been such a fantastic addition to my toolbox that I feel slighted, having spent
two years oblivious to its existence. To help others avoid this pain, I've
tweaked the script, shoved its guts into a module, and stuck it onto CPAN. Just
by installing CPAN::Mini,
you can have minicpan dropped into place and ready to go:
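Installing the module gives you a minicpan command; a typical first run looks something like this (the remote mirror URL and local path are examples, not requirements):

```shell
# Mirror the indices and the newest version of each distribution
# into a local directory. The first run takes a while; later runs
# only fetch what has changed.
minicpan -r http://www.cpan.org/ -l /home/japh/minicpan
```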

Aaron Trevena

Why would you use a bit mask in your database?

They can be useful where you have a many-to-many relationship where one side
changes rarely and the other frequently. A good example is facilities provided by
a Tourist Resort, where the actual amenities change rarely but resorts are added
and updated often.

Normalization would add an intermediate table between them, but that can be
painfully slow if you have a significant number of amenities and frequently
queried records about many of them.

The same problem can occur to a lesser degree within a single table; perhaps
you are doing some statistical analysis on demographics with columns representing
gender, marital status, etc. For more than a few thousand records, querying
subsets of the table based on these criteria can become expensive very quickly.

How would you use bit masks?

Instead of holding information in a separate table or a group of columns, use
a bit mask with each bit representing a column in the table or a record in a
second table. For example, use 8 bits to represent gender, marital status, employment
status, smoker, drinker, driver, car-owner, and house-owner. A query to find
drivers under 25 who don't drink or smoke and own their own car contains
six conditions:

select id, age from people where age < 25 and employment_status =
'employed' and smoker = 'N' and drinker = 'N' and car_owner = 'Y'
and driver = 'Y'

While a bitmap would use two:

select id, age from people where age < 25 and bitmap_col & 00000110

To allow employment status to have values for child, student, unemployed, employed, and retired, you would add extra bits representing each value. This applies
to any column with a low number of potential values.
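On the application side, composing and testing such masks is just bitwise OR and AND. A short Perl sketch (the bit positions chosen here are illustrative assumptions, not the ones from the SQL above):

```perl
use strict;
use warnings;

# One bit per attribute; the positions are arbitrary examples.
use constant {
    SMOKER    => 0b0000_0001,
    DRINKER   => 0b0000_0010,
    DRIVER    => 0b0000_0100,
    CAR_OWNER => 0b0000_1000,
};

# Build a person's mask by OR-ing together the attributes that apply.
my $person = DRIVER | CAR_OWNER;

# Test several attributes at once with a single AND, exactly as the
# database does with "bitmap_col & mask".
my $wanted = DRIVER | CAR_OWNER;
print "matches the query\n" if ( $person & $wanted ) == $wanted;
print "non-smoker\n"        unless $person & SMOKER;
```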

This is a simple bitmap index, and you can use all kinds of bitwise operations
on your queries against it. Complex queries can be simplified to a couple of
binary operations. Of course, there is a downside. It's harder to maintain,
and if mapping a many-to-many relationship, you need to ensure consistency between
the bit mask order and the contents of the other table. This can be enforced
using a trigger or within the application.

If you split the bitmap into individual columns and rotate it you can make
a compressed bitmap index that only stores ranges of records that are true for
each column or value of a column. Oracle provides this feature.

These compressed bitmap indexes are even faster to query when used together
in combination, and take up very little space. However, as multiple records'
values can be held in a single entry in the index, updates to indexed columns
can be slower and suffer from excessive locking on the index itself.

Well-designed bitmap indices can have a huge impact on the performance of queries,
as they are much smaller than standard b-tree indices and much faster on queries
where a large proportion of that dataset is being queried.

Simon Cozens

Let me first apologize for this personal note; I won't do it again.

Around the time I was busy being born, or maybe just a little after, Larry
Wall was planning to be a missionary with the Wycliffe Bible Translators. Working
on a degree in "Natural and Artificial Languages," and then in linguistics
graduate school, he learned about the ideas of tagmemics, semantics, and the cultural
and anthropological concepts that have found their expression in the Perl language
and culture. Unfortunately due to health problems, Larry couldn't do the
missionary thing, so he invented Perl. That's why we're all here.

It was the beginning of May 2001 when Mark Dominus asked me if I'd be interested
in taking over the job of managing editor here at perl.com. I was delighted,
and excited about the thought of working with the Perl community and hopefully
producing and publishing some great resources for Perl programmers. I hope I've
done a fair bit of that over the last three years, but now my time is up. I'm
moving on, and starting next week the man simply known as "chromatic" will
become Mr. Perl.com. Please treat him as well as you've treated me!

I need to thank a bunch of people, who've done all the hard work behind
the scenes that you don't hear about: Mark, of course, for getting me involved
here; Chris Coleman and Bruce Stewart from the O'Reilly Network who've
had the curious experience of trying to manage me; Steve McCannell and Chris
Valdez have been the producers of perl.com, and worked incessantly to get articles
up on the site, often on shockingly short notice; Tara McGoldrick and others
have been the copy editors; and of course, I've worked with a wide range
of great authors and contributors. Thank you all. And thanks, of course, to
the Perl community -- that's you -- without whom this wouldn't
be half as much fun.

And about that missionary thing? Well, if Larry's not going to be able
to do it, someone has to. Like many Perl programmers, and indeed Larry himself,
I've been interested in Japan for a very long time. In fact, I lived in
Japan for a year, and was studying Japanese for my university major back when
I started at perl.com; last year I decided that the time was right to prepare
to move back to Japan as a full-time missionary.

So in two weeks I'll be going to All Nations University here in England
for a two-year course to get me ready, and then I shall be off! I'm sure
you won't have heard the last of me, though, and I certainly won't be
stopping programming -- missionaries have things they need to automate too...
But for now, farewell! It's been fun, and now it's going to be fun in
a different way.

Mattia Barbon asked for some clarification of how the calling conventions work for unprototyped subroutines in IMCC.
Specifically,
whether IMCC was behaving as documented in the Parrot Design Document (PDD3).
Leo reassured him that the TODO entry that had caused his disquiet was obsolete and that IMCC does the right thing.

Steve Fink asked how to go about returning something from dynamically compiled code.
He,
Leo, and Dan went over the issues.
An unnamed (and as yet unvolunteered) volunteer will probably write it all up and add it to the parrot compiler FAQ.

Stephane Peiry and Leo continued to work on getting NCI callbacks working with the GTK library.
They finally tracked the problem down,
but it looks awfully like Stephane's going to have to re-implement the gtk_main loop to get everything working.
Best of luck, Stephane.
ParrotGTK would definitely be a cool thing.

As well as discussing basic types,
Dan opened a discussion of the proper semantics of Parrot's low-level maths (yeah,
he spelt it "math,"
but I'm English and I'll spell it correctly) operators.
In particular he proposed variants that throw exceptions if strange things happen.

Joshua Gatcomb posted some patches he'd made to parrotbench.pl to
make it a little less fragile and more useful. A few days later he
announced a slightly more finished set of patches (Leo had checked his
original patches in rather sooner than expected).

Dan Sugalski looked upon mod_parrot and saw that it was cool, to the point of being laid out on a mortuary slab. He thought that the time had come to try and put some life back into the project, now that Parrot's so much more capable. He asked for volunteers and Jeff Horwitz seems to be (the only?) one of 'em.

In the usual freewheeling perl6-language way, a thread entitled "Return with no Expression" ended up with a discussion of how to disambiguate the :: that forms part of a ... ?? ... :: ... expression from :: in its role as module sigil. (According to Luke it's easy; whitespace is required.)

Aaron Sherman wondered if Perl 6's incarnation of POD would have support for tables. Luke wasn't overly keen on Aaron's proposed syntax, and Larry pointed out that they'd managed to use Perl 5's POD and some clever processing to write tables for the third edition of Programming Perl. Once again, it seems that everyone agrees that POD is great but could be greater; nobody can agree on how to make it greater. Me, I think Larry got it right enough.

I am reliably informed that a lot of last week's links were broken. Which is odd because the code to generate them hasn't changed. It looks like the problem's down to the gateway between nntp.perl.org and groups.google.com being slightly radished at present.

However, I'm going to stick with the Google links in this summary in the hopes that normal service will be returned shortly and things will get back-filled. If anyone has a handy tool for getting from a message id to an archive URL that works then I'd be happy to hear from them.

If you find these summaries useful or enjoyable, please consider contributing to the Perl Foundation to help support the development of Perl. You might also like to send feedback or contributions to a "Getting Piers to OSCON 2005" fund to mailto:pdcawley@bofh.org.uk.