Tuesday, November 2, 2010

I am only midway through the migration, but there ain't no stopping this train we're on. I will no longer be publishing complete posts here at Blogger, only links to posts on the new site until I get the last few details in place. At that point, I will lock this blog, and it will no longer be used at all.

Sunday, October 24, 2010

In response to my previous post about my vision for the Product Management team, Parrot user shockwave posted a very long and thoughtful critique. I have a very large backlog of other topics that I want to post about, but I want to take the time to respond directly to shockwave right now. In a nutshell, for anybody who hasn't read his entire set of comments, he is trying to embed a Parrot interpreter for a custom language he's designing into a 3D game engine that he is working on. His comments were very long (worth the read!), so I will be quoting some of the most important parts, with minor corrections.

I started to have issues embedding Parrot as soon as I tried to embed it.

As much as I hate to admit it, this doesn't surprise me. We have very few projects attempting to use Parrot in an embedding situation. In fact, I can only really think of one: this one. Some people will mention mod_parrot. I'm not sure whether that counts as an embedding or an extending type of project, but I definitely don't think it's being actively maintained right now. Whatever, maybe two projects.

It's just a fact of the programming world: Features which are not often used, and aren't thoroughly tested are going to wither. As a community we could be more proactive and show a little bit more foresight to ensure that this interface is capable, powerful, elegant, and maintained, but we don't. Maybe that's the single most important thing that the Product Management team should focus on for now.

First, there's barely any documentation for embedding it. Most of the documentation I found was just the listing of function prototypes.

Damnit! I find this so infuriating. As I mentioned in a previous post about the state of the PDDs, listings of function prototypes do not qualify as acceptable user documentation. A documentation file which only lists function prototypes and maybe a short, abstract blurb about them, is not adequate documentation for users. Also, the kinds of documentation that users need is often far different from the kinds of documentation that developers need. So, there's another important thing that the Product Management team should be working on.

To be clear, Parrot is something designed to be embedded, and there is no documentation on how to do that. That's not good.

This is a sentiment that I really couldn't agree with more. This is a huge part of my personal vision for Parrot. The "real" final product that we ship, the most important part of our binary distribution is not the Parrot executable. That's just a thin argument-processing wrapper around the most important part: libparrot. The Parrot executable is just a small utility that embeds libparrot. At least, that's what it should be. libparrot is the product that we are developing. Everything else--all the command-line tools, the HLL compilers and fakecutables, extension libraries, everything--are just add-ons to libparrot.

Anything that the Parrot executable can do, all the interfaces that it calls from libparrot, should be made available and easily so from other embedding applications. And all of it should be well documented.

the documentation for that function says this:

"void Parrot_load_bytecode(PARROT_INTERP, STRING *path)
Reads and load Parrot bytecode ... . Due to the void return type, the behavior of this function on error is unclear."

Basically, if a file doesn't exist, the program state then becomes undefined. Not good.

"the behavior of this function on error is unclear"? Full stop. This is absolutely, unapologetically stupid and wrong. Close your eyes and imagine me saying a few cursewords before you continue reading the rest of this post because, trust me, I am saying them now.

The job of any application virtual machine, be it the JVM, the .NET CLR, Neko VM, or whatever, is to make these kinds of things clear. Virtual machines provide a consistent abstraction layer over the underlying platform, and provide a standard and reliable runtime for applications that run on it. What part of that definition allows for a key interface function to both allow errors and also to have undefined behavior when an error exists? Either that function needs to be fixed to have defined and consistent behavior on error, or a new function needs to be written that that goal explicitly in mind.

Parrot seems to be built with a command line mentality. Not all the actual users(end-users, not developers) of Parrot will be running the end product from a command line. The video game engine I'm using, for example, runs under Windows, MacOS, XBox 360, and PS3. I'm trying to run it from Windows 7, as a GUI application; the errors are outputed to nowhere. Parrot can't just spit out error messages to the command line and expect that they will always be seen. There's no built-in way to place the errors in buffer, so that I could choose how to print those errors.

Let me paraphrase, using common internet vernacular: Parrot IZ T3H FAILZ. Any Parrot hacker who reads this paragraph should immediately be able to extract a number of TODO items from it. libparrot should always assume that it is being embedded, and act accordingly. Again, the Parrot executable isn't our primary product, libparrot is. Libraries like this shouldn't be just dumping text out to STDERR or any standard handle without allowing some possibility of explicit overriding. If we dumped error text into a buffer, set a flag, and allowed the embedding application handle it appropriately, we would be much better off.

In fact, I suggest that all functions which execute code in Parrot should be modified to return a PMC. That PMC could be an exit code, a status message, an unhandled exception, some kind of callback, or whatever. If the PMC returned is an unhanded exception, the embedding application could chose to handle it and call back in to Parrot, or propagate that error further up the call chain, or whatever. It's not libparrot's job to determine that an unhandled exception causes the application to exit, or to force a dump of the exception text to STDERR. Both of these things are very wrong in many situations.

I want to use Parrot as the runtime, because of business and personal reasons. but I can't base my business and future on hopes and dreams alone. I'd really like to go with Parrot as the runtime for the game engine, but if it can't stand on its own feet, I'm gonna have to shelf it for some, possibly long, time.

And here it is, the heartbreaker in this whole situation. Here we have a person who likes Parrot and wants to use it. He has put in the effort to build a compiler and runtime for his language, but the embedding interface is so shitty shitty shitty that he can't use it and may have to go back and implement all his ideas in a different system. Excuse me while I go say a few more choice curse words.

Parrot's embedding/extending interface has never been any good because nobody has ever taken the time to define exactly what that interface is and what it should be. Everybody has been happy to only use Parrot from the command-line, and to break encapsulation at every step because it was easier and nobody wanted to do any differently.

What we need to do, and do it as immediately as possible, is to realize that there are two distinct pieces of software: libparrot and the parrot executable. We need to realize that the former is our primary product, and can be used in many situations where the later cannot. We also should realize that libparrot needs a complete, comprehensive, elegant, and properly-encapsulating API, which the Parrot executable and other embedding programs should use exclusively. We need to start making some tough decisions about what that API should contain, what it should not contain, and what kind of functionality our users are going to have access to.

We need to write up a proper API, test it, and document it. Anything less is failure and means we are ignoring the very real needs of our users.

Saturday, October 23, 2010

Friday was my last day at IONX, the first job I had out of school. IONX was a pretty cool place to work, not the ordinary kind of first job that some people get doing the "grunt" work that more senior programmers didn't want for themselves. It was a relatively small place, and though it's been growing quickly, I was still able to carve out a place for myself as a productive, influential, and respected programmer.

The time came eventually to start looking for something else. I started putting out resumes in a pretty non-committal way. Maybe one or two a month if I found really compelling job postings. For several months I hardly heard back from anybody. But this month, the rain began to pour. Ultimately I decided to take a position with Weblinc, an e-commerce company based in Philadelphia.

Starting on Monday I'll be taking the train to center-city Philadelphia every day, again. It's going to be a fun new adventure and I'm looking forward to getting started. I'll post more information about what exactly my new job entails when I learn a little bit more about it myself.

Thursday, October 21, 2010

Yesterday I talked about the new Teams concept for the Parrot community. The goal of which is to take some important jobs and assign them to small groups instead of individuals. For a team like the architecture team, this task was previously assigned to only one individual. For other teams like Project Management, Community, Product Management, and QA, these are pretty new and didn't previously belong to any one dedicated individual.

With that in mind, I need to start putting together a comprehensive vision for what the Product Management Team will be, since we haven't ever had a "Project Manager" before to show me the way forward. The description in the teams proposal goes as follows:

This team is responsible for the vision of Parrot as a user-facing product. They also act as an advocate for the needs of Parrot's users (e.g. HLLs and libraries such as Rakudo and Kakapo) and as an intermediary between Parrot and its users.

There are a few different tasks here (and I think a few more are implied), but it all really boils down to a single core concept: Making Parrot "work" for current and prospective users.

Who are Parrot's users? That's really a good first question to ask. John Q Bag-of-donuts, running an instance of the November Wiki at home, isn't really a Parrot user. Sure, his software is running on top of Parrot, but the reality is that he is using November and Rakudo, not Parrot directly. I think that relatively few people would be (or should be) using Parrot directly. As far as most people should ever know or care, Parrot is just another library dependency for some higher-level software that just works.

Parrot users are the developers of HLL compilers, PIR libraries, PMC extensions, and embedding hosts. In many cases, especially historically, the group of Parrot users overlapped pretty heavily with the group of Parrot's developers, so the two groups would often get conflated. We can't do that anymore.

Users need Parrot to do. Users need Parrot to be. Parrot's capabilities, it's external interfaces, it's execution speed and it's memory usage; our users need all these things and need them to be done correctly. Parrot, in turn, needs users. The relationship is cyclic: A better Parrot attracts and retains more satisfied users. Satisfied users, in turn, provide feedback to the development team and make Parrot appear to be a more popular and compelling platform. There's word-of-mouth advertising too that we want to bank on. Of course, all the advertising and good will in the world don't replace rich feature sets and improved performance.

In a nutshell, these are the tasks that I think the Product Management team needs to be focused on:

Communication: We need to stay in touch with the users, bring their concerns to the rest of the Parrot development team, and share their successes as well

Marketing: We need to advertise Parrot. We need to attract new users, and help encourage people to build awesome things on top of our platform.

Interfaces: We need to work to make and keep Parrot's external interfaces sane. Our command-line interface, extending interface, embedding interface, and PIR/PASM/PBC. These things all need to be sane and either stable or in the midst of dramatic improvements. We will elicit feedback from users to find out what features people use and what features people don't use. This will help facilitate feature deprecations and experimental feature promotions in the future. As everybody can imagine, I would love to get my hands around the throat of the deprecation policy and start squeezing.

Advocacy: We need to find out exactly what the users want and advocate for that to the rest of the developer team. Parrot isn't a product developed in a vacuum, we have real users with real needs, and we need to make sure that we meet those needs. If we don't meet the needs of our users, Parrot is nothing for nobody.

I'm looking for interested and dedicated members to join the Product Management team for Parrot. People who are interested in doing the kinds of things I talk about above, or people who have their own ideas for what this team should be responsible for, would make perfect additions. Ideally, I would like to have at least 3-4 people on the team besides myself, and more would be welcomed too.

Wednesday, October 20, 2010

Parrot contributor Christoph Otto (cotto) posted an email to the list a few days ago about a new Teams concept for the Parrot community. This idea is essentially an extension of an idea I had been kicking around in my own head, and I'm very enthusiastic about it.

The idea is to organize Parrot developers into particular teams, and give each team a particular area of focus and "authority" to get things done. We obviously don't want to get too overzealous and focus more on rigid organization and structures, but we do want to make sure that important jobs are being performed and some people get the chance to take personal responsibility for ensuring that things in the project get done.

There are five teams slated so far, though I'm sure the number will change as we see how teams work and we divide up tasks that need to get done.

One important team, for example, is the architectural team, which will be taking over for the position of architect. The idea is that we get more people involved in that role, increase our bus number, and become less reliant on a single person for such an important task. Let's all commend Allison for the great job she has done in this role for the past few years. However, let's also remember that it was a pretty significant drain on her time, and it's a hard job to do for just one person. It looks like Cotto will be the team leader of the architectural team, though I'm already a member of it as well.

I'm going to be the team leader for the new "Product Management" team. I'm still coming to grips myself with what this new team is supposed to be doing, because the proposal is still young and there are many roles and things that need defining. At the basic level, the Product Management team is supposed to interact with users, and form a comprehensive vision of Parrot as a user-facing product. This is very likely going to have implications in several areas of Parrot, including the embedding/extension interface, compilers for PIR/Lorito, tools, libraries, etc. My goal, at least initially, is to make sure that Parrot is serving the needs of our users, and that Parrot becomes a compelling platform for people to work on. You can damn sure expect more blog posts about these things!

I am going to try to solicit new membership to this team as best as I can. It's going to be important that we have some enthusiastic contributors here, especially people who have experience developing compilers, extensions, and other related parrot-dependent projects. We're also going to want to establish and maintain close contact with other developers of these kinds of projects, to ensure that we are meeting their needs and ensure that we are pushing to meet and exceed those needs in future releases.

Other teams include the Project Management team currently headed by Jim Keenan (kid51), the Community team headed by Jonathan Leto (dukeleto), and a Quality Assurance team, which currently has no team lead. All of these teams are going to have to take some time to ensure that they properly define their own roles, and find the best ways to perform them going forward. We also need to attract several developers to join all of these teams, to make sure they all have enough "staff" to get the necessary jobs done.

Signing up for a team is easy. Talk to the team lead and ask about how to get involved. You don't need to be a member of a team in order to be a committer for Parrot. I'm looking for people to join the Product Management Team, and I know other teams are going to be looking for new members as well, so definitely get involved if you are interested in something we are doing.

I'll try to put together another post soon with a more comprehensive vision for what the Product Management team will be and what it will do.

Know your fundamentals (algorithms, data structures, machine architecture, systems) and know several programming languages to the point where you can use them idiomatically.

Know some non-computer field of study well — math, biology, history, optics, whatever. Learn to communicate effectively in speech and in writing. Spend an unreasonable amount of time on some difficult topic to really master it. Try to do something that might make a difference in the world.

How many people do I know that don't know their fundamentals, who barely know one programming language and certainly don't know it idiomatically? The answer, unfortunately, is many.

Here's one more gem from the same article:

I’ll just note that I consider the idea of one language, one programming tool, as the one and only best tool for everyone and for every problem infantile. If someone claims to have the perfect language he is either a fool or a salesman or both.

Monday, October 11, 2010

I've been mostly offline for the past few days. I've been pretty busy and don't have the time to keep up with things the way I would like. There's been a bit of a nexus of things happening all at once:

My father has been in the hospital with a pretty serious illness. This has been eating up most of my time and my concentration

I'm moving to a new job. I'll post more details later

My kid is teething and has a bit of a stomach bug. He's just a little bundle of joy.

My dev laptop is broken and disassembled. I'm waiting for new unbroken hinges to be shipped in from Hong Kong. Apparently they are shipping it by camel.

I'll have more blog posts to write and more code to commit when things calm down a little bit.

Monday, September 27, 2010

I've wanted to get back into this habit for a while, and today is the day to do it. Following a short conversation with Parrot hacker plobsing yesterday, I've decided to tackle PDD03, the "Calling Conventions" design document first. In the coming days I would also like to take a close look at some other PDDs for systems which are receiving current developer attention. Quotes are all from the text of PDD03.

FAQ: Given Parrot's internal use of continuation-passing style ["CPS"], it
would be possible to use one pair of opcodes for both call and return, since
under CPS returns are calls. And perhaps someday we will have only two
opcodes. But for now, certain efficiency hacks are easier with four opcodes.

"Perhaps someday"? Is this an authoritative design document or a dumping ground for assorted wishful thinking? This document should say exactly what Parrot wants in the long run. Do we want two opcodes for simplicity, or do we want four opcodes because of optimization potential? I would suggest that we have two opcodes only, and we can implement optimizing behaviors by having multiple types of Continuation object, with at least one type specifically optimized for simple subroutine returns. We used to have the RetContinuation PMC type, and while it wasn't the right solution for the problem it did have a glimmer of a good idea buried in it.

It's hard to talk about needing to add extra opcodes to facilitate "efficiency hacks", and then saying that for every single parameter pass and retrieval that we need to parse integer values from a constant string. Mercifully, this isn't what we do anymore. In all these cases the first argument is a FixedIntegerArray of flags, which is computed by the assembler and serialized as a constant into the bytecode. At the very least, the design document should be updated to reflect that bit of sanity.

What's interesting here is the fact that these opcodes are variadic. That is, these opcodes (which, I believe, are unique among all 1200+ opcodes in Parrot) take an open-ended list of arguments. This makes traversing bytecode extremely difficult, which in turn makes tools for generating, reading, and analyzing bytecode extremely difficult, and needlessly so. Far superior to this would be to serialize information about the register indices into the same PMC that contains the flags for those parameters.

Right now, we use a special PMC called a CallContext to help facilitate subroutine calls. The CallContext is used to pack up all the arguments to the subroutine, and then it also serves as the dynamic context object for the subroutine. It contains the actual storage for the registers used, and handles unpacking arguments into those registers. It also manages some other things for context, like lexical associations between subs and other details.

In short, the CallContext stores some static data that we usually know about at compile time: argument lists and argument signatures, parameter signatures, etc. It also knows where arguments are coming from, whether they are coming from constants or registers, and which registers exactly are being used. All this information is needlessly calculated at runtime when it could easily be computed at compile time and stored in the constants table of the PBC. If we split CallContext up into a dynamic part (CallContext) and a static part (CallArguments and maybe CallParameters), we could serialize the static part at compile time and avoid a lot of runtime work.

Here's an example call:

set_args callargs # A CallArguments PMC, which can be loaded from PBC as a constant
invokecc obj, meth

Here, again, "callparams" is a PMC containing static information about the parameter signature of the subroutine, and can easily be serialized to bytecode to avoid runtime calculations.

With this kind of a system we have three PMCs which can work together in tandem to implement a calling sequence, and can be easily overridden by HLLs to implement all sorts of custom behaviors. Plus, we get good separation of concerns, which is a typical component of good design. We have CallContext, which serves as the runtime context of the subroutine and acts as storage for registers and lexical associations. CallArguments performs a mapping of callee registers into a call object. Then we have a CallParameters which performs the inverse mapping of call arguments into parameter registers. With these three things anybody could write their own calling conventions and have them be seamlessly integrated into Parrot without too much trouble. At any time you can pack arguments into a CallContext using a CallArguments PMC (which you can created at runtime or serialize to PBC), and then unpack them again using a CallParameters PMC (which can also be created at runtime or compile time). Tailcall optimizations, loop unwinding, recursion avoidance, and all sorts of other optimizations become trivially possible to implement at any level.

For documentation purposes we'll number the bits 0 (low) through 30 (high).
Bit 31 (and higher, where available) will not be used.

Is there a particular reason why bit 31 is off limits? Is this only because INTVAL is a signed quantity, and we don't want to be monkeying with the sign bit (because an optimizing compiler may monkey with it right back)? That would make sense, but is absolutely unexplained here.

The value is a literal constant, not a register. (Don't set this bit
yourself; the assembler will do it.)

This is something that upsets me to no end, and I would be extremely happy to change this particular behavior. Here's the current behavior, in a nutshell: Every single parameter has a flag that determines whether the parameter comes from a register or from the constants table in the bytecode. That means for every single parameter, on every single function call, we need to do a test for constantness and branch to appropriate behavior. For every single parameter, for every single function call. Let that sink in for a minute.

Consider an alternative. We have a new opcode like this:

register = get_constant_p_i index

The "_p_i" suffix indicates that the opcode returns a PMC and takes an integer argument. We could have variants for "_i_i", "_n_i" and "_s_i" too.

It looks like more opcodes total, but this second example would probably execute faster. Why?

The foo() function is called 100 times. In the current system, every single time the function is called, every single time, the PCC system must ask whether the first argument is a constant (it never is) and whether the second argument is a register (it never is). Then, every single time, it needs to load the value of the second argument from the constant table. This is also not to mention the fact that the "if" opcode at the bottom could be loading the value of it's second argument from the constants table if that argument was a string or a PMC. INTVAL arguments are stored directly in the PBC stream, so they don't need to be loaded from the constants table. Luckily, I think serialized PMCs are thawed once when the PBC is loaded and don't need to be re-thawed from the PBC each time that they are loaded.Of course, I could nit-pick and suggest we should only thaw PMCs lazily on-demand and cache them, but that's a detail that probably doesn't matter a huge amount in the long run (unless we start saving a hell of a lot more PMCs to bytecode).

Of course, for code generators we're going to need a way to get the unique integer index for each stored constant, which likely means we're going to need an improved assembler, but it is doable.

If this bit [:flat] is set on a PMC value, then the PMC must be an aggregate. The
contents of the aggregate, rather than the aggregate itself, will be passed.
If the C bit is also set, the aggregate will be used as a hash; its
contents, as key/value pairs, will be passed as named arguments. The PMC
must implement the full hash interface. {{ TT #1288: Limit the required interface. }}

If the C bit is not set, the aggregate will be used as an array; its
contents will be passed as positional arguments.

The meaning of this bit is undefined when applied to integer, number, and
string values.

I understand what this passage is trying to say, but it's still pretty confusing. Plus, I always find it funny when our design documents contain references to tickets (and in this case, a ticket that hasn't drawn a single comment in 10 months from anybody who could make a reasonable decision on the issue). There's probably a bigger discussion to be had about what it means when a PMC declares that it "provides hash". Parrot does have some support for roles, but that support is thin and mostly untested. Plus, nowhere do we define what any of our built-in roles like "hash" and "array" actually mean. That's a different problem for a different blog post, but it is worth mentioning here.

Here's a slightly better description: If you use the ":flat" flag, Parrot is going to create an iterator for that PMC and iterate over all elements in the aggregate, passing each to the subroutine. If the ":named" flag is used, that iteration will be done like hash iteration (values returned from the iterator are keys). Otherwise, the iteration will be normal array-like iteration. If the given PMC does not provide a get_iter VTABLE, an exception will be thrown. There's no sense talking about how the PMC must satisfy any kind of interface, since the only thing we require is that the aggregate is iterable.

It's worth noting that after looking at the code I don't think Parrot follows this design. I don't think the presence of the :named flag with the :flat flag changes the behavior at all. It appears from reading the code that hashes or hash-like PMCs are always iterated as hashes and the contents will always be passed as name/value pairs. Arrays and array-like PMCs are always iterated as arrays and their contents are always passed individually. Where a PMC is both array-like and hash-like at the same time, it's contents are iterated as an array and passed individually. I do not know whether this behavior is acceptable (and the design should be updated) or whether the implementation is lacking and the design is to be followed. I may try to put together some tests for this behavior later to illustrate.

If you're C-savvy, take a look at the "dissect_aggregate_arg" function in src/call/args.c file for the actual implementation.

As the first opcode in a subroutine that will be called with
invokecc or a method that will be called with call_methodcc, use
the get_params opcode to tell Parrot where the subroutine's or
method's arguments should be stored and how they should be expanded.

It's interesting to me that there would be any requirement on this being the first opcode. It seems to me that we should be able to unpack the call object in any place, at any time. That's a small nit, things work reasonably well with this weird restriction in place. I think we can support a wider range of behaviors, though I won't say anything about the cost/benefit ratio of the effort needed to do it.

Similarly, just before (yes, before) calling such a subroutine or
method, use the get_results opcode to tell Parrot where the return
values should be stored and how to expand them for your use.

I don't think this is the case anymore, but I do need to double-check the code that IMCC is currently generating. Obviously in a pure-CPS system we don't want to be going through this nonsense. Either Parrot does the sane thing and we need to update the docs, or Parrot doesn't do the sane thing and we need to update the design. Either way, kill this passage.

If this bit [:slurpy] is set on a P register, then it will be populated with an
aggregate that will contain all of the remaining values that have not already
been stored in other registers.

...

If the named bit is not set, the aggregate will be the HLL-specific array
type and the contents will be all unassigned positional arguments.

Which array type? We have several of them. If we mean ResizablePMCArray, we should say "ResizablePMCArray" so people know how to do the overriding.

An I register with this bit set is set to one if the immediately preceding
optional register received a value; otherwise, it is set to zero. If the
preceding register was not marked optional, the behavior is undefined; but
we promise you won't like it.

Undefined behavior? In my Parrot? Why can't we define what the behavior is? We can promise that the behavior will be bad, but we can't even hint about what that behavior must be? That's pretty generous of us!

We could trivially identify these kinds of issues at PIR compile time, or we could catch these situations at runtime and throw an exception. Undefined behavior is precisely the kind of thing that design documents should be looking to clear up, not institutionalize. I am interested to know whether we have any tests for this, or if we could start writing some.

If this bit is set on a P register that receives a value, Parrot will ensure
that the final value in the P register is read-only (i.e. will not permit
modification). If the received value was a mutable PMC, then Parrot will
create and set the register to a {not yet invented} read-only PMC wrapper
around the original PMC.

Future Notes: Parrot's algorithm for deciding what is writable may be
simplistic. In initial implementations, it may assume that any PMC not of a
known read-only-wrapper type is mutable. Later it may allow the HLL to
provide the test. But we must beware overdesigning this; any HLL with a truly
complex notion of read-only probably needs to do this kind of wrapping itself.

Ah, something that looks pretty smart, though it's clearly listed in the PDD as "XXX - PROPOSED ONLY - XXX", which is not really a good sign. I've never been too happy with Parrot's current mechanism for marking PMCs as read-only anyway. This is a pretty interesting feature, though in current Parrot I'm not sure it could be implemented to any great effect. I may also like to see something like a ":clone" flag that forces a copy to be passed instead of a reference, or a ":cow" flag which produces a copy-on-write reference. Either way, we would probably like some kind of mechanism to specify that a caller will not be playing with data referenced by a passed PMC. This is especially true when you start to consider alternate object metamodels, or complex HLL type mappings: we don't want libraries modifying objects that the don't understand, and creating results that are going to destabilize the HLL. Having a guarantee that mistakes in the callee can't be propagated back through references to the caller would be a nice feature to have. Eventually.

Named values (arguments, or values to return) must be listed textually after
all the positional values. fla and non-flat values may be mixed in any
order.

Is this true? I see no reason in the code why named arguments must be passed after positional arguments. I do see a reason why named parameters must be specified after positional parameters, however. Consistency is good, but I tend to prefer that Parrot not implement unnecessary restrictions. Plus, it should be very possible for an HLL to override the default CallContext and other related PMC types and implement their own behaviors for things like ordering, overflow, and underflow.

That brings me to a point of particular unhappiness with this PDD: It is extremely focused on the behavior of the combination of IMCC, PIR, and built-in data types. It's not hard, with all the Lorito talk flying around, to imagine that in a relatively short time Parrot could be PIR free: System languages like NQP and Winxed could compile directly down to Lorito bytecode, and not involve PIR at all. We could be using HLL mapped types for CallContext and other things to completely change all this behavior. Explaining what the defaults are is certainly important, and suitable for in-depth documentation. Explaining how the defaults work and interoperate with HLL overriding types, and how the system should be extremely dynamic and pluggable is absolutely missing from PDD 03.

Named targets can be filled with either positional or named values.
However, if a named target was already filled by a positional value, and
then a named value is also given, this is an overflow error.

I find this a little bit confusing. Why can a named parameter be filled with a positional argument, but not the other way around? I suggest that a named parameter only takes a named argument, and a positional parameter only takes a positional argument. We should either provide other types (like :lookahead) if we want other behaviors, or we should allow the user to subclass the PMCs that implement this behavior and allow them to put in all the crazy, complicated rules that they want. Parrot defaults should be sane and general. Everything else should be subclassable or overridable.

The details included in this PDD are almost as troubling as some of the details omitted. Information about signature strings, which are used everywhere and are the only way to call a method from an extension or embedding program, are completely omitted. Being able to specify a signature as a string is a central part of the current PCC implementation, so that makes no sense to me. Information about central PMC types, like CallContext is nowhere to be found either, much less information about how to override these types, and the interfaces that they are expected to implement. Making calls from extension or embedding programs using variadic argument lists is completely missing. The differences between method and subroutine invocations, the difference between invoke and invokecc opcodes (And the implications of CPS in general) is missing. MMD is never mentioned. Tailcalls and optimizations related to them is missing. Details about passing arguments and extracting parameters from continuations and exceptions is missing. New and proposed flags like :call_sig and :invocant are not mentioned.

In my previous blog post, I mentioned four common problems that our PDDs suffered from. This document suffers from several. First, it's more descriptive than prescriptive, doing it's best to document what the defaults in Parrot were in 2008. Second, this document is rapidly losing touch with reality as changes the the PCC system are pushing the capabilities of Parrot beyond what the document accounts for. Third, it has an extremely narrow focus on IMCC/PIR, and is vague or is completely silent about any other possibilities, especially those (such as Lorito) that may play a dramatic role in the implementation of this system in the future.

PDD 03 doesn't tell our current users how to effectively use the calling conventions system of Parrot, and does nothing to direct our developers on how to improve it going forward. It really needs to be completely deleted and rewritten from the ground up. When it is, I think we will find some gold, both in terms of exposing virtues of the current implementation and describing plenty of opportunities for drastic improvement.

Kakapo is a development framework for the NQP language. It provides a unit testing library which PLA uses to implement its test suite. A special version of Kakapo is required for PLA release 1. You can use this sequence of commands to get and install that version:

If any of these things interest you, or if you have other cool ideas, please feel free to let me know and get involved!

New Participants Welcome!

Interested in PLA or linear algebra in general? You can fork a copy of PLA and start contributing right now! I'm happy to talk to interested contributors, and to add patches for bugfixes and new features into the core repository. Please let me know if you're interested in helping out.

If you have specific feature requests that you would like to see added to PLA, but aren't able to implement them yourself, please let me know.

Thanks

I want to offer a big thanks to the greater Parrot community for helping in various ways. PLA relies on several tools and libraries, the result of many man-hours of work by Parrot community contributors.

Tuesday, September 21, 2010

Several days ago I wrote a sizable blog entry about the Parrot project, many of the ideas therein were discussed at length with fellow Parrot Foundation board member Jim Keenan when he came down to my area for a face-to-face meeting. Today I'm going to expand on some of those ideas, especially with respect to Parrot's design and roadmap.

Earlier this year I wrote a pretty negative review of PDD23, the exceptions system design document. Actually, that's something I really should repeat with some of the other design documents as well. There are certainly a few that deserve the treatment. Maybe I'll try to do that again soon.

Anyway, there is nothing special about PDD23; it isn't an especially bad example. That's telling, actually. Looking through that directory, I find that the documents generally suffer from four common problems:

Vague or Incomplete. Some PDDs are so incomplete, vague, or filled with holes that they are absolutely unusable for forming the decisions of our developers. PDD25 (concurrency) immediately comes to mind as one that contains very little actual design content. The majority of text in PDD25 consists of descriptions of technology, and reads more like an excerpt for "Threading for Dummies" than a real design document. PDD05, PDD06, PDD10, PDD14, PDD16, and maybe even PDD30 would also get put into this category. Notice that some of these PDDs are labeled as "draft" (and have been for years). I'm not really sure what a "draft" designation means, or how we go about getting them out of draft. I'll expand on that later.

Not Forward-Thinking. There are many cases of PDDs which act as little more than copies of function-level documentation. PDD09 is one that I am eminently familiar with which is in this category, and unfortunately I've written much of the text in that document myself. At the time I rewrote it, I really didn't have much of a concept for what a design document should be, and what it could be. PDD09 describes what the GC was about a year ago, and things have changed significantly since then (and are changing rapidly now). PDDs which are not forward thinking provide absolutely no direction to developers doing new work. PDD08, PDD11, PDD20, PDD24 and PDD27 are good examples of this.

Not In Sync with Reality. Some PDDs do not match the current implementation of things, and never really have (unlike the case above, where the implementation once matched the design, but has surpassed it). We have to ask in these situations why the design does not match the implementation: Is the design a lofty goal which the implementation is approaching by increments? Is the design unsuitable, and practical concerns have dictated a different approach? Did the implementation pre-date the design, and no attempt has ever been made to change it? PDD17 and PDD23 have some of this. PDD18 is also, by virtue of having never really been implemented.

Not Good Design. Some PDDs really just don't represent the kind of good design that Parrot needs in the long term. Think about how often and how loudly people complain about certain topics: The object metamodel (PDD15). The lexicals implementation (PDD20). Parrot's bytecode format (PDD13). Other PDDs may fall into this category too, as the implementation approaches the design and we start to find the flaws.

The first question I suppose we need to ask ourselves as a community is this: What is the purpose of these design documents? The second question we might want to ask is: Do we want to keep these documents at all? A good third question is, depending on the answer to the previous question: How do we want to go about improving the design documents to be what we want and need them to be?

The Parrot design documents can really be one of two things. The first is a form of summary documentation. Basically, the design documents would be a set of documents that distills what Parrot is currently capable of, and how it can be used. In other words, "interface documentation", or "man pages". A variant of this is to use the design docs as an a posteriori way to justify decisions that are made off-the-cuff by developers after they've already made their commits. A second possibility is that the design documents should be the forward-thinking technical goals of the project, the lofty goals that every commit and every release strives to reach, even if we never quite attain it perfectly. I think there is far more value in the later option, and I'll explain why.

First off, we have lots of function-level documentation. We have automated tests which read our source code files and verify that we have documented (at least in a minimal way) every single function. We also have lots of tests, although admittedly tests make for pretty lousy documentation in a general sense. They can be used as a kind of reference, of course, to see how something works, but you often need to know what you're looking for before you go out to find it. We have tons of code examples too. We also have an entire book to teach some things, though it could use some work in it's own right. Our documentation for how to use Parrot is not always great, but we do have it in plenty of other places that we don't need to use the PDDs for that purpose. If documentation is lacking, we should improve the documentation, not subvert the design documents to serve as another layer of it.

Forward-thinking, lofty designs are extremely valuable. Consider the example of a coder who finds a missing feature in Parrot and no design for such a feature available. So she takes it upon herself to come up with a design and implement it. Weeks later she emerges and shows the fruits of her labors to the community with a request to merge her work into trunk. We as a community look it over, find a few problems and, with good cause, we reject it as bad design. Not just something that we can tweak to become suitable, but something that is fundamentally wrong for us. Thanks but no thanks. Smell you later, Alligator. That contributor will probably storm off and will never be heard from again. Also, with good cause.

Now let's look at a similar example, except we as a community have done the work ahead of time of writing out our intended designs for this fancy new feature. We describe exactly what we want, and what any implementation is going to need. Our same intrepid developer follows this design, and when she emerges with her labored fruit, it is much more acceptable. With some feedback and small tweaks, it is approved and merged to trunk.

I don't want to say this is super common, but it isn't unheard of either for people to show up to an open source project, unannounced, with a gigantic patch for a new feature. It also isn't completely unheard of for those gigantic patches to be summarily rejected. The more common case is where we have interested and energetic developers showing up to the Parrot project looking for problems to tackle. Saying "We don't have JIT, we could really use one" is far more daunting than "we don't have a JIT, but we have a design, and a list of prior art that we would like to model on." Following a map to your destination is much easier than having to design your own map first and then try to follow it.

There's an issue of motivation too. A person is much more willing to start working on a project when they have certainty that they are doing the correct thing, and that the software they produce will be usable and desirable. It's much easier to follow a path, even a very bare one, than it is to cut your own. Not to mention that the destination is much more certain.

This year, Parrot had 5 GSoC students assigned to it. Of those five, four of them contacted me personally about specific projects I discussed here on this very blogbefore submitting applications. I don't take any credit for anything, I'm certain Parrot would have had several high-quality applications and projects without me. But I do know that people are more quickly and easily able to latch onto fully-formed ideas more than they can attach to nebulous and vague ideas. Also, people may not even be aware that their interests and skills align with things that Parrot needs until they know exactly what Parrot needs.

If we--as a community-driven open source project--want to increase the size of our developer pool (and I suggest that we should always want that), we need to communicate what we need and help prospective developers align themselves with those needs.

When a new person comes to the Parrot chatroom, or the Parrot mailing list, and says "I would like to get involved in a Parrot development project, what can I do?", we can say something stupid like "Look around and try to determine for yourself what needs to be done", or "We need everything". That's not helpful and not encouraging, even if it's the truth! Instead, we can say "Look, we have a list of projects that we've designed and prepared for, but we haven't been able to implement yet. Want to take a stab at it?" The former usually leads to a confused developer who never comes back. The later can lead to a new active, empowered, permanent member of our development team.

For the sake of this discussion, we'll accept the axiom that more active developers in the project is generally a good thing, and losing existing developers, or raising the barrier to entry so high that new developers do not join is a bad thing. I'll argue the point till I'm blue in the face, if anybody wants to take me up on it.

In this sense, having better designs and plans means a lower barrier to entry for new users since it's easier for them to find a project to work on and they can begin work with more surety that what they are doing will ultimately be desirable and acceptable to the community. It's also a good motivator for existing community members. When I finish a project and have some (rare) spare time on my hands, it's better for me to be able to go right down to the next item on a checklist instead of having to look around blindly trying to find something that needs to be done. Sure, there are ways I could focus my search, but I still have to hope that something obvious appears in my focused search that I can work on.

All that said, I think I can answer the next few questions pretty quickly:

Do we need these documents at all? Yes, I think we do. They can serve as an important tool to guide new and old developers alike. They can help inform and populate specific tasklists on the wiki and elsewhere, and serve as an organizational focus for teams of developers looking to improve specific areas of Parrot. Good design documents can also be used as a tool to initiate a bidirectional communications with our consumers: projects that use Parrot as an integral part of their tool chains. I'll expand on this issue in particular in a later blog post.

How do we want to improve these documents? The time when we can all sit back and wait for a design to magically appear from nowhere is over. Good riddance. We have plenty of people in our project who are not only capable developers, but actually pretty great software engineers and software designers. Beyond that, we have lots of people, both in our core project and throughout the ecosystem, who know all the kinds of things that are going to be necessary for projects built on Parrot to be successful. Getting enough smart minds together to tackle a design challenge should be trivial.

My suggestion is this, though it's only one possible suggestion and I am not going to argue at all about details. For each design, we form a team of dedicated developers who could be considered experts on the particular topic. It would be trivial, for almost any design document we have, to put together a team of 3-4 people with some expertise in that subject. With a team, we could go through a regular checklist to produce a decent design document:

Survey existing research, and find papers and prior art that we like.We could do this as a community before even creating the design team.

Get input from the ecosystem, and maybe a special advisory panel, to get an idea for what kinds of features are required, which features would be nice, and what kinds of things to avoid.

Give the design team some time to prepare and present a draft to the community

We all paint the bikeshed for a few days

We accept the design (assuming we actually accept it), and start developing towards it

If we have Parrot developer design meetings (PDS) every 6 months, that would seem like a great demarcation for this kind of process. At one PDS we identify an area that needs design help and assemble an initial team to pursue it. At the next PDS we check out the findings, maybe approve them, and start the concrete development work. If we need to look things over and push approval off, we can do it at a subsequent weekly #parrotsketch meeting.

In any 6-month stretch, we are working on a set of development priorities which have already been properly designed, and we are preparing designs to work on for the next six month stretch. Our last PDS meeting was April 2010. That means we're due for one coming up in October or November if we want to stick to a 6 month schedule. I definitely think we should try to have a meeting thereabouts so we can re-focus on our current development and current design priorities.

TL;DR: Parrot's PDDs are in a bad state, but they really do serve an important purpose and we need to make sure they get updated. PDDs should not be short summaries of other existing documentation. Instead, they should be forward-looking documents that describe what goals we are trying to reach. We need to get input from developers and also Parrot's consumers and end-users in shaping those PDDs. I'll post more about this later.

Monday, September 20, 2010

I received two quick comments to my blog post from yesterday. One of which was from Parrot developer chromatic, who always makes good points and always deserves a thoughtful reply. We did have a quick back-and-forth in the comments of that post, and he wrote a blog post of his own on the topic. Given all that, I wanted to write a follow up and try to be a little bit more clear about things. First, here's a part of my original post that drew his ire in particular:

Name me one other mature, production-quality application VM or language interpreter which does not support threading. This isn't an empty set, but it's not a particularly large list either. Name me one other application VM that does not currently sport an aggressive, modern JIT.

His first comment reply, in full, is:

I can name lots of VMs which fit your "Woe is Parrot!" criteria (Python, Perl 5, Ruby, PHP). Consider also the first few years of the JVM.

These are great examples and are extremely true, in no small part because I was not nearly specific enough in the point I was trying to make. Mea Culpa.

Before going any further, I want to clearly state my thesis with this series of posts and comments. It consists of three parts:

One, Parrot isn't nearly as competitive now in the dynamic language runtime realm as it could (and maybe should) be. Two, we can increase our rate of improvement with focused objectives and more developer motivation. Three, if we do that I project Parrot to be among the top-tier of dynamic language runtime environments within the next 1-5 years.

If you don't agree with that, stop reading. Everything that I say after this point is in direct support of that statement. Where it might appear that something I say contradicts that, I've probably mistyped something.

If we look at the case of multithreading, three of the examples that chromatic listed above do have some support. Python, Perl5, and Ruby all do support some variety of threading, of some usability level. PHP is the odd man out in this regard, though the argument could easily be made that the webserver niche where PHP primarily lives really doesn't need and doesn't want threading. I won't expand on that topic here, but I will say this: PHP does not support threading, and is definitely a production-quality language interpreter in use by many companies. Point made.

In terms of JIT, I was drawing an unnecessary and mostly fictitious line between programs which have far more similarities than differences, and I did little to clarify what I meant. So, I'll throw away that entire question.

The second part of chromatic's statement is a little bit easier to respond to. The first Java Virtual Machine was released in 1995. That's 15 years ago, and even though that initial release did not stand the test of time it was temporarily considered to be the state of the art. No, Java 1.0 did not support threading as we would expect it now. But then again, in 1995 it would have been much harder to find a multicore processor capable of exploiting the scalability of threaded application. In a single-processor environment, cheap green threads were definitely a competitive and acceptable alternative to true native threading. 15 years later, multicore processors are the norm, and a threading strategy based completely on green threads is hardly acceptable by itself. This is all not to mention a dozen other subsystems in the original JVM that were immature then and would be absolutely laughable now.

Trying to compare Parrot in 2010 to Java in 1995 is both telling and depressing. Sure, it's a victory for Parrot, but not one that we should ever mention again. I'm bigger and stronger than my 9 month old son. That doesn't mean I want to get into a fist fight with him (even though I would totally win).

Let me pose a question though that should provoke a little bit of thought (and, I'm sure, more anger): Consider a world that had no Perl language. Larry Wall got busy and worked on other projects for 25 years, and never released Perl to the world. Then, in a flash of light and unicorns in 2010 he releases, from nowhere, a fully-formed Perl 5.12 interpreter as we know it today. Like Athena springing fully-formed from the head of Zeus. Would you use that Perl today?

Would you use a language like Perl 5.12 if you hadn't already been using it, if your job wasn't using because it had been using it for years, if neither you nor anybody in your company had prior expertise in it and were not demonstrably able to work miracles with it? Would you use Perl 5.12 today if there wasn't a huge preexisting community and if there wasn't the miracle of CPAN? Would you use Perl 5.12 knowing about some of it's idiosyncrasies, the weaknesses of it's default exceptions system, the uninspiring nothingness of it's default object metamodel, or the sheer fact that in 2010 you still can't define function parameters with a meaningful function signature? Is that an investment of time, energy, and maybe money that you would make considering some of the other options available today?

Now, I'm not ragging on Perl. I want to state that very clearly before chromatic buys a plane ticket, travels to Pennsylvania, and punches me in the back of the head. Perl 5 is a fine language and obviously doesn't exist in the kind of vacuum that I contemplate above. There is a huge community of Perl programmers. There are vast amounts of institutional knowledge. There is the entirety of the CPAN. There are modules like Try::Tiny, and Moose, and Method::Signatures which can be used to build some pretty impressive modern (and even "post-modern") things on top of Perl 5's default tabula rasa. On top of all that, Perl is demonstrably stable. Robust. Flexible. Usable. Coders in other languages invent terms like "Rapid Application Development" and "Rapid Prototyping" to describe what the Perl people call "a slow day at the office". People everywhere may debate the aesthetics of sigils and the multitude of operators, but nobody questions the fitness of Perl for use as a programming tool. It's competency and utility are unassailable.

Here's my point: Take a look at the Perl 5 base technology. Take a serious, hard look at it. At the very most, the stand-alone Perl 5 interpreter is flexible, but technologically unimpressive. Nothing that the base Perl interpreter provides is the jaw-dropping, nerd-orgasming state-of-the-art. I could point to a dozen performance benchmarks that pit modern Perl 5 against modern Python, Ruby, Java, and whatever else, and Perl 5 almost always comes in dead last (notice that we're not benchmarking the time it takes to write the code). That's fine. It is in the context of history, community and ecosystem that Perl 5 becomes a strong competitive force in the world of computer programming. We know that a great Perl coder can write more functionality in one line of apparent gibberish than a Java coder can write in a whole page of code. We know that the same great Perl coder can write his solution faster. It is because people have written the tools and modules, that people have identified best practices, that people can do so much in so little time, and because people have taken the time to distill down to the elegant essence of Modern Perl, that we love and use Perl.

The problem I identify in Parrot is a bootstrapping problem. Perl 5 has plenty of reasons to use it besides just performance and a technical feature set. Parrot does not. Parrot needs to provide a massive, overwhelming technological impetus for new developers to use it and to get involved with. Attracting those new developers further accelerates the pace of development, both of the core VM and the ecosystem. All these improved components, in turn, attract more people. It's a bootstrapping problem, and Parrot needs something compelling to get the cycle started.

Make a great product. Attract more minds. Develop a bigger ecosystem. Build a better product. Repeat.

In his blog post, chromatic takes exception to the word "mature" I used in the previous post. I won't use that word any more. In his comments, he also expressed a dislike of the word "enterprise". I won't use that word either. They were probably bad choices.

In his blog post, chromatic says :

His argument is that a focus on threading and a focus on JIT is necessary for enterprises or language communities to consider Parrot a useful platform.
I can see his point, and yet (as usual) I challenge the terms of the debate itself.

Do challenge it. That's not really what I said. I mentioned the two particular cases of threading and JIT as things that I think Parrot is going to need to be competitive in the world of 2010. Perl 5, and Python, and Ruby, and all sorts of other things that chromatic mentions don't have both these things, so the counter-argument appears that none of these are suitable, competitive platforms either. That's not what I said either. Keeping up with the comparison I've been trying to make between Perl 5 and Parrot, here is a summary view of what both bring to the table:

Perl 5: Reasonable, but not blockbuster performance. Huge ecosystem of modules, add-ins, tools, and applications written in Perl. Huge preexisting developer base with large amounts of institutional knowledge. Long and storied history of robustness, stability and fitness. Institutional inertia (people using Perl next year, at least partially because they have been using it this year).

Parrot: Reasonable, but not blockbuster performance. Extremely small (but growing) ecosystem of dependent projects. Extremely small (but growing) preexisting developer base. Very little history of Parrot being stable and robust, considering the huge changes that the project has to make on a regular basis to improve itself.

So if you're a developer, or a manager, a graduate student, or a hobbyist, or anybody else who has a great idea and is looking for a platform on which to implement it, which of these two would you choose? I'll give you a hint: If you reach into your pocket for a coin to flip, or reach to the shelf for a magic 8-ball to help answer the question, you probably need to re-read the choices more carefully.

JIT is a feature that Parrot can use to set itself apart from the pack, not something that's a necessary requirement to join in. JIT is a leg up that Parrot can use to gain some traction against another runtime environmnt like Perl 5, Python 3, Ruby, or PHP 5 which have so many compelling stand-apart features of their own. Stable and scalable threading is another one. And Parrot needs to be fast. When groups like Facebook are talking about compiling PHP code down to C, you know that performance is an issue in the world of dynamic languages. It is foolhardy to think Parrot can succeed (for any definition thereof) without dramatically improved performance over what it offers now.

In the end, this is really a fallacious argument anyway. I'm sure chromatic has pointed that out by now. Parrot isn't a language like Perl 5 is a language, so the two aren't really comparable in a direct way. Parrot doesn't target the same kind of audience that Perl 5 targets. Parrot targets people like the ones who make Perl 5.

The idea of porting Perl 5 to run on top of Parrot was once kicked around in a semi-serious kind of way. I don't remember what number it was exactly, I think it was something like the 5.14 release was supposed to be running happily on top of Parrot. Let's revive that discussion a little bit. What kind of feature set would Parrot need to have to make a compelling argument for the Perl 5 development team to focus their energy whole-hog on porting to Parrot instead of improving Perl 5 in place? What kinds of tools would Parrot need to provide to smooth the way? If I want to see Perl 5.98 be released on Parrot, what do we need to do to make that happen? In answering this, I'm more interested in hearing about the shortcomings of Parrot (which I can work to fix) than the shortcomings of Perl (which I will not).

Rumors have been floating around for over a year now about a complete rewrite of PHP called PHP 6. They want unicode support built-in. We have unicode support built-in. What do we need to do to make a compelling argument in favor of building a new PHP 6 language on top of Parrot? What do the PHP designers and developers need to see to be convinced that Parrot is the way to go, instead of pulling out the old C compiler and starting from scratch?

In 5 years when maybe Python and Ruby people are looking to rewrite their languages, what do we need to have on the table to convince them to use Parrot as a starting point.

These are the important questions. Nothing else really matters. If language designers and compiler developers don't use Parrot and don't want to use Parrot, we've lost.

Parrot needs honest, constructive criticism. It is neither offensive nor overly aggressive to provide it. We need to set aggressive, but realistic goals as a team. There are several planned parts of Parrot that need to be implemented, and several existing parts that need to be re-implemented. Good goals will help to inform those designs and tune those implementations. Eventually, our wildest dreams can become reality.

Sunday, September 19, 2010

Yesterday I met with Jim Keenan, fellow neophyte on the Parrot Foundation board. We got together for an informal get-together yesterday at a Barnes and Nobel, and had a highly productive little chat about Parrot: the foundation, the culture, the community, and the software. Obviously no "official" business happened at this little meetup, but we did get to know each other better and discussed a number of things. In this post, and others in the coming days, I'm going to talk about some of the points that came up in this meeting.

I've said many times, on this blog and elsewhere, that I don't think Parrot is currently a mature platform. It is certainly not suitable for use in a professional, production environment. I mentioned this sentiment to Jim, and he asked for some clarification. What did I mean by that?

Let me illustrate with a few examples.

You're employed as a system designer, and are preparing sketches for your companies next-generation software product. The success or failure of your design will have extreme effects on your current position, and maybe even on your career in the long term. In short, the design needs to be solid, flexible, expandible, robust, and all sorts of other good things. So here's the question: Do you choose Parrot as the basis for your new system, right now? If not, why?

Thought about it? Let me share a few answers with you, in the form of more questions. Name me one other mature, production-quality application VM or language interpreter which does not support threading. This isn't an empty set, but it's not a particularly large list either. Name me one other application VM that does not currently sport an aggressive, modern JIT.

Here's another example. Go to Google Scholar and search for papers and patents involving virtual machines. What percentage of the resulting papers use the Java Hotspot VM as the basis for their research? The .NET CLI? Smalltalk? Now take a closer look and count up how many results are based on Parrot? How many papers even mention Parrot in the footnotes?

You're a graduate student pursuing a PhD in CompSci. You want to spend the next three years of your life researching some new feature on virtual machines, the results of which will have major effects on your career and maybe will even influence whether you graduate or not. Do you choose to implement and study your fancy new feature on Java Hotspot, or on Parrot? Why?

Here's another example from a different direction. You work as the president of a large company which is reasonably sympathetic to the goals of the Parrot project. A Parrot contributor comes forward with a grant proposal to implement an exciting new feature. Do you cut a grant a sizable grant to support that developer through the production of that feature on Parrot, or do you spend your money elsewhere? In other words, do you expect to see a return on your investment, and do you expect that money spent on Parrot as it currently is will be well and efficiently spent? If you were doing your research and read over the long-term Parrot design, and if you were looking at the current state of the Parrot community and the community leadership, would you feel comfortable and confident to invest in it? Why or why not?

Turning that same example around a little bit, pretend that you're me.As a foundation board member, I'm going to take that grant request from the developer, put together all the necessary paperwork and approach a philanthropist with it. What do you think are the odds that I would get laughed out of the room and told never to come back?

Even though it's been 10 years since the start of it, Parrot really is a young project. Our long-term designs and goals are lacking. We have some extremely talented, enthusiastic, and energetic contributors, but we don't always do a great job of organizing and motivating them. There are plenty of areas where we do pretty well, but I can't think of a single aspect of the project or the community that we couldn't tune and improve. How much better do you want Parrot to be?

I want to be very clear about one thing here: I am not being insulting or disparaging about Parrot. It is not an insult to say that Parrot is not ready for enterprise-level production deployment. It is not disparaging to say that Parrot isn't a sure bet to make when careers and livelihoods are on the line. What we do need is honest self-assessment, and to use that as a basis for making long term plans and goals.

Starting from that honest self-assessment, we can start asking the important questions. Where do we go from here?

Hypothetically, we approach the Python Foundation and say "in 5 years, we want Parrot to be at a level where your premier, standard Python interpreter implementation could be implemented on top of it. What would you need to see in order to comfortably make that kind of decision in favor of Parrot?" Ask the same question of the Ruby, PHP, Perl 6, and Perl 5 interpreters. What does Parrot need to do in order to convince the leaders, architects, and developers of these projects that Parrot is a modern, competitive and even a desirable platform on which to build the next generation of their software? How long do people reasonably think it would take for Parrot to get into that condition? 1 year? 3 years? 5 years? There are no wrong answers, but the more honest we are with ourselves, the more certain we can be laying out a comprehensive roadmap to get us there.

In the coming days I'll address some of these issues. The point of this post was to get people thinking, dreaming, and doing both big.

Wednesday, September 15, 2010

Parrot's smolder server, previously hosted by plusthree.com, has been down for some days. Today, due to some concerted effort by particle, dukeleto, and the friendly folks at OSUOSL got a new instance of smolder set up for use by Parrot.

Reports for Parrot proper are already flying in, but what makes this smolder server so special is that we can add support for other projects as well. Half a dozen other projects are also able to tracked on Smolder: PLA, Lua, PLParrot, Partcl, Rakudo, and Plumage. Not all of these projects have the necessary infrastructure to perform the actual uploads yet, but within a few days I'm sure they all will.

Tonight I updated the PLA setup program and test harness to support uploads to smolder, and a few minutes ago I posted the first automated report. We posted a few test reports manually before that too. Everything is looking good so far, though I do have a few tweaks to make. Specifically, I want to include more information about the version of BLAS and LAPACK that are used in the report, which should be easy enough.

Friday, September 10, 2010

I've finally gotten some online documentation for PLA up at Github Pages. The colors are the same standard dark scheme that I use for everything. There are some things in this world that I consider myself very good at, but graphic design and effective use of colors are not in that list. If somebody with a better grasp of the color wheel would like to take a stab at a redesign, I would be very happy to accept patches.

Documentation for PLA had been written in POD, and embedded directly in the PMC source files. This is a decent system, and is what Parrot uses, but I was becoming unhappy with it. The problem I was having was that even though converting POD to HTML is supposed to be a well-understood and oft-solved problem, I couldn't find a converter I liked that produced good-looking output. Also, I was having a hell of a time finding a tool that would give me any kind of flexibility with how the output HTML was generated, formatted, or organized. The output of the standard pod2html is horrendous, and I've found it to be very difficult to style without modifying the generated HTML by hand.

I also am not really too happy with the way POD embeds in source code. It's too abusive of vertical space, and causes files to become completely bloated. Not to mention the fact that I don't think it's really the right solution to the problem. I could have looked into something like Doxygen, but that's only marginally better. Sure, Doxygen uses less vertical space in a file, but it's still embedded documentation that attempts to do more than it should. Documentation for prospective users should be different than documentation for prospective hackers. If it's not different, I can guarantee that one of the two groups is getting the short end of the stick. If you have documentation for hackers (as you would expect to find embedded in the C source code file), the users are going to be stuck sifting through page after page of internals minutia. What the users really want to see is information about the interface and how to use the tools, not details about every function in the C code file.

I thought about writing a POD parser in NQP that would do what I wanted it to do, but by the time I realized how much effort that would take me, I had basically decided against using POD entirely. I do think that a standalone POD parser written in a Parrot-targetting language (NQP comes to mind immediately) would be a good thing, but I am not inclined to make it myself just yet.

I thought about adding separate POD files to form user documentation, separate from the hacker documentation embedded in the source code. However, this didn't sit well with me either. First, as I mentioned above, the generated HTML I could get always looked terrible (and I couldn't find any compelling alternate tool which might generate better-looking pages) and it didn't really give me the flexibility that I wanted to have. It got to the point that it would have taken me more effort to write the tool I needed to get the resulting output that I wanted than it would have taken me just to rewrite the documentation using a different markup language.

Finally I decided to embrace Github Pages wholesale. Github Pages uses the Jekyll preprocessor, which takes input in Textile, Markdown, or HTML. It gives me a lot more flexibility to break documentation up into arbitrary little chunks and keeps pages themed in a consistent way. I decided on Textile, which in my mind is easier to read and write than I ever found POD to be. So I rewrote most of the documentation in Textile with some Jekyll processor magic thrown in, and I'm pretty happy with the result.

So happy in fact that I've considered maybe changing my entire blog format over to using Jekyll and Textile. I'm not ready for that kind of changeover just yet, but it is something I probably want to look into eventually.

Along with some of the fixes I've made to the PLA test harness and test coverage in the last few days, this is basically one of the last requirements I had for putting out a new release of PLA. I now feel like I'm prepared to cut the release shortly after Parrot 2.8.0 comes out.

Tuesday, September 7, 2010

I added in support for HLL mapping of autboxed types in Parrot-Linear-Algebra, and with that I feel like I'm getting pretty close to a good point for cutting a release. I don't yet have any tests for the autoboxing behavior, so I do need to write those first. Shortly thereafter I think I can get to work on the release.

HLL mapping, for readers who may not be familiar with the term, is a very cool feature in Parrot. It allows the user to manually re-map basic types to user-defined types instead of the built-in varieties. When the VM would normally create an Integer PMC, for instance, it would instead create a custom "MyInteger" type (or whatever you called it). You can use the HLL mapping functionality to override many built-in types in many operations. What's super-cool about HLL mapping is that the mapping is defined in a particular HLL namespace, so programs with modules written in different languages could allow each module to define it's own type mappings that do not conflict with each other.

In Parrot, an "HLL namespace" is a special type of top-level namespace object which allows the use of type mappings, among other things. In PIR code, defining an HLL is simple with the .HLL directive:

.HLL "MyLanguage"

All regular namespaces defined below that directive will be inside the HLL namespace. This means that so long as we make proper use of .HLL directives, we can maintain almost perfect encapsulation between modules written in different languages, which can be an extremely valuable thing for proper interoperation.

The default HLL is called "parrot". If you don't specify an HLL directive, you will automatically be inside the "parrot" HLL namespace. To get back to there, you can type (in PIR):

.HLL "parrot"

I started writing some tests on Saturday, and discovered two problems that brought me to a halt: First, NQP doesn't have any built-in way to specify an HLL namespace. I also wasn't able to find any crafty, sneaky way to inject one either. Second, HLL type mapping doesn't work in the parrot namespace.

The second problem turned out to be the most frustrating because the test programs I was writing were silently failing for no visible reason. The mapping method appeared to execute and return correctly, but none of my types were being mapped. Dutifully, I filed a bug about it. It turns out that this is by design, not accident, although I wish that it had been a little bit better documented. HLL mapping lookup operations can be a little bit expensive, so we don't want to be doing that all the time. Also, the parrot HLL is supposed to be a default, neutral space and one module shouldn't be able to break encapsulation and modify that HLL namespace in a way that would adversely affect other modules written in other languages. NotFound put together a commit that throws an exception when a type mapping is attempted in this HLL, so that allays my concerns that not only was it failing, but it was failing silently.

I also started putting together a patch in my fork of NQP-RX that adds HLL support, but the patch isn't very mature yet. If I can get this patch ready and merged into NQP-RX master in time for the 2.8.0 release, I will write up the last remaining PLA tests for this feature in NQP. Otherwise, I will write them in PIR.

The PLA release is going to target Parrot 2.8.0. There are a few things that I want to do first, before the release is out the door:

Finish writing the tests for the HLL mapping behavior, which might involve finishing up that patch to NQP

Write up some decent public documentation. The default output of pod2html is pretty ugly looking, so I may end up writing a custom converter. I've started experimenting with Github Pages, though my experiments so far in using the pod2html output there have not been too attractive. I may go through and reorganize all the POD documentation source anyway. I definitely want to expand documentation and examples of certain features.

Get a release, or pseudo-release of Kakapo that targets Parrot 2.8.0. If Austin isn't able to get a working release of that software that's up to his standards in the next two weeks, I may pick a revision that works well for my purposes and tag it on my Github fork. It won't be the same as a real release of Kakapo, but at least it won't be a stumbling block for me.

I need to check and double-check that the setup script for PLA is doing all the correct things with respect to releasing. I need to check that the generated Plumage metadata is correct and allows complete and functional installations using Plumage. I also need to check that I can generate correct .deb and .rpm packages for those systems.

I want to look into creating a windows installer, but I make absolutely no promises about that. I certainly haven't done any testing whatsoever on Windows so far, and I do not have high hopes that it will work at all there. This may be a task for the next release, or later.

In the span of about two weeks, we could have a release of a high-performance linear algebra toolkit for Parrot. It obviously doesn't have a huge amount of functionality yet, but it is a good start and provides a solid base of some important standard operations. I've got a lot of plans for the future of this little project, but we're at a good point right now and I think it will make for a very nice release.

Monday, September 6, 2010

I've only been a member of the Parrot Foundation board for a few days now, and I already feel exhausted by it. I've read through dozens of pages in IRS Publication 557, I've been pouring over several incomplete drafts of Form 1023. I'm not certain that either of these two documents are written completely in English.

Luckily for me, because of the time I spent in the Wikimedia Chapters Committee, and because of some of the work I did with Wikimedia NYC and the now-defunct Wikimedia Pennsylvania, I'm not unfamiliar with this documentation or the process of gaining tax exempt status for a US non-profit corporation. It has been at least a year or two since I've really looked at this kind of paperwork, however, so I'm taking the crash-course to get back up to speed with it all. Some of it does seem more daunting than I remember!

I've also been digging through invoices and financial records, and lots of other paperwork as well. Some of the information and documentation that I need is readily available. Some of it might not be (or, I may be looking in all the wrong places). Either way, I'm setting a break-neck pace trying to get through it all and form a comprehensive view of the current state of the Parrot Foundation.

My goal is to put together a report about the current legal and financial state of the Foundation, and present that report to the other directors at the end of the month so we can start coming up with a plan of action for the coming year. Shortly thereafter I would love to send a status update to the general foundation membership, so we can make sure everybody is well informed about the state of the foundation and also so we can solicit some input about the future direction of it.

I will definitely post updates, either here or to the members mailing list as I get more information. However, don't expect too much until I am ready with my full report, probably around the end of September. As a member of the board I have a lot of plans for the year, but I don't want to start anything until I make sure the foundation is on a solid footing with all the right documents properly filled out and filed with all the right organizations. It might be a big job, but it's one that I hope to complete by the end of 2010.

Tuesday, August 31, 2010

Today at the weekly Parrotsketch meeting there was also a meeting of members of the Parrot Foundation, with the aim of electing board members for the coming year. All but one of the current board members chose not to run for re-election, so it was a pretty open field.

Four candidates were nominated, and today all of them were elected by simple majority: Jerry Gay (particle), Jim Keenan (kid51), Jonathan Leto (dukeleto), and myself.

Here's a brief discussion of what I want to do this year as a Parrot Foundation board member:

Re-read and re-reread the foundation bylaws. I've read through them a while ago back when the foundation was first forming, but don't think I've looked at them since. Plus, I've read through so many sets of bylaws for so many young organizations in my time, the details all run together in my mind.

Money. The foundation doesn't have a whole ton of money, of course I can't think of too many non-profits of this size which do. There are lots of things that can be done with money: grants and bounties being two that immediately come to mind, but there are plenty of other things. Finding ways to raise money and increase the PaFo coffers should probably be a pretty big priority for us.

Create a proper membership committee. Parroteer "smash" has been running the elections and doing a fantastic job, but some problems have been exposed in the membership process. Many people do not know whether they are members or not, and many people are confused by how a person becomes a member. Clearing up confusion in this area will help everybody.

Recruit new people. People are the fuel that runs an open-source project, and you can never have too many people. Parrot is by no means a small open source project, but it is far from being a big one. More developers create more/better software, which attracts more end users, which increases the prestige of the project, which attracts more developers, which.... It's a feedback loop that we should be trying much harder to feed.

These are just four things that I would like to focus on this term, however this is not a definitive list. I am hoping to get, within the next few days, some information from the current board members about what kinds of tasks they have left unfinished, or what kinds of things they would try to accomplish if given another term. Since there is so much new blood on the committee, we need to make sure to thoroughly pick the brains of the current board so that we can get this coming year off to a running start.

Sunday, August 29, 2010

Friday night I wrote an introduction to a site called Code Anthem, which tests the skills of a programmer and uses those results to help employers find talented individuals. As I said then, I think the idea is a pretty awesome one, and I think it has a lot of potential to help employers with the tricky technical screens that so many companies get wrong.

Today I'm going to talk a little bit about what I think Code Anthem can do differently, and how they can grow the service into something that would have real, industry-wide value.

The tests at Code Anthem tend to focus pretty narrowly on algorithms: Determine whether one array is a subset of another array, Determine if a string is a palindrome, Calculate the perimeter of a polygon, Validate a password. These are all relatively simple things and once you know the necessary algorithm it tends to be a simple matter of translating it into the target language. With a little bit of a brush up for me to re-familiarize myself with the core classes and methods, I could probably do as well in the Java test as I did in the C# test, and I absolutely would not consider my mad Java skillz to be at the same level that my C# abilities are. I was decent with the language in college when I was using it, but skills do fade if you don't use them and I have certainly not been using Java.

A test for problem-solving and basic language syntax tests exactly that: problem solving and basic language syntax. One question involved calculating the Fibonacci sequence, and while my solution was pretty efficient, I could easily have used the classic, naive recursive implementation and thrown performance considerations to the wind. Would this provide me the correct answers to test input? yes. Would this be an acceptable answer? Absolutely not. So, testing that I can solve a problem which has several known solution algorithms doesn't necessarily mean that I will pick a good one, and doesn't necessarily mean that I've solved the problem well.

Looking at this another way, I know that the recursive solution (which is perfectly acceptable input) starts to go hell-crazy above a certain input level. Assuming that Code Anthem isn't evaluating solutions on a supercomputer at the NSA, it's pretty reasonable to assume that they aren't going to be testing the solution with inputs above 30ish, and absolutely no inputs above 45. With this devious knowledge in mind, I don't need an algorithmic solution at all, all I need to do is provide an answer that must work for the set of inputs that they can reasonably use to test:

Again, not an acceptable solution to the problem under any circumstances, but it does work and it does require a little bit more insight than the recursive solution does.

Any programmer who really thinks that the naive recursive solution to the Fibonacci problem is an acceptable one really should fail the test no matter how accurate the results are. It may be technically correct, but it is an example of extremely poor problem solving, lazy coding, and a complete disregard for real-world performance issues. It would be relatively trivial to run tests in a sandbox thread with a timed kill-switch: If the timer goes off before a solution is found, the complexity must have been too high and the result is failure.

I've digressed a little bit: If you're only testing problem-solving and an ability to dump a solution from your brain into the syntax of the target language, you're not testing much. I've known a lot of really lousy programmers who would be able to solve many of these problems, even if the quality of the final solution was severely lacking.

When I taught Embedded C coding in college, I used to hear from students all the time that their grades should have been higher because "it compiles", and somehow that always seems good enough. The reality is that compilation is the milestone that separates a file of source code from a file of unintelligible gibberish. Just compiling means that what you wrote is software, but not that it's correct, robust, maintainable, or even acceptable. To earn these other distinctions, more work above simply getting the syntax right is required. In the Code Anthem test, they're testing a slightly higher level, but not by much. Yes they can show that the code compiles and that it correctly produces the correct results, but they cannot show that the code is "good" nor that the programmer is "competent", or "capable" or "worth hiring".

C# is a pretty rich language, even if you stick with the popular 3.5 version and not the new 4.0. Of course, 4.0 doesn't add a whole hell of a lot to the core language, so we can really ignore it for now. Features like contravariance of generic types are certainly nice, but the new version doesn't have huge market penetration yet so we can't expect most people to know about it. Testing for a basic ability to write functions and loops, and a basic understanding of core types really misses out on many of the important features of C#, and really does nothing to separate the coders who don't know the language well from those that do.

A C# or Java test which does nothing to test knowledge of object-oriented concepts, such as classes and inheritance, really isn't a comprehensive test of those languages. If you can't use inheritance, don't understand the difference between is-a and has-a relationships, and don't know what keywords like "interface", "abstract" and "virtual" do, you really don't know C#.

In C#, if you don't understand delegates, you could get in trouble pretty quickly. Understanding the difference between a method group identifier and a delegate can be a simple and confusing mistake for inexperienced programmers. Not understanding the difference between normal and multicast delegates can lead to some very weird runtime effects. Not understanding anonymous delegates, closures, and dynamic invocation really can really separate out the entry-level C# coders from from the advanced programmers and gurus.

Beginner C# coders will know the basics about syntax: classes, functions, loops, variables, operators. Median C# coders will be using interfaces, inheritance, delegates, exceptions, and built-in generics. Good C# coders will be doing all the previous things correctly and at a higher level, plus mixing and matching all these tools and techniques to solve complex problems.

All of the tests in Code Anthem (at least those that I saw) gave the test-taker a function prototype and some instructions, and asked for the body of the function to be filled in. A different, more comprehensive type of problem would give an interface definition and ask the user to implement an entire class. An extremely comprehensive test would present many types of questions:

Implement a function to have a particular behavior, given a signature and a description.

Implement a class to provide a specific interface, given the interface and a description of behavior.

Given a piece of pre-written code with bugs, identify and fix the bugs.

Multiple choice questions about language features which would be hard to test practically.

You'd probably want to put some kind of time limit on those multiple choice questions, but there are some things that you're not going to really be able to test without them.

There are plenty of ways to run this kind of a test. You could do it like a game show, such as "Who wants to be a Millionaire": People start with a score of zero, and gradually move up the scale as they answer ever harder and harder questions. The really good ones might have to spend some time on the test, so it would be good to break it up into stages that can be tackled at different times.

Another option would be to break up the test into several smaller tests focusing on different issues. Some tests could even be language-agnostic, such as figuring out how to solve particular problems where language-specific syntax or semantics doesn't play much of a role in the solution.

What Code Anthem does now is a basic test to weed the programming bozos from non-bozos. This is certainly a good first step, but really doesn't do anything to differentiate between the coders who are good at what they do from the true creme de la creme of potential applicants (and everybody in between). Weeding out the obvious bozos is still a good service to render, but it really falls short of a potential that a system like this has.