September 1999 Archives

Introduction

One of the more interesting talks at the O'Reilly 1999
Open Source Convention was by Chip Salzenberg, one of the core developers of
Perl. He described his work on Topaz, a new effort to completely re-write
the internals of Perl in C++. The following article is an abridged version
of the transcript of this talk that provides the basic context for Topaz and the
objectives for this new project. You can also listen to the complete
85-minute talk using the RealPlayer.

Topaz is a project to re-implement all of Perl in C++. If it
comes to fruition, if it actually works, it's going to be Perl 6. There
is, of course, the possibility that for various reasons, things may change
and it may not really work out, so that's why I'm not really calling
it Perl 6 at this point. Occasionally I have been known to say,
"It will be fixed in Perl 6," but I'm just speaking through my hat when
I say that.

Who's doing it? Well, it's me mostly for now because when you're starting on something like this, there's really not a lot of room to fit more than one or two people. The core design decisions can't be done in a bazaar fashion (with an "a"), although obviously they can be bizarre (with an
"i").

When? The official start was last year's Perl conference. I
expected to have something, more or less equivalent to Perl 4 by, well,
now. That was a little optimistic.

So how will it be done? Well, it's being done in C++, and there
are some reasons for that, one of which is, of course, I happen to
like C++. Actually the very first discussion/argument on the Perl 6
porter's mailing list was what language to use. We had some runners-up
that actually were under serious consideration.

Choosing A Systems Programming Language

Objective C has some nice characteristics. It's simple and,
with a GNU implementation, it is pretty much available everywhere. The
downside is that Objective C has no equivalent of inline functions, so
you'd have to resort to heavy use of macros again, which is something I'd
like to get away from. Also, it doesn't have any support for namespaces,
which means that the entire mess we currently have would have to be
carried forward: maintaining a separate list of external functions
that need to be renamed by the preprocessor during compilation so that
you don't conflict with somebody else when you embed it in another
program. I really hate that part. Even though it's well done,
it's just one of those things you wish you didn't have to do.

In C++ you solve that problem by saying "namespace Perl open curly brace," and the rest is automatic. So that is the reason why Objective C
fell out of the running.

Eiffel actually was a serious contender for a long time.
That is, until I realized that to get decent performance, Eiffel
compilersor I should say the free Eiffel compiler, because there
are multiple implementationsneeded to do analysis at link-time as
to all the classes that were actually in the program. Eiffel
has no equivalent of declaring member functionsI'm using the C++
terminologydeclaring them to be virtual or nonvirtual. It intuits
this by figuring out the equivalent of the Java characteristic final,
i.e., I have no derived classes, at link-time. And so it says, well,
if there are no derived classes, then therefore I can just inline this
function call. Which is clever and all, but the problem is that Topaz
must be able to load classes dynamically at run time and incorporate
them into the class structure, and so obviously anything that depends
on link-time analysis is right out. So that was the end of Eiffel.

Ada, as a language, actually has much to recommend it. Conciseness
is not one of its virtues, but it does have some good characteristics.
I do secretly tend toward the bondage and discipline style of programming,
i.e., the more you tell the compiler, the more it can help you to enforce
the things you told it. However, the only free implementation of Ada,
at least the only one I'm aware of, GNAT, is written in Ada. This
is an interesting design decision and it obviously helped them.
They obviously like Ada so they use it, right? The problem is that if
Perl 6 were written in Ada, it would require people to bootstrap GNAT
before they could even get to Perl. That's too much of a burden to put
on anybody.

So, we're left with C++. It's rather like the comment that I believe
Winston Churchill is reported to have said about democracy: It's
absolutely the worst system except for all the others. So, C++ is the
worst language we could have chosen, except for all the others.

So, where will it run? The plan is for it to run anywhere
that there's an ANSI-C++ compiler. Those of you who have seen the
mini-series Shogun might remember when the pilot is supposed
to learn Japanese, and if he doesn't learn it the entire village will
be killed. He can't stand the possibility of all these deaths being on
his head so he's about to commit suicide and finally the shogun says,
"Well, whatever you learn, it will be considered enough," and so then he's
okay with it. Well, that's kind of how I feel about Visual C++. Whatever
Visual C++ implements, we shall call that "enough," because I really don't
think that we can ignore Windows as a target market. If nothing else,
we need the checklist item - works on Windows. Otherwise the people
who don't understand what's going on will refuse to use Perl in situations
where they really need to.

So, unless there's an overriding reason why it's absolutely
impossible, we will use ANSI features as much as possible,
because ANSI C++ really is a well-done description and a well-done
specification for C++, with a few minor things I don't like. But Visual C++
is so common we really just can't afford to ignore it.

As for non-Windows platforms, and even for Windows platforms for some
people, EGCS (which actually has now been renamed to GCC 2.95) is a
really amazingly good implementation of the C++ language. The kinds of
bugs and features that they're working on in the mailing list
are so esoteric that it actually takes me two or three reads
through just the description of a bug before I understand it. The
basic stuff is no problem at all.

The ANSI C++ library for EGCS/GCC is really not all that good at this
point, but work is under way on that. I expected them to be more or less
done by now, but obviously they're not. I still expect them to
be done by the next conference. It's just that the next conference is
now Conference 4.0. By then I hope that we'll be able to use that
library in the Topaz implementation.

Now, the big question:

Why in the world would I do such a thing? Or rather start the ball
rolling? Well the primary reason was difficulty in maintenance. Perl's
guts are, well, complicated. Nat Torkington described them well. I
believe he said that they are "an interconnected mass of livers and
pancreas and lungs and little sharp pointy things and the occasional
exploding kidney." It really is hard to maintain Perl 5. Considering
how many people have had their hands in it, it's not surprising that
this is the situation. And you really need indoctrination in all the
mysteries and magic structures and so on before you can really hope
to make significant changes to the Perl core without breaking more things
than you're adding.

Some design decisions have made certain bugs really hard to get rid
of. For example, the fact that things on the stack do not have the
reference counts incremented has made it necessary to fake the reference
counting in some circumstances, à la the mortality concept, for those
of you who have been in there.

Really, when you think about it, the number of people who can do that
sort of deep work because they're willing to or have been forced to put
enough time into understanding it, is very limited, and that's bad for
Perl, I think. It would be better if the barrier to entry to working on
the core were lower. Right now the only thing that's really accessible
to everyone is the surface language, so anytime anybody has the feeling
that they want to contribute to Perl, the only thing they know how to
do is suggest a new feature. I hope in the future they'll be
able to do things like suggest an improvement in the efficiency layer
or something like that.

The secondary reason actually is new features. There are some features
there where people say, "Yeah, I want that just cuz it's cool." First of
all, dynamic loading of basic types - and I'll give an example of that
later - the basic concept is if you want to invent a new thing like a
B-tree hash, you shouldn't have to modify the Perl core for that.
You should just be able to create an add-on that's dynamically loaded
and inserts itself and then you'd be able to use it.

Robust byte-code compilation is another such feature. Now, in complete
honesty, I don't know. I haven't looked at the existing byte-code
compilation output, but I do know from examining how the internals
work that retrofitting something like that is quite difficult. If you
incorporate it into the structure of the OP-tree (for those of you
who know what that is, the tree of basic operations), there's the concept of
a separation between designing the semantic tree (as in "this is what
I want") versus designing the runtime representation for efficient
execution. Once you've made that separation, you can also have a
separate implementation of the semantic tree, which is, say, just a list
of byte codes that would be easy to write to a file and then read back
later. So, separating the static representation of the OP-tree from what
you use dynamically is an important part of that part of the internals.

Also, something that could be done currently but nobody's gotten around
to itMicro Perl. Now if you built Perl, you've noticed that there's
a main Perl, and then there's Mini Perl, which you always to expect to
have a little price tag hanging off of, and then there's the concept of
Micro Perl, which is even smaller than Mini Perl. The idea here is: What
parts of Perl can you build without any knowledge that Configure would
give you. Or perhaps, only very, very, very little configure tests. For
example, we could assume ANSI or we could assume pseudo-POSIX. In
any case, even if you limit yourself to ANSI, you've got quite a bit
of stuff. You, of course, have all the basic internal data structures
in the language. You can make a call to system, to spawn children,
and a few other things, and that basically gives you Perl as your
scripting language. Then you can write the configure in Micro
Perl. I don't know about you, but I'd much rather use Micro Perl as a
language for configuration than sh,
because who knows what particular weird variant of sh you're going to have, and really isn't it kind of a pain
to have to spawn an external text program just to see if two strings are
equal? Come on. Okay, so that's also part of the plan. We could do this
with Perl 5 - who knows, maybe now that I've mentioned it somebody will -
but that's also something I have in mind.

Why not use C? Certainly C does have a lot to recommend
it. But the necessity of using all those weird macros for namespace
manipulation, where I'd rather just use the namespace operator, and
the general proliferation of macros are disadvantages. Stroustrup makes
the persuasive argument that every time you can eliminate a macro and
replace it with an inline function or a const declaration or something
of that sort, you are benefiting yourself because the preprocessor is
so uncontrolled and all of the information from it is lost when you get
to the debugger. So I'd prefer to use C++ for that reason.

Would it be plausible to use Perl, presumably Perl 5, to automatically
generate parts of Perl 6? And the answer is yes, that absolutely
will be done. The equivalent of what is now opcode.pl will still exist,
and it will be generating a whole bunch of C++ classes to implement the
various types of OPs.

A perfect Perl doesn't have systems programming as part of its target
problem domain. That's what C++ and C and those other languages are
for. Those are systems programming languages. Perl is an application
language, and in fact one of the things that I really felt uncomfortable
about Eiffel was that it also is really an applications programming
language. The whole concept of pointers and pointer arithmetic and
memory managementif you read Meyer's new book, the chapter on
memory management begins with "Ideally, we would like to completely
forget about memory management." And I thought to myself, well that's
great if you're doing applications, but for systems programming, that's
nuts. It's an example of what the language is for. When I was trying to
figure out how to be persuasive on this subject, I finally realized that
Perl may be competing with Java in the problem space, but when you're
writing Perl itself, implementing the Perl runtime, really what you're doing
is something equivalent to writing a JVM. You're writing the equivalent
of a Java Virtual Machine. Now, would you write a JVM in Eiffel? I don't
think so. No, so neither would you write the Perl runtime in Java or in
Eiffel.

How or Why Perl Changes

The language changes only when Larry says so. What he has said
on this subject is that anything that is officially deprecated is
fair game for removal. Beyond that I really need to leave things
as is. He's the language designer. I'm the language implementer, at
least for this particular project. It seems like a good separation of
responsibilities. You know, I've got enough on my plate without trying
to redesign the language.

Larry is open to suggestions, and in fact that was an interesting
discussion we had recently on the Perl 5 Porters mailing list: was the
syntax appropriate for declaring variables to give appropriate hints to
a hypothetical compiler? That is to say, my int $x or my str $y -
and I thought that the int and the str and the num should be suffixes,
something like my $x : num - and, in fact, that suffix syntax is something
that Larry officially has blessed, but just not for this purpose. That's
the instinct of the language designer coming to the fore saying that
something that is a string or a number should not be so hard to type. It
should read better.

Meanwhile, if you want to declare something as being a reference to a
class - my Dog $spot - that's going to work. You can say that $spot,
when it has a defined value, will be a reference to an object of type
Dog, or at least of a type that is compatible with Dog, and the syntax is
already permitted in the Perl parser; it doesn't do very much yet but
that will be more fully implemented in the future as well. Many of the
detailed aspects of this came about not just springing fully formed from
Larry's forehead but as a result of discussion. So yes, he absolutely
is taking suggestions.

Getting into the Internals

Now I'd like to ask how many of you do not know anything about C++? Okay,
a fair number, so I'm going to have to explain - everyone else is
lying. There are two kinds of people: people who say that they know C++ and the
truthful. Okay. C++ is complicated, definitely. Actually that reminds me,
I'm doing this in C++ and I use EMACS. Tom Christiansen asked me, "Chip,
is there anything that you like that isn't big and complicated?" C++,
EMACS, Perl, Unix, English - no, I guess not.

At this point, Chip begins to dive rather deep into a discussion
of the internals. You can listen to the rest of his talk if you
are interested in these details.

Overview

August 21 - 24, 1999

My main reason for attending the Open Source Conference is to observe Open
Source developments and to gather business intelligence for Chevron. I
learned Python. I also concentrated on understanding the business case for
Open Source and understanding and interpreting correctly events in the
industry.

Learning Python and Python for Windows.

Armed with my recent experience of ploughing through the most obfuscated
Perl code, I chose to learn Python, a well-constructed, object-oriented
language. Python was created by Guido van Rossum, who named it after his
favorite TV show, Monty Python's Flying Circus.
Python is easy to read and handles Object Oriented Programming in a natural
and easy to learn way. The development time of any project in Python is
fast. It also tends to encourage clarity in human communication as its very
execution depends on use of white space and indentation. The Python
Development Environment (IDLE) also, rather neatly, enables "grep" searches
for strings in any Unix or NT files.
I attended an excellent Windows Python tutorial which emphasized using
Python with the array of Windows functions from COM, the registry, as a
macro language, and as a test-harness for other systems... in addition to its
"normal" function of data processing and as systems "glue".
Much of the tutorial attended to COM processing with Excel and Word
examples, databases, systems administration, C++ and DLLs. Python
Programming on Win32 will be published in November 1999.
I also asked about Python and LDAP and was able to locate sites at:
www.it.uq.edu.au/~leonard/dc-prj/ldapmodule and
http://sites.inka.de/ms/python/ldap-client.cgi
Python LDAP calls are well constructed and clearer to read than Perl's, but
more research is needed here.

Keynote Speech - "Rules for Revolutionaries"

Guy Kawasaki, a venture capitalist and previously an evangelist for Apple,
gave the keynote "Rules for Revolutionaries". His speech was both funny and
inspirational. He suggested ten things to do to succeed, starting with
examples of change in food preservation and regaled us with stories drawn
from personal experience and the computer industry. Guy gave good advice for
anyone attempting change. I have a video copy of this speech should anyone
like to see it. See also book references, "Crossing the Chasm" by Geoffrey
Moore and "Rules for Revolutionaries" by Guy Kawasaki.
The mechanism in a revolution, he reminded us, was not "a rising tide floats
all boats"...but..."in a tornado even turkeys can fly". He emphasized, it is
our objective as revolutionaries to create the tornado.

People and Information.

I spoke with VA/Linux to clarify their confidence in the small-margin market
of selling hardware with pre-loaded Linux. They had been in operation since
1993 and had a lot of Linux turnkey experience particularly service and
support. Among their talents is Beowulf installation - I thought they might
be of interest to our high-end computing people. Certainly their $899
personal computers are blindingly fast and I consider they have good
prospects when they go public. According to John Vrolyk of SGI (Silicon
Graphics), VA/Linux has a strong business model. SGI has invested a large
amount and is co-developing software.
Armed with knowledge of Unilever's success in managing a Sendmail backbone
and IMAP connectivity with Outlook desktops, I considered it worth asking
about the economics and possibility of running a similar system at Chevron.
The Sendmail people were not able to present any base figures and told me
their Director of Corporate Accounts would contact me. I said there was only
a point in doing that if we could work up some comparative figures with
which to work. Point pending.
I spoke with Derrick Story of O'Reilly and Bob McMillan of Linux Magazine,
both of whom were encouraging in wanting to keep in contact and to consider
any articles I might care to write. I have been published before so it's a
reasonable stretch. I also met with Andrew Leonard of Salon who wants to
include the story of Open Source at Chevron in a new book he's writing on
the movement.

"The State of Python" Address

Guido van Rossum gave the keynote address indicating increased interest in
Python. During August alone over 8,000 Windows versions have been downloaded
and the Python website has had over 63,000 hits.
Guido reviewed the recent successes of Python in Web Development Packages
(ZOPE), Mailman, JPYTHON, Windows (COM and ASP), XML, Open Classroom,
Industrial Light and Magic (Star Wars), Yahoo and Lawrence Livermore Labs.
He then referred to CP4E (Computer Programming for Everyone) and outlined
why he expected Python will take over from Pascal in education. The next
release of Python will be issued in 2000 and Python 2.0 in 2001. DARPA is
supporting further development of the IDLE developer environment.

Linux in Wearable Computer Research

Thad Starner from Georgia Tech is a most friendly, intelligent and
innovative man. He described the status of wearable research and how he
personally uses a wearable for all his computing needs.
He described why Linux is an ideal choice for research and alluded to the
advantages of Linux:

Research needs several tries to do anything good...and since, at the
start you don't know what you are doing (that's research), you need to be
able to make changes quickly.

Market-driven research is a fallacy. Consumers don't know what they
want and even though they may express interest, don't know enough to express
what is reasonable.

It's a real problem when research degenerates into struggling with
the interface problem of a proprietary system.

Commercial packages create balkanization of projects, where groups
find it difficult to talk to one another. Code is "idea" exchange.

Other factors he mentioned were low fixed cost base for embedded devices,
great community, easy porting to other platforms.
Dr Starner then spoke of how he runs his research administration and
announced the Wearable Computer Conference (ISWC) to be held Oct 18 - 19,
1999 at the Cathedral Hill Hotel in San Francisco. The web site is
http://iswc.gatech.com.
Other subjects he covered were wearable research to support the deaf or
blind...and gambling, particularly the work of Shannon and Thorp who did
research with shoe-based computers in Las Vegas running simulations timing
ball, rotation of wheel etc., giving them a 44% advantage over the house.
Dr Starner gave me names for "blind" research: Collins, who created a
camera/tactile blind navigation system; John Goldwaite from Georgia Tech; and
David Ascher, who taught me Python, at a San Francisco sight research
organization - address da@python.net.
Getting back to wearables, Thad described the keyboard, twiddler (a
combination keyboard and mouse enabling sixty wpm input), the retinal
projection system and the now credit card sized processor. Very short-range
wireless and IR communication are used for communication.
He described the non-intrusive collaboration that is enabled and a nice
remembrance agent that works under EMACS...also how he used this to sit for
his Ph.D. This technology has much significance for Chevron in supporting
the disabled in computing and, more mainstream, in refinery and pipeline
"hands-off" computer work.
Current research is tracking fingertips, glasses that attach links to
physical reality, messaging and a form of active badges, baseball cap
mounted sign-language to English translator, circuits sewn in clothing.
The future will include wearables. There are 8 billion computers on the
planet, only two percent of which are desktops. Cray-like power can be
carried and very short-range wireless communications used. Cell phones will
run an OS...lots of other stuff...smart dust, etc.
References: http://www.gvu.gatech.edu/ccg and
http://www.media.mit.edu/wearables.
Andy Barrow has a tape recording of this talk should you wish to hear it.

Making the Business Case to Management for Open Source

Barry Caplin, a manager of USWest, gave a presentation on how to make the
case for open source to management. The detail slides are at
www.users.uswest.net/~bcaplin/talks

Barry spoke of management fears about Open Source, the problems with
proprietary systems, and how to make the case. In particular, that deserves
a summary here. Making the case consists of:

Gather the information

Journal the current situation

Journal the company's current capabilities and skills

Determine company's needs and goals

Identify players and allies

Identify top-tech minds

Get feedback from a sympathetic manager

Identify people you have to convince and target the presentation accordingly

Publish a White Paper

There was considerable debate during question time about the economic
viability of Open Source. Issues were discussed but hardly resolved (see the keynote "Extreme Business" by John Vrolyk of SGI for a more definitive
process).

The White Paper should be a "living document", address itself to the core
purpose, vision and corporate culture, and should contain some degree of
"comfort factor".

The White Paper must not contain too much opinion or technical depth -
details can be discussed later.

Keynote Speech - "Sun and Open Source"

Bill Joy of Sun described the BSD Unix and vi editor developments and the
difficulties and successes he had experienced before he joined Sun. He
emphasized the strength of copyright in enforcing "good behavior" rather
than contract law, implying that the GNU General Public License created by
Richard Stallman was a particularly good method of ensuring that Open Source
would not fragment.

"Extreme Business" - Is Linux Economically Viable?

This address, given by John Vrolyk, a senior VP of Silicon Graphics, was
very impressive. I have an audio tape should anyone wish to listen to it.
Mr Vrolyk considered the next phase of Linux development would be the
alignment around brands.
He had some interesting things to say:

The OS is a commodity, no end-user really cares what it is.

Microsoft should concentrate on desktop applications.

SGI had released their IRIX file journalling system to Open Source.

SGI, HP and Intel are guaranteeing smooth transfer of Linux to 64bit
chips.

Vrolyk illustrated economic sustainability using water as an
example. Water is free. But Perrier and Pellegrino seem to do very well.
Case closed.

Regarding business models, he alluded to those of the VA/Linux (turnkey),
Sun (envelop), IBM (just throw money), SGI (hardware and service), O'Reilly
(publishing), Stonehenge (training) or Red Hat (GIL-like distribution and
service) type and said it was unclear which would succeed well, but SGI were
investing in VA/Linux as one with good potential.
He also added that the industry in general is turning away from having to go
cap-in-hand for compliance testing against Redmond's proprietary and often
secret standards.
This is a revolution. "Stupid ideas," he said, "only last for any time in
large corporations."
I reflected on recent events in Chevron. We have positioned ourselves
reasonably well for the Tsunami about to hit.

The Perl Conference brought us the first "White
Camel" awards. The White Camel Awards were created to honor individuals who
devote remarkable creativity, energy, and time to the non-technical work that
supports Perl's active and loyal user community. The awards were conceived of
and administered by Perl Mongers, a not-for-profit
organization whose mission is to establish Perl user groups. In addition to Perl
Mongers, O'Reilly and
sourceXchange sponsored
the awards.

I recently had the chance to talk with the winners and ask them a few
questions about the White Camel awards and their contributions to the Perl
community.

Kevin Lenzo

You won the "Perl Community"
White Camel Award for YAPC. Briefly (50 pages or less), what was YAPC and what
went into planning it?

Yet Another Perl Conference is a grassroots Perl conference, and the first
one (YAPC 1999) was hosted at Carnegie Mellon University. YAPC 19100 will stay
in Pittsburgh, incubating here for another year, and then I'd like to find
another host city somewhere in Eastern North America - probably somewhere
there's a Perl Mongers stronghold near a university.

The conference was cheap: $60 conference cost, including food, covering the
two-day event. We actually ran a little short on the budget in the end, but this
was pried out of me in the closing session, and the community had $1000 on the
stage in about 30 minutes! You don't see that very often. The next one will
probably cost a wee bit more, just to avoid the shortfall, but the whole thing
is intended to be zero profit.

The aim of YAPC is to be an affordable, regional conference, that people can
get to, and that students, hobbyists and enthusiasts can get to without
selling their computers. We had some great speakers show up and volunteer their
time, alsoI think the speakers did a great job, even when the mechanics of
the conference occasionally broke down.

Next year, we have some interesting challenges - we're not the best-kept
secret anymore. Even with double the capacity, I think we will have to turn
people away, and we'll surely get more talk submissions than we'll be able to
show. One goal I have is to bring some discussion of the internals and
directions of Perl itself more to the next YAPC, and from the response so far,
I'd say we'll be seeing it.

What does receiving the White Camel award mean to you
personally?

It's pretty overwhelming - and I've felt that way ever since the conference.
In a way it's a physical symbol of the goodwill that almost broke me up at the
end of YAPC. It's not something one can scheme to get!

Does the award mean more to you because you were chosen by your
peers?

Quite. I feel like I've been elected as a Tribune by my peers, and it's my
responsibility to carry that trust, and use any power it may grant to battle
evil magistrates.

Has receiving the award renewed your enthusiasm or have you always been
active in the Perl community?

I've had increasing activity in the Perl community over the last five or so
years. I'd like to point out that Internet Relay Chat, newsgroups, and mailing
lists have been really important to me, and I'd say the EFNet IRC channel #perl
really helped catalyze YAPC. If I have any great involvement, that was the
gateway to it.

As far as renewing my enthusiasm - it had never waned. We do have some
serious issues ahead of us in carrying Perl to the next level, as a language and
as a platform. Perhaps my enthusiasm is tempered by the looming due diligence.

How do you think the White Camels will affect the Perl
community?

I think the awards certainly help raise awareness, both inside and out, of
the Perl community - which appears to be becoming self-aware. This sort of
recognition certainly helps me when I ask major speakers to come and speak at
YAPC. The award has, in some ways, legitimized me with respect to Perl, so that
I can ask about things and speak of it as "for Perl." I think these awards are
strong indicators to the places folks work, for instance, that their involvement
with Perl is justified, and I know Carnegie Mellon University took it quite
seriously. It has both freed me to officially focus and brought
certain responsibilities. The award made me look "presidential," as they say in
election years, and that apparent legitimacy helps me work for Perl.

Do you have any exciting plans for your new found fame and
fortune?

Well, just gearing up for the next YAPC. If you've seen any of the Infobot
work, you know I have an interest in group communication and discussion. Well,
I'd really like to expose the planning process of YAPC itself through
interactive means and web sites that can be group-authored. I'd like to see the
community helping to structure the conference, and give feedback during the
planning stages - to bring groupware to the whole conference planning process.
It is a community event, after all.

I'd like to mention one other thing I'm working on here at CMU:
we're about to release Sphinx, a major speech recognition engine, as open
source, and I'm intending to make a Perl module to go with it, so you can start
talking to your desktops. Speech isn't good for everything, and there are lots
of things it's inappropriate for, but it's nice to say "turn on the lights in
the kitchen." Desktop agents, global communication using the net, and speech
interaction are going to change the way we work and talk, and I'd like to see
speech technology available in Perl.

Adam Turoff

You won the "Perl User Group"
White Camel Award. What were some of the things you
did for the Perl community to earn this award?

Founding member of NY.pm

Founded PHL.pm, with the monthly "social" dinners

Started the monthly tech meeting series for PHL.pm

Started the biweekly Perl reading group for PHL.pm (at this point, we
meet 4 weeks/month)

I think it means that the Perl Community is hitting the next level. Two years
ago, the Perl Community wasn't very well formed, except for groups like p5p and
other people interested in extending Perl technically. Today, we have three
categories where people can be rewarded for making non-technical contributions
to the Perl Community.

Taking Perl to the next level means bringing Perl to new audiences and in new
directions. One way to do that is to promote Perl user groups where we are
interested in discussing common CGI idioms, venting about Microsoft, swapping
Twilight Zone stories, talking about Postmodernism or sharing really cool
observations about Perl. All at the same time, of course.

Has receiving the award renewed your enthusiasm or have you always been
active in the Perl community?

I've been active in the Perl Community since TPC 1.0. [TPC 1.0 is The Perl
Conference 1.0]

How do you think the White Camels will affect the Perl
community?

The Perl community is both quite social and very eclectic. I hope that the White
Camels (specifically the user groups and community awards) will help to
highlight and encourage user groups everywhere to continue these traditions.

One of the themes that has been circulating since the conference is that of
the quiet majority of Perl users. We may not be as big or as vocal as the Java
or Win* communities, but we use Perl and get the job done without making lots of
noise. That makes it more difficult to advocate Perl and help it to grow in new
directions. I hope that over the next few years, Perl advocacy grows and becomes
more effective and more visible. I can't wait to see the White Camel awards for
Advocacy over the next few years.

Do you have any exciting plans for your new found fame and
fortune?

Taking phl.pm out to dinner, and buying an exotic computer or two.

These were some of the first people who distinguished themselves enough in the
Perl community to earn the White Camel Awards. These awards were an excellent
idea; the presence of peer awards such as these will undoubtedly affect the Perl
community. Any Perl programmer out there would be very proud to display their
own White Camel award.

Introduction

Damian Conway is the author of the newly released Object Oriented Perl,
the first of a new series of Perl books from Manning.

Object-oriented programming in Perl is easy. Forget the heavy theory and
the sesquipedalian jargon: classes in Perl are just regular packages, objects
are just variables, methods are just subroutines. The syntax and semantics
are a little different from regular Perl, but the basic building blocks
are completely familiar.

The one problem most newcomers to object-oriented Perl seem to stumble
over is the notion of references and referents, and how the two combine
to create objects in Perl. So let's look at how references and referents
relate to Perl objects, and see who gets to be blessed and who just gets
to point the finger.

Let's start with a short detour down a dark alley...

References and referents

Sometimes it's important to be able to access a variable indirectly:
to be able to use it without specifying its name. There are two obvious
motivations: the variable you want may not have a name (it may be
an anonymous array or hash), or you may only know which variable you want
at run-time (so you don't have a name to offer the compiler).

To handle such cases, Perl provides a special scalar datatype called
a reference. A reference is like the traditional Zen idea of the
"finger pointing at the moon". It's something that identifies a variable,
and allows us to locate it. And that's the stumbling block most people
need to get over: the finger (reference) isn't the moon (variable); it's
merely a means of working out where the moon is.

Making a reference

When you prefix an existing variable or value with the unary \
operator you get a reference to the original variable or value. That original
is then known as the referent to which the reference refers.

For example, if $s is a scalar variable,
then \$s is a reference to that scalar variable
(i.e. a finger pointing at it) and $s is that
finger's referent. Likewise, if @a is an array,
then \@a is a reference to it.

In Perl, a reference to any kind of variable can be stored in another
scalar variable. For example:
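A minimal sketch, using variable names that match the discussion below ($slr_ref, $arr_ref, and $hsh_ref):

```perl
my $s = "A scalar";
my @a = (1, 2, 3);
my %h = (first => "January");

my $slr_ref = \$s;    # a reference to the scalar $s
my $arr_ref = \@a;    # a reference to the array @a
my $hsh_ref = \%h;    # a reference to the hash %h
```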

Once you have a reference, you can get back to the original thing it
refers to (its referent) simply by prefixing the variable containing
the reference (optionally in curly braces) with the appropriate variable
symbol. Hence to access $s, you could write
$$slr_ref or ${$slr_ref}.
At first glance, that might look like one too many dollar signs, but it
isn't. The $slr_ref tells Perl
which variable has the reference; the extra $
tells Perl to follow that reference and treat the referent as a scalar.

Similarly, you could access the array @a
as @{$arr_ref}, or the hash %h
as %{$hsh_ref}. In each case, the $whatever_ref
is the name of the scalar containing the reference, and the leading @
or % indicates what type of variable the referent
is. That type is important: if you attempt to prefix a reference with the
wrong symbol (for example, @{$slr_ref} or ${$hsh_ref}),
Perl produces a fatal run-time error.

Figure 1: References and their referents

Figure 2: A reference that is its own referent

The "arrow" operator

Accessing the elements of an array or a hash through a reference can be
awkward using the syntax shown above. You end up with a confusing tangle
of dollar signs and brackets:

${$arr_ref}[0] = ${$hsh_ref}{"first"}; # i.e. $a[0] = $h{"first"}

So Perl provides a little extra syntax to make life just a little less
cluttered:

$arr_ref->[0] = $hsh_ref->{"first"}; # i.e. $a[0] = $h{"first"}

The "arrow" operator (->) takes a reference
on its left and either an array index (in square brackets) or a hash key
(in curly braces) on its right. It locates the array or hash that the reference
refers to, and then accesses the appropriate element of it.

Identifying a referent

Because a scalar variable can store a reference to any kind of data, and
because dereferencing a reference with the wrong prefix leads to fatal
errors, it's sometimes important to be able to determine what type of referent
a specific reference refers to. Perl provides a built-in function called
ref that takes a scalar and returns a description
of the kind of reference it contains. Table 1 summarizes the string that
is returned for each type of reference.

If $slr_ref contains...                 then ref($slr_ref) returns...

a scalar value                          undef
a reference to a scalar                 "SCALAR"
a reference to an array                 "ARRAY"
a reference to a hash                   "HASH"
a reference to a subroutine             "CODE"
a reference to a filehandle             "IO" or "IO::Handle"
a reference to a typeglob               "GLOB"
a reference to a precompiled pattern    "Regexp"
a reference to another reference        "REF"

Table 1: What ref returns

As Table 1 indicates, you can create references to many kinds of Perl
constructs, apart from variables.

If a reference is used in a context where a string is expected, then
the ref function is called automatically to
produce the expected string, and a unique hexadecimal value (the internal
memory address of the thing being referred to) is appended. That means
that printing out a reference:

print $hsh_ref, "\n";

produces something like:

HASH(0x10027588)

since each element of print's argument list
is stringified before printing.

The ref function has a vital additional
role in object-oriented Perl, where it can be used to identify the class
to which a particular object belongs. More on that in a moment.

References, referents, and objects

References and referents matter because they're both required when you
come to build objects in Perl. In fact, Perl objects are just referents
(i.e. variables or values) that have a special relationship with a particular
package. References come into the picture because Perl objects are always
accessed via a reference, using an extension of the "arrow" notation.

But that doesn't mean that Perl's object-oriented features are difficult
to use (even if you're still unsure of references and referents). To do
real, useful, production-strength, object-oriented programming in Perl
you only need to learn about one extra function, one straightforward piece
of additional syntax, and three very simple rules. Let's start with the
rules...

Rule 1: To create a class, build a package

Perl packages already have a number of class-like features:

They collect related code together;

They distinguish that code from unrelated code;

They provide a separate namespace within the program, which keeps subroutine
names from clashing with those in other packages;

They have a name, which can be used to identify data and subroutines defined
in the package.

In Perl, those features are sufficient to allow a package to act like a
class.

Suppose you wanted to build an application to track faults in a system.
Here's how to declare a class named "Bug" in Perl:

package Bug;

That's it! In Perl, classes are packages. No magic, no extra syntax, just
plain, ordinary packages. Of course, a class like the one declared above
isn't very interesting or useful, since its objects will have no attributes
or behaviour.

That brings us to the second rule...

Rule 2: To create a method, write a subroutine

In object-oriented theory, methods are just subroutines that are associated
with a particular class and exist specifically to operate on objects that
are instances of that class. In Perl, a subroutine that is declared in
a particular package is already associated with that package. So
to write a Perl method, you just write a subroutine within the package
that is acting as your class.

For example, here's how to provide an object method to print Bug objects:

package Bug;

sub print_me
{
# The code needed to print the Bug goes here
}

Again, that's it. The subroutine print_me is
now associated with the package Bug, so whenever Bug is used as a class,
Perl automatically treats Bug::print_me as
a method.

Invoking the Bug::print_me method involves
that one extra piece of syntax mentioned above: an extension to the existing
Perl "arrow" notation. If you have a reference to an object of class Bug,
you can access any method of that object by using a ->
symbol, followed by the name of the method.

For example, if the variable $nextbug holds
a reference to a Bug object, you could call Bug::print_me
on that object by writing:

$nextbug->print_me();

Calling a method through an arrow should be very familiar to any C++ programmers;
for the rest of us, it's at least consistent with other Perl usages:

$hsh_ref->{"key"}; # Access the hash referred to by $hsh_ref
$arr_ref->[$index]; # Access the array referred to by $arr_ref
$sub_ref->(@args); # Access the sub referred to by $sub_ref
$obj_ref->method(@args); # Access the object referred to by $obj_ref

The only difference with the last case is that the referent (i.e. the object)
pointed to by $objref has many ways of being
accessed (namely, its various methods). So, when you want to access that
object, you have to specify which particular way (which method) should
be used. Hence, the method name after the arrow.

When a method like Bug::print_me is called,
the argument list that it receives begins with the reference through which
it was called, followed by any arguments that were explicitly given to
the method. That means that calling Bug::print_me("logfile")
is not the same as calling $nextbug->print_me("logfile").
In the first case, print_me is treated as a
regular subroutine so the argument list passed to Bug::print_me
is equivalent to:

( "logfile" )

In the second case, print_me is treated as
a method so the argument list is equivalent to:

( $objref, "logfile" )

Having a reference to the object passed as the first parameter is vital,
because it means that the method then has access to the object on which
it's supposed to operate. Hence you'll find that most methods in Perl start
with something equivalent to this:

package Bug;

sub print_me
{
my $self = shift;
# The @_ array now stores the arguments passed to &Bug::print_me
# The rest of &print_me uses the data referred to by $self
# and the explicit arguments (still in @_)
}

or, better still:

package Bug;

sub print_me
{
my ($self, @args) = @_;
# The @args array now stores the arguments passed to &Bug::print_me
# The rest of &print_me uses the data referred to by $self
# and the explicit arguments (now in @args)
}

This second version is better because it provides a lexically scoped copy
of the argument list (@args). Remember that
the @_ array is "magical": changing any element
of it actually changes the caller's version of the corresponding
argument. Copying argument values to a lexical array like @args
prevents nasty surprises of this kind, as well as improving the internal
documentation of the subroutine (especially if a more meaningful name than
@args is chosen).

The only remaining question is: how do you create the invoking object
in the first place?

Rule 3: To create an object, bless a referent

Unlike other object-oriented languages, Perl doesn't require that an object
be a special kind of record-like data structure. In fact, you can use any
existing type of Perl variablea scalar, an array, a hash, etc.as
an object in Perl.

Hence, the issue isn't how to create the object, because you
create them exactly like any other Perl variable: declare them with a my,
or generate them anonymously with a [...]
or {...}. The real
problem is how to tell Perl that such an object belongs to a particular
class. That brings us to the one extra built-in Perl function you need
to know about. It's called bless, and its only
job is to mark a variable as belonging to a particular class.

The bless function takes two arguments:
a reference to the variable to be marked, and a string containing the name
of the class. It then sets an internal flag on the variable, indicating
that it now belongs to the class.

For example, suppose that $nextbug actually
stores a reference to an anonymous hash:
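A sketch of such a blessing (the particular attribute names and values here are illustrative):

```perl
my $nextbug = {
    id    => "00001",
    type  => "fatal",
    descr => "application does not compile",
};

# Mark the anonymous hash as belonging to the class Bug
bless $nextbug, "Bug";
```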

And, once again, that's it! The anonymous hash referred to by $nextbug
is now marked as being an object of class Bug. Note that the variable $nextbug
itself hasn't been altered in any way; only the nameless hash it refers
to has been marked. In other words, bless sanctified
the referent, not the reference. Figure 3 illustrates where the
new class membership flag is set.

You can check that the blessing succeeded by applying the built-in ref
function to $nextbug. As explained above,
when ref is applied to a reference, it normally
returns the type of that reference. Hence, before $nextbug
was blessed, ref($nextbug) would have returned
the string 'HASH'.

Once an object is blessed, ref returns the
name of its class instead. So after the blessing, ref($nextbug)
will return 'Bug'. Of course the object itself
still is a hash, but now it's a hash that belongs to the
Bug class. The various entries of the hash become the attributes of the
newly created Bug object.

Figure 3: What changes when an object is blessed

Creating a constructor

Given that you're likely to want to create many such Bug objects, it would
be convenient to have a subroutine that took care of all the messy, blessy
details. You could pass it the necessary information, and it would then
wrap it in an anonymous hash, bless the hash, and give you back a reference
to the resulting object.

And, of course, you might as well put such a subroutine in the Bug package
itself, and call it something that indicates its role. Such a subroutine
is known as a constructor, and it generally looks like this:
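A sketch of such a constructor, using the same illustrative attribute names as before:

```perl
package Bug;

sub new
{
    my $class  = $_[0];
    my $objref = {
        id    => $_[1],
        type  => $_[2],
        descr => $_[3],
    };
    bless $objref, $class;   # the messy, blessy details
    return $objref;
}
```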

Note that the middle of the subroutine looks just like the
raw blessing that was handed out to $nextbug
in the previous example.

The bless function is set up to make writing
constructors like this a little easier. Specifically, it returns the reference
that's passed as its first argument (i.e. the reference to whatever referent
it just blessed into object-hood). And since Perl subroutines automatically
return the value of their last evaluated statement, that means that you
could condense the definition of Bug::new to
this:
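A condensed sketch along those lines (attribute names illustrative, as before):

```perl
package Bug;

sub new
{
    my $class = shift;
    # bless returns the reference, and the subroutine returns
    # its last evaluated statement, so no explicit return is needed
    bless { id => shift, type => shift, descr => shift }, $class;
}
```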

This version has exactly the same effects: slot the data into an anonymous
hash, bless the hash into the class specified by the first argument, and return
a reference to the hash.

Regardless of which version you use, now whenever you want to create
a new Bug object, you can just call:

$nextbug = Bug::new("Bug", $id, $type, $description);

That's a little redundant, since you have to type "Bug" twice. Fortunately,
there's another feature of the "arrow" method-call syntax that solves this
problem. If the operand to the left of the arrow is the name of a class,
rather than an object reference, then the appropriate method of that
class is called. More importantly, if the arrow notation is used, the first
argument passed to the method is a string containing the class name. That
means that you could rewrite the previous call to Bug::new
like this:

$nextbug = Bug->new($id, $type, $description);

There are other benefits to this notation when your class uses inheritance,
so you should always call constructors and other class methods this way.

Method enacting

Apart from encapsulating the gory details of object creation within the
class itself, using a class method like this to create objects has another
big advantage. If you abide by the convention of only ever creating new
Bug objects by calling Bug::new, you're guaranteed
that all such objects will always be hashes. Of course, there's nothing
to prevent us from "manually" blessing arrays, or scalars as Bug objects,
but it turns out to make life much easier if you stick to blessing
one type of object into each class.

For example, if you can be confident that any Bug object is going to
be a blessed hash, you can (finally!) fill in the missing code in the Bug::print_me
method:
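A sketch of the completed method (the exact report format is illustrative):

```perl
package Bug;

sub print_me
{
    my ($self) = @_;
    # $self is a reference to the blessed hash, so its entries
    # (the object's attributes) are accessed with the arrow notation
    print "ID: $self->{id}\n";
    print "$self->{descr}\n";
    print "(Note: problem is fatal)\n" if $self->{type} eq "fatal";
}
```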

Now, whenever the print_me method is called
via a reference to any hash that's been blessed into the Bug class, the
$self variable extracts the reference that
was passed as the first argument and then the print
statements access the various entries of the blessed hash.

Till death us do part...

Objects sometimes require special attention at the other end of their lifespan
too. Most object-oriented languages provide the ability to specify a subroutine
that is called automatically when an object ceases to exist. Such subroutines
are usually called destructors, and are used to undo any side-effects
caused by the previous existence of an object. That may include:

deallocating related memory (although in Perl that's almost never necessary
since reference counting usually takes care of it for you);

closing file or directory handles stored in the object;

closing pipes to other processes;

closing databases used by the object;

updating class-wide information;

anything else that the object should do before it ceases to exist (such
as logging the fact of its own demise, or storing its data away to provide
persistence, etc.)

In Perl, you can set up a destructor for a class by defining a subroutine
named DESTROY in the class's package. Any such
subroutine is automatically called on an object of that class, just before
that object's memory is reclaimed. Typically, this happens when the last
variable holding a reference to the object goes out of scope, or has another
value assigned to it.

For example, you could provide a destructor for the Bug class like this:
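A sketch of such a destructor (the epitaph text is illustrative):

```perl
package Bug;

sub DESTROY
{
    my ($self) = @_;
    # Called automatically just before the object's memory is reclaimed
    print qq{<< Bug "$self->{id}" is squashed >>\n};
}
```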

Now, every time an object of class Bug is about to cease to exist, that
object will automatically have its DESTROY
method called, which will print an epitaph for the object. For example,
the following code:
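A minimal sketch of such a loop, assuming the Bug constructor, print_me method, and DESTROY destructor described above (the input handling and arguments are illustrative):

```perl
package main;

while (my $id = <>)
{
    chomp $id;
    my $bug = Bug->new($id, "fatal", "some description");
    $bug->print_me();
}
```

prints each Bug's report, followed by that Bug's epitaph, on every iteration.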

That's because, at the end of each iteration of the while
loop, the lexical variable $bug goes out of
scope, taking with it the only reference to the Bug object created earlier
in the same loop. That object's reference count immediately becomes zero
and, because it was blessed, the corresponding DESTROY
method (i.e. Bug::DESTROY) is automatically
called on the object.

Where to from here?

Of course, these fundamental techniques only scratch the surface of object-oriented
programming in Perl. Simple hash-based classes with methods, constructors,
and destructors may be enough to let you solve real problems in Perl, but
there's a vast array of powerful and labor-saving techniques you can add
to those basic components: autoloaded methods, class methods and class
attributes, inheritance and multiple inheritance, polymorphism, multiple
dispatch, enforced encapsulation, operator overloading, tied objects, genericity,
and persistence.

Perl's standard documentation includes plenty of good material (perlref,
perlreftut, perlobj, perltoot, perltootc, and
perlbot) to get you started. But if you're looking for a comprehensive
tutorial on everything you need to know, you may also like to consider
my new book, Object Oriented
Perl, from which this article has been adapted.